Building Domain Specific Language Models

N-gram model, RNN, LSTM, AllenNLP
Alexis Perrier
4 weeks · 8-10 hours per week
In this liveProject, you’ll step into the role of a natural language processing data scientist working for Stack Exchange. Stack Exchange runs a network of question-and-answer sites on diverse topics ranging from programming to cooking. Your boss wants you to create language models that are tuned to the particular vocabulary of different Stack Exchange sites. Language is domain-specific: an insurance company’s documents, for example, use very different terminology from a post on a social media site. Because of this, off-the-shelf NLP models trained on generic text can be inaccurate for specialized domains. Your goal is to build a language model capable of query completion, text generation, and sentence selection for the domain-specific language of Cross Validated, the Stack Exchange site for statistics and machine learning. Challenges you will tackle include preparing your datasets, building and evaluating word-based n-gram language models, and building a character-based language model with AllenNLP.
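To give a feel for what query completion with a word-based n-gram model involves, here is a minimal sketch using only the Python standard library. It is not the project's solution: the toy corpus, the whitespace tokenizer, and the helper names `train_bigram_model` and `complete_query` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for every word, which words follow it in the training sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenizer (assumption)
        for current_word, next_word in zip(tokens, tokens[1:]):
            following[current_word][next_word] += 1
    return following


def complete_query(model, prefix, max_new_words=3):
    """Greedily extend the prefix with the most frequent next word."""
    tokens = prefix.lower().split()
    if not tokens:
        return ""
    for _ in range(max_new_words):
        candidates = model.get(tokens[-1])
        if not candidates:  # the last word was never seen with a successor
            break
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


# Hypothetical stand-in for a domain corpus such as Cross Validated posts.
toy_corpus = [
    "how to test the null hypothesis",
    "reject the null hypothesis",
    "the null hypothesis is rejected",
    "the null distribution",
]

model = train_bigram_model(toy_corpus)
print(complete_query(model, "the", max_new_words=2))  # the null hypothesis
```

A model trained on generic text would have no reason to prefer "null hypothesis" after "the", which is the motivation for training on domain text.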

project author

Alexis Perrier
Alexis Perrier is a data science consultant specializing in predictive modeling and natural language processing. He holds a master's in probabilities from Sorbonne Universités and a PhD in signal processing from Telecom Paris. He is the author of several books and online courses on data science.


This course is for proficient Python programmers who have experience with text-based machine learning. This course uses Python 3.7. It is recommended that you use the Anaconda distribution of Python and conda for managing the libraries. To begin this liveProject, you will need to be familiar with:

  • Basics of NumPy
  • Basics of pandas
  • Intermediate NLTK
  • Basics of creating neural networks with PyTorch, TensorFlow, or Keras
  • Basics of recurrent neural networks and LSTMs
  • Basics of word embeddings
  • Intermediate seq2seq models, algebra and probabilities, such as matrix manipulation, chain rule, and independence

you will learn

In this liveProject, you’ll learn to build a domain-focused language model using deep learning. You’ll develop skills in Python scripting and in creating and training neural networks. By the end of this project, you will be able to lay the foundation for any domain-specific NLP system by creating specialized, robust, and efficient language models.

  • Python scripting, including object-oriented programming
  • Data manipulation with NumPy and pandas
  • Text preprocessing such as pattern removal with regular expressions, text manipulation, and tokenization with NLTK
  • Designing and training recurrent neural networks with PyTorch
  • Scoring and evaluating language models for different tasks
  • Summarizing findings of a data science project effectively
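As a rough illustration of two of the skills listed above, pattern removal with regular expressions and scoring a language model, the sketch below cleans a raw post and computes the perplexity of an add-one smoothed bigram model. The function names, the cleaning patterns, and the sample sentences are assumptions chosen for the example, and the project itself uses NLTK for tokenization rather than `str.split`.

```python
import math
import re
from collections import Counter


def clean_post(raw_text):
    """Strip HTML tags and URLs, collapse whitespace, and lowercase."""
    text = re.sub(r"<[^>]+>", " ", raw_text)   # HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # bare URLs
    return re.sub(r"\s+", " ", text).strip().lower()


def bigram_perplexity(train_tokens, test_tokens):
    """Perplexity of test_tokens under an add-one smoothed bigram model.

    Simplification (assumption): the vocabulary is the union of training
    and test words, so unseen test words still get a nonzero probability.
    """
    if len(test_tokens) < 2:
        raise ValueError("need at least two test tokens to form a bigram")
    vocab_size = len(set(train_tokens) | set(test_tokens))
    bigram_counts = Counter(zip(train_tokens, train_tokens[1:]))
    context_counts = Counter(train_tokens[:-1])
    test_bigrams = list(zip(test_tokens, test_tokens[1:]))
    log_prob = 0.0
    for previous_word, word in test_bigrams:
        numerator = bigram_counts[(previous_word, word)] + 1
        denominator = context_counts[previous_word] + vocab_size
        log_prob += math.log(numerator / denominator)
    return math.exp(-log_prob / len(test_bigrams))


train = clean_post(
    "<p>The null hypothesis is rejected when the p value is small</p>"
).split()
in_domain = "the null hypothesis is rejected".split()
out_of_domain = "bake the cake until it is golden".split()

print(bigram_perplexity(train, in_domain))
print(bigram_perplexity(train, out_of_domain))
```

Lower perplexity means the model finds the text less surprising, so the in-domain sentence should score lower than the out-of-domain one.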


You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Peer support
Chat with other participants within the liveProject platform.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
Book and video resources
Excerpts from Manning books and videos are included, as well as references to other resources.

project outline


Prerequisites Test

Get Started

1. Loading and Preparing the Dataset

1.1. Loading and Preparing the Dataset

Regular Expressions


1.2. Submit Your Work

2. N-gram Language Model

2.1. N-gram Language Model

Building Your Vocabulary with a Tokenizer

2.2. Submit Your Work

3. Deep Learning Language Model

3.1. Deep Learning Language Model

Deep Learning for Text and Sequences

Sequential NLP and Memory

3.2. Submit Your Work

4. Character-based Language Model with AllenNLP

4.1. Character-based Language Model with AllenNLP

Sequential Labeling and Language Modeling

4.2. Submit Your Work


Project Conclusions

