Rakshit Sakhuja

Rakshit Sakhuja is a senior data scientist based in India, with most of his work in machine learning and NLP. He is also pursuing a master’s degree at the Indian Institute of Technology, Hyderabad. He has spent the last four years working at data-driven companies and currently works on semantic search systems. His research focuses on few-shot object detection in the computer vision domain using transfer learning and meta-learning.

Projects by Rakshit Sakhuja

Building Domain-Specific Language Models

4 weeks · 8-10 hours per week · ADVANCED

Included with a Manning Online subscription


In this liveProject, you’ll step into the role of a natural language processing data scientist working for Stack Exchange. Stack Exchange runs a network of question-and-answer sites on diverse topics ranging from programming to cooking. Your boss wants you to create language models that are tuned to the statistical, probabilistic, and technical jargon present in different Stack Exchange sites.

Language is domain-specific: an insurance company’s documents use very different terminology than a post on a social media site. Because of this, off-the-shelf NLP models trained on generic text can be inaccurate in specialized domains such as healthcare, law, clinical practice, and agriculture. Your goal is to build a language model capable of query completion and longer-form text generation for Stack Exchange sites. By the end of this project, you will be able to build the foundations of any domain-specific NLP system by creating a robust and efficient language model using statistical and deep learning techniques.

Updated: March 2022

Fully updated to the latest version of AllenNLP
Improved GPU compatibility for training larger models
New help layers with detailed hints and guidance
New preprocessing steps for data preparation
Adjusted prerequisites and libraries