In this series of liveProjects, you’ll take on the role of a data scientist working for an online movie streaming service. Your bosses want a machine learning model that can analyze written customer reviews of your movies, but you discover that the data is biased towards negative reviews. Training a model on this imbalanced data would hurt its accuracy, and so your challenge is to create a balanced dataset for your model to learn from. You'll collect your company’s data by deliberately introducing imbalance to an IMDb (Internet Movie Database) review dataset, use a sampling technique to balance the dataset, then build a machine learning model from the dataset.
This liveProject is for Python programmers interested in common tools for encoding data for NLP. To begin this liveProject, you will need to be familiar with the following:
In this liveProject, you’ll learn the basics of encoding and decoding techniques. These are common techniques for solving natural language processing (NLP) problems.