This title has been retired and is no longer for sale.
In this liveProject, you’ll take on the role of a data scientist working for an online movie streaming service. Your bosses want a machine learning model that can analyze written customer reviews of your movies, but you discover that the data is biased towards negative reviews. Training a model on this imbalanced data would hurt its accuracy, and so your challenge is to create a balanced dataset for your model to learn from. You’ll start by simulating your company’s data by deliberately introducing imbalance to an IMDB (Internet Movie Database) review dataset. You’ll experiment with two different methods for balancing this dataset: using sampling techniques, and generating a new synthetic corpus with deep learning text generation. You’ll build and train a simple machine learning model on each dataset to compare the effectiveness of each approach.
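As a rough illustration of the first two steps described above (deliberately skewing a labeled review set, then rebalancing it by sampling), here is a minimal sketch. It is not the project's own solution: the function names `make_imbalanced` and `oversample`, the toy data, and the 20% keep fraction are all assumptions made for this example, and it uses only the Python standard library rather than the pandas or scikit-learn tooling listed in the prerequisites.

```python
import random
from collections import Counter


def make_imbalanced(rows, minority_label, keep_fraction, seed=0):
    """Simulate a skewed dataset by keeping only a fraction of one class.

    rows is a list of (text, label) pairs; both helpers are hypothetical.
    """
    rng = random.Random(seed)
    minority = [row for row in rows if row[1] == minority_label]
    majority = [row for row in rows if row[1] != minority_label]
    n_keep = max(1, round(len(minority) * keep_fraction))
    return majority + rng.sample(minority, n_keep)


def oversample(rows, seed=0):
    """Random oversampling: duplicate rows (with replacement) from smaller
    classes until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy stand-in for a review dataset: 50 positive and 50 negative rows.
reviews = [
    (f"review {i}", "positive" if i % 2 == 0 else "negative")
    for i in range(100)
]

# Skew the data towards negative reviews, then rebalance it.
skewed = make_imbalanced(reviews, minority_label="positive", keep_fraction=0.2)
balanced = oversample(skewed)

skewed_counts = Counter(label for _, label in skewed)
balanced_counts = Counter(label for _, label in balanced)
print(skewed_counts)
print(balanced_counts)
```

In practice, oversampling of this kind is normally applied only to the training split, so that duplicated reviews do not leak into the data used to evaluate the model.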
This project is designed for learning purposes and is not a complete, production-ready application or solution.
When you start your liveProject, you get full access to the following books for 90 days.
This liveProject is for intermediate-to-experienced Python programmers. To begin, you will need hands-on experience with, or at least familiarity with, the following:
- Basics of scikit-learn
- Intermediate TensorFlow 2.0 and/or Keras
- Intermediate NumPy
- Intermediate pandas
- Fundamental statistics for classification
- Basics of gradient descent and SGD
- Basics of loss functions
- Basics of back-propagation
- Basics of overfitting and underfitting
- Basics of kNN
- Basics of Gradient Boosted Decision Trees/GBM
- Basics of classification techniques such as Logistic Regression or SVM
- Intermediate knowledge of neural networks such as RNNs, CNNs, and GRUs
- Basics of comparing classifiers
- Basics of clustering such as Affinity Propagation and Hierarchical Clustering
- Intermediate knowledge of natural language processing concepts, including embeddings, tokenization at word or character level, basic one-hot encoding, and basic handling of out-of-vocabulary tokens
- Intermediate knowledge of activation functions for ANNs, such as softmax, sigmoid, and ReLU
- Intermediate knowledge of dropout, max pooling, and regularization
- Intermediate knowledge of multi-layer perceptrons
You will learn
In this liveProject, you’ll develop natural language processing skills for machine learning models that can determine the sentiment and meaning of raw text. You’ll also learn useful and easily transferable ML techniques to help classify NLP patterns at scale.
- Commonly used text processing/cleansing techniques
- Recommended statistics for model performance and misclassification cost
- Data balance through sampling
- Generating a new corpus with deep learning
- Training and testing a deep learning model to classify text
- You choose the schedule and decide how much time to invest as you build your project.
- Project roadmap
- Each project is divided into several achievable steps.
- Get help
- While within the liveProject platform, get help from other participants and our expert mentors.
- Compare with others
- For each step, compare your deliverable to the solutions by the author and other participants.
- Book resources
- Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.