ML Feature Engineering

Train and Score with Raw Data

This project is part of the liveProject series ML Feature Engineering and Modeling using Python.
prerequisites
intermediate Python and scikit-learn • basics of Jupyter Notebook, pandas, and SQL
skills learned
classification with logistic regression • ML pipelines with scikit-learn • generating model pickle files with Joblib
Jayesh Patel
1 week · 8-10 hours per week · BEGINNER

In this liveProject, you’ll train and evaluate a machine learning model for diagnosing diabetes, and set up a pipeline for your model to run effectively. You’ll start by exploring sample data, processing features, and performing common feature engineering techniques for treating outliers or missing data. After dividing your dataset into training and testing data, you’ll train a logistic regression model using scikit-learn. You will then retrain the model with a different set of features. Finally, you’ll pick a model for scoring and build a scoring pipeline. You will test your scoring process on a scoring dataset.
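The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's solution: the data is synthetic, and the column names (`glucose`, `bmi`, `age`, `diabetic`), the 1st/99th percentile clipping, and the median imputation are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; column names and distributions are hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "glucose": rng.normal(120, 30, n),
    "bmi": rng.normal(32, 7, n),
    "age": rng.integers(21, 80, n).astype(float),
})
# Tie the label loosely to glucose and BMI so there is signal to learn.
logit = 0.04 * (df["glucose"] - 120) + 0.08 * (df["bmi"] - 32)
df["diabetic"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Inject the kinds of problems feature engineering has to treat.
df.loc[rng.choice(n, 20, replace=False), "bmi"] = np.nan  # missing values
df.loc[0, "glucose"] = 900.0                              # an implausible outlier

features = ["glucose", "bmi", "age"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["diabetic"],
    test_size=0.25, stratify=df["diabetic"], random_state=0,
)

# Treat outliers: clip to percentile bounds learned from the training split only.
lower = X_train.quantile(0.01)
upper = X_train.quantile(0.99)
X_train = X_train.clip(lower=lower, upper=upper, axis=1)
X_test = X_test.clip(lower=lower, upper=upper, axis=1)

# Imputation, scaling, and the classifier live in one pipeline object.
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Retrain an unfitted copy of the same pipeline on a different feature set.
subset = ["glucose", "bmi"]
model_subset = clone(model).fit(X_train[subset], y_train)
accuracy_subset = model_subset.score(X_test[subset], y_test)

print(f"all features: {accuracy:.3f}, subset: {accuracy_subset:.3f}")
```

Computing the clipping bounds after the split keeps information from the test rows out of the training process, which is one reasonable way to make the two accuracy figures comparable.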

This project is designed for learning purposes and is not a complete, production-ready application or solution.

book resources

When you start your liveProject, you get full access to select books for 90 days.

project author

Jayesh Patel
Jayesh Patel is a strategic big data leader and proven architect who has designed complex data processes, architected machine learning pipelines, and developed big data analytics solutions for more than 15 years. He currently works for Rockstar Games, architecting data-driven big data platforms and artificial intelligence solutions to keep players engaged in Red Dead Redemption II and Grand Theft Auto V. He is an active senior member of the IEEE, and his research in the big data space has been well received at numerous international IEEE conferences. He is an editorial board member of an international journal, and he guides and reviews the research work of other scholars and professors around the world. He completed his master’s degree at San Diego State University in 2009.


This liveProject is for data scientists and engineers who are familiar with Python, the basics of machine learning, and data modeling. To begin this liveProject you will need to be familiar with the following:

  • Intermediate Python
  • Basics of Jupyter Notebook
  • Basics of pandas
  • Intermediate scikit-learn
  • Basics of SQL
  • Basic file processing
  • Intermediate data processing and feature engineering
  • Intermediate machine learning pipelines
  • Basic understanding of ML development cycles
  • Basic understanding of classification with logistic regression

you will learn

In this liveProject, you’ll master common Python libraries for the important task of machine learning feature engineering.

  • Classification with logistic regression
  • Generating model pickle files with Joblib
  • Building a scalable process to score new data with scikit-learn
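As a rough illustration of the last two skills, the sketch below saves a fitted pipeline to a pickle file with Joblib and reloads it to score new rows. The feature names, the `score` helper, the toy labels, and the temporary file location are hypothetical choices made for this example and are not taken from the project.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["glucose", "bmi"]  # hypothetical feature names

# Train a small pipeline on synthetic data so the example is self-contained.
rng = np.random.default_rng(1)
train = pd.DataFrame({
    "glucose": rng.normal(120, 30, 200),
    "bmi": rng.normal(32, 7, 200),
})
labels = (train["glucose"] > 125).astype(int)  # toy label, for illustration only

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(train[FEATURES], labels)

# Persist the whole pipeline, so preprocessing travels with the model.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
joblib.dump(pipeline, model_path)


def score(raw: pd.DataFrame, path: Path) -> pd.DataFrame:
    """Load the pickled pipeline and append predictions to a copy of the raw rows."""
    model = joblib.load(path)
    scored = raw.copy()
    scored["prediction"] = model.predict(raw[FEATURES])
    scored["probability"] = model.predict_proba(raw[FEATURES])[:, 1]
    return scored


# New data may contain gaps; the imputer inside the pipeline handles them.
new_data = pd.DataFrame({"glucose": [95.0, 180.0, np.nan], "bmi": [24.0, 38.5, 30.0]})
scored = score(new_data, model_path)
print(scored)
```

Pickling the full pipeline rather than the bare classifier means the scoring step applies exactly the preprocessing that was fitted during training. Pickle files can execute code when loaded, so `joblib.load` should only be used on files from a trusted source.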


You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.
