Predicting Loan Defaults Using scikit-learn and H2O

prerequisites
intermediate Python • beginner scikit-learn, pandas, and Matplotlib • plotting and visualization
skills learned
exploratory data analysis • working with pandas DataFrames • feature engineering • machine learning modeling with random forests • optimizing machine learning • model evaluation and comparison • deploying a model in a Python module
Nate George
4 weeks · 8-10 hours per week · INTERMEDIATE

The Python data science ecosystem is a powerful, open-source toolset used daily by thousands of data scientists and machine learning engineers. But with so many Python machine learning libraries to choose from, which one best fits your needs?

In this liveProject, you’ll go hands-on with the scikit-learn and H2O frameworks, using both to build working machine learning classifiers. You’ll use raw financial data and the tried-and-true random forest model to predict the probability that a loan will default. Once you’ve built your models, you’ll compare the two implementations to find out which works best and evaluate your results against existing hard-coded tools.
This project is designed for learning purposes and is not a complete, production-ready application or solution.
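As a rough illustration of the kind of classifier the project describes, here is a minimal scikit-learn sketch. It is not the author's solution and does not show the H2O side. It assumes synthetic data from `make_classification` (with roughly 10% of rows in the "default" class) as a stand-in for the raw financial data, and the hyperparameters (`n_estimators=200`, a 25% test split) are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for loan data (assumption: ~10% of loans default).
# Label 1 plays the role of "default", label 0 of "repaid".
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=42,
)

# Stratify so the train and test sets keep the same default rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Random forest with arbitrary, untuned settings.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Predicted probability of the positive ("default") class for each test row.
default_proba = model.predict_proba(X_test)[:, 1]

# ROC AUC is one threshold-independent way to compare classifiers.
auc = roc_auc_score(y_test, default_proba)
print(f"Test ROC AUC: {auc:.3f}")
```

Scoring with predicted probabilities rather than hard labels makes it easier to compare models from different libraries on the same footing, since the decision threshold can be chosen afterward.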

project author

Nathan George
Nate George started his career studying LEDs for his Ph.D. and working on solar cell manufacturing. He then leveraged his programming and mathematics experience to move into data science. Nate has been teaching and developing several data science and math courses at Regis University since 2017. He also mentors students at Udacity and has developed a Python machine learning course at DataCamp. His expertise includes data engineering (database technologies such as MongoDB and PostgreSQL, and cloud platforms such as GCP and AWS), data science (Python, R, statistics), and machine learning.


This liveProject is for aspiring data scientists and machine learning engineers who want to practice their skills in a real-world environment. To begin this liveProject, you will need to be familiar with:

  • Intermediate Python
  • Beginner Jupyter Notebook
  • Beginner Matplotlib
  • Beginner pandas
  • Beginner scikit-learn
  • Beginner plotting and visualization
  • Beginner data munging with pandas


You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get help
Within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
Book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.
