Predicting Loan Defaults Using scikit-learn and H2O

intermediate Python • beginner scikit-learn • pandas • Matplotlib • plotting and visualization
skills learned
exploratory data analysis • working with pandas DataFrames • feature engineering • machine learning modeling with random forests • optimizing machine learning • model evaluation and comparison • deploying a model in a Python module
Nate George
4 weeks · 8-10 hours per week · INTERMEDIATE

liveProjects give you the opportunity to learn new skills by completing real-world challenges in your local development environment. These self-paced projects also come with full liveBook access to select books for 90 days plus permanent access to other select Manning products.

The Python data science ecosystem is a powerful, open-source toolset used daily by thousands of data scientists and machine learning engineers. But with so many Python machine learning libraries to choose from, which tool works best for your needs?

In this liveProject, you’ll go hands-on with the scikit-learn and H2O frameworks, using them both to build working machine learning classifiers. You’ll use raw financial data and the tried-and-true random forest model to predict the chance of financial loan defaults. Once you’ve built your models, you'll compare implementations to find out which works best and evaluate your results against existing hard-coded tools.
This project is designed for learning purposes and is not a complete, production-ready application or solution.
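
To give a feel for the kind of classifier described above, here is a minimal sketch of a random forest trained with scikit-learn. It is not taken from the project materials: the data is synthetic (generated with `make_classification` as a stand-in for real loan records), and the feature count, class balance, and hyperparameters are all illustrative assumptions. The H2O counterpart is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for loan data (assumption): label 1 = default, 0 = repaid,
# with roughly 20% of rows in the default class.
X, y = make_classification(
    n_samples=1000,
    n_features=8,
    n_informative=4,
    weights=[0.8, 0.2],
    random_state=42,
)

# Hold out 25% of the rows, keeping the class balance the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Illustrative hyperparameters, not tuned values.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predicted probability of the positive (default) class for each held-out row.
default_proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, default_proba)
print(f"Held-out ROC AUC: {auc:.3f}")
```

Scoring with predicted probabilities rather than hard labels matches the goal of estimating the chance of default, and a threshold-free metric such as ROC AUC is one reasonable way to compare two implementations on the same held-out rows.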

book resources

When you start your liveProject, you get full access to the following books for 90 days.

project author

Nathan George
Nate George started his career studying LEDs for his Ph.D. and working on solar cell manufacturing. He then leveraged his programming and mathematics experience to move to data science. Since 2017, Nate has taught and developed several data science and math courses at Regis University. He also mentors students at Udacity and has developed a Python machine learning course at DataCamp. Nate's expertise includes data engineering (database technologies such as MongoDB and PostgreSQL and cloud technologies such as GCP and AWS), data science (Python, R, statistics), and machine learning.


This liveProject is for aspiring data scientists and machine learning engineers who want to practice their skills in a real-world environment. To begin this liveProject, you will need to be familiar with:

  • Intermediate Python
  • Beginner Jupyter Notebook
  • Beginner Matplotlib
  • Beginner pandas
  • Beginner scikit-learn
  • Beginner plotting and visualization
  • Beginner data munging with pandas

you will learn

In this liveProject, you’ll learn core skills of data science and machine learning that are easy to transfer across roles and industries.

  • Exploratory data analysis
  • Working with pandas DataFrames
  • Machine learning feature engineering
  • Machine learning modeling with random forests
  • Optimizing ML and random forest models
  • Model evaluation and comparison
  • Deploying a model in a Python module


You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.