Five-Project Series

Hands-on Data Science with Julia

prerequisites
basics of Julia • intermediate scikit-learn • intermediate data wrangling
skills learned
tabular data ingestion and integrity validation • clustering data with k-means and DBSCAN algorithms • calling Python modules from Julia
Łukasz Kraiński and Bogumił Kamiński
5 weeks · 4-6 hours per week average · INTERMEDIATE

pro $24.99 per month

  • access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
  • choose one free eBook per month to keep
  • exclusive 50% discount on all purchases

lite $19.99 per month

  • access to all Manning books, including MEAPs!

team

5, 10, or 20+ seats for your team


In this series of liveProjects you’ll explore the exciting scientific computing language Julia, and tackle common data science tasks with its robust machine learning ecosystem. These insightful and engaging projects work through each stage of a data science pipeline, from data preprocessing to building and training machine learning models. You’ll step into the role of a data scientist for a real estate company and develop hands-on experience with Julia—whether you’re working step by step or dipping into the tasks most relevant to your career.

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.

here's what's included

Project 1 Data Preprocessing
In this liveProject, you’ll test your data wrangling and data processing skills using the Julia language. You’ll step into the role of a data scientist for a real estate company with a new task from your boss: analyze and clean housing and census data for the marketing and sales teams. You’ll employ the popular Julia package DataFrames.jl, as well as powerful statistics-related libraries, to explore these datasets and prepare them for machine learning.
Project 2 K-means and DBSCAN Clustering
In this liveProject, you’ll use the Julia language and clustering algorithms to analyze sales data and determine groups of products with similar demand patterns. Clustering is a well-established unsupervised learning technique that’s commonly used to discover patterns and relations in data. You’ll apply k-means and DBSCAN clustering techniques to housing sales data for a retail startup, leveraging your basic Julia skills into mastery of this machine learning task.
Project 3 Dimensionality Reduction with PCA, t-SNE and UMAP
In this liveProject, you’ll use the Julia programming language and dimensionality reduction techniques to visualize housing sales data on a scatter plot. This visualization will allow the marketing team to identify links and demand patterns in sales. Dimensionality reduction is also a useful tool for noise reduction and variance analysis. You’ll use the popular PCA algorithm to visualize the sales dataset with overlaid cluster assignments from the k-means and DBSCAN methods, and then expand Julia’s capabilities by calling Python modules using the PyCall.jl package. This extra flexibility will allow you to explore the t-SNE and UMAP algorithms, which perform well on high-dimensional datasets.
Project 4 Regression Using GLM and DecisionTree
In this liveProject, you’ll use the Julia language to build a regression-based machine learning model that can predict median house value in a neighborhood. You’ll start with a simple linear regression model, built with Julia’s package for Generalized Linear Models, to establish baseline values for your quality metrics. You’ll then tune and assess a random forest model, and compare the two approaches to pick the one that gives the better results.
Project 5 Classification with XGBoost
In this liveProject, you’ll use the Julia language to build a classification-based machine learning model that can predict the salary of a customer based on their sociodemographic data. This model will then be used to serve premium advertising to wealthier customers. You’ll build and evaluate XGBoost models with the dedicated Julia XGBoost.jl package, tune the hyperparameters, and assess your model’s capabilities using the ROC curve and measures such as AUC, accuracy, recall, and precision.

choose your plan

team

monthly $49.99
annual $499.99, only $41.67 per month
  • five seats for your team
  • access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
  • choose another free product every time you renew
  • choose twelve free products per year
  • exclusive 50% discount on all purchases
  • Hands-on Data Science with Julia project for free

project authors

Bogumił Kamiński
Bogumił Kamiński is Head of the Decision Analysis and Support Unit and Chairman of the Scientific Council for the Discipline of Economics and Finance at SGH Warsaw School of Economics. He also holds the position of adjunct professor at the Data Science Laboratory at Ryerson University and is affiliated with the Fields Institute (Computational Methods in Industrial Mathematics Laboratory). In the Julia community, he is the owner of the JuliaData organization and a member of the JuliaStats and JuliaLang organizations on GitHub. He also contributes to the community as the top answerer for the [julia] tag on Stack Overflow.
Łukasz Kraiński
Łukasz Kraiński is a research assistant at the Decision Analysis and Support Unit at SGH Warsaw School of Economics. He is a certified cloud engineer with expertise in Azure and GCP cloud platforms. You can find him at tech conferences speaking about MLOps and AI (MLinPL 2019, PositivTech 2020, Data Driven Innovation 2020). Łukasz is also an active developer and maintainer of Julia packages (CGE.jl, SmartTransitionSim.jl).

Prerequisites

This liveProject is for experienced data scientists and data analysts who are interested in building their skills in Julia. To begin this liveProject, you will need to be familiar with the following:


TOOLS
  • Basics of Jupyter Notebook
  • Basics of Julia and intermediate knowledge of another high-level programming language such as Python or R
TECHNIQUES
  • Intermediate data wrangling
  • Intermediate data visualization
  • Basics of bootstrapping
  • Basic usage of command pipelines
  • Basic usage of functions and control flow
  • Basic errors and correlation analysis

you will learn

In this liveProject, you’ll learn to use the powerful Julia language and its rapidly developing ecosystem to perform essential data preprocessing tasks.


  • Tabular data ingestion and integrity validation
  • Exploratory data analysis using descriptive and graphical techniques
  • Feature selection and feature engineering
  • Data cleaning and preprocessing

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.