In this series of liveProjects, you’ll explore the exciting scientific computing language Julia and tackle common data science tasks with its robust machine learning ecosystem. These insightful and engaging projects work through each stage of a data science pipeline, from data preprocessing to building and training machine learning models. You’ll step into the role of a data scientist for a real estate company and develop hands-on experience with Julia, whether you’re working step by step or dipping into the tasks most relevant to your career.
These projects are designed for learning purposes and are not complete, production-ready applications or solutions.
Here's what's included
Project 1 Data Preprocessing
In this liveProject, you’ll test your data wrangling and data processing skills using the Julia language. You’ll step into the role of a data scientist for a real estate company with a new task from your boss: analyze and clean housing and census data for the marketing and sales teams. You’ll employ the popular Julia package DataFrames.jl as well as powerful statistics-related libraries to explore these datasets and prepare them for machine learning.
Project 2 K-means and DBSCAN Clustering
In this liveProject, you’ll use the Julia language and clustering algorithms to analyze sales data and determine groups of products with similar demand patterns. Clustering is a well-established unsupervised learning technique that’s commonly used to discover patterns and relations in data. You’ll apply k-means and DBSCAN clustering techniques to housing sales data for a retail startup, building your basic Julia skills into mastery of this machine learning task.
Project 3 Dimensionality Reduction with PCA, t-SNE and UMAP
In this liveProject, you’ll use the Julia programming language and dimensionality reduction techniques to visualize housing sales data on a scatter plot. This visualization will allow the marketing team to identify links and demand patterns in sales. Dimensionality reduction is also a useful tool for noise reduction and variance analysis. You’ll use the popular PCA algorithm to visualize the sales dataset with overlaid clustering assignments from the k-means and DBSCAN methods, and then expand Julia’s capabilities by calling Python modules using the PyCall.jl package. This extra flexibility will allow you to explore the t-SNE and UMAP algorithms, which perform well on high-dimensional datasets.
Project 4 Regression Using GLM and DecisionTree
In this liveProject, you’ll use the Julia language to build a regression-based machine learning model that can predict median house value in a neighborhood. You’ll start out with a simple linear regression model, built with Julia’s package for Generalized Linear Models, to establish baseline values for your quality metrics. You’ll then tune and assess a random forest model, and compare and contrast the two approaches to pick the one with the better results.
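To make the idea of a baseline concrete, here is a minimal sketch of a one-feature least-squares fit scored with root mean squared error (RMSE). It is written in plain Python for illustration only; the project itself works in Julia, and the helper names, the `income` feature, and the toy numbers are all made up for this example rather than taken from the project data.

```python
import math


def fit_simple_linear(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: one feature and a house-value target (arbitrary units).
income = [1.0, 2.0, 3.0, 4.0, 5.0]
value = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_simple_linear(income, value)
predictions = [slope * x + intercept for x in income]

# The baseline score that a more complex model would need to beat.
baseline_rmse = rmse(value, predictions)

# Reference point: always predicting the mean of the target.
mean_value = sum(value) / len(value)
mean_rmse = rmse(value, [mean_value] * len(value))
```

A later model, such as a random forest, would be scored with the same `rmse` function on the same held-out data, so the two numbers can be compared directly.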
Project 5 Classification with XGBoost
In this liveProject, you’ll use the Julia language to build a classification-based machine learning model that can predict the salary of a customer based on their sociodemographic data. This model will then be used to serve premium advertising to wealthier customers. You’ll build and evaluate XGBoost models with the dedicated Julia XGBoost.jl package, tune the hyperparameters, and assess your model’s capabilities using the ROC curve and measures such as AUC, accuracy, recall, and precision.
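As a rough illustration of the evaluation measures named above, the following sketch computes accuracy, precision, recall, and AUC for a binary classifier. It is plain Python rather than Julia, it does not use XGBoost, and the labels, scores, 0.5 threshold, and function names are hypothetical values chosen for the example. AUC is computed here as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = higher-salary customer) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

# Turn scores into hard predictions with an assumed 0.5 threshold.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
```

Accuracy, precision, and recall depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the measures are usually reported together.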
This liveProject is for experienced data scientists and data analysts who are interested in building their skills in Julia. To begin this liveProject, you will need to be familiar with the following:
- Basics of Jupyter Notebook
- Basics of Julia and intermediate knowledge of another high-level programming language such as Python or R
- Intermediate data wrangling
- Intermediate data visualization
- Basics of bootstrapping
- Basic usage of command pipelines
- Basic usage of functions and control flow
- Basic error and correlation analysis
You will learn
In this liveProject, you’ll learn to use the powerful Julia language and its rapidly developing ecosystem to perform essential data preprocessing tasks.
- Tabular data ingestion and integrity validation
- Exploratory data analysis using descriptive and graphical techniques
- Feature selection and feature engineering
- Data cleaning and preprocessing