Bogumił Kamiński

Bogumił Kamiński is a lead developer of DataFrames.jl, the core package for data manipulation in the Julia ecosystem. He has over 20 years of experience delivering data science projects for corporate customers and teaching data science at the undergraduate and graduate levels.

Products by Bogumił Kamiński

Julia for Data Analysis: Analyzing the social network of GitHub developers

  • Course duration: 59m

How does analyzing the social network of GitHub web and machine learning developers help to predict their neighbors' tags?

Analyzing Lichess Puzzles Database with Julia

  • Course duration: 50m

An analysis of the relationship between chess puzzle difficulty and popularity, covering every step from fetching the data from the web and decompressing it to building a prediction model and visualizing the results with Julia.

Julia for Data Analysis

  • December 2022
  • ISBN 9781633439368
  • 472 pages
  • printed in black & white
  • available in Korean

Julia for Data Analysis teaches you how to handle core data analysis tasks with the Julia programming language. You’ll start by reviewing language fundamentals as you practice techniques for data transformation, visualizations, and more. Then, you’ll master essential data analysis skills through engaging examples like examining currency exchange, interpreting time series data, and even exploring chess puzzles. Along the way, you’ll learn to easily transfer existing data pipelines to Julia.

Hands-on Data Science with Julia

5 weeks · 4-6 hours per week average · INTERMEDIATE

In this series of liveProjects you’ll explore the exciting scientific computing language Julia, and tackle common data science tasks with its robust machine learning ecosystem. These insightful and engaging projects work through each stage of a data science pipeline, from data preprocessing to building and training machine learning models. You’ll step into the role of a data scientist for a real estate company and develop hands-on experience with Julia—whether you’re working step by step or dipping into the tasks most relevant to your career.

Classification with XGBoost

1 week · 4-6 hours per week · INTERMEDIATE

In this liveProject, you’ll use the Julia language to build a classification-based machine learning model that can predict the salary of a customer based on their sociodemographic data. This model will then be used to serve premium advertising to wealthier customers. You’ll build and evaluate XGBoost models with the dedicated Julia XGBoost.jl package, tune the hyperparameters, and assess your model’s capabilities using the ROC curve and measures such as AUC, accuracy, recall, and precision.

Regression Using GLM and DecisionTree

1 week · 6-8 hours per week · INTERMEDIATE

In this liveProject, you’ll use the Julia language to build a regression-based machine learning model that can predict median house value in a neighborhood. You’ll start with a simple linear regression model, built with Julia’s package for generalized linear models, to establish baseline values for your quality metrics. You’ll then tune and assess a random forest model, and compare the two approaches to pick the one with the best results.

Dimensionality Reduction with PCA, t-SNE and UMAP

1 week · 4-6 hours per week · INTERMEDIATE

In this liveProject, you’ll use the Julia programming language and dimensionality reduction techniques to visualize housing sales data on a scatter plot. This visualization will allow the marketing team to identify links and demand patterns in sales, and is also a useful tool for noise reduction or variance analysis. You’ll use the popular PCA algorithm to visualize the sales dataset with overlaid clustering assignments from k-means and DBSCAN methods, and then expand Julia’s capabilities by calling Python modules using the PyCall.jl package. This extra flexibility will allow you to explore the t-SNE and UMAP algorithms, which give excellent results on high-dimensional datasets.

K-means and DBSCAN Clustering

1 week · 4-6 hours per week · INTERMEDIATE

In this liveProject, you’ll use the Julia language and clustering algorithms to analyze sales data and determine groups of products with similar demand patterns. Clustering is a well-established unsupervised learning technique that’s commonly used to discover patterns and relations in data. You’ll apply k-means and DBSCAN clustering techniques to housing sales data for a retail startup, leveraging your basic Julia skills into mastery of this machine learning task.

Data Preprocessing

1 week · 4-6 hours per week · INTERMEDIATE

In this liveProject, you’ll test your data wrangling and data processing skills using the Julia language. You’ll step into the role of a data scientist for a real estate company with a new task from your boss: analyze and clean housing and census data for the marketing and sales teams. You’ll employ the popular Julia package DataFrames.jl as well as powerful statistics-related libraries to explore these datasets and prepare them for machine learning.