Mike Shakhomirov

Mike Shakhomirov is a lead data engineer at The World’s Online Festival. He holds an MBA and an MIT diploma in big data and social analytics, and he is a Google Cloud Certified Professional Data Engineer. A passionate, digitally focused professional with an abundance of drive and enthusiasm, he loves the challenges that the full mix of digital marketing offers. He is also an official writer for publications such as Towards Data Science and The Startup, with more than 50 published articles on various topics. He writes about data engineering, machine learning, and AI in digital marketing.

Projects by Mike Shakhomirov

Become a Data Engineer with AWS

3 weeks · 5-7 hours per week average · BEGINNER

Step into the shoes of a data engineer working for a mobile game development studio. The company’s data architecture includes an Amazon Athena data lake and an Amazon Redshift data warehouse. The board has requested data insights based on user behavior data. You’ll create data pipelines that provide improved OLAP analytics on user engagement data, build in-app user recommendations driven by purchase preferences, and implement a data-driven decision-making process.

In the first liveProject, you’ll create a batch-processing data pipeline using Amazon RDS, Amazon S3, and Amazon Athena to learn one of the most cost-effective data platform design patterns. Next, you’ll build a simple yet reliable data streaming pipeline that prevents resource shortages and transforms data in real time (while it’s still relevant), ensuring more accurate data. Lastly, you’ll use Amazon Personalize to create an ML data pipeline that provides product recommendations tailored to users’ data. By the end of the series, you’ll have learned data platform design concepts, business intelligence (BI) concepts, and the extract, transform, load (ETL) process using infrastructure as code, plus you’ll have valuable firsthand experience using popular AWS data transformation and processing tools to build data pipelines.

Pricing

Most of the services used in this liveProject series are available under the AWS Free Tier. However, the Free Tier doesn't cover RDS DB instances launched with Amazon Aurora, Amazon RDS for Microsoft SQL Server, or Oracle database engines. Amazon RDS may incur charges if left running, so be sure to delete all associated RDS instances and backup snapshots. Total charges should be under $2 for the series. Please check the AWS Pricing Calculator for more details and cost estimates.
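For reference, cleanup can also be scripted. Here's a minimal sketch using boto3, assuming a hypothetical instance identifier of liveproject-mysql; substitute whatever names you actually provision in the projects:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Delete the DB instance; skipping the final snapshot avoids lingering storage charges.
rds.delete_db_instance(
    DBInstanceIdentifier="liveproject-mysql",  # hypothetical identifier
    SkipFinalSnapshot=True,
    DeleteAutomatedBackups=True,
)

# Remove any manual snapshots created for this instance during the projects.
snapshots = rds.describe_db_snapshots(
    DBInstanceIdentifier="liveproject-mysql", SnapshotType="manual"
)["DBSnapshots"]
for snapshot in snapshots:
    rds.delete_db_snapshot(DBSnapshotIdentifier=snapshot["DBSnapshotIdentifier"])
```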

ML Pipeline with Amazon Personalize

1 week · 6-8 hours per week · BEGINNER

Help your company’s messenger application provide better product recommendations for its customers. As a data engineer at the company, your task is to create a machine learning (ML) pipeline using the Amazon Personalize service. You’ll use CloudFormation templates to create a repository for the required AWS infrastructure resources, and AWS Glue to transform the raw user engagement data. Using Amazon Personalize, you’ll import a dataset, then create and train an Amazon Personalize ML model for user recommendations. To complete the project, you’ll create a workflow that trains your Amazon Personalize recommendation solution from user engagement events using AWS Step Functions. When you’re done, you’ll have designed an ML pipeline using the Amazon Personalize API that provides product recommendations that suit your users best.
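To give a flavor of the Amazon Personalize calls involved, here's a minimal boto3 sketch; all ARNs, bucket paths, and names below are hypothetical placeholders, not the project's actual resources:

```python
import boto3

personalize = boto3.client("personalize", region_name="us-east-1")

# Import prepared user-engagement data from S3 into a Personalize dataset.
personalize.create_dataset_import_job(
    jobName="engagement-import",
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/demo/INTERACTIONS",
    dataSource={"dataLocation": "s3://my-bucket/interactions.csv"},
    roleArn="arn:aws:iam::123456789012:role/PersonalizeS3Access",
)

# Create a solution from a built-in recipe, then train a solution version (the model).
solution = personalize.create_solution(
    name="user-recommendations",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/demo",
    recipeArn="arn:aws:personalize:::recipe/aws-user-personalization",
)
personalize.create_solution_version(solutionArn=solution["solutionArn"])
```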

Data Streaming in AWS

1 week · 4-6 hours per week · BEGINNER

As a data engineer for a mobile game development studio, your task is to create a data streaming pipeline that collects and processes large streams of data records in real time for lightning-fast analytics. Your company’s modern data platform architecture includes an Amazon Athena data lake and an Amazon Redshift data warehouse solution. To store files, you’ll create an Amazon S3 bucket, and you’ll create an Amazon Kinesis delivery stream, using the boto3 library to connect to Kinesis endpoints and send event data to the service. You’ll provision Amazon Redshift resources and connect them to your Kinesis delivery stream to analyze user behavior data and understand each user’s journey inside the app. When you’re done, you’ll have a simple yet reliable data streaming pipeline that prevents resource shortages and transforms data in real time (while it’s still relevant), ensuring more accurate data.
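As a minimal sketch of the event-producing side with boto3, assuming a hypothetical delivery stream named user-engagement-stream:

```python
import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

event = {"user_id": "u-123", "event_type": "level_complete", "ts": "2024-01-01T00:00:00Z"}

# Send one JSON record to the delivery stream; the service buffers
# and delivers it to the configured downstream destination.
firehose.put_record(
    DeliveryStreamName="user-engagement-stream",  # hypothetical stream name
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```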

Data Pipeline with Amazon Athena

1 week · 5-7 hours per week · BEGINNER

Congratulations! You’ve just been hired as a data engineer for a mobile game development studio. The company’s modern data platform architecture includes an Amazon Athena data lake and an Amazon Redshift data warehouse solution. Your task is to enable batch processing of revenue transaction data by creating an end-to-end data pipeline, connecting various data sources (including user engagement events, stage controls, and public chat messaging) to the lakehouse solution. Using AWS CloudFormation, you’ll provision the resources required for the data pipeline. You’ll connect a MySQL data source to the Amazon S3 data lake and transform data in the data lake using Amazon Athena. You’ll wrap up the project by creating a dynamic analytics dashboard with Amazon QuickSight. When you’re done, you’ll have built a batch-processing data pipeline, start to finish, using Amazon Athena.
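To give a flavor of the Athena transformation step, here's a minimal boto3 sketch; the database, table, and bucket names are hypothetical:

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Run an aggregation over raw events in the S3 data lake;
# query results are written to the output bucket.
response = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) AS events FROM raw_events GROUP BY event_type",
    QueryExecutionContext={"Database": "data_lake"},  # hypothetical database name
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```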

Build a Data Warehouse in the Multi Cloud

4 weeks · 6-9 hours per week average · INTERMEDIATE

In this series of liveProjects, you’ll build a complete data warehouse solution that serves essential data to the data, finance, and marketing teams across your company. Your company’s data is growing faster than ever, so your solution needs to scale. You’ll also need to account for your hybrid stack, with data and resources spread across both AWS and Google Cloud Platform. Each liveProject in this series is standalone, covering a different essential task of building and migrating to a cloud data platform.

Business Intelligence with BigQuery

1 week · 8-10 hours per week · INTERMEDIATE

In this liveProject, you’ll create a Google Data Studio report that visualizes your datasets for business intelligence. You’ll connect numerous data sources through BigQuery to feed directly into your new BI dashboard, and you’ll design and implement revenue and daily-spend metrics, line graphs, and scorecards for top users.
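As an illustration of the kind of metric query that feeds such a dashboard, here's a sketch using the google-cloud-bigquery client; the project, dataset, and table names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Daily revenue metric suitable for a line graph or scorecard in the report.
query = """
SELECT DATE(event_time) AS day, SUM(revenue) AS total_revenue
FROM `my-project.analytics.transactions`
GROUP BY day
ORDER BY day
"""
for row in client.query(query).result():
    print(row.day, row.total_revenue)
```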

Build a Data Pipeline with BigQuery

1 week · 8-10 hours per week · INTERMEDIATE

In this liveProject, you’ll create a data pipeline that enriches source data with currency exchange rates and converted totals. Because the pipeline is updated daily, you’ll create a scheduled query to enrich your data. You’ll learn to load essential data into BigQuery from both CSVs and JSON, and use it to generate a daily business intelligence report for your colleagues.
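Here's a sketch of creating such a scheduled query with the BigQuery Data Transfer Python client; the project ID, datasets, and table names are hypothetical placeholders:

```python
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

# Enrich daily totals with exchange rates and materialize the result each day.
config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="analytics",
    display_name="daily-revenue-enrichment",
    data_source_id="scheduled_query",
    params={
        "query": """
            SELECT t.day, t.total, t.total * r.rate AS total_usd
            FROM `my-project.raw.daily_totals` t
            JOIN `my-project.raw.exchange_rates` r ON t.currency = r.currency
        """,
        "destination_table_name_template": "enriched_{run_date}",
        "write_disposition": "WRITE_TRUNCATE",
    },
    schedule="every 24 hours",
)
client.create_transfer_config(
    parent=client.common_project_path("my-project"),
    transfer_config=config,
)
```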

Extract Data from MySQL

1 week · 6-8 hours per week · INTERMEDIATE

In this liveProject, you’ll build an extraction pipeline to transfer data from a legacy server-side MySQL database on AWS into your new cloud platform. You’ll learn how to create an integration data layer and develop a maintainable, flexible data infrastructure that is both easy to work with and easy to scale.
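A minimal extraction sketch, assuming the PyMySQL driver and a hypothetical RDS endpoint and table (the project's actual connector, credentials, and schema will differ):

```python
import csv
import os

import pymysql  # assumption: PyMySQL driver; any MySQL connector works similarly

conn = pymysql.connect(
    host="legacy-db.abc123.us-east-1.rds.amazonaws.com",  # hypothetical endpoint
    user="reader",
    password=os.environ["DB_PASSWORD"],
    database="game",
)

# Dump a source table to CSV for loading into the integration layer.
with conn.cursor() as cur, open("transactions.csv", "w", newline="") as f:
    cur.execute("SELECT id, user_id, amount, created_at FROM transactions")
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur.fetchall())
conn.close()
```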

Load Data into BigQuery

1 week · 4-6 hours per week · INTERMEDIATE

In this liveProject, you’ll use Google Cloud Platform’s BigQuery tool to transfer data from a legacy data lake to new cloud storage on GCP. You’ll start by setting up your Google Cloud developer account, then use the Google Cloud Console to create a Cloud Storage bucket and load data into a BigQuery table. You’ll learn how to handle different file formats and manage resources with Google Cloud Shell.
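The project itself walks through the web console and Cloud Shell, but as a sketch, the same load step looks like this with the Python client; the project ID, bucket, dataset, and table names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

# Load a CSV file from Cloud Storage into a BigQuery table, inferring the schema.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,  # use NEWLINE_DELIMITED_JSON for JSON files
    skip_leading_rows=1,
    autodetect=True,
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/events.csv",
    "my-project.raw.user_events",
    job_config=job_config,
)
load_job.result()  # block until the load job finishes
```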