Gaurav Bhardwaj

Gaurav Bhardwaj has almost two decades of experience designing and developing enterprise software for large-scale data processing and machine learning. Currently working as a Big Data Architect for an IT consulting firm, he helps clients build mature data platforms (on-premises, cloud, and hybrid) and develops solutions involving large-scale data processing, data management and governance, machine learning models, and more. He has also authored official documentation for Apache HBase coprocessors.

Projects by Gaurav Bhardwaj

Stream Processing with Kafka and Spark

3 weeks · 5-7 hours per week average · BEGINNER

Welcome to Free Power Corporation Limited (FPCL), a London-based energy company looking for a solution to deal with surging energy costs. FPCL has installed Smart Meters, which generate energy readings every thirty minutes, in households across London. As a data engineer for FPCL, you’ll create a Kafka cluster and ingest the real-time Smart Meter data into it. You’ll use Spark to read, clean, join, and process the data, adding logic to handle potential real-world problems like data loss and duplicate data. To meet the different business requirements of various FPCL teams, you’ll also perform advanced stream processing on the data streams. By the end of this series of liveProjects, you’ll have the experience and skills to ingest large amounts of data and perform complex analysis on it in real time using Apache Kafka and Spark.

Advanced Stream Processing

1 week · 8-10 hours per week · BEGINNER

You’re the star data engineer at Free Power Corporation Limited (FPCL). The London-based power company is interested in gaining insight into its customers’ energy usage patterns, and it’s up to you to deliver a data-rich solution that satisfies the requirements of FPCL’s various teams. You’ll create a streaming Spark application that reads the consumer event stream from Kafka, add information that helps the teams determine when data was generated, ingested, and processed, and write logic to reorder any late or out-of-order data. To provide vital household energy consumption statistics to the sales and electrical engineering teams, you’ll join Kafka data streams and perform complex computations on the resulting stream. To be sure your solution is ready for the teams to use, you’ll test it on a local Spark cluster. When you’ve finished, you’ll have learned advanced stream processing skills that empower you to meet the different business requirements of various enterprise departments.
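To give a feel for what reordering late or out-of-order data involves, here is a minimal plain-Python sketch of a watermark-style buffer. It is not the project's solution and it does not use Spark (Spark Structured Streaming offers its own watermarking through `withWatermark`). The function name `reorder`, the `allowed_lateness` parameter, and the integer timestamps are assumptions made up for this illustration.

```python
import heapq

def reorder(events, allowed_lateness):
    """Yield (event_time, payload) pairs in event-time order.

    Hypothetical helper: events arrive as (event_time, payload) tuples,
    possibly out of order. An event older than the current watermark
    (largest event time seen so far minus allowed_lateness) is dropped.
    """
    buffer = []      # min-heap ordered by event time
    max_seen = None  # largest event time observed so far
    seq = 0          # tie-breaker so payloads are never compared
    for event_time, payload in events:
        if max_seen is None or event_time > max_seen:
            max_seen = event_time
        watermark = max_seen - allowed_lateness
        if event_time < watermark:
            continue  # too late: older than the watermark, so drop it
        heapq.heappush(buffer, (event_time, seq, payload))
        seq += 1
        # Anything at or below the watermark can no longer be overtaken.
        while buffer and buffer[0][0] <= watermark:
            t, _, p = heapq.heappop(buffer)
            yield t, p
    # End of input: flush whatever is still buffered, in order.
    while buffer:
        t, _, p = heapq.heappop(buffer)
        yield t, p

# Example: timestamps in minutes, readings allowed to be up to 60 minutes late.
events = [(0, "a"), (60, "c"), (30, "b"), (120, "d"), (10, "late")]
ordered = list(reorder(events, allowed_lateness=60))
```

The trade-off is the usual one: a larger `allowed_lateness` tolerates more disorder but delays output and holds more events in memory.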

Real-time Data Processing

1 week · 4-6 hours per week · BEGINNER

As part of an endeavor to better handle surging energy prices, Free Power Corporation Limited (FPCL) has a Kafka cluster that ingests large amounts of consumer energy data. As a data engineer for FPCL, you’re already familiar with the data, so the London-based power company has tasked you with building a streaming solution that processes the data as soon as it’s available. Using Apache Spark, you’ll create an application to read the data from the Kafka streams, and you’ll save the streams to a data lake. Using a Spark API, you’ll prepare the data for analysis by performing aggregation on the fly. You’ll join the real-time stream with the static data, enriching it with customer details and enabling FPCL’s research team to gain insights about customer energy consumption patterns. When you’re done, FPCL will be better equipped to deal with rising energy costs, and you’ll have hands-on experience building a real-time data processing solution using Apache Spark and Kafka.

Ingest Consumer Data

1 week · 4-6 hours per week · BEGINNER

As a first step in dealing with surging energy prices, Free Power Corporation Limited (FPCL) has installed Smart Meters, which generate energy readings every thirty minutes, in households across London in order to analyze consumers’ energy usage. As a new data engineer for the power company, your task is to ingest the data from the Smart Meter readings and stream it to FPCL data centers for processing. Using the Kafka command-line tool, you’ll create topics in a Kafka cluster for storing the data, and you’ll create partitions for distributing the load within the topics. You’ll add logic to deal with potential problems such as data loss and duplicate records, and you’ll add a method to convert the energy readings to the widely used, easy-to-parse JSON format before the final step of ingesting the data. When you’re finished, FPCL will have pertinent data for analyzing energy consumption patterns, and you’ll have practical experience using Kafka to ingest large amounts of data.
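As a rough illustration of the JSON conversion and duplicate handling described above, the following plain-Python sketch uses only the standard library. The record layout (`meter_id,timestamp,kwh`), the field names, and the sample meter ID are assumptions for this example and are not taken from the project; producing the records to Kafka is left out.

```python
import json

def reading_to_json(line):
    """Convert one hypothetical CSV reading 'meter_id,timestamp,kwh' to JSON."""
    meter_id, timestamp, kwh = line.strip().split(",")
    record = {"meter_id": meter_id, "timestamp": timestamp, "kwh": float(kwh)}
    return json.dumps(record, sort_keys=True)

def deduplicate(lines):
    """Yield JSON for each reading, skipping repeats of (meter_id, timestamp)."""
    seen = set()
    for line in lines:
        meter_id, timestamp, _ = line.strip().split(",")
        key = (meter_id, timestamp)
        if key in seen:
            continue  # same meter and same half-hour slot: treat as duplicate
        seen.add(key)
        yield reading_to_json(line)

# Example input with one repeated reading (values are made up).
raw = [
    "MAC000002,2013-01-01T00:00:00,0.219",
    "MAC000002,2013-01-01T00:00:00,0.219",
    "MAC000002,2013-01-01T00:30:00,0.241",
]
messages = list(deduplicate(raw))
```

Keeping every key in an in-memory set only suits small inputs. For a long-running stream, the set of seen keys would have to be bounded or expired in some way.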