Three-Project Series

Stream Processing with Kafka and Spark

prerequisites
intermediate Scala • basic shell • basic Kafka • basic Spark
skills learned
set up Kafka Cluster • write Kafka Producer • connect Spark to Kafka • basic stream processing • complex stream processing
Gaurav Bhardwaj
3 weeks · 5-7 hours per week average · BEGINNER

Welcome to Free Power Corporation Limited (FPCL), a London-based energy company looking for a solution to deal with surging energy costs. FPCL has installed Smart Meters, which generate energy readings every thirty minutes, in households across London. As a data engineer for FPCL, you’ll create a Kafka cluster and ingest the real-time Smart Meter data into it. You’ll use Spark to read, clean, join, and process the data, adding logic to handle potential real-world problems like data loss and duplicate data. To meet the different business requirements of various FPCL teams, you’ll also perform advanced stream processing on the data streams. By the end of this series of liveProjects, you’ll have the experience and skills to ingest large amounts of data and perform complex analysis on it in real time using Apache Kafka and Spark.

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.

The project is taking a very good progressive way to bring the user from basics to advanced covering the foundations of Kafka.

Georges Michel, founder and president, Paaneah, LLC.

book resources

When you start each of the projects in this series, you'll get full access to the following book for 90 days.

It's a very good project to learn Spark Streaming with Kafka. Very well executed with simple steps.

Rambabu Posa, data engineer, Sai Aashika Consultancy Limited

For me, it is a definite game changer as I can now say I have real-life experience with event streaming as my current company (due to regulatory constraints) cannot adopt new technologies on a whim.

Monil Chheda, engineering manager, eClinicalWorks

project author

Gaurav Bhardwaj

Gaurav Bhardwaj has almost two decades of experience designing and developing enterprise software for large-scale data processing and machine learning. Currently working as a Big Data Architect for an IT consulting firm, he helps clients build mature data platforms (on-prem, cloud, and hybrid) and develops solutions involving large-scale data processing, data management and governance, machine learning models, and more. He has also authored official documentation for Apache HBase coprocessors.

Prerequisites

These liveProjects are for intermediate Scala developers and data engineers with basic knowledge of distributed computing technologies such as Apache Spark. To begin these liveProjects, you’ll need to be familiar with the following:

TOOLS
  • Basic Apache Kafka
  • Basic Scala
TECHNIQUES
  • Data Ingestion
  • Real-time stream processing

you will learn

In this liveProject series, you’ll learn to use Kafka and Spark to ingest, stream, and process large amounts of data.

  • Install a local Kafka cluster
  • Create a topic in the Kafka cluster
  • Determine and create the appropriate number of partitions for each topic
  • Configure Spark Streaming to read data from Kafka
  • Write and run streaming jobs
  • Save a data stream to a data lake
  • Enrich a data stream
  • Write a Spark stream to a Kafka topic to be used by other systems
  • Handle late data and data arriving out-of-order
  • Join two streams
  • Use arbitrary stateful processing for advanced stream processing
  • Deploy and test on a local cluster

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get help
While within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.