Three-Project Series

Event-Driven Data Pipeline with Python and Kafka

prerequisites
intermediate Python • beginner Docker • intermediate database management
skills learned
event-driven architecture • real-time messaging • diagramming tools • containers • basic database development • packaging in Python • Faust library • web scraping
Robert Koch and Shane Smith-Sahnow
4 weeks · 4-6 hours per week average · BEGINNER


Welcome to the Piper Data Concepts (PDC) team! You’re a member of its development team, and a Fortune 1000 client has asked you to modernize its workflow process, which just happens to be PDC’s specialty. In this liveProject series, you’ll review the client’s 15-year-old batch-based system, identify issues and bottlenecks, and determine what’s needed to transform its workflow into a more reactive, extensible, and dynamic system. To create observability, you’ll build an event-driven data pipeline with Kafka, use Python Poetry to package the project, write Python code using the Faust library to communicate with Kafka, and store the consumed data in a PostgreSQL database.

Your final goal will be to enable the client’s staff to gather workflow process information in real time. You’ll write Python code that consumes messages from Kafka and prepares them for storage in the database, create Postgres queries to retrieve the aggregated data, and build reports in CSV files to be read by visualization tools. When you’re done with these projects, your client’s workflow will be more resilient, responsive, and plugin-ready, and you’ll have a solid understanding of event-driven architecture.

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.

here's what's included

Project 1 Python and Kafka

Step into the role of a developer at Piper Data Concepts (PDC), a company that specializes in helping Fortune 1000 companies improve their workflows. Your task is to review the 15-year-old workflow architecture of one of your clients, Trade Data Systems. You’ll identify issues and bottlenecks, then determine what’s needed to transform its workflow into a more modern, responsive architecture. To accomplish this, you’ll set up a Docker-based development environment with Kafka, Python, and Postgres. As you go, you’ll deploy a Kafka cluster and write Python code using the Faust library to seamlessly process pre-defined business events.
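
The Docker-based environment described above can be sketched as a Compose file. The images, versions, ports, and credentials below are illustrative assumptions, not the project's actual configuration.

```yaml
# Minimal sketch: single-broker Kafka (with ZooKeeper) plus Postgres
# for local development. Harden credentials before any real use.
version: "3.8"
services:
  zookeeper:
    image: bitnami/zookeeper:latest
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes

  kafka:
    image: bitnami/kafka:latest
    depends_on:
      - zookeeper
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
    ports:
      - "9092:9092"   # broker reachable from the host for Faust workers

  postgres:
    image: postgres:14
    environment:
      - POSTGRES_PASSWORD=example
    ports:
      - "5432:5432"
```

With a file like this, `docker compose up -d` brings up the whole stack so the Python code can iterate against real services locally.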

Project 2 Observability

Put on your platform architect hat! You’re a member of the development team at Piper Data Concepts (PDC), and your client is looking to modernize its workflow. An existing benchmarked development environment, made up of Kafka, Python, and Postgres, is at your disposal. Now it’s time to start conceptualizing the new and improved workflow. You’ll use Kafka to create an event-driven data pipeline, review and understand business requirements, use Python Poetry to package the project, write Python code using the Faust library to communicate with Kafka, and store the consumed data in a PostgreSQL database.
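
A Poetry manifest for a pipeline like this might look like the sketch below. The package name and dependency choices are assumptions for illustration (`faust-streaming` is the community-maintained fork of Faust; `psycopg2-binary` is one common Postgres driver).

```toml
[tool.poetry]
name = "pdc-pipeline"
version = "0.1.0"
description = "Event-driven data pipeline consuming Kafka events into Postgres"
authors = ["PDC Dev Team <dev@example.com>"]

[tool.poetry.dependencies]
python = "^3.9"
faust-streaming = "*"
psycopg2-binary = "*"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

Running `poetry install` then creates a locked virtual environment, and `poetry build` packages the project for distribution.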

Project 3 Automate Reports

As a member of the development team at Piper Data Concepts, you’ll carry out the final steps of a workflow-improvement project: enabling your client’s staff to gather workflow process information in real time. Several prototypes have been built, and the client’s workflow is more resilient than ever. You’ll write Python code that consumes messages from Kafka and prepares them for storage in the database, create Postgres queries to access the aggregated data, and build reports in CSV files to be read by visualization tools and, ultimately, your client’s staff. When you’re done, your client’s modern system will provide a feedback loop, enable external API access to status updates, and be ready for more specialized services to be plugged in later, with no code changes.
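
The reporting step, turning aggregated query results into a CSV file for visualization tools, can be sketched in plain Python. The SQL in the comment, the column names, and the sample rows are hypothetical, standing in for what a real Postgres aggregation query would return.

```python
import csv
from pathlib import Path

# Hypothetical aggregated rows, as a Postgres query such as
#   SELECT status, COUNT(*) FROM workflow_events GROUP BY status;
# might return them. In the real pipeline these would come from a
# database cursor rather than a literal list.
aggregated_rows = [
    ("received", 120),
    ("processed", 98),
    ("failed", 4),
]

def write_report(rows, path="workflow_report.csv"):
    """Write (status, count) pairs to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["status", "count"])
        writer.writerows(rows)
    return path

report = write_report(aggregated_rows)
print(Path(report).read_text())
```

A scheduler (cron, or a periodic Faust timer) could regenerate this file so the visualization layer always reads fresh numbers.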

book resources

When you start each of the projects in this series, you'll get full access to the following book for 90 days.


project authors

Robert Koch
Rob Koch is a Principal Data Engineer at Slalom Build and one of the community leaders of DeafintheCloud.com. He helps drive cloud-native architecture, blogs about cloud migration and the use of AWS Lambda, and has a passion for data- and event-driven systems. Having earned five AWS certifications (Cloud Practitioner, Big Data Specialty, DevOps Engineer Associate, SysOps Administrator Associate, and Solutions Architect Associate), Robert is actively involved in the development community in Denver, often speaking at Denver Dev Day and the AWS Denver Meetup. Robert’s goal is to help the community understand the advantages of migrating to the cloud, being cloud-native, and having “serverless” applications and databases.
Shane Smith-Sahnow
Shane Smith-Sahnow is a software engineer at Netlify. He has spent time at GitHub and New Relic working on large-scale Kafka applications, developing analytical data pipelines, and shipping major features.

Prerequisites

This liveProject series is for programmers interested in learning the concepts and skills used in event-driven development and its implementation. To begin these liveProjects, you’ll need to be familiar with the following:

TOOLS
  • Intermediate Python
  • Kafka
  • Postgres
  • Docker Desktop
  • Faust
  • Poetry
TECHNIQUES
  • Kafka streaming
  • Architecture design and review
  • Consuming and producing real-time payloads
  • Generating reports

you will learn

In this liveProject series, you’ll learn how to assemble disparate pieces into one cohesive event-driven pipeline using Python.

  • Event-driven architecture
  • Real-time messaging
  • Diagramming tools
  • Container use
  • Basic database development
  • Packaging in Python
  • Faust library
  • Web scraping

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
Within the liveProject platform, get help from other participants and our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.