Build a Real-time Layer for Streaming Data

prerequisites
  • Streaming data concepts and event-driven architectures
  • AWS services (Kinesis, Glue, S3, OpenSearch, Bedrock)
  • Python programming and SQL
  • Data lakes and reference data integration
  • Orchestration tools and Infrastructure as Code (IaC)
  • Stream processing with Apache Flink and Zeppelin
skills learned
  • Real-time data ingestion with AWS Kinesis and Firehose
  • Stream preprocessing via Lambda with S3 storage
  • Automated infrastructure for data pipelines
  • Data transformation using PySpark and Glue Streams
  • AWS Glue Catalog metadata management
  • Data exploration with AWS Athena
  • Streaming analytics with Apache Flink/Zeppelin
  • OpenSearch optimization for data storage
  • Amazon Bedrock for generative AI solutions
  • IaC templates for scalable streaming infrastructure
Gianluigi Mucciolo
4 weeks · 6-8 hours per week average · INTERMEDIATE

Nexstellar Corporation is one of the leading messaging and multimedia service providers in Italy, and they want to make their customer experience even better! They’ve recruited you to spearhead an ambitious project to implement a Lambda architecture on AWS. Over four interconnected projects, you'll build a state-of-the-art real-time data processing pipeline that transforms how modern businesses handle and leverage massive data streams. Starting with data ingestion using Amazon Kinesis, you'll progress through increasingly complex challenges: creating a robust data lake, implementing streaming analytics with Apache Flink, and finally developing an intelligent system using OpenSearch and Amazon Bedrock. You'll master cutting-edge cloud technologies like AWS Glue, Apache Zeppelin, and serverless architectures while learning to turn raw data into actionable insights that drive business decisions.

This series uses Amazon Web Services, which should cost less than 18 USD for the whole series, including cleanup.

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.

“Its clarity and practicality shine, delivering on its promise of teaching real-time data streaming and Amazon Bedrock integration with valuable hands-on experience.”

Pradeep Kumar Muthukamatchi, Principal Cloud Architect, Microsoft

here's what's included

Project 1 Data Ingestion and Preprocessing

In this liveProject, you'll step into the role of a data engineering specialist at Nexstellar Corp to tackle the challenge of transforming the company's data processing capabilities. As Nexstellar seeks to evolve from batch to real-time data analysis, you'll develop a streaming data pipeline using Amazon Kinesis and AWS Lambda. You'll design a system that can rapidly ingest, transform, and store streaming data, enabling the multimedia service provider to make lightning-fast, data-driven decisions. By the end of this project, you'll have constructed a robust, scalable streaming solution that turns raw data into actionable insights!
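
As a rough illustration of the ingest-and-transform step described above, here is a minimal sketch of a Lambda function invoked by a Kinesis data stream. It is not taken from the project materials: the `transform` helper, its key-normalizing logic, and the JSON payload format are hypothetical choices for illustration. The only part that reflects the actual service is the event shape, in which each record carries its payload base64-encoded under `record["kinesis"]["data"]`.

```python
import base64
import json


def transform(payload: dict) -> dict:
    """Hypothetical preprocessing step: lowercase the keys and tag the record."""
    cleaned = {key.lower(): value for key, value in payload.items()}
    cleaned["processed"] = True
    return cleaned


def handler(event, context):
    """Decode each Kinesis record in the batch and apply the transform.

    Assumes the function is triggered by a Kinesis data stream and that
    producers send JSON objects (an assumption, not stated in the source).
    """
    results = []
    for record in event.get("Records", []):
        # Kinesis delivers the record payload as a base64-encoded string.
        raw = base64.b64decode(record["kinesis"]["data"])
        payload = json.loads(raw)
        results.append(transform(payload))
    return results
```

A real pipeline would write the transformed records to storage such as S3 rather than return them; returning the list here simply keeps the sketch easy to run locally with a hand-built event.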

This project uses Amazon Web Services, which should cost less than 2 USD for the whole project, including cleanup.

Project 2 Data Lake and Reference Data Management

In this liveProject, you'll transform raw streaming data into a powerful, integrated data lake using cutting-edge tools like Apache Spark Structured Streaming, AWS Glue, and Amazon Athena. You'll develop a robust system that can process streaming information, update existing datasets, and enable lightning-fast analytics that give your company a competitive edge in the fast-moving multimedia market.

This project uses Amazon Web Services, which should cost less than 3 USD for the whole project, including cleanup.

Project 3 Data for Streaming Analytics

In this liveProject, you'll harness the power of Apache Flink to build a real-time data transformation pipeline for Nexstellar Corp. You'll master reading data from Amazon Kinesis using an enhanced fan-out configuration, write SQL queries in Apache Zeppelin, and create real-time visualizations. Your challenge: Deploy a fully operational streaming analytics application on AWS, starting from a Zeppelin notebook, and conduct a comprehensive comparison between AWS Glue Streaming and Apache Flink.

This project uses Amazon Web Services, which should cost less than 7 USD for the whole project, including cleanup.

Project 4 Generative AI for Data Exploration

In this liveProject, you'll transform Nexstellar Corp's data architecture by building a sophisticated OpenSearch solution powered by Amazon Bedrock. You'll extract embedding features from logs, create a dynamic Knowledge Base, and develop an intelligent agent with robust generative AI guardrails. Your challenge: design a flexible database system that can efficiently store, query, and analyze real-time data while seamlessly integrating with existing infrastructure. By project's end, you'll have the skills to turn complex data streams into actionable insights that drive business innovation!

This project uses Amazon Web Services, which should cost less than 5 USD for the whole project, including cleanup.

book resources

When you start each of the projects in this series, you'll get full access to the following books for 90 days.

“I learned a lot more than I expected!”

Stephane Daigle, consulting engineer and architect, Autodesk

project author

Gianluigi Mucciolo
Gianluigi Mucciolo is a highly skilled computer engineer who specializes in AWS technologies and agile methodologies. As an AWS Authorized Instructor and Cloud Technical Principal, he is dedicated to advancing cloud professionals’ knowledge and participates in community-building initiatives. With a strong background in Artificial Intelligence and Big Data, Gianluigi constantly seeks growth opportunities. A team player, he excels in both collaborative and independent work settings. In his free time, Gianluigi enjoys intellectual discussions, reading, and connecting with nature for inspiration.

Prerequisites

This liveProject series is for engineers who want to build a data lake with a Lambda architecture using fully managed AWS services. To begin this liveProject series, you will need to know the following:


TOOLS
  • Basics of Amazon Web Services
  • Basics of Python
  • Basics of SQL
  • Basics of Infrastructure as Code (IaC) using CloudFormation
  • Basics of Apache Flink and Zeppelin
  • Basics of Continuous Integration and Continuous Delivery
TECHNIQUES
  • Basics of Infrastructure Automation
  • Basics of Lambda Architecture
  • Stream Processing
  • Basics of RAG and Agent creation in Amazon Bedrock

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
Within the liveProject platform, get help from fellow participants, and even more help through paid sessions with our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.