Data Ingestion and Preprocessing

prerequisites
  • Basic understanding of event-driven architectures and streaming data concepts
  • Familiarity with Amazon Kinesis for real-time data ingestion
  • Knowledge of Python and SQL for preprocessing logic
skills learned
  • Setting up real-time data ingestion with Amazon Kinesis Data Streams
  • Using AWS Lambda for preprocessing streaming data
  • Automating infrastructure setup with IaC templates using AWS CloudFormation
  • Configuring Kinesis Dynamic Partitioning for optimized data storage
  • Storing preprocessed data in Amazon S3 as Parquet files
  • Monitoring streaming data ingestion and processing with Amazon CloudWatch
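To give a flavor of the Lambda preprocessing and dynamic partitioning skills listed above, here is a minimal sketch of a transformation function. It assumes the Lambda is attached to an Amazon Data Firehose delivery stream (the service that provides dynamic partitioning), and it follows the record contract Firehose uses for transformation Lambdas: each input record carries a `recordId` and base64-encoded `data`, and each output record returns the same `recordId`, a `result`, and re-encoded `data`. The field names `user_id`, `event_type`, and `timestamp` are hypothetical and are not taken from the project itself.

```python
import base64
import json


def lambda_handler(event, context):
    """Clean a batch of Firehose records and attach a partition key to each."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Unparseable input: hand the original data back marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        # Hypothetical cleanup step: keep three fields, normalize the event type.
        cleaned = {
            "user_id": payload.get("user_id"),
            "event_type": str(payload.get("event_type", "unknown")).lower(),
            "timestamp": payload.get("timestamp"),
        }
        # Newline-delimited JSON keeps records separable once they are batched.
        encoded = base64.b64encode((json.dumps(cleaned) + "\n").encode("utf-8"))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded.decode("utf-8"),
            # Keys placed here can be referenced in the S3 prefix when
            # dynamic partitioning is enabled on the delivery stream.
            "metadata": {"partitionKeys": {"event_type": cleaned["event_type"]}},
        })
    return {"records": output}
```

In this sketch the function emits JSON. Writing Parquet files to Amazon S3 would then rely on the delivery stream's record format conversion rather than on the Lambda code, which is one possible design and not necessarily the one the project uses.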


In this liveProject, you'll step into the role of a data engineering specialist at Nexstellar Corp to tackle the challenge of transforming the company's data processing capabilities. As Nexstellar seeks to evolve from batch to real-time data analysis, you'll develop a streaming data pipeline using Amazon Kinesis and AWS Lambda. You'll design a system that can rapidly ingest, transform, and store streaming data, enabling the multimedia service provider to make lightning-fast, data-driven decisions. By the end of this project, you'll have constructed a robust, scalable streaming solution that turns raw data into actionable insights!
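The ingestion side of such a pipeline can be sketched in a few lines. The helper below is an illustration only: `send_event` is a hypothetical name, the client is assumed to be a boto3 Kinesis client (or any object with a compatible `put_record` method), and the `user_id` field used as the partition key is an assumption about the event shape.

```python
import json


def send_event(client, stream_name, event):
    """Put one JSON-encoded event on a Kinesis data stream.

    `client` is expected to behave like boto3.client("kinesis").
    """
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        PartitionKey=str(event["user_id"]),
    )
```

With boto3 installed and AWS credentials configured, this could be called as `send_event(boto3.client("kinesis"), "my-stream", {"user_id": 42, "event_type": "play"})`, where `my-stream` is a placeholder stream name. Passing the client in as an argument keeps the helper easy to exercise with a stub before any AWS resources exist.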

This project uses Amazon Web Services, which should cost less than 2 USD for the whole project, including cleanup.

This project is designed for learning purposes and is not a complete, production-ready application or solution.

project author

Gianluigi Mucciolo
Gianluigi Mucciolo is a highly skilled computer engineer who specializes in AWS technologies and agile methodologies. As an AWS Authorized Instructor and Cloud Technical Principal, he is dedicated to advancing cloud professionals’ knowledge and participates in community-building initiatives. With a strong background in Artificial Intelligence and Big Data, Gianluigi constantly seeks growth opportunities. A team player, he excels in both collaborative and independent work settings. In his free time, Gianluigi enjoys intellectual discussions, reading, and connecting with nature for inspiration.

prerequisites

This liveProject is for engineers who want to build a data lake on the Lambda architecture using fully managed AWS services. To begin this liveProject, you will need to know the following:


TOOLS
  • Basics of Amazon Web Services
  • Basics of Python
  • Basics of SQL
TECHNIQUES
  • Basics of Infrastructure Automation
  • Basics of Lambda Architecture

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
Within the liveProject platform, get help from fellow participants, and get further help through paid sessions with our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, along with references to other resources.