Data Lake and Reference Data Management

prerequisites
  • Understanding of data lakes and their architecture
  • Familiarity with AWS Glue and S3 for data management
  • Knowledge of data cataloging and data organization
skills learned
  • Setting up AWS Glue streaming jobs for real-time data ingestion
  • Automating infrastructure creation and release pipelines with IaC templates
  • Transforming and storing real-time data in Amazon S3 using PySpark
  • Configuring and managing the AWS Glue Data Catalog for metadata management
  • Using Amazon Athena to query and explore data stored in the data lake

In this liveProject, you'll transform raw streaming data into a powerful, integrated data lake using cutting-edge tools like Apache Spark Structured Streaming, AWS Glue, and Amazon Athena. You'll develop a robust system that can process streaming information, update existing datasets, and enable lightning-fast analytics that give your company a competitive edge in the fast-moving multimedia market.

This project uses Amazon Web Services. AWS charges for the whole project, including cleanup, should come to less than 3 USD.

This project is designed for learning purposes and is not a complete, production-ready application or solution.

project author

Gianluigi Mucciolo
Gianluigi Mucciolo is a highly skilled computer engineer who specializes in AWS technologies and agile methodologies. As an AWS Authorized Instructor and Cloud Technical Principal, he is dedicated to advancing cloud professionals’ knowledge and participates in community-building initiatives. With a strong background in Artificial Intelligence and Big Data, Gianluigi constantly seeks growth opportunities. A team player, he excels in both collaborative and independent work settings. In his free time, Gianluigi enjoys intellectual discussions, reading, and connecting with nature for inspiration.

prerequisites

This liveProject is for engineers who want to build a data lake based on the Lambda architecture using fully managed AWS services. To begin this liveProject, you will need to know the following:


TOOLS
  • Basics of Amazon Web Services
TECHNIQUES
  • Basics of Infrastructure Automation
  • Basics of Lambda Architecture

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
While within the liveProject platform, get help from fellow participants, plus further help through paid sessions with our expert mentors.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.
book resources
Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.