Overview
1 Before You Begin
The chapter frames today’s AI surge as a transformative wave on par with the internet and cloud eras, succeeding where earlier AI fell short thanks to scalable models, abundant data, and practical applications. It acknowledges disruptive effects across professions while sidestepping the “replacement” debate, arguing instead that AI amplifies human capability. The message is pragmatic: as agentic systems evolve, the most effective professionals will be those who embrace AI to eliminate routine work and reserve human effort for creativity, critical thinking, and judgment.
Within data engineering, AI is portrayed as a force multiplier that shifts focus from infrastructure and repetitive tasks to business logic, insight, and impact. Coding companions already generate and review code, scaffold pipelines, and provide natural-language interfaces to common data libraries, while large models critique prompts, debug, and compare implementation choices—hinting at a unified, language-driven developer workflow. The chapter situates data engineers alongside analysts and data scientists and shows how AI accelerates each role, from auto-generating SQL to speeding EDA and transforming raw inputs, ultimately positioning AI as a versatile multi-tool for rapid development, automation, and clearer boundaries for necessary human oversight.
The book targets practitioners who work with data and want to move beyond ad hoc prompting toward programmatic, scalable AI in ingestion, transformation, and enrichment. It promises practical, hands-on guidance valuable to data engineers, analysts, data scientists, and builders aiming to operationalize AI, with applications spanning data cleansing, feature extraction, synthetic data creation, NLP tasks, and governance. Readers get an overview of the evolving LLM ecosystem and a structured “Month of Lunches” format with short chapters, labs, and step-by-step setup support; by preparing a local environment with core tools, they can follow along and put the concepts into practice quickly.
Being Immediately Effective with AI and Data Engineering
This book is about practical application. While many books dive deep into LLM architectures and AI theory, this one focuses on making you effective immediately.
By the end of the first few chapters, you’ll be using AI to generate and validate SQL queries, clean and transform datasets, extract insights from unstructured data, automate feature engineering, and integrate AI into your data pipelines. This book is designed to be hands-on, applied, and immediately useful. Let’s get started!
FAQ
What is the main goal of Chapter 1 (“Before You Begin”)?
To frame AI as an amplifier of human intelligence—especially for data work—not a replacement. The chapter explains why AI matters to data engineering, who the book is for, how the learning path is structured, and what you need to set up before starting the hands-on labs.
Who is this book for, and what background is recommended?
It’s for data engineers, analysts, data scientists, and AI enthusiasts who want to move beyond chat interfaces into programmatic AI for ingestion, transformation, and enrichment at scale. Familiarity with SQL, Python, and basic AI concepts helps, but the material is practical and accessible.
Why does AI matter to data engineering?
AI reduces time spent on repetitive or infrastructure-heavy tasks so engineers can focus on business logic and impact. LLMs act as coding companions that scaffold pipelines, generate scripts, debug, review options across libraries, and convert unstructured inputs into structured data.
How does AI assist different data personas (engineer, scientist, analyst)?
- Data engineers: automate pipeline steps, generate code, flag anomalies, and structure unstructured inputs.
- Data scientists: suggest features, speed up EDA, summarize trends, and prototype models.
- Data analysts: translate plain English to SQL, automate summaries, streamline dashboards, and flag anomalies.
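To make the analyst workflow concrete, here is a minimal sketch of plain-English-to-SQL using the OpenAI Python SDK. The schema, question wording, and helper names are illustrative assumptions, not examples from the book; the API call requires an `OPENAI_API_KEY` in your environment.

```python
# Sketch: turning a plain-English question into SQL with an LLM.
# The schema and helper names are illustrative, not from the book.

SCHEMA = """
CREATE TABLE orders (
    order_id    INT,
    customer_id INT,
    order_date  DATE,
    total       NUMERIC
);
"""

def build_sql_prompt(question: str, schema: str = SCHEMA) -> str:
    """Combine a table schema and a natural-language question into one prompt."""
    return (
        "You are a SQL assistant. Given this PostgreSQL schema:\n"
        f"{schema}\n"
        f"Write one SQL query that answers: {question}\n"
        "Return only the SQL."
    )

def english_to_sql(question: str) -> str:
    """Send the prompt to a GPT model and return its SQL answer."""
    from openai import OpenAI  # deferred import: prompt building works without the SDK
    client = OpenAI()          # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": build_sql_prompt(question)}],
    )
    return resp.choices[0].message.content.strip()
```

Keeping prompt construction separate from the API call makes the deterministic part easy to test, a pattern that matters once prompts move from chat windows into pipelines.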
What are “agentic systems,” and how are they treated here?
Agentic systems are AI tools that can initiate actions or decisions without constant prompting. The book acknowledges their evolution but focuses on using today’s AI to remove drudgery while keeping humans in the loop for creativity, critical thinking, and problem-solving.
How is the book structured (Month of Lunches format)?
Each chapter targets about 60 minutes: roughly 40 minutes of reading and 20 minutes of hands-on practice. Early chapters cover AI coding companions and prompt engineering; middle chapters tackle transformations and automation; later chapters explore structured extraction, agentic workflows, and programmatic applications.
What kinds of AI use cases in data engineering will be covered?
Examples include data cleansing and transformation, extracting structured data from unstructured sources, feature extraction, generating synthetic datasets, governance tasks like anomaly detection and policy enforcement, and scaling AI beyond chat into operational workflows.
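One recurring theme in those use cases is that model output must be validated before it enters a pipeline. The sketch below shows one way to guard a structured-extraction step; the field names and sample response are illustrative assumptions, not the book's code.

```python
# Sketch: validating LLM output when extracting structured records from free text.
# Field names and the example response are illustrative assumptions.
import json

REQUIRED_FIELDS = {"name", "email", "signup_date"}

def parse_llm_records(raw: str) -> list[dict]:
    """Parse a JSON array returned by a model and drop incomplete records."""
    records = json.loads(raw)
    return [r for r in records if REQUIRED_FIELDS <= r.keys()]

# A model asked to extract users from support emails might return:
raw_response = (
    '[{"name": "Ada", "email": "ada@example.com", "signup_date": "2024-01-05"},'
    ' {"name": "Bob"}]'
)
clean = parse_llm_records(raw_response)  # only the complete record survives
```

This is the kind of human-defined boundary the chapter argues for: the model does the fuzzy extraction, while deterministic code enforces the contract.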
What environment do I need before starting?
You’ll install PostgreSQL and pgAdmin for SQL, Jupyter Lab for Python, and create an OpenAI account for API-driven examples. Setup guides with prerequisites, installs, env vars, API key management, datasets, and troubleshooting are in the companion GitHub repo.
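Before the first lab, it can help to confirm the API key is actually visible to Python. This preflight check is a small illustrative sketch (the masking helper is not from the book); `OPENAI_API_KEY` is the variable name the OpenAI SDK reads by convention.

```python
# Sketch: preflight check that the OpenAI API key is set, without printing it in full.
# The masking helper is illustrative; OPENAI_API_KEY is the SDK's conventional variable.
import os

def masked_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Confirm the API key is set, showing only its first and last characters."""
    key = os.environ.get(env_var)
    if not key:
        return f"{env_var} is not set"
    return key[:3] + "..." + key[-4:]
```

Never print or commit the full key; the setup guides in the repo cover storing it as an environment variable.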
Where can I find the setup files and installation instructions?
All chapter-specific setup guides live in the setup/ directory of the GitHub repo. Examples:
- PostgreSQL/pgAdmin: https://github.com/dave-melillo/data_eng_ai/blob/main/setup/postgres_setup.md
- Jupyter Lab: https://github.com/dave-melillo/data_eng_ai/blob/main/setup/jupyter_setup.md
- OpenAI setup: https://github.com/dave-melillo/data_eng_ai/blob/main/setup/openai_setup.md
Which LLMs does the book focus on, and what alternatives exist?
The book primarily uses OpenAI’s GPT models for their fit with data engineering workflows. It also surveys alternatives—Anthropic Claude, Google Gemini (Vertex AI), Meta LLaMA, Mistral, xAI Grok, Cohere Command R, and AI21 (Jurassic)—noting strengths (e.g., safety, openness, RAG) and trade-offs (e.g., cost, tooling, context size).