Overview
1 AI Engineering - The Blueprint
This chapter defines AI Engineering as the disciplined extension of software engineering to problems involving modern AI, clearly separating it from Prompt Engineering. Where prompt techniques help you communicate with models, AI Engineering delivers production reliability by adding architecture, validation, and operations. The narrative contrasts a chatbot failure that arose from weak engineering practices with large-scale successes that paired models with rigorous routing, grounding, guardrails, and monitoring. It also highlights the demo-to-production gap: prototypes that look good in isolation often collapse under real-world constraints like latency, cost, scale, security, and policy compliance.
The chapter presents a practical blueprint built from five cooperating layers. Prompt routing steers each request to the most appropriate (and cost-effective) model; Retrieval Augmented Generation grounds answers in authoritative sources via semantic search; structured Prompt Engineering standardizes behavior and output formats; agents handle multi-step, tool-using workflows; and operational infrastructure provides evaluation, monitoring, security, and lifecycle management. A step-by-step walkthrough of a customer support query shows how routing curbs spend, RAG reduces hallucinations, structured prompts enforce tone and format, automated validation checks compliance and citations, and confidence scoring triggers human escalation—producing fast, accurate, and auditable outcomes.
Beyond architecture, the chapter offers decision rules and diagnostics. Simple prompting is sufficient for low-stakes, human-reviewed tasks; AI Engineering is warranted when outputs integrate with systems, must be consistent, carry risk, need to scale economically, or face adversarial inputs. Failures map to missing layers: cost overruns indicate routing gaps, hallucinations point to absent grounding, inconsistency suggests prompt issues, policy violations reveal weak validation, multi-step breakdowns need better workflows or agents, and injection incidents call for stronger security. The chapter closes by outlining a learning path from robust prompting to end-to-end production systems, equipping you with a reusable mental model to build reliable, scalable, and cost-efficient AI applications.
Summary
- Ad-hoc prompting collapses at production scale - Air Canada's chatbot hallucinated policies costing $3.2M, while Klarna's engineered system handled 2.3M conversations monthly through systematic architecture, not better prompts.
- The demo-to-production gap emerges at scale - single-case success fails when serving thousands daily, exposing edge cases, context limits, cost explosions, and security vulnerabilities invisible in testing.
- Even simple tasks hide engineering complexity - product descriptions need parameterized templates, structured schemas, validation frameworks, and performance monitoring to sustain quality beyond initial demos.
- Production reliability comes from layered defenses - routing cuts costs 60-80%, RAG curbs hallucinations through verified grounding, and validation catches errors like the $3,650 in unauthorized gift cards promised to 73 customers.
- Behind successful interactions lies invisible infrastructure - Sarah's two-minute payment resolution required routing, knowledge retrieval, synthesis, guardrails, validation, and confidence scoring that simpler approaches cannot provide.
- This blueprint transforms isolated techniques into production systems - you'll build architectures that prevent Air Canada's disasters while achieving Klarna's scale, handling thousands of daily interactions with measurable reliability.
FAQ
What is AI Engineering?
AI Engineering is software engineering that incorporates modern AI (LLMs, embeddings, vector databases) to solve problems involving unstructured data such as text, images, or audio. It applies the same disciplines as traditional engineering—architecture, testing, error handling, monitoring—while extending them with AI-specific patterns like retrieval, routing, validation, and structured generation.
How is AI Engineering different from Prompt Engineering?
Prompt Engineering focuses on communicating effectively with language models. AI Engineering builds production systems around those prompts: routing requests to the right models, grounding outputs with retrieval, validating and monitoring quality, integrating with APIs and databases, and managing cost, latency, and reliability at scale.
When do I need AI Engineering instead of simple prompting?
Use AI Engineering when:
- Outputs feed systems (databases, workflows, APIs) and need structure
- Consistent quality is required across thousands of users
- Failures have consequences (customer-facing, financial, legal/medical)
- Cost matters at scale and must be optimized
- Security risks exist (prompt injection, data exfiltration)
What are the five architectural layers in the blueprint?
- Prompt Routing: Sends each query to the most suitable, cost-effective resource
- Retrieval Augmented Generation (RAG): Grounds responses in authoritative sources
- Prompt Engineering: Structures instructions, context, and output formats
- Autonomous Agents: Orchestrate multi-step, tool-using workflows
- Operational Infrastructure: Evaluation, monitoring, security, and lifecycle management
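To make the division of labor concrete, here is a minimal sketch of how the five layers might cooperate on a single request. Every function is a simplified stand-in (keyword matching instead of semantic search, a fixed confidence score instead of real evaluators), and the names are illustrative rather than a real library API.

```python
# Toy end-to-end pipeline showing how the five layers cooperate on one request.
# All functions are simplified stand-ins, not a real model, router, or vector store.

def route_model(query: str) -> str:
    """Layer 1 - Prompt Routing: cheap model for short queries, premium otherwise."""
    return "cheap-model" if len(query.split()) < 20 else "premium-model"

def retrieve_context(query: str, knowledge_base: dict[str, str]) -> list[str]:
    """Layer 2 - RAG: naive keyword overlap standing in for semantic search."""
    terms = set(query.lower().split())
    return [text for text in knowledge_base.values() if terms & set(text.lower().split())]

def build_prompt(query: str, context: list[str]) -> str:
    """Layer 3 - Prompt Engineering: structured instructions, grounded context, fixed format."""
    joined = "\n".join(f"- {c}" for c in context) or "- (no relevant documents found)"
    return (f"Answer using ONLY the context below and cite it.\n"
            f"Context:\n{joined}\nQuestion: {query}\nAnswer:")

def generate(model: str, prompt: str) -> str:
    """Stand-in for the model call; Layer 4 (agents) would add tool-using loops here."""
    return f"[{model}] draft grounded in {prompt.count('- ')} context item(s)"

def confidence(context: list[str]) -> float:
    """Layer 5 - Operations: crude score; real systems check policy, citations, and tone."""
    return 0.9 if context else 0.2

def handle(query: str, knowledge_base: dict[str, str]) -> str:
    model = route_model(query)
    context = retrieve_context(query, knowledge_base)
    draft = generate(model, build_prompt(query, context))
    return draft if confidence(context) >= 0.5 else "Escalated to a human agent"

if __name__ == "__main__":
    kb = {"refund-policy": "Refunds are issued within 14 days of purchase."}
    print(handle("How long do refunds take?", kb))
```

Each later chapter deepens one of these functions; the point here is only that the request passes through every layer, and that escalation is a normal outcome rather than a failure.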
What is the demo-to-production gap?
Prototype prompts often work on single examples but break at scale. Production requires architecture to:
- Ensure consistent quality and tone
- Manage context limits and edge cases
- Control latency and throughput
- Optimize cost per query
- Add validation, testing, logging, and monitoring
How does prompt routing reduce cost and improve reliability?
Routing analyzes intent and complexity, sending simple requests to cheaper, faster models and complex ones to advanced models (a minimal sketch follows the list). Typical results:
- 60–80% cost reduction by avoiding unnecessary premium-model calls
- Lower latency for simple queries
- More predictable budgets and capacity planning
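The percentages above are the chapter's figures; the mechanics behind them are straightforward to sketch. The illustration below assumes made-up per-1K-token prices and a naive keyword heuristic for complexity; real routers typically use a classifier or a small model to score intent.

```python
# Toy prompt router: sends simple queries to a cheap model and complex ones to a
# premium model, then compares spend against a "premium for everything" baseline.
# Prices and the complexity heuristic are illustrative assumptions, not vendor rates.

PRICES_PER_1K_TOKENS = {"cheap-model": 0.0005, "premium-model": 0.01}

def pick_model(query: str) -> str:
    complex_markers = ("why", "compare", "explain", "troubleshoot", "policy")
    is_complex = len(query.split()) > 30 or any(m in query.lower() for m in complex_markers)
    return "premium-model" if is_complex else "cheap-model"

def estimate_cost(queries: list[str]) -> tuple[float, float]:
    routed = baseline = 0.0
    for q in queries:
        tokens = len(q.split()) * 1.3          # rough token estimate
        routed += tokens / 1000 * PRICES_PER_1K_TOKENS[pick_model(q)]
        baseline += tokens / 1000 * PRICES_PER_1K_TOKENS["premium-model"]
    return routed, baseline

if __name__ == "__main__":
    sample = [
        "Where is my order?",
        "Reset my password",
        "What are your opening hours?",
        "Cancel my subscription",
        "Update my email address",
        "Explain why my refund claim was denied and compare my options",
    ]
    routed, baseline = estimate_cost(sample)
    print(f"routed ${routed:.6f} vs. premium-only ${baseline:.6f} "
          f"-> {1 - routed / baseline:.0%} saved")
```

Because most support traffic is simple, even this crude split routes the bulk of tokens to the cheap model, which is where the cost reduction comes from.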
How does Retrieval Augmented Generation (RAG) cut hallucinations?
RAG retrieves relevant, authoritative documents (policies, product data, manuals) and injects them into the prompt; a sketch of the grounding step appears after the list. Benefits:
- Factual grounding with citations
- Current, domain-specific information rather than reliance on the model's training data alone
- Lower error rates and fewer invented policies or details
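A minimal sketch of the grounding step follows. It uses keyword overlap in place of embedding-based semantic search, and the document store, prompt wording, and citation format are illustrative assumptions.

```python
# Toy RAG step: rank documents against the query, inject the best matches into the
# prompt, and instruct the model to cite them. Keyword overlap stands in for the
# embedding similarity a real vector database would provide.

import re

POLICIES = {
    "refund-policy": "Refunds are issued to the original payment method within 14 days.",
    "baggage-policy": "Each passenger may check one bag up to 23 kg free of charge.",
    "change-policy": "Flight changes made more than 48 hours before departure are free.",
}

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    ranked = sorted(POLICIES.items(),
                    key=lambda item: len(tokens(query) & tokens(item[1])),
                    reverse=True)
    return [(doc_id, text) for doc_id, text in ranked[:k]
            if tokens(query) & tokens(text)]

def grounded_prompt(query: str) -> str:
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    return ("Answer using ONLY the documents below and cite them as [doc-id]. "
            "If they do not answer the question, say you do not know.\n\n"
            f"{context}\n\nQuestion: {query}")

if __name__ == "__main__":
    print(grounded_prompt("How many days until my refund is issued?"))
```

The key design choice is that the model is constrained to the retrieved text and told to admit when the documents are silent, which is what turns "plausible" answers into verifiable ones.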
What validation and monitoring do production AI systems need?
Examples include the following (the citation and confidence checks are sketched after the list):
- Policy compliance checks against source documents
- Hallucination detection (LLM-as-a-judge or rule-based assertions)
- Tone/brand voice verification
- Citation existence and quote accuracy checks
- Confidence scoring with human escalation below thresholds
- Continuous evaluation for drift, latency, and cost
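Two of these checks, citation existence and quote accuracy, plus threshold-based escalation, are easy to see in code. In the sketch below, the confidence formula and the [doc-id] citation convention are assumptions; a real system would use calibrated evaluators or an LLM-as-a-judge pass instead.

```python
# Toy validation pass: confirm cited documents exist, confirm quoted text actually
# appears in its source, and escalate to a human when confidence falls below a threshold.

import re

def check_citations(draft: str, sources: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the citation checks passed."""
    problems = []
    cited_ids = re.findall(r"\[([\w-]+)\]", draft)
    if not cited_ids:
        problems.append("no citations present")
    for doc_id in cited_ids:
        if doc_id not in sources:
            problems.append(f"cited document '{doc_id}' does not exist")
    for quote in re.findall(r'"([^"]+)"', draft):
        if not any(quote in text for text in sources.values()):
            problems.append(f'quoted text not found in any source: "{quote}"')
    return problems

def decide(draft: str, sources: dict[str, str], threshold: float = 0.8) -> str:
    problems = check_citations(draft, sources)
    confidence = 1.0 - 0.3 * len(problems)   # crude score; real systems use calibrated evaluators
    if confidence < threshold:
        return f"ESCALATE to human review: {problems}"
    return "AUTO-SEND"

if __name__ == "__main__":
    sources = {"refund-policy": "Refunds are issued within 14 days of purchase."}
    good = 'Per [refund-policy], "Refunds are issued within 14 days of purchase."'
    bad = "Per [gift-card-policy], you will receive a $50 gift card."
    print(decide(good, sources))   # AUTO-SEND
    print(decide(bad, sources))    # ESCALATE: nonexistent citation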
How do I scale a good prompt into a production pipeline?
Move from ad-hoc prompting to an engineered flow (a condensed sketch follows the list):
- Parameterize templates and enforce structured outputs (e.g., JSON schemas)
- Add validation (schema, range, cross-field rules) and retry logic
- Integrate with databases/APIs for automated insertion and workflows
- Use routing to manage costs and context limits
- Monitor quality, latency, and spend continuously
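A condensed sketch of that progression follows, for a product-description task: a parameterized template, JSON output, field-level validation, and a bounded retry loop. The `generate` function, schema fields, and limits are illustrative assumptions standing in for a real model call and a real schema library.

```python
# Toy production step for a product-description task: parameterized template,
# structured (JSON) output, field-level validation, and a bounded retry loop.
# `generate` is a placeholder for a real model API call.

import json

TEMPLATE = (
    "Write a product description for '{name}'. "
    "Return JSON with keys: title (string, <=60 chars), description (string), "
    "bullet_points (list of 3 strings)."
)

def generate(prompt: str) -> str:
    # Placeholder model: returns a fixed, well-formed response for the sketch.
    return json.dumps({
        "title": "Trailblazer Hiking Boot",
        "description": "A waterproof boot built for rough terrain.",
        "bullet_points": ["Waterproof leather", "Grippy rubber sole", "All-day comfort"],
    })

def validate(payload: dict) -> list[str]:
    errors = []
    if not isinstance(payload.get("title"), str) or len(payload.get("title", "")) > 60:
        errors.append("title missing or too long")
    if not isinstance(payload.get("description"), str):
        errors.append("description missing")
    bullets = payload.get("bullet_points")
    if not (isinstance(bullets, list) and len(bullets) == 3):
        errors.append("bullet_points must be a list of exactly 3 items")
    return errors

def produce(name: str, max_attempts: int = 3) -> dict:
    prompt = TEMPLATE.format(name=name)
    for _ in range(max_attempts):
        try:
            payload = json.loads(generate(prompt))
        except json.JSONDecodeError:
            continue                      # malformed JSON: retry
        if not validate(payload):
            return payload                # valid: ready for database insertion
        # real systems would feed the validation errors back into the next prompt
    raise RuntimeError(f"no valid output after {max_attempts} attempts")

if __name__ == "__main__":
    print(produce("Trailblazer Hiking Boot"))
```

The same loop extends naturally to the other bullets: the validated payload can be written to a database or API, the template can be routed to a cheaper model for short items, and every attempt can be logged for quality, latency, and spend monitoring.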
What outcomes can this blueprint deliver in practice?
Organizations report:
- Large cost savings via routing and automation
- Faster response/resolution times with consistent quality
- Reduced hallucinations through RAG and validation layers
- Higher customer satisfaction and lower manual workload
- Safe scaling with observability, guardrails, and human-in-the-loop where needed