Overview

1 AI Engineering - The Blueprint

AI Engineering turns promising demos into dependable products by extending classic software engineering with AI-specific practices. The chapter distinguishes prompt engineering—how to talk to models—from AI engineering—how to build systems around them with architecture, testing, validation, and monitoring. It contrasts failures that arise from ad-hoc prompting (like policy hallucinations in customer chatbots) with results achieved through disciplined engineering (as in large-scale assistants that match human-level satisfaction while optimizing cost and latency). The theme is reliability at scale: balancing quality, speed, cost, and throughput while handling real-world variability and risk.

The blueprint centers on five integrated layers that a production query traverses: prompt routing to choose the right resources and control cost; retrieval-augmented generation to ground answers in authoritative knowledge; structured prompting to standardize tone, format, and behavior; validation to enforce policy compliance and catch hallucinations; and operational infrastructure to observe, evaluate, and maintain the system. A worked example follows a customer payment issue through semantic retrieval, structured response synthesis with citations, automated checks (policy, grounding, tone, and citation integrity), confidence scoring, and escalation paths—illustrating how each layer prevents specific, costly failure modes while isolating faults for faster fixes.
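
To make the layering concrete, here is a minimal Python sketch of a query passing through the layers in order. Every function, the `KB` dictionary, the model names, and the 0.7 escalation threshold are hypothetical stand-ins, not the chapter's implementation. Retrieval is a keyword lookup, generation is a stub that echoes a source, and the per-layer `trace` plays the role of the operational layer by recording where a request succeeded or failed.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Request:
    """Hypothetical request state handed from layer to layer."""
    query: str
    model: str = ""
    sources: list[str] = field(default_factory=list)
    prompt: str = ""
    answer: str = ""
    confidence: float = 0.0
    escalated: bool = False
    trace: list[str] = field(default_factory=list)

# Illustrative knowledge base; a real system would use semantic search.
KB = {"payment": "[policy-7] Failed payments are retried automatically within 24 hours."}

def route(r: Request) -> None:
    # Toy heuristic: long queries go to a larger (hypothetical) model.
    r.model = "large-model" if len(r.query.split()) > 30 else "small-model"

def retrieve(r: Request) -> None:
    r.sources = [doc for key, doc in KB.items() if key in r.query.lower()]

def build_prompt(r: Request) -> None:
    context = "\n".join(r.sources) or "(no sources found)"
    r.prompt = f"Answer using only these sources and cite them:\n{context}\n\nQuestion: {r.query}"

def generate(r: Request) -> None:
    # Stand-in for a model call: echo the first source so the flow is testable.
    r.answer = r.sources[0] if r.sources else "I'm not sure."

def validate(r: Request) -> None:
    # Hypothetical scoring: cited answers get high confidence; below 0.7 escalates.
    r.confidence = 0.9 if r.answer.startswith("[") else 0.2
    r.escalated = r.confidence < 0.7

LAYERS: list[tuple[str, Callable[[Request], None]]] = [
    ("routing", route), ("retrieval", retrieve), ("prompting", build_prompt),
    ("generation", generate), ("validation", validate),
]

def handle(query: str) -> Request:
    r = Request(query)
    for name, layer in LAYERS:
        try:
            layer(r)
            r.trace.append(f"{name}: ok")
        except Exception as exc:  # isolate the fault to a named layer
            r.trace.append(f"{name}: failed ({exc})")
            r.escalated = True
            break
    return r
```

Calling `handle("My payment failed, what happens now?")` finds the payment policy, returns a cited answer, and is not escalated, while a query with no matching source falls through to a low-confidence answer and is flagged for a human. Because each layer is a separate function, swapping the keyword lookup for real semantic search changes one entry in `LAYERS` without touching the others.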

The chapter also clarifies when to move beyond simple prompting: whenever outputs drive workflows or databases, quality must be consistent, mistakes carry consequences, costs must be managed at volume, or security threats are present. It provides a diagnostic lens that maps symptoms to missing layers—routing for runaway costs, RAG for hallucinations, prompt design for inconsistency, validation for policy breaches, workflow/agents for multi-step failures, and security hardening for adversarial inputs. Finally, it outlines a learning path: first mastering prompt engineering as the interface layer, then composing full systems with routing, RAG, agents, and production-grade ops—delivering reliable, scalable AI applications grounded in solid engineering.

The Demo-to-Production Gap
Production AI System Architecture

Summary

  • Ad-hoc prompting collapses at production scale - Air Canada's chatbot hallucinated policies costing $3.2M, while Klarna's engineered system handled 2.3M conversations monthly through systematic architecture, not better prompts.
  • The demo-to-production gap emerges at scale - single-case success fails when serving thousands daily, exposing edge cases, context limits, cost explosions, and security vulnerabilities invisible in testing.
  • Even simple tasks hide engineering complexity - product descriptions need parameterized templates, structured schemas, validation frameworks, and performance monitoring to sustain quality beyond initial demos.
  • Production reliability comes from layered defenses - routing cuts costs 60-80%, RAG reduces hallucinations by grounding answers in verified sources, and validation catches errors like the $3,650 in unauthorized gift cards promised to 73 customers.
  • Behind successful interactions lies invisible infrastructure - Sarah's two-minute payment resolution required routing, knowledge retrieval, synthesis guardrails, validation, and confidence scoring that simpler approaches cannot provide.
  • This blueprint transforms isolated techniques into production systems - you'll build architectures that prevent Air Canada's disasters while achieving Klarna's scale, handling thousands of daily interactions with measurable reliability.

FAQ

What is AI Engineering?
AI Engineering is software engineering that incorporates modern AI techniques (LLMs, embeddings, vector databases) to solve problems with unstructured data. It keeps core engineering discipline (scalable architecture, testing, error handling, monitoring) while extending it with AI-specific patterns like retrieval, validation, and routing.

How is AI Engineering different from Prompt Engineering?
Prompt Engineering focuses on communicating effectively with models. AI Engineering builds production systems around those prompts: architectures for reliability, routing for cost, RAG for grounding, validation for quality control, and operational monitoring for scale and maintainability.

When should I move from simple prompting to full AI Engineering?
Use simple prompting for personal productivity, drafts, and brainstorming where humans review outputs. Move to AI Engineering when:
- Outputs integrate with systems (DBs, APIs, workflows)
- Consistent quality is required at scale
- Failures have consequences (customer, legal, financial)
- Costs matter across many requests
- Security threats exist (prompt injection, data leaks)

What are the five architectural layers in the blueprint?
- Prompt Routing: match requests to the right model/path based on complexity and topic
- RAG (Retrieval-Augmented Generation): ground answers in authoritative documents via semantic search
- Prompt Engineering: structured, reusable templates that control behavior and format
- Autonomous Agents: multi-step, tool-using workflows for complex tasks
- Operational Infrastructure: evaluation, monitoring, security, cost optimization, lifecycle management

What is the demo-to-production gap, and why do simple demos fail at scale?
Demos hide constraints that surface at scale: inconsistent quality, context window limits, cost blowups, edge cases, and missing error handling. Production requires architecture (templates, parameterization, structured outputs, validation, monitoring, and routing) to keep quality and costs predictable across thousands of interactions.

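The template-plus-schema pattern can be sketched in Python for the product-description case from the summary. The template wording, the `title`/`description` fields, and the 60-word limit are illustrative assumptions. The point is that the prompt is parameterized rather than hand-edited, and the model's output is checked against an expected structure before anything downstream consumes it.

```python
import json
from string import Template

# Hypothetical parameterized prompt template for product descriptions.
PROMPT = Template(
    "Write a product description for $name in a $tone tone.\n"
    "Key features: $features\n"
    'Return JSON with keys "title" (string) and "description" (at most $max_words words).'
)

# Hypothetical output schema: required field name -> expected Python type.
REQUIRED = {"title": str, "description": str}

def render(name: str, features: list[str], tone: str = "friendly", max_words: int = 60) -> str:
    # substitute() raises KeyError if any placeholder is left unfilled.
    return PROMPT.substitute(name=name, features=", ".join(features), tone=tone, max_words=max_words)

def validate_output(raw: str, max_words: int = 60) -> tuple[bool, list[str]]:
    """Check a model's raw output against the expected structure."""
    errors: list[str] = []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return False, ["top-level value must be an object"]
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            errors.append(f"missing or non-{typ.__name__} field: {key}")
    desc = data.get("description")
    if isinstance(desc, str) and len(desc.split()) > max_words:
        errors.append("description too long")
    return not errors, errors
```

`validate_output` returns `(False, [...])` for malformed JSON or a missing field, which gives the monitoring layer a countable failure signal instead of a silently broken listing.
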
How does prompt routing reduce costs without hurting quality?
Routing sends simple queries to fast, inexpensive models and reserves advanced models for complex cases. Typical savings are 60–80%. In the chapter's example, routing kept monthly costs at roughly $2,580, versus about $15,000 if every query had gone to premium models.

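A toy router in Python illustrates the mechanism. The keyword list, the 40-word cutoff, the model names, and the per-query prices are all hypothetical; production routers often use a trained classifier or a small model to judge complexity instead. The savings follow from simple arithmetic: if a fraction f of traffic goes to a model costing r times the premium price, the blended cost is (f·r + (1 - f)) of the all-premium baseline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    cost_per_query: float  # hypothetical average USD per query

CHEAP = Model("small-fast-model", 0.002)
PREMIUM = Model("large-premium-model", 0.03)

# Hypothetical markers of queries that need the stronger model.
COMPLEX_HINTS = ("dispute", "legal", "chargeback", "compare", "why")

def route(query: str) -> Model:
    """Toy complexity heuristic: long queries or risky topics go to the premium model."""
    q = query.lower()
    if len(q.split()) > 40 or any(h in q for h in COMPLEX_HINTS):
        return PREMIUM
    return CHEAP

def total_cost(queries: list[str]) -> float:
    return sum(route(q).cost_per_query for q in queries)

def savings(queries: list[str]) -> float:
    """Fraction saved versus sending every query to the premium model."""
    baseline = PREMIUM.cost_per_query * len(queries)
    return 1 - total_cost(queries) / baseline
```

With nine routine queries and one chargeback dispute, `savings` comes out to about 0.84, matching the formula with f = 0.9 and r = 1/15.
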
What is RAG and how does it reduce hallucinations?
RAG retrieves relevant, current knowledge (e.g., policies, product docs) using semantic search in a vector database and injects it into the prompt. Grounding answers in authoritative sources, and requiring citations, dramatically reduces fabricated or outdated claims.

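The retrieve-then-ground flow can be sketched in a few lines of Python. As a loudly labeled simplification, word-count vectors stand in for learned embeddings and a dictionary stands in for the vector database; the document ids, the `min_score` cutoff, and the prompt wording are hypothetical. What carries over to a real system is the shape: score documents by cosine similarity to the query, keep the top k above a threshold, and inject them with citation ids the model must reference.

```python
import math
import re
from collections import Counter

# Hypothetical policy snippets keyed by citation id.
DOCS = {
    "refund-1": "Refunds are issued to the original payment method within 5 business days.",
    "payment-2": "If a card payment fails, update the card in account settings and retry.",
    "travel-3": "Bereavement fares must be requested before travel.",
}

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Stand-in for a vector database: precomputed vectors per document.
INDEX = {doc_id: embed(text) for doc_id, text in DOCS.items()}

def retrieve(query: str, k: int = 2, min_score: float = 0.1) -> list[tuple[str, float]]:
    q = embed(query)
    scored = sorted(((cosine(q, vec), doc_id) for doc_id, vec in INDEX.items()), reverse=True)
    return [(doc_id, s) for s, doc_id in scored[:k] if s >= min_score]

def grounded_prompt(query: str) -> str:
    hits = retrieve(query)
    context = "\n".join(f"[{doc_id}] {DOCS[doc_id]}" for doc_id, _ in hits)
    return (
        "Answer using ONLY the sources below and cite their ids in brackets. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
```

For "My card payment failed, what should I do?", `retrieve` ranks `payment-2` first and drops the unrelated travel policy, so the prompt contains only relevant, citable context.
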
What validation and guardrails are needed before responses reach users?
A robust validation layer can include:
- Policy compliance checks against source documents
- Hallucination detection (LLM-as-judge) requiring source-backed claims
- Tone and brand voice verification
- Citation existence and accuracy checks
- Confidence scoring with auto-escalation to humans below a threshold

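Two of these checks, citation integrity and confidence-based escalation, can be sketched in Python. The word-overlap grounding score is a deliberately naive stand-in for an LLM-as-judge, and the 0.5 grounding floor, the 0.5 penalty, and the 0.7 escalation threshold are hypothetical values; policy and tone checks are omitted.

```python
import re
from dataclasses import dataclass

@dataclass
class Verdict:
    confidence: float
    issues: list[str]
    escalate: bool

CITATION = re.compile(r"\[([\w-]+)\]")

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def validate(answer: str, sources: dict[str, str], threshold: float = 0.7) -> Verdict:
    issues: list[str] = []
    cited = CITATION.findall(answer)
    if not cited:
        issues.append("no citations")
    missing = [c for c in cited if c not in sources]
    if missing:
        issues.append(f"unknown citations: {missing}")
    # Naive grounding check (stand-in for LLM-as-judge): share of answer words in cited sources.
    answer_words = words(CITATION.sub("", answer))
    source_words: set[str] = set()
    for c in cited:
        source_words |= words(sources.get(c, ""))
    grounding = len(answer_words & source_words) / len(answer_words) if answer_words else 0.0
    if grounding < 0.5:
        issues.append(f"weakly grounded ({grounding:.2f})")
    # Hypothetical scoring: halve confidence when any check raised an issue.
    confidence = grounding * (0.5 if issues else 1.0)
    return Verdict(round(confidence, 2), issues, confidence < threshold)
```

An answer citing a real source and restating its wording passes with confidence 1.0, while one that cites an unknown document and promises an unsupported refund scores 0.0 and is routed to a human.
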
How do I diagnose common production failures using the blueprint?
- Costs too high: routing inefficiency → improve model selection
- Hallucinations: missing/weak RAG → ground in verified sources
- Variable quality: prompt gaps → strengthen structure, formats, examples
- Policy violations/risks: inadequate validation → add guardrails and checks
- Multi-step/task failures: weak workflow design → use chaining/agents
- Security issues: gaps in defenses → apply sanitization and privilege separation

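The symptom-to-layer mapping can be turned into a small monitoring helper. The playbook entries restate the checklist; the metric names and alert thresholds are hypothetical and would need tuning against a real system's baselines.

```python
# Symptom -> (likely missing or weak layer, remedy), restating the checklist.
PLAYBOOK = {
    "cost": ("routing", "improve model selection"),
    "hallucination": ("RAG", "ground answers in verified sources"),
    "variable_quality": ("prompt design", "strengthen structure, formats, examples"),
    "policy_violation": ("validation", "add guardrails and checks"),
    "multi_step_failure": ("workflow/agents", "use chaining or agents"),
    "security": ("security hardening", "apply sanitization and privilege separation"),
}

# Hypothetical alert thresholds on monitored values; tune per system.
THRESHOLDS = {
    "cost": 0.01,               # USD per query
    "hallucination": 0.02,      # share of answers failing grounding checks
    "variable_quality": 0.05,   # share of outputs failing format checks
    "policy_violation": 0.0,    # any violation triggers review
    "multi_step_failure": 0.05, # share of workflows not completing
    "security": 0.0,            # any detected injection or leak
}

def diagnose(metrics: dict[str, float]) -> list[str]:
    """Return a remedy for every monitored value that exceeds its threshold."""
    findings = []
    for symptom, value in metrics.items():
        if symptom in THRESHOLDS and value > THRESHOLDS[symptom]:
            layer, remedy = PLAYBOOK[symptom]
            findings.append(f"{symptom}: check {layer} -> {remedy}")
    return findings
```

Feeding it `{"cost": 0.03, "hallucination": 0.01, "security": 1}` flags the routing and security layers while leaving RAG alone, because the hallucination rate is under its threshold.
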
What real results can AI Engineering deliver?
Systematic architecture enables measurable gains:
- Customer support: faster resolutions, higher CSAT, ~40–60% cost reduction via routing and automation
- Legal analysis: time cut from ~40 hours to ~2 hours per contract with structured extraction, chaining, and human review
- Operations automation: near real-time data entry with ~98% accuracy using schemas, validation, and API integration

These outcomes come from combining routing, RAG, structured prompts, validation, and operational monitoring.
