Table of contents

1 The rise of AI agents

1.1 Defining agents and agentic thinking

1.1.1 Understanding agent, assistant, and LLM patterns

1.1.2 Thinking like agents: Sense-plan-act-learn

1.1.3 Agents act with tools

1.2 Introducing the Model Context Protocol

1.3 Understanding the five functional layers of an agent

1.3.1 The agent persona

1.3.2 Agent tools and actions

1.3.3 Agent reasoning and planning

1.3.4 Agent knowledge and memory

1.3.5 Agent evaluation and feedback

1.4 Advancing to multi-agent systems

1.4.1 The agent flow assembly line

1.4.2 Agent orchestrations (hub-and-spoke)

1.4.3 Agent collaboration (teams of agents)

1.5 Next steps

2 Core components: Large language models, prompting, and agents

2.1 Understanding large language models

2.1.1 LLMs: Probabilistic token machines

2.1.2 What is a token?

2.1.3 Tuning temperature, top-p, and more

2.2 Controlling LLMs with prompt engineering (agent persona)

2.2.1 Applying core prompt techniques

2.2.2 Thinking like an LLM

2.2.3 Avoiding common prompt pitfalls

2.3 Building an agent with OpenAI Agents

2.3.1 Building a minimal agent

2.3.2 Setting the agent model and other parameters

2.3.3 Controlling inputs and typed outputs

2.3.4 Tracing agents

2.4 Enhancing agents through tool integration

2.4.1 Providing agents with tools

2.4.2 Tracing agentic tool use

2.5 Exercises

3 Actions with Model Context Protocol for AI agents

3.1 Understanding MCP fundamentals for agent development

3.1.1 The standardization problem MCP solves

3.1.2 MCP architecture: Clients, servers, and services

3.1.3 Core components: Tools, resources, and prompts

3.1.4 MCP deployment patterns for agents

3.1.5 MCP powers the functional agent layers

3.2 Getting started with MCP servers

3.2.1 Coding up an MCP server for Claude

3.2.2 Using the MCP inspector

3.2.3 Understanding MCP transport types

3.2.4 From desktop to agents: The key differences

3.3 Using MCP servers for agents

3.3.1 Using agents with local MCP servers over STDIO

3.3.2 Using local MCP servers over SSE with agents

3.3.3 Connecting to the standard MCP servers

3.4 Building MCP servers for agents

3.4.1 Converting tools to an MCP server

3.4.2 Consuming MCP servers locally or remotely

3.5 Exercises

4 Architecting and building multi-agent systems

4.1 Architecting multi-agent systems

4.1.1 Decision-making and control patterns

4.1.2 Communicating with shared memory, message passing, and MCP

4.1.3 Channeling multi-agent coordination strategies

4.2 Balancing agents with agentic flows

4.2.1 Transforming agents to agent flows

4.2.2 Building an agent-to-agent flow

4.2.3 Agency and decision-making in agent flows

4.3 Understanding handoffs in agent flows

4.3.1 Agent-to-agent flow with handoffs

4.3.2 Visualizing agent flows

4.3.3 Monitoring the handoff

4.4 Validating agent flows with guardrails

4.4.1 Implementing input and output guardrails

4.4.2 Using agents as guardrails

4.4.3 Adding guardrails for pass-off agent flows

4.5 Exercises

5 Agent reasoning and planning

5.1 Understanding LLM reasoning and planning

5.1.1 Chain-of-thought reasoning

5.1.2 Reasoning, acting, observing: The ReAct paradigm

5.1.3 Planning with LLMs

5.2 Instructing agents to reason and plan

5.2.1 Applying CoT to an agent

5.2.2 Implementing ReAct with agents

5.3 Advanced reasoning patterns with agents

5.3.1 Tree-of-thought

5.3.2 Reflexion

5.3.3 Selecting the right pattern for your agents

5.4 Utilizing the sequential thinking MCP server

5.4.1 Unchaining the sequential thinking server

5.4.2 Revisiting time travel problems with sequential thinking

5.4.3 Advanced reasoning with sequential thinking

5.5 Exercises

6 Working with memory and knowledge RAG for agents

6.1 Understanding retrieval in AI applications

6.1.1 The basics of RAG

6.1.2 Delving into semantic search and document indexing

6.1.3 Applying vector similarity search

6.2 Vector databases and similarity search

6.2.1 Demystifying document embeddings

6.2.2 Querying document embeddings from Chroma DB

6.3 Building practical RAG knowledge agents

6.3.1 Everything begins with search and relevance

6.3.2 Building a vector search RAG agent

6.3.3 Building a hybrid search RAG agent

6.4 Adding memory to agents with MCP

6.4.1 Understanding memory form and agent function

6.4.2 Attaching a graph database for memory using MCP

6.4.3 Creating hybrid memory systems with MCP

6.4.4 Semantic augmented memory and applications to semantic, episodic, and procedural memory

6.4.5 Uncluttering memory with compression and forgetting

6.5 Exercises

7 Building robust agents with evaluation and feedback

7.1 Introducing agent evaluation and feedback

7.2 Implementing test-driven agent development

7.2.1 Exploring TDAD in practice

7.2.2 Coding and testing the RAG agent

7.2.3 Refactoring the agent

7.2.4 Extending evaluation with an agent evaluator

7.3 Employing grounding, critic, and evaluation agents

7.3.1 Reviewing the grounding agent

7.3.2 Grounding the RAG agent

7.3.3 Implementing grounding agents as guardrails

7.3.4 Understanding the role of rubrics in evaluation

7.3.5 Building a rubric critic agent

7.4 Phoenix for evaluation and feedback

7.4.1 Connecting to Phoenix

7.4.2 Adding metadata and session tracking

7.4.3 Experimenting with evaluators

7.4.4 Providing feedback with annotations

7.5 Exercises

8 Deploying agents and agentic systems

8.1 Strategies for consuming agents

8.1.1 Embedding real-time voice agents into web applications

8.1.2 Hosting agents through an API

8.1.3 Consuming an agent web service in a web application

8.2 Dockerizing agent systems

8.2.1 Containerizing an agent microservice

8.2.2 Orchestrating agentic systems with Docker Compose

8.2.3 Externalizing local agent microservices

8.3 Considering advanced deployment strategies

8.3.1 Choosing a runtime: Edge, API, or event-driven

8.3.2 The three “wires” of communication

8.3.3 Practical multi-agent topologies that adapt well

8.3.4 State, memory, and idempotency

8.3.5 Release engineering for agents (prompts, tools, models)

8.3.6 Observability matters

8.3.7 Reliability patterns: Timeouts, fallbacks, and budgets

8.3.8 Cost control and model routing

8.4 Security, safety, and governance in production

8.4.1 A quick threat model for agentic systems

8.4.2 Identity and access for people, services, and agents

8.4.3 Secrets and configuration management

8.4.4 Tool safety: Sandboxing and egress control

8.4.5 Prompt-injection and data-exfiltration defenses

8.4.6 Safety and policy enforcement

8.5 Exercises

9 Understanding the agentic loop

9.1 Peeling back the three agentic loop layers

9.1.1 Layer 1: The inner loop (sense-plan-act-learn)

9.1.2 Layer 2: The task loop

9.1.3 Layer 3: The meta loop

9.2 Layer 2: Looping with a deep research agent

9.2.1 Creating the initial state and plan

9.2.2 Adding the tools

9.2.3 Understanding iteration body output

9.2.4 The termination gate

9.2.5 Coding the deep research loop

9.2.6 Synthesizing the final output

9.2.7 When to use an agentic loop

9.2.8 Building a repetitive task loop agent

9.3 Layer 3: Multi-agent orchestration loops

9.4 Building collaborative agentic loops

9.5 Exercises

10 Exploring the cognitive agent that thinks, monitors, and adapts

10.1 Understanding agent cognition and metacognition as engineering concepts

10.1.1 The five failure modes of capable-but-not-cognitive agents

10.1.2 From reasoning primitives to cognitive architecture

10.1.3 Defining cognition for agents

10.1.4 Defining metacognition for agents

10.1.5 Three theoretical foundations

10.2 Mapping the mind into a cognitive agent architecture

10.2.1 Architecture overview

10.2.2 The cognitive workspace

10.2.3 The perception module

10.2.4 The planning module

10.2.5 The execution module

10.2.6 The evaluation module

10.2.7 The attention module

10.2.8 The memory module and the MCP memory server

10.3 Building and running the cognitive agent

10.3.1 The cognitive loop

10.3.2 A complete cognitive agent with MCP

10.3.3 Walkthrough: Watching the cognitive cycle in action

10.3.4 Confidence-gated execution

10.3.5 Stagnation detection and strategy pivoting

10.3.6 Knowledge boundary awareness

10.3.7 Emergent behaviors

10.4 Measuring cognitive capability and looking ahead

10.4.1 Cognitive efficiency metrics

10.4.2 Before and after: Measuring the effect

10.4.3 The road to more general agents

10.5 Exercises

11 Tips for building agentic systems

11.1 Field-tested tips organized by the five agentic layers

11.1.1 The core layer: Persona

11.1.2 Tools and agent actions

11.1.3 Reasoning and planning

11.1.4 Knowledge and memory

11.1.5 Evaluation and feedback

11.2 Tips for building a customer support agent

11.3 Tips for building a RAG agent system

11.4 Tips for building a deep research agent system

Appendixes

Appendix A: Setting up the sample code repository

A.1 Cloning the repository

A.2 Creating a Python environment

A.3 Installing dependencies and configuring the environment

A.3.1 Path A: Using the VS Code debugger

A.3.2 Path B: Installing manually with pip

A.3.3 Configuring the OpenAI API key

A.4 Running the sample code

A.4.1 Running a sample

A.4.2 Troubleshooting common problems

A.4.3 Keeping your setup healthy

Appendix B: Node.js setup for local MCP servers

B.1 Installing Node.js

B.1.1 Installing Node.js on Windows

B.1.2 Installing Node.js on macOS

B.1.3 Installing Node.js on Linux or WSL

B.2 Verifying your Node and npx installation

B.2.1 Checking the installed versions

B.2.2 How npx finds and caches packages

B.3 Running an MCP server with npx

B.3.1 Anatomy of the npx command

B.3.2 Running the filesystem MCP server

B.3.3 Wiring the server into an MCP client

B.4 Troubleshooting and keeping Node healthy

B.4.1 Common issues

B.4.2 Clearing the npx cache

B.4.3 Updating Node

Overview

8 Deploying agents and agentic systems

The chapter explains how agent systems move from demos into real applications by choosing the right way to consume and deploy them. It compares embedding an agent directly in a client app, exposing it as an API-backed service, or using one agent as a tool for another through protocols and agent-to-agent communication. The main idea is to match the deployment style to the job: embedded agents work well for fast, interactive experiences, while service-based and tool-based approaches fit longer-running or more complex workflows.

It then shows how containerization and orchestration make agent systems easier to manage at scale. Using Docker, agents can be packaged as microservices, upgraded independently, and run locally or in more scalable environments. Docker Compose extends this by letting multiple agent services work together as a single stack, and tunneling tools can temporarily expose local systems for demos and testing. The chapter emphasizes that a simple browser agent can remain responsive while delegating heavier work, such as image generation, to backend services.

The final part focuses on production concerns that become essential once agents are deployed for real users. It recommends picking the simplest runtime that satisfies latency needs, using clear communication paths, keeping state and memory disciplined, and designing idempotent tools for caching and replay. It also stresses release engineering, observability, reliability patterns, cost control, and strict security practices such as least privilege, secret management, sandboxing tools, prompt-injection defenses, and external policy enforcement. Overall, the chapter treats agents as software systems that need the same operational rigor as any other production service.
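The point about idempotent tools enabling caching and replay can be sketched in a few lines. This is only an illustration under stated assumptions, not the chapter's code: the in-memory dictionary, `idempotency_key`, `call_idempotent`, and the `generate_image` stub are hypothetical names, and a deployed system would more likely keep results in shared storage.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List

# Hypothetical in-memory result store (assumption); shared storage would be
# needed if several agent instances must see the same results.
_results: Dict[str, Any] = {}


def idempotency_key(tool_name: str, args: Dict[str, Any]) -> str:
    """Derive a stable key from the tool name and its JSON-serializable arguments."""
    canonical = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def call_idempotent(tool_name: str, tool: Callable[..., Any], **args: Any) -> Any:
    """Run the tool once per unique (name, args) pair; replay the stored result afterwards."""
    key = idempotency_key(tool_name, args)
    if key not in _results:
        _results[key] = tool(**args)
    return _results[key]


calls: List[str] = []


def generate_image(prompt: str) -> str:
    """Stand-in for an expensive tool (hypothetical); records each real execution."""
    calls.append(prompt)
    return f"image-for:{prompt}"


first = call_idempotent("generate_image", generate_image, prompt="a red fox")
second = call_idempotent("generate_image", generate_image, prompt="a red fox")
print(first == second, len(calls))  # prints "True 1"
```

Because the key depends only on the tool name and arguments, a retried or replayed request returns the stored result instead of running the tool a second time.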

Three simple patterns for deploying and consuming agents: embedded in the application, hosted as a microservice behind an API, or used as a tool by other agents.

Connecting to a real-time model using a RealTime Agent object in a web browser, which allows voice interaction with the agent hosted in the browser.

Connecting the real-time voice agent to the API image generation agent as a tool and then generating images.

There are several ways a frontend agent may consume containerized microservice agents: as tools through an API or as MCP servers.

The Docker Desktop interface for managing containers, which lets a user start, stop, and delete containers and delete images.

A set of containers orchestrated through a Docker Compose file.

How external tunneling options can expose locally running agent services to external users. The Actor represents an external network user accessing an agent service: the user first browses to the tunneling service address and is then routed to the developer's local machine.

A decision flowchart for choosing an agent deployment.

The practical front-door agent deployment pattern used for user-facing agents and applications

Summary

Agent consumption drives deployment: embed for ultra‑low latency UX, wrap as a synchronous API for request/response tasks, or run as event‑driven workers for long jobs and retries.
Realtime agents in the browser (WebRTC/WebSocket) deliver barge‑in speech, token streaming, and the most responsive experiences—keep tools simple or proxy them server‑side.
Microservices + containers cleanly separate concerns; agents make ideal microservices because they’re self‑contained and easy to scale, swap, and version.
Dockerizing agent APIs standardizes runtime and dependencies; Compose lets you stand up multi‑agent stacks (UI, worker agents, tool services) with one command.
External tunneling (e.g., localtunnel) turns local prototypes into shareable demos without full cloud deployment—useful for POCs and quick pilots.
Choose the “wire” by latency and fit: WebRTC/WebSockets for realtime, HTTP+SSE for streamed request/response, and message buses for decoupled background work.
Front‑door/orchestrator patterns route user intents to specialized worker agents; keep the front‑door light and push complexity into typed, well‑scoped workers.
State and idempotency matter: store short‑term chat state separately from long‑term knowledge, and make tool calls idempotent to enable caching, replay, and resilience.
Release engineering applies to agents: version prompts, tools, and models; promote with gates; pin exact model/tool versions for reproducibility and incident debugging.
Observability is non‑negotiable: trace from UI → gateway → agent → tools → model; track latency, cost, and success metrics; prefer structured logs with PII redaction.
Reliability patterns—timeouts, fallbacks, circuit breakers, and graceful degradation—keep systems useful even when tools or models misbehave.
Cost control comes from routing by intent, trimming context, and caching deterministic results—lower tokens often means lower latency, too.
Security, safety, and governance must be built‑in: threat‑model surfaces, enforce least privilege, manage secrets correctly, sandbox tools, and defend against prompt‑injection with schema‑first tool contracts and instruction hierarchies.
With deployment patterns, observability, and safety in place, agents graduate from demos to dependable, production‑ready systems.

FAQ

What are the main ways to consume and deploy agents in this chapter?

Agents are presented in three common deployment patterns: embedded directly in the application, hosted as a microservice behind an API, or consumed as a tool by other agents through MCP or A2A. The right choice depends on latency, separation of concerns, and how long-running the agent work is.

When is it a good idea to embed an agent inside a web application?

Embedding works best for simple, self-contained, real-time experiences, especially browser-based interactions like voice agents. It is less suitable for long-running tasks, multi-agent systems, or cases where you want stronger separation between frontend and backend concerns.

Why would you host an agent as an API microservice instead of embedding it?

Hosting an agent behind an API is better when you want separation of concerns, reusable backend logic, or support for longer-running tasks such as image generation and RAG workflows. It also makes the agent easier to call from other apps or agents.

What is the advantage of connecting a browser-based agent to a backend agent as a tool?

This pattern lets the user keep interacting in real time while the browser agent delegates slow or complex work to a backend service. It combines a responsive user experience with the flexibility of long-running tool execution in the background.
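The delegation pattern can be illustrated with a small `asyncio` sketch. It is a simulation only: `backend_image_agent` stands in for a slow backend call by sleeping instead of making a request, and all function names and timings are assumptions rather than anything taken from the chapter.

```python
import asyncio
from typing import List


async def backend_image_agent(prompt: str) -> str:
    """Stand-in for a slow backend tool call (hypothetical); sleeps instead of doing I/O."""
    await asyncio.sleep(0.05)
    return f"image-url-for:{prompt}"


async def frontend_session(events: List[str]) -> None:
    # Start the slow job without awaiting it, so the conversation can continue.
    job = asyncio.create_task(backend_image_agent("a lighthouse at dusk"))
    events.append("agent: started image job")

    for user_turn in ["what's the weather?", "tell me a joke"]:
        await asyncio.sleep(0.01)  # simulated fast conversational turn
        events.append(f"agent: replied to '{user_turn}'")

    # Collect the result only once it is needed.
    events.append(f"agent: image ready at {await job}")


events: List[str] = []
asyncio.run(frontend_session(events))
print("\n".join(events))
```

The two conversational turns complete while the image job is still running, which is the responsiveness benefit the pattern is after.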

Why are agents good candidates for containerization with Docker?

Agents are often isolated and self-contained, so they map well to containers. Docker makes them easier to package, deploy, scale, upgrade, and manage as microservices, especially when combined with orchestration tools like Docker Compose or Kubernetes.

What does Docker Compose add to an agentic system?

Docker Compose lets you define and run multiple containers together as one stack. That makes it easier to orchestrate a full multi-agent system locally, including a frontend agent, image agent, and other worker services, without manually starting each one.

When should you use external tunneling services like localtunnel or ngrok?

Use tunneling for quick demos, proofs of concept, testing, or debugging when you do not want to deploy to the cloud. It is convenient for temporary external access, but it is not ideal for robust production systems.

How should you choose between edge, API, and event-driven worker runtimes?

Choose the runtime based on latency and task style. Edge is best for low-latency conversational UX, API microservices are good for normal request/response workloads, and event-driven workers are best for long-running or bursty tasks that need retries or concurrency control.
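One way to read this guidance is as a small decision function. The sketch below is an interpretation, not a rule from the chapter: the `Workload` fields, the `choose_runtime` name, and the 30-second threshold are all assumptions that a team would tune for its own system.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an agent task (all fields are assumptions)."""
    conversational: bool             # low-latency voice or chat interaction
    expected_seconds: float          # rough duration of one task
    needs_retries_or_queueing: bool  # bursty work, retries, concurrency control


def choose_runtime(workload: Workload, api_timeout_seconds: float = 30.0) -> str:
    """Return 'edge', 'api', or 'event-driven worker' for a workload."""
    # Long-running or retry-heavy work does not fit a request/response cycle.
    if workload.needs_retries_or_queueing or workload.expected_seconds > api_timeout_seconds:
        return "event-driven worker"
    # Conversational UX benefits most from running close to the user.
    if workload.conversational:
        return "edge"
    # Everything else is an ordinary request/response workload.
    return "api"


print(choose_runtime(Workload(True, 0.5, False)))    # prints "edge"
print(choose_runtime(Workload(False, 3.0, False)))   # prints "api"
print(choose_runtime(Workload(False, 120.0, True)))  # prints "event-driven worker"
```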

What are the key production concerns for state, observability, and reliability?

Store short-term conversation state in fast storage like Redis or PostgreSQL, keep long-term knowledge in vector stores, and design tools to be idempotent when possible. Also add tracing, metrics, and logs, plus timeouts, fallbacks, circuit breakers, and graceful degradation so failures do not surface directly to users.
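The reliability patterns named here (timeouts, fallbacks, circuit breakers) can be combined in a compact sketch. This is one possible arrangement under stated assumptions, not the chapter's implementation: `CircuitBreaker`, `call_with_fallback`, the thresholds, and the stub tools are hypothetical.

```python
import concurrent.futures
import time
from typing import Callable, List, Optional


class CircuitBreaker:
    """Opens after repeated failures and allows a trial call after a cool-down."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_after:
            # Half-open: permit a trial call; a single failure re-opens the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_fallback(primary: Callable[[], str], fallback: Callable[[], str],
                       breaker: CircuitBreaker, timeout: float = 1.0) -> str:
    """Try the primary tool with a timeout; degrade to the fallback on any failure."""
    if not breaker.allow():
        return fallback()
    future = _pool.submit(primary)
    try:
        # Note: a timed-out call keeps running in its worker thread.
        result = future.result(timeout=timeout)
    except Exception:
        breaker.record(ok=False)
        return fallback()
    breaker.record(ok=True)
    return result


def flaky_tool() -> str:
    raise RuntimeError("upstream unavailable")


def cached_answer() -> str:
    return "fallback: cached answer"


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
results: List[str] = [call_with_fallback(flaky_tool, cached_answer, breaker) for _ in range(3)]
print(results, breaker.allow())
```

After two failures the breaker opens, so the third call skips the failing tool entirely and the user still receives a degraded but useful answer.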

What security and governance practices are recommended for production agents?

Use least-privilege access, keep secrets out of browser code and container images, sandbox tools, restrict filesystem and network access, and treat all user input as untrusted. For high-risk actions, add content filtering, human-in-the-loop approvals, and policy enforcement outside the agent prompt itself.
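Two of these practices, treating input as untrusted and enforcing policy outside the prompt, can be sketched as a check that runs in ordinary code before any tool executes. The `ToolContract`, `enforce`, and `PolicyError` names, and the refund example, are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


class PolicyError(Exception):
    """Raised when a tool call violates its contract or policy."""


@dataclass
class ToolContract:
    """Hypothetical schema-first contract: required argument names and types."""
    name: str
    fields: Dict[str, type]
    high_risk: bool = False


def enforce(contract: ToolContract, args: Dict[str, Any],
            approved_by_human: bool = False) -> Dict[str, Any]:
    """Validate model-supplied arguments in code, independently of the prompt."""
    unexpected = set(args) - set(contract.fields)
    missing = set(contract.fields) - set(args)
    if unexpected:
        raise PolicyError(f"unexpected arguments: {sorted(unexpected)}")
    if missing:
        raise PolicyError(f"missing arguments: {sorted(missing)}")
    for field_name, expected_type in contract.fields.items():
        if not isinstance(args[field_name], expected_type):
            raise PolicyError(f"{field_name} must be {expected_type.__name__}")
    if contract.high_risk and not approved_by_human:
        raise PolicyError(f"{contract.name} requires human approval")
    return dict(args)


REFUND = ToolContract("issue_refund", {"order_id": str, "amount": float}, high_risk=True)

attempts = [
    ({"order_id": "A-100", "amount": 25.0}, True),
    ({"order_id": "A-100", "amount": 25.0}, False),
    ({"order_id": "A-100", "amount": 25.0, "note": "ignore previous instructions"}, True),
]

outcomes: List[str] = []
for call_args, approved in attempts:
    try:
        enforce(REFUND, call_args, approved_by_human=approved)
        outcomes.append("allowed")
    except PolicyError as exc:
        outcomes.append(f"blocked: {exc}")

print("\n".join(outcomes))
```

Because the check lives outside the model, a prompt-injected instruction cannot talk its way past it: the only way through is a well-formed call plus an explicit approval.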
