Overview

2 Core components: Large Language Models, prompting, and agents

This chapter establishes the core building blocks of effective AI agents: large language models as the probabilistic “brain,” prompt engineering as the primary control surface, and an agents SDK to orchestrate roles, tools, and workflows. It frames agents as systems that turn raw language generation into directed, auditable action by combining sound LLM fundamentals with disciplined prompting and structured interfaces. Readers are guided from conceptual grounding to hands‑on construction, assembling a minimal agent and iteratively expanding it with determinism controls, typed I/O, tool use, and execution tracing.

The chapter first demystifies LLMs as token predictors, explaining tokenization, training/inference dynamics, and why input size and format affect cost and uncertainty. It shows how parameters such as temperature, top‑p, and penalties shape variability and length, while stressing that these knobs only influence behavior—true reliability comes from well‑crafted prompts. A practical prompt toolbox follows: define a clear persona, front‑load directives, use delimiters, be specific, add examples, encourage step‑by‑step reasoning, state positive rules, remove ambiguity, pick suitable models/settings, and iterate. It also warns against common pitfalls—overlong or contradictory instructions, micro‑prompt fragmentation, inconsistent delimiters, and over‑specification—encouraging structured, workflow‑oriented prompts that “think like an LLM,” and noting that agents, unlike single LLM calls, can persist through loops and decisions.

Building on this foundation, the chapter uses the OpenAI Agents SDK to implement a minimal research planner, then tunes model settings for consistency and cost control. It introduces strongly typed outputs (via data models) to reduce variability and prevent brittle handoffs between steps, advocating strict schemas over permissive parsing. Execution tracing is highlighted as a cornerstone for debugging and optimization. Finally, the chapter equips agents with tools through lightweight decorators, explains tool‑chaining patterns, and outlines guardrails: limit tool count to cap overhead and complexity, anticipate failures and retries, and grant only the authority you can safely accept. The result is a pragmatic recipe for assembling reliable, extensible agents: prompt‑driven, parameter‑aware, strictly typed, tool‑enabled, and thoroughly traced.

The training cycle and inference/generation process of a large language model. On the left is the training process of the model where documents are ingested to train the LLM in a first pass. On the right, a user enters a prompt which is first tokenized and fed into the model which then outputs probabilities it uses to sample and produce the next token.
A comparison of tokenization of regular text compared to JSON.
The various parameters that can be used to modify an LLM's output. The right side of the figure provides an expanded view of the model predicting and sampling tokens to generate output. On the left are the various parameters that can be used to alter the sampling of the next token within a model.
A complex workflow a search researcher may perform. The workflow illustrates the tasks an agent may undertake, complete with decision points (circles) and the flow from one task to another. It shows how we want an LLM or agent to perform a series of tasks and consider each task's output when making decisions. The workflow details are not critical; the complexity is deliberate, demonstrating how elaborate instruction prompts can be constructed by following good prompt engineering practices. (Don't be alarmed if you can't follow this figure; it is just a toy example of complexity taken to the extreme.)
Comparing how an agent workflow may run with and without typed outputs/inputs. At the top, the agent does not use strict outputs and could respond in any fashion, increasing variability when passing output to the next agent. Conversely, at the bottom, the agent provides strict outputs, which reduces variability and improves clarity for agents or processes receiving the output.
An example of the OpenAI Traces screen for reviewing your agent execution. At the top you can see the workflow/trace from the agent, and below that the specific details about calls to the underlying LLM, including the inputs, model, tokens, instructions, and output from the call.
The Traces page for the Deep Research Workflow. This step represents the Research Planner agent making a call to the LLM and receiving a JSON output of a plan (list of tasks) to perform to achieve a research goal.
The various patterns for an agent to consume tools. Tools may be internal code functions or external connections to MCP servers, hosted locally or remotely and connected through using MCP protocols.
OpenAI Traces page showing the various tool calls and LLM responses executed by the agent.

Summary

  • Large Language Models are probabilistic token-predictors; understanding tokenization and probability drives effective cost, context, and quality control.
  • Text length ≠ token length—measure tokens (e.g., with tiktoken or the Agents SDK telemetry) to keep budgets and context windows in check.
  • Generation knobs such as temperature, top-p, max_tokens, and penalty terms let you trade off creativity, consistency, and expense for each agent role.
  • Carefully crafted prompts follow the basic rules of clear persona, front-loaded instructions, structured delimiters, few-shot examples, and chain-of-thought, steering LLMs toward reliable, on-spec output.
  • Well-structured prompts avoid common pitfalls (over-complexity, contradictions, ambiguous delimiters, or variable output) and make agents safer and cheaper.
  • The OpenAI Agents SDK turns a prompt into a runnable agent; you can pin specific models and parameter settings to match the agent’s task profile.
  • Typed input/output schemas (Pydantic) eliminate brittle string parsing and keep multi-agent workflows stable despite the LLM’s stochastic nature.
  • Built-in tracing in the OpenAI Agents SDK exposes every LLM interaction and tool invocation, giving vital observability for debugging and optimization.
  • Granting agents tools—local functions now, MCP-hosted services later—provides true agency; limit the tool list to reduce token overhead and failure risk.
  • Tool chaining lets an agent sequence multiple tool calls autonomously; trace data reveals the decision path and highlights performance bottlenecks.
  • Combining prompt engineering, model tuning, typed schemas, tracing, and curated tool sets yields production-ready agents that can confidently plan, reason, and execute deep research tasks.

FAQ

What does it mean that LLMs are “probabilistic token machines”?
LLMs read input text as tokens (numeric IDs) and predict the next token based on learned probabilities. During training, the model minimizes error between predicted and expected tokens over billions of examples (often refined via RLHF). At inference, it repeatedly samples the next token until an end-of-sequence token is reached. The “intelligence” comes from the internal probability landscape learned during training, not from true understanding.
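The sampling loop described above can be sketched with a toy model. The hand-written probability table below stands in for a trained network; everything here is illustrative, not a real LLM.

```python
import random

# Toy "language model": a fixed table mapping the current token to a
# probability distribution over possible next tokens. A real LLM learns
# these probabilities over billions of examples; these values are made up
# purely to illustrate the sampling loop.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the":     {"cat": 0.5, "dog": 0.5},
    "a":       {"cat": 0.5, "dog": 0.5},
    "cat":     {"sat": 0.7, "<eos>": 0.3},
    "dog":     {"sat": 0.7, "<eos>": 0.3},
    "sat":     {"<eos>": 1.0},
}

def generate(seed: int = 0, max_tokens: int = 10) -> list[str]:
    """Repeatedly sample the next token until <eos> (or a length cap)."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    while len(tokens) < max_tokens:
        dist = NEXT_TOKEN_PROBS[tokens[-1]]
        choices, weights = zip(*dist.items())
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the <start> marker

print(" ".join(generate(seed=0)))
```

The same loop, with a neural network replacing the lookup table, is all that inference does; every generation knob discussed later simply reshapes the distribution before sampling.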
What is a token, and why does tokenization matter for cost and reliability?
A token is a word or word-piece the model uses as its atomic unit. Tokenization can make structured text like JSON much longer (in tokens) than the same content written as prose, so text length is not a reliable proxy for token count. More tokens usually increase cost (output tokens often cost more than input) and can raise uncertainty. Use tools like tiktoken or the OpenAI Agents SDK’s built-in accounting to measure tokens.
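A tokenizer such as tiktoken gives exact counts; as a rough stdlib-only illustration (assuming the common ~4-characters-per-token heuristic for English), the sketch below shows why the same content serialized as indented JSON eats more of a token budget than plain prose:

```python
import json

def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: roughly 4 characters per token for English.
    For real budgets, count with an actual tokenizer such as tiktoken."""
    return max(1, len(text) // 4)

# The same facts, once as prose and once as pretty-printed JSON.
prose = "Alice is 30 years old and lives in Paris with two cats."
record = json.dumps(
    {"name": "Alice", "age": 30, "city": "Paris", "pets": ["cat", "cat"]},
    indent=2,
)

# The JSON version carries quotes, braces, and indentation, so its
# estimated token count is noticeably higher than the prose version.
print(rough_token_estimate(prose), rough_token_estimate(record))
```

The gap widens further with deeply nested structures, which is one reason to keep prompt payloads compact and measure real token counts before relying on them.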
How do temperature, top-p, presence/frequency penalties, and max_tokens affect output?
Temperature scales token probabilities (higher = more random/creative; lower = more deterministic). Top-p limits sampling to a nucleus of likely tokens for tighter control. Presence and frequency penalties reduce repetition and encourage novelty. Max_tokens caps response length to control cost and verbosity. In practice, adjust a few knobs to match the task and let prompting do most of the steering.
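Temperature and top-p can be made concrete with a small, self-contained sketch (the scores below are made up; a real model produces a distribution over tens of thousands of tokens):

```python
import math
import random

def sample_next(logits: dict[str, float], temperature: float = 1.0,
                top_p: float = 1.0, seed: int = 0) -> str:
    """Sketch of temperature plus nucleus (top-p) sampling over raw scores."""
    if temperature <= 0:                      # treat 0 as greedy decoding
        return max(logits, key=logits.get)
    # Temperature: divide scores before softmax; >1 flattens, <1 sharpens.
    scaled = {t: s / temperature for t, s in logits.items()}
    z = sum(math.exp(s) for s in scaled.values())
    probs = {t: math.exp(s) / z for t, s in scaled.items()}
    # Top-p: keep the smallest set of top tokens whose mass reaches top_p.
    nucleus, mass = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        nucleus.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    tokens, weights = zip(*nucleus)
    return random.Random(seed).choices(tokens, weights=weights)[0]

logits = {"cat": 4.0, "dog": 3.0, "lizard": 1.0, "teapot": 0.1}
print(sample_next(logits, temperature=0.0))   # greedy decoding picks "cat"
```

Raising the temperature flattens the distribution so "lizard" and even "teapot" become plausible; tightening top_p trims the long tail before sampling. This is exactly the trade between creativity and consistency the knobs expose.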
Which generation settings should I usually change for agents?
Most agents only need temperature (often 0.0 for consistency, higher for creativity) and max_tokens (to prevent rambling and control cost). Leave other parameters at defaults unless you have a specific issue (e.g., repetition). Remember that settings influence output but do not guarantee it; even temperature 0 can produce slight variation.
Why is prompt engineering essential, not hype?
Good prompts make outputs more predictable, safe, and cost-efficient. Core techniques include: assign a clear role/persona, front‑load instructions and use delimiters, be specific about length/audience/objective, provide few‑shot examples, request step‑by‑step reasoning, emphasize positive instructions, eliminate ambiguity with numeric bounds, pick the right model/settings, and iterate. If you use typed outputs via the OpenAI Agents SDK, you often don’t need to hard-specify output formats in the prompt.
What common prompt pitfalls should I avoid?
Avoid overlong, multi-topic prompts (split into steps/roles), contradictory rules (“be concise” vs “explain in depth”), and too many micro-prompts that cause latency. Keep delimiters consistent, don’t overload with dozens of constraints in one shot, and remember temperature 0 reduces but does not eliminate variability. Use guardrails, retries, and structured outputs to stabilize pipelines.
How do I build a minimal agent with the OpenAI Agents SDK?
Define clear instructions, create an Agent with a name and those instructions, then run it with Runner.run_sync(agent, input=...). Print result.final_output. Start simple (e.g., a plan with five concise steps) and iterate. Optionally set model and model_settings for consistency and length control.
Why and how should I use typed outputs (strict JSON) with agents?
Typed outputs reduce variability, prevent parsing errors, and allow safe handoffs between agents. Define a Pydantic BaseModel (or TypedDict structures) and pass output_type=YourModel to the Agent. If you hit strict JSON errors, fix the schema (e.g., prefer a list of TypedDict items over dict[int, str], and forbid extra fields) rather than disabling strict mode, which can hide bugs.
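A minimal sketch with Pydantic (the ResearchPlan and ResearchTask models are hypothetical examples for a planner agent, not types from the SDK):

```python
from pydantic import BaseModel, ConfigDict

class ResearchTask(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject unknown fields
    title: str
    rationale: str

class ResearchPlan(BaseModel):
    model_config = ConfigDict(extra="forbid")
    goal: str
    tasks: list[ResearchTask]  # prefer a list of objects over dict[int, str]

# With the Agents SDK you would pass output_type=ResearchPlan to Agent(...)
# so the model is constrained to emit JSON matching this schema.

raw = (
    '{"goal": "battery recycling", "tasks": ['
    '{"title": "Survey papers", '
    '"rationale": "Establish the state of the art"}]}'
)
plan = ResearchPlan.model_validate_json(raw)  # raises on any schema drift
print(plan.goal, len(plan.tasks))
```

Because validation raises on any deviation, a malformed handoff fails loudly at the boundary between agents instead of corrupting a later step.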
What is tracing, and how does it help in agent development?
Tracing (enabled by default in the OpenAI Agents SDK) records each agent step, including LLM inputs/outputs, tokens, settings, and tool calls. View traces in the OpenAI Dashboard to debug, optimize, and audit workflows. Use a trace("Your Workflow Name") context manager to label and group runs for easier analysis.
How do I give agents tools, and how many should I register?
Expose functions as tools (e.g., with a decorator) and register them via tools=[...]. Be explicit in prompts about when and why to use each tool. Limit tools to what’s necessary (often 5–10) because each tool adds token overhead, complexity, and failure points—and grants decision power to the agent. Expect and analyze tool chaining in traces to verify correct sequencing and performance.
