Build an AI Agent (From Scratch)

Agents that reason, plan, and act autonomously

Jungjun Hur and Younghee Song

July 2026
ISBN 9781633434615
336 pages

Included with a Manning Online subscription

printed in black & white

available in Korean

print book available Jul 28, 2026

ePub + liveBook available Jul 28, 2026

table of contents

Part 1: Building your first LLM agent

1 What is an AI agent?

1.1 The age of AI agents

1.2 Understanding LLM agents

1.2.1 What is an LLM?

1.2.2 What is an LLM Agent?

1.3 Workflow versus agent

1.3.1 Workflow: Developer-defined flow

1.3.2 Agent: LLM-directed flow

1.3.3 Combining workflows and agents in practice

1.4 Tasks that require agents

1.4.1 Tasks that require an LLM

1.4.2 Conditions for using agents

1.4.3 GAIA: An agent gym

1.5 Context engineering

1.5.1 Why agents fail

1.5.2 From prompt engineering to context engineering

1.5.3 Bigger context is not always better

1.5.4 Five context engineering strategies

1.5.5 The journey of this book

1.6 Prerequisites for reading this book

1.7 Summary

2 The brain of AI agents: LLMs

2.1 Choosing LLMs for agents

2.1.1 Starting with closed LLMs

2.1.2 Expanding to open LLMs

2.1.3 Essential LLM capabilities for agents

2.2 LLM API basics for building agents

2.2.1 Setting up the development environment

2.2.2 Getting started with the OpenAI API

2.2.3 Unifying providers with LiteLLM

2.2.4 Conversation management: Handling stateless APIs

2.2.5 Structured output

2.2.6 Asynchronous calls

2.3 Enhancing agent intelligence: Prompt engineering

2.3.1 The role of system prompts

2.3.2 Guidelines for agent prompts

2.4 Experiencing LLM limitations with GAIA

2.4.1 Why GAIA? Setting goals for agent development

2.4.2 Experiment setup

2.4.3 Results and analysis: Why LLMs need tools

2.5 Summary

3 Enabling actions: Tool use

3.1 LLM tools

3.1.1 Why do we need LLM tools?

3.1.2 Types of LLM tools

3.2 How LLMs use tools

3.2.1 How tool calling works

3.2.2 How can an LLM choose tools

3.2.3 Guidelines for effective tool calling

3.3 Building tools and tool definitions for LLMs

3.3.1 Implementing a web search tool

3.3.2 Converting to tool definitions

3.3.3 End-to-end tool execution

3.3.4 The challenges of custom tools

3.4 MCP: Standardizing tools

3.4.1 The core of MCP: Server-client architecture

3.4.2 Hands-on: Running an MCP server

3.4.3 Understanding the MCP client

3.4.4 Hands-on: Implementing an MCP server

3.5 Summary

4 Implementing a basic ReAct agent

4.1 How ReAct agents work

4.1.1 The think-act cycle

4.1.2 From text parsing to tool calling

4.2 Agent architecture overview

4.2.1 The completed agent

4.2.2 Information flow: The core design

4.2.3 Components we need to build

4.3 ExecutionContext: The agent’s central storage

4.3.1 What happens during agent execution?

4.3.2 Implementing ExecutionContext

4.4 Tool abstraction

4.4.1 Why we need a unified tool interface

4.4.2 BaseTool: The foundation

4.4.3 FunctionTool: Wrapping functions

4.4.4 Integrating MCP Tools

4.5 LLM Communication layer

4.5.1 Why a communication layer?

4.5.2 LlmRequest: Selecting what to send

4.5.3 LlmResponse: Standardizing what we receive

4.5.4 LlmClient: The provider adapter

4.5.5 Putting it together

4.6 Implementing the agent

4.6.1 Agent class structure

4.6.2 The run() Method

4.6.3 The step() method

4.6.4 The think() and act() methods

4.7 Adding structured output

4.7.1 The approach: Tools as output formatters

4.7.2 Modifying the agent

4.7.3 Using structured output in practice

4.8 Testing with the GAIA benchmark

4.8.1 From LLM to agent

4.8.2 Results

4.9 Summary

Part 2: Developing advanced agent capabilities

5 Building knowledge bases with RAG

5.1 The problem of using internal data

5.1.1 The simple case: Single file

5.1.2 What if there are multiple files?

5.1.3 What if the data is large or extensive?

5.2 Types of search methods

5.2.1 Keyword search

5.2.2 Vector search

5.2.3 Graph search

5.2.4 Structure-based search

5.3 Practicing vector search

5.3.1 Embedding: Converting text to vectors

5.3.2 Chunking: Dividing long text into search units

5.3.3 Implementing vector search

5.3.4 Exercise: Finding relevant information from web search results

5.4 Structure-based search

5.4.1 Preparing the GAIA dataset

5.4.2 Implementing file system tools

5.4.3 Connecting tools to the agent

5.4.4 Solving GAIA zip file problems

5.5 Extending agents with callbacks

5.5.1 The need for agent extension

5.5.2 Implementing tool callbacks

5.5.3 Human in the loop: Tool execution approval

5.5.4 Compressing search results

5.6 Summary

6 Adding memory to your agent

6.1 The anatomy of agent memory

6.1.1 Limitations of the current memory architecture

6.1.2 Context engineering and memory

6.2 Managing context during execution

6.2.1 Separating storage from presentation

6.2.2 Sliding window strategy

6.2.3 Token counting

6.2.4 Compaction strategy

6.2.5 Summarization strategy

6.2.6 Hierarchical context management

6.3 Continuous execution: Session and state management

6.3.1 The session class

6.3.2 Managing sessions with SessionManager

6.3.3 Integrating sessions into the agent

6.3.4 Basic example: Multi-turn conversation

6.3.5 Data structures for tool confirmation

6.3.6 Extending tools for confirmation

6.3.7 Implementing pause and resume in the agent

6.3.8 Complete example: Human-in-the-loop workflow

6.4 Long-term memory: Accumulating knowledge across sessions

6.4.1 The structure of long-term memory

6.4.2 Information extraction: Structured output

6.4.3 Building a vector store with ChromaDB

6.4.4 Implementing TaskMemoryManager

6.4.5 Retrieving memories

6.5 Summary

7 Planning and reflection for complex tasks

7.1 Giving agents time to think

7.1.1 The limitations of ReAct

7.1.2 How human experts work

7.1.3 Why time to think matters

7.2 Planning: Setting direction

7.2.1 When is planning necessary?

7.2.2 Implementing the planning tool

7.2.3 Planning tool usage example

7.2.4 Extension directions

7.3 Reflection: Checking and correcting

7.3.1 When is reflection necessary?

7.3.2 Implementing the reflection tool

7.3.3 The real value of reflection: Failure recovery

7.3.4 Running an agent that uses reflection for research synthesis

7.4 Integrating planning and reflection

7.4.1 Failure modes and solutions

7.5 Summary

8 Empowering agents with code execution

8.1 Giving agents a computer

8.1.1 Limitations of predefined tool approaches

8.1.2 What code execution brings

8.1.3 The effectiveness of code-based actions

8.2 Connecting code environments

8.2.1 Why sandboxes are necessary

8.2.2 Introduction to E2B and basic usage

8.2.3 Connecting the sandbox to the agent

8.2.4 Testing the code execution agent

8.3 Porting tools to sandboxes

8.3.1 The need for tool portability

8.3.2 Extending FunctionTool

8.3.3 Implementing the Wikipedia Revision tool

8.3.4 Handling sandbox tools in the Agent

8.3.5 Practical testing

8.4 Workspace: Using file systems and CLI

8.4.1 The concept of Workspace

8.4.2 Implementing Workspace tools

8.4.3 Practical example: Analyzing an Excel file

8.5 Agent skills

8.5.1 The agent skills concept

8.5.2 Progressive tool information disclosure

8.5.3 Implementation

8.5.4 Practical example: PDF merge skill

8.5.5 Hierarchical tool structure

8.6 Summary

9 Orchestrating multi-agent systems

9.1 Why multi-agent?

9.2 Three collaboration patterns

9.3 Agent workflow: Defining order in code

9.3.1 Workflow design principles

9.3.2 Sequential: Step-by-step execution

9.3.3 Parallel: Concurrent execution

9.3.4 Loop: Iterative execution

9.3.5 Composing workflows

9.4 Agent as Tool: Calling agents as tools

9.4.1 Making agents into tools

9.4.2 Implementing AgentTool

9.4.3 Context isolation

9.4.4 Practical example: Researcher and coder collaboration

9.5 Agent transfer pattern

9.5.1 Differences between agent as tool and transfer

9.5.2 Transfer is also a tool

9.5.3 Agent tree

9.5.4 Implementing the transfer tool

9.5.5 Implementing agent transfer

9.6 A2A: Collaborating across networks

9.6.1 The core of A2A: Agent card and task request/response

9.6.2 Server: Exposing an agent to the network

9.6.3 Client: Calling remote agents

9.7 Summary

10 Evaluating agents

10.1 Observing an agent

10.1.1 Pillars of observability: Metrics, traces, and logs

10.1.2 Generating, collecting, and exporting telemetry: OpenTelemetry

10.2 Building datasets and establishing evaluation criteria

10.2.1 What should we evaluate?

10.2.2 Creating a dataset

10.2.3 Analyzing errors

10.2.4 Designing rubrics and metrics

10.3 Evaluating with LLM-as-a-Judge

10.3.1 Type of LLM-as-a-Judge

10.3.2 Building a rubric-based evaluation system

10.4 Operations: CI/CD and continuous improvement

10.4.1 Evaluation-gated deployment

10.4.2 Improving the test set and evaluator: Agent quality flywheel

10.5 Summary

Appendix

Appendix A: OpenAI API Key

Overview

7 Planning and reflection for complex tasks

This chapter explains why purely reactive agents struggle with complex tasks and how planning and reflection give agents “time to think.” A ReAct-style loop is adaptable because it repeatedly observes, chooses a tool, and acts, but it can lose direction, repeat work, use partial information, or fail to recover from tool errors. The chapter compares this with how human experts work: they first break a problem into parts, proceed step by step, pause to evaluate progress, and adjust when new information or failures appear.

Planning is presented as a way to set direction by decomposing a complex request into manageable tasks and recording that task list in the agent’s context. The chapter implements a simple planning tool that stores tasks with statuses such as pending, in progress, and completed, allowing the LLM to reference the plan as it decides what to do next. It emphasizes that planning should be used selectively: it is valuable for multi-step research or tasks requiring information from multiple sources, but wasteful for simple questions or obvious procedures. The implementation stays intentionally simple, relying on the LLM’s language abilities rather than heavy code logic.

Reflection is introduced as a complementary mechanism for checking and correcting progress during execution. A reflection tool records the agent’s assessment of what it has learned, whether it is on track, whether errors require a new approach, and whether a final answer is ready. Reflection is especially useful after meaningful steps, when tools fail, when combining conflicting information, or before giving a final response. Together, planning and reflection form a cycle: plan, act, reflect, and replan when needed. This cycle helps agents avoid common reactive failures by maintaining direction, verifying completeness, preserving progress, and recovering from mistakes.

The roles of planning and reflection in AI agents.

The Planning-Reflection cycle in agent execution.

Summary

Planning and reflection give agents "time to think." Instead of reacting moment by moment, agents plan before acting and check their work after acting. This gives them a form of metacognition: the ability to examine their own process.
Planning decomposes complex problems into clear, manageable units. When "1. Research Kipchoge's record, 2. Research Moon distance, 3. Calculate time" is recorded in the context, the LLM references this plan to maintain direction across multiple steps.
Reflection is most valuable when problems arise, not when everything goes smoothly. When tools fail or results are unexpected, reflection enables cause analysis and alternative strategies instead of repeating the same failures.
Planning and reflection form a complementary cycle. Planning provides direction, reflection checks that direction, and, when necessary, reflection triggers replanning. Neither works in isolation.
From a context engineering perspective, both are generation strategies. Planning adds "what to do next" to the context, while reflection adds an evaluation of progress so far and a direction for what follows. These texts influence the LLM's subsequent decisions.

FAQ

Why do reactive ReAct agents struggle with complex tasks?

ReAct agents decide what to do based only on the current observation, which makes them adaptable but short-sighted. On complex multi-step tasks, they can lose direction, repeat searches, fail to use information they already found, answer with partial information, or keep retrying failed tool calls until reaching max_steps.

What do planning and reflection add to an AI agent?

Planning and reflection give agents “time to think.” Planning happens before acting and decomposes a complex problem into manageable tasks. Reflection happens during or after execution and checks whether the agent is making progress, whether the approach is still valid, and whether the plan needs to change.

How is planning similar to how human experts work?

Human experts usually do not start complex work immediately. A researcher first breaks a question into subproblems, and an experienced developer often writes a specification or implementation plan before coding. Planning gives the agent a similar structure: it identifies what must be done, in what order, and how to know when each part is complete.

When should an agent use a planning tool?

An agent should use planning for complex questions that require multiple research steps or combining information from different sources. For example, calculating how long it would take Eliud Kipchoge to reach the Moon requires finding his marathon record, calculating his pace, finding the Moon’s distance, and then calculating the travel time.

Planning is usually unnecessary for simple single-search questions, such as “What’s the weather in Seoul today?”, or tasks with obvious procedures, such as translating a short text.
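The four steps of the Kipchoge example can be worked through as plain arithmetic. The sketch below is only an illustration: the input values (a marathon time of 2:01:09 and an average Earth-Moon distance of 384,400 km) are assumptions chosen here, not figures quoted from the chapter, and the function name is hypothetical. An agent following the plan would look these values up with its tools before calculating.

```python
MARATHON_KM = 42.195  # standard marathon distance in kilometers


def hours_to_cover(distance_km: float, record_seconds: float) -> float:
    """Hours needed to cover distance_km at the pace implied by a marathon time."""
    # Step 2 of the plan: turn the marathon time into a constant speed.
    speed_kmh = MARATHON_KM / (record_seconds / 3600)
    # Step 4 of the plan: time = distance / speed.
    return distance_km / speed_kmh


# Assumed inputs (steps 1 and 3 of the plan); a real agent would search for these.
record_seconds = 2 * 3600 + 1 * 60 + 9  # 2:01:09 expressed in seconds
moon_km = 384_400                       # average distance; perigee and apogee differ

hours = hours_to_cover(moon_km, record_seconds)
print(f"{hours:,.0f} hours (about {hours / 24:,.0f} days)")
```

Each intermediate value depends on a separate lookup, which is why this kind of question benefits from a recorded plan.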

How does the chapter implement a simple planning tool?

The chapter implements a create_tasks tool that receives a list of tasks and returns them as formatted text. Each task has a content field and a status field. The statuses are pending, in_progress, and completed.

The returned task list is recorded in the agent’s context as a tool result, so the LLM can refer to it in later steps like a to-do list.
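A minimal sketch of such a tool might look like the following. The tool name, the content and status fields, and the three status values come from the description above; the checkbox markers, the type names, and the validation step are assumptions made for illustration and may differ from the book's code.

```python
from typing import Literal, TypedDict

Status = Literal["pending", "in_progress", "completed"]


class Task(TypedDict):
    content: str
    status: Status


# Hypothetical text markers for each status; the real formatting may differ.
MARKERS = {"pending": "[ ]", "in_progress": "[~]", "completed": "[x]"}


def create_tasks(tasks: list[Task]) -> str:
    """Format the full task list as text so it can be stored as a tool result."""
    lines = []
    for number, task in enumerate(tasks, start=1):
        status = task["status"]
        if status not in MARKERS:
            raise ValueError(f"unknown status: {status!r}")
        lines.append(f"{number}. {MARKERS[status]} {task['content']}")
    return "\n".join(lines)


plan_text = create_tasks([
    {"content": "Research Kipchoge's record", "status": "completed"},
    {"content": "Research Moon distance", "status": "in_progress"},
    {"content": "Calculate time", "status": "pending"},
])
print(plan_text)
```

The function holds no state of its own. The formatted text it returns is the plan, and it lives only in the context.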

Why does the planning tool regenerate the entire task list instead of editing one task at a time?

Regenerating the whole task list keeps the implementation simple. Partial updates, such as “mark task 3 as completed,” require extra logic to track indices and synchronize state. For typical plans with 5–10 tasks, rewriting the full plan costs few extra tokens and avoids many state-management bugs.

What is reflection in an AI agent?

Reflection is the act of pausing during execution to evaluate progress. It helps the agent ask questions such as: “What have I learned so far?”, “Am I close to the original goal?”, “Is this approach working?”, and “Do I need to change direction?”

Reflection records an evaluation in the context, which then influences the agent’s next decision.
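A reflection tool can follow the same pattern as the planning tool: it takes the agent's own assessment as arguments and returns it as text for the context. In the sketch below, only the need_replan flag and the create_tasks tool are named in the chapter description; the function name reflect, the other parameter names, and the output wording are assumptions for illustration.

```python
def reflect(
    learned: str,
    on_track: bool,
    need_replan: bool = False,
    ready_to_answer: bool = False,
) -> str:
    """Return the agent's self-assessment as text to be stored as a tool result."""

    def yes_no(flag: bool) -> str:
        return "yes" if flag else "no"

    lines = [
        f"Learned so far: {learned}",
        f"On track: {yes_no(on_track)}",
        f"Need replan: {yes_no(need_replan)}",
        f"Ready for final answer: {yes_no(ready_to_answer)}",
    ]
    if need_replan:
        # A nudge in the text itself; the LLM still decides what to do next.
        lines.append("Next: call create_tasks with a revised plan.")
    return "\n".join(lines)


note = reflect(
    learned="The search tool timed out twice on the same query.",
    on_track=False,
    need_replan=True,
)
print(note)
```

As with planning, the tool does no reasoning itself. Its value comes from making the LLM write the evaluation down where later steps can read it.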

When should an agent use reflection?

Reflection is useful after completing a meaningful step, when a tool fails, when multiple pieces of information need to be synthesized, or before giving a final answer. These checkpoints help the agent avoid drifting, recover from errors, resolve conflicting information, and verify that all required information has been gathered.

Reflection should not be used after every single tool call, because that creates unnecessary overhead.

How is reflection different from summarization?

Summarization compresses context, usually when the context becomes too long. Its main purpose is reducing token usage. Reflection is different: it is selectively triggered when the agent needs to check direction, analyze an error, synthesize results, or verify readiness for a final answer. Its purpose is decision support, not compression.

How do planning and reflection work together?

Planning and reflection form a cycle. First, the agent creates a plan and executes tools according to that plan. Then it reflects on the results. If things are going well, it continues to the next task. If the current plan is no longer valid, reflection can set need_replan=True, encouraging the agent to call create_tasks again and revise the plan.

Planning mainly helps the agent look ahead, while reflection helps it look back and correct course.
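The cycle can be sketched as a small loop. This is a simplified simulation under stated assumptions, not the book's agent: in a real agent the LLM chooses when to plan, act, and reflect, whereas here stub functions stand in for tools and for the replanning decision, and the names run_cycle, tools, and replan are hypothetical.

```python
from typing import Callable


def run_cycle(
    plan: list[str],
    tools: dict[str, Callable[[], str]],
    replan: Callable[[str, list[str]], list[str]],
    max_steps: int = 10,
) -> list[str]:
    """Plan, act, reflect, and replan when a step fails. Returns a trace log."""
    log = [f"plan: {plan}"]
    queue = list(plan)
    steps = 0
    while queue and steps < max_steps:
        steps += 1
        task = queue.pop(0)
        try:
            result = tools[task]()  # act
            log.append(f"act: {task} -> {result}")
            need_replan = False
        except Exception as exc:
            log.append(f"act: {task} failed ({exc})")
            need_replan = True
        log.append(f"reflect: need_replan={need_replan}")  # reflect
        if need_replan:
            queue = replan(task, queue)  # replan with a full new task list
            log.append(f"plan: {queue}")
    return log


def broken_search() -> str:
    raise RuntimeError("timeout")


tools = {
    "search_primary": broken_search,
    "search_fallback": lambda: "found",
    "calculate": lambda: "done",
}


def replan(failed_task: str, remaining: list[str]) -> list[str]:
    # Stub decision: swap the failed step for a fallback and keep the rest.
    return ["search_fallback"] + remaining


log = run_cycle(["search_primary", "calculate"], tools, replan)
print("\n".join(log))
```

Instead of retrying the failing tool until max_steps is reached, the loop switches to a revised plan after the first failure.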

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$47.99 $30.23

you save $17.76 (37%)
