Overview

1 What is an AI agent?

This opening chapter sets the stage for building practical AI agents by surveying today’s agent landscape and laying out the mental model you’ll use throughout the book. Instead of relying on frameworks that obscure inner workings, the text advocates building agents from scratch so you can diagnose failures, understand each moving part, and make principled design choices. It orients you to what agents are, when they add value, how to measure progress, and the core discipline—context engineering—that ultimately determines whether an agent succeeds or fails.

The chapter defines an LLM agent as LLM + tools + loop: the model serves as the decision-making core, selects and invokes tools to act in the world, and iterates until goals are met. It contrasts agents with developer-defined workflows, explaining that workflows optimize predictability while agents trade control for autonomy and adaptability. In practice, many systems combine both, embedding agentic nodes within structured pipelines. The chapter also provides criteria for deciding when to use agents: prefer simpler logic where possible; reach for LLMs when handling unstructured, diverse inputs; and use agents when step counts are unpredictable and the task’s value justifies higher cost, latency, and error propagation risks.

To evaluate real progress, the chapter introduces GAIA, a benchmark of multi-step, tool-using tasks that serve as an “agent gym” for iterative improvement. It then shifts from prompt engineering to context engineering—treating all information the model sees (prompts, history, retrieved data, tool outputs) as working memory to be curated deliberately. Many failures stem not from model limits but from missing or poorly organized context, and bigger context isn’t always better. The chapter outlines five strategies—Generation, Retrieval, Write, Reduce, and Isolate—that you will apply across subsequent chapters while incrementally adding tools, memory, planning, workspaces, multi-agent patterns, and evaluation, building a robust, debuggable agent from the ground up.

Example of a language model’s generalization capability.
User requests flow through the research agent, which branches into multiple searches and synthesis.
The LLM Agent's decision loop is an iterative process of LLM decision-making and tool use.
Progression of agency levels in LLM applications.
LLMs can only produce accurate, high-quality responses when sufficient information is provided in the context.
Even with large context windows, longer inputs can degrade model performance(Source: https://research.trychroma.com/context-rot).
An overview of the journey through the book

Summary

  • AI agents span a wide spectrum, from personal assistants like ChatGPT and Claude to customer-facing agents and specialized tools like Claude Code and Cursor. All share a common foundation: LLMs as their decision-making core.
  • An LLM agent consists of three elements: the LLM (brain), tools (means of interacting with the external world), and a loop (iterative process until goal completion). The LLM decides which tool to use and when to stop.
  • Workflows are developer-defined execution flows where LLMs perform specific steps. Agents are LLM-directed flows where the model dynamically determines its own process. Production systems often combine both approaches.
  • Use agents when tasks require multiple unpredictable steps, provide sufficient value to justify costs, and allow for error detection. The GAIA benchmark provides ideal practice problems for agent development.
  • Context engineering is the discipline of providing the right information at the right time. Five strategies (Generation, Retrieval, Write, Reduce, Isolate) form the framework for building effective agents throughout this book.

FAQ

What does this book mean by an “AI agent”?An AI (LLM) agent is a program that autonomously decides what to do next and when to stop based on its current context and goals. It uses a Large Language Model as the decision-making core, calls external tools to act in the world, and iterates in a loop until the task is complete.
Why build agents from scratch instead of using frameworks?Agent development is fundamentally about debugging failures. Frameworks speed up prototyping but can obscure what’s happening inside. Building from scratch helps you understand each component, diagnose issues (LLM misunderstanding, tool errors, missing info), and develop a mental model you can apply to any framework later.
What kinds of AI agents exist today?- Personal agents: general-purpose assistants that can search, read files, generate code or images, and adapt to your preferences.
- Customer-facing agents: operate on behalf of businesses, access policies/data, follow rules, and handle tasks like refunds and troubleshooting.
- Specialized agents: domain-focused (e.g., coding, deep research) that run longer, often asynchronously, and can integrate with enterprise data.
What is an LLM and why is it suited to be an agent’s “brain”?An LLM is trained to predict the next token over massive text corpora, which yields strong generalization across tasks (zero-/few-shot). Newer “reasoning” variants plan before answering. This flexible understanding and planning ability let LLMs choose actions, interpret results, and adapt across diverse problems.
What are the core components of an LLM agent?- LLM (brain): interprets context and decides next actions and when to stop.
- Tools: functions/APIs to search the web, run code, query databases, etc.
- Loop: the iterative structure that repeats assess → act → observe until goals are met.
How does the agent loop work?1) The LLM evaluates the current context and decides if a tool is needed.
2) The chosen tool executes (e.g., search, code, DB query).
3) Tool results are fed back into context to update understanding.
4) The LLM decides to continue or stop; if not done, it repeats the loop.
How is an agent different from a workflow, and when should I use each?- Workflows: developer-defined paths; predictable, easier to reason about, cheaper; LLMs handle specific steps (single calls, chains, routers).
- Agents: LLM-directed flow; choose tools/steps dynamically; handle unpredictable complexity but cost more and add latency.
In practice, combine both: use workflows for structure and embed agents where flexibility is most valuable.
What tasks actually require an LLM and an agent?- Use an LLM when dealing with unstructured data (text, images, audio) or highly varied inputs that resist fixed rules.
- Use an agent when tasks need multi-step reasoning, external tools, and an unknown number of steps.
Consider trade-offs: higher cost, more latency, and error propagation vs. the task’s value and ability to detect/correct mistakes.
What is GAIA and why is it used in this book?GAIA (General AI Assistants) is a benchmark of tasks that naturally require agents: multi-step reasoning, search, calculations, and synthesis. It offers clear answers for fast feedback, a suitable difficulty range for iterative improvement, and minimal domain knowledge requirements—making it ideal for measuring progress as you add agent capabilities.
What is context engineering (vs. prompt engineering), and how do I manage context effectively?Prompt engineering focuses on instructions; context engineering designs the full information the LLM sees (system/user prompts, history, tool results, retrieved docs). Bigger context isn’t always better due to effects like context rot/lost-in-the-middle. Five strategies help:
- Generation: let the LLM create plans/reflections that structure context.
- Retrieval: bring in external knowledge (search, DB, files, vectors).
- Write: persist important info to memory/workspace for reuse.
- Reduce: summarize/filter to keep only what matters.
- Isolate: separate tasks/tools or agents to keep contexts focused.

pro $24.99 per month

  • access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
  • choose one free eBook per month to keep
  • exclusive 50% discount on all purchases
  • renews monthly, pause or cancel renewal anytime

lite $19.99 per month

  • access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more


choose your plan

team

monthly
annual
$49.99
$499.99
only $41.67 per month
  • five seats for your team
  • access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
  • choose another free product every time you renew
  • choose twelve free products per year
  • exclusive 50% discount on all purchases
  • renews monthly, pause or cancel renewal anytime
  • renews annually, pause or cancel renewal anytime
  • Build an AI Agent (From Scratch) ebook for free
choose your plan

team

monthly
annual
$49.99
$499.99
only $41.67 per month
  • five seats for your team
  • access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
  • choose another free product every time you renew
  • choose twelve free products per year
  • exclusive 50% discount on all purchases
  • renews monthly, pause or cancel renewal anytime
  • renews annually, pause or cancel renewal anytime
  • Build an AI Agent (From Scratch) ebook for free