Overview

1 What is an AI agent?

This chapter sets the stage for building AI agents from first principles. It surveys today’s agent landscape—from personal assistants to customer-facing and specialized systems—and explains why understanding the inner workings matters more than picking a framework. The core idea is that modern agents are powered by large language models that act as the system’s decision-making center; the chapter introduces how this differs from simple LLM use, clarifies when to choose agents versus predefined workflows, and frames GAIA as a practical benchmark. It also spotlights context engineering as the discipline that most directly determines whether an agent succeeds or fails.

At the heart of an agent is a simple recipe: LLM + tools + loop. The model evaluates context, decides which tool to call (if any), observes results, and repeats until it deems the task complete. This autonomy distinguishes agents from traditional software (fixed paths) and from single LLM calls or chains (developer-directed flows). The chapter maps a spectrum from workflow patterns (single call, chain, router) to true agency (multi-step tool use and even tool creation), then provides a decision framework for when agents are warranted: tasks with unstructured or diverse inputs, unpredictable step counts, sufficient value to justify cost and latency, and manageable error risks. In practice, robust systems blend both worlds—using workflows for structure and predictability while embedding agents where flexibility and open-ended reasoning pay off.
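The "LLM + tools + loop" recipe can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: `fake_llm`, `Decision`, `calculator`, `TOOLS`, and `run_agent` are hypothetical names, and the model is replaced by a scripted stand-in so the example runs offline. A real agent would call an LLM API where `fake_llm` appears.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    """What the model wants next: call a tool, or finish with an answer."""
    tool: Optional[str] = None
    argument: str = ""
    answer: Optional[str] = None


def calculator(expression: str) -> str:
    """Toy tool (hypothetical): evaluate an 'a op b' arithmetic expression."""
    left, op, right = expression.split()
    a, b = float(left), float(right)
    results = {"+": a + b, "-": a - b, "*": a * b, "/": a / b if b else float("nan")}
    return str(results[op])


TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}


def fake_llm(context: list[str]) -> Decision:
    """Scripted stand-in for a real LLM call (assumption, not a real API)."""
    observations = [line for line in context if line.startswith("observation: ")]
    if not observations:
        # Nothing observed yet: ask for a tool call.
        return Decision(tool="calculator", argument="6 * 7")
    # A tool result is in context: decide the task is complete.
    result = observations[-1].removeprefix("observation: ")
    return Decision(answer=f"The result is {result}")


def run_agent(task: str, llm=fake_llm, max_steps: int = 5) -> str:
    """Loop: the model decides, a tool runs, the result re-enters the context."""
    context = [f"task: {task}"]
    for _ in range(max_steps):
        decision = llm(context)
        if decision.answer is not None:
            return decision.answer  # the model chose to stop
        tool = TOOLS.get(decision.tool)
        if tool is None:
            context.append(f"observation: unknown tool {decision.tool!r}")
            continue
        context.append(f"action: {decision.tool}({decision.argument})")
        context.append(f"observation: {tool(decision.argument)}")
    return "Stopped: step limit reached"


if __name__ == "__main__":
    print(run_agent("What is 6 times 7?"))
```

The `max_steps` cap is a design choice of this sketch. Because the model, not the developer, decides when to stop, a hard limit is one simple way to bound cost and latency.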

To measure progress, the chapter adopts GAIA, a dataset of multi-step, web-and-calculation-heavy tasks that naturally require agentic behavior, enabling fast feedback and iterative improvement. It explains that most real-world failures arise less from model intelligence limits than from missing or noisy information in context, motivating a shift from prompt engineering to context engineering—curating what the model sees, when, and in what form. Because larger contexts can degrade performance, the chapter outlines five strategies to keep information focused and useful: generation (plans, reflections), retrieval (external knowledge), write (persistent memory and scratchpads), reduce (summarization and pruning), and isolate (separate workspaces or specialized agents). The book’s journey then builds a basic agent loop, adds tools, retrieval, memory, planning, and workspaces, extends to multi-agent setups, and concludes with monitoring and evaluation for real-world readiness.
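As one concrete illustration of the "reduce" strategy, the sketch below prunes a message history to fit a budget. It is an assumption-laden example rather than the chapter's method: `count_tokens` and `reduce_context` are hypothetical helpers, and tokens are approximated by whitespace-separated words instead of a real tokenizer.

```python
def count_tokens(text: str) -> int:
    """Crude token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def reduce_context(messages: list[str], budget: int) -> list[str]:
    """Keep the first message (the task) plus the most recent messages that fit.

    Older messages that do not fit are replaced by a short marker line.
    The marker itself is not counted against the budget in this sketch.
    """
    if not messages:
        return []
    head, rest = messages[0], messages[1:]
    remaining = budget - count_tokens(head)
    kept: list[str] = []
    # Walk backwards so the newest messages are preferred.
    for message in reversed(rest):
        cost = count_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    kept.reverse()  # restore chronological order
    dropped = len(rest) - len(kept)
    marker = [f"[omitted earlier messages: {dropped}]"] if dropped else []
    return [head, *marker, *kept]


if __name__ == "__main__":
    history = [
        "task: find the population of Paris",
        "observation: page one text text text",
        "observation: page two",
        "observation: answer is about 2.1 million",
    ]
    for line in reduce_context(history, budget=16):
        print(line)
```

Dropping old messages is the bluntest form of reduction. Summarizing them instead would keep more information at the price of an extra model call.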

Example of a language model’s generalization capability.
User requests flow through the research agent, which branches into multiple searches and synthesis.
The LLM Agent's decision loop is an iterative process of LLM decision-making and tool use.
Progression of agency levels in LLM applications.
LLMs can only produce accurate, high-quality responses when sufficient information is provided in the context.
Even with large context windows, longer inputs can degrade model performance (Source: https://research.trychroma.com/context-rot).
An overview of the journey through the book.

Summary

  • AI agents span a wide spectrum, from personal assistants like ChatGPT and Claude to customer-facing agents and specialized tools like Claude Code and Cursor. All share a common foundation: LLMs as their decision-making core.
  • An LLM agent consists of three elements: the LLM (brain), tools (means of interacting with the external world), and a loop (iterative process until goal completion). The LLM decides which tool to use and when to stop.
  • Workflows are developer-defined execution flows where LLMs perform specific steps. Agents are LLM-directed flows where the model dynamically determines its own process. Production systems often combine both approaches.
  • Use agents when tasks require multiple unpredictable steps, provide sufficient value to justify costs, and allow for error detection. The GAIA benchmark provides ideal practice problems for agent development.
  • Context engineering is the discipline of providing the right information at the right time. Five strategies (Generation, Retrieval, Write, Reduce, Isolate) form the framework for building effective agents throughout this book.

FAQ

What is an AI agent in this book?
An AI agent (used interchangeably with “LLM agent”) is a program that uses a Large Language Model as the decision-making core, calls external tools to act on the world, and runs in a loop until its goal is achieved. In short: LLM (brain) + tools (actions) + loop (autonomy to continue or stop based on context and goals).

Why build agents from scratch instead of using a framework?
Agent development is largely debugging failures. To diagnose misinterpretations, bad tool outputs, or missing information, you need to understand the internals. Building each component yourself develops a transferable mental model that makes any agent system, framework-based or not, easier to reason about, fix, and improve.

How does the agent loop work, and why is it needed?
The loop repeats: (1) the LLM evaluates context and decides whether a tool is needed, (2) the tool runs, (3) results are added back into context, and (4) the LLM decides to continue or stop. It’s necessary because the number and kind of steps can’t be known in advance for many real tasks.

How do LLM agents differ from plain LLMs and from traditional software?
Traditional software follows predefined paths; a plain LLM generates text but can’t act or iterate autonomously. An LLM agent dynamically chooses actions, uses tools, and iterates until the goal is met, deciding both what to do next and when to stop based on evolving context.

What kinds of AI agents are common today?
Three broad types: personal agents (general-purpose assistants that can search, analyze, code, and create), customer-facing agents (operate within business rules to support, transact, and comply), and specialized agents (e.g., coding or deep research) that handle domain-specific, often asynchronous, multi-step work. All are powered by LLMs.

How should I decide between a workflow and an agent?
Workflows are developer-defined and predictable (single calls, chains, routers). Agents are LLM-directed, use tools in a multi-step loop, and may even create tools. In practice, combine them: embed agents as nodes inside workflows to balance flexibility with structure, control costs, and enable graceful fallbacks.

When does a task require an LLM at all?
Use an LLM when the task involves unstructured data (text, images, audio) and high input diversity that’s hard to anticipate. If inputs are predictable and outputs are deterministic, traditional code or specialized models are usually cheaper, faster, and more reliable.

When does a task justify an agent instead of a single call or chain?
Choose an agent when step count and path are hard to predict (complexity), when task value outweighs added cost and latency, and when error cost is manageable and detectable. Remember that agents increase API calls, latency, and error propagation compared to simpler workflows.
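The criteria in the two preceding answers can be condensed into a rough checklist. The sketch below is only one way to encode them: `Task` and `recommend` are hypothetical names, and treating every criterion as a strict yes/no is a simplification, since real decisions involve judgment about cost and risk.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Yes/no answers to the decision questions (a deliberate simplification)."""
    unstructured_input: bool    # text, images, audio
    diverse_input: bool         # inputs are hard to anticipate
    unpredictable_steps: bool   # step count and path unknown up front
    value_justifies_cost: bool  # worth the extra API calls and latency
    errors_detectable: bool     # mistakes are manageable and can be caught


def recommend(task: Task) -> str:
    """Map the answers to one of three coarse recommendations."""
    # First question: is an LLM needed at all?
    if not (task.unstructured_input and task.diverse_input):
        return "traditional code or specialized model"
    # Second question: does the task warrant LLM-directed control flow?
    if task.unpredictable_steps and task.value_justifies_cost and task.errors_detectable:
        return "agent"
    # Otherwise a developer-defined flow (single call, chain, or router).
    return "workflow"


if __name__ == "__main__":
    research = Task(True, True, True, True, True)
    print(recommend(research))
```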

What is GAIA, and why does this book use it?
GAIA (General AI Assistants) is a benchmark of multi-step questions that require search, retrieval, and calculation, all natural fits for agents. The book uses GAIA for quick, objective feedback, to practice the observe–analyze–improve cycle, and to track how each technique (tools, memory, planning, etc.) measurably improves agents.

What is context engineering (vs. prompt engineering), and what strategies improve context quality?
Context engineering designs all information the model sees (system/user prompts, history, tool outputs, and retrieved docs) so that it arrives at the right time and in the right form. Bigger context isn’t always better (context rot, “lost in the middle”). Five strategies help:

  • Generation: plans and reflections that structure context
  • Retrieval: web/DB/files/vector search to add needed info
  • Write: persist working memory, results, and code
  • Reduce: summarize, filter, or delete to focus attention
  • Isolate: separate tasks/tools or use multiple agents to keep contexts clean
