Overview

1 What is an AI agent?

Recent advances in large language models and retrieval-augmented generation have set the stage for a surge of interest in AI agents—systems that use an LLM as a reasoning engine to autonomously decide and act toward a goal. An LLM agent differs from a plain LLM or traditional software by combining two core capabilities: tool use (search, code execution, database access, APIs) and dynamic decision-making. It operates in an iterative loop—reason about context, choose and execute a tool, observe results, and decide whether to continue or stop—allowing it to tackle tasks of unpredictable complexity and synthesize up-to-date information rather than relying solely on static training data.
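The loop described above can be sketched in a few lines. This is a minimal illustration, not the book's implementation: `scripted_llm` is a hypothetical stand-in for a real model call, and `check_stock` with its toy inventory is an invented tool used only to make the example runnable.

```python
from typing import Callable

Tool = Callable[[str], str]

def check_stock(item: str) -> str:
    """Hypothetical tool: look up an item in a toy, made-up inventory."""
    inventory = {"widget": "3", "gadget": "0"}
    return inventory.get(item, "unknown item")

TOOLS: dict[str, Tool] = {"check_stock": check_stock}

def scripted_llm(goal: str, observations: list[str]) -> dict:
    """Stand-in for an LLM call (assumption): returns the next action.

    A real agent would send the goal and observations to a model and
    parse its reply; here the policy is hard-coded so the loop can run.
    """
    if not observations:
        return {"action": "check_stock", "input": "widget"}
    return {"action": "finish", "input": f"Stock for widget: {observations[-1]}"}

def run_agent(goal: str, llm: Callable[[str, list[str]], dict],
              tools: dict[str, Tool], max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = llm(goal, observations)           # reason about context
        if decision["action"] == "finish":           # decide to stop
            return decision["input"]
        tool = tools.get(decision["action"])
        if tool is None:                             # report bad tool names back
            observations.append(f"error: unknown tool {decision['action']}")
            continue
        observations.append(tool(decision["input"]))  # act, then observe
    return "stopped: step limit reached"

answer = run_agent("How many widgets are in stock?", scripted_llm, TOOLS)
print(answer)
```

The `max_steps` cap is a design choice worth keeping even in a sketch: because the model decides when to stop, an explicit limit guards against loops that never terminate.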

Agency exists on a spectrum, from simple LLM calls and fixed chains to routers, tool use, multi-step control, and even tool creation via code generation. Deciding when to use an LLM—and when to elevate to an agent—requires weighing task characteristics and business trade-offs. Agents shine on context-sensitive, multi-step problems over unstructured data that require external tools and adaptive planning, but they introduce higher cost, latency, and the risk of compounding errors. Practical criteria include task complexity (unknown number of steps), the value of the outcome relative to cost, and the consequences and detectability of mistakes. Used judiciously, agents can automate research, analysis, and decision support for individuals and businesses alike.
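The criteria above can be written down as a rough checklist. The sketch below is one possible encoding, with the `Task` fields and the `recommend` function invented for illustration; real decisions involve judgment that boolean flags cannot capture.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Hypothetical summary of a task's characteristics."""
    unstructured_input: bool       # free text, documents, varied formats
    steps_known_in_advance: bool   # could the workflow be hard-coded?
    value_exceeds_cost: bool       # outcome is worth the extra cost and latency
    errors_detectable: bool        # mistakes can be caught and corrected

def recommend(task: Task) -> str:
    """Map task characteristics to a level of agency (illustrative only)."""
    if not task.unstructured_input:
        return "regular code"
    if task.steps_known_in_advance:
        return "single LLM call or fixed chain"
    if task.value_exceeds_cost and task.errors_detectable:
        return "agent"
    # Unpredictable steps, but the cost or risk is not justified.
    return "single LLM call or hard-coded logic"

print(recommend(Task(True, False, True, True)))
print(recommend(Task(False, True, True, True)))
```

Ordering the checks from cheapest option to most autonomous mirrors the advice in the text: ask first whether an LLM is needed at all, and only then whether an agent is.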

The book adopts a from-scratch, code-first path to build intuition: start with a basic ReAct agent (reason–act–observe) that selects and invokes tools, then enhance it with knowledge tools and long-term memory via RAG, add planning to chart efficient action sequences, and incorporate reflection for self-correction and iterative improvement. It then scales to multi-agent systems that divide complex work among specialized roles and closes with essential practices for monitoring and evaluation—tracing decisions, auditing tool calls, measuring cost and latency, and using task-level and human-in-the-loop assessments. By understanding these foundations independent of frameworks, readers gain the insight to use any toolkit effectively and design robust, novel agents.
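To make the idea of long-term memory via retrieval concrete, here is a toy sketch. It ranks stored notes by bag-of-words cosine similarity, which is a deliberately simple substitute for the embedding-based retrieval a real RAG setup would typically use. The `MemoryStore` class and its methods are hypothetical names, not taken from the book.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a learned embedding."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class MemoryStore:
    """Hypothetical long-term memory: store notes, retrieve the closest ones."""

    def __init__(self) -> None:
        self._notes: list[str] = []

    def add(self, note: str) -> None:
        self._notes.append(note)

    def search(self, query: str, k: int = 1) -> list[str]:
        q = vectorize(query)
        ranked = sorted(self._notes,
                        key=lambda note: cosine(q, vectorize(note)),
                        reverse=True)
        return ranked[:k]

memory = MemoryStore()
memory.add("user prefers metric units")
memory.add("project deadline is friday")
print(memory.search("which units does the user prefer"))
```

An agent would typically expose `search` as one more tool, so the model can decide when past knowledge is worth retrieving.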

Example of a language model’s generalization capability.
User requests flow through the research agent, which branches into multiple searches and synthesis.
The LLM agent's decision loop is an iterative process of LLM decision-making and tool use.
Progression of agency levels in LLM applications.
Progressive enhancement of agent architecture from basic LLM-tool integration (chapters 2-4) through advanced capabilities like memory, planning, and reflection (chapters 5-8) to multi-agent coordination (chapter 9).
Basic agent operational cycle demonstrating autonomous reasoning and tool use.
Agent with cognitive capabilities.
Multi-agent roles for a software development task.

Summary

  • An LLM agent is a program that determines the next action and evaluates goal completion (i.e., termination conditions) based on the output of an LLM, in accordance with a given objective and context.
  • LLMs excel at generalization—handling diverse tasks without requiring task-specific retraining through capabilities such as few-shot learning, zero-shot learning, and reasoning with thinking tokens.
  • Agency levels range from simple code execution to fully autonomous systems, with increasing autonomy as the LLM gains more control over task execution, action selection, and option determination.
  • You should use an LLM agent for tasks that involve unstructured data analysis, require multiple unpredictable steps, have sufficient value to justify computational costs, and allow for error detection and correction.
  • Building agents from scratch provides a deep understanding of core principles, clear visibility into the context flow, improved debugging capabilities, and the flexibility to create custom solutions that go beyond framework limitations.

FAQ

What is a Large Language Model (LLM)?
An LLM is a model trained on vast text corpora to predict the next token in a sequence. This simple objective yields powerful behaviors like few-shot and zero-shot generalization, enabling the model to follow instructions and adapt to new tasks. While many modern models are multimodal, this chapter focuses on text-based LLMs and their role as the “reasoning engine” in agents. Recent “reasoning” variants also plan or analyze before answering.

What is an LLM agent?
An LLM agent uses an LLM as its decision-making core to autonomously choose actions and execute them to achieve a goal. It doesn’t just generate text; it interacts with tools, evaluates progress, and decides when to stop. In practice, it iteratively reasons about the context, picks a tool, observes results, and repeats until the objective is met.

How do LLM agents differ from plain LLMs and traditional software?
There are two key differences: tool use and dynamic decision-making. Unlike plain LLM calls, agents act in the world via tools (search, code execution, databases, APIs). Unlike traditional software with fixed flows, agents choose their next step based on context, taking as many or as few steps as needed to reach the goal.

What is the ReAct pattern and how does a basic agent operate?
ReAct stands for Reason + Act. A basic agent loops through four steps: (1) reason with the LLM to decide the next action, (2) execute the selected tool, (3) observe and feed results back to the LLM, and (4) terminate when the goal is satisfied. This simple loop underpins most agent frameworks and enables adaptive, stepwise problem solving.

What kinds of tools can an agent use, and why do they matter?
Tools include web search, document reading, code execution, database queries, file operations, API calls, and communications (email/messaging). Tools extend the LLM beyond text prediction, giving access to fresh data, precise computation, and real-world actions, all of which are critical for practical tasks like research, analysis, or automation.

What are the levels of agency in LLM applications?
Agency progresses from traditional code through single LLM calls, chains (fixed sequences), routers (LLM chooses next step), tool use (LLM selects external functions), and multi-step control (LLM decides to continue/stop), up to tool creation (the LLM writes new code/tools). As autonomy increases, flexibility grows, but so do complexity and risk.

When should I use an LLM agent (and when not)?
First ask if an LLM is needed at all: LLMs shine with unstructured data and diverse inputs, while deterministic tasks are better handled with regular code. Use an agent when steps are unpredictable, the task’s value justifies extra cost/latency, and errors are tolerable and detectable. Otherwise, a single LLM call or hard-coded logic is cheaper and faster.

How do planning, long-term memory, and reflection make agents smarter?
Planning gives a roadmap of subtasks to reduce thrashing and inefficiency. Long-term memory (often via RAG) stores and retrieves relevant past knowledge or user preferences to improve accuracy and personalization. Reflection adds explicit self-review, enabling error analysis, strategy updates, and continuous improvement across attempts.

What are multi-agent systems and when are they useful?
Multi-agent systems split a complex goal into specialized roles (e.g., PM, designer, engineer, QA) that collaborate. This reduces prompt overload, narrows tool choices per agent, and mirrors team workflows. They’re useful for large, heterogeneous tasks, but they require orchestration and can introduce unpredictability if interactions are unconstrained.

How are agents monitored and evaluated in practice?
Monitoring logs prompts/responses, tool calls, decisions, interactions, and state changes to support debugging, cost/latency analysis, and audits. Evaluation blends benchmarks with real-world testing: end-to-end outcomes, mid-stage checks for failure points, and human review for qualities like clarity, usefulness, and safety.
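As a rough illustration of auditing tool calls and measuring latency, the sketch below wraps a tool so that every call is recorded. The `Tracer` class, its event fields, and the use of `str.upper` as a placeholder tool are all assumptions made for this example; production systems usually rely on a dedicated tracing or logging library.

```python
import time
from typing import Callable

Tool = Callable[[str], str]

class Tracer:
    """Hypothetical tracer: records each tool call with status and latency."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def wrap(self, name: str, tool: Tool) -> Tool:
        def traced(arg: str) -> str:
            start = time.perf_counter()
            status = "ok"
            result = ""
            try:
                result = tool(arg)
                return result
            except Exception as exc:
                status = f"error: {exc}"
                raise                      # keep the original failure visible
            finally:
                self.events.append({
                    "tool": name,
                    "input": arg,
                    "output": result,
                    "status": status,
                    "latency_s": time.perf_counter() - start,
                })
        return traced

    def total_latency(self) -> float:
        return sum(event["latency_s"] for event in self.events)

tracer = Tracer()
shout = tracer.wrap("shout", str.upper)   # str.upper stands in for a real tool
shout("hello")
for event in tracer.events:
    print(event["tool"], event["status"], event["input"], "->", event["output"])
```

Recording failed calls in the `finally` block matters for audits: a trace that only contains successes hides exactly the steps a reviewer most needs to see.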
