1 What is an AI agent?
Recent breakthroughs in large language models (LLMs) and retrieval-augmented generation have catalyzed a surge of interest in AI agents—systems that use an LLM as a reasoning engine to autonomously decide and act toward a goal. Unlike traditional software or a single LLM call, agents pair language understanding with tool use and dynamic decision-making, enabling them to research, compute, and interact with external systems. This power comes with trade-offs: multi-step reasoning raises cost and latency, and early mistakes can cascade, so choosing when to use an agent versus simpler logic or a single LLM call is a practical, business-critical decision.
An LLM agent operates in an iterative loop of reasoning, action, and observation, continually deciding what to do next, invoking tools, incorporating results, and determining when to stop. Agency exists on a spectrum—from simple LLM calls and fixed chains to routers, tool use, multi-step control, and even tool creation—each level granting more autonomy. The chapter outlines when LLMs are warranted (unstructured data, diverse inputs) and when full agents are justified (uncertain step counts, high task value, manageable error risk and detectability), illustrating with research and multi-step problem-solving scenarios that mirror real-world workflows.
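The reasoning–action–observation loop just described can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: `fake_decide` is a hypothetical stub standing in for a real LLM call, and the single `search` tool merely echoes its query.

```python
# A minimal sketch of the reason-act-observe loop, assuming a `decide`
# callable that stands in for a real LLM call and returns a dict with
# an "action" key. All names here are illustrative, not a real API.
def run_agent(task, decide, tools, max_steps=10):
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        decision = decide(context)            # LLM evaluates the context
        if decision["action"] == "finish":    # LLM chooses to terminate
            return decision["answer"]
        tool = tools[decision["action"]]      # LLM selected a tool...
        result = tool(decision["input"])      # ...which the runtime executes
        context.append(f"Observation: {result}")  # feed the result back
    return "stopped: max_steps reached"

# Hypothetical stub LLM: search once, then finish with the observation.
def fake_decide(context):
    if any(line.startswith("Observation:") for line in context):
        return {"action": "finish", "answer": context[-1]}
    return {"action": "search", "input": "agent definition"}

tools = {"search": lambda q: f"results for '{q}'"}
print(run_agent("define 'agent'", fake_decide, tools))
```

Swapping `fake_decide` for a real model call is all that separates this sketch from a working agent; the loop structure stays the same.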
The book takes a from-scratch path: begin with a basic ReAct agent that selects and uses tools; then layer in knowledge access via RAG for long-term memory, planning to map tasks into subtasks, and reflection for self-correction and continual improvement. It scales to multi-agent systems that divide complex work among specialized roles, and closes with monitoring and evaluation practices essential for debugging, performance, and reliability. Readers are guided through hands-on Python implementations, using common APIs and cost-aware techniques, to build intuition and confidence that transfer across frameworks and real-world deployments.
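As a taste of the from-scratch approach, here is a hedged sketch of the tool side of a ReAct-style agent: each tool is a named Python function, and the agent dispatches on the name the LLM emits. The calculator and search stub are toy assumptions, not the book's actual implementations.

```python
# Toy tool registry for a ReAct-style agent. `run_tool` is the Action
# step: look up the tool the LLM named and execute it with its argument.
import ast
import operator

def calculator(expression):
    # Evaluate simple arithmetic via the ast module instead of eval().
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

TOOLS = {
    "calculator": calculator,
    "web_search": lambda q: f"(stub) top result for '{q}'",  # placeholder
}

def run_tool(name, argument):
    return TOOLS[name](argument)

print(run_tool("calculator", "21 * 2"))  # prints 42
```

A real agent would append each tool's return value to the conversation context as an Observation before the next Reasoning step.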
Figure: Example of a language model's generalization capability.
Figure: User requests flow through the research agent, which branches into multiple searches and synthesis.
Figure: The LLM agent's decision loop, an iterative process of LLM decision-making and tool use.
Figure: Progression of agency levels in LLM applications.
Figure: Progressive enhancement of agent architecture, from basic LLM-tool integration (chapters 2-4) through advanced capabilities such as memory, planning, and reflection (chapters 5-8) to multi-agent coordination (chapter 9).
Figure: Basic agent operational cycle demonstrating autonomous reasoning and tool use.
Figure: Agent with cognitive capabilities.
Figure: Multi-agent roles for a software development task.
Summary
- An LLM agent is a program that, guided by a given objective and context, uses an LLM's output to determine the next action and to evaluate goal completion (i.e., the termination condition).
- LLMs excel at generalization: through capabilities such as zero-shot learning, few-shot learning, and reasoning with thinking tokens, they handle diverse tasks without task-specific retraining.
- Agency levels range from simple code execution to fully autonomous systems, with increasing autonomy as the LLM gains more control over task execution, action selection, and option determination.
- You should use an LLM agent for tasks that involve unstructured data analysis, require multiple unpredictable steps, have sufficient value to justify computational costs, and allow for error detection and correction.
- Building agents from scratch provides a deep understanding of core principles, clear visibility into the context flow, improved debugging capabilities, and the flexibility to create custom solutions that go beyond framework limitations.
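To make the agency spectrum concrete, here is a hedged sketch of the "Router" level: the LLM chooses which fixed branch runs, while the branches themselves remain ordinary code. `fake_classify` is a hypothetical stand-in for an LLM classification call.

```python
# Router-level agency: the LLM picks the branch; deterministic code runs it.
def route(request, classify):
    handlers = {
        "refund": lambda r: f"Refund opened for: {r}",
        "faq": lambda r: f"FAQ answer for: {r}",
    }
    label = classify(request)        # the LLM's only decision: which branch
    return handlers[label](request)  # fixed code executes that branch

# Stub classifier imitating the LLM's choice (illustrative only).
fake_classify = lambda r: "refund" if "money back" in r else "faq"
print(route("I want my money back", fake_classify))
```

At higher agency levels the LLM would also choose the tools inside each branch and decide when to stop; here it controls exactly one decision.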
FAQ
What is a Large Language Model (LLM)?
An LLM is a language model trained on vast text corpora to predict the next token in a sequence. Modern LLMs generalize across tasks (zero- and few-shot) and can reason before answering; this book focuses on text-based LLMs as the "reasoning engine" for agents.
What is an LLM agent?
An LLM agent is a system that uses an LLM as its reasoning engine to autonomously decide on actions and execute them to achieve goals. It determines what to do next from its goals and context, takes actions, observes the results, and decides when to stop.
How is an LLM agent different from a plain LLM or traditional software?
- Plain LLMs generate text but don't act. Agents can use external tools (APIs, code, search, databases) to affect the world.
- Traditional software follows fixed logic. Agents make dynamic, context-driven decisions, choosing tools and paths at runtime.
How does the agent’s decision loop work?
The process iterates: (1) the LLM evaluates the context and decides whether a tool is needed and which one, (2) the tool runs, (3) the results are added back to the context, (4) the LLM decides to continue or stop. This reason–act–observe loop scales from simple to complex tasks.
What are the levels of "agency" in LLM applications?
Agency increases as the LLM controls more decisions: Code (fully scripted), LLM Call (single step), Chain (fixed multi-step), Router (the LLM picks the next step), Tool Use (the LLM selects tools), Multi-step (the LLM decides to continue or terminate), and Tool Creation (the LLM writes new tools/code).
When should I use an LLM instead of traditional code?
Use an LLM when tasks involve unstructured data (text, images, audio) or diverse, unpredictable inputs. If inputs and outputs are predictable and the rules are stable, deterministic code or simpler ML is cheaper, faster, and more reliable.
When do I need a full LLM agent instead of a single LLM call?
Choose an agent when tasks require multiple steps, dynamic branching, tool use, or an unknown number of steps. Check: (1) task complexity, (2) value versus the higher cost and latency, (3) error cost and detectability. If a single prompt suffices (e.g., translation or summarization), an agent is overkill.
What is a basic ReAct agent and what tools can it use?
A ReAct agent alternates Reasoning (the LLM decides the next action), Action (running a tool), and Observation (feeding results back). Typical tools include web search, code execution, calculators, database queries, file and API operations, messaging/email, and more.
How do memory, planning, and reflection make agents smarter?
- Memory: Short-term memory tracks the current run; long-term memory stores user and task knowledge via RAG so agents recall preferences and reuse past successes.
- Planning: The agent drafts a step-by-step plan to reach its goals efficiently.
- Reflection: The agent reviews outcomes, diagnoses errors, adjusts strategy, and updates knowledge.
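The long-term memory lookup can be sketched with a toy keyword-overlap retriever standing in for RAG; a real system would use embeddings and a vector store. The stored notes and function names below are illustrative assumptions.

```python
# Toy long-term memory retrieval: rank stored notes by how many words
# they share with the query and return the top k. This stands in for
# embedding-based RAG retrieval; it is not a production technique.
def retrieve(memory, query, k=2):
    q_words = set(query.lower().split())
    scored = sorted(
        memory,
        key=lambda note: len(q_words & set(note.lower().split())),
        reverse=True,  # most overlapping notes first
    )
    return scored[:k]

memory = [
    "User prefers concise answers",
    "Previous task: summarized Q3 sales report",
    "User timezone is UTC+2",
]
print(retrieve(memory, "summarize the sales report", k=1))
```

In an agent, the retrieved notes would be appended to the context before the next reasoning step, so the LLM can act on remembered preferences and past results.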