Overview

1 What are LLM Agents and Multi-Agent Systems?

This chapter introduces why large language models (LLMs) need agentic systems and how multi-agent collaboration extends their utility. While LLMs can plan and describe actions, they cannot execute them; LLM agents supply the orchestration that turns intent into actions by coordinating tools, gathering results, and synthesizing answers. The text surveys real-world applications—report generation, web and deep research, agentic RAG, coding, and computer use—and motivates when multi-agent systems (MAS) outperform single agents by dividing complex work into focused subtasks. It also previews practical standards that enable richer ecosystems of capabilities and inter-agent cooperation, and sets the stage for building everything from scratch to gain a deep, implementation-level understanding.

At the core, an LLM agent couples a backbone LLM with tool-calling and planning. The agent runs a processing loop that iterates through sub-steps: plan next actions, call tools, integrate results, and adapt the plan until completion, producing a visible “trajectory” useful for debugging and evaluation. Capabilities expand with more and better tools, including third-party ones via Anthropic’s Model Context Protocol; collaboration across heterogeneous agents is standardized by Google’s Agent2Agent protocol. The chapter highlights design patterns that improve reliability and efficiency—memory modules to recall prior tool results and trajectories, and human-in-the-loop checkpoints to prevent cascading errors—alongside cautions about hallucinations and trade-offs between speed and oversight. It distinguishes LLM agents from reinforcement learning agents and specialized Large Action Model agents, clarifying objectives and training regimes.
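To make the processing loop concrete, here is a minimal Python sketch. It is illustrative only: the names (Step, plan_next_step, the tools mapping) are assumptions made for this sketch, not the API of the framework we build later in the book.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    tool_name: str | None = None            # tool to call next, if any
    tool_args: dict = field(default_factory=dict)
    answer: str | None = None               # final answer once the task is done

def run_agent(plan_next_step: Callable, tools: dict, task: str, max_steps: int = 10):
    """Minimal processing loop: plan, call a tool, integrate the result, repeat."""
    context: list[str] = [task]
    trajectory = []                          # visible record for debugging/evaluation
    for _ in range(max_steps):
        step = plan_next_step(context)       # planning (backbone LLM)
        if step.answer is not None:          # the model signals completion
            return step.answer, trajectory
        result = tools[step.tool_name](**step.tool_args)  # tool calling
        trajectory.append((step, result))
        context.append(str(result))          # integrate the result, adapt the plan
    raise RuntimeError("Step budget exhausted before task completion.")
```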

The roadmap builds a full framework, llm-agents-from-scratch, in four stages. First come the foundations: interfaces for tools and LLMs, a custom LLMAgent with a processing loop, and an initial agent, followed by support for local open-source models. Next, the framework becomes MCP-ready and uses MCP-backed tools to implement a deep research agent. The third stage adds human-in-the-loop controls and memory for faster, more accurate task execution. Finally, the framework incorporates A2A and multi-agent coordination to construct MAS that combine specialized agents—for example, assembling retrieval and synthesis agents to produce financial reports—equipping readers to use existing ecosystems confidently or craft tailored solutions.

The applications for LLM agents are many, including agentic RAG, report generation, deep research, and computer use, all of which can benefit from MAS.
An LLM agent comprises a backbone LLM and its equipped tools.
LLM agents use the planning capability of their backbone LLMs to formulate initial plans for tasks, and to adapt those plans based on the results of past steps or actions taken toward task completion.
An illustration of the tool-equipping process, in which a textual description of the tool (its name, purpose, and parameters) is provided to the LLM agent.
The tool-calling process, where any equipped tool can be used.
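As a concrete illustration of tool equipping, a tool's textual description might be rendered as a JSON-schema-style spec before being placed in the backbone LLM's context. The exact schema varies by model provider, and this web_search example is hypothetical; it only shows the general shape.

```python
# A hypothetical tool description (name, description, parameters) as it
# might be handed to the backbone LLM; exact schemas vary by provider.
web_search_tool = {
    "name": "web_search",
    "description": "Search the web and return the top matching snippets.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."},
            "max_results": {"type": "integer", "description": "Number of results to return."},
        },
        "required": ["query"],
    },
}
```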
A mental model of an LLM agent performing a task through its processing loop, where tool calling and planning are used repeatedly. The task is executed through a series of sub-steps, the typical way agents carry out tasks.
An LLM agent that has access to memory modules, where it can store key information from task executions and load it back into its context for future tasks.
A mental model of the LLM agent processing loop that has memory modules for saving and loading important information obtained during task execution.
An LLM agent processing loop with access to human operators. The processing loop is effectively paused each time a human operator is required to provide input.
Multiple LLM agents collaborating to complete an overarching task. The outcomes of each LLM agent’s processing loop are combined to form the overall task result.
A first look at the llm-agents-from-scratch framework that we’ll build together.
A simple UML class diagram that shows two classes from the llm-agents-from-scratch framework. The BaseTool class lives in the base module, while the ToolCallResult class lives in the data_structures module. The attributes and methods of both classes are indicated in their respective class diagrams, and the relationship between them is also described.
A UML sequence diagram that illustrates the flow of a tool call. First, an LLM agent prepares a ToolCall object and invokes the BaseTool, which initiates the processing of the tool call. Once completed, the BaseTool class constructs a ToolCallResult, which is then sent back to the LLM agent.
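As a rough Python sketch of those two diagrams (the attribute and method names are illustrative guesses, not the framework's actual definitions, which we develop later in the book):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolCall:
    tool_name: str
    arguments: dict[str, Any]

@dataclass
class ToolCallResult:  # lives in the data_structures module
    tool_call: ToolCall
    content: Any
    error: str | None = None

class BaseTool(ABC):  # lives in the base module
    name: str
    description: str

    @abstractmethod
    def __call__(self, tool_call: ToolCall) -> ToolCallResult:
        """Process the ToolCall and construct a ToolCallResult."""
```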
The build plan for our llm-agents-from-scratch framework. We will build this framework in four stages. In the first stage, we’ll implement the interfaces for tools and LLMs, as well as our LLM agent class. In the second stage, we’ll make our LLM agent MCP compatible so that MCP tools can be equipped to the backbone LLM. In stage three, we will implement the human-in-the-loop pattern and add memory modules to our LLM agent. And, in the fourth and final stage, we’ll incorporate A2A and other multi-agent coordination logic into our framework to enable building MAS.

Summary

  • LLMs have become very powerful text generators that have been applied successfully to tasks like text summarization, question answering, and text classification, but they have a critical limitation: they cannot act; they can only express an intent to act (such as making a tool call) through text. That’s where LLM agents come in, supplying the ability to carry out the intended actions.
  • Applications for LLM agents are many, such as report generation, deep research, computer use, and coding.
  • With MAS, individual LLM agents collaborate to collectively perform tasks.
  • Many applications for LLM agents can further benefit from MAS. In principle, MAS excel when complex tasks can be decomposed into smaller subtasks, where specialized LLM agents outperform general-purpose LLM agents.
  • LLM agents are systems composed of an LLM and tools that can act autonomously to perform tasks.
  • LLM agents use a processing loop to execute tasks. Tool calling and planning capabilities are key components of that processing loop.
  • Protocols like MCP and A2A have helped to create a vibrant LLM agent ecosystem that is powering the growth of LLM agents and their applications. MCP is a protocol developed by Anthropic that has paved the way for LLM agents to use third-party tools.
  • A2A is a protocol developed by Google to standardize how agent-to-agent interactions are conducted in MAS.
  • Building an LLM agent requires infrastructure elements like interfaces for LLMs, tools, and tasks.
  • We’ll build LLM agents, MAS, and all the required infrastructure from scratch into a Python framework called llm-agents-from-scratch.

FAQ

What is an LLM agent, and why aren’t plain LLMs enough?
Plain LLMs only generate text; they can describe a plan but cannot execute actions. An LLM agent surrounds an LLM with orchestration, tools, and logic that turn the model’s intentions and tool-call requests into real actions and deliver results back to the user.
What capabilities must the backbone LLM have to power an agent effectively?
An effective backbone LLM needs two key capabilities: planning (to outline and adapt steps toward a goal) and tool-calling (to select tools and provide parameters in a structured request, often JSON). These enable the agent to choose sensible next actions and iterate based on prior results.
How does tool-calling work in an LLM agent?
- The agent equips the LLM with a catalog of tools described in text (name, purpose, parameters).
- The LLM emits a structured tool-call request (e.g., JSON) with tool name and arguments; a minimal example follows this list.
- The application executes the tool, captures results, and feeds them back to the LLM for synthesis and next steps.
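For instance, the structured request in the second step might look like the following (shown as a Python dict; field names vary across model providers, and the web_search tool is hypothetical):

```python
# An illustrative tool-call request as a backbone LLM might emit it.
tool_call_request = {
    "tool_name": "web_search",
    "arguments": {"query": "Model Context Protocol specification", "max_results": 3},
}
```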
What is the processing loop in an LLM agent?
The processing loop is the repeated cycle where the agent synthesizes progress, plans the next step, performs tool calls, and evaluates whether to continue or stop. This often produces a trajectory (or rollout) capturing plans, tool calls, and intermediate results, which is useful for debugging, evaluation, and improvement.
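One simple way to record such a trajectory is sketched below; these data structures are illustrative, not the framework's own.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class TrajectoryStep:
    plan: str                      # what the agent decided to do and why
    tool_call: dict | None = None  # the structured request, if a tool was used
    result: Any = None             # what the tool returned

@dataclass
class Trajectory:
    task: str
    steps: list[TrajectoryStep] = field(default_factory=list)
```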
Where are LLM agents and MAS used in the real world?
- Report generation and synthesis (with monitoring to mitigate hallucinations)
- Web search and deep research (multi-step browsing, reasoning, reporting)
- Agentic RAG (querying internal knowledge stores)
- Coding assistants and code-generation teams (with sandboxed interpreters)
- Computer use/RPA-like tasks (controlling apps/OS to complete workflows)
When should I use a multi-agent system (MAS) instead of a single agent?
Use MAS when a complex task can be decomposed into specialized subtasks where focused agents outperform a generalist. MAS are especially helpful for parallel or staged workflows (e.g., domain-specific summarization plus structured report writing, or front-end and back-end coding agents collaborating).
What are MCP and A2A, and why do they matter?
- MCP (Model Context Protocol) standardizes how agents access third-party tools and resources, enabling a vibrant ecosystem of plug-and-play capabilities.
- A2A (Agent2Agent) standardizes agent-to-agent communication so agents built on different frameworks can collaborate, enabling richer MAS workflows.
How does memory improve LLM agents?
Memory modules let agents save useful artifacts (e.g., past trajectories, tool-call results) and load them into context for future tasks. This reduces redundant work, speeds up execution, and can improve accuracy by reusing relevant prior knowledge.
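A toy example of the save/load idea (purely illustrative; a real memory module would likely use embeddings or a vector store rather than substring matching):

```python
class SimpleMemory:
    """Toy memory module: save artifacts, load matches back into context."""

    def __init__(self) -> None:
        self._entries: list[str] = []

    def save(self, artifact: str) -> None:
        self._entries.append(artifact)

    def load(self, query: str, limit: int = 3) -> list[str]:
        # Naive retrieval by substring match; real systems use semantic search.
        hits = [e for e in self._entries if query.lower() in e.lower()]
        return hits[:limit]
```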
What is the human-in-the-loop pattern, and what are its trade-offs?
Humans can review or approve plans, intervene at critical steps, and validate final outputs to prevent cascading errors. The trade-off is latency: the loop pauses for human input, so you must balance accuracy/risk tolerance with execution speed.
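A minimal sketch of such a checkpoint, assuming a console-based operator (the function name and the loop it plugs into are hypothetical):

```python
def approve_step(step_description: str) -> bool:
    """Pause the processing loop until a human operator responds."""
    answer = input(f"Approve this step? {step_description} [y/n] ")
    return answer.strip().lower() == "y"

# Hypothetical use inside a processing loop, before a risky tool call:
# if not approve_step(str(step)):
#     context.append("Operator rejected the step; revise the plan.")
#     continue
```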
How do LLM agents differ from RL agents and LAM agents?
- RL agents are trained to optimize rewards via policies interacting with environments; LLM agents repurpose pretrained LLMs to plan and call tools through text without learning an explicit optimal policy for each task.
- LAM agents use Large Action Models specialized for action prediction in narrow domains (e.g., GUI operations), while LLM agents are more general-purpose but rely on robust planning and tool-calling.
