Overview

1 What are LLM Agents and Multi-Agent Systems?

Large language models are excellent at articulating plans in natural language, but they cannot execute those plans on their own. LLM agents fill this gap by turning an LLM’s intentions into actions through coordinated tool use, enabling end-to-end task completion. These systems already power applications such as report generation, web and deep research, retrieval-augmented generation, software development support, and computer-use automation. When tasks grow complex, multi-agent systems combine specialized agents to collaborate on decomposed subtasks, often improving quality and efficiency over a single general-purpose agent.

Under the hood, an LLM agent pairs a backbone LLM with tools and relies on two prerequisite capabilities: planning and tool-calling. The agent runs a processing loop that repeatedly plans next steps, selects and invokes tools with structured requests, and synthesizes results before adapting the plan. Captured “trajectories” of plans, tool calls, and outcomes aid analysis and debugging. Effectiveness is further improved with patterns like memory (to reuse past results and speed execution) and human-in-the-loop checkpoints (to prevent cascading errors), while emerging protocols such as the Model Context Protocol standardize access to third-party tools and Agent2Agent enables inter-agent collaboration.
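
The processing loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the framework built in this book; `call_llm`, `Trajectory`, and the tool dictionary are hypothetical stand-ins.

```python
# A minimal sketch of an LLM agent's processing loop. The names
# (call_llm, Trajectory, the tools dict) are hypothetical stand-ins.
import json
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)  # captured plans, tool calls, results

def run_agent(task, call_llm, tools, max_steps=5):
    """Repeatedly plan, call tools, and synthesize until done."""
    trajectory = Trajectory()
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        # 1. Ask the backbone LLM to plan the next step (or finish).
        response = call_llm("\n".join(context))
        trajectory.steps.append(response)
        if response.get("done"):
            return response["answer"], trajectory
        # 2. Execute the requested tool outside the LLM.
        name, args = response["tool"], response["args"]
        result = tools[name](**args)
        # 3. Feed the result back so the LLM can adapt its plan.
        context.append(f"Result of {name}: {json.dumps(result)}")
    return None, trajectory
```

The recorded trajectory is what makes runs like this inspectable after the fact, as described above.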

Multi-agent systems shine when tasks can be split into focused roles—for example, separate agents for retrieval, analysis, and report synthesis—though they introduce new coordination challenges and potential failure modes. This chapter also lays out a practical roadmap for building an agent framework from scratch: define base interfaces for tools and LLMs; implement an agent and its processing loop; add MCP compatibility for external tools; incorporate memory and human oversight; and finally integrate A2A and multi-agent coordination to build robust, collaborative systems. The book takes a hands-on approach so readers gain a working, end-to-end understanding of LLM agents and MAS.

The applications for LLM agents are many, including agentic RAG, report generation, deep research, and computer use, all of which can benefit from MAS.
An LLM agent comprises a backbone LLM and its equipped tools.
LLM agents utilize the planning capability of backbone LLMs to formulate initial plans for tasks, as well as to adapt current plans based on the results of past steps or actions taken towards task completion.
An illustration of the tool-equipping process, where a textual description of the tool that contains the tool’s name, description and its parameters is provided to the LLM agent.
The tool-calling process, where any equipped tool can be used.
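
A tool description like the one in the figure above is often just a JSON-style object handed to the LLM. The sketch below is a hypothetical example loosely modeled on common function-calling schemas; the exact field names vary by model provider.

```python
# A hypothetical tool description, loosely modeled on common
# function-calling schemas; field names are illustrative only.
web_search_tool = {
    "name": "web_search",
    "description": "Search the web and return the top results.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."},
            "max_results": {"type": "integer", "description": "How many results to return."},
        },
        "required": ["query"],
    },
}
```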
A mental model of an LLM agent performing a task through its processing loop, where tool calling and planning are used repeatedly. The task is executed through a series of sub-steps.
An LLM agent that has access to memory modules where it can store key information of task executions and load this back into its context for future tasks.
A mental model of the LLM agent processing loop that has memory modules for saving and loading important information obtained during task execution.
An LLM agent processing loop with access to human operators. The processing loop is effectively paused each time a human operator is required to provide input.
Multiple LLM agents collaborating to complete an overarching task. The outcomes of each LLM agent’s processing loop are combined to form the overall task result.
A first look at the llm-agents-from-scratch framework that we’ll build together.
A simple UML class diagram that shows two classes from the llm-agents-from-scratch framework. The BaseTool class lives in the base module, while the ToolCallResult class lives in the data_structures module. The attributes and methods of both classes are indicated in their respective class diagrams, and the relationship between them is also shown.
A UML sequence diagram that illustrates the flow of a tool call. First, an LLM agent prepares a ToolCall object and invokes the BaseTool, which initiates the processing of the tool call. Once completed, the BaseTool class constructs a ToolCallResult, which is then sent back to the LLM agent.
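
In Python, these two diagrams might translate into something like the following sketch. The attribute names are guesses for illustration; the real llm-agents-from-scratch classes may differ, and AdditionTool is a hypothetical concrete tool.

```python
# A sketch of the classes from the diagrams. Attribute names are
# illustrative guesses; the real framework's classes may differ.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool_name: str
    arguments: dict

@dataclass
class ToolCallResult:          # lives in the data_structures module
    tool_call: ToolCall
    content: str
    error: bool = False

class BaseTool(ABC):           # lives in the base module
    name: str
    description: str

    @abstractmethod
    def __call__(self, tool_call: ToolCall) -> ToolCallResult:
        """Process a ToolCall and return a ToolCallResult."""

class AdditionTool(BaseTool):  # hypothetical concrete tool
    name = "add"
    description = "Add two integers."

    def __call__(self, tool_call: ToolCall) -> ToolCallResult:
        total = tool_call.arguments["a"] + tool_call.arguments["b"]
        return ToolCallResult(tool_call=tool_call, content=str(total))
```

The sequence mirrors the diagram: the agent builds a ToolCall, invokes the tool, and receives a ToolCallResult back.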
The build plan for our llm-agents-from-scratch framework. We will build this framework in four stages. In the first stage, we’ll implement the interfaces for tools and LLMs, as well as our LLM agent class. In the second stage, we’ll make our LLM agent MCP compatible so that MCP tools can be equipped to the backbone LLM. In stage three, we will implement the human-in-the-loop pattern and add memory modules to our LLM agent. And, in the fourth and final stage, we’ll incorporate A2A and other multi-agent coordination logic into our framework to enable building MAS.

Summary

  • LLMs have become very powerful text generators that have been applied successfully to tasks like text summarization, question answering, and text classification, but they have a critical limitation: they cannot act; they can only express an intent to act (such as making a tool call) through text. LLM agents fill this gap by carrying out the intended actions.
  • Applications for LLM agents are many, such as report generation, deep research, computer use, and coding.
  • With MAS, individual LLM agents collaborate to collectively perform tasks.
  • Many applications for LLM agents can further benefit from MAS. In principle, MAS excel when complex tasks can be decomposed into smaller subtasks, where specialized LLM agents outperform general-purpose LLM agents.
  • LLM agents are systems comprising an LLM and tools that can act autonomously to perform tasks.
  • LLM agents use a processing loop to execute tasks. Tool calling and planning capabilities are key components of that processing loop.
  • Protocols like MCP and A2A have helped to create a vibrant LLM agent ecosystem that is powering the growth of LLM agents and their applications. MCP is a protocol developed by Anthropic that has paved the way for LLM agents to use third-party tools.
  • A2A is a protocol developed by Google to standardize how agent-to-agent interactions are conducted in MAS.
  • Building an LLM agent requires infrastructure elements like interfaces for LLMs, tools, and tasks.
  • We’ll build LLM agents, MAS, and all the required infrastructure from scratch into a Python framework called llm-agents-from-scratch.

FAQ

What is an LLM agent, and why are LLMs alone insufficient?
An LLM agent is an autonomous system, comprising a backbone LLM and tools, that executes the tool-call requests and plans the LLM formulates to perform tasks on a user’s behalf. LLMs alone only generate text: they can express intent (e.g., “I will search the web…”) but cannot take actions. Agents provide the orchestration layer that turns those intentions into real actions and returns results to the LLM for synthesis.
How does tool-calling work in an LLM agent?
The agent equips the LLM with a catalog of tools described in text (name, description, parameters). The LLM then emits a structured tool-call request (often JSON) selecting a tool and parameter values. The application executes the tool outside the LLM, and the results are fed back into the LLM’s context so it can plan next steps or produce a final answer. Many models learn tool usage via supervised fine-tuning on tool-call examples.
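
This round trip can be sketched as follows. The JSON shape and the get_weather tool are illustrative assumptions, since each model provider defines its own tool-call format.

```python
# A hedged sketch of the tool-call round trip. The JSON shape and the
# get_weather tool are illustrative; providers define their own formats.
import json

TOOLS = {"get_weather": lambda city: f"Sunny in {city}"}  # hypothetical tool

# 1. The LLM emits a structured request as text (often JSON).
llm_output = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'

# 2. The application parses it and executes the tool outside the LLM.
request = json.loads(llm_output)
result = TOOLS[request["tool"]](**request["arguments"])

# 3. The result is appended to the LLM's context for the next step.
next_context = f"Tool {request['tool']} returned: {result}"
```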
What planning capabilities does the backbone LLM need, and how do reasoning LLMs help?
The backbone LLM must be able to:
  • Propose an initial plan for the task
  • Synthesize intermediate results
  • Adapt the plan across steps (course-correct, decide when to stop)
Poor initial plans can cause failures or inefficiency. Reasoning LLMs (often trained to generate step-by-step “chain-of-thought”) tend to have stronger planning ability and can be good backbone candidates.
What is the processing loop of an LLM agent?
The agent executes tasks as a sequence of sub-steps. At each step it:
  • Synthesizes progress so far
  • Produces the next intermediate plan
  • Optionally issues tool calls
  • Evaluates stopping conditions (e.g., task complete, max steps)
Agents often record a trajectory (or rollout) of plans, tool calls, and results for debugging and improvement, even though this is typically not shown to end users.
What real-world applications are a good fit for LLM agents and MAS?
Common uses include:
  • Report generation (collect, synthesize, structure; monitor for hallucinations)
  • Web and deep research (multi-step planning, browsing, synthesis)
  • Agentic RAG (retrieve from organization knowledge stores to answer queries)
  • Coding agents (with sandboxed code interpreters; team-like collaboration)
  • Computer use (controlling apps/OS to complete tasks; next-gen adaptive RPA)
Multi-agent systems (MAS) can further enhance these by combining specialized agents.
When do multi-agent systems (MAS) outperform single agents?
MAS excel when a complex task can be decomposed into focused subtasks where specialized agents outperform a generalist. Examples: one agent optimized for extracting from financial docs plus another for report composition; or a team of coding agents split across front-end and back-end. MAS aggregate the outcomes of each agent’s processing loop into an overall result.
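
As a toy illustration of such decomposition, the "agents" below are plain Python functions standing in for specialized LLM agents, with their outcomes combined into an overall result:

```python
# A toy sketch of MAS-style decomposition: specialized "agents" (here,
# plain functions standing in for LLM agents) handle subtasks, and
# their outputs are combined into an overall result.
def retrieval_agent(topic):
    return [f"doc about {topic}"]          # stand-in for a retrieval agent

def analysis_agent(docs):
    return f"analysis of {len(docs)} document(s)"

def report_agent(analysis):
    return f"Report: {analysis}"

def run_mas(topic):
    docs = retrieval_agent(topic)
    analysis = analysis_agent(docs)
    return report_agent(analysis)          # combine outcomes into the result
```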
What enhancements and design patterns improve LLM agent effectiveness?
  • Memory: Modules that store past trajectories, tool-call results, and final outputs for later retrieval, reducing repeated work and improving efficiency.
  • Human-in-the-loop: Let humans review/approve critical plans or final outputs to mitigate cascading errors. This improves accuracy but increases latency since the loop pauses for feedback.
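
Both patterns can be sketched together in a few lines of Python; Memory, run_with_patterns, and the approve callback are hypothetical names for illustration:

```python
# A hedged sketch of the two patterns: a memory module keyed by task,
# and a human-in-the-loop checkpoint that pauses for approval.
# All names here are hypothetical, for illustration only.
class Memory:
    def __init__(self):
        self._store = {}

    def save(self, task, result):
        self._store[task] = result

    def load(self, task):
        return self._store.get(task)       # None on a cache miss

def run_with_patterns(task, execute, memory, approve):
    cached = memory.load(task)
    if cached is not None:
        return cached                      # reuse past work, skip re-execution
    result = execute(task)
    if not approve(result):                # loop pauses for human feedback here
        raise RuntimeError("Plan rejected by human operator")
    memory.save(task, result)
    return result
```

On a repeated task the memory hit short-circuits the loop entirely, which is exactly the efficiency gain described above.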
What is the Model Context Protocol (MCP), and why is it important?
MCP (by Anthropic) standardizes how agents access third-party tools and other resources. Its fast-growing ecosystem offers many MCP-compatible tools and data resources. MCP compatibility lets you rapidly equip your backbone LLM with powerful external capabilities using a common interface.
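
For a concrete flavor of the protocol: MCP messages are JSON-RPC 2.0, and a request to invoke a server-provided tool uses the tools/call method. The sketch below is abridged, and the tool name and arguments are hypothetical; consult the MCP specification for the authoritative message shape.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "web_search",
    "arguments": { "query": "llm agents" }
  }
}
```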
What is the Agent2Agent (A2A) protocol, and what does it enable?
A2A (by Google) standardizes agent-to-agent communication. It enables agents built with different frameworks to collaborate, making it easier to assemble MAS that span tools and platforms. As agent workflows grow, interoperable agent communication becomes increasingly important.
How do LLM agents compare with RL agents and Large Action Model (LAM) agents?
  • LLM agents: Repurpose pretrained LLMs to generate plans and tool calls; they don’t learn an optimal task policy but rely on text-generation capabilities plus tooling.
  • RL agents: Defined by an environment, actions, and rewards; trained to learn an optimal policy that maximizes cumulative reward.
  • LAM agents: Use a backbone Large Action Model trained to predict action sequences in a specific domain (e.g., GUI actions). They are highly specialized, whereas LLM agents are more general-purpose.
