Overview

1 What are LLM Agents and Multi-Agent Systems?

Large language models are excellent at expressing intent, but they cannot act without scaffolding. This chapter introduces LLM agents—systems that turn an LLM’s text-planned intentions into real actions by calling external tools—and shows how multiple agents can be combined into multi-agent systems for tougher problems. It surveys prominent applications such as report generation, web and deep research, retrieval-augmented generation, coding with sandboxed interpreters, and full computer-use automation. Standards like the Model Context Protocol expand an agent’s toolset, while Agent2Agent enables collaboration across heterogeneous agent frameworks. The overarching goal is to build a deep, working understanding by constructing agents and multi-agent systems from scratch.

At their core, LLM agents pair a backbone LLM with tools and rely on two prerequisite capabilities: planning and tool-calling. Tasks run inside a processing loop where the agent formulates or adapts a plan, invokes tools via structured requests, synthesizes results, and iterates until completion; the resulting “trajectory” is invaluable for inspection and debugging. The chapter details enhancements that improve reliability and efficiency: memory modules to save and reload past steps and tool outputs, and human-in-the-loop checkpoints to prevent cascading errors (with a latency trade-off). It also clarifies related ideas—reasoning-oriented prompting, specialized Large Action Models for narrow action domains, and how LLM agents differ from reinforcement learning agents trained to optimize explicit reward signals.

Multi-agent systems shine when complex tasks can be decomposed into focused subtasks handled by specialized agents whose results are combined, often outperforming a single generalist. Protocols anchor this ecosystem: MCP standardizes access to third-party tools and resources, and A2A provides agent-to-agent communication so agents across frameworks can collaborate. The chapter closes with a hands-on roadmap for building a custom framework in four stages: implement base Tool and LLM interfaces and an agent processing loop; add MCP compatibility (including building a server); incorporate memory and human-in-the-loop controls; and finally integrate A2A and coordination logic to construct full multi-agent solutions for real applications like deep research and automated report generation.
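Before any of that machinery, the divide-and-combine idea behind MAS fits in a few lines. In this toy sketch each "agent" is a plain function standing in for a full LLM processing loop, so everything here is illustrative:

```python
# A toy sketch of the divide-and-combine pattern behind MAS.
# Real agents each run a full LLM processing loop; here each
# "agent" is a plain function standing in for that loop.
def research_agent(topic: str) -> str:
    # Specialized subtask: gather findings on one topic.
    return f"findings about {topic}"

def writer_agent(findings: list[str]) -> str:
    # Specialized subtask: combine findings into a report.
    return "Report: " + "; ".join(findings)

def run_mas(topics: list[str]) -> str:
    # Decompose the overarching task across specialized agents,
    # then combine their results into the overall outcome.
    findings = [research_agent(t) for t in topics]
    return writer_agent(findings)
```

The point is structural, not the trivial bodies: decomposition into focused subtasks, then combination of the intermediate results.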

LLM agents have many applications, including agentic RAG, report generation, deep research, and computer use, all of which can benefit from MAS.
An LLM agent comprises a backbone LLM and its equipped tools.
LLM agents utilize the planning capability of backbone LLMs to formulate initial plans for tasks, as well as to adapt current plans based on the results of past steps or actions taken towards task completion.
An illustration of the tool-equipping process, where a textual description of the tool, containing its name, description, and parameters, is provided to the LLM agent.
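To make this concrete, a tool description is often passed to the LLM as a JSON-style schema. The `get_weather` tool and the exact field layout below are illustrative assumptions following a common convention, not this framework's actual format:

```python
# A hypothetical tool description for a weather-lookup tool.
# The LLM only ever sees this text; the agent application is
# responsible for actually executing the tool when it's called.
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
        },
        "required": ["city"],
    },
}
```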
The tool-calling process, where any equipped tool can be used.
A mental model of an LLM agent performing a task through its processing loop, where tool calling and planning are used repeatedly. The task is executed through a series of sub-steps.
An LLM agent that has access to memory modules where it can store key information of task executions and load this back into its context for future tasks.
A mental model of the LLM agent processing loop that has memory modules for saving and loading important information obtained during task execution.
An LLM agent processing loop with access to human operators. The processing loop is effectively paused each time a human operator is required to provide input.
Multiple LLM agents collaborating to complete an overarching task. The outcomes of each LLM agent’s processing loop are combined to form the overall task result.
A first look at the llm-agents-from-scratch framework that we’ll build together.
A simple UML class diagram that shows two classes from the llm-agents-from-scratch framework. The BaseTool class lives in the base module, while the ToolCallResult lives in the data_structures module. The attributes and methods of both classes are indicated in their respective class diagrams and the relation between them is also described.
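As a rough Python sketch of those two classes (the diagram's exact attributes and methods aren't reproduced here, so every field below, and the `AdderTool` example, is illustrative rather than the framework's real definition):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolCallResult:
    """Result returned by a tool invocation (hypothetical fields)."""
    tool_name: str
    content: Any
    error: bool = False

class BaseTool(ABC):
    """Abstract tool interface; concrete tools implement __call__."""
    name: str
    description: str

    @abstractmethod
    def __call__(self, **kwargs: Any) -> ToolCallResult:
        ...

class AdderTool(BaseTool):
    """A trivial concrete tool for illustration."""
    name = "adder"
    description = "Add two numbers."

    def __call__(self, a: float, b: float) -> ToolCallResult:
        return ToolCallResult(tool_name=self.name, content=a + b)
```

The relation in the diagram is reflected here: `BaseTool` produces a `ToolCallResult` for every invocation.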
A UML sequence diagram that illustrates the flow of a tool call. First, an LLM agent prepares a ToolCall object and invokes the BaseTool, which initiates the processing of the tool call. Once completed, the BaseTool class constructs a ToolCallResult, which is then sent back to the LLM agent.
The build plan for our llm-agents-from-scratch framework. We will build this framework in four stages. In the first stage, we’ll implement the interfaces for tools and LLMs, as well as our LLM agent class. In the second stage, we’ll make our LLM agent MCP compatible so that MCP tools can be equipped to the backbone LLM. In stage three, we will implement the human-in-the-loop pattern and add memory modules to our LLM agent. And, in the fourth and final stage, we’ll incorporate A2A and other multi-agent coordination logic into our framework to enable building MAS.

Summary

  • LLMs have become very powerful text generators, applied successfully to tasks like text summarization, question answering, and text classification, but they have a critical limitation: they cannot act. They can only express an intent to act (such as making a tool call) through text. LLM agents fill this gap by carrying out those intended actions.
  • Applications for LLM agents are many, such as report generation, deep research, computer use and coding.
  • With MAS, individual LLM agents collaborate to collectively perform tasks.
  • Many applications for LLM agents can further benefit from MAS. In principle, MAS excel when complex tasks can be decomposed into smaller subtasks, where specialized LLM agents outperform general-purpose LLM agents.
  • LLM agents are systems composed of an LLM and tools that can act autonomously to perform tasks.
  • LLM agents use a processing loop to execute tasks. Tool calling and planning capabilities are key components of that processing loop.
  • Protocols like MCP and A2A have helped create a vibrant LLM agent ecosystem that is powering the growth of LLM agents and their applications. MCP, a protocol developed by Anthropic, has paved the way for LLM agents to use third-party tools.
  • A2A is a protocol developed by Google to standardize how agent-to-agent interactions are conducted in MAS.
  • Building an LLM agent requires infrastructure elements like interfaces for LLMs, tools, and tasks.
  • We’ll build LLM agents, MAS, and all the required infrastructure from scratch into a Python framework called llm-agents-from-scratch.

FAQ

What is an LLM agent, and why aren’t LLMs alone sufficient?
An LLM agent is an autonomous system built around a backbone LLM and a set of tools. While LLMs can articulate plans (“intent to act”), they only generate text and cannot execute actions. An LLM agent orchestrates those plans by invoking tools, executing the resulting actions, and feeding results back to the LLM to complete tasks on a user’s behalf.
How do LLM agents turn intentions into actions?
They rely on tool-calling. The agent provides the LLM with descriptions of available tools and their parameters. The LLM then generates a structured tool-call request (often JSON) specifying which tool to use and with what inputs. The application executes the tool call externally and returns the results to the LLM for synthesis and the next decision.
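To make the shape concrete, here is a hedged sketch of parsing and dispatching such a request. The `get_weather` tool, the registry, and the field names `tool`/`arguments` are illustrative conventions; the exact schema varies by provider:

```python
import json

# What the LLM emits (as text) when it decides to call a tool:
raw = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'

# The agent application parses the structured request...
request = json.loads(raw)

def execute_tool_call(req: dict) -> str:
    # ...and dispatches it. A stand-in registry maps tool names
    # to callables; real agents map names to equipped tools.
    registry = {"get_weather": lambda city: f"Sunny in {city}"}
    return registry[req["tool"]](**req["arguments"])

# The result is fed back to the LLM as an observation.
observation = execute_tool_call(request)
```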
What is the processing loop of an LLM agent?
The processing loop executes a task through iterative sub-steps: (1) synthesize progress so far, (2) plan the next action(s), (3) perform tool calls if needed, (4) evaluate results, and (5) repeat until the task is done or a stopping condition is met. The sequence of plans, tool calls, and results forms the agent’s “trajectory” (or rollout), which is valuable for debugging and improvement.
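A minimal version of such a loop can be sketched in Python. The `llm_step` callable and the scripted stand-in LLM below are hypothetical stand-ins, not real model calls:

```python
def run_agent(task: str, llm_step, max_steps: int = 10) -> list[dict]:
    """Minimal processing loop: plan -> act -> observe, until done.

    `llm_step` stands in for the backbone LLM: given the trajectory
    so far, it returns either a tool call or a final answer.
    """
    trajectory: list[dict] = [{"role": "task", "content": task}]
    for _ in range(max_steps):  # stopping condition: step budget
        step = llm_step(trajectory)
        trajectory.append(step)
        if step.get("final"):   # the LLM signals completion
            break
        # Otherwise the step was a tool call; record its result.
        trajectory.append({"role": "tool_result", "content": step["call"]()})
    return trajectory

# A scripted stand-in LLM: one tool call, then a final answer.
def scripted_llm(traj):
    if len(traj) == 1:
        return {"role": "plan", "call": lambda: 42, "final": False}
    return {"role": "answer", "content": "done", "final": True}
```

Running `run_agent("demo", scripted_llm)` yields the full trajectory: the task, one plan with its tool result, and the final answer, which is exactly the record that makes inspection and debugging possible.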
Which backbone LLM capabilities are prerequisites for effective agents?
Two key capabilities are required: (1) Planning—the ability to propose initial and adaptive plans across steps; and (2) Tool calling—the ability to produce correctly formatted tool-call requests and know when to use them. Reasoning-oriented LLMs often exhibit stronger planning behaviors and can be strong backbones.
What is the Model Context Protocol (MCP), and why does it matter?
MCP (by Anthropic) standardizes how agents access third-party tools and other resources. By adopting MCP, an agent can easily equip many community-built tools and data resources, dramatically expanding its capabilities without bespoke integrations.
What real-world applications benefit from LLM agents and MAS?
Common uses include: (1) Report generation (collect, synthesize, structure insights; monitor for hallucinations), (2) Web search and deep research (multi-step browse–synthesize–report workflows), (3) Agentic RAG (retrieve from internal knowledge stores to ground responses), (4) Coding assistants and “vibe coding” with sandboxed code interpreters, and (5) Computer use (controlling apps/OS to perform tasks like ordering or ticket buying), often seen as next-gen RPA.
When do multi-agent systems (MAS) outperform single agents, and how do they coordinate?
MAS excel when a complex task can be decomposed into specialized subtasks (e.g., separate agents for retrieval vs. synthesis, or frontend vs. backend coding). Agents collaborate by exchanging intermediate results; Google’s Agent2Agent (A2A) protocol standardizes agent-to-agent communication, even across different frameworks.
How do memory modules improve LLM agents?
Memory lets agents save useful artifacts from prior runs—such as trajectories, sub-steps, and tool-call results—and load relevant pieces into context for new tasks. This can reduce redundant tool calls, cut latency and cost, and provide richer context for more accurate planning and synthesis.
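A minimal sketch of the idea, assuming a naive keyword-matching store (real memory modules typically use embedding-based retrieval); the class and method names are hypothetical:

```python
class SimpleMemory:
    """A toy memory module: save tool-call results, reload by keyword."""

    def __init__(self):
        self._entries: list[tuple[str, str]] = []

    def save(self, key: str, value: str) -> None:
        # Persist an artifact from a prior run under a searchable key.
        self._entries.append((key, value))

    def load(self, query: str) -> list[str]:
        # Naive substring match on keys; stands in for retrieval.
        return [v for k, v in self._entries if query in k]

memory = SimpleMemory()
memory.save("weather paris 2024-06-01", "Sunny, 24C")
# Reloaded into the next task's context instead of re-calling the tool:
context = memory.load("paris")
```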
What is the human-in-the-loop pattern, and what are its trade-offs?
Humans can review or approve critical plans, validate intermediate steps, and accept or reject final outputs, reducing cascading errors and improving reliability. The trade-off is increased execution time because the loop pauses awaiting human input, so teams must balance accuracy needs with latency constraints.
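The pattern can be sketched as a checkpoint that blocks on an approver; the function names here are hypothetical, and the scripted approver stands in for a real human prompt:

```python
def execute_with_approval(plan: str, risky_action, approver) -> str:
    """Pause the loop for human sign-off before a critical action.

    `approver` stands in for a human operator; in practice it blocks
    on real input (a CLI prompt or web UI), which is exactly the
    latency trade-off described above.
    """
    if approver(plan):  # the loop is effectively paused here
        return risky_action()
    return "action rejected; agent must replan"

# Example with a scripted "human" that approves the plan.
outcome = execute_with_approval(
    plan="delete 3 stale branches",
    risky_action=lambda: "branches deleted",
    approver=lambda plan: True,
)
```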
How do LLM agents differ from LAM and RL agents?
LLM agents orchestrate pre-trained LLMs plus tools to execute text-described plans; they are general-purpose. LAM agents rely on Large Action Models fine-tuned to predict domain-specific action sequences (e.g., GUI steps), making them highly specialized. RL agents are formulated around environments, actions, states, and rewards, and are trained to learn optimal policies—unlike LLM agents, which repurpose language modeling to decide next actions via text.
