Overview

1 What are LLM Agents and Multi-Agent Systems?

Large language models are excellent at expressing intent, but they cannot act without scaffolding. This chapter introduces LLM agents—systems that turn an LLM’s text-planned intentions into real actions by calling external tools—and shows how multiple agents can be combined into multi-agent systems for tougher problems. It surveys prominent applications such as report generation, web and deep research, retrieval-augmented generation, coding with sandboxed interpreters, and full computer-use automation. Standards like the Model Context Protocol expand an agent’s toolset, while Agent2Agent enables collaboration across heterogeneous agent frameworks. The overarching goal is to build a deep, working understanding by constructing agents and multi-agent systems from scratch.

At their core, LLM agents pair a backbone LLM with tools and rely on two prerequisite capabilities: planning and tool-calling. Tasks run inside a processing loop where the agent formulates or adapts a plan, invokes tools via structured requests, synthesizes results, and iterates until completion; the resulting “trajectory” is invaluable for inspection and debugging. The chapter details enhancements that improve reliability and efficiency: memory modules to save and reload past steps and tool outputs, and human-in-the-loop checkpoints to prevent cascading errors (with a latency trade-off). It also clarifies related ideas—reasoning-oriented prompting, specialized Large Action Models for narrow action domains, and how LLM agents differ from reinforcement learning agents trained to optimize explicit reward signals.

Multi-agent systems shine when complex tasks can be decomposed into focused subtasks handled by specialized agents whose results are combined, often outperforming a single generalist. Protocols anchor this ecosystem: MCP standardizes access to third-party tools and resources, and A2A provides agent-to-agent communication so agents across frameworks can collaborate. The chapter closes with a hands-on roadmap for building a custom framework in four stages: implement base Tool and LLM interfaces and an agent processing loop; add MCP compatibility (including building a server); incorporate memory and human-in-the-loop controls; and finally integrate A2A and coordination logic to construct full multi-agent solutions for real applications like deep research and automated report generation.
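Before any of that machinery, the divide-and-combine idea behind MAS fits in a few lines. In this toy sketch each "agent" is a plain function standing in for a full LLM processing loop, so everything here is illustrative:

```python
# A toy sketch of the divide-and-combine pattern behind MAS.
# Real agents each run a full LLM processing loop; here each
# "agent" is a plain function standing in for that loop.
def research_agent(topic: str) -> str:
    # Specialized subtask: gather findings on one topic.
    return f"findings about {topic}"

def writer_agent(findings: list[str]) -> str:
    # Specialized subtask: combine findings into a report.
    return "Report: " + "; ".join(findings)

def run_mas(topics: list[str]) -> str:
    # Decompose the overarching task across specialized agents,
    # then combine their results into the overall outcome.
    findings = [research_agent(t) for t in topics]
    return writer_agent(findings)
```

The point is structural, not the trivial bodies: decomposition into focused subtasks, then combination of the intermediate results.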

LLM agents have many applications, including agentic RAG, report generation, deep research, and computer use, all of which can benefit from MAS.
An LLM agent comprises a backbone LLM and its equipped tools.
LLM agents utilize the planning capability of backbone LLMs to formulate initial plans for tasks, as well as to adapt current plans based on the results of past steps or actions taken towards task completion.
An illustration of the tool-equipping process, where a textual description of the tool, containing its name, description, and parameters, is provided to the LLM agent.
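To make this concrete, a tool description is often passed to the LLM as a JSON-style schema. The `get_weather` tool and the exact field layout below are illustrative assumptions following a common convention, not this framework's actual format:

```python
# A hypothetical tool description for a weather-lookup tool.
# The LLM only ever sees this text; the agent application is
# responsible for actually executing the tool when it's called.
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
        },
        "required": ["city"],
    },
}
```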
The tool-calling process, where any equipped tool can be used.
A mental model of an LLM agent performing a task through its processing loop, where tool calling and planning are used repeatedly. The task is executed through a series of sub-steps.
An LLM agent that has access to memory modules where it can store key information of task executions and load this back into its context for future tasks.
A mental model of the LLM agent processing loop that has memory modules for saving and loading important information obtained during task execution.
An LLM agent processing loop with access to human operators. The processing loop is effectively paused each time a human operator is required to provide input.
Multiple LLM agents collaborating to complete an overarching task. The outcomes of each LLM agent’s processing loop are combined to form the overall task result.
A first look at the llm-agents-from-scratch framework that we’ll build together.
A simple UML class diagram that shows two classes from the llm-agents-from-scratch framework. The BaseTool class lives in the base module, while the ToolCallResult lives in the data_structures module. The attributes and methods of both classes are indicated in their respective class diagrams and the relation between them is also described.
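As a rough Python sketch of those two classes (the diagram's exact attributes and methods aren't reproduced here, so every field below, and the `AdderTool` example, is illustrative rather than the framework's real definition):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolCallResult:
    """Result returned by a tool invocation (hypothetical fields)."""
    tool_name: str
    content: Any
    error: bool = False

class BaseTool(ABC):
    """Abstract tool interface; concrete tools implement __call__."""
    name: str
    description: str

    @abstractmethod
    def __call__(self, **kwargs: Any) -> ToolCallResult:
        ...

class AdderTool(BaseTool):
    """A trivial concrete tool for illustration."""
    name = "adder"
    description = "Add two numbers."

    def __call__(self, a: float, b: float) -> ToolCallResult:
        return ToolCallResult(tool_name=self.name, content=a + b)
```

The relation in the diagram is reflected here: `BaseTool` produces a `ToolCallResult` for every invocation.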
A UML sequence diagram that illustrates the flow of a tool call. First, an LLM agent prepares a ToolCall object and invokes the BaseTool, which initiates the processing of the tool call. Once completed, the BaseTool class constructs a ToolCallResult, which is then sent back to the LLM agent.
The build plan for our llm-agents-from-scratch framework. We will build this framework in four stages. In the first stage, we’ll implement the interfaces for tools and LLMs, as well as our LLM agent class. In the second stage, we’ll make our LLM agent MCP compatible so that MCP tools can be equipped to the backbone LLM. In stage three, we will implement the human-in-the-loop pattern and add memory modules to our LLM agent. And, in the fourth and final stage, we’ll incorporate A2A and other multi-agent coordination logic into our framework to enable building MAS.

Summary

  • LLMs have become very powerful text generators, applied successfully to tasks like text summarization, question answering, and text classification, but they have a critical limitation: they cannot act. They can only express an intent to act (such as making a tool call) through text. LLM agents fill this gap by carrying out those intended actions.
  • Applications for LLM agents are many, such as report generation, deep research, computer use and coding.
  • With MAS, individual LLM agents collaborate to collectively perform tasks.
  • Many applications for LLM agents can further benefit from MAS. In principle, MAS excel when complex tasks can be decomposed into smaller subtasks, where specialized LLM agents outperform general-purpose LLM agents.
  • LLM agents are systems composed of an LLM and tools that can act autonomously to perform tasks.
  • LLM agents use a processing loop to execute tasks. Tool calling and planning capabilities are key components of that processing loop.
  • Protocols like MCP and A2A have helped create a vibrant LLM agent ecosystem that is powering the growth of LLM agents and their applications. MCP, a protocol developed by Anthropic, has paved the way for LLM agents to use third-party tools.
  • A2A is a protocol developed by Google to standardize how agent-to-agent interactions are conducted in MAS.
  • Building an LLM agent requires infrastructure elements like interfaces for LLMs, tools, and tasks.
  • We’ll build LLM agents, MAS, and all the required infrastructure from scratch into a Python framework called llm-agents-from-scratch.

FAQ

What is an LLM agent, and why aren’t LLMs alone sufficient?
An LLM agent is an autonomous system built around a backbone LLM and a set of tools. While LLMs can articulate plans (“intent to act”), they only generate text and cannot execute actions. An LLM agent orchestrates those plans by invoking tools, executing the resulting actions, and feeding results back to the LLM to complete tasks on a user’s behalf.
How do LLM agents turn intentions into actions?
They rely on tool-calling. The agent provides the LLM with descriptions of available tools and their parameters. The LLM then generates a structured tool-call request (often JSON) specifying which tool to use and with what inputs. The application executes the tool call externally and returns the results to the LLM for synthesis and the next decision.
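To make the shape concrete, here is a hedged sketch of parsing and dispatching such a request. The `get_weather` tool, the registry, and the field names `tool`/`arguments` are illustrative conventions; the exact schema varies by provider:

```python
import json

# What the LLM emits (as text) when it decides to call a tool:
raw = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'

# The agent application parses the structured request...
request = json.loads(raw)

def execute_tool_call(req: dict) -> str:
    # ...and dispatches it. A stand-in registry maps tool names
    # to callables; real agents map names to equipped tools.
    registry = {"get_weather": lambda city: f"Sunny in {city}"}
    return registry[req["tool"]](**req["arguments"])

# The result is fed back to the LLM as an observation.
observation = execute_tool_call(request)
```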
What is the processing loop of an LLM agent?
The processing loop executes a task through iterative sub-steps: (1) synthesize progress so far, (2) plan the next action(s), (3) perform tool calls if needed, (4) evaluate results, and (5) repeat until the task is done or a stopping condition is met. The sequence of plans, tool calls, and results forms the agent’s “trajectory” (or rollout), which is valuable for debugging and improvement.
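A minimal version of such a loop can be sketched in Python. The `llm_step` callable and the scripted stand-in LLM below are hypothetical stand-ins, not real model calls:

```python
def run_agent(task: str, llm_step, max_steps: int = 10) -> list[dict]:
    """Minimal processing loop: plan -> act -> observe, until done.

    `llm_step` stands in for the backbone LLM: given the trajectory
    so far, it returns either a tool call or a final answer.
    """
    trajectory: list[dict] = [{"role": "task", "content": task}]
    for _ in range(max_steps):  # stopping condition: step budget
        step = llm_step(trajectory)
        trajectory.append(step)
        if step.get("final"):   # the LLM signals completion
            break
        # Otherwise the step was a tool call; record its result.
        trajectory.append({"role": "tool_result", "content": step["call"]()})
    return trajectory

# A scripted stand-in LLM: one tool call, then a final answer.
def scripted_llm(traj):
    if len(traj) == 1:
        return {"role": "plan", "call": lambda: 42, "final": False}
    return {"role": "answer", "content": "done", "final": True}
```

Running `run_agent("demo", scripted_llm)` yields the full trajectory: the task, one plan with its tool result, and the final answer, which is exactly the record that makes inspection and debugging possible.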
Which backbone LLM capabilities are prerequisites for effective agents?
Two key capabilities are required: (1) Planning—the ability to propose initial and adaptive plans across steps; and (2) Tool calling—the ability to produce correctly formatted tool-call requests and know when to use them. Reasoning-oriented LLMs often exhibit stronger planning behaviors and can be strong backbones.
What is the Model Context Protocol (MCP), and why does it matter?
MCP (by Anthropic) standardizes how agents access third-party tools and other resources. By adopting MCP, an agent can easily equip many community-built tools and data resources, dramatically expanding its capabilities without bespoke integrations.
What real-world applications benefit from LLM agents and MAS?
Common uses include: (1) Report generation (collect, synthesize, structure insights; monitor for hallucinations), (2) Web search and deep research (multi-step browse–synthesize–report workflows), (3) Agentic RAG (retrieve from internal knowledge stores to ground responses), (4) Coding assistants and “vibe coding” with sandboxed code interpreters, and (5) Computer use (controlling apps/OS to perform tasks like ordering or ticket buying), often seen as next-gen RPA.
When do multi-agent systems (MAS) outperform single agents, and how do they coordinate?
MAS excel when a complex task can be decomposed into specialized subtasks (e.g., separate agents for retrieval vs. synthesis, or frontend vs. backend coding). Agents collaborate by exchanging intermediate results; Google’s Agent2Agent (A2A) protocol standardizes agent-to-agent communication, even across different frameworks.
How do memory modules improve LLM agents?
Memory lets agents save useful artifacts from prior runs—such as trajectories, sub-steps, and tool-call results—and load relevant pieces into context for new tasks. This can reduce redundant tool calls, cut latency and cost, and provide richer context for more accurate planning and synthesis.
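A minimal sketch of the idea, assuming a naive keyword-matching store (real memory modules typically use embedding-based retrieval); the class and method names are hypothetical:

```python
class SimpleMemory:
    """A toy memory module: save tool-call results, reload by keyword."""

    def __init__(self):
        self._entries: list[tuple[str, str]] = []

    def save(self, key: str, value: str) -> None:
        # Persist an artifact from a prior run under a searchable key.
        self._entries.append((key, value))

    def load(self, query: str) -> list[str]:
        # Naive substring match on keys; stands in for retrieval.
        return [v for k, v in self._entries if query in k]

memory = SimpleMemory()
memory.save("weather paris 2024-06-01", "Sunny, 24C")
# Reloaded into the next task's context instead of re-calling the tool:
context = memory.load("paris")
```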
What is the human-in-the-loop pattern, and what are its trade-offs?
Humans can review or approve critical plans, validate intermediate steps, and accept or reject final outputs, reducing cascading errors and improving reliability. The trade-off is increased execution time because the loop pauses awaiting human input, so teams must balance accuracy needs with latency constraints.
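The pattern can be sketched as a checkpoint that blocks on an approver; the function names here are hypothetical, and the scripted approver stands in for a real human prompt:

```python
def execute_with_approval(plan: str, risky_action, approver) -> str:
    """Pause the loop for human sign-off before a critical action.

    `approver` stands in for a human operator; in practice it blocks
    on real input (a CLI prompt or web UI), which is exactly the
    latency trade-off described above.
    """
    if approver(plan):  # the loop is effectively paused here
        return risky_action()
    return "action rejected; agent must replan"

# Example with a scripted "human" that approves the plan.
outcome = execute_with_approval(
    plan="delete 3 stale branches",
    risky_action=lambda: "branches deleted",
    approver=lambda plan: True,
)
```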
How do LLM agents differ from LAM and RL agents?
LLM agents orchestrate pre-trained LLMs plus tools to execute text-described plans; they are general-purpose. LAM agents rely on Large Action Models fine-tuned to predict domain-specific action sequences (e.g., GUI steps), making them highly specialized. RL agents are formulated around environments, actions, states, and rewards, and are trained to learn optimal policies—unlike LLM agents, which repurpose language modeling to decide next actions via text.
