AI Agents in Action you own this product

Micheal Lanham

February 2025
ISBN 9781633436343
344 pages

Included with a Manning Online subscription

printed in black & white

available in Italian, Korean, Russian, Simplified Chinese

catalog / Data Science / AI / AI Agents

table of content

1 Introduction to agents and their world

1.1 Defining agents

1.2 Understanding the component systems of an agent

1.3 Examining the rise of the agent era: Why agents?

1.4 Peeling back the AI interface

1.5 Navigating the agent landscape

Summary

2 Harnessing the power of large language models

2.1 Mastering the OpenAI API

2.1.1 Connecting to the chat completions model

2.1.2 Understanding the request and response

2.2 Exploring open source LLMs with LM Studio

2.2.1 Installing and running LM Studio

2.2.2 Serving an LLM locally with LM Studio

2.3 Prompting LLMs with prompt engineering

2.3.1 Creating detailed queries

2.3.2 Adopting personas

2.3.3 Using delimiters

2.3.4 Specifying steps

2.3.5 Providing examples

2.3.6 Specifying output length

2.4 Choosing the optimal LLM for your specific needs

2.5 Exercises

Summary

3 Engaging GPT assistants

3.1 Exploring GPT assistants through ChatGPT

3.2 Building a GPT that can do data science

3.3 Customizing a GPT and adding custom actions

3.3.1 Creating an assistant to build an assistant

3.3.2 Connecting the custom action to an assistant

3.4 Extending an assistant’s knowledge using file uploads

3.4.1 Building the Calculus Made Easy GPT

3.4.2 Knowledge search and more with file uploads

3.5 Publishing your GPT

3.5.1 Expensive GPT assistants

3.5.2 Understanding the economics of GPTs

3.5.3 Releasing the GPT

3.6 Exercises

Summary

4 Exploring multi-agent systems

4.1 Introducing multi-agent systems with AutoGen Studio

4.1.1 Installing and using AutoGen Studio

4.1.2 Adding skills in AutoGen Studio

4.2 Exploring AutoGen

4.2.1 Installing and consuming AutoGen

4.2.2 Enhancing code output with agent critics

4.2.3 Understanding the AutoGen cache

4.3 Group chat with agents and AutoGen

4.4 Building an agent crew with CrewAI

4.4.1 Creating a jokester crew of CrewAI agents

4.4.2 Observing agents working with AgentOps

4.5 Revisiting coding agents with CrewAI

4.6 Exercises

Summary

5 Empowering agents with actions

5.1 Defining agent actions

5.2 Executing OpenAI functions

5.2.1 Adding functions to LLM API calls

5.2.2 Actioning function calls

5.3 Introducing Semantic Kernel

5.3.1 Getting started with SK semantic functions

5.3.2 Semantic functions and context variables

5.4 Synergizing semantic and native functions

5.4.1 Creating and registering a semantic skill/plugin

5.4.2 Applying native functions

5.4.3 Embedding native functions within semantic functions

5.5 Semantic Kernel as an interactive service agent

5.5.1 Building a semantic GPT interface

5.5.2 Testing semantic services

5.5.3 Interactive chat with the semantic service layer

5.6 Thinking semantically when writing semantic services

5.7 Exercises

Summary

6 Building autonomous assistants

6.1 Introducing behavior trees

6.1.1 Understanding behavior tree execution

6.1.2 Deciding on behavior trees

6.1.3 Running behavior trees with Python and py_trees

6.2 Exploring the GPT Assistants Playground

6.2.1 Installing and running the Playground

6.2.2 Using and building custom actions

6.2.3 Installing the assistants database

6.2.4 Getting an assistant to run code locally

6.2.5 Investigating the assistant process through logs

6.3 Introducing agentic behavior trees

6.3.1 Managing assistants with assistants

6.3.2 Building a coding challenge ABT

6.3.3 Conversational AI systems vs. other methods

6.3.4 Posting YouTube videos to X

6.3.5 Required X setup

6.4 Building conversational autonomous multi-agents

6.5 Building ABTs with back chaining

6.6 Exercises

Summary

7 Assembling and using an agent platform

7.1 Introducing Nexus, not just another agent platform

7.1.1 Running Nexus

7.1.2 Developing Nexus

7.2 Introducing Streamlit for chat application development

7.2.1 Building a Streamlit chat application

7.2.2 Creating a streaming chat application

7.3 Developing profiles and personas for agents

7.4 Powering the agent and understanding the agent engine

7.5 Giving an agent actions and tools

7.6 Exercises

Summary

8 Understanding agent memory and knowledge

8.1 Understanding retrieval in AI applications

8.2 The basics of retrieval augmented generation (RAG)

8.3 Delving into semantic search and document indexing

8.3.1 Applying vector similarity search

8.3.2 Vector databases and similarity search

8.3.3 Demystifying document embeddings

8.3.4 Querying document embeddings from Chroma

8.4 Constructing RAG with LangChain

8.4.1 Splitting and loading documents with LangChain

8.4.2 Splitting documents by token with LangChain

8.5 Applying RAG to building agent knowledge

8.6 Implementing memory in agentic systems

8.6.1 Consuming memory stores in Nexus

8.6.2 Semantic memory and applications to semantic, episodic, and procedural memory

8.7 Understanding memory and knowledge compression

8.8 Exercises

Summary

9 Mastering agent prompts with prompt flow

9.1 Why we need systematic prompt engineering

9.2 Understanding agent profiles and personas

9.3 Setting up your first prompt flow

9.3.1 Getting started

9.3.2 Creating profiles with Jinja2 templates

9.3.3 Deploying a prompt flow API

9.4 Evaluating profiles: Rubrics and grounding

9.5 Understanding rubrics and grounding

9.6 Grounding evaluation with an LLM profile

9.7 Comparing profiles: Getting the perfect profile

9.7.1 Parsing the LLM evaluation output

9.7.2 Running batch processing in prompt flow

9.7.3 Creating an evaluation flow for grounding

9.7.4 Exercises

Summary

10 Agent reasoning and evaluation

10.1 Understanding direct solution prompting

10.1.1 Question-and-answer prompting

10.1.2 Implementing few-shot prompting

10.1.3 Extracting generalities with zero-shot prompting

10.2 Reasoning in prompt engineering

10.2.1 Chain of thought prompting

10.2.2 Zero-shot CoT prompting

10.2.3 Step by step with prompt chaining

10.3 Employing evaluation for consistent solutions

10.3.1 Evaluating self-consistency prompting

10.3.2 Evaluating tree of thought prompting

10.4 Exercises

Summary

11 Agent planning and feedback

11.1 Planning: The essential tool for all agents/assistants

11.2 Understanding the sequential planning process

11.3 Building a sequential planner

11.4 Reviewing a stepwise planner: OpenAI Strawberry

11.5 Applying planning, reasoning, evaluation, and feedback to assistant and agentic systems

11.5.1 Application of assistant/agentic planning

11.5.2 Application of assistant/agentic reasoning

11.5.3 Application of evaluation to agentic systems

11.5.4 Application of feedback to agentic/assistant applications

11.6 Exercises

Summary

Appendices

Appendix A: Accessing OpenAI large language models

A.1 Accessing OpenAI accounts and keys

A.2 Azure OpenAI Studio, keys, and deployments

Appendix B: Python development environment

B.1 Downloading the source code

B.2 Installing Python

B.3 Installing VS Code

B.4 Installing VS Code Python extensions

B.5 Creating a new Python environment with VS Code

B.6 Using VS Code Dev Containers (Docker)

Overview

1 Introduction to agents and their world

This chapter introduces AI agents as software entities that act on a user’s behalf, bridging traditional notions from reinforcement learning with practical assistants woven into everyday applications. It outlines a spectrum of interactions with large language models, ranging from direct use to proxy assistants, tool-using agents that execute functions with user approval, and fully autonomous agents capable of planning and decision-making. It also motivates multi-agent systems in which specialized profiles collaborate—often through a coordinating controller—to divide work, provide mutual feedback, and reduce errors, highlighting why agents are becoming central to modern AI workflows.

The chapter decomposes agents into core components that can be mixed and matched for different goals. Profiles and personas (often defined by a system prompt) anchor an agent’s role, tone, and capabilities; actions and tool use translate intent into task execution, exploration, or communication; knowledge and memory structures surface the right context efficiently; reasoning and evaluation help agents think through problems and assess outputs; and planning and feedback mechanisms organize steps toward goals, with or without human oversight. Planning can follow a single path or explore multiple strategies, and external planners or other agents can orchestrate complex workflows. Memory can be unified or hybrid, spanning documents, databases, embeddings, and lightweight lists. These components benefit both non-autonomous and autonomous agents, enabling specialization, reliability, and scalable collaboration.

With the limits of prompt engineering exposed by real-world iteration, agent systems emerged to embed planning, evaluation, and repetition into the problem-solving loop, exemplifying why structured agent workflows outperform ad hoc prompting on complex tasks. Trust, guardrails, and clear goals remain essential, so many production tools prioritize supervised, non-autonomous designs while still delivering meaningful automation. At the same time, a broader software shift is underway: data and applications are increasingly exposed through natural language interfaces that agents can consume, enabling more intuitive, accurate, and integrated solutions. The result is a rapidly evolving landscape of frameworks and patterns, and this chapter equips readers with the concepts needed to navigate and build effective agent systems.

The differences between the LLM interactions from direct action to proxy agents, agents, and autonomous agents

In this example of a multi-agent system, the controller or agent proxy communicates directly with the user. Two agents—a coder and a tester—work in the background to create code and write unit tests to test the code.

The five main components of a single-agent system (image generated through DallE-3)

An in-depth look at how we will explore creating agent profiles

The aspects of agent actions we will explore in this book

Exploring the role and use of agent memory and knowledge

a The reasoning and evaluation component and details

b Exploring the role of agent planning and reasoning

The original design of the AutoGPT agent system

A vision of how agents will interact with software systems

Summary

An agent is an entity that acts or exerts power, produces an effect or serves as a means for achieving a result. An agent automates interaction with a large language model (LLM) in AI.
An assistant is synonymous with an agent. Both terms encompass tools like OpenAI’s GPT Assistants.
Autonomous agents can make independent decisions, and their distinction from non-autonomous agents is crucial.
The four main types of LLM interactions include direct user interaction, agent/assistant proxy, agent/assistant, and autonomous agent.
Multi-agent systems involve agent profiles working together, often controlled by a proxy, to accomplish complex tasks.
The main components of an agent include the profile/persona, actions, knowledge/memory, reasoning/evaluation, and planning/feedback.
Agent profiles and personas guide an agent’s tasks, responses, and other nuances, often including background and demographics.
Actions and tools for agents can be manually generated, recalled from memory, or follow predefined plans.
Agents use knowledge and memory structures to optimize context and minimize token usage, utilizing various formats from documents to embeddings.
Reasoning and evaluation systems enable agents to think through problems and assess solutions using prompting patterns like zero-shot, few-shot, and chain-of-thought.
Planning/feedback components organize tasks to achieve goals, using single-path or multipath reasoning and integrating feedback from the environment and humans.
The rise of AI agents has introduced a new software development paradigm, shifting from traditional to natural language-based AI interfaces.
Understanding the progression and interaction of these tools helps develop agent systems, whether single, multiple, or autonomous.

FAQ

What does this book mean by an “AI agent” (and “assistant”)?

An agent is an active system that acts on your behalf to achieve goals. In reinforcement learning it’s a decision-making learner; in software it’s an application that performs tasks for you. In this book, assistant and agent are used interchangeably, encompassing tools like GPT-based Assistants, whether or not they are fully autonomous.

How do direct LLM use, proxy agents, tool-using agents, and autonomous agents differ?

- Direct interaction: you talk to the LLM with no intermediary.
- Proxy agent: an assistant reformulates your request for a target model or task (for example, crafting better prompts for image generation).
- Agent with tools: the LLM can call functions/plugins when you approve, then summarizes results back to you.
- Autonomous agent: plans, chooses tools, executes, and makes decisions with minimal oversight; may ask for feedback at milestones but operates independently.

What is a multi-agent system and why use one?

A multi-agent system combines specialized agent “profiles” (personas) that collaborate. Benefits include parallel task execution, domain specialization, mutual feedback and evaluation to reduce errors, and flexibility to run autonomously or under human guidance (human-in-the-loop).

What are the main component categories of a single-agent system?

Five recurring categories:
- Profile and persona: the agent’s role, background, instructions, and communication style.
- Actions and tool use: how the agent carries out tasks and interacts with external systems.
- Knowledge and memory: what the agent knows and recalls to stay within context limits.
- Reasoning and evaluation: thinking through options and judging outputs.
- Planning and feedback: organizing steps toward goals, with or without human or environmental feedback.

What is an agent profile/persona and how is it created?

The profile (often a “system prompt”) anchors the agent’s identity and scope: role, goals, tone, constraints, and tools. It can be crafted by hand, refined with LLM assistance, or generated from data-driven methods (including evolutionary techniques), and may include background and demographic cues that shape responses.

How do agents take actions and use tools effectively?

Agents:
- Target different aims: task completion, exploration, or communication.
- Consider impact: on the environment, task outcome, and their internal state/memory.
- Generate actions: manually from instructions, by recalling prior steps, or by following/adjusting a plan. Tool calls are actions the agent chooses to execute in pursuit of the goal.

How do knowledge and memory help agents work within context limits?

Agents retrieve only the most relevant information to keep token usage low. Stores can be unified or hybrid, spanning formats like documents, databases, embeddings for semantic search, or simple lists. Effective retrieval and summarization let agents ground responses in pertinent facts without overloading context.

What roles do reasoning, evaluation, and planning play in agents?

- Reasoning and evaluation let agents think through alternatives and judge outputs before responding.
- Planning can be autonomous or feedback-driven, adapting to changes and human input.
- Single-path planning proceeds step by step; multipath explores several strategies and preserves effective ones. External planners (code or other agents) may orchestrate larger workflows.

Why are agents rising now, and how should teams approach adoption and trust?

Prompt engineering improved early LLM use but hit limits on complex goals. Systems like AutoGPT showed that planning, iteration, and repetition boost reliability on multifaceted tasks. Most production tools remain non-autonomous to build trust gradually. Start with scoped, tool-using agents, add guardrails and evaluation, gather feedback, and expand autonomy as confidence grows.

What is an “AI interface,” and how will it change software and data access?

An AI interface exposes data and application capabilities through natural language, not just UIs, APIs, or SQL. It enables agents to query, invoke functions, and coordinate with other systems semantically. As these interfaces spread, many applications will become more agent-ready, improving task accuracy and enabling more trustworthy, increasingly autonomous workflows (though not every use case requires this model).

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$47.99 $31.19

you save $16.80 (35%)

include audio $24.99 $16.24

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

$47.99 $31.19

you save $16.80 (35%)

include audio $24.99 $16.24

eBook

pdf, ePub, online

$47.99 $31.19

you save $16.80 (35%)

include audio $24.99 $16.24

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more