Overview

1 Introduction to AI Agents and Applications

This chapter frames how large language models (LLMs) have evolved from novelty to a core application primitive and introduces AI agents as a new class of applications that plan, call tools, and orchestrate multi-step workflows from natural language input.

  • Motivation: Post-ChatGPT, LLMs enable answering complex questions, tailored content generation, document summarization, and cross-system coordination—culminating in agents that act on behalf of users.
  • Core challenges: Teams repeatedly face data ingestion and management, prompt design, reliable chaining of model calls, and integration with external APIs and services.
  • Frameworks as leverage: LangChain, LangGraph, and LangSmith provide modular building blocks to reduce boilerplate, encode best practices, and let developers focus on application logic rather than low-level wiring.
  • Application families: The chapter orients readers to three common LLM-powered patterns—engines, chatbots, and agents—each with distinct roles and orchestration needs.
  • Foundational patterns: Introduces prompt engineering and Retrieval-Augmented Generation (RAG) as recurring techniques used throughout the book.

By the end of the chapter, readers gain a clear map of the problem space, an overview of LangChain’s architecture and object model, and the key patterns and frameworks that will be used to design, build, and scale real LLM applications and agents.

1.1 Introducing LangChain

This section introduces LangChain as a framework that solves recurring challenges in building LLM applications: robust data ingestion, maintainable prompts and chains, managing context limits and costs, orchestrating multi-step workflows, and enabling evaluation/debugging/monitoring. LangChain standardizes these patterns into modular, composable components and provides a consistent chaining model through the Runnable interface and the LangChain Expression Language (LCEL). It is guided by modularity, composability, and extensibility, enabling easy swaps of models and stores, custom integrations, and dynamic agent workflows. Learning these patterns equips developers with transferable skills across similar frameworks.

  • Workflow overview: ingest data, split into chunks, embed into vectors, store in a vector database, retrieve relevant context, construct prompts, call an LLM, and parse outputs.
  • RAG backbone: vector stores power most retrieval-augmented generation use cases; graph databases (e.g., Neo4j) complement scenarios requiring entity/relationship reasoning, memory, or planning.
  • Composition tools: Runnable + LCEL provide clean, debuggable pipelines (a minimal sketch follows this list); LangGraph supports graph-shaped, branching flows for advanced orchestration.
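
As a concrete illustration of the Runnable/LCEL composition described above, the following minimal sketch chains a prompt template, a chat model, and an output parser. It assumes the langchain-openai package is installed and OPENAI_API_KEY is set; the model name and prompt text are only examples.

# Minimal LCEL pipeline: prompt -> model -> output parser (illustrative).
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Summarize the following text in one sentence:\n\n{text}"
)
llm = ChatOpenAI(model="gpt-4o-mini")  # any chat model integration could be swapped in
parser = StrOutputParser()

# The | operator chains Runnables: the prompt's output feeds the model,
# and the model's output feeds the parser.
chain = prompt | llm | parser
print(chain.invoke({"text": "LangChain standardizes common LLM application patterns."}))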

Core components (high level):

  • Document loaders: extract data from files, databases, or websites into Document objects with content and metadata.
  • Text splitters: chunk large texts to fit context windows and improve indexing and retrieval.
  • Document: a unit of content plus metadata (e.g., source, page).
  • Embedding models: convert text chunks to semantic vectors.
  • Vector stores: index embeddings for fast similarity search; serve as offline knowledge bases.
  • Knowledge graph databases: optional graph stores for entities/relationships and graph-based reasoning.
  • Retrievers: fetch relevant Documents from vector, relational, or graph stores.
  • Prompts: reusable templates that combine user input and retrieved context; support techniques like few-shot prompting.
  • LLM Cache: optional cache to reduce latency and cost for repeat queries.
  • LLM / ChatModel: interfaces to various providers (and a fake model for testing).
  • Output Parser: structures LLM responses (e.g., JSON) for reliable downstream use.

Composition patterns and app types:

  • Chains: linear pipelines tailored to specific tasks.
  • Agents: dynamic workflows that select tools at runtime; tools collectively form a toolkit.
  • Primary applications: summarization and query services, chatbots, and agents.
Figure: LangChain architecture. The Document Loader imports data, which the Text Splitter divides into chunks. These are vectorized by an Embedding Model, stored in a Vector Store, and retrieved through a Retriever for the LLM. The LLM Cache checks for prior requests to return cached responses, while the Output Parser formats the LLM's final response.
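
The ingestion-and-retrieval flow in the figure can be sketched with the components listed above. This is a minimal, illustrative example, assuming the langchain-community, langchain-text-splitters, langchain-openai, and faiss-cpu packages are installed; the file name, chunk sizes, and query are placeholders.

# Ingestion: load -> split -> embed -> index; then build a retriever (illustrative).
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs = TextLoader("knowledge_base.txt").load()        # Document loader -> Document objects
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)                               # Text splitter -> smaller Documents
vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())  # embed and index the chunks
retriever = vector_store.as_retriever(search_kwargs={"k": 4})    # retriever over the vector store

relevant_docs = retriever.invoke("What does the knowledge base say about refunds?")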

1.2 LangChain core object model

LangChain’s core object model is organized as class hierarchies that center on the Document entity. Loaders create Document objects from raw sources, splitters divide them into manageable chunks, and these segments are stored in vector databases and accessed via retrievers for downstream tasks. This structure clarifies how data flows through the framework and how components interoperate.

  • Document
  • DocumentLoader
  • TextSplitter
  • VectorStore
  • Retriever

LangChain integrates broadly with third-party tools across these components and also offers the community-driven LangChain Hub for sharing reusable prompts, chains, and tools. A unifying feature across many components is the Runnable interface, which enables consistent composition and chaining; this underpins highly modular workflows and is further enhanced by the LangChain Expression Language (LCEL) for building expressive LLM pipelines.

On the language model side, the object model encompasses PromptTemplate and PromptValue, which connect to the LLM interface. This hierarchy is somewhat more complex than the Document-centric flow and governs how prompts are structured, rendered, and passed to models.
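
For a sense of how these classes interact, the following minimal sketch shows a PromptTemplate rendering user input into a PromptValue; the template text is illustrative.

# A PromptTemplate turns input variables into a PromptValue that models can consume.
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "Translate the following text to {language}:\n\n{text}"
)
prompt_value = template.invoke({"language": "Italian", "text": "Good morning"})

print(prompt_value.to_string())    # rendered string for completion-style LLMs
print(prompt_value.to_messages())  # list of messages for chat models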

Object model of classes associated with the Document core entity, including document loaders (which create Document objects), splitters (which produce lists of Document objects), vector stores (which index Document objects), and retrievers (which fetch Document objects from vector stores and other sources).
Object model of classes associated with language models, including PromptTemplate and PromptValue.

1.3 Building LLM applications and AI agents

LLMs excel at understanding and generating natural language and now power a wide range of applications across industries. Despite varied use cases, most LLM apps share a core pattern: accept natural language input, enrich it with relevant context, and construct a prompt for the model. This section outlines three major application types—LLM-based engines, chatbots, and AI agents—and how frameworks like LangChain and LangGraph streamline building them.

  • LLM-based applications or engines: Focused capabilities such as summarization, search, Q&A, or content generation
  • Chatbots: Conversational systems that maintain context, apply role instructions, and can ground answers in local knowledge
  • AI agents: Autonomous or semi-autonomous systems that plan and execute multi-step tasks with tools and external data

1.3.1 LLM-based applications: summarization and Q&A engines

Engines act as backend services that handle specific NL tasks for other systems. Summarization engines condense long texts and expose results via APIs; Q&A engines answer questions against a knowledge base using a two-phase pipeline: ingestion and query. Ingestion turns documents into embeddings and stores them (and their chunks) in a vector store; querying converts a user question to an embedding, retrieves relevant chunks, and composes a prompt for the LLM. This pattern is known as Retrieval-Augmented Generation (RAG).

A summarization engine condenses and stores content from large volumes of text and can be invoked by other systems through a REST API.
Summarization engine diagram
A Q&A engine implemented with the RAG design: an LLM query engine stores domain-specific document information in a vector store. When an external system sends a query, the engine converts the natural language question into its embedding representation, retrieves related documents, and gives the LLM the information it needs to craft a grounded response.
RAG Q&A engine diagram
Definition

Embeddings are high-dimensional vector representations of text (from words to document chunks) that capture semantic similarity and context, enabling efficient retrieval and reasoning.

Definition

Retrieval-Augmented Generation (RAG) augments an LLM’s generation with retrieved, domain-specific context (often from a vector store) at query time.

LangChain provides modular building blocks—loaders, splitters, embedding models, vector stores, and retrievers—so you can assemble engines with minimal boilerplate. Engines can also orchestrate tools and APIs, translating NL instructions into queries and presenting results cleanly.
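
A query-phase RAG chain might look like the following minimal sketch. It assumes a retriever built during ingestion (as in the earlier ingestion sketch) and a configured langchain-openai installation; the prompt wording and model name are illustrative.

# RAG query chain: retrieve context, build the prompt, call the LLM, parse the answer.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

def format_docs(docs):
    # Concatenate retrieved Document contents into one context string.
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

# `retriever` comes from the ingestion step (see the earlier sketch).
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)
answer = rag_chain.invoke("What is the refund policy for cancelled tours?")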

1.3.2 LLM-based chatbots

Chatbots add interactive, multi-turn conversations to LLM capabilities. They rely on strong prompt design and role-based messaging to keep outputs relevant and safe, and often ground responses with local knowledge via vector stores. Conversation memory maintains coherence across turns, typically using summarization or compression to fit within context windows. Many chatbots specialize (e.g., summarization, Q&A, translation) while adapting to user feedback in real time.
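
A minimal sketch of a multi-turn chatbot with per-session conversation memory, assuming langchain-openai is configured; the system prompt, session handling, and model name are illustrative.

# Chatbot with message history: prior turns are injected into the prompt on each call.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful travel assistant. Keep answers concise."),
    MessagesPlaceholder("history"),   # earlier turns go here
    ("human", "{input}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

sessions = {}  # session_id -> chat history (in-memory only, for illustration)

def get_history(session_id: str):
    return sessions.setdefault(session_id, InMemoryChatMessageHistory())

chatbot = RunnableWithMessageHistory(
    chain, get_history, input_messages_key="input", history_messages_key="history"
)
reply = chatbot.invoke(
    {"input": "Plan a weekend in Rome."},
    config={"configurable": {"session_id": "user-42"}},
)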

A summarization chatbot has some similarities with a summarization engine, but it offers an interactive experience where the LLM and the user can work together to fine-tune and improve the results.
Summarization chatbot architecture
Sequence diagram that outlines how a user interacts with an LLM through a chatbot to create a more concise summary.
Chatbot summarization sequence diagram

1.3.3 AI agents

AI agents coordinate multi-step workflows by selecting tools, executing them, and iteratively deciding next actions with LLM guidance. They integrate structured (APIs, databases) and unstructured sources (documents, web) to produce end-to-end solutions. A typical agent loop: choose tools → run them → analyze results → continue until a complete output is ready.

  • Example: A holiday-planning agent selects travel and weather tools, formulates queries, executes them, and composes a final itinerary for a booking site.
  • Designs range from a single agent loop to multi-agent systems with a supervisory coordinator.
  • Human-in-the-loop checkpoints are common in high-stakes domains for validation and trust.
Workflow of an AI agent tasked with assembling holiday packages: An external client sends a natural language request; the agent prompts the LLM to select tools and formulate queries, executes them, aggregates results, and returns a comprehensive itinerary.
AI agent workflow for holiday packages
Note

Agent runs often involve multiple LLM-tool iterations. Designs may use granular sub-agents overseen by a supervisor. Human approval steps can be embedded when required. LangChain—and especially LangGraph—enables controlled, modular orchestration for these workflows.
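
As a rough illustration of this loop, the following sketch uses LangGraph's prebuilt ReAct-style agent with two stubbed tools. The tool bodies, names, and model are placeholders, not the book's holiday-planning implementation; it assumes the langgraph and langchain-openai packages are installed.

# A minimal tool-using agent: the LLM decides which tools to call and iterates
# until it can produce a final answer.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def search_flights(destination: str) -> str:
    """Return available flights to a destination."""
    return f"Flights to {destination}: FL123 (09:10), FL456 (14:30)"  # stubbed data

@tool
def get_weather(city: str) -> str:
    """Return a short weather forecast for a city."""
    return f"{city}: sunny, 24°C"  # stubbed data

agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), [search_flights, get_weather])
result = agent.invoke({"messages": [("user", "Plan a beach holiday in Crete next week.")]})
print(result["messages"][-1].content)  # the agent's final answer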

Momentum around agents has accelerated with the Model Context Protocol (MCP), which standardizes how services expose tools via MCP servers and clients. With growing ecosystem support (including major providers), MCP reduces integration overhead and expands accessible toolsets for agents.

Takeaway: Engines deliver focused NL capabilities, chatbots add interactive dialogue with memory and grounding, and agents execute adaptive, multi-step plans across tools and data. LangChain’s components and LangGraph’s orchestration patterns make building all three practical and extensible.

1.4 Typical LLM use cases

This section outlines common, real-world applications of large language models (LLMs), spanning understanding, generation, reasoning, and automation, with brief examples and pointers to where deeper coverage appears in later chapters.

  • Text Classification and Sentiment Analysis: Categorize content and assess sentiment to drive actions like ticket routing or stock recommendations; exemplified by automated support ticket classification at GoDaddy.
  • Natural Language Understanding and Generation: Identify main topics and produce tailored summaries by length, tone, or terminology; Duolingo accelerates lesson creation. Summarization is covered in Chapters 3–4.
  • Semantic Search: Retrieve information by intent and context rather than keywords; used to enhance recipe search in a supermarket app. Related Q&A chatbot methods appear in Chapters 6–7, with advanced techniques in Chapters 8–10.
  • Autonomous Reasoning and Workflow Execution: Plan and execute multi-step tasks (e.g., booking a complete holiday) through agentic orchestration. Building agents with LangGraph is discussed in Chapter 12.
  • Structured Data Extraction: Pull entities and relationships from unstructured texts such as financial reports or news articles.
  • Code Understanding and Generation: Analyze, refactor, and create code from instructions; powers IDE assistants like GitHub Copilot and Cline AI, emerging editors such as Cursor and Windsurf, and CLI tools like Claude Code and OpenAI Codex.
  • Personalized Education and Tutoring: Deliver interactive, adaptive learning support; exemplified by Khan Academy’s Khanmigo.

A key caveat: these use cases presume the LLM can handle requests reliably, yet real-world tasks often span domains beyond initial training. The next section focuses on ensuring models meet user needs effectively in such scenarios.

1.5 How to adapt an LLM to your needs

This section outlines three escalating techniques to tailor an LLM’s behavior to your tasks and data: prompt engineering, Retrieval-Augmented Generation (RAG), and fine-tuning.

1.5.1 Prompt engineering

  • Design structured prompts—ranging from simple commands to rich instruction blocks with examples—to guide model behavior and improve accuracy.
  • Use in-context learning and few-shot prompting to teach patterns directly in the prompt, often via reusable templates with variable fields (see the sketch after this list).
  • Maintain conversational context (e.g., recent turns) to enable coherent multi-turn responses.
  • Powerful and lightweight, but limited when answers must be grounded in user- or enterprise-specific data.
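
A few-shot prompt built from a reusable template with variable fields might look like the following minimal sketch; the examples and sentiment labels are illustrative.

# Few-shot prompting: a prefix instruction, worked examples, and a suffix for new input.
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

example_prompt = PromptTemplate.from_template("Review: {review}\nSentiment: {sentiment}")
examples = [
    {"review": "The hotel was spotless and the staff were lovely.", "sentiment": "positive"},
    {"review": "Our flight was delayed twice and nobody helped.", "sentiment": "negative"},
]

few_shot = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Classify the sentiment of each review as positive or negative.",
    suffix="Review: {review}\nSentiment:",
    input_variables=["review"],
)
print(few_shot.format(review="The tour guide knew every hidden corner of the city."))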

1.5.2 Retrieval Augmented Generation (RAG)

  • Augment prompts with relevant snippets retrieved from your own knowledge base (typically a vector database) to ground responses in verified information.
  • Workflow: ingest documents, split into chunks, embed into vectors, store in a vector store, retrieve top matches by semantic similarity, and include them in the prompt.
Figure: A collection of documents is split into text chunks and transformed into vector-based embeddings; both the text chunks and their embeddings are then stored in a vector store.
  • Benefits:
    • Efficiency: retrieve only key chunks to control token use and respect context limits.
    • Accuracy: ground answers on real data to reduce hallucinations; can cite sources for transparency.
    • Flexibility: swap embedding models, retrievers, or vector stores per domain needs.
  • Reliability improves when prompts instruct the model to use only retrieved context; guardrails, validators, and human review can further enhance safety.
Definition

Grounding: adding trusted context (often from a vector store) to the prompt so the LLM relies on verified facts rather than only its pretraining.

Definition

Hallucination: when an LLM produces incorrect or fabricated content, often due to missing context or limitations in training data.

RAG bridges static pretrained knowledge and dynamic, domain-specific needs; if it’s not sufficient, consider fine-tuning.

1.5.3 Fine-tuning

  • Adapt a pretrained LLM to a specific task or domain by training on curated examples that capture desired style, terminology, and reasoning.
  • Main advantage: efficiency at inference—fewer long instructions or examples needed once the model internalizes your patterns.
  • Trade-offs: dataset preparation effort, compute cost (often GPUs), and operational complexity.
  • Parameter-efficient methods (e.g., LoRA) and approaches like instruction tuning and RLHF reduce cost and improve instruction following; a LoRA sketch follows this list.
  • Evidence suggests RAG often outperforms fine-tuning for less-popular knowledge by supplying context at runtime, reducing retraining.
  • Still essential for highly specialized domains (e.g., medicine, law, finance); noted examples include BioMistral, LexisNexis’s legal-domain LLM (“LexiGPT”), BloombergGPT, and Claude Code.
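
For orientation, a parameter-efficient LoRA setup with Hugging Face PEFT might look like the sketch below. It assumes the transformers and peft packages are installed; the base model and hyperparameters are illustrative, not recommendations.

# LoRA attaches small trainable adapter matrices to a frozen base model,
# so only a tiny fraction of parameters is updated during fine-tuning.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt only the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's weights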

Overall, start with prompt engineering, add RAG to ground answers in your data, and use fine-tuning when domain specialization and consistent behavior justify the added cost and complexity.

1.6 Which LLMs to choose

There is a broad spectrum of LLMs—proprietary and open source—available via APIs and chat interfaces, often in multiple size variants to balance performance, speed, and cost. LangChain’s standardized interface simplifies swapping models with minimal code changes, which is valuable in a fast-evolving ecosystem. Choosing the right model depends on your task, constraints, and deployment needs.

  • Model purpose: Most major families (GPT, Gemini, Claude, Llama, Mistral) handle general tasks like summarization, translation, classification, and sentiment analysis. For specialized tasks (e.g., code generation), pick fine-tuned options such as Claude Sonnet or Meta’s Code Llama.
  • Context window size: Larger windows support longer prompts and documents (ranging from ~128K–256K up to ~2M tokens), but increase latency and cost—especially with per-token pricing.
  • Multilingual support: Choose models trained broadly across languages if your app is multilingual. Qwen and Llama are strong across Western and Asian languages; some Gemma variants specialize (e.g., Japanese).
  • Model size: From small (≈1B) to very large (trillions) parameters. Smaller models are cheaper and faster and can be sufficient for simple tasks; “mini” or “nano” variants can deliver strong accuracy at lower cost and latency.
  • Speed: Smaller and mid-size models generally respond faster. For latency-sensitive apps (like chat), benchmark both quality and responsiveness.
  • Instruction vs. reasoning: Instruction models (e.g., GPT-4 series, Gemini Pro) excel at following clear directions—fast and economical. Reasoning models (e.g., OpenAI’s o-series, Gemini Thinking) plan and adapt when steps aren’t fully specified—more capable but typically slower and costlier. Choose based on whether you provide the plan or want the model to devise it.
  • Open-source vs. proprietary: Open-source (Llama, Mistral, Qwen, Falcon) offers stronger data control and on-prem/private-cloud deployment. Proprietary APIs are easy to adopt and often state-of-the-art, but long-term costs can be higher; many teams start with commercial APIs and later migrate for cost or compliance.

In practice, align the model to each task’s accuracy, speed, and cost needs, and consider a multi-model setup. For example: use a small model (e.g., GPT-4o Mini) for summarization and sentiment, a reasoning model (e.g., o3) to interpret and route queries, and a stronger instruction model (e.g., GPT-4.1) for final answer synthesis—balancing performance and budget across the workflow.
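
Such a multi-model setup stays simple behind LangChain's common chat-model interface, as in this minimal sketch; the model names are the examples above and can be swapped freely.

# Different models for different steps, all behind the same interface.
from langchain.chat_models import init_chat_model

summarizer = init_chat_model("gpt-4o-mini", model_provider="openai")  # cheap, fast tasks
router = init_chat_model("o3", model_provider="openai")               # reasoning and routing
writer = init_chat_model("gpt-4.1", model_provider="openai")          # final answer synthesis

# Because all three expose the same invoke() interface, swapping a model for cost,
# latency, or compliance reasons does not change the surrounding pipeline.
summary = summarizer.invoke("Summarize: the flight was delayed, but the crew was great.")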

1.7 What you’ll learn from this book

This section outlines the skills and outcomes you’ll gain—from core prompt engineering to building production-ready AI applications—using LangChain, LangGraph, and large language models.

  • Begin with prompt engineering to interact effectively with LLMs, first via ChatGPT, then programmatically through REST APIs.
  • Use LangChain to build:
    • Custom engines (e.g., summarization, Q&A).
    • Chatbots that combine conversational fluency with knowledge retrieval.
    All examples share a common travel-industry theme for coherence.
  • Advance to AI agents with LangGraph that orchestrate multi-step workflows, coordinate tools, and make adaptive decisions. You’ll start from a simple Python script and iteratively add capabilities like tool use, planning, and memory.
  • Deep dive into Retrieval-Augmented Generation (RAG) through focused scripts that cover both fundamentals and advanced workflows.
  • Work with both hosted (OpenAI) and open-source models via inference engines (see Appendix E) to balance cost, privacy, and control.
  • Cover the full application lifecycle: debugging, monitoring, and refinement with LangSmith; workflow orchestration with LangGraph; and production best practices for scalability and maintainability.

By the end, you’ll have a portfolio of working projects, mastery of key architectural patterns, and the confidence to design, implement, and evolve LLM-powered systems.

1.9 Summary

  • LLMs have rapidly evolved into core building blocks for modern applications, enabling tasks like summarization, semantic search, and conversational assistants.
  • Without frameworks, teams often reinvent the wheel, managing ingestion, embeddings, retrieval, and orchestration with brittle, one-off code. LangChain addresses this by standardizing these patterns into modular, reusable components.
  • LangChain’s modular architecture builds on loaders, splitters, embedding models, retrievers, and vector stores, making it straightforward to build engines such as summarization and Q&A systems.
  • Conversational use cases demand more than static pipelines: LLM-based chatbots extend engines with dialogue management and memory, allowing adaptive, multi-turn interactions.
  • AI agents represent the most advanced type of LLM application: they orchestrate multi-step workflows and tools under LLM guidance, with frameworks like LangGraph designed to make this practical and maintainable.
  • Retrieval-Augmented Generation (RAG) is a foundational pattern that grounds LLM outputs in external knowledge, improving accuracy while reducing hallucinations and token costs.
  • Prompt engineering remains a critical skill for shaping LLM behavior, but when prompts alone aren’t enough, RAG or even fine-tuning can extend capabilities further.
