Overview

1 Introduction to AI agents and applications

Large language models now underpin a wide range of software, from question answering and summarization to tool coordination and decision-making with AI agents. Despite varied use cases, most systems share a common pattern: accept natural language, enrich it with context, and prompt a model to produce useful output. Building reliable products introduces recurring challenges—data ingestion, prompt design, orchestration, evaluation, and cost control—which modern frameworks such as LangChain, LangGraph, and LangSmith address through modular abstractions and proven practices.

LLM-powered solutions cluster into three archetypes. Engines deliver focused capabilities like summarization and search; Q&A engines typically use Retrieval-Augmented Generation, transforming documents into embeddings stored in vector databases to ground answers in retrieved context. Chatbots add dialogue management, memory, role instructions, and optional retrieval to maintain coherent, safe, and adaptive conversations. Agents represent the most advanced class: they plan and execute multi-step workflows, choose and invoke tools and services, integrate structured and unstructured data, and can incorporate human-in-the-loop safeguards. Emerging standards such as the Model Context Protocol simplify tool integration and expand an agent’s reachable capabilities.

To build these systems effectively, LangChain offers a composable architecture—document loaders, text splitters, embedding models, vector stores, retrievers, prompt templates, model interfaces, caching, and output parsers—wired consistently via the Runnable interface and LCEL, and extended to graph-shaped flows with LangGraph for agentic control. The chapter outlines core adaptation techniques (prompt engineering, RAG, and fine-tuning) and guidance for model selection across accuracy, latency, and cost trade-offs. It also previews a hands-on path: constructing engines and chatbots, advancing to agents with LangGraph, deep-diving into RAG, and instrumenting with LangSmith—equipping you to design, evaluate, and scale production-grade AI applications.
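As a small illustration of how these pieces compose, here is a minimal LCEL sketch chaining a prompt template, a chat model, and an output parser. It assumes the langchain-openai package and an OPENAI_API_KEY in the environment; the model name and prompt wording are illustrative, not prescribed by the chapter:

    # Minimal LCEL sketch: prompt -> chat model -> string output.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI

    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a concise technical summarizer."),
        ("human", "Summarize the following text in two sentences:\n\n{text}"),
    ])
    model = ChatOpenAI(model="gpt-4o-mini")     # any chat model works here
    chain = prompt | model | StrOutputParser()  # each piece is a Runnable

    print(chain.invoke({"text": "LangChain composes loaders, retrievers, and models..."}))

Because every component implements the same Runnable interface, the pipe operator yields a chain that can itself be invoked, streamed, or embedded in a larger graph.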

Figures

  • A summarization engine efficiently summarizes and stores content from large volumes of text and can be invoked by other systems through a REST API.
  • A Q&A engine implemented with the RAG design: an LLM query engine stores domain-specific document information in a vector store. When an external system sends a query, the engine converts the natural language question into its embedding (vector) representation, retrieves the related documents from the vector store, and then gives the LLM the information it needs to craft a natural language response.
  • A summarization chatbot shares some similarities with a summarization engine, but it offers an interactive experience in which the LLM and the user work together to refine and improve the results.
  • Sequence diagram outlining how a user interacts with an LLM through a chatbot to create a more concise summary.
  • Workflow of an AI agent tasked with assembling holiday packages: an external client system sends a customized holiday request in natural language. The agent prompts the LLM to select tools and formulate queries in technology-specific formats. The agent executes the queries, gathers the results, and sends them back to the LLM, along with the original request, to obtain a comprehensive holiday package summary. Finally, the agent forwards the summarized package to the client system.
  • LangChain architecture: the Document Loader imports data, which the Text Splitter divides into chunks. These are vectorized by an Embedding Model, stored in a Vector Store, and retrieved through a Retriever for the LLM. The LLM Cache checks for prior requests to return cached responses, while the Output Parser formats the LLM's final response.
  • Object model of the classes associated with the Document core entity, including document loaders (which create Document objects), splitters (which split documents into lists of Document objects), vector stores (which store Document objects), and retrievers (which retrieve Document objects from vector stores and other sources).
  • Object model of the classes associated with language models, including Prompt Templates and Prompt Values.
  • A collection of documents is split into text chunks and transformed into vector-based embeddings; both the text chunks and their embeddings are then stored in a vector store.

Summary

  • LLMs have rapidly evolved into core building blocks for modern applications, enabling tasks like summarization, semantic search, and conversational assistants.
  • Without frameworks, teams often reinvent the wheel—managing ingestion, embeddings, retrieval, and orchestration with brittle, one-off code. LangChain addresses this by standardizing these patterns into modular, reusable components.
  • LangChain’s modular architecture builds on loaders, splitters, embedding models, retrievers, and vector stores, making it straightforward to build engines such as summarization and Q&A systems.
  • Conversational use cases demand more than static pipelines. LLM-based chatbots extend engines with dialogue management and memory, allowing adaptive, multi-turn interactions.
  • Beyond chatbots, AI agents represent the most advanced type of LLM application.
  • Agents orchestrate multi-step workflows and tools under LLM guidance, with frameworks like LangGraph designed to make this practical and maintainable.
  • Retrieval-Augmented Generation (RAG) is a foundational pattern that grounds LLM outputs in external knowledge, improving accuracy while reducing hallucinations and token costs.
  • Prompt engineering remains a critical skill for shaping LLM behavior, but when prompts alone aren’t enough, RAG or even fine-tuning can extend capabilities further.

FAQ

What types of LLM-powered systems does this chapter define, and how do they differ?
LLM-based engines provide a single capability (for example, summarization, search, or Q&A) via APIs. Chatbots add a conversational interface with memory and prompt controls to keep multi-turn dialogue coherent and safe. AI agents go further: they plan and execute multi-step tasks, select and call tools/APIs, branch based on results, and iterate until they reach a goal.
How does a RAG-based Q&A engine work end to end?
Two phases. (1) Ingestion: load documents, split them into chunks, create embeddings, and store the chunks plus vectors in a vector store. (2) Query: embed the user question, retrieve the most similar chunks, compose a prompt with the question and retrieved context, and have the LLM generate a grounded answer.
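A minimal sketch of both phases, assuming the langchain-openai and langchain-text-splitters packages, an OPENAI_API_KEY in the environment, and a hypothetical policy_manual.txt source file:

    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.vectorstores import InMemoryVectorStore
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Phase 1: ingestion -- split the source and index chunk embeddings.
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_text(open("policy_manual.txt").read())
    store = InMemoryVectorStore(OpenAIEmbeddings())
    store.add_texts(chunks)

    # Phase 2: query -- embed the question, retrieve similar chunks, generate.
    retriever = store.as_retriever(search_kwargs={"k": 4})
    question = "What is the refund policy?"
    context = "\n\n".join(d.page_content for d in retriever.invoke(question))
    prompt = ChatPromptTemplate.from_template(
        "Answer using only this context:\n{context}\n\nQuestion: {question}"
    )
    answer = (prompt | ChatOpenAI() | StrOutputParser()).invoke(
        {"context": context, "question": question}
    )

In production you would swap the in-memory store for a persistent vector database; the two-phase shape stays the same.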
What are embeddings and why are they important?
Embeddings map text into high-dimensional vectors that capture semantic meaning. They enable similarity search, letting systems retrieve conceptually related content (not just keyword matches). Embeddings are central to RAG, semantic search, clustering, routing, and evaluation tasks.
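A small sketch of semantic similarity with embeddings, assuming langchain-openai and an OPENAI_API_KEY (the sentences are illustrative):

    import math
    from langchain_openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()
    a = embeddings.embed_query("How do I reset my password?")
    b = embeddings.embed_query("Steps to recover account access")

    # Cosine similarity: closer to 1.0 for semantically related text,
    # even though the two sentences share almost no keywords.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    print(dot / norm)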
Which LangChain components make up a typical RAG or LLM app?
Key pieces include: Document Loaders, Text Splitters, Embedding Models, Vector Stores, Retrievers, Prompt Templates, LLM/ChatModel, optional LLM Cache, and Output Parsers. These are composed via the Runnable interface and LCEL to form reliable, maintainable chains.
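Most of these components appear in the RAG sketch above; the optional LLM Cache is enabled separately. A minimal sketch, assuming langchain-core and langchain-openai:

    from langchain_core.caches import InMemoryCache
    from langchain_core.globals import set_llm_cache
    from langchain_openai import ChatOpenAI

    set_llm_cache(InMemoryCache())  # applies process-wide to model calls

    model = ChatOpenAI()
    model.invoke("Define RAG in one sentence.")  # hits the API
    model.invoke("Define RAG in one sentence.")  # served from the cache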
What core challenges in LLM apps does the chapter highlight?

  • Getting your own data into the model without dumping full documents into prompts
  • Keeping prompts, chains, and integrations maintainable as features grow
  • Handling context limits and token costs while preserving accuracy
  • Orchestrating multi-step workflows and API calls robustly
  • Evaluating, debugging, and monitoring behavior in production
How do chatbots maintain context and improve accuracy?
They use role-based prompts (system/user/assistant), conversation memory to track prior turns, and context compression/summarization to fit within context windows and control cost. They often ground answers with retrieved facts from vector stores to improve reliability and reduce hallucinations.
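A minimal multi-turn sketch, keeping a running message history alongside a system prompt (assumes langchain-openai and an OPENAI_API_KEY; the turns are illustrative):

    from langchain_core.messages import HumanMessage, SystemMessage
    from langchain_openai import ChatOpenAI

    model = ChatOpenAI()
    history = [SystemMessage("You are a helpful, concise assistant.")]

    for user_turn in ["Summarize photosynthesis.", "Now in one sentence."]:
        history.append(HumanMessage(user_turn))
        reply = model.invoke(history)  # prior turns give the model its context
        history.append(reply)          # reply is an AIMessage

The second request only works because the history carries the first exchange; in production you would also trim or summarize old turns to stay within the context window.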
What is an AI agent and what does its decision loop look like?
An agent uses an LLM to decide which tools to call, executes those tools, inspects results, and iterates until it produces a final solution. It can mix structured data (SQL/APIs) and unstructured sources (docs/web), support human-in-the-loop approvals, and even use multi-agent designs with a supervisor coordinating specialized sub-agents.
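A sketch of that loop using LangChain tool calling, with a single hypothetical stub tool (assumes langchain-openai and an OPENAI_API_KEY):

    from langchain_core.messages import HumanMessage, ToolMessage
    from langchain_core.tools import tool
    from langchain_openai import ChatOpenAI

    @tool
    def get_flight_price(origin: str, destination: str) -> str:
        """Look up a flight price (stubbed for illustration)."""
        return "EUR 120"

    model = ChatOpenAI().bind_tools([get_flight_price])
    messages = [HumanMessage("How much is a flight from Rome to Paris?")]

    while True:
        response = model.invoke(messages)
        messages.append(response)
        if not response.tool_calls:       # no more tools needed: final answer
            break
        for call in response.tool_calls:  # execute each requested tool
            result = get_flight_price.invoke(call["args"])
            messages.append(ToolMessage(result, tool_call_id=call["id"]))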
How do LangGraph and LangChain support building agents?
LangChain supplies modular components and chaining via LCEL. LangGraph adds graph-based orchestration and pre-built agent/orchestrator classes plus tool integrations, enabling dynamic, branching workflows (not just linear pipelines) for planning, tool use, memory, and supervision.
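With LangGraph's prebuilt ReAct-style agent constructor, the hand-rolled loop above collapses to a few lines. A sketch reusing the hypothetical get_flight_price tool, assuming the langgraph package (names may vary by version):

    from langgraph.prebuilt import create_react_agent
    from langchain_openai import ChatOpenAI

    agent = create_react_agent(ChatOpenAI(), tools=[get_flight_price])
    result = agent.invoke(
        {"messages": [("user", "Find me a cheap flight from Rome to Paris")]}
    )
    print(result["messages"][-1].content)  # final assistant message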
What is the Model Context Protocol (MCP) and why does it matter?
MCP standardizes how services expose tools via MCP servers that agents can access through MCP clients, as if the tools were local. It shifts integration work to the tool provider, reduces custom connectors, and broadens an agent's toolset through public MCP portals.
How should I choose an LLM for my application?
Balance accuracy, speed (latency), and cost. Also consider context window size, multilingual support, instruction vs. reasoning models, and open-source vs. proprietary trade-offs. Many systems mix models (e.g., small for fast tasks, larger for complex reasoning) to optimize end-to-end performance and cost.
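One common way to mix models is a simple router. A sketch, where the model names and the length heuristic are illustrative assumptions rather than recommendations:

    from langchain_openai import ChatOpenAI

    fast_model = ChatOpenAI(model="gpt-4o-mini")  # cheap, low latency
    strong_model = ChatOpenAI(model="gpt-4o")     # costlier, better reasoning

    def answer(question: str) -> str:
        # Route short, simple requests to the small model; everything
        # else goes to the larger one. Real routers often use a cheap
        # classifier call instead of a length threshold.
        model = fast_model if len(question) < 200 else strong_model
        return model.invoke(question).content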
