Overview

1 Introduction to AI Agents and Applications

This chapter frames how large language models (LLMs) have evolved from novelty to a core application primitive and introduces AI agents as a new class of applications that plan, call tools, and orchestrate multi-step workflows from natural language input.

  • Motivation: Post-ChatGPT, LLMs enable answering complex questions, tailored content generation, document summarization, and cross-system coordination—culminating in agents that act on behalf of users.
  • Core challenges: Teams repeatedly face data ingestion and management, prompt design, reliable chaining of model calls, and integration with external APIs and services.
  • Frameworks as leverage: LangChain, LangGraph, and LangSmith provide modular building blocks to reduce boilerplate, encode best practices, and let developers focus on application logic rather than low-level wiring.
  • Application families: The chapter orients readers to three common LLM-powered patterns—engines, chatbots, and agents—each with distinct roles and orchestration needs.
  • Foundational patterns: Introduces prompt engineering and Retrieval-Augmented Generation (RAG) as recurring techniques used throughout the book.

By the end of the chapter, readers gain a clear map of the problem space, an overview of LangChain’s architecture and object model, and the key patterns and frameworks that will be used to design, build, and scale real LLM applications and agents.

1.1 Introducing LangChain

This section introduces LangChain as a framework that solves recurring challenges in building LLM applications: robust data ingestion, maintainable prompts and chains, managing context limits and costs, orchestrating multi-step workflows, and enabling evaluation/debugging/monitoring. LangChain standardizes these patterns into modular, composable components and provides a consistent chaining model through the Runnable interface and the LangChain Expression Language (LCEL). It is guided by modularity, composability, and extensibility, enabling easy swaps of models and stores, custom integrations, and dynamic agent workflows. Learning these patterns equips developers with transferable skills across similar frameworks.

  • Workflow overview: ingest data, split into chunks, embed into vectors, store in a vector database, retrieve relevant context, construct prompts, call an LLM, and parse outputs.
  • RAG backbone: vector stores power most retrieval-augmented generation use cases; graph databases (e.g., Neo4j) complement scenarios requiring entity/relationship reasoning, memory, or planning.
  • Composition tools: Runnable + LCEL provide clean, debuggable pipelines (a minimal sketch follows this list); LangGraph supports graph-shaped, branching flows for advanced orchestration.
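
As a concrete illustration of the Runnable/LCEL composition described above, the following minimal sketch chains a prompt template, a chat model, and an output parser. It assumes the langchain-openai package is installed and OPENAI_API_KEY is set; the model name and prompt text are only examples.

# Minimal LCEL pipeline: prompt -> model -> output parser (illustrative).
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Summarize the following text in one sentence:\n\n{text}"
)
llm = ChatOpenAI(model="gpt-4o-mini")  # any chat model integration could be swapped in
parser = StrOutputParser()

# The | operator chains Runnables: the prompt's output feeds the model,
# and the model's output feeds the parser.
chain = prompt | llm | parser
print(chain.invoke({"text": "LangChain standardizes common LLM application patterns."}))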

Core components (high level):

  • Document loaders: extract data from files, databases, or websites into Document objects with content and metadata.
  • Text splitters: chunk large texts to fit context windows and improve indexing and retrieval.
  • Document: a unit of content plus metadata (e.g., source, page).
  • Embedding models: convert text chunks to semantic vectors.
  • Vector stores: index embeddings for fast similarity search; serve as offline knowledge bases.
  • Knowledge graph databases: optional graph stores for entities/relationships and graph-based reasoning.
  • Retrievers: fetch relevant Documents from vector, relational, or graph stores.
  • Prompts: reusable templates that combine user input and retrieved context; support techniques like few-shot prompting.
  • LLM Cache: optional cache to reduce latency and cost for repeat queries.
  • LLM / ChatModel: interfaces to various providers (and a fake model for testing).
  • Output Parser: structures LLM responses (e.g., JSON) for reliable downstream use.

Composition patterns and app types:

  • Chains: linear pipelines tailored to specific tasks.
  • Agents: dynamic workflows that select tools at runtime; tools collectively form a toolkit.
  • Primary applications: summarization and query services, chatbots, and agents.
Figure: LangChain architecture. The Document Loader imports data, which the Text Splitter divides into chunks. These are vectorized by an Embedding Model, stored in a Vector Store, and retrieved through a Retriever for the LLM. The LLM Cache checks for prior requests to return cached responses, while the Output Parser formats the LLM's final response.
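
The ingestion-and-retrieval flow in the figure can be sketched with the components listed above. This is a minimal, illustrative example, assuming the langchain-community, langchain-text-splitters, langchain-openai, and faiss-cpu packages are installed; the file name, chunk sizes, and query are placeholders.

# Ingestion: load -> split -> embed -> index; then build a retriever (illustrative).
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs = TextLoader("knowledge_base.txt").load()        # Document loader -> Document objects
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)                               # Text splitter -> smaller Documents
vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())  # embed and index the chunks
retriever = vector_store.as_retriever(search_kwargs={"k": 4})    # retriever over the vector store

relevant_docs = retriever.invoke("What does the knowledge base say about refunds?")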

1.2 LangChain core object model

LangChain’s core object model is organized as class hierarchies that center on the Document entity. Loaders create Document objects from raw sources, splitters divide them into manageable chunks, and these segments are stored in vector databases and accessed via retrievers for downstream tasks. This structure clarifies how data flows through the framework and how components interoperate.

  • Document
  • DocumentLoader
  • TextSplitter
  • VectorStore
  • Retriever

LangChain integrates broadly with third-party tools across these components and also offers the community-driven LangChain Hub for sharing reusable prompts, chains, and tools. A unifying feature across many components is the Runnable interface, which enables consistent composition and chaining; this underpins highly modular workflows and is further enhanced by the LangChain Expression Language (LCEL) for building expressive LLM pipelines.

On the language model side, the object model encompasses PromptTemplate and PromptValue, which connect to the LLM interface. This hierarchy is somewhat more complex than the Document-centric flow and governs how prompts are structured, rendered, and passed to models.
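
For a sense of how these classes interact, the following minimal sketch shows a PromptTemplate rendering user input into a PromptValue; the template text is illustrative.

# A PromptTemplate turns input variables into a PromptValue that models can consume.
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "Translate the following text to {language}:\n\n{text}"
)
prompt_value = template.invoke({"language": "Italian", "text": "Good morning"})

print(prompt_value.to_string())    # rendered string for completion-style LLMs
print(prompt_value.to_messages())  # list of messages for chat models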

Object model of classes associated with the Document core entity, including document loaders (which create Document objects), splitters (which produce lists of Document objects), vector stores (which index Document objects), and retrievers (which fetch Document objects from vector stores and other sources).
Object model of classes associated with language models, including PromptTemplate and PromptValue.

1.3 Building LLM applications and AI agents

LLMs excel at understanding and generating natural language and now power a wide range of applications across industries. Despite varied use cases, most LLM apps share a core pattern: accept natural language input, enrich it with relevant context, and construct a prompt for the model. This section outlines three major application types—LLM-based engines, chatbots, and AI agents—and how frameworks like LangChain and LangGraph streamline building them.

  • LLM-based applications or engines: Focused capabilities such as summarization, search, Q&A, or content generation
  • Chatbots: Conversational systems that maintain context, apply role instructions, and can ground answers in local knowledge
  • AI agents: Autonomous or semi-autonomous systems that plan and execute multi-step tasks with tools and external data

1.3.1 LLM-based applications: summarization and Q&A engines

Engines act as backend services that handle specific NL tasks for other systems. Summarization engines condense long texts and expose results via APIs; Q&A engines answer questions against a knowledge base using a two-phase pipeline: ingestion and query. Ingestion turns documents into embeddings and stores them (and their chunks) in a vector store; querying converts a user question to an embedding, retrieves relevant chunks, and composes a prompt for the LLM. This pattern is known as Retrieval-Augmented Generation (RAG).

A summarization engine condenses and stores content from large volumes of text and can be invoked by other systems through a REST API.
Summarization engine diagram
A Q&A engine implemented with the RAG design: an LLM query engine stores domain-specific document information in a vector store. When an external system sends a query, the engine converts the natural language question into its embedding representation, retrieves related documents, and gives the LLM the information it needs to craft a grounded response.
RAG Q&A engine diagram
Definition

Embeddings are high-dimensional vector representations of text (from words to document chunks) that capture semantic similarity and context, enabling efficient retrieval and reasoning.

Definition

Retrieval-Augmented Generation (RAG) augments an LLM’s generation with retrieved, domain-specific context (often from a vector store) at query time.

LangChain provides modular building blocks—loaders, splitters, embedding models, vector stores, and retrievers—so you can assemble engines with minimal boilerplate. Engines can also orchestrate tools and APIs, translating NL instructions into queries and presenting results cleanly.
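
A query-phase RAG chain might look like the following minimal sketch. It assumes a retriever built during ingestion (as in the earlier ingestion sketch) and a configured langchain-openai installation; the prompt wording and model name are illustrative.

# RAG query chain: retrieve context, build the prompt, call the LLM, parse the answer.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

def format_docs(docs):
    # Concatenate retrieved Document contents into one context string.
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

# `retriever` comes from the ingestion step (see the earlier sketch).
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)
answer = rag_chain.invoke("What is the refund policy for cancelled tours?")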

1.3.2 LLM-based chatbots

Chatbots add interactive, multi-turn conversations to LLM capabilities. They rely on strong prompt design and role-based messaging to keep outputs relevant and safe, and often ground responses with local knowledge via vector stores. Conversation memory maintains coherence across turns, typically using summarization or compression to fit within context windows. Many chatbots specialize (e.g., summarization, Q&A, translation) while adapting to user feedback in real time.
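
A minimal sketch of a multi-turn chatbot with per-session conversation memory, assuming langchain-openai is configured; the system prompt, session handling, and model name are illustrative.

# Chatbot with message history: prior turns are injected into the prompt on each call.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful travel assistant. Keep answers concise."),
    MessagesPlaceholder("history"),   # earlier turns go here
    ("human", "{input}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

sessions = {}  # session_id -> chat history (in-memory only, for illustration)

def get_history(session_id: str):
    return sessions.setdefault(session_id, InMemoryChatMessageHistory())

chatbot = RunnableWithMessageHistory(
    chain, get_history, input_messages_key="input", history_messages_key="history"
)
reply = chatbot.invoke(
    {"input": "Plan a weekend in Rome."},
    config={"configurable": {"session_id": "user-42"}},
)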

A summarization chatbot has some similarities with a summarization engine, but it offers an interactive experience where the LLM and the user can work together to fine-tune and improve the results.
Summarization chatbot architecture
Sequence diagram that outlines how a user interacts with an LLM through a chatbot to create a more concise summary.
Chatbot summarization sequence diagram

1.3.3 AI agents

AI agents coordinate multi-step workflows by selecting tools, executing them, and iteratively deciding next actions with LLM guidance. They integrate structured (APIs, databases) and unstructured sources (documents, web) to produce end-to-end solutions. A typical agent loop: choose tools → run them → analyze results → continue until a complete output is ready.

  • Example: A holiday-planning agent selects travel and weather tools, formulates queries, executes them, and composes a final itinerary for a booking site.
  • Designs range from a single agent loop to multi-agent systems with a supervisory coordinator.
  • Human-in-the-loop checkpoints are common in high-stakes domains for validation and trust.
Workflow of an AI agent tasked with assembling holiday packages: An external client sends a natural language request; the agent prompts the LLM to select tools and formulate queries, executes them, aggregates results, and returns a comprehensive itinerary.
AI agent workflow for holiday packages
Note

Agent runs often involve multiple LLM-tool iterations. Designs may use granular sub-agents overseen by a supervisor. Human approval steps can be embedded when required. LangChain—and especially LangGraph—enables controlled, modular orchestration for these workflows.
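
As a rough illustration of this loop, the following sketch uses LangGraph's prebuilt ReAct-style agent with two stubbed tools. The tool bodies, names, and model are placeholders, not the book's holiday-planning implementation; it assumes the langgraph and langchain-openai packages are installed.

# A minimal tool-using agent: the LLM decides which tools to call and iterates
# until it can produce a final answer.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def search_flights(destination: str) -> str:
    """Return available flights to a destination."""
    return f"Flights to {destination}: FL123 (09:10), FL456 (14:30)"  # stubbed data

@tool
def get_weather(city: str) -> str:
    """Return a short weather forecast for a city."""
    return f"{city}: sunny, 24°C"  # stubbed data

agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), [search_flights, get_weather])
result = agent.invoke({"messages": [("user", "Plan a beach holiday in Crete next week.")]})
print(result["messages"][-1].content)  # the agent's final answer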

Momentum around agents has accelerated with the Model Context Protocol (MCP), which standardizes how services expose tools via MCP servers and clients. With growing ecosystem support (including major providers), MCP reduces integration overhead and expands accessible toolsets for agents.

Takeaway: Engines deliver focused NL capabilities, chatbots add interactive dialogue with memory and grounding, and agents execute adaptive, multi-step plans across tools and data. LangChain’s components and LangGraph’s orchestration patterns make building all three practical and extensible.

1.4 Typical LLM use cases

This section outlines common, real-world applications of large language models (LLMs), spanning understanding, generation, reasoning, and automation, with brief examples and pointers to where deeper coverage appears in later chapters.

  • Text Classification and Sentiment Analysis: Categorize content and assess sentiment to drive actions like ticket routing or stock recommendations; exemplified by automated support ticket classification at GoDaddy.
  • Natural Language Understanding and Generation: Identify main topics and produce tailored summaries by length, tone, or terminology; Duolingo accelerates lesson creation. Summarization is covered in Chapters 3–4.
  • Semantic Search: Retrieve information by intent and context rather than keywords; used to enhance recipe search in a supermarket app. Related Q&A chatbot methods appear in Chapters 6–7, with advanced techniques in Chapters 8–10.
  • Autonomous Reasoning and Workflow Execution: Plan and execute multi-step tasks (e.g., booking a complete holiday) through agentic orchestration. Building agents with LangGraph is discussed in Chapter 12.
  • Structured Data Extraction: Pull entities and relationships from unstructured texts such as financial reports or news articles.
  • Code Understanding and Generation: Analyze, refactor, and create code from instructions; powers IDE assistants like GitHub Copilot and Cline AI, emerging editors such as Cursor and Windsurf, and CLI tools like Claude Code and OpenAI Codex.
  • Personalized Education and Tutoring: Deliver interactive, adaptive learning support; exemplified by Khan Academy’s Khanmigo.

A key caveat: these use cases presume the LLM can handle requests reliably, yet real-world tasks often span domains beyond initial training. The next section focuses on ensuring models meet user needs effectively in such scenarios.

1.5 How to adapt an LLM to your needs

This section outlines three escalating techniques to tailor an LLM’s behavior to your tasks and data: prompt engineering, Retrieval-Augmented Generation (RAG), and fine-tuning.

1.5.1 Prompt engineering

  • Design structured prompts—ranging from simple commands to rich instruction blocks with examples—to guide model behavior and improve accuracy.
  • Use in-context learning and few-shot prompting to teach patterns directly in the prompt, often via reusable templates with variable fields (see the sketch after this list).
  • Maintain conversational context (e.g., recent turns) to enable coherent multi-turn responses.
  • Powerful and lightweight, but limited when answers must be grounded in user- or enterprise-specific data.
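
A few-shot prompt built from a reusable template with variable fields might look like the following minimal sketch; the examples and sentiment labels are illustrative.

# Few-shot prompting: a prefix instruction, worked examples, and a suffix for new input.
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

example_prompt = PromptTemplate.from_template("Review: {review}\nSentiment: {sentiment}")
examples = [
    {"review": "The hotel was spotless and the staff were lovely.", "sentiment": "positive"},
    {"review": "Our flight was delayed twice and nobody helped.", "sentiment": "negative"},
]

few_shot = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Classify the sentiment of each review as positive or negative.",
    suffix="Review: {review}\nSentiment:",
    input_variables=["review"],
)
print(few_shot.format(review="The tour guide knew every hidden corner of the city."))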

1.5.2 Retrieval Augmented Generation (RAG)

  • Augment prompts with relevant snippets retrieved from your own knowledge base (typically a vector database) to ground responses in verified information.
  • Workflow: ingest documents, split into chunks, embed into vectors, store in a vector store, retrieve top matches by semantic similarity, and include them in the prompt.
Figure: A collection of documents is split into text chunks and transformed into vector-based embeddings; both the text chunks and their embeddings are then stored in a vector store.
  • Benefits:
    • Efficiency: retrieve only key chunks to control token use and respect context limits.
    • Accuracy: ground answers on real data to reduce hallucinations; can cite sources for transparency.
    • Flexibility: swap embedding models, retrievers, or vector stores per domain needs.
  • Reliability improves when prompts instruct the model to use only retrieved context; guardrails, validators, and human review can further enhance safety.
Definition

Grounding: adding trusted context (often from a vector store) to the prompt so the LLM relies on verified facts rather than only its pretraining.

Definition

Hallucination: when an LLM produces incorrect or fabricated content, often due to missing context or limitations in training data.

RAG bridges static pretrained knowledge and dynamic, domain-specific needs; if it’s not sufficient, consider fine-tuning.

1.5.3 Fine-tuning

  • Adapt a pretrained LLM to a specific task or domain by training on curated examples that capture desired style, terminology, and reasoning.
  • Main advantage: efficiency at inference—fewer long instructions or examples needed once the model internalizes your patterns.
  • Trade-offs: dataset preparation effort, compute cost (often GPUs), and operational complexity.
  • Parameter-efficient methods (e.g., LoRA) and approaches like instruction tuning and RLHF reduce cost and improve instruction following; a LoRA sketch follows this list.
  • Evidence suggests RAG often outperforms fine-tuning for less-popular knowledge by supplying context at runtime, reducing retraining.
  • Still essential for highly specialized domains (e.g., medicine, law, finance); noted examples include BioMistral, LexisNexis’s legal-domain LLM (“LexiGPT”), BloombergGPT, and Claude Code.
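
For orientation, a parameter-efficient LoRA setup with Hugging Face PEFT might look like the sketch below. It assumes the transformers and peft packages are installed; the base model and hyperparameters are illustrative, not recommendations.

# LoRA attaches small trainable adapter matrices to a frozen base model,
# so only a tiny fraction of parameters is updated during fine-tuning.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt only the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's weights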

Overall, start with prompt engineering, add RAG to ground answers in your data, and use fine-tuning when domain specialization and consistent behavior justify the added cost and complexity.

1.6 Which LLMs to choose

There is a broad spectrum of LLMs—proprietary and open source—available via APIs and chat interfaces, often in multiple size variants to balance performance, speed, and cost. LangChain’s standardized interface simplifies swapping models with minimal code changes, which is valuable in a fast-evolving ecosystem. Choosing the right model depends on your task, constraints, and deployment needs.

  • Model purpose: Most major families (GPT, Gemini, Claude, Llama, Mistral) handle general tasks like summarization, translation, classification, and sentiment analysis. For specialized tasks (e.g., code generation), pick fine-tuned options such as Claude Sonnet or Meta’s Code Llama.
  • Context window size: Larger windows support longer prompts and documents (ranging from ~128K–256K up to ~2M tokens), but increase latency and cost—especially with per-token pricing.
  • Multilingual support: Choose models trained broadly across languages if your app is multilingual. Qwen and Llama are strong across Western and Asian languages; some Gemma variants specialize (e.g., Japanese).
  • Model size: From small (≈1B) to very large (trillions) parameters. Smaller models are cheaper and faster and can be sufficient for simple tasks; “mini” or “nano” variants can deliver strong accuracy at lower cost and latency.
  • Speed: Smaller and mid-size models generally respond faster. For latency-sensitive apps (like chat), benchmark both quality and responsiveness.
  • Instruction vs. reasoning: Instruction models (e.g., GPT-4 series, Gemini Pro) excel at following clear directions—fast and economical. Reasoning models (e.g., OpenAI’s o-series, Gemini Thinking) plan and adapt when steps aren’t fully specified—more capable but typically slower and costlier. Choose based on whether you provide the plan or want the model to devise it.
  • Open-source vs. proprietary: Open-source (Llama, Mistral, Qwen, Falcon) offers stronger data control and on-prem/private-cloud deployment. Proprietary APIs are easy to adopt and often state-of-the-art, but long-term costs can be higher; many teams start with commercial APIs and later migrate for cost or compliance.

In practice, align the model to each task’s accuracy, speed, and cost needs, and consider a multi-model setup. For example: use a small model (e.g., GPT-4o Mini) for summarization and sentiment, a reasoning model (e.g., o3) to interpret and route queries, and a stronger instruction model (e.g., GPT-4.1) for final answer synthesis—balancing performance and budget across the workflow.
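
Such a multi-model setup stays simple behind LangChain's common chat-model interface, as in this minimal sketch; the model names are the examples above and can be swapped freely.

# Different models for different steps, all behind the same interface.
from langchain.chat_models import init_chat_model

summarizer = init_chat_model("gpt-4o-mini", model_provider="openai")  # cheap, fast tasks
router = init_chat_model("o3", model_provider="openai")               # reasoning and routing
writer = init_chat_model("gpt-4.1", model_provider="openai")          # final answer synthesis

# Because all three expose the same invoke() interface, swapping a model for cost,
# latency, or compliance reasons does not change the surrounding pipeline.
summary = summarizer.invoke("Summarize: the flight was delayed, but the crew was great.")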

1.7 What you’ll learn from this book

This section outlines the skills and outcomes you’ll gain—from core prompt engineering to building production-ready AI applications—using LangChain, LangGraph, and large language models.

  • Begin with prompt engineering to interact effectively with LLMs, first via ChatGPT, then programmatically through REST APIs.
  • Use LangChain to build:
    • Custom engines (e.g., summarization, Q&A).
    • Chatbots that combine conversational fluency with knowledge retrieval.
    All examples share a common travel-industry theme for coherence.
  • Advance to AI agents with LangGraph that orchestrate multi-step workflows, coordinate tools, and make adaptive decisions. You’ll start from a simple Python script and iteratively add capabilities like tool use, planning, and memory.
  • Deep dive into Retrieval-Augmented Generation (RAG) through focused scripts that cover both fundamentals and advanced workflows.
  • Work with both hosted (OpenAI) and open-source models via inference engines (see Appendix E) to balance cost, privacy, and control.
  • Cover the full application lifecycle: debugging, monitoring, and refinement with LangSmith; workflow orchestration with LangGraph; and production best practices for scalability and maintainability.

By the end, you’ll have a portfolio of working projects, mastery of key architectural patterns, and the confidence to design, implement, and evolve LLM-powered systems.

1.9 Summary

  • LLMs have rapidly evolved into core building blocks for modern applications, enabling tasks like summarization, semantic search, and conversational assistants.
  • Without frameworks, teams often reinvent the wheel, managing ingestion, embeddings, retrieval, and orchestration with brittle, one-off code. LangChain addresses this by standardizing these patterns into modular, reusable components.
  • LangChain’s modular architecture builds on loaders, splitters, embedding models, retrievers, and vector stores, making it straightforward to build engines such as summarization and Q&A systems.
  • Conversational use cases demand more than static pipelines: LLM-based chatbots extend engines with dialogue management and memory, allowing adaptive, multi-turn interactions.
  • AI agents represent the most advanced type of LLM application: they orchestrate multi-step workflows and tools under LLM guidance, with frameworks like LangGraph designed to make this practical and maintainable.
  • Retrieval-Augmented Generation (RAG) is a foundational pattern that grounds LLM outputs in external knowledge, improving accuracy while reducing hallucinations and token costs.
  • Prompt engineering remains a critical skill for shaping LLM behavior, but when prompts alone aren’t enough, RAG or even fine-tuning can extend capabilities further.
