Overview

1 The World of Large Language Models

This chapter opens by tracing how our uniquely human capacity for language led to the field of natural language processing and, eventually, to deep learning breakthroughs that made contemporary large language models possible. It contrasts early, narrowly scoped voice assistants with today’s models that can sustain open-ended dialogue, summarize and reason across diverse domains, and feel far more conversational. Rather than dwelling on mathematical detail, the chapter frames LLMs as practical building blocks inside broader machine-learning systems and sets the book’s goal: guiding readers through real-world uses of these models and how to build effective applications around them.

At their core, LLMs learn probabilistic patterns of language from vast text corpora and use that knowledge to predict and generate coherent, context-aware text. The chapter explains pretraining (learning general language patterns) and fine-tuning (specializing to domains), and highlights the immense data and compute required—often distributed training on GPUs or TPUs. Beyond text-only systems, it notes the rise of multimodal models that integrate text, images, and audio for more human-like perception and response. A tour of applications spans conversational assistants, text and code generation, retrieval and classification, recommendations, content editing, and autonomous, agent-like task execution. Special attention is given to Retrieval-Augmented Generation, which couples targeted document lookup with generation to produce more grounded, up-to-date answers from curated knowledge sources.
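The next-word prediction at the heart of this can be illustrated with a deliberately tiny sketch: a bigram model that estimates the probability of each word given the one before it. The toy corpus and helper names below are invented for illustration; real LLMs use Transformer networks trained over billions of tokens, not frequency counts.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the "vast text corpora" used in pretraining.
corpus = "the cat sat on the mat the dog sat on the rug".split()

# Count bigram frequencies: how often each word follows another.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word and its estimated probability."""
    counts = bigrams[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

print(predict_next("the"))  # most likely successor of "the" in this corpus
```

Sampling from such conditional distributions, one token at a time, is conceptually what generation means; the leap to LLMs is in conditioning on long contexts with learned representations rather than raw counts.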

The chapter also surveys the costs and constraints that come with scale—training time, infrastructure needs, and deployment considerations—alongside core risks such as data bias, ethical misuse, limited interpretability, and hallucinations, all of which necessitate careful validation and governance. It outlines the practical “anatomy” of an LLM application, from defining use cases and data pipelines to selecting hardware, tuning strategies, and orchestration. Finally, it sketches the startup landscape catalyzed by LLMs: lightweight wrappers, infrastructure providers (e.g., vector databases and LLM frameworks), and capital-intensive model labs competing at the frontier. The throughline is pragmatic: the book will focus on building robust, context-aware applications—especially with techniques like RAG—so readers can translate LLM capabilities into reliable, real-world solutions.

Figure: An example output for a given prompt using ChatGPT.
Figure: Rube Goldberg's famous self-operating napkin. Constructing an LLM application demands a thoughtful orchestration of resources, from computational power to application definition, echoing the complexity of Rube Goldberg's contraptions.
Figure: A Python code snippet demonstrating how to use the Ares API to retrieve information about taco spots in San Francisco from the internet. Instead of just returning URLs, the API returns actual answers with web URLs as sources.
Retrieval-Augmented Generation (RAG) is used to enhance the capabilities of LLMs, especially in generating relevant and contextually appropriate responses. The approach adds an initial retrieval step before generating a response, so the model can draw on information from a knowledge base.

Summary

  • Large language models (LLMs) are the latest breakthrough in natural language processing after statistical models and deep learning. LLMs stand on the shoulders of this prior research but take language understanding to new heights through scale.
  • Pretrained on massive text corpora, LLMs like GPT-3 capture broad knowledge about language in their model parameters. This allows them to achieve state-of-the-art performance on language tasks.
  • Applications powered by LLMs include text generation, classification, translation, and semantic search to name a few.
  • LLMs utilize multi-billion parameter Transformer architectures. Training such gigantic models requires massive computational resources only recently made possible through advances in AI hardware.
  • Bias and safety are key challenges with large models. Extensive testing is required to prevent unintended model behavior across diverse demographics.
  • Numerous startups are offering LLM model APIs, democratizing access and allowing innovation in the realm of Generative AI.
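The semantic search mentioned among the applications above reduces, at its core, to ranking documents by vector similarity. The sketch below uses hand-made three-dimensional vectors in place of real embeddings; the document titles and numeric values are invented for illustration only.

```python
import math

# Hand-rolled 3-d "embeddings" standing in for vectors a real embedding
# model would produce; the values here are made up for the sketch.
doc_vectors = {
    "intro to transformers": [0.9, 0.1, 0.0],
    "taco recipes":          [0.0, 0.2, 0.9],
    "attention mechanisms":  [0.8, 0.3, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_search(query_vec, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    ranked = sorted(doc_vectors,
                    key=lambda d: cosine(query_vec, doc_vectors[d]),
                    reverse=True)
    return ranked[:k]

print(semantic_search([0.85, 0.2, 0.05]))  # transformer-related docs rank first
```

Vector databases such as Pinecone or Qdrant, mentioned below, exist to perform this same similarity lookup efficiently over millions of embeddings.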

FAQ

What is a Large Language Model (LLM) and how does it generate text?
An LLM is a deep learning model trained on vast amounts of text to predict the next word in a sequence. By learning statistical patterns, context, and nuances in language, it can generate coherent, human-like text across many topics.
How are LLMs different from early virtual assistants like Siri or Alexa?
Early assistants primarily followed predefined commands and intent schemas within narrow domains. LLMs, by contrast, proactively generate rich, context-aware responses, anticipate conversational turns, and handle open-ended dialogue that often feels more natural and human-like.
What are the main real-world applications of LLMs?
  • Conversational assistants and chatbots (including RAG-powered systems)
  • Text and code generation (summarization, translation, creative writing, programming help)
  • Information retrieval and organization
  • Language understanding (sentiment, intent, named entity recognition, tutoring)
  • Recommendation systems
  • Content creation and editing (clarity, coherence, grammar)
  • Agent-based task fulfillment (autonomous assistants executing multi-step tasks)
What is Retrieval-Augmented Generation (RAG), and when should I use it?
RAG augments an LLM with a retrieval step that pulls relevant information from a targeted corpus before generating an answer. It’s ideal for specialized domains or time-sensitive topics where accuracy and recency matter, though it works best on focused, high-quality collections rather than the entire internet.
How does a typical RAG pipeline work?
  • Retrieval: Search a selected knowledge base for relevant passages based on the user query.
  • Candidate selection: Choose the most pertinent snippets or documents.
  • Context integration: Feed those snippets into the model alongside the original query.
  • Response generation: Produce a final answer grounded in both the retrieved context and the model’s prior knowledge.
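The steps above can be sketched in a few lines of Python. The knowledge base, the word-overlap scoring, and the stubbed generate() function are all assumptions for illustration; a production pipeline would use embedding-based retrieval over a vector database and a real LLM API.

```python
# A minimal, illustrative RAG loop with a toy knowledge base.
knowledge_base = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm, Monday through Friday.",
    "Shipping is free on orders over $50.",
]

def retrieve(query, k=1):
    """Retrieval + candidate selection: rank passages by word overlap."""
    q = set(query.lower().split())
    scored = sorted(knowledge_base,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(prompt):
    """Stub for an LLM call; a real app would invoke a model API here."""
    return f"[LLM answer grounded in prompt: {prompt[:60]}...]"

def rag_answer(query):
    context = "\n".join(retrieve(query))                      # context integration
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)                                   # response generation

print(rag_answer("refund policy"))
```

Even in this toy form, the structure is the same as in real systems: the model answers from retrieved context rather than from its parameters alone, which is what makes RAG answers more grounded and up to date.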
What does the “scale” of LLMs mean, and why does it matter?
Scale refers to the enormous amount of training data and parameters these models use. Larger scale enables nuanced, contextually accurate language generation but introduces challenges like high compute requirements, long training times, and operational costs.
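For a concrete sense of what "parameters" means at this scale, a common back-of-envelope estimate for a Transformer's parameter count is roughly 12 × layers × d_model², ignoring embeddings and biases. With GPT-3-like settings (96 layers, hidden size 12,288), this lands near 175 billion:

```python
# Rough transformer parameter-count estimate (a standard back-of-envelope
# formula ignoring embedding tables and bias terms).
def approx_params(n_layers, d_model):
    return 12 * n_layers * d_model ** 2

# GPT-3-scale configuration: 96 layers, hidden size 12288.
print(f"{approx_params(96, 12288):.2e}")  # ≈ 1.74e+11, i.e. ~175B parameters
```

At 2 bytes per parameter (fp16), that is roughly 350 GB of weights before any optimizer state, which is why training and even serving such models requires fleets of GPUs or TPUs.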
How are LLMs trained and fine-tuned, and what resources are required?
Training exposes the model to large text corpora to learn next-word prediction by adjusting weights and biases over many iterations, often across distributed GPUs/TPUs for weeks or months. Fine-tuning adapts a pre-trained model to a specific domain (e.g., legal or medical) using targeted data. Broad web-scale datasets (e.g., corpora like Common Crawl) and significant compute are typically required.
Why are multimodal models important, and how do they differ from text-only LLMs?
Multimodal models can understand and combine text with other modalities like images and audio, enabling tasks such as visual question answering or captioning. This more closely mirrors human perception and broadens AI’s applicability; an example highlighted is Google’s Gemini.
What are the key challenges and limitations of LLMs?
Common issues include data bias, ethical risks (e.g., misleading or harmful content), limited interpretability (black-box behavior), and hallucinations (confident but incorrect answers). Mitigation involves careful data curation, governance, retrieval grounding, and validation/fact-checking pipelines.
How has the rise of LLMs shaped the startup ecosystem?
The field spans: (1) application “wrappers” over LLMs (e.g., presentation tools), (2) infrastructure providers like vector databases (Pinecone, Qdrant) and LLM frameworks (LangChain, LlamaIndex), and (3) “GPU-rich” companies training frontier models. Funding varies widely, with infrastructure startups often raising sizable rounds and top-tier model builders securing billions and large GPU fleets (e.g., H100s).
