Overview

1 Large language models: The foundation of generative AI

Large language models burst into public awareness with the release of ChatGPT, revealing how convincingly machines can converse, write, and reason across many domains. This chapter introduces LLMs as general-purpose systems that have rapidly reshaped natural language processing and begun influencing education, work, creativity, and communication. It sets expectations for both promise and pitfalls, arguing that a practical understanding of how these models function is essential to using them well and navigating the societal debates they provoke.

The narrative traces NLP’s evolution from brittle rule-based systems to data-driven statistical methods and then to deep learning, culminating in transformers—models built on attention that capture long-range context efficiently and at scale. Pretraining on vast unlabeled corpora followed by fine-tuning enabled GPT, BERT, and successors to generalize across tasks through next-token prediction and self-supervision. As a result, LLMs power a wide array of applications: dialogue and language modeling, question answering and reading comprehension, translation and summarization, coding assistance, content generation, and emerging forms of mathematical and scientific reasoning. Their flexibility, multimodality, and capacity gains have unlocked capabilities once considered out of reach for machine language systems.

The chapter also examines where LLMs fall short and how the ecosystem is responding. Training data can embed and amplify social biases; fluent outputs can contain confident falsehoods (hallucinations); and the energy, cost, and compute concentration raise sustainability and access concerns. Industry strategies diverge: rapid capability scaling and multimodal releases (OpenAI), foundational research and product integration (Google), open-access model families (Meta), enterprise-wide assistants built on partnerships (Microsoft), and safety-centered alignment approaches (Anthropic), alongside rising players like DeepSeek, Mistral, Cohere, Perplexity, Stability, Midjourney, and Runway. The chapter closes with a balanced outlook: progress is accelerating and transformative, but realizing its benefits responsibly will require sustained attention to privacy, bias, accountability, and safety.

Figures in this chapter: the reinforcement learning cycle; the distribution of attention for the word “it” in different contexts; a timeline of breakthrough events in NLP; representation of word embeddings in the vector space.

Summary

  • The history of NLP is as old as computers themselves. The first application to spark interest in NLP was machine translation in the 1950s; decades later, machine translation also became one of the first mass-market commercial NLP applications when Google launched Google Translate in 2006.
  • Transformer models and the debut of the attention mechanism were the biggest NLP breakthroughs of the 2010s. The attention mechanism loosely mimics attention in the human brain by assigning greater weight (“importance”) to the most relevant parts of the input.
  • The boom in NLP from the late 2010s to early 2020s is due to the increasing availability of text data from across the internet and the development of powerful computational resources. This marked the beginning of the LLM era.
  • Today’s LLMs are trained primarily with self-supervised learning on large volumes of text from the web and are then fine-tuned, often with reinforcement learning from human feedback.
  • GPT, released by OpenAI, was one of the first general-purpose LLMs designed for use with any natural language task. These models can be fine-tuned for specific tasks and are especially well-suited for text-generation applications, such as chatbots.
  • LLMs are versatile and can be applied to many applications and use cases, including text generation, question answering, coding, logical reasoning, content generation, and more. Of course, there are also inherent risks, such as encoded bias, hallucinations, and a sizable carbon footprint.
  • In January 2023, OpenAI’s ChatGPT set a record for the fastest-growing user base in history and set off an AI arms race in the tech industry to develop and release LLM-based conversational dialogue agents. As of 2025, the most significant LLMs have come from OpenAI, Google, Meta, Microsoft, and Anthropic.

FAQ

What is a large language model (LLM), and why did ChatGPT’s launch matter?
LLMs are neural network models (typically transformer-based) that learn to predict the next token in context using massive text corpora. ChatGPT’s public release in late 2022 let anyone converse with such a model, showcasing capabilities like explanation, drafting, and coding, and catalyzing mainstream awareness and adoption, even though it was the product of steady progress rather than a single breakthrough.
How do transformers and the attention mechanism work?
Attention lets a model weigh the relevance of different tokens when generating or interpreting a sequence. Transformers rely on self-attention to capture long-range dependencies while enabling parallel computation, which dramatically improved speed and performance over prior sequence models and set new records in tasks like machine translation.
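To make the idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a transformer. The toy matrices and dimensions are illustrative assumptions, not code from the book.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Score each query against every key, scaled by the square root of the key dimension
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys: each row becomes a set of attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted average of the value vectors
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))   # 3 tokens, 4-dimensional embeddings (toy values)
context, attn = scaled_dot_product_attention(tokens, tokens, tokens)
print(attn)                        # row i shows how much token i attends to each token

In a full transformer, the queries, keys, and values come from learned linear projections of the token embeddings, and many such attention heads run in parallel.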
How has NLP evolved from early systems to LLMs?
- Rule-based era: hand-crafted grammars and heuristics (brittle and labor-intensive).
- Statistical era: data-driven methods using parallel corpora and probabilistic models.
- Neural/deep learning era: large neural networks trained on vast data; transformers enabled today’s LLMs.
What kinds of machine learning do NLP systems use?
- Supervised learning: maps labeled inputs to outputs (e.g., translation pairs).
- Unsupervised/self-supervised learning: learns patterns from unlabeled text (e.g., next-token prediction, masked tokens); a minimal sketch of next-token pair construction follows this list.
- Reinforcement learning: optimizes behavior via rewards/penalties; modern LLMs often combine approaches during training and alignment.
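As a rough illustration of how self-supervision turns raw text into training examples without human labels, the toy sketch below builds next-token (input, target) pairs. The word-level split is a simplification I'm assuming for readability; real LLMs operate on subword tokens.

# Toy self-supervision: every prefix of the text predicts the token that follows it.
text = "the attention mechanism weighs relevant context"
tokens = text.split()                 # word-level split for illustration only

pairs = []
for i in range(1, len(tokens)):
    context = tokens[:i]              # everything the model has seen so far
    target = tokens[i]                # the token it must predict next
    pairs.append((context, target))

for context, target in pairs:
    print(" ".join(context), "->", target)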
What are pretraining, fine-tuning, and tokenization in LLMs?
- Pretraining: models learn general language patterns from large unlabeled text.
- Fine-tuning: adapting a pretrained model to a specific task or style with smaller, task-focused data.
- Tokenization: splitting text into tokens (words/subwords) so models can encode inputs and decode outputs; see the tokenizer sketch after this list.
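The sketch below illustrates tokenization with a toy subword vocabulary and greedy longest-match lookup. The vocabulary and its integer IDs are invented for demonstration and are far simpler than the byte-pair-encoding tokenizers real LLMs use.

# Toy subword tokenizer: greedily match the longest known piece at each position.
vocab = ["token", "tok", "iza", "ization", "ize", "t", "o", "k", "e", "n", "i", "z", "a"]
vocab_ids = {piece: idx for idx, piece in enumerate(vocab)}

def tokenize(word):
    pieces = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate first, falling back toward single characters
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in vocab_ids:
                pieces.append(piece)
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {pos}")
    return pieces

pieces = tokenize("tokenization")
print(pieces)                           # ['token', 'ization']
print([vocab_ids[p] for p in pieces])   # the integer IDs the model actually sees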
What are the most common applications of LLMs discussed in the chapter?
- Language modeling and text generation (chat, autocomplete, style control).
- Question answering (extractive, open-book generative, closed-book).
- Coding assistance (suggestions, scaffolding, comments-to-code).
- Content generation (articles, marketing copy, emails, social posts).
- Logical and commonsense reasoning (math, science, multi-step problems).
- Machine translation and text summarization (extractive and abstractive).
What are hallucinations, and why do LLMs produce them?
Hallucinations are fluent but incorrect statements. They stem from predictive text generation (not grounded understanding), imperfections or gaps in training data, and adversarial or ambiguous prompts. As outputs get longer, the space of possible continuations grows, making strict factuality harder to guarantee without added safeguards.
How do bias and training data quality affect LLM behavior?
Training data drawn from the web can include harmful language, stereotypes, and historical inequities. Models internalize these patterns, leading to disparate outputs across identity attributes (e.g., gender, race). Even earlier word-embedding work showed such stereotypes; mitigating them in large models remains challenging and imperfect.
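As a hedged illustration of how stereotyped associations are detected in word embeddings, the sketch below projects toy word vectors onto a gender direction using cosine similarity. The three-dimensional vectors are made up purely for demonstration and carry none of the structure of real embeddings.

import numpy as np

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 3-D "embeddings" chosen only to demonstrate the measurement
vectors = {
    "he":       np.array([ 1.0, 0.1, 0.0]),
    "she":      np.array([-1.0, 0.1, 0.0]),
    "engineer": np.array([ 0.7, 0.9, 0.2]),
    "nurse":    np.array([-0.6, 0.9, 0.3]),
}

gender_direction = vectors["he"] - vectors["she"]
for word in ("engineer", "nurse"):
    # A value far from zero signals a gendered association in these toy vectors
    print(word, round(cosine(vectors[word], gender_direction), 2))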
What are the costs and sustainability concerns of LLMs?
Training and serving LLMs require significant compute (GPUs/TPUs), money, and energy, with associated carbon emissions. Inference at scale can rival or exceed training energy use. These demands advantage large firms with data centers, spurring efforts toward efficiency, smaller/open models, and techniques that reduce compute while preserving capabilities.
Who are the major players, and how do their approaches differ?
- OpenAI: rapid multimodal advancement (GPT-4/4o, Sora, o1), strong Microsoft partnership.
- Google: foundational transformer research; Gemini and ecosystem integration with a principle-led posture.
- Meta: open-access strategy (Llama family) enabling on-device and researcher use.
- Microsoft: broad “Copilot” product integration; early chatbot lessons and enterprise focus.
- Anthropic: safety-forward “Constitutional AI” and Claude series.
- Others: DeepSeek (efficiency/MoE), Cohere (enterprise), Perplexity (AI search), Mistral (efficient open models), xAI/Grok, plus image/video leaders like Midjourney, Stability AI, and Runway.
