Overview

1 Large language models: The foundation of generative AI

Generative AI moved from research labs to everyday life with the release of ChatGPT, which showcased how large language models (LLMs) can converse, create, and assist across countless tasks. This chapter situates LLMs as the central technology behind the current AI wave, pairing their impressive fluency and versatility with a pragmatic look at their limitations and risks. It sets the stage for the rest of the book by offering a clear, intuitive grounding in what LLMs are, why they matter, and how to use them effectively and responsibly in work, education, and society.

The chapter traces the evolution of natural language processing from brittle rule-based systems to statistical methods and then to deep neural networks, culminating in the transformer architecture and attention mechanism that enabled today’s scaling breakthroughs. With self-supervised pretraining and task-specific fine-tuning, models like GPT and BERT learned general-purpose language representations that transfer across applications. The result is a single family of models that can power dialogue agents, question answering, translation, summarization, content and code generation, and even elements of mathematical and scientific reasoning—often with surprising, emergent capabilities and growing multimodal competence.

Alongside these advances, the chapter details critical challenges: bias inherited from internet-scale training data, the tendency to hallucinate confident but incorrect answers, and the environmental and economic costs of training and serving massive models. It also surveys the rapidly shifting ecosystem—OpenAI, Google, Meta, Microsoft, Anthropic, and rising players such as DeepSeek, Mistral, Cohere, Perplexity, and others—highlighting divergent strategies around capability, safety, openness, and deployment. The takeaway is a balanced framework: LLMs are transformational yet imperfect, accelerating quickly while demanding careful governance, evaluation, and responsible integration into real-world systems.

Figures

  • The reinforcement learning cycle
  • The distribution of attention for the word “it” in different contexts
  • A timeline of breakthrough events in NLP
  • Representation of word embeddings in the vector space

Summary

  • The history of NLP is as old as computers themselves. The first application to spark interest in NLP was machine translation in the 1950s; it was also the first NLP application Google commercialized, with the launch of Google Translate in 2006.
  • Transformer models and the debut of the attention mechanism were the biggest NLP breakthroughs of the 2010s. The attention mechanism attempts to mimic attention in the human brain by placing “importance” on the most relevant information.
  • The boom in NLP from the late 2010s to the early 2020s is due to the increasing availability of text data from across the internet and the development of powerful computational resources. This marked the beginning of the LLM era.
  • Today’s LLMs are trained primarily with self-supervised learning on large volumes of text from the web and are then typically fine-tuned with reinforcement learning, often from human feedback.
  • GPT, released by OpenAI, was one of the first general-purpose LLMs designed for use with any natural language task. These models can be fine-tuned for specific tasks and are especially well-suited for text-generation applications, such as chatbots.
  • LLMs are versatile and can be applied to various applications and use cases, including text generation, answering questions, coding, logical reasoning, content generation, and more. Of course, there are also inherent risks, such as encoded bias, hallucinations, and a sizable carbon footprint from training and serving these models.
  • In January 2023, OpenAI’s ChatGPT set a record for the fastest-growing user base in history and set off an AI arms race in the tech industry to develop and release LLM-based conversational dialogue agents. As of 2025, the most significant LLMs have come from OpenAI, Google, Meta, Microsoft, and Anthropic.

FAQ

What is a large language model (LLM), and why was ChatGPT a turning point?
LLMs are neural-network models trained on massive text corpora to predict the next token and generate human-like language. ChatGPT, released in November 2022, brought LLMs to a mass audience with an easy web interface, showcasing capabilities like dialogue, summarization, and code generation and catalyzing rapid adoption and investment across the industry.
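To make the next-token idea concrete, here is a minimal sketch in Python. The toy vocabulary, probability table, and prompt are invented for illustration and stand in for a real model's billions of learned parameters; a real LLM also conditions on the whole preceding context, not just the last token.

```python
import numpy as np

# Toy vocabulary and a hand-made table of next-token probabilities.
# A real LLM learns these probabilities from data; this table is purely illustrative.
vocab = ["the", "cat", "sat", "on", "mat", "."]
next_token_probs = {
    "the": [0.0, 0.5, 0.0, 0.0, 0.5, 0.0],
    "cat": [0.0, 0.0, 0.9, 0.0, 0.0, 0.1],
    "sat": [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    "on":  [0.9, 0.0, 0.0, 0.0, 0.1, 0.0],
    "mat": [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    ".":   [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}

def generate(prompt, steps=5, seed=0):
    """Generate text by repeatedly sampling the next token from the model."""
    rng = np.random.default_rng(seed)
    tokens = prompt.split()
    for _ in range(steps):
        probs = next_token_probs[tokens[-1]]       # condition on context (here: last token only)
        tokens.append(rng.choice(vocab, p=probs))  # sample the next token
    return " ".join(tokens)

print(generate("the cat"))   # a plausible output: "the cat sat on the mat ."
```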
How did NLP evolve to enable today’s LLMs?
NLP progressed from brittle rule-based systems to data-driven statistical methods, then to neural networks and deep learning. The major leap was the transformer architecture, which replaced sequential processing with attention-based, parallel computation, making training on far larger datasets feasible and unlocking modern LLM performance.
What is “attention,” and why are transformers so effective?
Attention lets a model focus on the most relevant parts of an input sequence when generating each token. Transformers use self-attention across the whole context to capture long-range dependencies while computing in parallel, yielding faster training and state-of-the-art results across tasks like translation and text generation.
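Under the hood, the computation is compact enough to sketch. Below is a minimal single-head, scaled dot-product self-attention in NumPy; the random token vectors and weight matrices are placeholders for learned parameters, and real transformers add multiple heads, masking for generation, and feed-forward layers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens into queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # relevance of every token to every other token
    weights = softmax(scores, axis=-1)          # attention weights; each row sums to 1
    return weights @ V, weights                 # each output is a weighted mix of all values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                         # e.g. 4 tokens, 8-dimensional embeddings
X = rng.normal(size=(seq_len, d_model))         # stand-in for token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(weights.round(2))                         # row i: how much token i attends to each token
```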
How are LLMs trained: with supervised, unsupervised, or reinforcement learning?
LLMs primarily use self-supervised learning (a form of unsupervised learning), predicting masked or next tokens in unlabeled text. This pre-training is typically combined with supervised fine-tuning on task-specific data and with reinforcement learning techniques that apply rewards and penalties to shape outputs toward desired behaviors.
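A small sketch of why this counts as self-supervision: the training targets are simply the input text shifted by one position, so no human labeling is needed. The token ids and random logits below are stand-ins for a real tokenizer and model output.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# The labels are just the same text shifted by one token: no annotation required.
token_ids = np.array([11, 42, 7, 42, 3])              # a tokenized sentence (made-up ids)
inputs, targets = token_ids[:-1], token_ids[1:]       # predict token t+1 from tokens up to t

vocab_size = 50
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(inputs), vocab_size))   # stand-in for the model's predictions

probs = softmax(logits)
loss = -np.log(probs[np.arange(len(targets)), targets]).mean()
print(f"next-token cross-entropy: {loss:.3f}")        # training lowers this loss by gradient descent
```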
What is pre-training and fine-tuning?
Pre-training teaches a model broad language patterns from huge unlabeled datasets. Fine-tuning then adapts that general model to specific tasks or domains (for example, coding assistance or summarization) using smaller, targeted datasets, leveraging what the model already “knows” without training from scratch.
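As a rough illustration of the pre-train/fine-tune workflow, here is a hedged sketch using the Hugging Face transformers and datasets libraries. The base model (GPT-2), the text file name, and the hyperparameters are illustrative placeholders, not a recipe from the chapter.

```python
# Minimal causal-LM fine-tuning sketch (assumes: pip install transformers datasets).
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "gpt2"                                    # a small pre-trained base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token              # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Domain-specific text the general model should adapt to (hypothetical file).
dataset = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    # mlm=False means plain next-token (causal) language modeling, not masked LM.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()   # starts from the pre-trained weights instead of training from scratch
```

The key point is the starting checkpoint: fine-tuning reuses everything pre-training learned and only nudges the weights toward the new domain or task.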
What are the most common applications of LLMs?
Common uses include:
  • Language modeling and text generation (chatbots, drafting, autocomplete)
  • Question answering (extractive, open-book generative, and closed-book)
  • Coding assistance (code suggestions, explanation, testing)
  • Content generation (marketing copy, articles, emails, social posts)
  • Logical and mathematical reasoning (step-by-step problem solving)
  • Translation, summarization, grammar correction, and learning tools
Where do LLMs fall short?
Key limitations include hallucinations (fluent but incorrect statements), vulnerability to adversarial prompts, inconsistent reasoning in complex cases, and difficulty guaranteeing factuality as outputs get longer. These issues stem from learning statistical patterns rather than grounded understanding.
How do training data and bias affect LLM behavior?
LLMs reflect patterns in their training data, which can include stereotypes, toxic language, misinformation, personal data, and copyrighted material. Bias can emerge as disparate outputs across identity attributes (e.g., gender, race). Debiasing is challenging because societal patterns are deeply encoded in text.
What are the environmental and compute considerations of LLMs?
Training and serving LLMs require substantial compute (GPUs/TPUs), money, and energy. Estimates suggest significant carbon emissions for large training runs, and inference can also be energy-intensive at scale. This favors well-resourced organizations, though research on efficiency and smaller open models is accelerating.
Who are the major players, and how do their strategies differ?
Highlights:
  • OpenAI: Rapid multimodal releases (GPT-4/4o, Sora, o1); Microsoft partnership; strong focus on capability and deployment.
  • Google/DeepMind: Invented transformers; Gemini and Project Astra; integrates AI across Search and devices with stated AI Principles.
  • Meta: Open-access approach (Llama series) enabling on-device and research use; broad product integration.
  • Microsoft: Deep OpenAI integration; Copilot across products; emphasis on enterprise and consumer workflows.
  • Anthropic: “Constitutional AI,” Claude models; safety-first posture with major backing from Amazon and Google.
  • Others: DeepSeek (efficient MoE models), Cohere (enterprise focus), Perplexity (AI search), Mistral (efficient open models), xAI (Grok), Stability, Midjourney, and Runway (image/video).
