Overview
1 Large language models: The foundation of generative AI
Large language models burst into public view with ChatGPT, revealing how generative AI can converse, write, summarize, and assist across domains with seemingly human fluency. This chapter frames LLMs as the new foundation of NLP, explains why they matter, and sets expectations: they are powerful, broadly useful, and rapidly improving, yet imperfect and occasionally unreliable. It emphasizes the need for practical literacy—understanding how LLMs work, where they excel, where they fail, and how to use them responsibly—so individuals and organizations can harness benefits while avoiding common pitfalls.
The chapter traces NLP’s evolution from rule-based systems to statistical learning and then to deep neural networks, culminating in transformers and the attention mechanism that enabled today’s scale and performance. With self-supervised pretraining and task-specific fine-tuning, models like GPT and BERT learned general representations that transfer across tasks. This foundation unlocked wide-ranging applications: language modeling and text generation; open- and closed-book question answering and reading comprehension; assistive coding and pair programming; content creation for marketing, media, and communication; reasoning in math and science with step-by-step methods; and classic NLP tasks such as translation and summarization. The result is a flexible, general-purpose capability that can be adapted to many workflows and products.
Alongside these advances, the chapter foregrounds critical limitations and externalities: biases absorbed from web-scale data, difficulties controlling outputs and preventing hallucinations, and the financial, environmental, and access costs of training and serving large models. It surveys the competitive landscape—OpenAI, Google, Meta, Microsoft, Anthropic, and a growing set of challengers in both enterprise and open-source communities—each pursuing different trade-offs among capability, safety, cost, and accessibility. The chapter closes by arguing that while generative AI is accelerating and permeating everyday tools, durable value will depend on responsible design, governance, and user proficiency in steering LLMs toward reliable, safe, and socially beneficial outcomes.
Summary
- The history of NLP is as old as computers themselves. Machine translation, which first sparked interest in NLP in the 1950s, also became one of the first widely used commercial NLP applications when Google launched Google Translate in 2006.
- Transformer models and the debut of the attention mechanism were the biggest NLP breakthroughs of the 2010s. The attention mechanism attempts to mimic attention in the human brain by placing “importance” on the most relevant information.
- The boom in NLP from the late 2010s to early 2020s was driven by the increasing availability of text data from across the internet and the development of powerful computational resources. This marked the beginning of the LLM era.
- Today’s LLMs are trained primarily with self-supervised learning on large volumes of text from the web and are then commonly aligned using fine-tuning techniques such as reinforcement learning from human feedback (RLHF).
- GPT, released by OpenAI, was one of the first general-purpose LLMs designed for use with any natural language task. These models can be fine-tuned for specific tasks and are especially well-suited for text-generation applications, such as chatbots.
- LLMs are versatile and can be applied to various applications and use cases, including text generation, question answering, coding, logical reasoning, content generation, and more. Of course, there are also inherent risks, such as encoded bias, hallucinations, and a sizable carbon footprint from training and serving.
- By January 2023, two months after launch, OpenAI’s ChatGPT had set a record for the fastest-growing user base in history and set off an AI arms race in the tech industry to develop and release LLM-based conversational dialogue agents. As of 2025, the most significant LLMs have come from OpenAI, Google, Meta, Microsoft, and Anthropic.
FAQ
What is a large language model (LLM)?
An LLM is a neural network trained on vast amounts of text to predict the next token in context. Through this self-supervised pretraining, it learns rich internal representations of language that can be adapted to many tasks, from conversation to coding. Modern LLMs use transformer architectures, often with hundreds of billions of parameters, enabling fluent, general-purpose text generation and understanding.
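The core objective described above can be made concrete with a toy model. Real LLMs use transformer networks over subword tokens; this hypothetical sketch substitutes a simple bigram count model over whole words, which captures only the idea of predicting the next token from context:

```python
from collections import Counter, defaultdict

# Toy illustration of the LLM objective: predict the next token in context.
# The corpus and bigram model here are illustrative stand-ins, not how a
# real LLM is built.
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each token follows each context token.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_token_probs(context_token):
    """Return P(next token | previous token) as a dict."""
    counts = following[context_token]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

probs = next_token_probs("the")
# In this corpus, "the" is followed by "cat" twice and "mat" once,
# so P(cat | the) = 2/3 and P(mat | the) = 1/3.
```

An LLM plays the same prediction game, but conditions on thousands of preceding tokens with a learned neural network rather than raw counts.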
Why was ChatGPT’s release so consequential?
ChatGPT turned powerful LLM capabilities into an easy-to-use web experience, triggering mass adoption within weeks. It showcased how a conversational interface could write, summarize, answer questions, and explain concepts, while also exposing limitations like hallucinations. Although not a single technical breakthrough, it marked a turning point in public awareness and accelerated investment and deployment across industries.
How did NLP evolve from rules to today’s models?
Early NLP relied on handcrafted rules and heuristics, which were brittle and hard to scale. In the 1990s, statistical methods took over, learning patterns from data. With more data and compute, neural networks—and especially deep learning—became dominant. The transformer architecture then enabled efficient training on massive corpora, sparking today’s LLM era.
What is the attention mechanism and why do transformers matter?
Attention lets a model weight the most relevant parts of an input sequence when generating or understanding text. Transformers use self-attention to capture long-range dependencies while allowing parallel computation, making training faster and more scalable. This shift, popularized by “Attention Is All You Need,” enabled state-of-the-art results and paved the way for LLMs like BERT and GPT.
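The mechanism itself is compact. Below is a minimal NumPy sketch of scaled dot-product attention; the shapes and variable names are illustrative and not tied to any particular library:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query position computes a weighted mix of all value positions."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    # Softmax over keys (numerically stabilized): each row sums to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 tokens, 8-dimensional queries
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Because every position attends to every other position in one matrix multiplication, the whole sequence is processed in parallel, which is what makes transformer training scalable.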
How are LLMs trained and adapted to tasks?
LLMs are pretrained with self-supervised learning (predicting masked or next tokens) on large text corpora, requiring no manual labels. They can then be fine-tuned on smaller, task-specific datasets to specialize—e.g., for sentiment, QA, or code. Many systems also incorporate reinforcement learning to shape behavior. At deployment, the trained model is used via inference, generating outputs for new prompts.
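The pretraining objective needs no labels because the text supplies its own targets: at each position, the target is simply the next token. A hypothetical NumPy sketch of that loss, given model logits:

```python
import numpy as np

def next_token_loss(logits, token_ids):
    """Cross-entropy of the next token at each position.

    logits: (seq_len, vocab) scores from the model.
    token_ids: (seq_len,) integer token IDs of the input text.
    """
    # Targets are the input shifted left by one: at step t, predict token t+1.
    inputs, targets = logits[:-1], token_ids[1:]
    # Numerically stable log-softmax over the vocabulary.
    shifted = inputs - inputs.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

vocab, seq = 10, 5
token_ids = np.array([3, 1, 4, 1, 5])
uniform_logits = np.zeros((seq, vocab))   # a "model" that knows nothing
loss = next_token_loss(uniform_logits, token_ids)
# A clueless model scores log(vocab) = log(10) nats; training drives this down.
```

Pretraining minimizes exactly this quantity over billions of tokens; fine-tuning later reuses the resulting weights on a smaller labeled dataset.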
What can LLMs do in practice?
- Language modeling and text generation (chat, autocomplete, stylistic writing)
- Question answering (extractive, open-book generative, closed-book)
- Coding assistance (code completion, translation from comments to code, test generation)
- Content creation (blogs, emails, marketing copy, news-like articles)
- Reasoning tasks (math, science, common-sense reasoning, with varying reliability)
- Other NLP tasks (machine translation, summarization, grammar correction, learning novel words)
How do LLMs approach question answering?
There are three main approaches: extractive QA (copy the answer from provided context), open-book generative QA (generate an answer using provided context), and closed-book generative QA (answer from the model’s internal knowledge without external context). Reading comprehension benchmarks often combine these skills, testing multi-step understanding and conversational follow-ups.
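The three setups differ mainly in what the model is given and what form the answer may take. As a rough sketch (the prompt wording here is hypothetical; real systems vary widely):

```python
# Illustrative prompt templates for the three QA setups described above.
context = "The transformer architecture was introduced in 2017."
question = "When was the transformer introduced?"

# Extractive QA: the answer must be a span copied verbatim from the context.
extractive = (
    f"Context: {context}\nQuestion: {question}\n"
    "Copy the exact answer span from the context."
)

# Open-book generative QA: answer freely, but grounded in the given context.
open_book = (
    f"Context: {context}\nQuestion: {question}\n"
    "Answer in your own words using the context."
)

# Closed-book generative QA: no context; rely on internal knowledge alone.
closed_book = f"Question: {question}\nAnswer from memory."
```

Closed-book QA is where hallucination risk is highest, since there is no provided context against which the answer can be checked.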
Where do LLMs fall short?
- Hallucinations: fluent but incorrect or fabricated content
- Bias: reproducing and potentially amplifying societal stereotypes from training data
- Control and safety: susceptibility to adversarial prompts and unpredictable outputs
- Sustainability: substantial compute, energy use, and carbon footprint
- Access and concentration: high costs and specialized hardware favor large organizations
Who are the major players in generative AI and how do they differ?
- OpenAI: rapid, multimodal releases (GPT-4, GPT-4o, Sora), broad consumer and developer focus
- Google: foundational transformer research, Gemini ecosystem, cautious principles-driven rollout
- Meta: open-access Llama models emphasizing accessibility and efficiency
- Microsoft: deep OpenAI partnership, Copilot integration across enterprise products
- Anthropic: safety-forward “Constitutional AI,” Claude models
- Others: Mistral (efficient open models), DeepSeek (MoE efficiency), Cohere (enterprise focus), Perplexity (AI search), xAI (Grok), Stability AI, Midjourney, Runway (image/video)
What is fine-tuning and why is it useful?
Fine-tuning takes a broadly pretrained LLM and adapts it to a specific domain or task using a smaller, targeted dataset. It reuses the model’s general language understanding while aligning outputs to desired formats, styles, or constraints. This approach saves time and cost versus training from scratch and typically yields better performance on the intended use case.
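The economics of fine-tuning come from reusing representations rather than relearning them. A minimal sketch of the idea, with a hypothetical frozen random projection standing in for a pretrained encoder and synthetic labels standing in for a downstream task:

```python
import numpy as np

rng = np.random.default_rng(42)
W_pretrained = rng.normal(size=(16, 8))      # frozen "pretrained encoder"

def encode(x):
    """Fixed pretrained features; never updated during fine-tuning."""
    return np.tanh(x @ W_pretrained)

# Tiny labeled dataset for the downstream task. The synthetic labels are
# recoverable from the pretrained features, mimicking a task the encoder's
# representations already support.
X = rng.normal(size=(64, 16))
y = (encode(X)[:, 0] > 0).astype(float)

# Fine-tune: train only a small logistic-regression head by gradient descent.
w, b = np.zeros(8), 0.0
for _ in range(500):
    h = encode(X)
    p = 1 / (1 + np.exp(-(h @ w + b)))       # sigmoid predictions
    grad = p - y                             # d(cross-entropy)/d(logit)
    w -= 0.1 * h.T @ grad / len(y)
    b -= 0.1 * grad.mean()

p = 1 / (1 + np.exp(-(encode(X) @ w + b)))
accuracy = ((p > 0.5) == y).mean()
```

Only the small head (8 weights and a bias) is trained; the encoder's parameters are untouched. The same principle scales up to adapting a billion-parameter LLM with a comparatively tiny task-specific dataset.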