Overview

1 Big Picture: What is GPT?

Artificial intelligence has surged into everyday conversation, largely due to ChatGPT, and this chapter sets the stage by explaining what GPT is and why it matters. It introduces GPT as a Generative Pre-trained Transformer, situating it within the broader category of large language models that create new content based on patterns learned from vast text data. The authors aim to demystify how these systems work in plain language, clarifying what they can and cannot do and why. The discussion emphasizes that ChatGPT’s impact stems from scale—models that are far larger and trained on far more data than earlier approaches—enabling convincing conversation, summarization, and instruction following.

At a high level, GPTs are language models from the field of natural language processing that learn statistical relationships among words, not human-like understanding. They rely on neural networks—especially Transformers—to encode and generate text, and their effectiveness arises from massive datasets, enormous parameter counts, and specialized hardware. The chapter contrasts human, interactive language learning with the model’s static training process, stressing that the learned representation is powerful but fallible and steerable. A central lesson is that “bigger often beats clever”: with LLMs, performance gains frequently come more from more data and larger models than from intricate algorithmic tweaks. Training such systems is computationally expensive, so the book focuses on durable concepts rather than fast-changing implementation details.
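
To make "statistical relationships among words" concrete, the sketch below shows a bigram model in Python, the simplest relative of the next-word prediction that LLMs perform. The toy corpus is invented purely for illustration; real LLMs learn far richer patterns with neural networks, but the underlying task of scoring likely next words is the same.

    from collections import Counter, defaultdict

    # Toy corpus, invented purely for illustration.
    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    # Count how often each word follows each other word (bigram counts).
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    # Turn the counts into next-word probabilities for the word "the".
    counts = follows["the"]
    total = sum(counts.values())
    for word, n in counts.most_common():
        print(f"P({word} | the) = {n / total:.2f}")
    # Prints P(cat | the) = 0.25, P(mat | the) = 0.25, and so on.

An LLM performs a vastly more sophisticated version of this idea, with billions of learned parameters standing in for the count table.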

The chapter also frames both the promise and the risks. GPTs can accelerate tasks like summarization, content creation, and dialogue, yet they can make confident mistakes, fail at seemingly simple problems, and be manipulated into harmful outputs. Responsible use requires skepticism, validation, and thoughtful system design that anticipates failure modes and ethical concerns. The book targets a broad audience with minimal coding and math prerequisites, offering the vocabulary and mental models needed to evaluate applications, understand limitations, and decide when to use or avoid LLMs. By the end, readers should be equipped to engage with GPTs’ capabilities and constraints in real-world contexts.

Figure: A simple Haiku generated by ChatGPT.

Figure: Generative AI is about taking some input (numbers, text, images) and producing a new output (usually text or images). Any combination of input/output options is possible, and the nature of the output depends on what the algorithm was trained for. It could be to add detail, rewrite something to be shorter, extrapolate missing portions, and more.

Figure: A high-level map of the various terms you’ll become familiar with and how they relate. Generative AI is a description of functionality: the function of generating content and using techniques from AI to accomplish that goal.

Figure: When you sign up for OpenAI’s ChatGPT, you have two options: the GPT-3.5 model that you can use for free or the GPT-4 model that costs money.

Figure: If the cleverness of an algorithm is based on how much information you encode into the design, older techniques often increase performance by being more clever than their predecessors. As reflected by the size of the circles, LLMs have mostly chosen a “dumber” approach of just using more data and parameters, imposing minimal constraints on what the algorithm can learn.

Summary

  • ChatGPT is one type of Large Language Model, which is itself in the larger family of Generative AI/ML. Generative models produce new output, and LLMs are unique in the quality of their output but are extremely costly to make and use.
  • ChatGPT is loosely patterned after an incomplete understanding of human brain function and language learning. This is used as inspiration in design, but it does not mean ChatGPT has the same abilities or weaknesses as humans.
  • Intelligence is a multi-faceted and hard-to-quantify concept, making it difficult to say if LLMs are intelligent. It is easier to think about LLMs and their potential use in terms of capabilities and reliability.
  • Human language must be converted to and from an LLM’s internal representation. How this representation is formed will change what an LLM learns and influence how you can build solutions using LLMs; a toy sketch of this conversion follows this list.
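
Here is a toy sketch of that text-to-representation conversion. Real systems like ChatGPT use learned subword tokenizers (for example, byte-pair encoding) with vocabularies of tens of thousands of entries; this whole-word version is invented only to show the round trip from text to integer IDs and back.

    # Toy vocabulary, invented for illustration; real LLMs use learned
    # subword vocabularies, not a handful of whole words.
    vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}
    id_to_word = {i: w for w, i in vocab.items()}

    def encode(text):
        # Map each word to its ID; unknown words fall back to <unk>.
        return [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]

    def decode(ids):
        # Map IDs back to words and rejoin them into text.
        return " ".join(id_to_word[i] for i in ids)

    ids = encode("The cat sat on the mat")
    print(ids)          # [1, 2, 3, 4, 1, 5]
    print(decode(ids))  # the cat sat on the mat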

FAQ

What does GPT stand for, and what does each word imply?
GPT stands for Generative Pre-trained Transformer. Generative means the model produces new content (like text) based on patterns learned from data. Pre-trained indicates it is first trained on very large text collections before being adapted for use. Transformer is the core neural network component that makes modern LLMs possible; later chapters explain it in detail.

What is generative AI, and how does ChatGPT fit in?
Generative AI creates new media (text, images, audio, or video) by learning from vast amounts of prior data and feedback about desirable outputs. ChatGPT is a generative model focused primarily on text: given a prompt, it produces novel, human-like responses. Although centered on text, related systems can mix inputs and outputs across modalities.
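
As a concrete illustration of "prompt in, generated text out," here is a minimal sketch using OpenAI's official Python package, assuming its v1.x interface and an API key in the OPENAI_API_KEY environment variable (model names and availability change over time):

    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    # Send a prompt; the model returns newly generated text.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # substitute any chat model you have access to
        messages=[{"role": "user", "content": "Write a haiku about autumn."}],
    )
    print(response.choices[0].message.content)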

How are AI, NLP, LLMs, Transformers, and ChatGPT related?
Artificial Intelligence is the broad field; Natural Language Processing (NLP) focuses on human language within AI. Large Language Models (LLMs) are NLP models trained on massive text corpora, and Transformers are the main building block of today's LLMs. ChatGPT is a product built on an LLM that uses the Transformer architecture.

Why are these models called "large," and how large is ChatGPT?
They are "large" because they contain enormous numbers of learned parameters and are trained on huge datasets. ChatGPT is rumored to have about 1.76 trillion parameters, on the order of seven terabytes just to hold the model in memory, requiring many high-end GPUs and multi-machine infrastructure. By contrast, more conventional language models can be a few gigabytes.
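
The seven-terabyte figure is simple arithmetic: parameter count times bytes per parameter. A quick back-of-envelope check, assuming the rumored count and 32-bit (4-byte) floating-point storage (16-bit storage would halve it):

    params = 1.76e12      # rumored parameter count
    bytes_per_param = 4   # 32-bit floats; half precision would use 2
    print(f"{params * bytes_per_param / 1e12:.2f} TB")  # prints 7.04 TB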

Why do GPTs and other LLMs perform so well compared to older methods?
Recent gains come less from clever new tricks and more from scaling up data and model size. LLMs use relatively simple mechanisms to capture relationships in text, but at massive scale this "brute-force" approach outperforms many older, carefully engineered algorithms. The trade-off is that such models can be costly and complex to deploy.

How do humans and machines represent language differently?
Humans learn language interactively over time, drawing on rich innate and social cues; our understanding is flexible and context-driven. LLMs learn a static representation from large text datasets via neural networks that approximate patterns in language. Their representations can be powerful but imperfect, and we can shape or constrain their behavior through design.

What can ChatGPT do well today?
It excels at tasks like dialogue, instruction following, summarization, question answering, and content creation. Its strong language representations enable outputs that often feel human-written. New applications continue to emerge as people rethink workflows around these capabilities.

What are the main limitations and risks when using ChatGPT?
LLMs can fail in surprising ways, even on simple tasks, and may produce confident but incorrect or unsafe outputs. Safety systems can sometimes be bypassed, and small error rates can scale to large harms with millions of users. Effective use demands skepticism, verification, and thoughtful design, along with attention to ethical and societal impacts.

Can I train my own LLM? What resources would I need?
Training a modern LLM is expensive and resource-intensive. Even modest efforts can cost $100,000 or more, and competing with state-of-the-art systems would require investments on the order of hundreds of millions of dollars plus significant infrastructure. Because tools evolve rapidly, this book emphasizes durable concepts over step-by-step training code.
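
To see where a six-figure estimate comes from, here is a back-of-envelope sketch; every number below is a hypothetical assumption for illustration, not a quoted rate from any provider:

    gpus = 64                 # hypothetical cluster size
    hours = 24 * 30           # one month of continuous training
    usd_per_gpu_hour = 2.50   # hypothetical cloud rental rate
    print(f"${gpus * hours * usd_per_gpu_hour:,.0f}")  # prints $115,200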

Who is this book for, and what will I learn?
It targets a broad audience, technical and non-technical alike, assuming minimal coding and math background. You'll gain a practical vocabulary, a big-picture understanding of how LLMs work, what they can and can't do, and how to use or deploy them responsibly. The first half explains LLM mechanics and outputs; the second half focuses on human factors, risks, and ethics.
