How Large Language Models Work

Edward Raff, Drew Farris and Stella Biderman for Booz Allen Hamilton

June 2025
ISBN 9781633437081
200 pages

Included with a Manning Online subscription

printed in black & white

available in Simplified Chinese

table of contents

1 Big picture: What are LLMs?

1.1 Generative AI in context

1.2 What you will learn

1.3 Introducing how LLMs work

1.4 What is intelligence, anyway?

1.5 How humans and machines represent language differently

1.6 Generative Pretrained Transformers and friends

1.7 Why LLMs perform so well

1.8 LLMs in action: The good, bad, and scary

2 Tokenizers: How large language models see the world

2.1 Tokens as numeric representations

2.2 Language models see only tokens

2.2.1 The tokenization process

2.2.2 Controlling vocabulary size in tokenization

2.2.3 Tokenization in detail

2.2.4 The risks of tokenization

2.3 Tokenization and LLM capabilities

2.3.1 LLMs are bad at word games

2.3.2 LLMs are challenged by mathematics

2.3.3 LLMs and language equity

2.4 Check your understanding

2.5 Tokenization in context

3 Transformers: How inputs become outputs

3.1 The transformer model

3.1.1 Layers of the transformer model

3.2 Exploring the transformer architecture in detail

3.2.1 Embedding layers

3.2.2 Transformer layers

3.2.3 Unembedding layers

3.3 The tradeoff between creativity and topical responses

3.4 Transformers in context

4 How LLMs learn

4.1 Gradient descent

4.1.1 What is a loss function?

4.1.2 What is gradient descent?

4.2 LLMs learn to mimic human text

4.2.1 LLM reward functions

4.3 LLMs and novel tasks

4.3.1 Failing to identify the correct task

4.3.2 LLMs cannot plan

4.4 If LLMs cannot extrapolate well, can I use them?

4.5 Is bigger better?

5 How do we constrain the behavior of LLMs?

5.1 Why do we want to constrain behavior?

5.1.1 Base models are not very usable

5.1.2 Not all model outputs are desirable

5.1.3 Some cases require specific formatting

5.2 Fine-tuning: The primary method of changing behavior

5.2.1 Supervised fine-tuning

5.2.2 Reinforcement learning from human feedback

5.2.3 Fine-tuning: The big picture

5.3 The mechanics of RLHF

5.3.1 Beginning with a naive RLHF

5.3.2 The quality reward model

5.3.3 The similar-but-different RLHF objective

5.4 Other factors in customizing LLM behavior

5.4.1 Altering training data

5.4.2 Altering base model training

5.4.3 Altering the outputs

5.5 Integrating LLMs into larger workflows

5.5.1 Customizing LLMs with retrieval augmented generation

5.5.2 General-purpose LLM programming

6 Beyond natural language processing

6.1 LLMs for software development

6.1.1 Improving LLMs to work with code

6.1.2 Validating code generated by LLMs

6.1.3 Improving code via formatting

6.2 LLMs for formal mathematics

6.2.1 Sanitized input

6.2.2 Helping LLMs understand numbers

6.2.3 Math LLMs also use tools

6.3 Transformers and computer vision

6.3.1 Converting images to patches and back

6.3.2 Multimodal models using images and text

6.3.3 Applicability of prior lessons

7 Misconceptions, limits, and eminent abilities of LLMs

7.1 Human rate of learning vs. LLMs

7.1.1 The limitations on self-improvement

7.1.2 Few-shot learning

7.2 Efficiency of work: A 10-watt human brain vs. a 2000-watt computer

7.2.1 Power

7.2.2 Latency, scalability, and availability

7.2.3 Refinement

7.3 Language models are not models of the world

7.4 Computational limits: Hard problems are still hard

7.4.1 Using fuzzy algorithms for fuzzy problems

7.4.2 When close enough is good enough for hard problems

8 Designing solutions with large language models

8.1 Just make a chatbot?

8.2 Automation bias

8.2.1 Changing the process

8.2.2 When things are too risky for autonomous LLMs

8.3 Using more than LLMs to reduce risk

8.3.1 Combining LLM embeddings with other tools

8.3.2 Designing a solution that uses embeddings

8.4 Technology presentation matters

8.4.1 How can you be transparent?

8.4.2 Aligning incentives with users

8.4.3 Incorporating feedback cycles

9 Ethics of building and using LLMs

9.1 Why did we build LLMs at all?

9.1.1 The pros and cons of LLMs doing everything

9.1.2 Do we want to automate all human work?

9.2 Do LLMs pose an existential risk?

9.2.1 Self-improvement and the iterative S-curve

9.2.2 The alignment problem

9.3 The ethics of data sourcing and reuse

9.3.1 What is fair use?

9.3.2 The challenges associated with compensating content creators

9.3.3 The limitations of public domain data

9.4 Ethical concerns with LLM outputs

9.4.1 Licensing implications for LLM output

9.4.2 Do LLM outputs poison the well?

9.5 Other explorations in LLM ethics

References

Overview

5 How do we constrain the behavior of LLMs?

Constraining LLM behavior makes them more useful because base models simply continue text and can drift off-topic, produce undesirable content, or violate strict formatting needs. The chapter explains why constraints are essential and outlines four levers for control: curate training data, alter the base training process, fine-tune after pretraining, and post-process outputs with code. Fine-tuning is emphasized as the most practical and impactful approach, turning a general “base” or “foundation” model into an instruction-following system tailored to specific tasks. Motivations include keeping models safe and on-task, coping with missing or new information, and meeting rigid output formats that probabilistic decoding alone cannot guarantee. No single method is perfect, so practitioners typically layer techniques to achieve reliability.

Supervised fine-tuning (SFT) extends next-token training on high-quality, task-specific examples to inject domain knowledge and style, but it does not change the model’s incentives and can suffer from catastrophic forgetting and privacy risks. Reinforcement Learning from Human Feedback (RLHF) tackles abstract goals like helpfulness and harmlessness by training a reward model from human-rated examples, then optimizing the LLM to maximize predicted quality while staying close to base behavior via an explicit similarity constraint. This balance stabilizes learning and reduces reward hacking, but RLHF is data- and compute-intensive, works best on known issues, and does not add new reasoning capabilities. In practice it is often combined with SFT and careful prompt design to create usable chatbots that avoid many base-model failure modes, while still requiring continuous evaluation.

Beyond fine-tuning, behavior can be shaped by curating data (quality, diversity, and tokenization choices), modifying base training to protect privacy (such as with differential privacy), and enforcing constraints at inference time via decoding rules, guardrails, and schema-aware validators that regenerate tokens on parse errors. Practical systems also integrate LLMs into broader workflows, notably Retrieval-Augmented Generation, which retrieves relevant documents and conditions the model on them to improve factuality and transparency. Emerging tools for LLM “programming” help orchestrate multi-step pipelines, automate prompt construction and tuning, and make it easier to swap models or data sources. The overarching theme is to combine data, training, and runtime controls with rigorous testing to align outputs to task, safety, and formatting requirements.

There are four places where one may intervene to change or constrain an LLM’s behavior. The two stages of model training are shown in the middle of the diagram, where the model’s parameters are altered. On the left, one could also alter the training data before model training. On the right, one could intercept the model outputs after model training and write code to handle specific situations.

Commercial LLMs like ChatGPT are designed to follow instructions (within some limits) and can perform many low-cognition or pattern-matching tasks with very high efficacy. This includes pattern-matching tasks such as stylized writing and instruction-following tasks such as roleplaying as a car salesman.

Supervised fine-tuning (SFT) is a simple approach to improving model results. You repeat the same process used to build the base model: once the base model is trained on a large amount of general data, you continue training on a smaller, specialized data collection.
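The "same process, smaller data" idea can be illustrated with a deliberately tiny analogy. The sketch below is not how an LLM is trained: it swaps the neural network and gradient descent for simple bigram counts, and the names `train`, `predict_next`, and both text samples are made up for illustration. What it does share with SFT is the shape of the workflow: one training routine, run first on general text and then again on specialized text.

```python
from collections import Counter, defaultdict

def train(model, text):
    """Update bigram counts in place; the same routine serves both stages."""
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent continuation seen after `word`, or None."""
    followers = model.get(word)
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical "general" and "specialized" corpora (assumptions for this sketch).
general_text = "the car is red . the car is red . the car is fast ."
special_text = ("the car is available . the car is available today . "
                "the car is available now .")

# Stage 1: "pretraining" on general data.
base_model = train(defaultdict(Counter), general_text)
before = predict_next(base_model, "is")

# Stage 2: "fine-tuning" continues the same training on specialized data.
# Note that this mutates the same model object rather than copying it.
fine_tuned = train(base_model, special_text)
after = predict_next(fine_tuned, "is")

print(before, "->", after)
```

After the second stage the preferred continuation of "is" shifts toward the specialized data, which is also a small-scale hint of why fine-tuning is not purely additive: the earlier preference is overridden, not kept alongside the new one.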

RL is about iterative interactions, where the “reward” for your actions may not materialize for a long time and requires multiple steps to achieve. For a chatbot like GPT, the “environment” is the conversation with a user, and the “actions” are the infinite possible texts that GPT might complete. The reward becomes, in some sense, the user’s satisfaction with the chatbot at the end of the conversation.

RLHF is quite good at getting LLMs to avoid known, specific issues. However, it does not endow the model with new tools to handle novel issues. Here, mentioning the Miami Dolphins is the logical thing to say next when a user asks about football in Miami, but doing so violates an earlier request to never mention dolphins.

A naive and incomplete version of RLHF. The dashed lines represent text being sent from one component to another. Since text is incompatible with gradient descent, a more difficult RL algorithm must be used instead. This allows us to alter the weights of the LLM based on a quality score for the LLM’s outputs.

The reward model is trained like a standard supervised classification algorithm. A neural network, which could be an LLM itself or a simpler network such as a convolutional or recurrent neural network, is trained to predict how a human would score a prompt-completion pair. Because neural networks are differentiable, this training works and provides a tool that stands in as the “human” in RLHF.
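To make "trained like a standard supervised classification algorithm" concrete, here is a minimal sketch that substitutes a three-weight logistic regression for the neural network. Everything in it is an assumption made for illustration: the word lists, the hand-built `features` function, the four labeled examples, and the training settings. A real reward model would learn its own features from far more human-rated data.

```python
import math

# Hypothetical word lists used as hand-made features (assumptions for this sketch).
POLITE = {"please", "thanks", "happy", "glad"}
RUDE = {"stupid", "whatever", "dumb"}

def features(prompt, completion):
    """Bias term, polite-word count, rude-word count (the prompt is ignored here)."""
    words = completion.lower().split()
    return [1.0, sum(w in POLITE for w in words), sum(w in RUDE for w in words)]

def score(weights, x):
    """Predicted probability that a human would rate the completion as good."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_reward_model(examples, steps=500, lr=0.5):
    """Plain stochastic gradient descent on the logistic (cross-entropy) loss."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(steps):
        for prompt, completion, label in examples:
            x = features(prompt, completion)
            error = score(weights, x) - label  # gradient of the loss w.r.t. z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights

# Human-rated prompt-completion pairs: 1 = good, 0 = bad (made-up data).
examples = [
    ("Can I test drive it?", "happy to help , thanks for asking", 1),
    ("Can I test drive it?", "whatever , that is a stupid question", 0),
    ("What colors are there?", "glad you asked , we have red and blue", 1),
    ("What colors are there?", "dumb question", 0),
]

weights = train_reward_model(examples)
print(score(weights, features("", "thanks for visiting")))
print(score(weights, features("", "whatever")))
```

Once trained, `score` can rate completions no human has seen, which is the role the reward model plays in the RLHF loop.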

The full version of RLHF. The dashed lines are text and require reinforcement learning to update the parameters. The Original LLM is the base model without any alterations, while the LLM to fine-tune starts as the base model but is altered to improve the quality of its outputs. The similarity and quality reward components are provided with word probabilities to improve calculation. RL adjusts the parameters by combining the quality and similarity scores.
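One common way to combine a quality score with a similarity score is to subtract a penalty proportional to the KL divergence between the fine-tuned model's word probabilities and the original model's. The caption above does not specify the exact formula, so treat the form below, the `beta` weight, and all the numbers as assumptions chosen for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same tokens."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)

def rlhf_reward(quality, tuned_probs, base_probs, beta=0.5):
    """Quality score minus a penalty for drifting away from the original LLM."""
    return quality - beta * kl_divergence(tuned_probs, base_probs)

# Hypothetical next-word probabilities from the original (base) model.
base = {"car": 0.5, "truck": 0.3, "dolphin": 0.2}

# Two candidate fine-tuned models: one drifts a little, one drifts a lot.
mild = {"car": 0.6, "truck": 0.3, "dolphin": 0.1}
extreme = {"car": 0.98, "truck": 0.01, "dolphin": 0.01}

# Made-up quality scores: the extreme model scores slightly higher on quality.
mild_reward = rlhf_reward(0.8, mild, base)
extreme_reward = rlhf_reward(0.9, extreme, base)

print(mild_reward, extreme_reward)
```

In this toy setup the extreme model wins on quality alone but loses once the similarity penalty is applied, which is the balancing effect the similarity component is meant to provide.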

In addition to fine-tuning, one can change the model’s behavior by altering the training data, altering the base model training process, or modifying the model outputs by writing code to handle specific situations.

By writing code that enforces a format specification, you can catch invalid output from an LLM as it is being generated. Once invalid output is detected, a simple remedy is to have the LLM fall back to the next most likely token until a valid output is found.
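The token-by-token fallback can be sketched with a toy example. The format here is a `YYYY-MM-DD` date, picked only because its prefixes are easy to validate, and `toy_ranked_tokens` is a hypothetical stand-in for an LLM that returns candidate next tokens from most to least likely. A real system would read these candidates from the model's output probabilities.

```python
def is_valid_prefix(text):
    """True if `text` could still grow into a YYYY-MM-DD date."""
    if len(text) > 10:
        return False
    return all(
        (ch == "-") if i in (4, 7) else ch.isdigit()
        for i, ch in enumerate(text)
    )

def toy_ranked_tokens(prefix):
    """Hypothetical LLM stand-in: candidate next tokens, most likely first."""
    table = {
        "": ["June", "2025"],
        "2025": ["/", "-"],
        "2025-": ["06"],
        "2025-06": [" ", "-"],
        "2025-06-": ["1st", "01"],
    }
    return table.get(prefix, [])

def constrained_generate(max_steps=10):
    """Greedy decoding that skips any token that would break the format."""
    text = ""
    for _ in range(max_steps):
        if len(text) == 10:  # a complete date has been produced
            break
        for token in toy_ranked_tokens(text):
            if is_valid_prefix(text + token):
                text += token
                break
        else:
            raise ValueError(f"no valid continuation after {text!r}")
    return text

print(constrained_generate())
```

At several steps the most likely token ("June", "/", "1st") is rejected and the next candidate is used instead, so the final output satisfies the format even though the unconstrained model would not have produced it.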

On the left, we show the normal use of an LLM, with a user asking how to write JSON. LLMs naturally have a chance of producing errant outputs, which we want to minimize. On the right, we show the RAG approach. By using a search engine, we can find documents that are relevant to a query and combine them into a new prompt, giving the LLM more information and context to produce a better answer.
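The retrieve-then-prompt flow can be sketched in a few lines. This example assumes a very crude "search engine" that ranks documents by how many words they share with the query; the documents, the prompt wording, and the function names are all made up for illustration, and the final call to an LLM is left out.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words, dropping basic punctuation."""
    for mark in "?.,":
        text = text.replace(mark, " ")
    return set(text.lower().split())

def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Combine the retrieved documents and the question into a new prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Use the following documents to answer.\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

# Hypothetical document collection (an assumption for this sketch).
docs = [
    "JSON objects are written as key and value pairs inside curly braces.",
    "JSON strings must use double quotes.",
    "The Miami Dolphins play football in Florida.",
]
query = "How do I write a JSON object with strings?"

prompt = build_prompt(query, docs)
print(prompt)
```

The resulting prompt contains the two JSON documents and leaves out the unrelated one. Production systems typically replace the word-overlap ranking with a proper search index or embedding-based search, which is where the dependence on search quality comes from.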

Summary

There are four places you can intervene to change a model’s behavior: the data collection/tokenization, training the initial base model, fine-tuning the base model, and intercepting the predicted tokens. All four places are important, but fine-tuning is the most effective place for most users to make a change because of both its lower cost and its ability to change the model’s goals.
Supervised Fine-Tuning (SFT) performs the normal training process on a smaller bespoke data collection and is useful for refining the model’s knowledge of a particular domain.
Reinforcement Learning from Human Feedback (RLHF) requires more data, but allows us to specify objectives more complex than “predict the next token”.
You can use existing tools like syntax checkers to detect incorrect LLM outputs in cases where the output format must be strict, such as for JSON or XML. Generation and syntax checking can be run in a loop until the output satisfies the necessary syntax constraints.
Retrieval Augmented Generation is a popular method of augmenting the input of an LLM by first finding relevant content via a search engine or database and inserting it into the prompt.
Coding frameworks like DSPy are beginning to emerge that separate the specific LLM, vectorization, and prompt definition from the logic of how inputs and outputs from the LLM are modified for a specific task. This allows you to build more reliable and repeatable LLM solutions that can quickly adapt to new models and methods.
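The generate-and-check loop mentioned in the summary can be sketched with Python's standard `json` module as the syntax checker. Here `toy_generate` is a hypothetical stand-in for an LLM call that happens to return broken JSON on its first two attempts; the attempt limit is also an assumption.

```python
import json

def generate_until_valid(generate, max_attempts=5):
    """Call `generate` until its output parses as JSON, or give up."""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(attempt)
        try:
            return json.loads(candidate), attempt
        except json.JSONDecodeError:
            continue  # syntax error detected, so ask for another output
    raise ValueError(f"no valid JSON after {max_attempts} attempts")

def toy_generate(attempt):
    """Hypothetical LLM stand-in: two malformed outputs, then a valid one."""
    outputs = [
        "{'model': 'sedan'}",    # single quotes are not valid JSON
        '{"model": "sedan"',     # missing closing brace
        '{"model": "sedan"}',    # valid
    ]
    return outputs[min(attempt, len(outputs)) - 1]

result, attempts = generate_until_valid(toy_generate)
print(result, attempts)
```

Capping the number of attempts matters in practice, since a model that never produces valid output would otherwise loop forever.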

FAQ

Why is it necessary to constrain an LLM’s behavior?

LLMs are trained to continue text, not to follow goals. Without constraints, they can go off-topic, produce unsafe or legally risky content, or fail to meet task requirements. Constraining behavior aligns outputs with intended use (e.g., a car-sales bot staying on script).

What are the four places we can constrain an LLM?

There are four intervention points: 1) curate training data before pre-training, 2) alter the base model training process, 3) fine-tune the model (e.g., SFT, RLHF), and 4) post-process or intercept outputs with code after training.

Why is fine-tuning the primary method for changing behavior?

Fine-tuning updates model parameters to add knowledge and align behavior using far less data and cost than pre-training. It is widely supported (open- and closed-source) and can be layered on top of base models to achieve instruction following and domain specificity.

How does supervised fine-tuning (SFT) work, and what is it good for?

SFT continues next-token training on high-quality, domain-specific text (manuals, transcripts, scripts). It’s excellent for injecting new knowledge and adapting to a domain but is less effective for abstract rules like “be polite” or “refuse unsafe requests.”

What are the pitfalls of fine-tuning?

Key risks include catastrophic forgetting (new training overwrites prior knowledge), data leakage or privacy exposure (the model may reproduce fine-tuning data), and the need to balance specialization with general capabilities. Fine-tuning is not purely additive.

What is RLHF and how does it constrain behavior?

RLHF uses human feedback (via a learned reward model) to score outputs and adjust the LLM toward helpful, safe, and instruction-following behavior. A second “similarity” objective keeps the model close to base behavior to prevent reward hacking and gibberish. It’s powerful but data- and compute-intensive and can be brittle on novel cases.

Why aren’t base models very usable out of the box?

Base models are optimized only for next-token prediction; they aren’t trained to be chatbots, stay on topic, or avoid harmful content. Without additional alignment (e.g., fine-tuning/RLHF), they can be unhelpful or unsafe.

How can we enforce strict output formats like JSON?

Use decoding-time constraints and validators that parse partial output and force regeneration on errors. You can also intercept tokens, apply “go/no-go” word filters, and delay responses to run checks before sending content to users.
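A "go/no-go" word filter can be as simple as checking a candidate response against a banned-word list before it is released to the user. The sketch below is one minimal way to do that; the banned words, the fallback message, and the function names are assumptions for illustration.

```python
import re

def violates_filter(text, banned_words):
    """True if any banned word appears as a whole word in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return any(banned in words for banned in banned_words)

def release_or_block(candidate, banned_words,
                     fallback="Sorry, I can't help with that."):
    """Hold the response until it is checked; release it only if it passes."""
    if violates_filter(candidate, banned_words):
        return fallback
    return candidate

# Hypothetical banned-word list (an assumption for this sketch).
banned = {"dolphin", "dolphins"}

print(release_or_block("The Miami Dolphins play on Sunday.", banned))
print(release_or_block("We have sedans in stock.", banned))
```

Because the check runs on the complete candidate before anything is shown, this approach trades a small delay for the guarantee that filtered words never reach the user.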

How do data curation and base training choices affect behavior and safety?

Careful curation reduces harmful language and misinformation but must preserve enough examples to recognize and reject bad content. Tokenization choices are locked in at training. Techniques like differential privacy can mitigate training-data leakage at some performance and cost trade-off.

What is Retrieval Augmented Generation (RAG), and when should I use it?

RAG retrieves relevant documents and feeds them with the query to the LLM, improving factuality and enabling citations. It reduces hallucinations by grounding answers in sources, but its quality depends on the search/index. Tools like DSPy help build robust multi-step pipelines that combine retrieval, prompting, and validation.

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$54.99 $34.64

you save $20.35 (37%)

include audio $19.99 $12.59
