Overview

1 Large Language Models

Amid the current hype cycle around Large Language Models, this chapter offers a clear, pragmatic overview of what these systems can and cannot do, contrasting generalist models with domain-specific ones. It traces the practical tipping point to the public debut of conversational systems while grounding the real breakthrough in the 2017 introduction of Transformers. By shifting from labor-intensive supervised labeling to self-supervised learning on vast corpora, LLMs scaled predictably and acquired emergent abilities, enabling fluent generation and reasoning-like behaviors across tasks such as summarization, coding assistance, and basic problem solving.

Technically, the Transformer’s self-attention and parallelism overcame recurrent models’ bottlenecks, with word embeddings capturing rich syntactic and semantic context. The original encoder-decoder design later specialized into encoder-only models for understanding (BERT-like) and decoder-only models for generation (GPT-like), with reinforcement learning from human feedback refining behavior. As capabilities broadened from natural language to other symbolic formats, an open-source surge expanded options: organizations can fine-tune pretrained models to cut development and training costs, though deployment and inference still demand careful engineering. The chapter positions optimization techniques as essential enablers for running capable models on constrained infrastructure without prohibitive expense.

Relying on closed, generalist LLMs brings concrete risks: data leaving organizational boundaries, potential leakage, opacity about training data and model changes, hallucinations that are hard to audit, and the misuse of code-generation features. Generalist training on largely web-scale, generic data often lacks the depth required in regulated or high-stakes domains. Domain-specific approaches—via transfer learning and careful curation of private, specialized data—can improve accuracy, compliance, privacy, and sustainability, since smaller targeted models reduce resource use and environmental impact. The chapter also reframes alignment beyond blanket rules toward intent and domain-scoped guardrails, concluding that the best business value often comes from specialized, privately deployed models tuned to the tasks and risks that matter most.

Figures

  • Some examples of diverse content an LLM can generate
  • The timeline of LLMs since 2019 (image taken from paper [2])
  • Order of magnitude of costs for each phase of LLM implementation from scratch
  • Order of magnitude of costs for each phase of LLM implementation when starting from a pretrained model
  • Ratios of data source types used to train some popular existing LLMs
  • Generic model specialization to a given domain
  • An LLM trained for tasks on molecule structures (generation and captioning)

Summary

In this chapter you have learned:

  • What a Transformer is.
  • Which risks and challenges to consider with generalist and/or closed-source LLMs.
  • How to decide when to prefer a domain-specific LLM.
  • That techniques to train and optimize an open-source LLM on your own data exist, and that they are the core topic of this book.

The next chapter introduces techniques to specialize pretrained models on your own data using the Hugging Face libraries.

References

FAQ

What sparked the recent LLM revolution and how did Transformers enable it?

The public release of ChatGPT on November 30, 2022 marked the commercialization tipping point for LLMs, but the core technological shift began in 2017 with the “Attention is All You Need” paper introducing the Transformer. Transformers process entire sequences at once using self-attention and remove recurrence, enabling massive parallelism, faster training, and scalability that earlier architectures lacked. This shift allowed models trained on vast unlabeled corpora to acquire broad linguistic capabilities.

What learning paradigms are relevant to LLMs?
  • Supervised: Train on labeled pairs (input → label), e.g., text categorization.
  • Unsupervised: Discover structure in unlabeled data, e.g., clustering/topic modeling.
  • Semi-supervised: Combine a small labeled set with a large unlabeled one.
  • Reinforcement: Learn via rewards and trial-and-error, e.g., conversational agents.
  • Self-supervised: Create labels from the data itself, e.g., predicting masked/next tokens.

How does self-supervised learning train LLMs without manual labels?

LLMs are trained on huge text corpora where labels are generated programmatically. A common approach is next-token prediction: remove the next word (or token) from a sequence and train the model to predict it, comparing its guess with the original. This avoids the costly human labeling bottleneck and scales training to much larger datasets.
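The next-token objective can be sketched in a few lines of Python. This is a toy illustration, not a real pipeline: the whitespace tokenizer and the function name are simplifications (production models use learned subword tokenizers), but it shows how training pairs fall out of the data itself.

```python
def next_token_pairs(text):
    """Build (context, target) pairs for next-token prediction."""
    tokens = text.split()          # toy whitespace tokenizer
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[:i]       # everything seen so far
        target = tokens[i]         # the "label" is simply the next token
        pairs.append((context, target))
    return pairs

pairs = next_token_pairs("the cat sat on the mat")
# Every position in the corpus yields one free training example,
# with no human annotation involved.
```

Because labels are derived programmatically, the amount of training signal grows with the size of the corpus rather than with an annotation budget.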

How do Transformers improve over RNN/LSTM/GRU models for text?
  • Parallelism: Transformers process entire sequences simultaneously (no recurrence), dramatically speeding up training.
  • Self-attention: Models relationships between all positions in a sequence, capturing longer-range dependencies better than typical RNNs.
  • Scalability: The architecture scales more effectively to large datasets and model sizes.
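The parallelism point can be made concrete with a minimal sketch of scaled dot-product self-attention, assuming NumPy. For clarity it omits the learned query/key/value projections and the multiple heads of the full Transformer; the mechanism, one matrix product relating every position to every other at once, is the same.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a whole sequence at once.

    X: (seq_len, d) matrix of token vectors. Simplified: no W_q/W_k/W_v
    projections and no multi-head split.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)   # every position scored against every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ X               # each output is a weighted mix of all positions

X = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, 8-dim vectors
out = self_attention(X)
# out has the same shape as X: one context-aware vector per token.
```

Note that nothing here iterates token by token, which is exactly why Transformers train so much faster than recurrent models on modern hardware.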

What are word embeddings and why are they important?

Word embeddings map words into high-dimensional vectors that capture syntactic and semantic properties. In this continuous space, nearby vectors tend to share meaning or relationships. This lets models reason about words in context rather than as isolated symbols, improving understanding and generation.
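A toy example makes the "nearby vectors share meaning" idea tangible. The vectors below are made up purely for illustration (real embeddings are learned and have hundreds or thousands of dimensions), and cosine similarity is the standard way to compare them.

```python
import numpy as np

# Made-up 3-dimensional embeddings, for illustration only.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically related words end up closer in the vector space.
assert cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["apple"])
```

The same geometry is what lets a model treat "king" and "queen" as related concepts rather than unrelated symbols.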

How do encoder-only (BERT) and decoder-only (GPT) Transformers differ, and when should each be used?
  • BERT (encoder-only): Excels at understanding tasks like classification and prediction on text.
  • GPT (decoder-only): Excels at generative tasks such as text completion and code generation.

The choice depends on your target task: understanding vs. generation.
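One concrete way to see the architectural difference is through the attention mask each family uses. The sketch below (names are mine, not a library API) builds the two masks: encoder-only models attend bidirectionally, while decoder-only models use a causal mask so that generation can only condition on the past.

```python
import numpy as np

def attention_mask(seq_len, causal):
    """Return a (seq_len, seq_len) visibility mask (1 = may attend).

    causal=False: BERT-like encoders see the whole sequence in both
    directions. causal=True: GPT-like decoders see only themselves and
    earlier tokens, as required for left-to-right generation.
    """
    if causal:
        return np.tril(np.ones((seq_len, seq_len), dtype=int))
    return np.ones((seq_len, seq_len), dtype=int)

encoder_mask = attention_mask(4, causal=False)  # full bidirectional context
decoder_mask = attention_mask(4, causal=True)   # lower-triangular
```

This is why an encoder-only model is a natural fit for classification (it can use future context) while a decoder-only model is the natural fit for completion.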

What is RLHF and how is it used in modern LLMs?

Reinforcement Learning from Human Feedback (RLHF) uses human preference signals to shape a model’s behavior, optimizing for a reward that reflects desirable outputs. The models behind ChatGPT (descendants of the GPT family) apply RLHF during fine-tuning to improve helpfulness and alignment with user expectations.

What can LLMs do beyond natural-language prose?

Beyond translation and conversation, LLMs handle tasks such as language understanding, text classification/generation, question answering, summarization, semantic parsing, pattern recognition, basic math, code generation, dialogue, general knowledge, and logical inference chains. They can also work with other symbolic text forms (e.g., programming code or domain notations), not just natural language.

How is the open-source LLM ecosystem changing costs and choices for organizations?

Open-source (OS) models have proliferated, especially after ChatGPT’s release. Organizations can start from a pretrained OS model and fine-tune it on their data instead of training from scratch. This significantly reduces development and training costs. While deployment and inference still require investment (scalability, performance, monitoring), OS options provide flexibility in architecture choice, ownership, and optimization opportunities.

What are the key risks of generalist closed-source LLMs, and when should you choose a domain-specific model?
  • Risks with generalist closed-source LLMs: data leaves your network; potential data leakage; lack of transparency about training data and model changes; limited reproducibility/interpretability; hallucinations (intrinsic and extrinsic); code-generation misuse without robust guardrails.
  • Choose domain-specific LLMs when: you need high accuracy in a specialized domain; must meet regulatory/compliance needs; must keep data private/on-prem; want control/interpretability; or need to reduce environmental and compute costs by using smaller specialized models and optimizations. Specialization is typically achieved via transfer learning (fine-tuning a pretrained model on domain data).
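The transfer-learning idea behind specialization can be sketched without any LLM machinery: keep a pretrained model's weights frozen and train only a small task head on domain data. In the sketch below the "backbone" is a stand-in random projection, and all names and the synthetic dataset are assumptions for illustration; with real Hugging Face models the pattern is analogous (freeze the base, tune a head or use parameter-efficient fine-tuning).

```python
import numpy as np

rng = np.random.default_rng(0)
W_frozen = rng.normal(size=(16, 8))   # "pretrained" weights: never updated

def backbone(x):
    """Frozen feature extractor standing in for a pretrained model."""
    return np.tanh(x @ W_frozen)

# Tiny synthetic labeled "domain" dataset.
X = rng.normal(size=(32, 16))
y = (X[:, 0] > 0).astype(float)

w = np.zeros(8)                        # trainable head: logistic regression
for _ in range(200):                   # plain gradient descent
    h = backbone(X)
    p = 1.0 / (1.0 + np.exp(-(h @ w)))
    w -= 0.5 * h.T @ (p - y) / len(y)  # only the head receives gradients

probs = 1.0 / (1.0 + np.exp(-(backbone(X) @ w)))
acc = float(((probs > 0.5) == (y > 0.5)).mean())
```

Because only the small head is trained, the compute and data requirements are a fraction of full pretraining, which is the core economic argument for starting from a pretrained model.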
