Overview

1 Small Language Models

Amid the hype surrounding large language models, this chapter sets out to provide a clear, practical foundation for understanding where language models deliver value and where they fall short. It contrasts general-purpose, closed-source LLMs with open and domain-specialized alternatives, outlining benefits, risks, and decision criteria. The narrative orients readers to definitions, core architecture, major application areas, open-source momentum, and the business and technical motivations for adopting domain-specific models—especially small language models (SLMs).

SLMs are built on the same Transformer foundations as LLMs but use far fewer parameters—typically from a few hundred million to under ten billion—yielding faster inference, lower memory and energy use, and suitability for on-device, edge, and on-prem deployment with stronger data locality and privacy. They inherit capabilities from self-supervised training and the attention mechanism, and can be specialized efficiently via transfer learning and parameter-efficient fine-tuning on private, domain data. The chapter sketches key Transformer evolutions (encoder-only and decoder-only variants, embeddings, and RLHF) and highlights the growing view that SLMs are both sufficient and economical for many agentic AI workflows, often as part of heterogeneous systems that mix models by task.
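As a rough illustration of the parameter-efficient fine-tuning mentioned above, the sketch below attaches a LoRA adapter to a small open checkpoint with the Hugging Face transformers and peft libraries. The checkpoint name, rank, and target modules are illustrative assumptions, not recommendations from this chapter.

```python
# A minimal LoRA fine-tuning setup for an SLM (illustrative only).
# Assumes the `transformers` and `peft` packages are installed; the checkpoint
# name and hyperparameters below are placeholders, not prescriptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # any sub-10B causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA trains small low-rank adapter matrices instead of the full base weights,
# which is what makes fine-tuning on private, domain data affordable.
lora_config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base weights
```

From here, the adapted model can be trained on domain text with any standard training loop or the transformers Trainer; only the adapter weights need to be stored and shared.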

Practically, language models now power a broad range of tasks—classification, QA, summarization, code generation, basic reasoning, and more—but generalist closed-source LLMs carry notable risks: data leaving organizational boundaries, limited transparency and reproducibility, potential bias and hallucinations, and guardrail gaps for generated code. Open-source models provide a strong alternative, cutting development and training costs by starting from pretrained checkpoints and enabling private, compliant deployment. For regulated or sensitive domains, domain-specific models typically yield higher accuracy and better alignment to context, and SLMs further reduce operational footprint and environmental impact. The book’s focus is on making such models practical: optimizing and quantizing for efficient inference, serving through diverse APIs, deploying across constrained hardware, and integrating with RAG and agentic patterns.
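To make the inference-optimization point concrete, here is a minimal sketch of loading a small checkpoint with 4-bit quantization through transformers and bitsandbytes. The model name and prompt are placeholders, and actual memory savings depend on hardware; treat this as an illustration of the pattern rather than the book's deployment recipe.

```python
# A minimal sketch of 4-bit quantized loading for memory-efficient inference.
# Requires the `bitsandbytes` package and a CUDA GPU; the model name and
# prompt are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # store weights in 4-bit, compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",  # place layers on available hardware automatically
)

inputs = tokenizer("Summarize: SLMs trade raw scale for efficiency.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```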

Figures in this chapter include:
  • Examples of the diverse content an LLM can generate.
  • A timeline of LLMs since 2019 (image taken from paper [3]).
  • Order of magnitude of costs for each phase of an LLM implementation from scratch.
  • Order of magnitude of costs for each phase of LLM implementation when starting from a pretrained model.
  • Ratios of data source types used to train some popular existing LLMs.
  • Generic model specialization to a given domain.
  • An LLM trained for tasks on molecule structures (generation and captioning).

Summary

  • SLMs are compact Transformer-based language models, typically with fewer than ten billion parameters.
  • Transformers use self-attention mechanisms to process entire text sequences at once instead of word by word (see the sketch after this list).
  • Self-supervised learning creates training labels automatically from text data without human annotation.
  • BERT models use only the encoder part of Transformers for classification and prediction tasks.
  • GPT models use only the decoder part of Transformers for text generation tasks.
  • Word embeddings convert words into numerical vectors that capture semantic relationships.
  • RLHF uses reinforcement learning to improve LLM responses based on human feedback.
  • LLMs can generate any symbolic content including code, math expressions, and structured data.
  • Open source LLMs reduce development costs by providing pre-trained models as starting points.
  • Transfer learning adapts pre-trained models to specific domains using domain-specific data.
  • Generalist LLMs risk data leakage when deployed outside organizational networks.
  • Closed source models lack transparency about training data and model architecture.
  • Domain-specific LLMs provide better accuracy for specialized tasks than generalist models.
  • Smaller specialized models require less computational power than large generalist models.
  • Fine-tuning costs significantly less than training models from scratch.
  • Regulatory compliance often requires domain-specific models with known training data.
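The self-attention bullet above is easy to demystify in a few lines of NumPy: each token's query is compared against every token's key, and the resulting weights mix the value vectors into a context-aware representation. This is a bare-bones sketch with random toy data; real Transformers add multiple heads, learned projections per layer, masking, normalization, and positional information.

```python
# A minimal sketch of scaled dot-product self-attention over a short sequence.
# Shapes and values are toy data for illustration only.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # every token attends to every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): one context-aware vector per token
```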

FAQ

What is a Small Language Model (SLM)?
SLMs are Transformer-based language models with far fewer parameters than LLMs—typically from hundreds of millions to a few billion, usually under 10B. They are optimized for speed, memory, and energy efficiency, making them suitable for on-device, edge, and on-prem deployments where data can remain local.

How do SLMs differ from Large Language Models (LLMs)?
They use the same core Transformer technology; the difference is scale and resource needs. SLMs prioritize efficiency and deployability on CPUs or consumer GPUs, while LLMs provide more raw capability but require substantially more compute and memory.

Why did Transformers revolutionize NLP compared to RNNs?
Transformers use self-attention to process entire sequences in parallel and remove recurrence, enabling faster, scalable training. Combined with word embeddings and self-supervised learning, they capture syntax and semantics and generalize to many tasks.

What's the difference between encoder-only (BERT) and decoder-only (GPT) Transformers?
Encoder-only models like BERT are typically better for understanding tasks such as classification and prediction. Decoder-only models like GPT excel at generative tasks; the original encoder–decoder architecture remains useful depending on the task.
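As a quick illustration of the encoder-only versus decoder-only split, the snippet below runs an understanding task and a generation task through Hugging Face pipelines. The specific checkpoints are common public models chosen here only for illustration.

```python
# A minimal sketch contrasting an encoder-only model (classification) with a
# decoder-only model (generation) via Hugging Face pipelines.
from transformers import pipeline

# Encoder-only (BERT-style): understanding tasks such as sentiment classification.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("Small language models are surprisingly capable."))

# Decoder-only (GPT-style): open-ended text generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Small language models are", max_new_tokens=20)[0]["generated_text"])
```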
What is RLHF and why is it used?
Reinforcement Learning from Human Feedback optimizes a model to maximize a reward aligned with human preferences. It's used to fine-tune GPT-style models (e.g., ChatGPT) to improve behavior and usefulness.

What can language models actually do?
Beyond translation, they handle language understanding, classification, text generation, question answering, summarization, semantic parsing, pattern recognition, basic math, code generation, dialogue, general knowledge, and logical inference—even on symbolic text formats.

What are the risks of using closed-source generalist LLMs?
Key risks include data leaving your network, potential data leakage, lack of transparency and reproducibility, undisclosed training data (bias/copyright issues), hallucinations (intrinsic/extrinsic), and code-generation misuse with bypassable guardrails.

When do domain-specific models provide greater value?
They outperform generalists in regulated, privacy-sensitive contexts and specialized tasks where domain accuracy and context matter. Using transfer learning on private/domain data enables better performance, with private, offline deployment options.

How do open-source models and SLMs affect cost and deployment?
Starting from a pretrained open-source model cuts development and training costs; you still invest in data preparation and fine-tuning. Deployment and inference remain challenging but can be optimized; SLMs' smaller size enables on-prem/edge serving and lower energy/CO₂ impact.

What role do SLMs play in Agentic AI?
Recent research argues SLMs are sufficiently powerful, better suited, and more economical for many agentic invocations. Heterogeneous agent systems that mix SLMs and LLMs are a natural choice, a theme explored later in the book.
