Overview

1 Small Language Models

This chapter cuts through the hype surrounding large language models to clarify where small language models fit and why they matter. SLMs are compact, Transformer-based models—typically up to a few billion parameters—engineered for efficiency in memory, speed, and energy. Because they can run locally on CPUs, consumer GPUs, mobile, and edge devices, they keep data on-premises and enable offline, near–real-time use. Built on the same architectural principles as larger models, they trade raw scale for deployability and privacy, and they are especially attractive because they can be specialized to domains at relatively low cost. The chapter also highlights the growing view that SLMs are well suited to agentic AI, often as part of heterogeneous systems that combine multiple models.
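To make the deployability claim concrete, here is a back-of-the-envelope sketch of the memory needed just to hold a model's weights at different precisions. The 3-billion-parameter figure is illustrative, and real deployments also need memory for activations and the KV cache:

```python
def model_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed just to store the weights."""
    return num_params * bits_per_param / 8 / 1e9

# A hypothetical 3-billion-parameter SLM at common precisions:
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {model_memory_gb(3e9, bits):.1f} GB")
```

At 16-bit precision such a model needs roughly 6 GB, which already fits a consumer GPU; 4-bit quantization brings it down to about 1.5 GB, within reach of many mobile and edge devices.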

After introducing SLMs, the chapter provides a concise tour of the foundations behind modern language models. It revisits the Transformer breakthrough—self-attention, parallel processing, and embeddings—that enabled large-scale self-supervised training and unlocked broad generalization. It outlines key variants (encoder-only for understanding tasks and decoder-only for generation) and describes how techniques like reinforcement learning from human feedback refine model behavior. With these advances, models now handle far more than translation: comprehension, classification, summarization, question answering, code generation, basic math, and multi-step reasoning are all within scope.
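To illustrate the parallelism that self-attention enables, the following is a minimal single-head sketch in NumPy. For clarity it uses the token embeddings directly as queries, keys, and values; real Transformers first apply learned projections (W_Q, W_K, W_V) and use multiple heads:

```python
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    """Minimal single-head self-attention: every token attends to all
    tokens in one matrix product rather than word by word."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ X                              # weighted mix of all tokens

# Four "token" embeddings of dimension 8, processed in a single pass
X = np.random.default_rng(0).normal(size=(4, 8))
out = self_attention(X)
print(out.shape)  # (4, 8): one updated vector per token
```

The key point is that the whole sequence is transformed by a few matrix multiplications, which is what makes Transformer training so amenable to parallel hardware.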

The chapter then turns to practical considerations: the rapid rise of open-source models offers credible alternatives to proprietary systems and can dramatically lower costs by starting from pretrained checkpoints rather than training from scratch. It weighs the risks of closed, generalist LLMs—external data handling, leakage, opacity, bias, hallucinations, and fragile guardrails—against the advantages of private, domain-specific models tailored via transfer learning for regulated or high-stakes settings. The case is made that small, specialized models can deliver better accuracy, privacy, sustainability, and cost-efficiency, especially when optimized and quantized to run on constrained infrastructure. Finally, the chapter sets expectations for the rest of the book: hands-on techniques for optimizing and serving customized SLMs, integrating patterns like RAG and agentic workflows, and the basic skills readers should bring to make the most of these methods.
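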

Figures

  • Some examples of diverse content an LLM can generate.
  • The timeline of LLMs since 2019 (image taken from paper [3]).
  • Order of magnitude of costs for each phase of LLM implementation from scratch.
  • Order of magnitude of costs for each phase of LLM implementation when starting from a pretrained model.
  • Ratios of data source types used to train some popular existing LLMs.
  • Generic model specialization to a given domain.
  • An LLM trained for tasks on molecule structures (generation and captioning).

Summary

  • SLMs are compact, Transformer-based language models, typically under ten billion parameters, designed for efficient local deployment.
  • Transformers use self-attention mechanisms to process entire text sequences at once instead of word by word.
  • Self-supervised learning creates training labels automatically from text data without human annotation.
  • BERT models use only the encoder part of Transformers for classification and prediction tasks.
  • GPT models use only the decoder part of Transformers for text generation tasks.
  • Word embeddings convert words into numerical vectors that capture semantic relationships.
  • RLHF uses reinforcement learning to improve LLM responses based on human feedback.
  • LLMs can generate any symbolic content including code, math expressions, and structured data.
  • Open source LLMs reduce development costs by providing pre-trained models as starting points.
  • Transfer learning adapts pre-trained models to specific domains using domain-specific data.
  • Generalist LLMs risk data leakage when deployed outside organizational networks.
  • Closed source models lack transparency about training data and model architecture.
  • Domain-specific LLMs provide better accuracy for specialized tasks than generalist models.
  • Smaller specialized models require less computational power than large generalist models.
  • Fine-tuning costs significantly less than training models from scratch.
  • Regulatory compliance often requires domain-specific models with known training data.
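As a toy illustration of the embedding bullet above, cosine similarity over hand-made word vectors. The three-dimensional vectors are invented purely for this example; real embeddings are learned during training and have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional embeddings, invented for illustration only:
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.9]),
    "apple": np.array([0.1, 0.9, 0.2]),
}
print(cosine(emb["king"], emb["queen"]))  # higher: related meanings
print(cosine(emb["king"], emb["apple"]))  # lower: unrelated meanings
```

In a trained model, geometric closeness of vectors like these is what lets the network treat semantically related words similarly.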

FAQ

What is a Small Language Model (SLM)?
SLMs are Transformer-based language models designed to handle NLP tasks like their larger counterparts but with far fewer parameters (typically from a few hundred million to under 10 billion). They have smaller memory footprints and lower computational requirements, making them suitable for mobile, edge, and on-prem deployments.

How do SLMs differ from LLMs beyond size?
Both use the same Transformer fundamentals; the main difference is scale. SLMs prioritize efficiency, speed, and energy use, can run well on CPUs or consumer GPUs, and keep data local (an advantage for privacy-sensitive or offline use cases), while LLMs focus on raw capability at much higher resource cost.

Why did Transformers change the game for NLP?
Transformers introduced self-attention and removed recurrence, enabling parallel processing of entire sequences and significantly faster training. Combined with word embeddings and large-scale self-supervised training, they capture rich syntax and semantics, unlocking diverse capabilities from text generation to code completion.

What training paradigm do LLMs/SLMs typically follow?
They are primarily trained via self-supervised learning (e.g., predicting the next token in text) using vast unlabeled corpora. Many modern systems are later refined with Reinforcement Learning from Human Feedback (RLHF) to optimize behavior for helpfulness, truthfulness, and safety.
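A minimal sketch of how self-supervised next-token targets are derived from raw text, with no human labeling involved. Whitespace splitting stands in for a real subword tokenizer, which would produce integer token IDs instead:

```python
# Self-supervised next-token objective: inputs and labels come from the text itself.
text = "small models run locally"
tokens = text.split()  # a real tokenizer would emit subword IDs

# Each training pair is (context so far, next token to predict):
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs:
    print(context, "->", target)
# ['small'] -> models
# ['small', 'models'] -> run
# ['small', 'models', 'run'] -> locally
```

Because every position in every document yields a training example this way, enormous unlabeled corpora become usable training data.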
What are the main Transformer variants, and when should I use each?
Encoder-only models (e.g., BERT) excel at understanding tasks like classification and prediction. Decoder-only models (e.g., GPT) are typically best for generative tasks. Encoder-decoder hybrids are used where both encoding and generation are central; the choice depends on your target task.

What kinds of tasks and content can these models handle?
They can perform language understanding, text classification and generation, question answering, summarization, semantic parsing, pattern recognition, basic math, code generation, dialogue, general-knowledge recall, and logical chains. Beyond natural language, they can generate or interpret other symbolic text formats (e.g., code or domain-specific notations).

What are the key risks of using closed-source generalist LLMs?
Risks include data leaving your network, potential data leakage, lack of transparency and reproducibility, unknown training data (bias and IP concerns), hallucinations (especially extrinsic ones, where sources are unverifiable), and the possibility of unsafe code generation if guardrails are bypassed.

When do domain-specific models provide greater business value?
They shine in regulated and high-stakes domains requiring specialized knowledge, verifiable sources, and strict privacy. By applying transfer learning to a pretrained model using domain data (including private data), you can boost accuracy and compliance, keep data on-prem, and reduce environmental impact thanks to smaller model sizes.

How does the open-source ecosystem change costs and feasibility?
Starting from a pretrained open-source model dramatically reduces development and training costs compared to building from scratch, though data collection, preparation, and fine-tuning still require investment. Deployment and inference challenges remain, but optimization and quantization let you serve models cost-effectively on constrained hardware; always review licenses for intended use.
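As a rough sketch of why quantization shrinks serving costs, here is symmetric linear int8 quantization of a weight array. This is a deliberately simplified scheme; production methods (per-channel scales, 4-bit formats, calibration) are more involved:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric linear quantization: int8 weights plus one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
print(q.nbytes, "bytes instead of", w.nbytes)  # 4x smaller than float32
print(np.abs(dequantize(q, scale) - w).max())  # small reconstruction error
```

The storage drops from 4 bytes to 1 byte per weight while the rounding error stays bounded by half the scale, which is why quantized SLMs remain accurate enough for many tasks on constrained hardware.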
Why are SLMs considered well suited to agentic AI?
A 2025 NVIDIA paper argues that SLMs are sufficiently capable, more naturally suited, and more economical for many agentic invocations. It also suggests that heterogeneous agentic systems, in which agents invoke both SLMs and LLMs, are a practical approach when general conversational breadth and specialized efficiency are both needed.
