1 Small Language Models
This chapter cuts through the hype surrounding large language models to clarify where small language models fit and why they matter. SLMs are compact, Transformer-based models—typically up to a few billion parameters—engineered for efficiency in memory, speed, and energy. Because they can run locally on CPUs, consumer GPUs, mobile, and edge devices, they keep data on-premises and enable offline, near–real-time use. Built on the same architectural principles as larger models, they trade raw scale for deployability and privacy, and they are especially attractive because they can be specialized to domains at relatively low cost. The chapter also highlights the growing view that SLMs are well suited to agentic AI, often as part of heterogeneous systems that combine multiple models.
After introducing SLMs, the chapter provides a concise tour of the foundations behind modern language models. It revisits the Transformer breakthrough—self-attention, parallel processing, and embeddings—that enabled large-scale self-supervised training and unlocked broad generalization. It outlines key variants (encoder-only for understanding tasks and decoder-only for generation) and describes how techniques like reinforcement learning from human feedback refine model behavior. With these advances, models now handle far more than translation: comprehension, classification, summarization, question answering, code generation, basic math, and multi-step reasoning are all within scope.
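The self-attention mechanism mentioned above can be illustrated with a minimal NumPy sketch of scaled dot-product attention; the dimensions and random inputs are illustrative, not from the book.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention over a whole sequence at once."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))               # toy token embeddings
out = scaled_dot_product_attention(x, x, x)           # self-attention: Q = K = V
print(out.shape)  # (4, 8)
```

Because every token attends to every other token in one matrix product, the whole sequence is processed in parallel rather than word by word, which is what made large-scale pretraining practical.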
The chapter then turns to practical considerations: the rapid rise of open-source models offers credible alternatives to proprietary systems and can dramatically lower costs by starting from pretrained checkpoints rather than training from scratch. It weighs the risks of closed, generalist LLMs—external data handling, leakage, opacity, bias, hallucinations, and fragile guardrails—against the advantages of private, domain-specific models tailored via transfer learning for regulated or high-stakes settings. The case is made that small, specialized models can deliver better accuracy, privacy, sustainability, and cost-efficiency, especially when optimized and quantized to run on constrained infrastructure. Finally, the chapter sets expectations for the rest of the book: hands-on techniques for optimizing and serving customized SLMs, integrating patterns like RAG and agentic workflows, and the basic skills readers should bring to make the most of these methods.
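To make the quantization point concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, the basic idea behind running models on constrained hardware; the shapes and scale are illustrative, and production tools use more refined per-channel schemes.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)  # toy weight matrix
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()
print(q.dtype, err)  # int8, with a small reconstruction error
```

Storing weights as int8 instead of float32 cuts memory roughly fourfold, which is often the difference between a model fitting on a consumer GPU or CPU and not fitting at all.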
Figures in this chapter:
- Some examples of the diverse content an LLM can generate.
- The timeline of LLMs since 2019 (image taken from paper [3]).
- Order of magnitude of costs for each phase of LLM implementation from scratch.
- Order of magnitude of costs for each phase of LLM implementation when starting from a pretrained model.
- Ratios of data source types used to train some popular existing LLMs.
- Generic model specialization to a given domain.
- An LLM trained for tasks on molecule structures (generation and captioning).
Summary
- SLMs are compact Transformer-based models, typically up to a few billion parameters, built for efficient local deployment.
- Transformers use self-attention mechanisms to process entire text sequences at once instead of word by word.
- Self-supervised learning creates training labels automatically from text data without human annotation.
- BERT models use only the encoder part of Transformers for classification and prediction tasks.
- GPT models use only the decoder part of Transformers for text generation tasks.
- Word embeddings convert words into numerical vectors that capture semantic relationships.
- RLHF uses reinforcement learning to improve LLM responses based on human feedback.
- LLMs can generate any symbolic content including code, math expressions, and structured data.
- Open source LLMs reduce development costs by providing pre-trained models as starting points.
- Transfer learning adapts pre-trained models to specific domains using domain-specific data.
- Generalist LLMs risk data leakage when deployed outside organizational networks.
- Closed source models lack transparency about training data and model architecture.
- Domain-specific LLMs provide better accuracy for specialized tasks than generalist models.
- Smaller specialized models require less computational power than large generalist models.
- Fine-tuning costs significantly less than training models from scratch.
- Regulatory compliance often requires domain-specific models with known training data.
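The summary above notes that word embeddings capture semantic relationships as numerical vectors. A minimal sketch with cosine similarity shows the idea; the 3-dimensional vectors below are hand-picked toy values, not outputs of a trained model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d embeddings with illustrative values (real models use hundreds of dimensions)
emb = {
    "cat": np.array([0.90, 0.80, 0.10]),
    "dog": np.array([0.85, 0.75, 0.20]),
    "car": np.array([0.10, 0.20, 0.95]),
}
print(cosine_similarity(emb["cat"], emb["dog"]))  # high: semantically related
print(cosine_similarity(emb["cat"], emb["car"]))  # lower: unrelated concepts
```

In a trained model these directions emerge from data, so that semantically related words end up close together in the vector space.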