Overview

1 Small Language Models

The chapter introduces Small Language Models as a practical alternative to highly hyped, general-purpose Large Language Models. It explains that SLMs use the same Transformer-based foundations as larger models but operate at a smaller scale, typically with far fewer parameters, lower memory needs, and reduced computational requirements. Because they can run locally on edge devices, commodity hardware, on-premises servers, or small clusters, they are especially valuable when privacy, latency, offline operation, cost, and energy efficiency matter.

The chapter also gives a high-level view of how modern language models emerged from the Transformer architecture. It contrasts earlier recurrent neural networks with Transformers, emphasizing self-attention, parallel processing, and word embeddings as key advances that made large-scale self-supervised training possible. It then describes important Transformer families, including encoder-based models suited to classification and prediction tasks and decoder-based models suited to text generation, while also noting techniques such as reinforcement learning from human feedback. Language models are shown to support a wide range of tasks, including question answering, summarization, classification, code generation, dialogue, reasoning, and work with symbolic text-like representations.

The chapter argues that closed-source generalist LLMs, while powerful and convenient, introduce business risks around data exposure, lack of transparency, limited interpretability, hallucinations, hidden training data, infrastructure dependence, and misuse. Open-source models offer organizations more control and can reduce development costs by providing pretrained foundations that can be specialized rather than built from scratch. The chapter concludes that domain-specific SLMs and customized language models can provide greater value in regulated, privacy-sensitive, or highly specialized industries because they can be tuned on relevant data, deployed within organizational boundaries, achieve better task-specific accuracy, and reduce computational and environmental costs compared with very large generalist systems.

Some examples of diverse content an LLM can generate.
The timeline of LLMs since 2019 (image taken from paper [3])
Order of magnitude of costs for each phase of LLM implementation from scratch.
Order of magnitude of costs for each phase of LLM implementation when starting from a pretrained model.
Ratios of data source types used to train some popular existing LLMs.
Generic model specialization to a given domain.
An LLM trained for tasks on molecule structures (generation and captioning).

Summary

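  • SLMs are language models with fewer parameters (typically below 10 billion), a smaller memory footprint, and lower computational requirements than LLMs.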
  • Transformers use self-attention mechanisms to process entire text sequences at once instead of word by word.
  • Self-supervised learning creates training labels automatically from text data without human annotation.
  • BERT models use only the encoder part of Transformers for classification and prediction tasks.
  • GPT models use only the decoder part of Transformers for text generation tasks.
  • Word embeddings convert words into numerical vectors that capture semantic relationships.
  • RLHF uses reinforcement learning to improve LLM responses based on human feedback.
  • LLMs can generate a wide range of symbolic content, including code, math expressions, and structured data.
  • Open-source LLMs reduce development costs by providing pretrained models as starting points.
  • Transfer learning adapts pretrained models to specific domains using domain-specific data.
  • Generalist LLMs risk data leakage when deployed outside organizational networks.
  • Closed-source models lack transparency about training data and model architecture.
  • Domain-specific LLMs can provide better accuracy on specialized tasks than generalist models.
  • Smaller specialized models require less computational power than large generalist models.
  • Fine-tuning costs significantly less than training models from scratch.
  • Regulatory compliance often requires domain-specific models with known training data.
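
The summary point about word embeddings can be made concrete with a small sketch. The three-dimensional vectors below are invented for illustration only (real embeddings are learned during training and have hundreds or thousands of dimensions), and cosine similarity is used here as one common way to compare them, not necessarily the measure the chapter uses.

```python
import math

# Toy vectors, made up for this example; they are not taken from any real model.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_cat_dog = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
sim_cat_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])

print(f"cat vs dog: {sim_cat_dog:.3f}")
print(f"cat vs car: {sim_cat_car:.3f}")
```

With these toy values, "cat" and "dog" score higher than "cat" and "car", which is the sense in which nearby vectors capture semantic relationships.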

FAQ

What is a Small Language Model (SLM)?
Small Language Models are language models designed to perform natural language processing tasks like larger LLMs, but with fewer parameters, a smaller memory footprint, and lower computational requirements. They typically range from a few hundred million to a few billion parameters, often below 10 billion, making them suitable for mobile devices, edge devices, on-prem servers, and small clusters.
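
To see why parameter count drives the memory footprint, here is a rough back-of-the-envelope sketch. It estimates the memory for the weights alone, assuming each parameter is stored at a fixed numeric precision; it ignores activations, caches, and runtime overhead, so real requirements are higher. The model sizes are illustrative round numbers, not measurements of specific models.

```python
# Bytes needed to store one parameter at some common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params, precision="fp16"):
    """Approximate memory for the weights only, in GB (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

# Illustrative sizes: three in the SLM range, one in the "hundreds of billions" range.
SIZES = {
    "500M": 500_000_000,
    "3B": 3_000_000_000,
    "7B": 7_000_000_000,
    "200B": 200_000_000_000,
}

for label, num_params in SIZES.items():
    row = ", ".join(
        f"{precision}: {weight_memory_gb(num_params, precision):.1f} GB"
        for precision in BYTES_PER_PARAM
    )
    print(f"{label:>5} -> {row}")
```

Under these assumptions a 7-billion-parameter model needs about 14 GB for its weights at 16-bit precision, while a 200-billion-parameter model needs about 400 GB, which is beyond a single commodity machine.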
How are SLMs different from Large Language Models (LLMs)?
SLMs and LLMs are based on the same core Transformer technology, so the main difference is scale rather than architecture. LLMs may have hundreds of billions of parameters, while SLMs are much smaller and optimized for efficiency, speed, local deployment, and lower energy consumption. SLMs trade some raw general-purpose capability for lower cost, privacy, and deployment flexibility.
Why are SLMs useful for domain-specific applications?
SLMs can be specialized for specific domains and tasks using domain-specific data, private data, and expert knowledge at relatively low cost. This is especially useful in fields such as healthcare, pharma, biotech, manufacturing, chemistry, and finance, where accuracy, privacy, compliance, and specialized terminology are important.
What role do Transformers play in SLMs and LLMs?
Both SLMs and LLMs are built on the Transformer architecture introduced in the 2017 paper “Attention Is All You Need.” Transformers use self-attention to process entire input sequences at once and have no recurrent structure, which allows more parallelism and faster training compared with older RNN-based architectures.
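
A minimal sketch of single-head scaled dot-product self-attention may help here. It is written in plain Python for readability (real implementations use tensor libraries and multiple heads), and the input vectors and identity projection matrices are placeholders chosen for this example, not values from any trained model.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, W_q, W_k, W_v):
    """Every token attends to every token in one pass, with no recurrence."""
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    # Similarity of each query with each key, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V), weights

# Three tokens with 2-dimensional placeholder embeddings.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projection matrices

output, weights = self_attention(X, IDENTITY, IDENTITY, IDENTITY)
for i, row in enumerate(weights):
    print(f"token {i} attention weights: {[round(w, 3) for w in row]}")
```

Each row of `weights` sums to 1 and is computed independently of the other rows, which is what allows the whole sequence to be processed in parallel rather than word by word.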
What is self-supervised learning in language models?
Self-supervised learning generates labels automatically from the data instead of relying on humans to label examples. For example, a model may be trained by removing the next word from a sentence and learning to predict it. This approach allows language models to train on huge amounts of unlabeled text and develop broad language capabilities.
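
The next-word example can be sketched in a few lines. The function below is a hypothetical helper that turns raw text into (context, target) training pairs with no human labeling; it uses naive whitespace splitting as a simplifying assumption, whereas real models use subword tokenizers.

```python
def next_word_examples(text):
    """Build (context, target) pairs where the target is the next word."""
    tokens = text.split()  # naive whitespace tokenization, for illustration only
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

examples = next_word_examples("small models can run on local hardware")

for context, target in examples:
    print(f"{' '.join(context)!r} -> {target!r}")
```

A single seven-word sentence yields six labeled examples, which is why unlabeled text at scale provides so much training signal.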
What are BERT and GPT, and how do they differ?
BERT and GPT are two major Transformer families. BERT, or Bidirectional Encoder Representations from Transformers, uses the encoder part of the original Transformer and is typically strong for classification and prediction tasks. GPT, or Generative Pre-trained Transformer, uses the decoder part and is typically better suited for generative text tasks.
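
One concrete way to picture the encoder/decoder difference is the attention mask. The sketch below assumes the standard setup: encoder-style models let every position attend to every other position (bidirectional), while decoder-style models use a causal mask so that a position sees only itself and earlier positions. The function names are illustrative, not taken from any library.

```python
def bidirectional_mask(n):
    """Encoder-style: every position may attend to every position."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """Decoder-style: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def show(mask):
    for row in mask:
        print(" ".join(str(v) for v in row))

print("bidirectional (encoder-style):")
show(bidirectional_mask(4))
print("causal (decoder-style):")
show(causal_mask(4))
```

The causal mask is what makes left-to-right generation possible: the model never sees the words it is being asked to predict. Full context in both directions suits tasks such as classification, where the whole input is available up front.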
What kinds of tasks can language models perform?
Language models can perform many tasks beyond translation, including language understanding, text classification, text generation, question answering, document summarization, semantic parsing, pattern recognition, basic math solving, code generation, dialogue, general knowledge tasks, and logical inference chains.
Why are open-source language models important?
Open-source models give organizations more choices and reduce dependence on proprietary closed-source vendors. Instead of developing and training a model from scratch, organizations can start from a pretrained open-source model and fine-tune or instruction-tune it on their own data, significantly reducing development and training costs.
What are the main risks of using closed-source generalist LLMs?
Closed-source generalist LLMs can create risks around data leaving the organization, data leakage, lack of transparency, lack of reproducibility, limited interpretability, unknown training data, hallucinations, and potential misuse for generating unsafe code. These risks are especially serious in regulated industries or when handling sensitive or proprietary information.
When does a domain-specific LLM provide more value than a generalist LLM?
A domain-specific LLM is often more valuable when tasks require specialized expertise, high accuracy, regulatory compliance, private data protection, or operation inside corporate network boundaries. By adapting a pretrained model with domain-specific data through transfer learning, organizations can achieve better performance and context understanding in a focused area.
