Small Models, Big Impact bundle

For years, the AI industry operated on a simple assumption: bigger models meant better results. But a quiet shift is underway. Across startups, enterprises, and research labs, teams are discovering that smaller, domain-specific models often outperform massive general-purpose systems while running faster and costing a fraction as much. This book bundle explains why the small language model (SLM) market is projected to grow from $6.5B in 2024 to $20.7B by 2030, and how practitioners are building right-sized models that excel at specific tasks, cutting inference costs by as much as 90% while improving accuracy. You'll learn when to choose small over large, how to architect task-specific models, and the optimization techniques that make small models punch above their weight.

This bundle contains these four eBooks:
  • Rearchitecting LLMs (eBook, MEAP)
  • CUDA for Deep Learning (eBook, MEAP)
  • Domain-Specific Small Language Models (eBook, MEAP)
  • Building Reliable AI Systems (eBook, MEAP)
$94.99 (regular price $199.96)
You save $104.97 (52%)

Rearchitecting LLMs

By default, general-purpose LLMs are not optimized for specific domains and business goals. Using techniques like specialized fine-tuning, pruning unnecessary neural components, and knowledge distillation, you can rearchitect your models to cost less, run faster, and deliver more accurate results.

Rearchitecting LLMs: Structural techniques for efficient models turns research from the latest AI papers into production-ready practices for domain-specific model optimization. As you work through this practical book, you’ll perform hands-on surgery on popular open-source models like Llama-3, Gemma, and Qwen to create cost-effective local small language models (SLMs). Along the way, you’ll learn how to combine behavioral analysis with structural modifications, identifying and removing parts that don’t contribute to your model’s goals, and even use “fair pruning” to reduce model bias at the neuron level.

CUDA for Deep Learning

CUDA (Compute Unified Device Architecture) provides a powerful parallel programming model AI engineers can use to tap the massive processing power of NVIDIA GPUs. CUDA delivers direct control, debugging power, and acceleration at the GPU level that can’t be matched by other types of optimizations.

CUDA for Deep Learning shows you how to work within the CUDA ecosystem, from your first kernel to implementing advanced LLM features like Flash Attention. You’ll learn to profile with Nsight Compute, identify bottlenecks, and understand why each optimization works. By solving problems at multiple levels of abstraction, you’ll develop a deep understanding of CUDA, along with practical mastery of kernel-building skills. Written for the latest NVIDIA hardware, the book grounds you in CUDA fundamentals that will stay relevant as chips upgrade and evolve.

Domain-Specific Small Language Models

Domain-Specific Small Language Models teaches you how to create language models that deliver the power of LLMs for specific areas of knowledge. It provides a practical, application-focused counterpart to foundational texts like Sebastian Raschka’s Build a Large Language Model (From Scratch), showing you how to adapt large-scale concepts for efficient, specialized use. You’ll learn to minimize the computational horsepower your models require while maintaining high-quality performance and output. You’ll appreciate the clear explanations of complex technical concepts alongside working code samples you can run and replicate on your laptop. Plus, you’ll learn to develop and deliver RAG systems and AI agents that rely solely on SLMs, without the costs of foundation model access.

Building Reliable AI Systems

Building Reliable AI Systems is a comprehensive guide to creating LLM-based apps that are faster and more accurate. It takes you from training to production and beyond, into the ongoing maintenance of an LLM. In each chapter, you’ll find in-depth code samples and hands-on projects, including building a RAG-powered chatbot and an agent created with LangChain. Deploying an LLM can be costly, so you’ll love the performance optimization techniques—prompt optimization, model compression, and quantization—that make your LLMs quicker and more efficient. Throughout, real-world case studies from the e-commerce, healthcare, and legal domains give concrete examples of how businesses have solved common LLM problems.
Some bundled books and liveVideos are part of the Manning Early Access Program. You'll get all the available content now, new content as it's created, and the final product when it's ready.