For years, the AI industry operated on a simple assumption: bigger models meant better results. But a quiet shift is underway. Across startups, enterprises, and research labs, teams are discovering that smaller, domain-specific models often outperform massive general-purpose systems while running faster and costing a fraction as much. This book bundle explains why the small language model (SLM) market is projected to grow from $6.5B in 2024 to $20.7B by 2030, and how practitioners are using domain-specific models to get better results at lower cost. While everyone else chases scale, leading teams are building right-sized models that excel at specific tasks, cutting inference costs by as much as 90% while improving accuracy. You'll learn when to choose small over large, how to architect task-specific models, and the optimization techniques that let small models punch above their weight.
By default, general-purpose LLMs are not optimized for specific domains and business goals. Using techniques such as specialized fine-tuning, pruning unneeded neural components, and knowledge distillation, you can rearchitect your models to cost less, run faster, and deliver more accurate results.
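To make one of those techniques concrete, here is a minimal sketch of a knowledge-distillation training loss in PyTorch, where a small student model learns to mimic a larger teacher. The function name, temperature, and weighting are illustrative choices, not code from the books:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft-target term: push the student toward the teacher's softened
    # output distribution (classic Hinton-style distillation).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale to match the hard-label loss magnitude
    # Hard-target term: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

Blending the two terms lets the student inherit the teacher's knowledge of class similarities while still fitting the training labels.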
CUDA (Compute Unified Device Architecture) provides a powerful parallel programming model that AI engineers can use to tap the massive processing power of NVIDIA GPUs. CUDA delivers direct control, debugging power, and GPU-level acceleration that higher-level optimizations can’t match.
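For a taste of the programming model, here is a minimal vector-addition kernel written in Python with Numba's CUDA JIT, one of several front ends to CUDA; the kernel and variable names are illustrative rather than taken from the book:

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    # Each GPU thread computes exactly one output element.
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

# Move inputs to device memory, launch the kernel, and copy the result back.
d_a, d_b = cuda.to_device(a), cuda.to_device(b)
d_out = cuda.device_array_like(a)
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](d_a, d_b, d_out)
out = d_out.copy_to_host()
```

The explicit grid/block launch configuration and host-to-device copies illustrate the kind of direct, GPU-level control described above.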
This practical book shows you how to adapt pretrained open source models to your domain using transfer learning and parameter-efficient fine-tuning. You’ll learn to minimize cost through optimization and quantization, develop secure APIs to serve your models, and deploy SLMs on commodity hardware, including small devices. The hands-on examples include integrating SLMs into RAG systems and agentic workflows.
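As an illustration of what parameter-efficient fine-tuning looks like in practice, here is a minimal LoRA setup using the Hugging Face peft library; the model name and hyperparameters are placeholders, not the book's own examples:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder checkpoint: any small open-source causal LM with
# Llama-style attention projections works here.
base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-1.7B")

config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling applied to adapter outputs
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention layers
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base weights
```

Because only the small adapter matrices are trained, fine-tuning fits on commodity GPUs and the resulting adapter weights are typically only a few megabytes to ship.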
Building Reliable AI Systems is a comprehensive guide to creating LLM-based apps that are faster and more accurate. It takes you from training to production and beyond into the ongoing maintenance of an LLM. In each chapter, you’ll find in-depth code samples and hands-on projects, including building a RAG-powered chatbot and an agent created with LangChain. Deploying an LLM can be costly, so you’ll love the performance optimization techniques (prompt optimization, model compression, and quantization) that make your LLMs faster and more efficient. Throughout, real-world case studies from e-commerce, healthcare, and the legal field give concrete examples of how businesses have solved common LLM problems.
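To preview the core of a RAG-powered chatbot, here is a stripped-down retrieve-then-prompt sketch in plain Python; the embedding model and LLM call are left abstract, and every name here is illustrative rather than drawn from the book's LangChain code:

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=3):
    # Rank stored documents by cosine similarity to the query embedding.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def rag_prompt(question, retrieved_docs):
    # Ground the model's answer in the retrieved passages.
    context = "\n\n".join(retrieved_docs)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\nAnswer:")
```

Feeding the output of rag_prompt to any chat model completes the loop; a production system adds a vector database, document chunking, and the ongoing maintenance the book covers.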