Overview

1 Getting started with MLOps and ML engineering

Production machine learning succeeds or fails on engineering, not just modeling. This chapter sets the stage for building reliable, scalable ML systems by framing MLOps as the bridge from promising experiments to durable, value-delivering services. It positions the reader—whether a data scientist, software engineer, or ML engineer—to gain confidence through a hands-on, real-world approach that favors practical patterns over theory, covers the full lifecycle from ideation to operation, and emphasizes the mindset required to keep models healthy in the wild.

The chapter walks through the end-to-end ML lifecycle: deciding whether ML is warranted, framing the problem, collecting and labeling data, versioning datasets, training, evaluating, and validating with stakeholders. It highlights pipeline orchestration and automation for reproducibility and velocity, then explains the shift to dev/staging/production where CI/CD triggers end-to-end runs, models are deployed as versioned services (often containerized), and systems are tested for performance, scalability, and rollback. Ongoing monitoring spans system health, business outcomes, and data/model drift, with retraining scheduled or threshold-driven. The skills mix spans strong software engineering, practical ML familiarity, data engineering, and automation—anchored by an appreciation for ethics, security, and reliability.
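
To make that flow concrete, the sketch below strings the experimentation steps together as plain Python functions and adds a threshold-driven retraining check. Everything here is a hypothetical stand-in (the function names, the toy "model", and the 0.8 threshold are illustrative, not the book's implementation):

```python
# Toy walk-through of the lifecycle described above; every function is a
# placeholder for the real components built later in the book.

def prepare_data(raw_rows: list[dict]) -> list[dict]:
    # Cleaning, labeling, and splitting would happen here.
    return [r for r in raw_rows if r.get("label") is not None]

def train(dataset: list[dict]) -> dict:
    # Trivial "model": always predict the majority label.
    labels = [r["label"] for r in dataset]
    return {"majority": max(set(labels), key=labels.count)}

def evaluate(model: dict, dataset: list[dict]) -> float:
    hits = sum(1 for r in dataset if r["label"] == model["majority"])
    return hits / max(len(dataset), 1)

def should_retrain(live_accuracy: float, threshold: float = 0.8) -> bool:
    # Threshold-driven retraining trigger, as described above.
    return live_accuracy < threshold

if __name__ == "__main__":
    data = prepare_data([{"x": 1, "label": "a"},
                         {"x": 2, "label": "a"},
                         {"x": 3, "label": "b"}])
    model = train(data)
    print("offline accuracy:", evaluate(model, data))             # evaluation
    print("retrain needed:", should_retrain(live_accuracy=0.72))  # monitoring
```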

With that foundation, the chapter introduces building an ML platform incrementally, centering on Kubeflow and Kubeflow Pipelines for orchestration, and expanding with components such as a feature store, a model registry, and automated deployment via CI/CD. It advocates learning by assembling the platform—even if you later adopt managed offerings—so you understand trade-offs and can tailor solutions. Tool choices are pragmatic and use-case driven, and the platform naturally extends to LLMOps with additions like vector search and guardrails. Three projects anchor the journey: an OCR system, a tabular movie recommender, and a RAG-powered documentation assistant, each reinforcing iterative workflows, changing requirements, and the reusable patterns needed to design, ship, monitor, and evolve production ML services.

Figures

  • The experimentation phase of the ML life cycle
  • The dev/staging/production phase of the ML life cycle
  • MLOps is a mix of different skill sets
  • The mental map of an ML setup, detailing the project flow from planning to deployment and the tools typically involved in the process
  • Traditional MLOps (right) extended with LLMOps components (left) for production LLM systems. Chapters 12-13 explore these extensions in detail.
  • An automated pipeline being executed in Kubeflow
  • Feature stores take in transformed data (features) as input, and have facilities to store, catalog, and serve features
  • The model registry captures metadata, parameters, artifacts, and the ML model, and in turn exposes a model endpoint
  • Model deployment consists of the container registry, CI/CD, and automation working in concert to deploy ML services

Summary

  • The Machine Learning (ML) life cycle provides a framework for confidently taking ML projects from idea to production. While iterative in nature, understanding each phase helps you navigate the complexities of ML development.
  • Building reliable ML systems requires a combination of skills spanning software engineering, MLOps, and data science. Rather than trying to master everything at once, focus on understanding how these skills work together to create robust ML systems.
  • A well-designed ML platform forms the foundation for confidently developing and deploying ML services. We'll use tools like Kubeflow Pipelines for automation, MLflow for model management, and Feast for feature management, learning how to integrate them effectively for production use.
  • We'll apply these concepts by building two different types of ML systems: an OCR system and a movie recommender. Through these projects, you'll gain hands-on experience with both image and tabular data, building confidence in handling diverse ML challenges.
  • Traditional MLOps principles extend naturally to Large Language Models through LLMOps - adding components for document processing, retrieval systems, and specialized monitoring. Understanding this evolution prepares you for the modern ML landscape.
  • The first step is to identify the problem the ML model is going to solve, followed by collecting and preparing the data to train and evaluate the model. Data versioning enables reproducibility, and model training is automated using a pipeline.
  • The ML life cycle serves as our guide throughout the book, helping us understand not just how to build models, but how to create reliable, production-ready ML systems that deliver real business value.

FAQ

Why is MLOps essential for production ML systems?
MLOps focuses on building reliable, scalable, and maintainable ML systems. Many projects fail not because the model is complex, but because deployment, automation, monitoring, and operational concerns are missing. MLOps provides the processes and tooling to move beyond notebooks to production-grade services that can be audited, reproduced, and improved over time.

What are the main stages of the ML life cycle covered in this chapter?
The chapter describes two phases: 1) the Experimentation Phase, which includes problem formulation, data collection and preparation, data versioning, model training, evaluation, and validation; and 2) the Dev/Staging/Production Phase, which adds full automation, deployment, monitoring, and retraining triggers.

What defines the Experimentation Phase, and why orchestrate it as a pipeline?
Experimentation is highly iterative with frequent loops between steps like training, evaluation, and data preparation. Orchestrating experiments as pipelines ensures automation, reproducibility, and traceability of parameters and artifacts. This reduces manual errors and makes it easier to scale and compare experiments.

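As a taste of what that looks like in code, here is a minimal Kubeflow Pipelines (KFP v2 SDK) sketch. The component bodies, base image, and parameter names are placeholders rather than the book's actual pipeline, and it assumes the kfp package is installed:

```python
from kfp import dsl

@dsl.component(base_image="python:3.11")
def prepare_data(dataset_uri: str) -> str:
    # Placeholder: download, clean, and version the data, return its location.
    return dataset_uri

@dsl.component(base_image="python:3.11")
def train_model(dataset_uri: str, learning_rate: float) -> str:
    # Placeholder: train and persist a model, return the artifact location.
    return f"model-trained-on-{dataset_uri}-lr-{learning_rate}"

@dsl.pipeline(name="experiment-pipeline")
def experiment_pipeline(dataset_uri: str = "s3://bucket/data.csv",
                        learning_rate: float = 0.01):
    data_task = prepare_data(dataset_uri=dataset_uri)
    train_model(dataset_uri=data_task.output, learning_rate=learning_rate)

if __name__ == "__main__":
    from kfp import compiler
    # Compiling produces a reusable, versioned pipeline definition.
    compiler.Compiler().compile(pipeline_func=experiment_pipeline,
                                package_path="experiment_pipeline.yaml")
```

Compiling to YAML gives you a versioned artifact that can be uploaded through the Kubeflow UI or submitted programmatically, as in the CI-trigger sketch further down.
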
Why are data collection, labeling, and data versioning critical?
High-quality, well-labeled data is foundational to model performance. Versioning data alongside code is necessary because changes in data can change model behavior. Proper versioning enables reproducibility—ensuring you can recreate results and track exactly which data and code produced a given model.

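One lightweight way to tie a training run to an exact data version is to hash the dataset and record the hash alongside the run. The sketch below assumes MLflow tracking is available (by default it writes to a local ./mlruns directory); the file path and commit placeholder are illustrative, and dedicated data-versioning tools handle this more thoroughly:

```python
import hashlib
import mlflow  # assumes MLflow is installed; defaults to local ./mlruns tracking

def dataset_fingerprint(path: str) -> str:
    """Content hash of a data file, used as a lightweight version identifier."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()[:12]

with mlflow.start_run(run_name="train-with-data-version"):
    # Recording the data version next to code/params makes the run reproducible.
    mlflow.log_param("data_version", dataset_fingerprint("data/train.csv"))  # hypothetical path
    mlflow.log_param("git_commit", "<your-commit-sha>")                      # placeholder
```
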
How do model evaluation and model validation differ?
Evaluation is a technical sanity check using holdout datasets and metrics like precision, recall, or AUC to estimate performance on unseen data. Validation confirms the model meets business expectations and constraints, often conducted by stakeholders outside the model-building team.

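The evaluation half of that distinction is easy to show with scikit-learn; the labels and scores below are made up purely for illustration:

```python
# Evaluation: a quick technical check on a holdout set.
from sklearn.metrics import precision_score, recall_score, roc_auc_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]                    # holdout labels
y_score = [0.2, 0.8, 0.6, 0.4, 0.9, 0.1, 0.3, 0.7]   # model probabilities
y_pred = [int(s >= 0.5) for s in y_score]            # hard predictions at a 0.5 cutoff

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("AUC:      ", roc_auc_score(y_true, y_score))
# Validation, by contrast, asks stakeholders whether these numbers (and the
# model's behavior on business-critical cases) actually meet their expectations.
```
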
What changes in the Dev/Staging/Production phase of the life cycle?
This phase emphasizes full automation and operational excellence. Pipelines are triggered by CI or programmatic events, culminating in model deployment (often as a REST service) and ongoing monitoring. The system can retrigger training and deployment when performance degrades or thresholds are crossed.

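In practice, the CI hand-off is often a short script that submits a compiled pipeline to the cluster. This sketch assumes a reachable Kubeflow Pipelines endpoint and the compiled YAML from the earlier pipeline example; the host URL and arguments are placeholders:

```python
# Sketch of a CI job step that submits a compiled pipeline for execution.
import kfp

client = kfp.Client(host="http://localhost:8080")  # your cluster's KFP API endpoint

run = client.create_run_from_pipeline_package(
    pipeline_file="experiment_pipeline.yaml",       # produced by the compiler step
    arguments={"dataset_uri": "s3://bucket/data.csv", "learning_rate": 0.01},
    run_name="ci-triggered-run",
)
print("submitted run:", run.run_id)
```
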
What are best practices for model deployment and monitoring?
Package inference behind a versioned API, containerize with Docker, and deploy to a scalable platform like Kubernetes. Perform load testing and set up autoscaling. Monitor both system metrics (RPS, latency, error rates) and ML-specific signals (data/model drift, and business KPIs such as churn or approvals) to catch performance issues early.

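A minimal version of "versioned API plus metrics" might look like the FastAPI sketch below; the prediction logic is a stub and the metric names are arbitrary. You would run it with uvicorn, containerize it, and let Prometheus scrape /metrics:

```python
# Minimal sketch of a versioned inference service with basic system metrics.
import time

from fastapi import FastAPI
from fastapi.responses import Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = FastAPI()
REQUESTS = Counter("predict_requests_total", "Prediction requests served")
LATENCY = Histogram("predict_latency_seconds", "Prediction latency in seconds")

def predict(features: dict) -> float:
    return 0.5  # placeholder for real model inference

@app.post("/v1/predict")  # version in the path so clients can pin behavior
def predict_endpoint(features: dict):
    start = time.perf_counter()
    score = predict(features)
    REQUESTS.inc()
    LATENCY.observe(time.perf_counter() - start)
    return {"score": score, "model_version": "v1"}

@app.get("/metrics")  # scraped by Prometheus for RPS, latency, error rates
def metrics():
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```
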
When should you retrain a model, and how can it be automated?
Retraining depends on context: schedule-based (e.g., monthly) or event-driven (e.g., drift detected, business KPI drops). With automated pipelines and CI/CD, retraining can ingest fresh data, produce a new model, validate it, and deploy it automatically when it meets predefined criteria.

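An event-driven trigger can be as simple as comparing monitoring signals to thresholds and kicking off the training pipeline when they are crossed. Everything in this sketch (the scores, thresholds, and helper names) is a hypothetical placeholder:

```python
# Sketch of an event-driven retraining trigger; the checks are stand-ins for
# whatever monitoring you have in place.

DRIFT_THRESHOLD = 0.3   # e.g., a drift score above this means the data has shifted
KPI_FLOOR = 0.75        # e.g., a minimum acceptable business-KPI proxy

def current_drift_score() -> float:
    return 0.42         # placeholder: would come from a monitoring job

def current_business_kpi() -> float:
    return 0.81         # placeholder: would come from analytics

def trigger_retraining() -> None:
    # In a real setup this would submit the training pipeline (see the
    # CI-trigger sketch above) rather than print.
    print("retraining pipeline submitted")

if current_drift_score() > DRIFT_THRESHOLD or current_business_kpi() < KPI_FLOOR:
    trigger_retraining()
```
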
What skills and prerequisites are needed for MLOps/ML engineering?
Strong software engineering fundamentals (debugging, performance, deployment) plus working knowledge of ML frameworks (TensorFlow/PyTorch/scikit-learn) and data engineering basics. Comfort with automation, CI/CD, and Kubernetes is key for reproducibility and reliability. You don't need expertise in everything—build skills incrementally.

What is an ML platform, and which components and tools will we build?
An ML platform provides end-to-end support for the ML life cycle. In this book you'll set up Kubeflow on Kubernetes, use Kubeflow Pipelines for orchestration, integrate a Feature Store to share and serve features (reducing training–serving skew), and add a Model Registry for artifacts and promotion. CI/CD, container registries, and deployment automation complete the loop. You'll grow the platform incrementally, weigh build vs. buy (e.g., SageMaker, Vertex AI), and later extend the foundation to LLMOps with components like vector databases and guardrails.

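Two of those components in miniature: fetching online features from Feast and registering a trained model with MLflow. The feature view, entity, model name, and run ID are hypothetical, and both calls assume the corresponding services (a Feast feature repository, an MLflow tracking/registry server) are already configured:

```python
from feast import FeatureStore
import mlflow

# Feature store: fetch the same features at serving time that training used,
# which is how a feature store helps avoid training-serving skew.
store = FeatureStore(repo_path=".")  # points at an existing Feast feature repo
online = store.get_online_features(
    features=["user_stats:avg_rating"],   # hypothetical feature view and feature
    entity_rows=[{"user_id": 42}],        # hypothetical entity key
).to_dict()
print(online)

# Model registry: promote a trained run's artifact into a named, versioned model.
mlflow.register_model(model_uri="runs:/<run_id>/model",  # placeholder run ID
                      name="movie-recommender")          # hypothetical model name
```
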
