Overview

2 What is MLOps?

MLOps is presented as the set of practices that transform machine learning from isolated modeling into a repeatable, production-grade capability that reliably delivers business value. Because models, data, and assumptions evolve, the chapter frames ML as a closed, iterative loop rather than a one-off build: start with a well-aligned problem definition and success metrics, then continuously learn from outcomes to refine the system. The core idea is to bridge the gap between business goals, technical requirements, and operational constraints through shared processes, clear ownership, and rigorous tracking so that models can be changed quickly and safely without sacrificing confidence.

The chapter walks through each stage of the loop—data collection, EDA, modeling and training, evaluation, deployment, monitoring, and maintenance—emphasizing lineage, versioning, and automation. Data must be relevant, representative, and carefully tracked; EDA validates assumptions and informs feature choices; modeling benefits from modular code, experiment tracking, and hyperparameter search to maximize reproducibility and velocity. Evaluation uses appropriate domain metrics and robust holdouts, including error analysis and (optionally) interpretability techniques. Deployment spans APIs and edge targets with environment-specific optimizations and staged rollouts. Monitoring detects drift, performance regressions, and errors, backed by alerting and strong logging. Maintenance closes the loop by feeding insights back into data, models, and infrastructure for continuous improvement.

Robust MLOps is necessary because real-world ML adds complexities that differ from traditional software: data is a first-class asset, models change without code edits, and compliance, bias, and drift must be actively managed. The chapter contrasts DevOps and MLOps—sharing automation and CI/CD principles but diverging on data stewardship, continuous training, interpretability, and performance monitoring. It also outlines organizational challenges (tooling fragmentation, cross-functional communication, scaling/optimization) and the benefits of maturity: faster experimentation, cost control, collaboration, repeatability, traceability, and reliable scaling. A maturity model (Level 0: manual, Level 1: continuous retraining pipelines, Level 2: pipeline automation) provides a path forward, underscoring that disciplined, automated processes reduce technical debt and build lasting confidence in production ML.

The mental map, where we are focusing on defining the problem (1) and model design (2)
ML as a loop
Examples of the visual data in the MIDV-500 dataset
Example of an annotated ID card, shown in CVAT, which is a web-based tool designed for annotating images and videos, commonly used to label data for computer vision models.
A view of a retraining pipeline using the modular codebase concept. This approach of keeping the model, code, configuration files, and data as distinct versioned components with lineage links ensures that the process remains flexible, fast, and adaptable while enabling experimentation, debugging, and iterative development.

Summary

  • ML exists to solve a business problem and it is important to understand the requirement in depth before starting an ML project.
  • MLOps is the iterative process of developing, monitoring and improving an ML model.
  • A model is an artifact of the ML loop, and the loop aims to improve that model's performance over time.
  • MLOps is hard due to data management, complex tooling, organizational setups, scaling challenges and the unpredictability of the real world.
  • Skipping established ML practices can appear to be faster in the short term, but duplication and technical debt will quickly erase any gains.
  • DevOps and MLOps have similarities, but differences in data and model management, among others, mean that MLOps has some unique challenges.
  • Robust MLOps is a highly experimental, iterative process with room for institutional learning and rapid prototyping to identify things that work for you and your organization.

FAQ

What is MLOps and why is it essential for production ML?
MLOps is the set of practices and principles that enable teams to reliably deliver business value with machine learning systems. It treats ML as an iterative, closed loop where models, data, and configurations evolve continuously. By standardizing processes across the lifecycle, MLOps improves repeatability, velocity, and confidence in deploying and operating ML in the real world.
How does MLOps differ from traditional DevOps?
MLOps shares DevOps foundations like automation, CI/CD, and cross-functional collaboration. It differs by making data a first-class artifact, adding continuous training, model/version lineage, experiment tracking, and a focus on interpretability and bias. Because model performance can degrade as data shifts, MLOps emphasizes specialized monitoring and retraining beyond typical software practices.
What are the main stages of the iterative MLOps lifecycle?
The lifecycle forms a closed loop:
  • Problem definition and data collection
  • Exploratory Data Analysis (EDA)
  • Modeling and training
  • Model evaluation
  • Deployment (staging and production)
  • Monitoring (data, performance, errors)
  • Maintenance, updates, and review (closing the loop via fixes, new data, retraining)
Why is precise problem definition and stakeholder alignment vital?
Clear definitions align business goals, success metrics, timelines, and acceptable error tolerances with technical feasibility. Collaborating with business/product, technical, and legal/compliance stakeholders surfaces requirements early (e.g., metrics, data pipelines, compute, deployment, privacy, governance). This reduces risk, ensures scope clarity, and guides evaluation and decision-making throughout the loop.
What are best practices for data collection and dataset lineage?
Collect data that is relevant, sufficiently large for the complexity of the problem, high quality (minimizing bias and leakage), representative of the deployment environment, and diverse. Maintain rigorous lineage: version the ETL steps, trace raw data through to annotated and augmented datasets, and record when, where, how, and why the data was gathered. Strong lineage enables reproducibility, debugging, compliance, and efficient dataset revisions.
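As a rough illustration of the lineage idea above, the following sketch links a derived dataset to its parent through a content hash. The `DatasetVersion` record, its field names, and the sample payloads are all hypothetical and not taken from the chapter; a real project would more likely rely on a dedicated data versioning tool.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


def content_hash(payload: bytes) -> str:
    """Return a short, deterministic fingerprint of a dataset's bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical lineage record: what the data is and where it came from."""
    name: str
    digest: str                   # fingerprint of the dataset contents
    step: str                     # pipeline step that produced it
    source: str                   # where/how the data was gathered or derived
    reason: str                   # why this version exists
    parent: Optional[str] = None  # digest of the upstream dataset, if any


def derive(parent: DatasetVersion, name: str, payload: bytes,
           step: str, reason: str) -> DatasetVersion:
    """Create a new version that points back at the dataset it was built from."""
    return DatasetVersion(
        name=name,
        digest=content_hash(payload),
        step=step,
        source=f"derived from {parent.name}",
        reason=reason,
        parent=parent.digest,
    )


def trace(version: DatasetVersion,
          registry: Dict[str, DatasetVersion]) -> List[str]:
    """Walk parent links back to the raw dataset and return the names visited."""
    chain = [version.name]
    while version.parent is not None:
        version = registry[version.parent]
        chain.append(version.name)
    return chain


# Illustrative data only: byte strings stand in for real files.
raw = DatasetVersion(
    name="raw_scans",
    digest=content_hash(b"img1,img2"),
    step="collect",
    source="hypothetical upload bucket",
    reason="initial collection",
)
annotated = derive(raw, "annotated_scans", b"img1:id_card,img2:passport",
                   step="annotate", reason="add labels")
registry = {v.digest: v for v in (raw, annotated)}
lineage = trace(annotated, registry)
```

Here `lineage` lists the annotated dataset first and the raw dataset last, which is the kind of trail that supports debugging and dataset revisions.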
What role does EDA play in reducing risk and informing modeling?
EDA validates schema and data quality, examines distributions and class balance, assesses feature robustness/cost, detects cyclic patterns and external correlations, and identifies outliers. It makes assumptions explicit, adds early checks to prevent violations, and guides pivots if needed. Multivariate analysis and dimensionality reduction help uncover structure and inform better features and models.
Which MLOps capabilities are critical during modeling and training?
Key capabilities include model and data versioning, experiment tracking, training pipelines, and automated hyperparameter optimization. A modular codebase that separates code, configuration, data, and model artifacts boosts reproducibility and iteration speed. Keep configurations pragmatic to avoid over-parameterization while preserving flexibility.
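One way to picture the separation of configuration from code, together with a simple hyperparameter search and experiment log, is the sketch below. The configuration keys, the toy `train_and_score` function, and its scoring formula are assumptions made for illustration; they stand in for a real training run and a real tracking tool.

```python
import itertools
import json

# Configuration lives apart from the code (in practice, a versioned file).
# The keys and values here are illustrative assumptions.
CONFIG_JSON = '{"learning_rate": [0.01, 0.1], "depth": [2, 4], "seed": 7}'


def train_and_score(learning_rate: float, depth: int, seed: int) -> float:
    """Stand-in for a real training run: returns a toy, deterministic score.

    The seed is accepted so that it is part of the tracked parameters,
    but this toy score does not depend on it.
    """
    return round(1.0 - abs(learning_rate - 0.1) - abs(depth - 4) * 0.05, 4)


def grid_search(config: dict) -> list:
    """Run every combination in the grid and keep one record per experiment."""
    runs = []
    for lr, depth in itertools.product(config["learning_rate"], config["depth"]):
        params = {"learning_rate": lr, "depth": depth, "seed": config["seed"]}
        runs.append({"params": params, "score": train_and_score(**params)})
    return runs


config = json.loads(CONFIG_JSON)
experiments = grid_search(config)
best = max(experiments, key=lambda run: run["score"])
```

Because each record stores the full parameter set alongside its score, any run can be repeated later from the log alone, and changing the search space only requires editing the configuration.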
How should models be evaluated and deployed safely?
Choose domain-appropriate metrics (e.g., precision/recall/F1 for classification, MAE/MSE for regression) aligned to business risk. Use a curated, evolving holdout set, automate evaluation, prevent data leakage, and analyze errors for systemic issues. Deploy via APIs/microservices or edge targets, optimize and test the final artifact for the target environment, use staging before production, and rely on versioning to roll back if needed.
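The metrics named above have standard definitions, which the following minimal sketch spells out in plain Python. The label and value lists are made-up examples, and in practice a library such as scikit-learn would usually be used instead of hand-written functions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mae(y_true, y_pred):
    """Mean absolute error for regression targets."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error for regression targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up holdout labels and predictions for illustration.
labels = [1, 0, 1, 1, 0, 0]
predictions = [1, 1, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(labels, predictions)

# Made-up regression targets and predictions.
targets = [3.0, 5.0, 2.0]
estimates = [2.5, 5.0, 4.0]
mean_abs_error = mae(targets, estimates)
mean_sq_error = mse(targets, estimates)
```

In this example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3. Which of these numbers matters most depends on the business risk attached to each kind of error.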
What should be monitored in production and what triggers retraining?
Monitor input data statistics for drift, model performance (e.g., accuracy, latency), and error patterns. Combine robust logging, model/version lineage, and reliable alerting to accelerate diagnosis. Detected drift, degraded KPIs, or systematic errors should trigger data updates, targeted collection of edge cases, and automated retraining/evaluation pipelines to close the loop.
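A retraining trigger of the kind described above could be sketched as follows. The drift measure used here (shift of the live mean, in units of the reference standard deviation) is just one simple choice among many, and the thresholds `max_shift` and `min_accuracy` are arbitrary assumptions rather than recommended values.

```python
import statistics


def mean_shift(reference, live):
    """Distance of the live mean from the reference mean,
    measured in reference standard deviations."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.pstdev(reference)
    live_mean = statistics.fmean(live)
    if ref_std == 0:
        # A constant reference feature: any change counts as unbounded drift.
        return 0.0 if live_mean == ref_mean else float("inf")
    return abs(live_mean - ref_mean) / ref_std


def should_retrain(reference, live, accuracy,
                   max_shift=2.0, min_accuracy=0.9):
    """Return the list of reasons to retrain; an empty list means no trigger."""
    reasons = []
    if mean_shift(reference, live) > max_shift:
        reasons.append("input drift")
    if accuracy < min_accuracy:
        reasons.append("degraded accuracy")
    return reasons


# Made-up feature values: training-time reference vs. two production windows.
reference_window = [10, 12, 11, 13, 9, 11]
stable_window = [11, 10, 12, 11]
drifted_window = [18, 19, 17, 20]

no_trigger = should_retrain(reference_window, stable_window, accuracy=0.95)
drift_trigger = should_retrain(reference_window, drifted_window, accuracy=0.95)
kpi_trigger = should_retrain(reference_window, stable_window, accuracy=0.80)
```

Returning the reasons rather than a bare boolean keeps the alert informative: the same check can feed logging, alerting, and the decision to start a retraining pipeline.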
What are the levels of MLOps maturity and their hallmarks?
  • Level 0 (Basic): manual scripts, sparse releases, little or no monitoring.
  • Level 1 (Intermediate): continuous retraining pipelines, modular components shared across teams, validation, lineage/metadata, and automated triggers (“experimental–operational symmetry”).
  • Level 2 (Advanced): pipelines and components are productized and highly automated, with org-wide ownership; most steps are automated except data and model analysis, maximizing velocity, reliability, and scalability.
