Machine Learning Platform Engineering

Build an internal developer platform for ML and AI systems

Benjamin Tan Wei Hao, Shanoop Padmanabhan, and Varun Mallya

February 2026
ISBN 9781633437333
504 pages

Included with a Manning Online subscription

printed in black & white

available in Korean, Russian


table of contents

Part 1 Laying the MLOps foundation

1 Getting started with MLOps and ML engineering

1.1 The ML life cycle

1.1.1 Experimentation phase

1.1.2 Development/staging/production phase

1.2 Skills needed for MLOps

1.2.1 Required skills for ML engineers

1.2.2 Prerequisites

1.3 Building an ML platform

1.3.1 Build vs. buy

1.3.2 Looking ahead: From MLOps to LLMOps

1.3.3 Tools used in this book

1.4 Building ML systems

1.4.1 Introducing the ML projects

1.4.2 ML projects

2 What is MLOps?

2.1 The iterative MLOps life cycle

2.1.1 Data collection

2.1.2 Exploratory Data Analysis

2.1.3 Modeling and training

2.1.4 Model evaluation

2.1.5 Deployment

2.1.6 Monitoring

2.1.7 Maintenance, updates, and review

2.2 Why is robust MLOps important?

2.3 Role of MLOps in a mature organization

2.4 DevOps vs. MLOps

2.5 Levels of MLOps maturity

2.5.1 Level 0: Basic

2.5.2 Level 1: Intermediate

2.5.3 Level 2: Advanced

3 Building applications on Kubernetes

3.1 Containers and tooling

3.2 Docker

3.2.1 Write the application code

3.2.2 Write a Dockerfile

3.2.3 Building and pushing a Docker image

3.3 Kubernetes

3.3.1 Kubernetes architecture overview

3.3.2 Kubectl

3.3.3 Kubernetes objects

3.3.4 Networking and services

3.3.5 Other objects

3.3.6 Helm charts

3.3.7 Conclusion

3.4 Continuous integration and deployment

3.4.1 GitLab CI

3.4.2 Argo CD

3.5 Prometheus and Grafana

Part 2 Building core ML platform capabilities

4 Designing reliable ML systems

4.1 MLflow for experiment tracking

4.1.1 Data exploration

4.1.2 MLflow tracking

4.1.3 MLflow model registry

4.2 Feast as a feature store

4.2.1 Registering features

4.2.2 Retrieving features

4.2.3 Feature server

4.2.4 Using the Feast UI

5 Orchestrating ML pipelines

5.1 Kubeflow Pipelines: Task orchestrator

5.1.1 Kubeflow components

5.1.2 Income classifier pipeline

6 Productionizing ML models

6.1 BentoML as a deployment platform

6.1.1 Building a Bento

6.1.2 Building and pushing the Bento

6.1.3 Deploying a Bento

6.2 Evidently for data drift monitoring

6.2.1 Data drift detection report and dashboard

6.2.2 Data drift detection Kubeflow pipeline component

6.2.3 Data drift detection for a model deployed as an API

Part 3 Applying MLOps in practice

7 Data analysis and preparation

7.1 Data analysis

7.1.1 Launching a notebook server in Kubeflow

7.1.2 Workspace and data volumes

7.1.3 Configurations and affinity/tolerations

7.1.4 Customizing the menu

7.1.5 Creating a custom Kubeflow notebook image

7.2 Data passing

7.2.1 Scenario 1: Passing simple values to downstream components

7.2.2 Scenario 2: Passing paths for larger data

7.2.3 Overview of KFP v2 artifact types

7.3 Data preparation in action

7.3.1 Data preparation: Object detection

7.3.2 Data preparation: Movie recommender

8 Model training and validation: Part 1

8.1 Training an object detection model

8.1.1 Training YOLO on a custom dataset

8.1.2 Training the model

8.1.3 Container components for system dependencies

8.1.4 Creating the validation component

8.1.5 Creating the pipeline

8.1.6 Executing the pipeline

8.1.7 Validating model artifacts

9 Model training and validation: Part 2

9.1 Storing data with PersistentVolumeClaim

9.1.1 Refactoring the pipeline with a PVC

9.1.2 Efficient dataset management

9.1.3 Creating a VolumeOp

9.1.4 Download Op using PVC

9.1.5 Splitting the dataset directly

9.1.6 Simplifying model training

9.1.7 Simplifying model validation

9.2 Tracking training with TensorBoard

9.2.1 Launching a new TensorBoard

9.2.2 Exploring YOLOv8’s default graphs

9.3 Movie recommender project

9.3.1 Reading data from MinIO and quality assurance

9.3.2 Model training component

9.3.3 Metrics for evaluation

9.3.4 Experiment tracking with MLflow

9.3.5 Model registry with MLflow

9.3.6 Creating a pipeline from components

9.3.7 Local inference in a notebook

10 Model inference and serving

10.1 Model deployment is hard

10.2 BentoML: Simplifying model deployment

10.3 A whirlwind tour of BentoML

10.3.1 BentoML Service and Runners

10.4 Executing a BentoML Service locally

10.4.1 Loading a model with BentoML Runner

10.5 Building Bentos: Packaging your service for deployment

10.5.1 Bento tags: Versioning and managing your Bentos

10.6 BentoML and MLflow inference

10.7 Using only MLflow to create an inference service

10.8 KServe: An alternative to BentoML

11 Monitoring and explainability

11.1 Monitoring

11.1.1 Basic monitoring

11.1.2 Custom metrics

11.1.3 Logging

11.1.4 Alerting

11.2 Data drift detection

11.2.1 Object detection

11.2.2 Movie recommender

11.3 Explainability

11.3.1 Object detection

11.3.2 Movie recommendation

Part 4 Extending MLOps for large language models

12 Designing LLM-powered systems

12.1 LLMOps: New challenges, familiar principles

12.1.1 What makes LLM applications different

12.1.2 Extending our ML platform for LLMs

12.1.3 Essential tools for LLM applications

12.2 Building DataKrypt’s DakkaBot: A simple RAG architecture

12.2.1 What you’ll build

12.2.2 Beyond single API calls: Designing for composability

12.2.3 Google’s Gemini LLM and embeddings

12.2.4 The retrieval component

12.2.5 The augmentation component

12.2.6 The generation component

12.3 Giving DakkaBot a UI

12.4 Observability for LLM applications

12.4.1 Set up Langfuse via Docker

12.4.2 Integrating Langfuse with DakkaBot

12.4.3 Enhanced observability in DakkaBotCore

12.4.4 Beyond traditional metrics

13 Production LLM system design

13.1 Prompt engineering: Code for the generative AI era

13.1.1 Treating prompts as critical infrastructure

13.1.2 Langfuse prompt management for DakkaBot

13.1.3 Langfuse prompt management for production

13.2 Testing LLM applications

13.2.1 Evaluation framework for LLM responses

13.2.2 Safety and adversarial testing

13.3 Governance and safety in production

13.3.1 Implementing safety guardrails

13.4 Cost optimization strategies

13.4.1 Understanding LLM economics

13.4.2 Model selection strategy

13.4.3 Caching strategies

13.4.4 Prompt optimization for efficiency

13.4.5 Production cost monitoring

13.4.6 From traditional ML to LLMOps

Appendices

Appendix A: Installation and setup

A.1 Local installation of command-line tools (Mac and Linux)

A.1.1 The yq YAML processor

A.1.2 Kustomize

A.1.3 Kubectl

A.1.4 K8s distribution

A.1.5 K3s installation

A.1.6 MicroK8s installation

A.1.7 Argo CD

A.1.8 Kubeflow

A.1.9 Cloud provider K8s setup

A.1.10 MLflow setup

A.2 Deploy MLflow

A.2.1 Redis online store setup

A.2.2 BentoML and Yatai setup

A.2.3 Evidently UI setup

Appendix B: Basics of YAML

B.1 Basic YAML files

B.1.1 Comments

B.1.2 Scalar values

B.1.3 Lists

B.1.4 Nested structures (maps)

B.1.5 Quoted strings

B.1.6 Multiline strings

B.1.7 Data types in YAML

B.2 Aliases and anchors

B.2.1 References (merging and reusing data)

B.2.2 Complex data types

B.2.3 Custom data types

B.2.4 Block style vs. flow style

B.2.5 Key sorting and case sensitivity

B.2.6 Best practices

Overview

2 What is MLOps?

MLOps is presented as the set of practices that transform machine learning from isolated modeling into a repeatable, production-grade capability that reliably delivers business value. Because models, data, and assumptions evolve, the chapter frames ML as a closed, iterative loop rather than a one-off build: start with a well-aligned problem definition and success metrics, then continuously learn from outcomes to refine the system. The core idea is to bridge the gap between business goals, technical requirements, and operational constraints through shared processes, clear ownership, and rigorous tracking so that models can be changed quickly and safely without sacrificing confidence.

The chapter walks through each stage of the loop—data collection, EDA, modeling and training, evaluation, deployment, monitoring, and maintenance—emphasizing lineage, versioning, and automation. Data must be relevant, representative, and carefully tracked; EDA validates assumptions and informs feature choices; modeling benefits from modular code, experiment tracking, and hyperparameter search to maximize reproducibility and velocity. Evaluation uses appropriate domain metrics and robust holdouts, including error analysis and (optionally) interpretability techniques. Deployment spans APIs and edge targets with environment-specific optimizations and staged rollouts. Monitoring detects drift, performance regressions, and errors, backed by alerting and strong logging. Maintenance closes the loop by feeding insights back into data, models, and infrastructure for continuous improvement.

Robust MLOps is necessary because real-world ML adds complexities that differ from traditional software: data is a first-class asset, models change without code edits, and compliance, bias, and drift must be actively managed. The chapter contrasts DevOps and MLOps—sharing automation and CI/CD principles but diverging on data stewardship, continuous training, interpretability, and performance monitoring. It also outlines organizational challenges (tooling fragmentation, cross-functional communication, scaling/optimization) and the benefits of maturity: faster experimentation, cost control, collaboration, repeatability, traceability, and reliable scaling. A maturity model (Level 0: manual, Level 1: continuous retraining pipelines, Level 2: pipeline automation) provides a path forward, underscoring that disciplined, automated processes reduce technical debt and build lasting confidence in production ML.

The mental map, where we are focusing on defining the problem (1) and model design (2)

ML as a loop

Examples of the visual data in the MIDV-500 dataset

Example of an annotated ID card, shown in CVAT, which is a web-based tool designed for annotating images and videos, commonly used to label data for computer vision models.

A view of a retraining pipeline using the modular codebase concept. This approach of keeping the model, code, configuration files, and data as distinct versioned components with lineage links ensures that the process remains flexible, fast, and adaptable while enabling experimentation, debugging, and iterative development.

Summary

ML exists to solve a business problem, so it is important to understand the requirements in depth before starting an ML project.
MLOps is the iterative process of developing, monitoring and improving an ML model.
A model is an artifact of the ML loop that aims to improve model performance over time.
MLOps is hard due to data management, complex tooling, organizational setups, scaling challenges and the unpredictability of the real world.
Skipping established ML practices can appear to be faster in the short term, but duplication and technical debt will quickly erase any gains.
DevOps and MLOps have similarities, but differences in data and model management, among others, mean that MLOps has some unique challenges.
Robust MLOps is a highly experimental, iterative process with room for institutional learning and rapid prototyping to identify things that work for you and your organization.

FAQ

What is MLOps and why is it essential for production ML?

MLOps is the set of practices and principles that enable teams to reliably deliver business value with machine learning systems. It treats ML as an iterative, closed loop where models, data, and configurations evolve continuously. By standardizing processes across the lifecycle, MLOps improves repeatability, velocity, and confidence in deploying and operating ML in the real world.

How does MLOps differ from traditional DevOps?

MLOps shares DevOps foundations like automation, CI/CD, and cross-functional collaboration. It differs by making data a first-class artifact, adding continuous training, model/version lineage, experiment tracking, and a focus on interpretability and bias. Because model performance can degrade as data shifts, MLOps emphasizes specialized monitoring and retraining beyond typical software practices.

What are the main stages of the iterative MLOps lifecycle?

The lifecycle forms a closed loop:

- Problem definition and data collection
- Exploratory Data Analysis (EDA)
- Modeling and training
- Model evaluation
- Deployment (staging and production)
- Monitoring (data, performance, errors)
- Maintenance, updates, and review (closing the loop via fixes, new data, retraining)

Why is precise problem definition and stakeholder alignment vital?

Clear definitions align business goals, success metrics, timelines, and acceptable error tolerances with technical feasibility. Collaborating with business/product, technical, and legal/compliance stakeholders surfaces requirements early (e.g., metrics, data pipelines, compute, deployment, privacy, governance). This reduces risk, ensures scope clarity, and guides evaluation and decision-making throughout the loop.

What are best practices for data collection and dataset lineage?

Collect data that is relevant, sufficiently large for problem complexity, high quality (minimizing bias and leakage), representative of the deployment environment, and diverse. Maintain rigorous lineage: versioned ETL, trace raw to annotated/augmented datasets, and record when/where/how/why data was gathered. Strong lineage enables reproducibility, debugging, compliance, and efficient dataset revisions.
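
As a rough illustration of the lineage practice described above, the sketch below records each dataset version with a content hash and a pointer to its parent, so an annotated set can be traced back to the raw data it came from. The `DatasetVersion` class, the `derive` and `trace` helpers, and every field name and sample value are hypothetical and are not taken from the book.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


def content_hash(payload: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact bytes of a dataset."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical lineage record: what the data is and when/where/how/why it was gathered."""
    name: str
    digest: str
    when: str
    where: str
    how: str
    why: str
    parent_digest: Optional[str] = None  # None marks a raw (root) dataset


def derive(parent: DatasetVersion, name: str, payload: bytes,
           when: str, how: str, why: str) -> DatasetVersion:
    """Create a new version that points back to the version it was built from."""
    return DatasetVersion(
        name=name,
        digest=content_hash(payload),
        when=when,
        where=parent.where,
        how=how,
        why=why,
        parent_digest=parent.digest,
    )


def trace(registry: Dict[str, DatasetVersion], digest: str) -> List[str]:
    """Walk parent links from a version back to its raw source."""
    chain: List[str] = []
    current = registry.get(digest)
    while current is not None:
        chain.append(current.name)
        current = registry.get(current.parent_digest) if current.parent_digest else None
    return chain


# Sample values below are invented purely for illustration.
raw = DatasetVersion(
    name="raw-scans",
    digest=content_hash(b"scan-001,scan-002"),
    when="2024-01-15",
    where="hypothetical-bucket/raw",
    how="mobile capture",
    why="initial training set",
)
annotated = derive(
    raw, "annotated-scans", b"scan-001:id_card,scan-002:passport",
    when="2024-01-20", how="manual labeling", why="add class labels",
)
registry = {version.digest: version for version in (raw, annotated)}
print(trace(registry, annotated.digest))  # ['annotated-scans', 'raw-scans']
```

Hashing the content rather than trusting a file name means two versions with identical bytes get the same identifier, which is one simple way to make dataset revisions reproducible.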

What role does EDA play in reducing risk and informing modeling?

EDA validates schema and data quality, examines distributions and class balance, assesses feature robustness/cost, detects cyclic patterns and external correlations, and identifies outliers. It makes assumptions explicit, adds early checks to prevent violations, and guides pivots if needed. Multivariate analysis and dimensionality reduction help uncover structure and inform better features and models.
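
Two of the checks mentioned above, class balance and outlier detection, can be sketched with the standard library alone. This is a minimal example under stated assumptions: the 1.5 x IQR fence is a common rule of thumb that is swapped in here rather than prescribed by the text, and the function names and sample data are hypothetical.

```python
import statistics
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def class_balance(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return the fraction of samples that belong to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented sample data for illustration only.
labels = ["id_card", "id_card", "id_card", "passport"]
widths = [10, 11, 12, 13, 14, 15, 16, 100]

print(class_balance(labels))
print(iqr_outliers(widths))
```

A strongly skewed class balance or a handful of extreme values is the kind of finding that would be written down as an explicit assumption and checked again before training.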

Which MLOps capabilities are critical during modeling and training?

Key capabilities include model and data versioning, experiment tracking, training pipelines, and automated hyperparameter optimization. A modular codebase that separates code, configuration, data, and model artifacts boosts reproducibility and iteration speed. Keep configurations pragmatic to avoid over-parameterization while preserving flexibility.
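
One way to picture the separation of code, configuration, data, and model artifacts is a small run manifest that links the four by identifier. The sketch below is an assumption-laden illustration, not the book's implementation: `RunManifest`, `config_fingerprint`, `make_manifest`, the artifact naming scheme, and the sample version strings are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict


def config_fingerprint(config: Dict[str, Any]) -> str:
    """Hash a config in canonical form so key order does not change its identity."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class RunManifest:
    """Hypothetical record tying one trained model to the inputs that produced it."""
    code_version: str
    data_version: str
    config_id: str
    model_artifact: str


def make_manifest(code_version: str, data_version: str,
                  config: Dict[str, Any]) -> RunManifest:
    """Build a manifest whose artifact name encodes all three inputs."""
    config_id = config_fingerprint(config)
    artifact = f"model-{code_version}-{data_version}-{config_id}"
    return RunManifest(code_version, data_version, config_id, artifact)


# Invented identifiers for illustration only.
baseline = make_manifest("abc1234", "data-v3", {"lr": 0.01, "epochs": 20})
reordered = make_manifest("abc1234", "data-v3", {"epochs": 20, "lr": 0.01})
tuned = make_manifest("abc1234", "data-v3", {"lr": 0.001, "epochs": 20})

print(baseline.model_artifact)
print(baseline == reordered)  # same inputs, same manifest
print(baseline == tuned)      # changed hyperparameter, different manifest
```

Because the manifest changes whenever any one input changes, two runs can be compared by asking which of the three identifiers differs.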

How should models be evaluated and deployed safely?

Choose domain-appropriate metrics (e.g., precision/recall/F1 for classification, MAE/MSE for regression) aligned to business risk. Use a curated, evolving holdout set, automate evaluation, prevent data leakage, and analyze errors for systemic issues. Deploy via APIs/microservices or edge targets, optimize and test the final artifact for the target environment, use staging before production, and rely on versioning to roll back if needed.
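
The metrics named above have short textbook definitions, shown here in plain Python so the arithmetic is visible. This is a sketch for small examples; the function names and sample labels are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented labels: 2 true positives, 1 false positive, 1 false negative.
labels = [1, 1, 1, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 0]
print(precision_recall_f1(labels, predictions))

print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Which metric matters most depends on the business risk: a false positive and a false negative rarely cost the same, so precision and recall are usually read together rather than collapsed into accuracy.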

What should be monitored in production and what triggers retraining?

Monitor input data statistics for drift, model performance (e.g., accuracy, latency), and error patterns. Combine robust logging, model/version lineage, and reliable alerting to accelerate diagnosis. Detected drift, degraded KPIs, or systematic errors should trigger data updates, targeted collection of edge cases, and automated retraining/evaluation pipelines to close the loop.
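
To make the idea of monitoring input statistics for drift concrete, the sketch below compares a reference sample with a production sample using the Population Stability Index (PSI). PSI is swapped in here as one possible drift statistic; the text above does not prescribe one. The function names, the bin count, and the 0.2 retraining threshold (a common rule of thumb) are assumptions for illustration.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` against bins fitted on `reference`."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the reference range
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def should_retrain(psi_value: float, threshold: float = 0.2) -> bool:
    """Hypothetical trigger: flag retraining when drift exceeds an assumed threshold."""
    return psi_value > threshold


# Invented data: the production sample is the reference shifted by 50.
reference = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in reference]

print(psi(reference, reference))             # 0.0, identical distributions
print(should_retrain(psi(reference, shifted)))  # True, large shift
```

In a pipeline, a check like this would typically run on a schedule per feature, with the result logged alongside the model version so that an alert can be traced back to the data that caused it.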

What are the levels of MLOps maturity and their hallmarks?

- Level 0 (Basic): Manual scripts, sparse releases, little/no monitoring.
- Level 1 (Intermediate): Continuous retraining pipelines, modular components shared across teams, validation, lineage/metadata, and automated triggers (“experimental–operational symmetry”).
- Level 2 (Advanced): Pipelines and components are productized and highly automated, with org-wide ownership. Most steps are automated except data/model analysis, which maximizes velocity, reliability, and scalability.

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$47.99 $33.59

you save $14.40 (30%)

include audio $24.99 $17.49
