4 Designing reliable ML systems
This chapter presents a pragmatic path from ad‑hoc experimentation to reliable, production‑grade ML, emphasizing reproducibility, traceability, and consistent data/feature access. It assembles a “mini” ML platform around core responsibilities—experiment tracking, model versioning, feature management, deployment, and monitoring—so teams can collaborate safely, compare results rigorously, and promote models with confidence across environments. The narrative keeps a real‑world lens, showing how to stitch together focused tools into a coherent workflow that supports both batch and real‑time use cases while remaining flexible to project needs.
The workflow begins with exploratory analysis and iterative modeling, then formalizes that work with MLflow for experiment tracking and model governance. Runs capture artifacts, datasets, parameters, and metrics; object storage backs dataset lineage; and autologging reduces boilerplate across libraries like scikit‑learn and XGBoost. With metrics centralized, the team can query past runs to select the best candidate and register it in the MLflow Model Registry, enabling versioned promotion (e.g., Staging to Production), reproducibility of training conditions, and clear answers to “what’s in production” and “how was it trained.”
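To make this concrete, here is a minimal sketch of logging one such run against a tracking server. The tracking URI, dataset path, and hyperparameter values are placeholders; the income-classifier experiment name and the test_auc metric mirror the figures that follow.

```python
import mlflow
import pandas as pd

# Assumes an MLflow tracking server is reachable at this URI; adjust to your setup.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("income-classifier")

df = pd.read_csv("adult.csv")  # placeholder path to the income dataset

with mlflow.start_run(run_name="rf-baseline"):
    # Attach the dataset to the run so its lineage is recorded; the source URI
    # stands in for an object-storage location such as a MinIO bucket.
    dataset = mlflow.data.from_pandas(df, source="s3://datasets/adult.csv")
    mlflow.log_input(dataset, context="training")

    mlflow.log_params({"n_estimators": 200, "max_depth": 8})
    # ... train and evaluate the model here ...
    mlflow.log_metric("test_auc", 0.87)
```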
To guarantee feature consistency across training and inference, the chapter introduces Feast as a feature store. Features are organized into entities and feature views, stored in an offline repository (e.g., files in MinIO) and materialized to an online store (e.g., Redis) for low‑latency lookups. Feast’s point‑in‑time joins, TTLs, and SDK/API access ensure reproducible training datasets and up‑to‑date online features, promoting reuse and collaboration. Rounding out the platform, the chapter situates batch inference with Kubeflow Pipelines, real‑time serving with BentoML, and drift monitoring with Evidently, laying a reliable foundation that can be automated and scaled in subsequent chapters.
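As a sketch of what such feature definitions could look like, the snippet below declares an entity and one feature view; the entity name, join key, file path, and schema are illustrative rather than taken from the chapter's repository.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Int64, String

# Illustrative entity: each row of features describes one person.
person = Entity(name="person", join_keys=["person_id"])

demographic_source = FileSource(
    path="s3://feast/demographic.parquet",  # e.g., a file stored in MinIO
    timestamp_field="event_timestamp",
)

demographic_view = FeatureView(
    name="demographic",
    entities=[person],
    ttl=timedelta(days=1),  # rows older than the TTL are treated as stale
    schema=[
        Field(name="age", dtype=Int64),
        Field(name="education", dtype=String),
    ],
    source=demographic_source,
)
```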
A mental map of the platform, showing where we are now: the feature store (D), experiment tracking (C), and the model registry (B).
A plot comparing the distribution of workclass categories against the target variable. For example, self-employed people are more likely to earn more than 50K.
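A plot along these lines could be produced with pandas and matplotlib; the column names workclass and income are assumptions about the dataset's schema.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("adult.csv")  # placeholder path to the income dataset

# Proportion of each income class within every workclass category.
rates = pd.crosstab(df["workclass"], df["income"], normalize="index")
ax = rates.plot(kind="bar", stacked=True, figsize=(10, 5))
ax.set_ylabel("Proportion")
ax.set_title("Income distribution by workclass")
plt.tight_layout()
plt.savefig("workclass_vs_income.png")
```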
The MLflow UI. On the left is the list of experiments, showing our newly created income-classifier experiment. After starting an MLflow run and saving the plots, a new entry appears under Run Name.
All the plots are present under Artifacts. Run artifacts can include plots, files, and any object that can be saved to disk.
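A sketch of how figures and files might be attached to a run; the figure content and file names here are hypothetical.

```python
import matplotlib.pyplot as plt
import mlflow

mlflow.set_experiment("income-classifier")

with mlflow.start_run(run_name="eda-plots"):
    fig, ax = plt.subplots()
    # ... draw the workclass-vs-target comparison on ax ...
    mlflow.log_figure(fig, "plots/workclass_vs_income.png")

    # A file already written to disk could be attached too, e.g.:
    # mlflow.log_artifact("confusion_matrix.png", artifact_path="plots")
```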
The model, metrics, and artifacts can all be seen under their respective tabs.
Autologging captures model parameters and datasets without explicit logging calls. We even get feature importance plots automatically.
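Enabling autologging is a single call before training; the model and data below are stand-ins for the chapter's income classifier.

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

mlflow.autolog()  # one call covers scikit-learn, XGBoost, and other supported libraries

# Stand-in data; in the chapter this would be the prepared income dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run(run_name="rf-autolog"):
    model = RandomForestClassifier(n_estimators=200, max_depth=8)
    model.fit(X_train, y_train)  # hyperparameters and the fitted model are logged
    model.score(X_test, y_test)  # recorded as a post-training metric
```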
Using the MLflow UI to query runs with a test AUC score below 0.8 and display the results in chart view.
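The same query can be issued programmatically; this sketch assumes the metric was logged as test_auc.

```python
import mlflow

runs = mlflow.search_runs(
    experiment_names=["income-classifier"],
    filter_string="metrics.test_auc < 0.8",
    order_by=["metrics.test_auc DESC"],
)
print(runs[["run_id", "metrics.test_auc"]])
```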
MLflow registered models appear under the Models tab of the UI. Our Random Forest model is listed here.
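Registration and promotion can also be done from code. The sketch below picks the best run by test AUC and registers its model; the registered model name is a placeholder, and the model is assumed to have been logged under the default "model" artifact path.

```python
import mlflow
from mlflow import MlflowClient

# Select the best run by test AUC in the income-classifier experiment.
best = mlflow.search_runs(
    experiment_names=["income-classifier"],
    order_by=["metrics.test_auc DESC"],
    max_results=1,
).iloc[0]

version = mlflow.register_model(f"runs:/{best.run_id}/model", "income-classifier-rf")

# Promote the new version through the registry stages.
client = MlflowClient()
client.transition_model_version_stage(
    name="income-classifier-rf",
    version=version.version,
    stage="Staging",
)
```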
We split our single file into three files representing three separate feature categories: demographic, relationship, and occupation.
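One possible way to perform the split with pandas; the column groupings and file names are illustrative. Each file keeps the entity key and event timestamp so Feast can join the categories back together.

```python
import pandas as pd

df = pd.read_parquet("income_features.parquet")  # hypothetical combined feature file

keys = ["person_id", "event_timestamp"]  # kept in every file for joining
groups = {
    "demographic": ["age", "education", "race", "sex"],
    "relationship": ["marital_status", "relationship"],
    "occupation": ["workclass", "occupation", "hours_per_week"],
}
for name, cols in groups.items():
    df[keys + cols].to_parquet(f"{name}.parquet")
```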
The Feast feature store design: a feature pipeline populates the offline store, and Feast periodically materializes the offline features to the online store. The feature registry holds feature definitions along with online and offline store information. The Feast SDK provides methods to retrieve features from the online and offline stores, which can be used for training and inference purposes.
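In code, that flow might look like the sketch below; the feature names and entity key follow the illustrative definitions above, and the repo path is assumed to contain a feature_store.yaml.

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # directory containing feature_store.yaml

# Offline: point-in-time join against an entity dataframe to build a training set.
entity_df = pd.DataFrame(
    {"person_id": [1001, 1002], "event_timestamp": [datetime.utcnow()] * 2}
)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["demographic:age", "occupation:workclass"],
).to_df()

# Materialize the latest offline values into the online store (e.g., Redis).
store.materialize_incremental(end_date=datetime.utcnow())

# Online: low-latency lookup at inference time.
features = store.get_online_features(
    features=["demographic:age", "occupation:workclass"],
    entity_rows=[{"person_id": 1001}],
).to_dict()
```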
The Feast UI gives us an easy way to visualize the details of feature views and entities across all our projects.
Summary
- An experiment tracker such as MLflow can be used for tracking model performance and hyperparameters during model training and evaluation.
- MLflow Model Registry is a platform for managing, organizing, and versioning machine learning models, facilitating collaboration and deployment.
- Feast, a feature store, streamlines the management and sharing of curated, ready-to-use features for machine learning, enhancing model development and deployment.
- Feast's point-in-time joins ensure training datasets reflect feature values as they existed at each event's timestamp, while materialization to the online store keeps features fresh at inference time.
- Feast supports both historical feature retrieval using offline stores and low-latency retrieval using online stores.