How Machine Learning Works
Mostafa Samir Abd El-Fattah
  • MEAP began August 2019
  • Publication in December 2020 (estimated)
  • ISBN 9781617294884
  • 400 pages (estimated)
  • printed in black & white
We regret that Manning Publications will not be publishing this title.

A great base for getting started on Machine Learning theory and learning how to use Python tools to create models.

Elias Rangel
Many libraries and services treat machine learning like a black box—you just plug in your data and trust that the answer is correct. To really understand machine learning you need to know what’s going on inside the system. How Machine Learning Works is an introduction to core ML techniques and algorithms with a focus on understanding the underlying theory and mathematics. With this invaluable guide, you’ll acquire the competitive edge that comes from knowing what to do and why it works.

About the Technology

Machine learning is the general term for a collection of data analysis techniques that identify patterns and relationships in data and then use those patterns to make predictions about new data. ML drives many features of modern applications, such as tailored product recommendations, social media feeds, consumer trend forecasting, customized therapy for people with developmental challenges, and other world-changing innovations. To understand, create, and apply new ML models, you need both practical skills with ML tools and libraries and a deep understanding of the theory and math under the hood.

About the book

How Machine Learning Works gives you an in-depth look at the mathematical and theoretical foundations of machine learning. Seasoned practitioner Mostafa Samir Abd El-Fattah takes you step by step through real-world ML projects. You’ll learn the components that make up a machine learning problem and explore supervised and unsupervised learning. Blending theoretical foundations with practical ML skills, you’ll learn to read existing datasets using pandas, a fast and powerful Python library for data analysis and manipulation. Then, you’ll move on to choosing and implementing ML models with scikit-learn, a popular Python framework that provides a diverse range of ML models and algorithms.

Along the way, you’ll practice important math skills, including working with probability, random variables, mean, variance, vectors, matrices, linear algebra, and statistics. You’ll also discover similarity-based methods like k-nearest neighbors and k-means clustering; decision tree-based methods like classification and regression trees; and linear methods like regularization and logistic regression. Instead of simply applying black-box methods and techniques to ML problems, you’ll grok their underlying structure and apply a robust mathematical understanding alongside your practical skills. By the end of this comprehensive guide, you’ll be able to comfortably explore and understand the latest ML research as well as identify and tackle novel ML problems!
Table of Contents

Part 0: Setting the Stage

1 The Traveling Diabetes Clinic: A first take at the problem

1.1 The Traveling Diabetes Clinic Problem

1.1.1 Reading the data with pandas

1.2 A Simple ML Attempt with scikit-learn

1.2.1 Choosing a Model

1.2.2 Implementing the Model with scikit-learn

1.2.3 Establishing a Baseline

2 Grokking the Problem: What does the data look like?

2.1 Populations and Samples

2.2 Descriptive Statistics

2.2.1 Mean, Mode, and Median

2.2.2 Ranges, Sample Variance, and Sample Standard Deviation

2.2.3 Histogram Plots

3 Grokking Deeper: Where did the data come from?

3.1 Probability and Distributions

3.1.1 Random Variables, Distributions, and their Properties

3.1.2 How to read math?

3.1.3 Expectation, Variance, and Estimations

3.2 Conditional Probability

3.2.1 The Bayes Rule

3.2.2 Independent Random Variables

3.3 Applying the Naive Bayes Model with scikit-learn

4 Setting the Stage

4.1 Generative and Discriminative Models

4.1.1 Generative Models

4.1.2 Discriminative Models and the Target Function

4.1.3 Which Is Better?

4.2 Types of Machine Learning Problems

Part 1: Similarity Based Methods

5 K-Nearest Neighbors Method

5.1 A Basic k-NN Classifier

5.1.1 The “Can I eat that?” App

5.1.2 The Intuition Behind k-NN

5.1.3 How to Measure Similarity?

5.1.4 k-NN in Action

5.1.5 Boosting Performance with NumPy

5.2 A Better k-NN Classifier

5.2.1 Doing Faster Neighborhood Search Using k-d Trees

5.2.2 Using k-d Trees with scikit-learn

5.2.3 Tuning the Value of k

5.2.4 Choosing the Metric

5.3 Is k-NN Reliable?

5.3.1 The Bayes Optimal Classifier

5.3.2 Reliability of 1-NN

6 K-means Clustering

6.1 A New Marketing Plan for a Wholesale Distributor

6.1.1 The K-means Method

6.1.2 All Features Shall be Equal

6.1.3 Applying K-means with scikit-learn

6.2 Tuning the Value of k with Silhouette Score

6.2.1 Creating Marketing Plans Against the Detected Customers Segments

6.3 Limitations of K-means

6.3.1 The Math Behind the Circular Tendency

Part 2: Tree-Based Methods

7 Decision Trees

7.1 Predicting the Price of a Used Car

7.1.1 Modeling the Problem with Decision Trees

7.1.2 How to Build a Decision Tree

7.1.3 Coding a Primitive Decision Tree

7.2 Training a Decision Tree with scikit-learn

7.2.1 Preparing the Data

7.2.2 Training and Evaluating the Decision Tree

7.3 Trim the Tree or Grow Yourself a Forest

7.3.1 Pruning the Tree

7.3.2 Random Forests

7.4 What Controls Generalization?

7.4.1 Why do Machines Learn from Data?

7.4.2 Generalization Bounds

7.4.3 The Bias-Variance Trade-off

7.4.4 Why do Random Forests Work so Well?

8 Hierarchical Clustering

Part 3: Linear Methods

9 Linear and Logistic Regression

10 Support Vector Machines

11 Principal Component Analysis


Appendix A: Anaconda Distribution and Jupyter Notebooks

A.1 Installing the Anaconda Distribution

A.1.1 Installing Anaconda for Linux-based OS (like Ubuntu)

A.1.2 Installing Anaconda for macOS

A.1.3 Installing Anaconda for Windows

A.2 Working with Jupyter Notebooks

A.2.1 Exploring a Jupyter Notebook

A.2.2 Markdown Cells

A.2.3 Code Cells

A.2.4 How does all this work?

What's inside

  • Understanding machine learning problems
  • A review of probability and statistics
  • Similarity-based, tree-based, and linear ML methods
  • Working with neural networks
  • An introduction to deep learning
  • Probabilistic models

About the reader

For programmers with basic Python skills, average math skills, and a keen interest in the fundamentals of machine learning.

About the author

Mostafa Samir Abd El-Fattah has a BSc in Computer Science and is currently working as a Senior Machine Learning Research Engineer at Mawdoo3, with a focus on developing solutions for Arabic Natural Language Processing and Understanding (NLP & NLU). He also blogs about AI and ML.
