Grokking Machine Learning
Luis G. Serrano
  • MEAP began May 2019
  • Publication in Spring 2021 (estimated)
  • ISBN 9781617295911
  • 350 pages (estimated)
  • printed in black & white

Written in an approachable manner with great use of very illustrative and applicable examples.

Borko Djurkovic
It's time to dispel the myth that machine learning is difficult. Grokking Machine Learning teaches you how to apply ML to your projects using only standard Python code and high school-level math. No specialist knowledge is required to tackle the hands-on exercises using readily available machine learning tools!

About the Technology

Machine learning is a collection of mathematically based techniques and algorithms that enable computers to identify patterns and generate predictions from data. This revolutionary data analysis approach is behind everything from recommendation systems to self-driving cars, and it is transforming industries from finance to art. Whatever your field, knowledge of machine learning is becoming an essential skill. Python, along with libraries like NumPy, Pandas, and scikit-learn, has become the go-to language for machine learning.

About the book

In Grokking Machine Learning, expert machine learning engineer Luis Serrano introduces the most valuable ML techniques and teaches you how to make them work for you. You’ll only need high school math to dive into popular approaches and algorithms. Practical examples illustrate each new concept to ensure you’re grokking as you go. You’ll build models for spam detection, language analysis, and image recognition as you lock in each carefully selected skill. Packed with easy-to-follow Python-based exercises and mini-projects, this book sets you on the path to becoming a machine learning expert. When you’re done, you’ll have an intuitive understanding of the right approach for any machine learning task or project.
Table of Contents

1 What is machine learning?

1.1 Why this book?

1.2 Is machine learning hard?

1.3 But what exactly is machine learning?

1.3.1 What is the difference between artificial intelligence and machine learning?

1.3.2 What about deep learning?

1.4 Humans use the remember-formulate-predict framework to make decisions (and so can machines!)

1.4.1 How do humans think?

1.4.2 How do machines think?

1.5 What is this book about?

1.6 Summary

2 Types of machine learning

2.1 What is the difference between labelled and unlabelled data?

2.2 What is supervised learning?

2.2.1 Regression models predict numbers

2.2.2 Classification models predict a state

2.3 What is unsupervised learning?

2.3.1 Clustering algorithms split a dataset into similar groups

2.3.2 Dimensionality reduction simplifies data without losing much information

2.3.3 Matrix factorization and other types of unsupervised learning

2.4 What is reinforcement learning?

2.5 Summary

3 Drawing a line close to our points: Linear regression

3.1 The problem: We need to predict the price of a house

3.2 The solution: Building a regression model for housing prices

3.2.1 The remember step: looking at the prices of existing houses

3.2.2 The formulate step: formulating a rule that estimates the price of the house

3.2.3 The predict step: what do we do when a new house comes in the market?

3.2.4 Some questions that arise and some quick answers

3.3 How to get the computer to draw this line: the linear regression algorithm

3.3.1 Crash course on slope and y-intercept

3.3.2 A simple trick to move a line closer to a set of points, one point at a time

3.3.3 The square trick: A much more clever way of moving our line closer to one of the points

3.3.4 The linear regression algorithm: Repeating the square trick many times

3.3.5 Plotting dots and lines

3.3.6 Using the linear regression algorithm in our dataset

3.4 How do we measure our results? The error function

3.4.1 The absolute error

3.4.2 The square error

3.4.3 Gradient descent

3.4.4 Plotting the error function

3.5 Applications

3.5.1 Applications of linear regression

3.6 Summary
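
To give a flavor of the chapter entries above, here is a minimal sketch of the square trick repeated many times, in plain Python. The housing data, learning rate, and number of epochs are made up for illustration and are not taken from the book.

    import random

    def square_trick(slope, intercept, x, y, learning_rate):
        # Move the line y = slope * x + intercept a little closer to the point (x, y).
        predicted = slope * x + intercept
        slope += learning_rate * x * (y - predicted)   # rotate the line toward the point
        intercept += learning_rate * (y - predicted)   # translate the line toward the point
        return slope, intercept

    def linear_regression(features, labels, learning_rate=0.01, epochs=1000):
        # The linear regression algorithm: apply the square trick to a random point, many times.
        slope, intercept = random.random(), random.random()
        for _ in range(epochs):
            i = random.randrange(len(features))
            slope, intercept = square_trick(slope, intercept, features[i], labels[i], learning_rate)
        return slope, intercept

    # Hypothetical housing data: number of rooms vs. price (in thousands).
    rooms = [1, 2, 3, 5, 6, 7]
    prices = [155, 197, 244, 356, 407, 448]
    print(linear_regression(rooms, prices))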

4 Using lines to split our points: The perceptron algorithm

4.1 The problem: We are on an alien planet, and we don’t know their language!

4.1.1 A slightly more complicated planet

4.1.2 The bias, the y-intercept, and the inherent mood of a quiet alien

4.1.3 More general cases

4.2 How do we determine if a classifier is good or bad? The error function

4.2.1 How to compare classifiers? The error function

4.3 How to find a good classifier? The perceptron algorithm

4.3.1 The perceptron trick

4.4 Repeating the perceptron trick many times: The perceptron algorithm

4.5 Coding the perceptron algorithm

4.5.1 Coding the perceptron trick

4.6 Applications

4.6.1 Applications of the perceptron algorithm

4.7 Some drawbacks of the perceptron algorithm, which will be addressed very soon!

4.8 Summary
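
For orientation, here is a minimal sketch of the perceptron trick and the perceptron algorithm described above, in plain Python. The two-feature "alien sentence" data and the hyperparameters are invented for illustration.

    import random

    def step(x):
        # Discrete activation: output 1 on or above the boundary, 0 below it.
        return 1 if x >= 0 else 0

    def predict(weights, bias, features):
        return step(sum(w * f for w, f in zip(weights, features)) + bias)

    def perceptron_trick(weights, bias, features, label, learning_rate=0.01):
        # If the point is misclassified, nudge the boundary toward it; otherwise leave it alone.
        pred = predict(weights, bias, features)
        for i in range(len(weights)):
            weights[i] += learning_rate * (label - pred) * features[i]
        bias += learning_rate * (label - pred)
        return weights, bias

    def perceptron_algorithm(features, labels, learning_rate=0.01, epochs=200):
        # Repeat the perceptron trick many times on randomly chosen points.
        weights = [random.random() for _ in range(len(features[0]))]
        bias = random.random()
        for _ in range(epochs):
            i = random.randrange(len(features))
            weights, bias = perceptron_trick(weights, bias, features[i], labels[i], learning_rate)
        return weights, bias

    # Hypothetical data: two word counts per alien sentence, and whether the alien is happy (1) or sad (0).
    sentences = [[1, 0], [0, 2], [1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 2]]
    moods = [0, 0, 0, 0, 1, 1, 1, 1]
    print(perceptron_algorithm(sentences, moods))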

5 A continuous approach to splitting points: Logistic regression

5.1 Logistic Regression (or continuous perceptrons)

5.1.1 A probability approach to classification - The sigmoid function

5.1.2 The error functions - Absolute, square, and log loss

5.1.3 More on the log loss error function

5.2 Reducing the log loss error: The logistic regression trick

5.2.1 An example with a discrete perceptron and a continuous perceptron

5.2.2 A second example with a discrete perceptron and a continuous perceptron

5.2.3 Moving the line to fit the points - The logistic regression algorithm

5.2.4 Coding the logistic regression algorithm

5.2.5 The logistic regression algorithm in Turi Create

5.3 Classifying into multiple classes - The softmax function

5.4 Summary
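
Here is a minimal sketch of the continuous version described above: a sigmoid prediction plus the logistic regression trick, coded in plain Python rather than Turi Create. The dataset and hyperparameters are invented for illustration.

    import math
    import random

    def sigmoid(x):
        # Squash any score into a probability between 0 and 1.
        return 1 / (1 + math.exp(-x))

    def prediction(weights, bias, features):
        return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

    def logistic_trick(weights, bias, features, label, learning_rate=0.1):
        # Every point pulls the boundary a little, in proportion to the prediction error.
        pred = prediction(weights, bias, features)
        for i in range(len(weights)):
            weights[i] += learning_rate * (label - pred) * features[i]
        bias += learning_rate * (label - pred)
        return weights, bias

    def logistic_regression(features, labels, learning_rate=0.1, epochs=1000):
        # Repeat the logistic regression trick many times to reduce the log loss.
        weights = [random.random() for _ in range(len(features[0]))]
        bias = random.random()
        for _ in range(epochs):
            i = random.randrange(len(features))
            weights, bias = logistic_trick(weights, bias, features[i], labels[i], learning_rate)
        return weights, bias

    points = [[1, 0], [0, 2], [1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 2]]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print(logistic_regression(points, labels))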

6 Using probability to its maximum: The naive Bayes algorithm

6.1 Sick or healthy? A story with Bayes Theorem

6.1.1 Prelude to Bayes Theorem: The prior, the event, and the posterior

6.2 Use-case: Spam detection model

6.2.1 Finding the prior: The probability that any email is spam

6.2.2 Finding the posterior: The probability that an email is spam knowing that it contains a particular word

6.2.3 What the math just happened? Turning ratios into probabilities

6.2.4 What about two words? The naive Bayes algorithm

6.2.5 What about more than two words?

6.3 Building a spam detection model with real data

6.3.1 Data preprocessing

6.3.2 Finding the priors

6.3.3 Finding the posteriors with Bayes theorem

6.3.4 Implementing the naive Bayes algorithm

6.3.5 Further work

6.4 Summary
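
As an illustration of the priors, posteriors, and naive independence assumption listed above, here is a minimal sketch with a made-up six-email dataset; the words and counts are hypothetical, not the real data used in the chapter.

    # Tiny hypothetical dataset: the set of words in each email, and whether it is spam.
    emails = [
        ({"win", "money", "now"}, True),
        ({"win", "lottery"}, True),
        ({"cheap", "money"}, True),
        ({"meeting", "tomorrow"}, False),
        ({"project", "meeting", "notes"}, False),
        ({"lunch", "tomorrow"}, False),
    ]

    def naive_bayes_spam_probability(words, emails):
        spam = [w for w, is_spam in emails if is_spam]
        ham = [w for w, is_spam in emails if not is_spam]
        # Priors: the fraction of emails that are spam or ham.
        p_spam = len(spam) / len(emails)
        p_ham = len(ham) / len(emails)
        # Naive assumption: words appear independently given the class.
        # Add-one smoothing keeps unseen words from zeroing out the product.
        for word in words:
            p_spam *= (sum(word in email for email in spam) + 1) / (len(spam) + 2)
            p_ham *= (sum(word in email for email in ham) + 1) / (len(ham) + 2)
        # Bayes' theorem: turn the two scores into a probability by normalizing them.
        return p_spam / (p_spam + p_ham)

    print(naive_bayes_spam_probability({"win", "money"}, emails))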

7 Splitting data by asking questions: Decision trees

7.1 The problem: We need to recommend apps to users according to what they are likely to download

7.2 The solution: Building an app recommendation system

7.2.1 The remember-formulate-predict framework

7.2.2 First step to build the model: Asking the best question

7.2.3 Next and final step: Iterate by asking the best question every time

7.2.4 Using the model by making predictions

7.3 Building the tree: How to pick the right feature to split

7.3.1 How to pick the best feature to split our data: Accuracy

7.3.2 How to pick the best feature to split our data: Gini impurity

7.4 Back to recommending apps: Building our decision tree using Gini index

7.5 When do we stop building the tree? Hyperparameters

7.6 Beyond questions like yes/no

7.6.1 Features with more categories, such as Dog/Cat/Bird

7.6.2 Continuous features, such as a number

7.7 Coding a decision tree with sklearn

7.8 A slightly larger example: Spam detection again!

7.9 Decision trees for regression

7.9.1 Building the simplest possible classifier - A decision tree that predicts the same value for each data point

7.9.2 Iterating on the simplest possible classifier by reducing the total square error

7.10 Applications

7.10.1 Decision trees are widely used in health care

7.10.2 Decision trees are useful in recommendation systems

7.11 Summary
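
Here is a minimal sketch of the "coding a decision tree with sklearn" step listed above; the app-recommendation data, the feature encoding, and the max_depth value are made up for illustration.

    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical app-recommendation data: [age, uses phone for work (0/1)] -> app downloaded.
    features = [[15, 0], [25, 1], [32, 1], [35, 0], [12, 0], [14, 0], [40, 1], [20, 0]]
    labels = ["game", "work app", "work app", "social", "game", "game", "work app", "social"]

    # Gini impurity is the splitting criterion; max_depth is one of the hyperparameters
    # that decide when to stop growing the tree.
    model = DecisionTreeClassifier(criterion="gini", max_depth=2)
    model.fit(features, labels)
    print(model.predict([[13, 0], [30, 1]]))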

8 Combining building blocks to gain more power: neural networks

8.1 The problem - A more complicated alien planet!

8.1.1 Solution - If one line is not enough, use two lines to classify your dataset

8.1.2 Why two lines? Is happiness not linear?

8.1.3 Perceptrons and how to combine them

8.1.4 From discrete perceptrons to continuous perceptrons - a trick to improve our training

8.2 The general scenario - Neural networks

8.2.1 The architecture of a neural network

8.2.2 Bias vs Threshold - Two equivalent ways of describing the constant term in the perceptron

8.3 Training neural networks

8.3.1 Error function - A way to measure how our neural network is performing

8.3.2 Backpropagation - The key step in reducing the error function in order to train the neural network

8.3.3 Potential problems with neural networks - From overfitting to vanishing gradients

8.3.4 Techniques for training your neural network - Dropout, regularization

8.3.5 Different activation functions - Sigmoid, hyperbolic tangent (tanh), and the rectified linear unit (ReLU)

8.3.6 More than two classes? No problem, the softmax function is here to help

8.3.7 Hyperparameters - what we fine tune to improve our training

8.3.8 Can neural networks predict values instead of classes? Yes we can! - Neural networks for regression

8.4 How to code a neural network in Keras

8.4.1 Categorizing our data - a way to turn categorical features into numbers

8.4.2 The architecture of a neural network that we’ll use to train this dataset

8.4.3 Defining the model in Keras - Number of layers, size of each layer, and activation functions

8.4.4 Training the model in Keras

8.5 Other more complicated architectures and some sci-fi applications

8.5.1 How neural networks see - Image recognition

8.5.2 How neural networks talk - Natural language processing

8.5.3 How neural networks generate faces that look real - Generative adversarial networks

8.6 Summary
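
Here is a minimal sketch of defining and training a small neural network in Keras, in the spirit of sections 8.4.3 and 8.4.4 above; the toy dataset, layer sizes, and training settings are invented for illustration.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras.layers import Dense, Dropout

    # Hypothetical two-feature dataset with binary labels.
    X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [0.5, 0.5]])
    y = np.array([0, 0, 1, 1, 1, 0])

    # Architecture: number of layers, size of each layer, and activation functions.
    model = keras.Sequential([
        keras.Input(shape=(2,)),
        Dense(128, activation="relu"),
        Dropout(0.2),                    # dropout, one of the regularization techniques mentioned above
        Dense(64, activation="relu"),
        Dense(1, activation="sigmoid"),  # sigmoid output for a two-class problem
    ])

    # Training: log loss (binary cross-entropy) as the error function, reduced by backpropagation.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=10, batch_size=2, verbose=0)
    print(model.predict(X, verbose=0))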

9 Finding boundaries with style: Support vector machines and the kernel method

9.1 Using a new error function to build better classifiers

9.1.1 Classification error function - trying to classify the points correctly

9.1.2 Distance error function - trying to space our two lines as far apart as possible

9.1.3 Adding the two error functions to obtain the error function

9.1.4 Using a dial to decide how we want our model: The C parameter

9.2 Coding support vector machines in sklearn

9.2.1 Coding a simple SVM

9.2.2 Introducing the C parameter

9.3 Going from lines to circles, parabolas, etc. - The kernel method

9.3.1 Using polynomial equations (circles, parabolas, hyperbolas, etc.) to our benefit - The polynomial kernel

9.3.2 Using bumps in higher dimensions to our benefit - The radial basis function (rbf) kernel

9.3.3 Training an SVM with the rbf kernel

9.3.4 Coding the kernel method

9.4 Summary
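
Here is a minimal sketch of training an SVM with the rbf kernel in sklearn; the toy data and the C and gamma values are made up to illustrate the knobs the chapter discusses.

    from sklearn.svm import SVC

    # Hypothetical data no single line can separate: one class near the origin, the other around it.
    points = [[0, 0], [0.2, 0.1], [-0.1, 0.3], [1.5, 1.5], [-1.6, 1.4], [1.4, -1.5], [-1.5, -1.6]]
    labels = [0, 0, 0, 1, 1, 1, 1]

    # C trades off classifying points correctly against keeping the margin wide;
    # gamma controls the width of the rbf "bumps".
    model = SVC(kernel="rbf", C=1.0, gamma=1.0)
    model.fit(points, labels)
    print(model.predict([[0.1, -0.1], [1.6, 1.6]]))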

10 Combining models to maximize results: ensemble learning

10.1 With a little help from our friends

10.2 Why an ensemble of learners? Why not just one really good learner?

10.3 Bagging - Joining some classifiers together to build a stronger classifier

10.3.1 Building random forests by joining several trees

10.3.2 Coding a random forest in sklearn

10.4 Boosting - Joining some classifiers together in a smarter way to get a stronger classifier

10.4.1 A big picture of AdaBoost

10.4.2 A detailed (mathematical) picture of AdaBoost

10.4.3 Coding AdaBoost in sklearn

10.5 XGBoost - An extreme way to do gradient boosting

10.5.1 XGBoost similarity score

10.5.2 Building the learners

10.6 Applications of ensemble methods

10.7 Summary
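
Here is a minimal sketch contrasting bagging and boosting with sklearn, in the spirit of sections 10.3.2 and 10.4.3 above; the dataset and the number of estimators are invented for illustration.

    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

    # Hypothetical two-feature dataset with binary labels, reused for both ensembles.
    points = [[1, 0], [0, 2], [1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 2]]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]

    # Bagging: a random forest joins several trees, each trained on a random sample of the data.
    forest = RandomForestClassifier(n_estimators=5, max_depth=1).fit(points, labels)

    # Boosting: AdaBoost trains learners one after another, each focusing on the previous mistakes.
    adaboost = AdaBoostClassifier(n_estimators=5).fit(points, labels)

    print(forest.predict([[2, 1]]), adaboost.predict([[2, 1]]))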

Appendixes

Appendix A: The math behind the algorithms

What's inside

  • Different types of machine learning, including supervised and unsupervised learning
  • Algorithms for simplifying, classifying, and splitting data
  • Machine learning packages and tools
  • Hands-on exercises with fully explained Python code samples

About the reader

For readers with intermediate programming knowledge in Python or a similar language. No machine learning experience or advanced math skills necessary.

About the author

Luis G. Serrano has worked as the Head of Content for Artificial Intelligence at Udacity and as a Machine Learning Engineer at Google, where he worked on the YouTube recommendation system. He holds a PhD in mathematics from the University of Michigan and a bachelor's and master's degree from the University of Waterloo, and he worked as a postdoctoral researcher at the University of Quebec at Montreal. He shares his machine learning expertise on a YouTube channel with over 2 million views and 35 thousand subscribers, and he is a frequent speaker at artificial intelligence and data science conferences.
