Grokking Deep Learning
Andrew W. Trask
  • January 2019
  • ISBN 9781617293702
  • 336 pages
  • printed in black & white

An excellent introduction and overview of deep learning by a masterful teacher who guides, illuminates, and encourages you along the way.

Kelvin D. Meeks, International Technology Ventures

About the Technology

Deep learning, a branch of artificial intelligence, teaches computers to learn by using neural networks, technology inspired by the human brain. Online text translation, self-driving cars, personalized product recommendations, and virtual voice assistants are just a few of the exciting modern advancements possible thanks to deep learning.

About the book

Grokking Deep Learning teaches you to build deep learning neural networks from scratch! In his engaging style, seasoned deep learning expert Andrew Trask shows you the science under the hood, so you grok for yourself every detail of training neural networks. Using only Python and its math-supporting library, NumPy, you’ll train your own neural networks to see and understand images, translate text into different languages, and even write like Shakespeare! When you’re done, you’ll be fully prepared to move on to mastering deep learning frameworks.

Table of Contents

1 introducing deep learning: why you should learn it

Why you should learn deep learning

Will this be difficult to learn?

Why you should read this book

What you need to get started

You’ll probably need some Python knowledge


2 fundamental concepts: how do machines learn?

What is deep learning?

Supervised machine learning

Unsupervised machine learning

Parametric vs. nonparametric learning

Supervised parametric learning

Unsupervised parametric learning

Nonparametric learning


3 introduction to neural prediction: forward propagation

Step 1: Predict

A simple neural network making a prediction

What is a neural network?

What does this neural network do?

Making a prediction with multiple inputs

Multiple inputs: What does this neural network do?

Multiple inputs: Complete runnable code

Making a prediction with multiple outputs

Predicting with multiple inputs and outputs

Multiple inputs and outputs: How does it work?

Predicting on predictions

A quick primer on NumPy


4 introduction to neural learning: gradient descent

Predict, compare, and learn



Compare: Does your network make good predictions?

Why measure error?

What’s the simplest form of neural learning?

Hot and cold learning

Characteristics of hot and cold learning

Calculating both direction and amount from error

One iteration of gradient descent

Learning is just reducing error

Let’s watch several steps of learning

Why does this work? What is weight_delta, really?

Tunnel vision on one concept

A box with rods poking out of it

Derivatives: Take two

What you really need to know

What you don’t really need to know

How to use a derivative to learn

Look familiar?

Breaking gradient descent

Visualizing the overcorrections


Introducing alpha

Alpha in code


5 learning multiple weights at a time: generalizing gradient descent

Gradient descent learning with multiple inputs

Gradient descent with multiple inputs explained

Let’s watch several steps of learning

Freezing one weight: What does it do?

Gradient descent learning with multiple outputs

Gradient descent with multiple inputs and outputs

What do these weights learn?

Visualizing weight values

Visualizing dot products (weighted sums)


6 building your first deep neural network: introduction to backpropagation

The streetlight problem

Preparing the data

Matrices and the matrix relationship

Creating a matrix or two in Python

Building a neural network

Learning the whole dataset

Full, batch, and stochastic gradient descent

Neural networks learn correlation

Up and down pressure

Edge case: Overfitting

Edge case: Conflicting pressure

Learning indirect correlation

Creating correlation

Stacking neural networks: A review

Backpropagation: Long-distance error attribution

Backpropagation: Why does this work?

Linear vs. nonlinear

Why the neural network still doesn’t work

The secret to sometimes correlation

A quick break

Your first deep neural network

Backpropagation in code

One iteration of backpropagation

Putting it all together

Why do deep networks matter?

7 how to picture neural networks: in your head and on paper

It’s time to simplify

Correlation summarization

The previously overcomplicated visualization

The simplified visualization

Simplifying even further

Let’s see this network predict

Visualizing using letters instead of pictures

Linking the variables

Everything side by side

The importance of visualization tools

8 learning signal and ignoring noise: introduction to regularization and batching

Three-layer network on MNIST

Well, that was easy

Memorization vs. generalization

Overfitting in neural networks

Where overfitting comes from

The simplest regularization: Early stopping

Industry standard regularization: Dropout

Why dropout works: Ensembling works

Dropout in code

Dropout evaluated on MNIST

Batch gradient descent


9 modeling probabilities and nonlinearities: activation functions

What is an activation function?

Standard hidden-layer activation functions

Standard output layer activation functions

The core issue: Inputs have similarity

softmax computation

Activation installation instructions

Multiplying delta by the slope

Converting output to slope (derivative)

Upgrading the MNIST network

10 neural learning about edges and corners: intro to convolutional neural networks

Reusing weights in multiple places

The convolutional layer

A simple implementation in NumPy


11 neural networks that understand language: king - man + woman == ?

What does it mean to understand language?

Natural language processing (NLP)

Supervised NLP

IMDB movie reviews dataset

Capturing word correlation in input data

Predicting movie reviews

Intro to an embedding layer

Interpreting the output

Neural architecture

Comparing word embeddings

What is the meaning of a neuron?

Filling in the blank

Meaning is derived from loss

King - Man + Woman ~= Queen

Word analogies


12 neural networks that write like Shakespeare: recurrent layers for variable-length data

The challenge of arbitrary length

Do comparisons really matter?

The surprising power of averaged word vectors

How is information stored in these embeddings?

How does a neural network use embeddings?

The limitations of bag-of-words vectors

Using identity vectors to sum word embeddings

Matrices that change absolutely nothing

Learning the transition matrices

Learning to create useful sentence vectors

Forward propagation in Python

How do you backpropagate into this?

Let’s train it!

Setting things up

Forward propagation with arbitrary length

Backpropagation with arbitrary length

Weight update with arbitrary length

Execution and output analysis


13 introducing automatic optimization: let’s build a deep learning framework

What is a deep learning framework?

Introduction to tensors

Introduction to automatic gradient computation (autograd)

A quick checkpoint

Tensors that are used multiple times

Upgrading autograd to support multiuse tensors

How does addition backpropagation work?

Adding support for negation

Adding support for additional functions

Using autograd to train a neural network

Adding automatic optimization

Adding support for layer types

Layers that contain layers

Loss-function layers

How to learn a framework

Nonlinearity layers

The embedding layer

Adding indexing to autograd

The embedding layer (revisited)

The cross-entropy layer

The recurrent neural network layer


14 learning to write like Shakespeare: long short-term memory

Character language modeling

The need for truncated backpropagation

Truncated backpropagation

A sample of the output

Vanishing and exploding gradients

A toy example of RNN backpropagation

Long short-term memory (LSTM) cells

Some intuition about LSTM gates

The long short-term memory layer

Upgrading the character language model

Training the LSTM character language model

Tuning the LSTM character language model


15 deep learning on unseen data: introducing federated learning

The problem of privacy in deep learning

Federated learning

Learning to detect spam

Let’s make it federated

Hacking into federated learning

Secure aggregation

Homomorphic encryption

Homomorphically encrypted federated learning


16 where to go from here: a brief guide


Step 1: Start learning PyTorch

Step 2: Start another deep learning course

Step 3: Grab a mathy deep learning textbook

Step 4: Start a blog, and teach deep learning

Step 5: Twitter

Step 6: Implement academic papers

Step 7: Acquire access to a GPU (or many)

Step 8: Get paid to practice

Step 9: Join an open source project

Step 10: Develop your local community

What's inside

  • The science behind deep learning
  • Building and training your own neural networks
  • Privacy concepts, including federated learning
  • Tips for continuing your pursuit of deep learning

About the reader

For readers with high school-level math and intermediate programming skills.

About the author

Andrew Trask is a PhD student at Oxford University and a research scientist at DeepMind. Previously, Andrew was a researcher and analytics product manager at Digital Reasoning, where he trained the world’s largest artificial neural network and helped guide the analytics roadmap for the Synthesys cognitive computing platform.

We interviewed Andrew as a part of our Six Questions series.
