An excellent introduction and overview of deep learning by a masterful teacher who guides, illuminates, and encourages you along the way.

## 1 introducing deep learning: why you should learn it

#### Why you should learn deep learning

#### Will this be difficult to learn?

#### Why you should read this book

#### What you need to get started

#### You’ll probably need some Python knowledge

#### Summary

## 2 fundamental concepts: how do machines learn?

#### What is deep learning?

#### Supervised machine learning

#### Unsupervised machine learning

#### Parametric vs. nonparametric learning

#### Supervised parametric learning

#### Unsupervised parametric learning

#### Nonparametric learning

#### Summary

## 3 introduction to neural prediction: forward propagation

#### Step 1: Predict

#### A simple neural network making a prediction

#### What is a neural network?

#### What does this neural network do?

#### Making a prediction with multiple inputs

#### Multiple inputs: What does this neural network do?

#### Multiple inputs: Complete runnable code

#### Making a prediction with multiple outputs

#### Predicting with multiple inputs and outputs

#### Multiple inputs and outputs: How does it work?

#### Predicting on predictions

#### A quick primer on NumPy

#### Summary

## 4 introduction to neural learning: gradient descent

#### Predict, compare, and learn

#### Compare

#### Learn

#### Compare: Does your network make good predictions?

#### Why measure error?

#### What’s the simplest form of neural learning?

#### Hot and cold learning

#### Characteristics of hot and cold learning

#### Calculating both direction and amount from error

#### One iteration of gradient descent

#### Learning is just reducing error

#### Let’s watch several steps of learning

#### Why does this work? What is weight_delta, really?

#### Tunnel vision on one concept

#### A box with rods poking out of it

#### Derivatives: Take two

#### What you really need to know

#### What you don’t really need to know

#### How to use a derivative to learn

#### Look familiar?

#### Breaking gradient descent

#### Visualizing the overcorrections

#### Divergence

#### Introducing alpha

#### Alpha in code

#### Memorizing

## 5 learning multiple weights at a time: generalizing gradient descent

#### Gradient descent learning with multiple inputs

#### Gradient descent with multiple inputs explained

#### Let’s watch several steps of learning

#### Freezing one weight: What does it do?

#### Gradient descent learning with multiple outputs

#### Gradient descent with multiple inputs and outputs

#### What do these weights learn?

#### Visualizing weight values

#### Visualizing dot products (weighted sums)

#### Summary

## 6 building your first deep neural network: introduction to backpropagation

#### The streetlight problem

#### Preparing the data

#### Matrices and the matrix relationship

#### Creating a matrix or two in Python

#### Building a neural network

#### Learning the whole dataset

#### Full, batch, and stochastic gradient descent

#### Neural networks learn correlation

#### Up and down pressure

#### Edge case: Overfitting

#### Edge case: Conflicting pressure

#### Learning indirect correlation

#### Creating correlation

#### Stacking neural networks: A review

#### Backpropagation: Long-distance error attribution

#### Backpropagation: Why does this work?

#### Linear vs. nonlinear

#### Why the neural network still doesn’t work

#### The secret to sometimes correlation

#### A quick break

#### Your first deep neural network

#### Backpropagation in code

#### One iteration of backpropagation

#### Putting it all together

#### Why do deep networks matter?

## 7 how to picture neural networks: in your head and on paper

#### It’s time to simplify

#### Correlation summarization

#### The previously overcomplicated visualization

#### The simplified visualization

#### Simplifying even further

#### Let’s see this network predict

#### Visualizing using letters instead of pictures

#### Linking the variables

#### Everything side by side

#### The importance of visualization tools

## 8 learning signal and ignoring noise: introduction to regularization and batching

#### Three-layer network on MNIST

#### Well, that was easy

#### Memorization vs. generalization

#### Overfitting in neural networks

#### Where overfitting comes from

#### The simplest regularization: Early stopping

#### Industry standard regularization: Dropout

#### Why dropout works: Ensembling works

#### Dropout in code

#### Dropout evaluated on MNIST

#### Batch gradient descent

#### Summary

## 9 modeling probabilities and nonlinearities: activation functions

#### What is an activation function?

#### Standard hidden-layer activation functions

#### Standard output layer activation functions

#### The core issue: Inputs have similarity

#### softmax computation

#### Activation installation instructions

#### Multiplying delta by the slope

#### Converting output to slope (derivative)

#### Upgrading the MNIST network

## 10 neural learning about edges and corners: intro to convolutional neural networks

#### Reusing weights in multiple places

#### The convolutional layer

#### A simple implementation in NumPy

#### Summary

## 11 neural networks that understand language: king - man + woman == ?

#### What does it mean to understand language?

#### Natural language processing (NLP)

#### Supervised NLP

#### IMDB movie reviews dataset

#### Capturing word correlation in input data

#### Predicting movie reviews

#### Intro to an embedding layer

#### Interpreting the output

#### Neural architecture

#### Comparing word embeddings

#### What is the meaning of a neuron?

#### Filling in the blank

#### Meaning is derived from loss

#### King - Man + Woman ~= Queen

#### Word analogies

#### Summary

## 12 neural networks that write like Shakespeare: recurrent layers for variable-length data

#### The challenge of arbitrary length

#### Do comparisons really matter?

#### The surprising power of averaged word vectors

#### How is information stored in these embeddings?

#### How does a neural network use embeddings?

#### The limitations of bag-of-words vectors

#### Using identity vectors to sum word embeddings

#### Matrices that change absolutely nothing

#### Learning the transition matrices

#### Learning to create useful sentence vectors

#### Forward propagation in Python

#### How do you backpropagate into this?

#### Let’s train it!

#### Setting things up

#### Forward propagation with arbitrary length

#### Backpropagation with arbitrary length

#### Weight update with arbitrary length

#### Execution and output analysis

#### Summary

## 13 introducing automatic optimization: let’s build a deep learning framework

#### What is a deep learning framework?

#### Introduction to tensors

#### Introduction to automatic gradient computation (autograd)

#### A quick checkpoint

#### Tensors that are used multiple times

#### Upgrading autograd to support multiuse tensors

#### How does addition backpropagation work?

#### Adding support for negation

#### Adding support for additional functions

#### Using autograd to train a neural network

#### Adding automatic optimization

#### Adding support for layer types

#### Layers that contain layers

#### Loss-function layers

#### How to learn a framework

#### Nonlinearity layers

#### The embedding layer

#### Adding indexing to autograd

#### The embedding layer (revisited)

#### The cross-entropy layer

#### The recurrent neural network layer

#### Summary

## 14 learning to write like Shakespeare: long short-term memory

#### Character language modeling

#### The need for truncated backpropagation

#### Truncated backpropagation

#### A sample of the output

#### Vanishing and exploding gradients

#### A toy example of RNN backpropagation

#### Long short-term memory (LSTM) cells

#### Some intuition about LSTM gates

#### The long short-term memory layer

#### Upgrading the character language model

#### Training the LSTM character language model

#### Tuning the LSTM character language model

#### Summary

## 15 deep learning on unseen data: introducing federated learning

#### The problem of privacy in deep learning

#### Federated learning

#### Learning to detect spam

#### Let’s make it federated

#### Hacking into federated learning

#### Secure aggregation

#### Homomorphic encryption

#### Homomorphically encrypted federated learning

#### Summary

## 16 where to go from here: a brief guide

#### Congratulations!

#### Step 1: Start learning PyTorch

#### Step 2: Start another deep learning course

#### Step 3: Grab a mathy deep learning textbook

#### Step 4: Start a blog, and teach deep learning

#### Step 5: Twitter

#### Step 6: Implement academic papers

#### Step 7: Acquire access to a GPU (or many)

#### Step 8: Get paid to practice

#### Step 9: Join an open source project

#### Step 10: Develop your local community

## About the Technology

Deep learning, a branch of artificial intelligence, teaches computers to learn by using neural networks, technology inspired by the human brain. Online text translation, self-driving cars, personalized product recommendations, and virtual voice assistants are just a few of the exciting modern advancements possible thanks to deep learning.

## About the book

*Grokking Deep Learning* teaches you to build deep learning neural networks from scratch! In his engaging style, seasoned deep learning expert Andrew Trask shows you the science under the hood, so you grok for yourself every detail of training neural networks. Using only Python and its math-supporting library, NumPy, you’ll train your own neural networks to see and understand images, translate text into different languages, and even write like Shakespeare! When you’re done, you’ll be fully prepared to move on to mastering deep learning frameworks.

## What's inside

- The science behind deep learning
- Building and training your own neural networks
- Privacy concepts, including federated learning
- Tips for continuing your pursuit of deep learning

## About the author

**Andrew Trask** is a PhD student at Oxford University and a research scientist at DeepMind. Previously, Andrew was a researcher and analytics product manager at Digital Reasoning, where he trained the world’s largest artificial neural network and helped guide the analytics roadmap for the Synthesys cognitive computing platform.