Artificial Intelligence is one of the most exciting technologies of the century, and Deep Learning is in many ways the "brain" behind some of the world's smartest Artificial Intelligence systems. Loosely based on the behavior of neurons in the human brain, these systems are rapidly matching human performance on specific tasks: defeating the world champion Go player, achieving superhuman performance on video games, driving cars, translating languages, and sometimes even helping law enforcement fight crime. Deep Learning is a revolution that is changing industries across the globe.

Dig deeper into the world of deep learning with *Grokking Deep Learning in Motion*. Together, this book and video combo covers everything you need to truly grok this exciting world.

# Part 1: Neural Network Basics

## 1 Introducing Deep Learning: Why you should learn it

### 1.1 Welcome to Grokking Deep Learning

### 1.2 Why you should learn Deep Learning

### 1.3 Will this be hard to learn?

### 1.4 Why you should read this book

### 1.5 Why you should read this book (cont.)

### 1.6 What you need to get started

### 1.7 You’ll probably need some Python knowledge

### 1.8 How much coding experience should you have?

### 1.9 Conclusion and Primer for Chapter 2

## 2 Fundamental Concepts: How do machines learn?

### 2.1 What is Deep Learning?

### 2.2 What is Machine Learning?

### 2.3 Supervised Machine Learning

### 2.4 Unsupervised Machine Learning

### 2.5 Parametric vs Non-Parametric Learning

### 2.6 Supervised Parametric Learning

### 2.7 Step 1: Predict

### 2.8 Step 2: Compare to Truth Pattern

### 2.9 Step 3: Learn the Pattern

### 2.10 Unsupervised Parametric Learning

### 2.11 Non-Parametric Learning

### 2.12 Conclusion

## 3 Introduction to Neural Prediction: Forward Propagation

### 3.1 Step 1: Predict

### 3.2 A Simple Neural Network Making a Prediction

### 3.3 What is a Neural Network?

### 3.4 What does this Neural Network do?

### 3.5 Making a Prediction with Multiple Inputs

### 3.6 Multiple Inputs - What does this Neural Network do?

### 3.7 Multiple Inputs - Complete Runnable Code

### 3.8 Making a Prediction with Multiple Outputs

### 3.9 Predicting with Multiple Inputs & Outputs

### 3.10 Multiple Inputs & Outputs - How does it work?

### 3.11 Predicting on Predictions

### 3.12 A Quick Primer on NumPy

### 3.13 Conclusion

## 4 Introduction to Neural Learning: Gradient Descent

### 4.1 Predict, Compare, and Learn

### 4.2 Compare

### 4.3 Learn

### 4.4 Compare: Does our network make good predictions?

### 4.5 Why measure error?

### 4.6 What’s the Simplest Form of Neural Learning?

### 4.7 Hot and Cold Learning

### 4.8 Characteristics of Hot and Cold Learning

### 4.9 Calculating both direction and amount from error

### 4.10 One Iteration of Gradient Descent

### 4.11 Learning Is Just Reducing Error

### 4.12 Let’s Watch Several Steps of Learning

### 4.13 Why does this work? What really is weight_delta?

### 4.14 Tunnel Vision on One Concept

### 4.15 A Box With Rods Poking Out of It

### 4.16 Derivatives… Take Two

### 4.17 What you really need to know…

### 4.18 What you don’t really need to know…

### 4.19 How to use a Derivative to learn

### 4.20 Look Familiar?

### 4.21 Breaking Gradient Descent

### 4.22 Visualizing the Overcorrections

### 4.23 Divergence

### 4.24 Introducing… Alpha

### 4.25 Alpha In Code

### 4.26 Memorizing

## 5 Learning Multiple Weights at a Time: Generalizing Gradient Descent

### 5.1 Gradient Descent Learning with Multiple Inputs

### 5.2 Gradient Descent with Multiple Inputs - Explained

### 5.3 Let’s Watch Several Steps of Learning

### 5.4 Freezing One Weight - What Does It Do?

### 5.5 Gradient Descent Learning with Multiple Outputs

### 5.6 Gradient Descent with Multiple Inputs & Outputs

### 5.7 What do these weights learn?

### 5.8 Visualizing Weight Values

### 5.9 Visualizing Dot Products (weighted sums)

### 5.10 Conclusion

## 6 Building Your First "Deep" Neural Network: Introduction to Backpropagation

### 6.1 The Street Light Problem

### 6.2 Preparing our Data

### 6.3 Matrices and the Matrix Relationship

### 6.4 Creating a Matrix or Two in Python

### 6.5 Building Our Neural Network

### 6.6 Learning the whole dataset!

### 6.7 Full / Batch / Stochastic Gradient Descent

### 6.8 Neural Networks Learn Correlation

### 6.9 Up and Down Pressure

### 6.10 Up and Down Pressure (cont.)

### 6.11 Edge Case: Overfitting

### 6.12 Edge Case: Conflicting Pressure

### 6.13 Edge Case: Conflicting Pressure (cont.)

### 6.14 Learning Indirect Correlation

### 6.15 Creating Our Own Correlation

### 6.16 Stacking Neural Networks - A Review

### 6.17 Backpropagation: Long Distance Error Attribution

### 6.18 Backpropagation: Why does this work?

### 6.19 Linear vs Non-Linear

### 6.20 Why The Neural Network Still Doesn’t Work

### 6.21 The Secret to "Sometimes Correlation"

### 6.22 A Quick Break

### 6.23 Our First "Deep" Neural Network

### 6.24 Backpropagation in Code

### 6.25 One Iteration of Backpropagation

### 6.26 Putting it all together

### 6.27 Why do deep networks matter?

## 7 How to Picture Neural Networks: In Your Head and on Paper

### 7.1 It’s Time to Simplify

### 7.2 This is the key to sanely moving forward to more advanced neural networks.

### 7.3 Our Previously Overcomplicated Visualization

### 7.4 Our Simplified Visualization

### 7.5 Simplifying Even Further

### 7.6 Let’s See This Network Predict

### 7.7 Visualizing Using Letters Instead of Pictures

### 7.8 Linking Our Variables

### 7.9 Everything Side-by-Side

### 7.10 The Importance of Visualization Tools

## 8 Learning Signal and Ignoring Noise: Introduction to Regularization & Batching

### 8.1 3 Layer Network on MNIST

### 8.2 Well… that was easy!

### 8.3 Memorization vs Generalization

### 8.4 Overfitting in Neural Networks

### 8.5 Where Overfitting Comes From

### 8.6 The Simplest Regularization: Early Stopping

### 8.7 Industry Standard Regularization: Dropout

### 8.8 Why Dropout Works: Ensembling Works

### 8.9 Dropout In Code

### 8.10 Dropout Evaluated on MNIST

### 8.11 Batch Gradient Descent

### 8.12 Batch Gradient Descent (cont.)

### 8.13 Conclusion

## 9 Modeling Probabilities and Non-Linearities: Activation Functions

### 9.1 What is an Activation Function?

### 9.2 Standard Hidden Layer Activation Functions

### 9.3 Standard Output Layer Activation Functions

### 9.4 The Core Issue: Inputs Have Similarity

### 9.5 Softmax Computation

### 9.6 Activation Installation Instructions

### 9.7 Multiplying Delta By The Slope

### 9.8 Converting Output to Slope (derivative)

### 9.9 Upgrading our MNIST Network

# Part 2: Advanced Layers and Architectures

## 10 Neural Learning About Edges and Corners: Intro to Convolutional Neural Networks

### 10.1 Re-Using Weights in Multiple Places

### 10.2 The Convolutional Layer

### 10.3 The Convolutional Layer (cont.)

### 10.4 A Simple Implementation in NumPy

### 10.5 A Simple Implementation in NumPy (cont.)

### 10.6 Conclusion

## 11 Neural Networks that Understand Language: King - Man + Woman == ?

### 11.1 What does it mean to Understand Language?

### 11.2 Natural Language Processing (NLP)

### 11.3 Supervised NLP

### 11.4 IMDB Movie Reviews Dataset

### 11.5 Capturing Word Correlation in Input Data

### 11.6 Predicting Movie Reviews

### 11.7 Intro to an Embedding Layer

### 11.8 Predicting Movie Reviews

### 11.9 Interpreting the Output

### 11.10 Neural Architecture

### 11.11 Neural Architecture (cont.)

### 11.12 Comparing Word Embeddings

### 11.13 What is the Meaning of a Neuron?

### 11.14 Filling in The Blank

### 11.15 Filling in The Blank (cont.)

### 11.16 Meaning is Derived from Loss

### 11.17 Meaning is Derived from Loss (cont.)

### 11.18 Meaning is Derived from Loss (cont.)

### 11.19 King - Man + Woman ~= Queen

### 11.20 Word Analogies

### 11.21 Conclusion

## 12 Neural Networks that Write like Shakespeare: Recurrent Layers for Variable Length Data

### 12.1 The Challenge of Arbitrary Length

### 12.2 Do Comparisons Really Matter?

### 12.3 The Surprising Power of Averaged Word Vectors

### 12.4 How is Information Stored in These Embeddings?

### 12.5 How does a Neural Network Use Embeddings?

### 12.6 The Limitations of Bag-of-Words Vectors

### 12.7 Using Identity Vectors to Sum Word Embeddings

### 12.8 Matrices That Change Absolutely Nothing

### 12.9 Learning the Transition Matrices

### 12.10 Learning To Create Useful Sentence Vectors

### 12.11 Forward Propagation in Python

### 12.12 How do we Backpropagate into this?

### 12.13 Let’s Train it!

### 12.14 Setting Things Up

### 12.15 Forward Propagation with Arbitrary Length

### 12.16 Backpropagation with Arbitrary Length

### 12.17 Weight Update with Arbitrary Length

### 12.18 Execution and Output Analysis

### 12.19 Execution and Output Analysis (cont.)

### 12.20 Conclusion and Review

## 13 Introducing Automatic Optimization: Let’s Build a Deep Learning Framework

### 13.1 What is a Deep Learning Framework?

### 13.2 Introduction to Tensors

### 13.3 Introduction to Autograd

### 13.4 A Quick Checkpoint

### 13.5 Tensors That Are Used Multiple Times

### 13.6 Upgrading Autograd To Support Multi-Use Tensors

### 13.7 How does our addition backpropagation work?

### 13.8 Add Support for Negation

### 13.9 Add Support for Additional Functions

### 13.10 Add Support for Additional Functions (cont.)

### 13.11 Use Autograd to Train a Neural Network

### 13.12 Adding Automatic Optimization

### 13.13 Adding Support for Layer Types

### 13.14 Layers Which Contain Layers

### 13.15 Loss Function Layers

### 13.16 How to Learn a Framework

### 13.17 Nonlinearity Layers

### 13.18 Nonlinearity Layers (cont.)

### 13.19 The Embedding Layer

### 13.20 Add Indexing to Autograd

### 13.21 The Embedding Layer (revisited)

### 13.22 The Cross Entropy Layer

### 13.23 The Recurrent Neural Network Layer

### 13.24 The Recurrent Neural Network Layer (cont.)

### 13.25 Conclusion

## 14 Learning to Write Like Shakespeare: Long Short-Term Memory

### 14.1 Character Language Modeling

### 14.2 The Need for Truncated Backpropagation

### 14.3 Truncated Backpropagation

### 14.4 Truncated Backpropagation (cont.)

### 14.5 A Sample of the Output

### 14.6 Vanishing and Exploding Gradients

### 14.7 A Toy Example of RNN Backpropagation

### 14.8 Long Short-Term Memory (LSTM) Cells

### 14.9 Some Intuition about LSTM Gates

### 14.10 The Long Short-Term Memory Layer

### 14.11 Upgrading our Character Language Model

### 14.12 Train our LSTM Character Language Model

### 14.13 Tuning our LSTM Character Language Model

### 14.14 Closing Thoughts

## 15 Deep Learning On Unseen Data: Introducing Federated Learning

### 15.1 The Problem of Privacy in Deep Learning

### 15.2 Federated Learning

### 15.3 Learning to Detect Spam

### 15.4 Let’s Make It Federated!

### 15.5 Hacking Into Federated Learning

### 15.6 Secure Aggregation

### 15.7 Homomorphic Encryption

### 15.8 Homomorphically Encrypted Federated Learning

### 15.9 Conclusion

## 16 Where to Go From Here: A Brief Guide

### 16.1 Step 1: Start Learning PyTorch

### 16.2 Step 2: Start Another Deep Learning Course

### 16.3 Step 3: Grab a "Mathy" Deep Learning Textbook

### 16.4 Step 4: Start a Blog and Teach Deep Learning

### 16.5 Step 5: Twitter?

### 16.6 Step 6: Implement Academic Papers

### 16.7 Step 7: Acquire Access to a GPU (or many)

### 16.8 Step 8: Get Paid to Practice

### 16.9 Step 9: Join an Open Source Project

### 16.10 Step 10: Develop Your Local Community

## About the Technology

Artificial Intelligence is one of the most exciting technologies of the century, and Deep Learning is in many ways the "brain" behind some of the world's smartest Artificial Intelligence systems.

## About the book

*Grokking Deep Learning* is the perfect place to begin your deep learning journey. Rather than just learning the "black box" API of some library or framework, you will actually understand how to build these algorithms completely from scratch. You will understand how Deep Learning is able to outperform humans on some tasks. You will be able to understand the "brain" behind state-of-the-art Artificial Intelligence. Furthermore, unlike other courses that assume advanced knowledge of calculus and rely on complex mathematical notation, this book asks only for Python and basic algebra: if you're a Python hacker who passed high-school algebra, you're ready to go. And at the end, you'll even build an A.I. that will learn to defeat you in a classic Atari game.

## What's inside

- How neural networks "learn"
- Building neural networks that can see and understand images
- Building neural networks that can translate text between languages and even write like Shakespeare
- Building neural networks that can learn how to play video games

## About the reader

Written for readers with high-school-level math and intermediate programming skills. Experience with calculus is helpful but NOT required.

## About the author

**Andrew Trask** is a PhD student at Oxford University, funded by the Oxford-DeepMind Graduate Scholarship, where he researches Deep Learning approaches with special emphasis on human language. Previously, Andrew was a researcher and analytics product manager at Digital Reasoning, where he trained the world's largest artificial neural network with over 160 billion parameters and helped guide the analytics roadmap for the Synthesys cognitive computing platform, which tackles some of the most complex analysis tasks across the government intelligence, finance, and healthcare industries.

**Manning Early Access Program (MEAP)**: Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.
