Deep Reinforcement Learning in Action
Alexander Zai and Brandon Brown
  • MEAP began June 2018
  • Publication in April 2020 (estimated)
  • ISBN 9781617295430
  • 277 pages (estimated)
  • printed in black & white

I had been curious about deep reinforcement learning for a while, but couldn't find anything that wasn't overloaded with math or just too simplistic. Your book is just what I was looking for!

Maxim Pankratov
Humans learn best from feedback—we are encouraged to take actions that lead to positive results and deterred from decisions with negative consequences. This reinforcement process can be applied to computer programs, allowing them to solve problems too complex for classical programming. Deep Reinforcement Learning in Action teaches you the fundamental concepts and terminology of deep reinforcement learning, along with the practical skills and techniques you’ll need to apply it in your own projects.
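The feedback loop described above—trying actions, observing rewards, and shifting toward what works—can be illustrated with a simple epsilon-greedy agent on a multi-armed bandit, the problem the book tackles in chapter 2. This is a minimal sketch in plain Python; the function and variable names are illustrative, not code from the book:

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection on a simple multi-armed bandit."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n          # how often each arm was pulled
    estimates = [0.0] * n     # running average reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)                          # explore: random arm
        else:
            a = max(range(n), key=lambda i: estimates[i])  # exploit: best so far
        reward = true_means[a] + rng.gauss(0, 1)           # noisy reward signal
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # incremental average
    return estimates, counts

est, counts = run_bandit([1.0, 2.0, 5.0])
# the agent ends up pulling the highest-paying arm most of the time
```

Positive feedback (high observed rewards) raises an arm's estimate and makes it more likely to be chosen again; that is the reinforcement process in miniature.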
Table of Contents

Part 1: Foundations

1 What is Reinforcement Learning?

1.1 The Journey Here

1.2 Supervised and Unsupervised Learning

1.3 Problem Structuring in Control Tasks

1.4 The Standard Model

1.5 What can I do with Reinforcement Learning?

1.6 Why Deep Reinforcement Learning?

1.7 Why this book?

1.8 What’s next?

1.9 Summary


2 Modeling Reinforcement Learning Problems: Markov Decision Processes

2.1 String Diagrams and our teaching methods

2.2 Solving the Multi-Arm Bandit

2.3 Applying Bandits to Optimize Ad Placements

2.4 Building Networks with PyTorch

2.5 Solving Contextual Bandits

2.6 The Markov Property

2.7 Predicting Future Rewards: Value and Policy Functions

2.8 Chapter Summary

2.9 What’s next?

3 Predicting the Best States and Actions: Deep Q-Networks

3.1 The Q-function

3.2 Navigating with Q-learning

3.3 Preventing Catastrophic Forgetting: Experience Replay

3.4 Improving Stability with a Target Network

3.5 Summary

3.6 What’s next?

4 Learning to Pick the Best Policy: Policy Gradient Methods

4.1 Policy Function using Neural Networks

4.2 Reinforcing Good Actions: The Policy Gradient Algorithm

4.3 Working with OpenAI Gym

4.4 The REINFORCE Algorithm

4.5 Summary and what’s next

5 Tackling more complex problems with Actor-Critic methods

5.1 Combining the value and policy function

5.2 Distributed Training

5.3 Advantage Actor-Critic

5.4 N-Step Actor-Critic

5.5 Summary and what’s next

Part 2: Above and beyond

6 Alternative Optimization Methods: Evolutionary Strategies

6.1 A Different Approach to Reinforcement Learning

6.2 Reinforcement Learning with Evolution Strategies

6.2.1 Fitness

6.2.2 Selecting for the Fittest Agents

6.2.3 Recombining Agents to Produce New Agents

6.2.4 Introducing Mutations

6.2.5 Evolution takes multiple generations

6.2.6 Full training loop

6.3 Pros and Cons of Evolutionary Algorithms

6.3.1 Evolutionary Algorithms Explore More

6.3.2 Evolutionary algorithms are incredibly sample intensive

6.3.3 Simulators

6.3.4 Gradient-Free Algorithms Could Be Faster to Train

6.4 Evolutionary Strategies are Parallelizable

6.4.1 ES as a scalable alternative

6.4.2 Parallel vs Serial Processing

6.4.3 Scaling Efficiency

6.4.4 Communicating Between Nodes

6.4.5 Sending only two numbers

6.4.6 Seeding

6.4.7 Scaling Linearly

6.4.8 Scaling Gradient Based Approaches

6.5 Summary

7 Distributional DQN: Getting the full story

7.1 What’s wrong with Q-learning?

7.2 Probability and Statistics Revisited

7.2.1 Priors and Posteriors

7.2.2 Expectation and Variance

7.3 The Bellman Equation (Optional)

7.4 Distributional Q-learning

7.4.1 Representing a probability distribution in Python

7.4.2 Implementing the Dist-DQN

7.5 Comparing Probability Distributions

7.6 Dist-DQN on Simulated Data

7.7 Distributional Q-learning to play Freeway

7.8 Summary

8 Curiosity-Driven Exploration

8.1 Tackling Sparse Rewards with Predictive Coding

8.2 Inverse Dynamics Prediction

8.3 Setting up Super Mario Bros.

8.4 Preprocessing and the Q-network

8.5 Setting up the Q-network and Policy Function

8.6 Intrinsic Curiosity Module

8.7 Alternative Intrinsic Reward Mechanisms

8.8 Summary

9 Multi-Agent Reinforcement Learning

9.1 From one to many agents

9.2 Neighborhood Q-learning

9.3 The 1-Dimensional Ising Model

9.4 Mean Field Q-Learning and the 2D Ising Model

9.5 Mixed Cooperative-Competitive Games

9.6 Summary

10 Interpretable Reinforcement Learning: Attention and Relational Models

10.1 Machine Learning Interpretability with Attention and Relational Biases

10.1.1 Invariance and Equivariance

10.2 Relational Reasoning with Attention

10.2.1 Attention Models

10.2.2 Relational Reasoning

10.2.3 Self-Attention Models

10.3 Implementing Self-Attention for MNIST

10.3.1 Transformed MNIST

10.3.2 The Relational Module

10.3.3 Tensor Contractions and Einstein Notation

10.3.4 Training the Relational Module

10.4 Multi-Head Attention and Relational DQN

10.5 Double Q-learning

10.6 Training and Attention Visualization

10.7 Summary

11 In Conclusion: A Review and Roadmap

11.1 What did we learn?

11.2 The Uncharted Topics in Deep Reinforcement Learning

11.2.1 Prioritized Experience Replay

11.2.2 Proximal Policy Optimization (PPO)

11.2.3 Hierarchical Reinforcement Learning and the Options Framework

11.2.4 Model-Based Planning

11.2.5 Monte Carlo Tree Search (MCTS)

11.3 The End


Appendix A: Mathematics, Deep Learning, PyTorch

A.1 Mathematics of Deep Learning

A.1.1 Linear Algebra

A.1.2 Calculus

A.1.3 Deep Learning

A.1.4 PyTorch

About the Technology

Deep reinforcement learning is a form of machine learning in which AI agents learn optimal behavior from their own raw sensory input. The system perceives the environment, interprets the results of its past decisions, and uses this information to optimize its behavior for maximum long-term return. Deep reinforcement learning famously contributed to the success of AlphaGo, but that’s not all it can do! More exciting applications await discovery. Let’s get started.
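The "maximum long-term return" an agent optimizes is conventionally formalized as a discounted sum of rewards, where a discount factor gamma makes immediate rewards worth more than distant ones. A minimal sketch (the function name is illustrative, not from the book):

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted long-term return: G = r_0 + gamma*r_1 + gamma^2*r_2 + ...

    Computed backwards so each step is a single multiply-add.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```

Value functions and Q-functions, introduced in chapters 2 and 3, are essentially learned predictions of this quantity.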

About the book

Deep Reinforcement Learning in Action teaches you how to program agents that learn and improve based on direct feedback from their environment. You’ll build networks with the popular PyTorch deep learning framework to explore reinforcement learning algorithms ranging from Deep Q-Networks to Policy Gradient methods to Evolutionary Algorithms. As you go, you’ll apply what you know to hands-on projects like controlling simulated robots, automating stock market trades, and even building a bot to play Go.

What's inside

  • Structuring problems as Markov Decision Processes
  • Popular algorithms such as Deep Q-Networks, Policy Gradient methods, and Evolutionary Algorithms, and the intuitions that drive them
  • Applying reinforcement learning algorithms to real-world problems

About the reader

You’ll need intermediate Python skills and a basic understanding of deep learning.

About the author

Alexander Zai is a Machine Learning Engineer at Amazon AI working on MXNet, which powers a suite of AWS machine learning products. Brandon Brown is a Machine Learning and Data Analysis blogger at outlace.com, committed to providing clear explanations of difficult topics for newcomers.

Manning Early Access Program (MEAP) Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.
MEAP combo $49.99 pBook + eBook + liveBook
MEAP eBook $39.99 pdf + ePub + kindle + liveBook

