Deep Reinforcement Learning in Action
Alexander Zai and Brandon Brown
  • March 2020
  • ISBN 9781617295430
  • 384 pages
  • printed in black & white

A thorough introduction to reinforcement learning. Fun to read and highly relevant.

Helmut Hauschild, PharmaTrace

Humans learn best from feedback: we are encouraged to take actions that lead to positive results and deterred from decisions with negative consequences. This reinforcement process can be applied to computer programs, allowing them to solve complex problems that classical programming cannot. Deep Reinforcement Learning in Action teaches you the fundamental concepts and terminology of deep reinforcement learning, along with the practical skills and techniques you’ll need to implement it in your own projects.

About the technology

Deep reinforcement learning AI systems rapidly adapt to new environments, a vast improvement over standard neural networks. A DRL agent learns like people do, taking in raw data such as sensor input and refining its responses and predictions through trial and error.

About the book

Deep Reinforcement Learning in Action teaches you how to program AI agents that adapt and improve based on direct feedback from their environment. In this example-rich tutorial, you’ll master foundational and advanced DRL techniques by taking on interesting challenges like navigating a maze and playing video games. Along the way, you’ll work with core algorithms, including deep Q-networks and policy gradients, along with industry-standard tools like PyTorch and OpenAI Gym.

Table of Contents

Part 1: Foundations

1 What is Reinforcement Learning?

1.1 The “deep” in deep reinforcement learning

1.2 Reinforcement learning

1.3 Dynamic programming versus Monte Carlo

1.4 The reinforcement learning framework

1.5 What can I do with reinforcement learning?

1.6 Why deep reinforcement learning?

1.7 Our didactic tool: String diagrams

1.8 What’s next?


2 Modeling reinforcement learning problems: Markov decision processes

2.1 String diagrams and our teaching methods

2.2 Solving the multi-arm bandit

2.2.1 Exploration and exploitation

2.2.2 Epsilon-greedy strategy

2.2.3 Softmax selection policy

2.3 Applying bandits to optimize ad placements

2.3.1 Contextual bandits

2.3.2 States, actions, rewards

2.4 Building networks with PyTorch

2.4.1 Automatic differentiation

2.4.2 Building Models

2.5 Solving contextual bandits

2.6 The Markov property

2.7 Predicting future rewards: Value and policy functions

2.7.1 Policy functions

2.7.2 Optimal policy

2.7.3 Value functions


3 Predicting the best states and actions: Deep Q-networks

3.1 The Q function

3.2 Navigating with Q-learning

3.2.1 What is Q-learning?

3.2.2 Tackling Gridworld

3.2.3 Hyperparameters

3.2.4 Discount factor

3.2.5 Building the network

3.2.6 Introducing the Gridworld game engine

3.2.7 A neural network as the Q function

3.3 Preventing catastrophic forgetting: Experience replay

3.3.1 Catastrophic forgetting

3.3.2 Experience replay

3.4 Improving stability with a target network

3.4.1 Learning instability

3.5 Review


4 Learning to pick the best policy: Policy gradient methods

4.1 Policy function using neural networks

4.1.1 Neural network as the policy function

4.1.2 Stochastic policy gradient

4.1.3 Exploration

4.2 Reinforcing good actions: The policy gradient algorithm

4.2.1 Defining an objective

4.2.2 Action reinforcement

4.2.3 Log probability

4.2.4 Credit assignment

4.3 Working with OpenAI Gym

4.3.1 CartPole

4.3.2 The OpenAI Gym API

4.4 The REINFORCE algorithm

4.4.1 Creating the policy network

4.4.2 Having the agent interact with the environment

4.4.3 Training the model

4.4.4 The full training loop

4.4.5 Chapter conclusion


5 Tackling more complex problems with actor-critic methods

5.1 Combining the value and policy function

5.2 Distributed training

5.3 Advantage actor-critic

5.4 N-step actor-critic


Part 2: Above and beyond

6 Alternative optimization methods: Evolutionary algorithms

6.1 A Different Approach to Reinforcement Learning

6.2 Reinforcement learning with evolution strategies

6.2.1 Evolution in theory

6.2.2 Evolution in practice

6.3 A genetic algorithm for CartPole

6.4 Pros and Cons of Evolutionary Algorithms

6.4.1 Evolutionary algorithms explore more

6.4.2 Evolutionary algorithms are incredibly sample intensive

6.4.3 Simulators

6.5 Evolutionary algorithms as a scalable alternative

6.5.1 Scaling evolutionary algorithms

6.5.2 Parallel vs. serial processing

6.5.3 Scaling efficiency

6.5.4 Communicating between nodes

6.5.5 Scaling linearly

6.5.6 Scaling gradient-based approaches


7 Distributional DQN: Getting the full story

7.1 What’s wrong with Q-learning?

7.2 Probability and statistics revisited

7.2.1 Priors and posteriors

7.2.2 Expectation and variance

7.3 The Bellman equation

7.3.1 The distributional Bellman equation

7.4 Distributional Q-learning

7.4.1 Representing a probability distribution in Python

7.4.2 Implementing the Dist-DQN

7.5 Comparing probability distributions

7.6 Dist-DQN on simulated data

7.7 Using distributional Q-learning to play Freeway


8 Curiosity-driven exploration

8.1 Tackling sparse rewards with predictive coding

8.2 Inverse dynamics prediction

8.3 Setting up Super Mario Bros.

8.4 Preprocessing and the Q-network

8.5 Setting up the Q-network and policy function

8.6 Intrinsic curiosity module

8.7 Alternative intrinsic reward mechanisms


9 Multi-agent reinforcement learning

9.1 From one to many agents

9.2 Neighborhood Q-learning

9.3 The 1D Ising model

9.4 Mean field Q-learning and the 2D Ising model

9.5 Mixed cooperative-competitive games


10 Interpretable reinforcement learning: Attention and relational models

10.1 Machine learning interpretability with attention and relational biases

10.1.1 Invariance and equivariance

10.2 Relational reasoning with attention

10.2.1 Attention models

10.2.2 Relational reasoning

10.2.3 Self-attention models

10.3 Implementing self-attention for MNIST

10.3.1 Transformed MNIST

10.3.2 The relational module

10.3.3 Tensor contractions and Einstein notation

10.3.4 Training the relational module

10.4 Multi-head attention and relational DQN

10.5 Double Q-learning

10.6 Training and attention visualization

10.6.1 Maximum entropy learning

10.6.2 Curriculum learning

10.6.3 Visualizing attention weights


11 In conclusion: A review and roadmap

11.1 What did we learn?

11.2 The uncharted topics in deep reinforcement learning

11.2.1 Prioritized experience replay

11.2.2 Proximal policy optimization (PPO)

11.2.3 Hierarchical reinforcement learning and the options framework

11.2.4 Model-based planning

11.2.5 Monte Carlo tree search (MCTS)

11.3 The End


Appendix A: Mathematics, deep learning, PyTorch

A.1 Mathematics of Deep Learning

A.1.1 Linear Algebra

A.1.2 Calculus

A.1.3 Deep Learning

A.1.4 PyTorch

What's inside

  • Building and training DRL networks
  • The most popular DRL algorithms for learning and problem solving
  • Evolutionary algorithms for curiosity and multi-agent learning
  • All examples available as Jupyter Notebooks

About the reader

For readers with intermediate skills in Python and deep learning.

About the author

Alexander Zai is a machine learning engineer at Amazon AI. Brandon Brown is a machine learning and data analysis blogger.

Print book: $49.99 (pBook + eBook + liveBook)
Additional shipping charges may apply

eBook: $39.99 (3 formats + liveBook)
