Grokking Deep Reinforcement Learning
Miguel Morales
  • MEAP began May 2018
  • Publication in Spring 2020 (estimated)
  • ISBN 9781617295454
  • 450 pages (estimated)
  • printed in black & white

The must-have book for anyone who wants a profound understanding of deep reinforcement learning.

Julien Pohie

We all learn through trial and error. We avoid the things that cause us to experience pain and failure. We embrace and build on the things that give us reward and success. This common pattern is the foundation of deep reinforcement learning: building machine learning systems that explore and learn based on the responses of the environment.

Grokking Deep Reinforcement Learning introduces this powerful machine learning approach, using examples, illustrations, exercises, and crystal-clear teaching. You'll love the perfectly paced teaching and the clever, engaging writing style as you dig into this awesome exploration of reinforcement learning fundamentals, effective deep learning techniques, and practical applications in this emerging field.

Table of Contents

1 Introduction to deep reinforcement learning

1.1 What is deep reinforcement learning?

1.1.1 Deep reinforcement learning is a machine learning approach to artificial intelligence

1.1.2 Deep reinforcement learning is concerned with creating computer programs

1.1.3 Deep reinforcement learning agents can solve problems that require intelligence

1.1.4 Deep reinforcement learning agents improve their behavior through trial-and-error learning

1.1.5 Deep reinforcement learning agents learn from sequential feedback

1.1.6 Deep reinforcement learning agents learn from evaluative feedback

1.1.7 Deep reinforcement learning agents learn from sampled feedback

1.1.8 Deep reinforcement learning agents utilize powerful non-linear function approximators

1.2 The past, present, and future of deep reinforcement learning

1.2.1 Recent history of artificial intelligence and deep reinforcement learning

1.2.2 Artificial intelligence winters

1.2.3 The current state of artificial intelligence

1.2.4 Progress in deep reinforcement learning

1.2.5 Opportunities ahead

1.3 The suitability of deep reinforcement learning

1.3.1 What are the pros and cons?

1.3.2 Deep reinforcement learning’s strengths

1.3.3 Deep reinforcement learning’s weaknesses

1.4 Setting clear two-way expectations

1.4.1 What to expect from the book?

1.4.2 How to get the most out of the book?

1.4.3 Deep reinforcement learning development environment

1.5 Summary

2 Mathematical foundations of reinforcement learning

2.1 Components of reinforcement learning

2.1.1 Examples of problems, agents, and environments

2.1.2 The agent

2.1.3 The environment

2.1.4 Agent-environment interactions

2.2 Components of the environment

2.2.1 Environment states

2.2.2 Available actions

2.2.3 Consequences of agent actions

2.2.4 Reinforcement signal

2.2.5 Time

2.2.6 Extensions to MDPs

2.2.7 Putting it all together

2.3 Summary

3 Balancing immediate and long-term goals

3.1 Objective of a decision-making agent

3.1.1 Policies of action

3.1.2 Value of state

3.1.3 Value of taking an action

3.1.4 Advantage of taking an action

3.1.5 Optimality

3.1.6 Evaluating policies of action

3.1.7 Improving policies of behavior

3.1.8 Improving upon improved policies

3.1.9 Improving policies early

3.2 Summary

4 Balancing the gathering and utilization of information

4.1 The challenge of interpreting evaluative feedback

4.1.1 Single state decision problem

4.1.2 Maximizing reward while minimizing regret

4.1.3 Approaches to solving MAB environments

4.1.4 Be greedy and always exploit

4.1.5 Learn forever and avoid the real world

4.1.6 Almost always pick the action with the highest value

4.1.7 First maximize exploration, then maximize exploitation

4.1.8 Start off believing it’s a wonderful world

4.2 Strategic exploration

4.2.1 Select actions randomly in proportion to their estimates

4.2.2 It’s not about just optimism; it’s about realistic optimism

4.2.3 Balancing reward and risk

4.3 Summary

5 Estimating the value of agents' behaviors

5.1 Learning to estimate policies

5.1.1 Learning to predict using complete episodes

5.1.2 First- and every-visit Monte-Carlo prediction

5.1.3 Learning to predict on every time step

5.2 Learn more often and more accurately

5.2.1 Learning to predict from any time step

5.2.2 Learning to predict from all time steps

5.2.3 Learning to predict from all time steps on every time step

5.3 Summary

6 Improving agents’ behaviors

7 Achieving goals more effectively and efficiently

8 Introduction to value-based deep reinforcement learning

8.1 The kind of feedback a deep reinforcement learning agent deals with

8.1.1 Deep reinforcement learning deals with sequential feedback

8.1.2 But, if it is not sequential, what is it?

8.2 Deep reinforcement learning deals with evaluative feedback

8.2.1 But, if it is not evaluative, what is it?

8.2.2 Deep reinforcement learning deals with sampled feedback

8.2.3 But, if it is not sampled, what is it?

8.2.4 Deep reinforcement learning deals with the most challenging sides of all dimensions

8.3 Introduction to value-function approximation

8.3.1 What’s a high-dimensional state space?

8.3.2 How about continuous state space?

8.3.3 But, why to use a function approximator?

8.4 NFQ: A first attempt to value-based deep reinforcement learning

8.4.1 First decision point: Selecting a value function to approximate

8.4.2 Second decision point: Selecting a neural network architecture

8.4.3 Third decision point: Selecting what to optimize

8.4.4 Fourth decision point: Targets for policy evaluation

8.4.5 Fifth decision point: Balancing exploration and exploitation

8.4.6 Sixth decision point: Selecting a loss function

8.4.7 Seventh decision point: Selecting an optimization method to minimize the loss function

8.4.8 Regrets: Things that could (and do) go wrong

8.5 Summary

9 More stable value-based methods

9.1 DQN: Making reinforcement learning more like supervised learning

9.1.1 Common problems in value-based deep reinforcement learning

9.1.2 Using a target network

9.1.3 Use larger networks

9.1.4 Experience Replay

9.1.5 Using other exploration strategies

9.2 Double DQN: Mitigating the overestimation of approximate action-value functions

9.2.1 The problem of overestimation

9.2.2 Separating action selection and action evaluation

9.2.3 A solution

9.2.4 A more practical solution

9.2.5 A more forgiving loss function

9.2.6 Things we can still improve on

9.3 Summary

10 Sample-efficient value-based methods

10.1 Dueling DDQN: A reinforcement-learning-aware neural network architecture

10.1.1 Reinforcement learning is not a supervised learning problem

10.1.2 Value-based deep reinforcement learning methods nuances

10.1.3 Advantage of using advantages

10.1.4 A reinforcement-learning-aware architecture

10.1.5 Building a dueling network

10.1.6 Reconstructing the action-value function

10.1.7 Continuously updating the target network

10.1.8 What does the dueling network bring to the table?

10.2 PER: Prioritizing the replay of important experiences

10.2.1 A smarter way to replay experiences

10.2.2 Then, what is a good measure of "important" experiences?

10.2.3 Sampling by TD error

10.2.4 Prioritizing errors stochastically

10.2.5 Proportional prioritization

10.2.6 Rank-based prioritization

10.2.7 Prioritization bias

10.3 Summary

11 Policy-based and actor-critic methods

12 Towards artificial general intelligence

About the Technology

Deep reinforcement learning is a form of machine learning in which AI agents learn optimal behavior on their own from raw sensory input. The system perceives the environment, interprets the results of its past decisions, and uses this information to optimize its behavior for maximum long-term return. It has been said that deep reinforcement learning, which is the use of deep learning and reinforcement learning techniques to solve decision-making problems, is the solution to the full artificial intelligence problem.
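
To make the perceive, act, and learn cycle concrete, here is a minimal sketch using tabular Q-learning rather than a deep network. It is not taken from the book: the five-state corridor environment, the function name `run_q_learning`, and all hyperparameter values are assumptions chosen only for illustration.

```python
import random


def run_q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                   epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor environment.

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Entering the last state ends the episode and pays a reward of 1.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore with probability epsilon, otherwise act greedily
            # (ties are broken in favor of moving right).
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # The environment responds with a next state and a reward.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Move the estimate toward the reward plus the discounted
            # value of the best action available in the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state

    return q
```

Calling `run_q_learning()` returns one pair of action-value estimates per state. The discount factor `gamma` is what ties each decision to long-term return: a state further from the goal earns a smaller estimate for moving right, yet still a larger one than for moving left.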

Deep reinforcement learning famously contributed to the success of AlphaGo and its successors (AlphaGo Zero, AlphaZero, etc.), which recently beat the world’s best human player in the world’s most difficult board game. But that is not the only thing you can do with deep reinforcement learning. These are some of the most notable applications:

  • Learn to play Atari games just by looking at the raw image.
  • Learn to trade and manage portfolios effectively.
  • Learn low-level control policies for a variety of real-world models.
  • Discover tactics and collaborative behavior for improved campaign performance.

From low-level control to high-level tactical actions, deep reinforcement learning can solve large, complex decision-making problems.

But deep reinforcement learning is an emerging approach, so the best ideas are still yours to discover. We can’t wait to see how you apply deep reinforcement learning to solve some of the most challenging problems in the world.

About the book

Grokking Deep Reinforcement Learning is a beautifully balanced approach to teaching, offering numerous large and small examples, annotated diagrams and code, engaging exercises, and skillfully crafted writing. You'll explore, discover, and learn as you lock in the ins and outs of reinforcement learning, neural networks, and AI agents. You will go from small grid world environments and some of the foundational algorithms to some of the most challenging environments out there today and cutting-edge techniques to solve these environments.

Exciting, fun, and maybe even a little dangerous. Let's get started!

What's inside

  • Foundational reinforcement learning concepts and methods
  • The most popular deep reinforcement learning agents solving high-dimensional environments
  • Cutting-edge agents that emulate human-like behavior and techniques for artificial general intelligence

About the reader

Written for developers with some understanding of deep learning algorithms. Experience with reinforcement learning is not required. Perfect for readers of Deep Learning with Python or Grokking Deep Learning.

About the author

Miguel Morales is a Senior Software Engineer at Lockheed Martin, Missile and Fire Control-Autonomous Systems. He is also a faculty member at Georgia Institute of Technology where he works as an Instructional Associate for the Reinforcement Learning and Decision Making graduate course. Miguel has worked for numerous other educational and technology companies including Udacity, AT&T, Cisco, and HPE.

Manning Early Access Program (MEAP) Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.
MEAP combo $49.99 pBook + eBook + liveBook
MEAP eBook $39.99 pdf + ePub + kindle + liveBook