Reinforcement Learning in Motion
Phil Tabor
  • Course duration: 3h 3m
    Estimated full duration: 6h
    12 exercises
  • MEAP began July 2018
  • Publication in October 2018 (estimated)
We all learn by interacting with the world around us, constantly experimenting and interpreting the results. Reinforcement learning is a machine learning technique that follows this same explore-and-learn approach. Ideally suited to improve applications like automatic controls, simulations, and other adaptive systems, an RL algorithm takes in data from its environment and improves its accuracy based on the positive and negative outcomes of these interactions. This liveVideo course will get you started!
Table of Contents

Introduction to reinforcement learning

Course introduction

Getting acquainted with machine learning

How reinforcement learning fits in

Required software

Key concepts

Understanding the agent

Defining the environment

Designing the reward

How the agent learns

Choosing actions

Coding the environment

Finishing the maze-running robot problem

Beating the casino: the explore-exploit dilemma

Introducing the multi-armed bandit problem

Action-value methods

Coding the multi-armed bandit test bed

Moving the goal posts: Nonstationary problems

Optimistic initial values and upper confidence bound action selection

Wrapping up the explore-exploit dilemma

Skating the frozen lake: Markov decision processes

Introducing Markov decision processes and the frozen lake environment

Even robots have goals

Handling uncertainty with policies and value functions

Achieving mastery: Optimal policies and value functions

Skating off the frozen lake

Navigating Gridworld with dynamic programming

Crash-landing on planet Gridworld

Let’s make a plan: Policy evaluation in Gridworld

The best laid plans: Policy improvement in the Gridworld

Hastening our escape with policy iteration

Creating a backup plan with value iteration

Wrapping up dynamic programming

Navigating the windy Gridworld with Monte Carlo methods

The windy Gridworld problem

Monte who?

No substitute for action: Policy evaluation with Monte Carlo methods

Monte Carlo control and exploring starts

Monte Carlo control without exploring starts

Off-policy Monte Carlo methods

Return to the frozen lake and wrapping up Monte Carlo methods

Balancing the cart pole: temporal difference learning

The cart pole problem

TD(0) prediction

On-policy TD control: SARSA

Off-policy TD control: Q learning

Back to school with double learning

Wrapping up temporal difference learning

Climbing the mountain with approximation methods

The continuous mountain car problem

Why approximation methods?

Stochastic gradient descent: The intuition

Stochastic gradient descent: The mathematics

Approximate Monte Carlo predictions

Linear methods and tiling

TD(0) semi-gradient prediction

Episodic semi-gradient control: SARSA

Over the hill: wrapping up approximation methods and the mountain car problem

Summary

Course recap

The frontiers of reinforcement learning

What to do next

About the subject

With reinforcement learning, an AI agent learns from its environment, constantly responding to the feedback it gets. The agent optimizes its behavior to avoid negative consequences and enhance positive outcomes. The resulting algorithms are always looking for the most positive and efficient outcomes!
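That act-observe-adjust loop can be sketched in a few lines of Python. The two-armed bandit below is an invented toy (the payout probabilities, epsilon value, and step count are illustrative, not taken from the course), but the incremental sample-average update is the core idea behind the action-value methods covered in the explore-exploit chapter.

```python
import random

# Hypothetical two-armed bandit: each arm pays out 1 with a fixed
# probability that is unknown to the agent. Numbers are invented.
PAYOUT = [0.3, 0.7]

Q = [0.0, 0.0]   # the agent's running estimate of each arm's value
N = [0, 0]       # how many times each arm has been pulled
EPSILON = 0.1    # fraction of the time the agent explores at random

random.seed(0)
for step in range(5000):
    # Explore occasionally; otherwise exploit the best-looking arm.
    if random.random() < EPSILON:
        a = random.randrange(2)
    else:
        a = max(range(2), key=lambda i: Q[i])
    reward = 1.0 if random.random() < PAYOUT[a] else 0.0
    # Incremental sample-average update: Q <- Q + (reward - Q) / N
    N[a] += 1
    Q[a] += (reward - Q[a]) / N[a]

print(Q)  # estimates drift toward the true payout probabilities
```

Notice that the agent never sees `PAYOUT` directly; its estimates come entirely from the rewards it receives, which is exactly the feedback-driven learning described above.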

Importantly, with reinforcement learning you don’t need a mountain of data to get started. You just let your AI agent poke and prod its environment, which makes it much easier to take on novel research projects without well-defined training datasets.

About the video

Reinforcement Learning in Motion introduces you to the exciting world of machine systems that learn from their environments! Developer, data scientist, and expert instructor Phil Tabor guides you from the basics all the way to programming your own constantly learning AI agents. In this course, he’ll break down key concepts like how RL systems learn, how to sense and process environmental data, and how to build and train AI agents. As you learn, you’ll master the core algorithms and get to grips with tools like OpenAI Gym, NumPy, and Matplotlib.

Reinforcement learning systems learn by doing, and so will you in this interactive, hands-on course! You’ll build and train a variety of algorithms as you go, each with a specific purpose in mind. The rich and interesting examples include simulations that train a robot to escape a maze, help a mountain car get up a steep hill, and balance a pole on a sliding cart. You’ll even teach your agents how to navigate Windy Gridworld, a standard exercise in finding the optimal path even when the environment pushes the agent off course!

Prerequisites

You’ll need to be familiar with Python and machine learning basics. Examples use Python libraries like NumPy and Matplotlib. You’ll also need some understanding of linear algebra and calculus; see the equations in the Free Downloads section for examples.

What you will learn

  • What is a reinforcement learning agent?
  • An introduction to OpenAI Gym
  • Identifying appropriate algorithms
  • Implementing RL algorithms using NumPy
  • Visualizing performance with Matplotlib
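To give a flavor of what implementing an RL algorithm with NumPy looks like, here is a minimal tabular Q-learning sketch. The five-state chain environment, rewards, and hyperparameters are all invented for illustration, not drawn from the course; the optimistic initial Q-values (a trick the explore-exploit chapter covers) keep the agent exploring without any extra machinery.

```python
import numpy as np

# Hypothetical chain environment: states 0..4, actions 0 (left) / 1 (right).
# Reaching state 4 pays reward 1 and ends the episode; every other step pays 0.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

rng = np.random.default_rng(42)
Q = np.ones((N_STATES, N_ACTIONS))   # optimistic initial values drive exploration
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection over the current Q-table.
        a = rng.integers(N_ACTIONS) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap from the best action in the next state
        # (no bootstrap from a terminal state).
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
        s = s2

print(np.argmax(Q[:-1], axis=1))  # greedy policy: should point right in every state
```

The whole agent is just a NumPy array and an update rule; plotting `Q` or episode lengths with Matplotlib is a natural next step and is how the course visualizes learning progress.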

About the instructor

Phil Tabor is a lifelong coder with a passion for simplifying and teaching complex topics. A physics PhD and former Intel process engineer, he works as a data scientist, teaches machine learning on YouTube, and contributes to Sensenet, an open source project using deep reinforcement learning to teach robots to identify objects by touch.

Manning Early Access Program (MEAP): Watch raw videos as they are added, and get the entire course, complete with transcript and exercises, when it is finished.