Table of contents

1 Intuition of AI

1.1 What is Artificial Intelligence (AI)?

1.1.1 Defining AI

1.1.2 Data is the fuel for AI algorithms

1.1.3 Algorithms are like recipes

1.1.4 Algorithms vs. models

1.2 The evolution of AI

1.3 Different types of problems

1.3.1 Search problems: Find a path to a solution

1.3.2 Optimization problems: Find a good solution

1.3.3 Prediction and classification problems: Learn from patterns in data

1.3.4 Clustering problems: Identify patterns in data

1.3.5 Deterministic models: Same result each time it’s calculated

1.3.6 Probabilistic models: Potentially different result each time it’s calculated

1.4 Intuition of AI concepts

1.4.1 Narrow intelligence: Specific-purpose solutions

1.4.2 General intelligence: Humanlike solutions

1.4.3 Super intelligence: The great unknown

1.4.4 Old AI and new AI

1.4.5 Search algorithms

1.4.6 Biology-inspired algorithms

1.4.7 Machine learning algorithms

1.4.8 Deep learning algorithms

1.4.9 Generative models

1.5 Some uses for AI algorithms

1.5.1 Agriculture: Optimizing plant growth

1.5.2 Banking: Preventing fraudulent transactions

1.5.3 Cybersecurity: Safeguarding email inboxes

1.5.4 Health care: Diagnosing patients

1.5.5 Logistics: Finding the best delivery route

1.5.6 Fitness and Health: Optimizing your body

1.5.7 Games: Adapting in complexity

1.6 Summary of Intuition of AI

2 Search fundamentals

2.1 What are planning and searching?

2.2 Cost of computation: The reason for smart algorithms

2.3 Problems applicable to searching algorithms

2.4 Representing state: Creating a framework to represent problem spaces and solutions

2.4.1 Graphs: Representing search problems and solutions

2.4.2 Representing a graph as a concrete data structure

2.4.3 Trees: The concrete structures used to represent search solutions

2.5 Uninformed search: Looking blindly for solutions

2.6 Breadth-first search: Looking wide before looking deep

2.7 Depth-first search: Looking deep before looking wide

2.8 Use cases for uninformed search algorithms

2.9 Optional: More about graph categories

2.10 Optional: More ways to represent graphs

2.10.1 Incidence matrix

2.10.2 Adjacency list

2.11 Summary of search fundamentals

3 Intelligent search

3.1 Defining heuristics: Designing educated guesses

3.2 Informed search: Looking for solutions with guidance

3.2.1 A* search

3.2.2 Use cases for informed search algorithms

3.3 Adversarial search: Looking for solutions in a changing environment

3.3.1 A simple adversarial problem

3.3.2 Min-max search: Simulate actions and choose the best future

3.3.3 Alpha-beta pruning: Optimize by exploring the sensible paths only

3.3.4 Use cases for adversarial search algorithms

3.4 Summary of Intelligent search

4 Evolutionary algorithms

4.1 What is evolution?

4.2 Problems applicable to evolutionary algorithms

4.3 Genetic algorithm: Life cycle

4.4 Encoding the solution spaces

4.4.1 Binary encoding: Representing possible solutions with zeros and ones

4.5 Creating a population of solutions

4.6 Measuring fitness of individuals in a population

4.7 Selecting parents based on their fitness

4.7.1 Steady state: Replacing a portion of the population each generation

4.7.2 Generational: Replacing the entire population each generation

4.7.3 Roulette wheel: Selecting parents and surviving individuals

4.8 Reproducing individuals from parents

4.8.1 Single-point crossover: Inheriting one part from each parent

4.8.2 Two-point crossover: Inheriting more parts from each parent

4.8.3 Uniform crossover: Inheriting many parts from each parent

4.8.4 Bit-string mutation for binary encoding

4.8.5 Flip-bit mutation for binary encoding

4.9 Populating the next generation

4.9.1 Exploration vs. exploitation

4.9.2 Stopping conditions

4.10 Configuring the parameters of a genetic algorithm

4.11 Use cases for evolutionary algorithms

4.12 Summary of evolutionary algorithms

5 Advanced evolutionary approaches

5.1 Evolutionary algorithm life cycle

5.2 Alternative selection strategies

5.2.1 Rank selection: Even the playing field

5.2.2 Tournament selection: Let them fight

5.2.3 Elitism selection: Choose only the best

5.3 Real-value encoding: Working with real numbers

5.3.1 Real-value encoding at its core

5.3.2 Arithmetic crossover: Reproduce with math

5.3.3 Boundary mutation

5.3.4 Arithmetic mutation

5.4 Order encoding: Working with sequences

5.4.1 Importance of the fitness function

5.4.2 Order encoding at its core

5.4.3 Order mutation: Order / permutation encoding

5.5 Tree encoding: Working with hierarchies

5.5.1 Tree encoding at its core

5.5.2 Tree crossover: Inheriting portions of a tree

5.5.3 Change node mutation: Changing the value of a node

5.6 Common types of evolutionary algorithms

5.6.1 Genetic programming

5.6.2 Evolutionary programming

5.7 Glossary of evolutionary algorithm terms

5.8 More use cases for evolutionary algorithms

5.9 Summary of advanced evolutionary approaches

6 Swarm intelligence: Ants

6.1 What is swarm intelligence?

6.2 Problems applicable to ant colony optimization

6.3 Representing state: What do paths and ants look like?

6.4 The ant colony optimization algorithm life cycle

6.4.1 Initialize the pheromone trails

6.4.2 Set up the population of ants

6.4.3 Choose the next visit for each ant

6.4.4 Update the pheromone trails

6.4.5 Update the best solution

6.4.6 Determine the stopping criteria

6.5 Use cases for ant colony optimization algorithms

6.6 Summary of ant colony optimization

7 Swarm intelligence: Particles

7.1 What is particle swarm optimization?

7.2 Optimization problems: A slightly more technical perspective

7.3 Problems applicable to particle swarm optimization

7.4 Representing state: What do particles look like?

7.5 Particle swarm optimization life cycle

7.5.1 Initialize the population of particles

7.5.2 Calculate the fitness of each particle

7.5.3 Update the position of each particle

7.5.4 Determine the stopping criteria

7.6 Use cases for particle swarm optimization algorithms

7.7 Summary of particle swarm optimization

8 Machine learning

8.1 What is machine learning?

8.2 Problems applicable to machine learning

8.2.1 Supervised learning

8.2.2 Unsupervised learning

8.2.3 Reinforcement learning

8.3 A machine learning workflow

8.3.1 Collecting and understanding data: Know your context

8.3.2 Preparing data: Clean and wrangle

8.3.3 Training a model: Predict with linear regression

8.3.4 Testing the model: Determine the accuracy of the model

8.3.5 Improving accuracy

8.4 Classification with decision trees

8.4.1 Classification problems: Either this or that

8.4.2 The basics of decision trees

8.4.3 Training decision trees

8.4.4 Classifying examples with decision trees

8.5 Other popular machine learning algorithms

8.6 Use cases for machine learning algorithms

8.7 Summary of machine learning

9 Artificial neural networks

9.1 What are artificial neural networks?

9.2 The Perceptron: A representation of a neuron

9.3 Defining artificial neural networks

9.4 Forward propagation: Using a trained ANN

9.5 Backpropagation: Training an ANN

9.5.1 Phase A: Setup

9.5.2 Phase B: Forward propagation

9.5.3 Phase C: Training

9.6 Options for activation functions

9.7 Designing artificial neural networks

9.7.1 Inputs and outputs

9.7.2 Hidden layers and nodes

9.7.3 Weights

9.7.4 Bias

9.7.5 Activation functions

9.7.6 Cost function

9.7.7 Learning rate

9.8 Expressing ANNs mathematically

9.8.1 The weighted sum as a dot product

9.8.2 The hidden layer as matrix multiplication

9.8.3 Adding the activation function

9.8.4 The output layer

9.8.5 The final neural network equation

9.8.6 The cost function

9.8.7 Expressing backpropagation mathematically

9.9 Artificial neural network types and use cases

9.9.1 Recurrent neural network

9.9.2 Convolutional neural network

9.9.3 Generative adversarial network

9.10 Summary of artificial neural networks

10 Reinforcement learning

10.1 What is reinforcement learning?

10.1.1 The inspiration for reinforcement learning

10.2 Problems applicable to reinforcement learning

10.3 The life cycle of reinforcement learning

10.3.1 Simulation and data: Make the environment come alive

10.3.2 Training with the simulation using Q-learning

10.3.3 Testing with the simulation and Q-table

10.3.4 Measuring the performance of training

10.3.5 Model-free and model-based learning

10.4 Deep learning approaches to reinforcement learning

10.4.1 Training with an artificial neural network

10.5 Use cases for reinforcement learning

10.5.1 Robotics

10.5.2 Recommendation engines

10.5.3 Financial trading

10.5.4 Game playing

10.6 Summary of reinforcement learning

11 Large Language Models (LLMs)

11.1 What are large language models?

11.2 The intuition behind language prediction

11.2.1 Why the size of tokens and parameters matter

11.2.2 An LLM training workflow

11.3 Preparing training data

11.3.1 Selecting and collecting data

11.3.2 Cleaning and preprocessing data

11.4 Encoding: From text to numbers

11.4.1 Tokenization

11.4.2 Vectorization

11.5 Designing the ANN architecture (and why transformers)

11.6 Encoding: Creating trainable embeddings

11.6.1 Sampling a batch of tokens

11.6.2 Creating a trainable embedding matrix

11.6.3 Creating positional encodings

11.6.4 Combining the embedding matrix and positional encodings

11.7 Self-attention: Start training the LLM

11.7.1 Linear weight matrix projections

11.7.2 Ask every other token

11.7.3 Calculating attention weights

11.7.4 Weighted sum

11.7.5 Multiple attention heads

11.7.6 Layer normalization

11.8 Decoding: Meaning through neural networks

11.8.1 Project up layer

11.8.2 Project down layer

11.8.3 Layer normalization

11.8.4 Stacking Transformer blocks

11.8.5 Making a prediction

11.8.6 Backpropagation and calculating loss

11.9 Controlling the LLM

11.9.1 Training epochs

11.9.2 Saving checkpoints

11.9.3 Stopping mechanisms

11.9.4 Hyperparameter tuning

11.9.5 Few-shot and zero-shot learning

11.10 Refining LLMs with Reinforcement Learning

11.11 LLMs and Mixture of Experts (MoE)

11.12 LLMs and Retrieval-Augmented Generation (RAG)

11.13 Use cases for large language models

11.13.1 Content generation

11.13.2 Information synthesis

11.13.3 Coding assistants

11.13.4 Enhancing digital products

11.14 Summary of Large Language Models

12 Generative Image Models

12.1 What are generative image models?

12.2 The intuition behind image generation

12.2.1 A generative image model training workflow

12.3 Preparing image training data

12.3.1 Selecting and collecting image data

12.3.2 Cleaning and preprocessing image data

12.4 Embedding: From images to numbers

12.5 Designing the architecture (and why U-Nets)

12.5.1 Convolutional Neural Networks (CNNs)

12.5.2 The U-Net (A specialized CNN)

12.6 Denoising: From numbers to an image

12.6.1 Encoder: Down-sampling layers

12.6.2 Bridge (also known as the bottleneck)

12.6.3 Decoder: Up-sampling layers

12.7 Learning: Calculating loss and backpropagation

12.7.1 Calculating loss

12.7.2 Backpropagation

12.8 Generating an image

12.8.1 Starting with a blank canvas (of pure noise)

12.8.2 Denoising the data

12.9 Controlling the diffusion model

12.9.1 Training data composition and diversity

12.9.2 Timesteps and noise schedule

12.9.3 Attention layers and cross-attention injection

12.9.4 Training epochs

12.10 Inpainting and Outpainting

12.11 LoRA (Low-Rank Adaptation)

12.12 High-Resolution Fix and Upscalers

12.13 ControlNets and IP-Adapters

12.14 Refining Aesthetics with Human Feedback

12.15 Use cases for image generation

12.15.1 Creative ideation and concept art

12.15.2 Commercial design and advertising

12.15.3 Content creation and media

12.15.4 Personalization and Photo Editing

12.16 Summary of Generative Image Models

Overview

10 Reinforcement learning

Reinforcement learning is presented as a trial-and-error approach to decision making inspired by behavioral psychology, in which an agent interacts with an environment and learns from rewards and penalties to maximize long-term cumulative return. Unlike supervised learning, it does not rely on labeled datasets, and unlike unsupervised learning, it is not purely about discovering patterns; instead, it learns sequences of actions that achieve a known goal, balancing short-term gains against long-term outcomes. Time, the order of actions, and feedback loops are central, and the overall process is framed with the Markov Decision Process (MDP) to quantify states, actions, transitions, and rewards.

The chapter grounds these ideas in a simulated parking-lot scenario, defining clear states, actions, rewards, and terminal conditions, and shows how design choices in the simulator shape what the agent can learn. It develops Q-learning as a model-free method that stores estimated action values in a Q-table and improves them over many episodes using an exploration-exploitation strategy, a learning rate, a discount factor, and a Bellman-style update, so that value propagates backward from the goal. Practical considerations include crafting reward functions that avoid perverse behaviors, choosing state representations that generalize beyond a single map, measuring progress through penalties or average reward, and understanding the trade-offs between model-free and model-based approaches.

To scale beyond tabular methods, the chapter introduces deep reinforcement learning, in which neural networks approximate Q-values directly from state inputs and are trained with temporal-difference targets and backpropagation. It compares compact scalar encodings with more expressive one-hot encodings of the local surroundings, outlines sensible network architectures with ReLU activations, and highlights techniques such as randomized starts and the gradual shift from exploration to exploitation. The discussion closes with impactful use cases (robotics, recommendation systems, financial trading, and game playing), emphasizing the need for realistic simulators and carefully designed reward signals so that agents can learn robust, long-horizon strategies in complex, dynamic environments.

How reinforcement learning fits into machine learning

Categorization of machine learning, deep learning, and reinforcement learning

Example of reinforcement learning: teaching a dog to sit by using food as a reward

An example of possible actions that have long-term consequences

The self-driving car in a parking lot problem

The Markov Decision Process for reinforcement learning

Agent actions in the parking-lot environment

Rewards for specific events in the environment caused by the actions performed

A bad solution to the parking-lot problem

A good solution to the parking-lot problem

An example Q-table and states that it represents

A better example of a Q-table and states that it represents

Life cycle of a Q-learning reinforcement learning algorithm

An example initialized Q-table

Example Q-table update calculation for state 1

Example Q-table update calculation for state 2

Example Q-table update calculation for state 1 after several iterations

Referencing a Q-table to determine what action to take

Examples of model-based and model-free reinforcement learning

The difference between using a Q-table and ANN for the parking-lot problem

Example 8-input ANN for the parking-lot problem

Example 32-input ANN for the parking-lot problem

Summary of reinforcement learning

FAQ

What is reinforcement learning, and how does it differ from supervised and unsupervised learning?

Reinforcement learning (RL) is inspired by behavioral psychology. Instead of learning from labeled examples (supervised) or discovering structure in unlabeled data (unsupervised), an RL agent learns by interacting with an environment, taking actions, and receiving rewards or penalties. The goal is to learn a policy that maximizes cumulative (long-term) reward.

When is reinforcement learning a good fit for a problem?

RL is best when you know the goal and the allowed actions, but not the best sequence of actions to achieve the goal. It excels in sequential decision-making where actions compound over time, such as strategic planning, robotics, and industrial control, and where cumulative reward matters more than individual step rewards.

What are the key terms: agent, environment, state, action, and reward?

- Agent: The decision-maker (e.g., the car).
- Environment: The world the agent interacts with (e.g., the parking lot).
- State: The situation or observation at a point in time (e.g., agent’s position and nearby cells).
- Action: A choice the agent makes (e.g., move north/south/east/west).
- Reward: Feedback after an action (positive for good outcomes, negative for bad ones).

How do short-term rewards vs. long-term rewards factor into RL?

RL aims to maximize long-term reward, not just immediate gains. The discount factor (gamma) controls how much the agent values future rewards. A low gamma favors instant gratification; a high gamma encourages planning ahead by valuing future returns.
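To make the effect of gamma concrete, the sketch below computes the discounted return of one episode, weighting the reward k steps ahead by gamma to the power k. The reward sequence (three step penalties, then a parking reward) is a made-up illustration, not data from the chapter.

```python
def discounted_return(rewards, gamma):
    """Sum an episode's rewards, weighting the reward k steps ahead by gamma**k."""
    total = 0.0
    # Walk backward so each step folds in the discounted future: G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: small penalties for each move, then a large reward for parking.
episode_rewards = [-1, -1, -1, 10]
short_sighted = discounted_return(episode_rewards, gamma=0.1)
far_sighted = discounted_return(episode_rewards, gamma=0.9)
```

With gamma = 0.1 the final reward barely registers and the episode is worth about -1.1; with gamma = 0.9 the same episode is worth about 4.58, so a far-sighted agent sees the detour as worthwhile.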

What is the Markov Decision Process (MDP) in the context of RL?

An MDP formalizes decision-making where outcomes are partly random and partly under the agent’s control. It provides the framework for the RL loop: observe state, choose action, receive reward, transition to next state, and repeat over episodes until a terminal condition is reached.
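A toy MDP can be written down as a table of transitions, where each action in a state leads to possible next states with given probabilities and rewards. The states, actions, probabilities, and rewards below are invented for illustration (they are not the chapter's parking lot); the loop follows the observe, act, reward, transition cycle until a terminal state or a step limit is reached.

```python
import random

# Hypothetical MDP: TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "start": {
        "drive": [(0.8, "near_spot", -1), (0.2, "start", -1)],
        "wait": [(1.0, "start", -1)],
    },
    "near_spot": {
        "park": [(0.9, "parked", 10), (0.1, "near_spot", -5)],
        "drive": [(1.0, "start", -1)],
    },
    "parked": {},  # terminal state: no actions available
}


def step(state, action, rng):
    """Sample the next state and reward: the action is chosen, the outcome is partly chance."""
    outcomes = TRANSITIONS[state][action]
    roll = rng.random()
    cumulative = 0.0
    for probability, next_state, reward in outcomes:
        cumulative += probability
        if roll < cumulative:
            return next_state, reward
    # Guard against floating-point round-off in the probabilities.
    return outcomes[-1][1], outcomes[-1][2]


def run_episode(policy, rng, max_steps=50):
    """Observe state, choose action, receive reward, transition; repeat until terminal."""
    state, total_reward, steps = "start", 0, 0
    while TRANSITIONS[state] and steps < max_steps:
        action = policy(state)
        state, reward = step(state, action, rng)
        total_reward += reward
        steps += 1
    return state, total_reward, steps


# A fixed, hand-written policy (state -> action), used only to drive the loop.
policy = {"start": "drive", "near_spot": "park"}.get
final_state, total, steps = run_episode(policy, random.Random(0))
```

An RL algorithm's job is to discover a good policy like the one hard-coded here, using only the rewards that come back from the loop.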

How does Q-learning work, and what is a Q-table?

Q-learning is a model-free RL method that learns a Q-table mapping state-action pairs to expected long-term value (Q-values). During training, after each action, the Q-value is updated using a Bellman-style update that blends the immediate reward with the best estimated future value. Key hyperparameters are learning rate (how fast values change), discount factor (future vs. immediate rewards), and exploration rate (chance of taking a random action).
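The core of tabular Q-learning fits in two small functions: an epsilon-greedy action choice and the update Q(s, a) = Q(s, a) + alpha * (reward + gamma * max Q(s', a') - Q(s, a)). This is a minimal sketch; the action names and the use of a `defaultdict` (so unseen state-action pairs start at 0) are assumptions for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ["north", "south", "east", "west"]  # assumed action names


def make_q_table():
    """Q-table mapping (state, action) to an estimated long-term value; unseen pairs read as 0."""
    return defaultdict(float)


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit the best Q-value."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda action: q_table[(state, action)])


def q_update(q_table, state, action, reward, next_state, alpha, gamma, done):
    """Move Q(s, a) toward reward + gamma * max_a' Q(s', a'); terminal states have no future."""
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    old_value = q_table[(state, action)]
    q_table[(state, action)] = old_value + alpha * (target - old_value)
    return q_table[(state, action)]
```

For example, if moving east from state 2 is already valued at 10, then taking east from state 1 for a reward of -1 (alpha = 0.1, gamma = 0.9) lifts Q(1, east) from 0 to 0.8. Repeating this over many episodes is how value propagates backward from the goal.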

What does the simulator need to provide for RL training?

A useful simulator should: reset to a start state; report the current state; apply an action; compute the reward and next state; and signal when the goal or episode end is reached. Careful reward design is critical to avoid unintended behaviors (e.g., looping moves to farm rewards).
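A minimal simulator offering those operations might look like the class below. The grid symbols, the class name `ParkingLotSim`, and the reward values are all hypothetical choices for this sketch. Note the design choice in the rewards: every ordinary move costs a little and blocked moves cost more, so the agent cannot profit from looping.

```python
# Hypothetical grid symbols: "." open road, "#" obstacle, "P" the goal parking spot.
STEP_PENALTY = -1        # assumed cost of each ordinary move
COLLISION_PENALTY = -10  # assumed cost of hitting an obstacle or the edge of the lot
GOAL_REWARD = 20         # assumed reward for parking
MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


class ParkingLotSim:
    def __init__(self, grid, start):
        self.grid = [list(row) for row in grid]
        self.start = start
        self.position = start

    def reset(self):
        """Return the agent to the start state and report it."""
        self.position = self.start
        return self.get_state()

    def get_state(self):
        """Report the current state (here, simply the agent's grid position)."""
        return self.position

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        d_row, d_col = MOVES[action]
        row, col = self.position[0] + d_row, self.position[1] + d_col
        off_grid = not (0 <= row < len(self.grid) and 0 <= col < len(self.grid[0]))
        if off_grid or self.grid[row][col] == "#":
            # Blocked move: penalize and stay put.
            return self.get_state(), COLLISION_PENALTY, False
        self.position = (row, col)
        if self.grid[row][col] == "P":
            return self.get_state(), GOAL_REWARD, True  # goal reached: episode ends
        return self.get_state(), STEP_PENALTY, False


sim = ParkingLotSim(["..#", "...", "#.P"], start=(0, 0))
```

Starting at (0, 0), the moves east, south, south, east reach the spot at (2, 2) and end the episode with a reward of 20.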

How should state be represented, and why does it matter?

State design affects generalization. Using the full map can overfit to one layout. A more general approach is to encode local observations (e.g., the 8 surrounding cells), which enables the agent to learn short-term good moves that transfer across layouts. However, this can greatly increase the number of possible states, requiring more training.
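A local observation of the 8 surrounding cells can be encoded as a tuple, which also makes it usable as a Q-table key. The numeric cell codes, the neighbor ordering, and treating off-map cells as an "edge" type are assumptions made for this sketch.

```python
# Assumed cell codes: 0 = empty, 1 = obstacle, 2 = goal, 3 = edge of the map.
EMPTY, OBSTACLE, GOAL, EDGE = 0, 1, 2, 3
SYMBOLS = {".": EMPTY, "#": OBSTACLE, "P": GOAL}

# Offsets for the 8 neighbors, clockwise starting from the north-west cell.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def local_observation(grid, row, col):
    """Encode the 8 cells around (row, col); anything off the map counts as EDGE."""
    observation = []
    for d_row, d_col in NEIGHBOR_OFFSETS:
        r, c = row + d_row, col + d_col
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
            observation.append(SYMBOLS[grid[r][c]])
        else:
            observation.append(EDGE)
    return tuple(observation)  # hashable, so it can key a Q-table


def count_possible_states(cell_types=4, neighbors=8):
    """Upper bound on distinct local observations: cell_types ** neighbors."""
    return cell_types ** neighbors


lot = ["..#", "...", "#.P"]
```

The same observation can occur in many different lots, which is what lets learned moves transfer. The cost shows up in the count: with 4 cell types and 8 neighbors there are up to 4^8 = 65,536 possible observations, far more than the 9 positions in this tiny lot.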

How do we evaluate RL performance?

Evaluation is task-specific, but common metrics include: average reward per action (higher is better), counts of penalties (fewer is better), and success rate across episodes. An episode spans from the initial state to either reaching the goal or a stopping condition.

When should we move from Q-tables to deep reinforcement learning?

Q-tables don’t generalize to unseen states and scale poorly with large or continuous state spaces. Deep RL replaces the table with a neural network that approximates Q-values (or policies), enabling generalization. For the parking-lot example, inputs can be encoded as compact scalars or one-hot vectors; the network outputs one value per action and is trained via backpropagation using targets derived from rewards and discounted future estimates.
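The pieces of deep Q-learning can be sketched without any library: a one-hot input encoding, a small network with one ReLU hidden layer and one linear output per action, a temporal-difference target, and a backpropagation step on the chosen action's output. The layer sizes, the assumption of 4 cell types over 8 neighbors (giving 32 inputs), the weight initialization, and the learning rate are illustrative choices, not the chapter's exact network.

```python
import random

NUM_CELL_TYPES = 4   # assumed codes: 0 empty, 1 obstacle, 2 goal, 3 edge
NUM_NEIGHBORS = 8
NUM_ACTIONS = 4      # north, south, east, west


def one_hot_observation(observation):
    """Turn 8 cell codes (0..3) into a 32-element one-hot input vector."""
    vector = [0.0] * (NUM_NEIGHBORS * NUM_CELL_TYPES)
    for i, cell in enumerate(observation):
        vector[i * NUM_CELL_TYPES + cell] = 1.0
    return vector


class QNetwork:
    """Inputs -> one ReLU hidden layer -> one linear Q-value per action."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_inputs)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_outputs)] for _ in range(n_hidden)]
        self.b2 = [0.0] * n_outputs

    def forward(self, x):
        # Hidden pre-activations: weighted sum of inputs plus bias, then ReLU.
        self.z1 = [sum(x[i] * self.w1[i][j] for i in range(len(x))) + self.b1[j]
                   for j in range(len(self.b1))]
        self.h = [max(0.0, z) for z in self.z1]
        # Linear output layer: one estimated Q-value per action.
        return [sum(self.h[j] * self.w2[j][k] for j in range(len(self.h))) + self.b2[k]
                for k in range(len(self.b2))]

    def train_step(self, x, action, target, learning_rate=0.01):
        """One backpropagation step on 0.5 * (Q(x, action) - target) ** 2."""
        q_values = self.forward(x)
        error = q_values[action] - target
        for j in range(len(self.h)):
            # Gradient into hidden unit j via its (old) output weight, gated by the ReLU.
            grad_z1 = error * self.w2[j][action] * (1.0 if self.z1[j] > 0 else 0.0)
            self.w2[j][action] -= learning_rate * error * self.h[j]
            self.b1[j] -= learning_rate * grad_z1
            for i in range(len(x)):
                self.w1[i][j] -= learning_rate * grad_z1 * x[i]
        self.b2[action] -= learning_rate * error
        return 0.5 * error ** 2


def td_target(reward, next_q_values, gamma, done):
    """Temporal-difference target: reward plus the discounted best next Q-value."""
    return reward if done else reward + gamma * max(next_q_values)
```

In use, the network's own estimate for the next observation supplies the target: for a move that earned -1, `td_target(-1, net.forward(next_x), 0.9, done=False)` gives the value that `net.train_step(x, action, target)` nudges Q(x, action) toward. Unlike a Q-table, an observation never seen during training still gets a Q-value estimate from the shared weights.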
