Concise descriptions of algorithms with their mathematical foundation and sample code in Python.

*Algorithms of the Intelligent Web, Second Edition* teaches the most important approaches to algorithmic web data analysis, enabling you to create your own machine learning applications that crunch, munge, and wrangle data collected from users, web applications, sensors, and website logs.

## 1. Building applications for the intelligent web

### 1.1. An intelligent algorithm in action: Google Now

### 1.2. The intelligent algorithm lifecycle

### 1.3. Further examples of intelligent algorithms

### 1.4. Things that intelligent applications are not

#### 1.4.1. Intelligent algorithms are not all-purpose thinking machines

#### 1.4.2. Intelligent algorithms are not a drop-in replacement for humans

#### 1.4.3. Intelligent algorithms are not discovered by accident

### 1.5. Classes of intelligent algorithm

#### 1.5.1. Artificial intelligence

#### 1.5.2. Machine learning

#### 1.5.3. Predictive analytics

### 1.6. Evaluating the performance of intelligent algorithms

#### 1.6.1. Evaluating intelligence

#### 1.6.2. Evaluating predictions

### 1.7. Important notes about intelligent algorithms

#### 1.7.1. Your data is not reliable

#### 1.7.2. Inference does not happen instantaneously

#### 1.7.3. Size matters!

#### 1.7.4. Different algorithms have different scaling characteristics

#### 1.7.5. Everything is not a nail!

#### 1.7.6. Data isn’t everything

#### 1.7.7. Training time can be variable

#### 1.7.8. Generalization is the goal

#### 1.7.9. Human intuition is problematic

#### 1.7.10. Think about engineering new features

#### 1.7.11. Learn many different models

#### 1.7.12. Correlation is not the same as causation

### 1.8. Summary

## 2. Extracting structure from data: clustering and transforming your data

### 2.1. Data, structure, bias, and noise

### 2.2. The curse of dimensionality

### 2.3. The k-means algorithm

#### 2.3.1. k-means in action

### 2.4. The Gaussian mixture model

#### 2.4.1. What is the Gaussian distribution?

#### 2.4.2. Expectation maximization and the Gaussian distribution

#### 2.4.3. The Gaussian mixture model

#### 2.4.4. An example of learning using a Gaussian mixture model

### 2.5. The relationship between k-means and GMM

### 2.6. Transforming the data axis

#### 2.6.1. Eigenvectors and eigenvalues

#### 2.6.2. Principal component analysis

#### 2.6.3. An example of principal component analysis

### 2.7. Summary

## 3. Recommending relevant content

### 3.1. Setting the scene: an online movie store

### 3.2. Distance and similarity

#### 3.2.1. A closer look at distance and similarity

#### 3.2.2. Which is the best similarity formula?

### 3.3. How do recommender engines work?

### 3.4. User-based collaborative filtering

### 3.5. Model-based recommendation using singular value decomposition

#### 3.5.1. Singular value decomposition

#### 3.5.2. Recommendation using SVD; choosing movies for a given user

#### 3.5.3. Recommendation using SVD; choosing users for a given movie

### 3.6. The Netflix Prize

### 3.7. Evaluating your recommender

### 3.8. Summary

## 4. Classification: placing things where they belong

### 4.1. The need for classification

### 4.2. An overview of classifiers

#### 4.2.1. Structural classification algorithms

#### 4.2.2. Statistical classification algorithms

#### 4.2.3. The lifecycle of a classifier

### 4.3. Fraud detection with logistic regression

#### 4.3.1. A linear regression primer

#### 4.3.2. From linear to logistic regression

#### 4.3.3. Implementing fraud detection

### 4.4. Are your results credible?

### 4.5. Classification with very large datasets

### 4.6. Summary

## 5. Case study: click prediction for online advertising

### 5.1. History and background

### 5.2. The exchange

#### 5.2.1. Cookie matching

#### 5.2.2. Bid

#### 5.2.3. Bid win (or loss) notification

#### 5.2.4. Ad placement

#### 5.2.5. Ad monitoring

### 5.3. What is a bidder?

#### 5.3.1. Requirements of a bidder

### 5.4. What is a decisioning engine?

#### 5.4.1. Information about the user

#### 5.4.2. Information about the placement

#### 5.4.3. Contextual information

#### 5.4.4. Data preparation

#### 5.4.5. Decisioning engine model

#### 5.4.6. Mapping predicted click-through rate to bid price

#### 5.4.7. Feature engineering

#### 5.4.8. Model training

### 5.5. Click prediction with Vowpal Wabbit

#### 5.5.1. Vowpal Wabbit data format

#### 5.5.2. Preparing our dataset

#### 5.5.3. Testing your model

#### 5.5.4. Model calibration

### 5.6. Complexities of building a decisioning engine

### 5.7. The future of real-time prediction

### 5.8. Conclusions

## 6. Deep learning and neural networks

### 6.1. An intuitive approach to deep learning

### 6.2. Neural networks

### 6.3. The perceptron

#### 6.3.1. Training

#### 6.3.2. Training a perceptron in scikit-learn

#### 6.3.3. A geometric interpretation of the perceptron for two inputs

### 6.4. Multilayer perceptrons

#### 6.4.1. Training using backpropagation

#### 6.4.2. Activation functions

#### 6.4.3. Intuition behind backpropagation

#### 6.4.4. Backpropagation theory

#### 6.4.5. MLNN in scikit-learn

#### 6.4.6. A learned MLP

### 6.5. Going deeper: from multilayer neural networks to deep learning

#### 6.5.1. Restricted Boltzmann Machines

#### 6.5.2. The Bernoulli Restricted Boltzmann Machine

#### 6.5.3. RBMs in action

### 6.6. Summary

## 7. Making the right choice

### 7.1. A/B testing

#### 7.1.1. The theory

#### 7.1.2. The code

#### 7.1.3. Suitability of A/B

### 7.2. Multi-armed bandits

#### 7.2.1. Multi-armed bandit strategies

### 7.3. Bayesian bandits in the wild

### 7.4. A/B vs. the Bayesian bandit

### 7.5. Extensions to multi-armed bandits

#### 7.5.1. Contextual bandits

#### 7.5.2. Adversarial bandits

### 7.6. Summary

## 8. The future of the intelligent web

### 8.1. Future applications of the intelligent web

#### 8.1.1. The Internet of Things

#### 8.1.2. Home healthcare

#### 8.1.3. The self-driving vehicle

#### 8.1.4. Personalized physical advertising

#### 8.1.5. The semantic web

### 8.2. Social implications of the intelligent web

# Appendixes

## Appendix A: Capturing data on the web

### A.1. A motivating example: showing adverts online

#### A.1.1. Data available for online advertising

### A.2. Data collection: a naive approach

### A.3. Managing data collection at scale

### A.4. Introducing Kafka

#### A.4.1. Replication in Kafka

#### A.4.2. Consumer groups, balancing, and ordering

#### A.4.3. Putting it all together

### A.5. Evaluating Kafka: data collection at scale

### A.6. Kafka design patterns

#### A.6.1. Kafka + Storm

#### A.6.2. Kafka + Hadoop

### A.7. Summary

## About the technology

Valuable insights are buried in the tracks web users leave as they navigate pages and applications. You can uncover them by using intelligent algorithms like the ones that have earned Facebook, Google, and Twitter a place among the giants of web data pattern extraction.

## About the book

*Algorithms of the Intelligent Web, Second Edition* teaches you how to create machine learning applications that crunch and wrangle data collected from users, web applications, and website logs. In this totally revised edition, you’ll look at intelligent algorithms that extract real value from data. Key machine learning concepts are explained with code examples in Python’s scikit-learn. This book guides you through algorithms to capture, store, and structure data streams coming from the web. You’ll explore recommendation engines and dive into classification via statistical algorithms, neural networks, and deep learning.

## What's inside

- Introduction to machine learning
- Extracting structure from data
- Deep learning and neural networks
- How recommendation engines work

## About the authors

**Douglas McIlwraith** is a machine learning expert and data science practitioner in the field of online advertising. **Dr. Haralambos Marmanis** is a pioneer in the adoption of machine learning techniques for industrial solutions. **Dmitry Babenko** designs applications for banking, insurance, and supply-chain management.