Algorithms of the Intelligent Web, Second Edition
Douglas G. McIlwraith, Haralambos Marmanis, and Dmitry Babenko
Foreword by Yike Guo
  • August 2016
  • ISBN 9781617292583
  • 240 pages
  • printed in black & white

Concise descriptions of algorithms with their mathematical foundation and sample code in Python.

From the Foreword by Yike Guo, Data Science Institute, Imperial College London

GET MORE WITH MANNING

An eBook copy of the previous edition, Algorithms of the Intelligent Web (First Edition), is included at no additional cost. It will be automatically added to your Manning Bookshelf within 24 hours of purchase.


Algorithms of the Intelligent Web, Second Edition teaches the most important approaches to algorithmic web data analysis, enabling you to create your own machine learning applications that crunch, munge, and wrangle data collected from users, web applications, sensors and website logs.

Table of Contents

1. Building applications for the intelligent web

1.1. An intelligent algorithm in action: Google Now

1.2. The intelligent algorithm lifecycle

1.3. Further examples of intelligent algorithms

1.4. Things that intelligent applications are not

1.4.1. Intelligent algorithms are not all-purpose thinking machines

1.4.2. Intelligent algorithms are not a drop-in replacement for humans

1.4.3. Intelligent algorithms are not discovered by accident

1.5. Classes of intelligent algorithm

1.5.1. Artificial intelligence

1.5.2. Machine learning

1.5.3. Predictive analytics

1.6. Evaluating the performance of intelligent algorithms

1.6.1. Evaluating intelligence

1.6.2. Evaluating predictions

1.7. Important notes about intelligent algorithms

1.7.1. Your data is not reliable

1.7.2. Inference does not happen instantaneously

1.7.3. Size matters!

1.7.4. Different algorithms have different scaling characteristics

1.7.5. Everything is not a nail!

1.7.6. Data isn’t everything

1.7.7. Training time can be variable

1.7.8. Generalization is the goal

1.7.9. Human intuition is problematic

1.7.10. Think about engineering new features

1.7.11. Learn many different models

1.7.12. Correlation is not the same as causation

1.8. Summary

2. Extracting structure from data: clustering and transforming your data

2.1. Data, structure, bias, and noise

2.2. The curse of dimensionality

2.3. The k-means algorithm

2.3.1. k-means in action

2.4. The Gaussian mixture model

2.4.1. What is the Gaussian distribution?

2.4.2. Expectation maximization and the Gaussian distribution

2.4.3. The Gaussian mixture model

2.4.4. An example of learning using a Gaussian mixture model

2.5. The relationship between k-means and GMM

2.6. Transforming the data axis

2.6.1. Eigenvectors and eigenvalues

2.6.2. Principal component analysis

2.6.3. An example of principal component analysis

2.7. Summary

3. Recommending relevant content

3.1. Setting the scene: an online movie store

3.2. Distance and similarity

3.2.1. A closer look at distance and similarity

3.2.2. Which is the best similarity formula?

3.3. How do recommender engines work?

3.4. User-based collaborative filtering

3.5. Model-based recommendation using singular value decomposition

3.5.1. Singular value decomposition

3.5.2. Recommendation using SVD; choosing movies for a given user

3.5.3. Recommendation using SVD; choosing users for a given movie

3.6. The Netflix Prize

3.7. Evaluating your recommender

3.8. Summary

4. Classification: placing things where they belong

4.1. The need for classification

4.2. An overview of classifiers

4.2.1. Structural classification algorithms

4.2.2. Statistical classification algorithms

4.2.3. The lifecycle of a classifier

4.3. Fraud Detection with Logistic Regression

4.3.1. A linear regression primer

4.3.2. From linear to logistic regression

4.3.3. Implementing fraud detection

4.4. Are your results credible?

4.5. Classification with very large datasets

4.6. Summary

5. Case study: click prediction for online advertising

5.1. History and background

5.2. The exchange

5.2.2. Bid

5.2.3. Bid win (or loss) notification

5.2.4. Ad placement

5.2.5. Ad monitoring

5.3. What is a bidder?

5.3.1. Requirements of a bidder

5.4. What is a decisioning engine?

5.4.1. Information about the user

5.4.2. Information about the placement

5.4.3. Contextual information

5.4.4. Data preparation

5.4.5. Decisioning engine model

5.4.6. Mapping predicted click-through rate to bid price

5.4.7. Feature engineering

5.4.8. Model training

5.5. Click prediction with Vowpal Wabbit

5.5.1. Vowpal Wabbit data format

5.5.2. Preparing our dataset

5.5.3. Testing your model

5.5.4. Model calibration

5.6. Complexities of building a decisioning engine

5.7. The future of real-time prediction

5.8. Conclusions

6. Deep learning and neural networks

6.1. An intuitive approach to deep learning

6.2. Neural networks

6.3. The perceptron

6.3.1. Training

6.3.2. Training a perceptron in scikit-learn

6.3.3. A geometric interpretation of the perceptron for two inputs

6.4. Multilayer perceptrons

6.4.1. Training using backpropagation

6.4.2. Activation functions

6.4.3. Intuition behind backpropagation

6.4.4. Backpropagation theory

6.4.5. MLNN in scikit-learn

6.4.6. A learned MLP

6.5. Going deeper: from multilayer neural networks to deep learning

6.5.1. Restricted Boltzmann Machines

6.5.2. The Bernoulli Restricted Boltzmann Machine

6.5.3. RBMs in action

6.6. Summary

7. Making the right choice

7.1. A/B Testing

7.1.1. The Theory

7.1.2. The Code

7.1.3. Suitability of A/B

7.2. Multi-Armed bandits

7.2.1. Multi-armed bandit strategies

7.3. Bayesian bandits in the wild

7.4. A/B vs. the Bayesian Bandit

7.5. Extensions to multi-armed bandits

7.5.1. Contextual bandits

7.5.2. Adversarial bandits

7.6. Summary

8. The future of the intelligent web

8.1. Future applications of the intelligent web

8.1.1. The Internet of Things

8.1.2. Home healthcare

8.1.3. The self-driving vehicle

8.1.4. Personalized physical advertising

8.1.5. The semantic web

8.2. Social implications of the intelligent web

Appendixes

Appendix A: Capturing data on the web

A.1. A motivating example: showing adverts online

A.1.1. Data available for online advertising

A.2. Data collection: a naive approach

A.3. Managing Data Collection At Scale

A.4. Introducing Kafka

A.4.1. Replication in Kafka

A.4.2. Consumer Groups, balancing and ordering

A.4.3. Putting it all together

A.5. Evaluating Kafka: Data Collection at Scale

A.6. Kafka Design Patterns

A.6.1. Kafka + Storm

A.6.2. Kafka + Hadoop

A.7. Summary

About the Technology

Valuable insights are buried in the tracks web users leave as they navigate pages and applications. You can uncover them by using intelligent algorithms like the ones that have earned Facebook, Google, and Twitter a place among the giants of web data pattern extraction.

About the book

Algorithms of the Intelligent Web, Second Edition teaches you how to create machine learning applications that crunch and wrangle data collected from users, web applications, and website logs. In this totally revised edition, you’ll look at intelligent algorithms that extract real value from data. Key machine learning concepts are explained with code examples in Python’s scikit-learn. This book guides you through algorithms to capture, store, and structure data streams coming from the web. You’ll explore recommendation engines and dive into classification via statistical algorithms, neural networks, and deep learning.

What's inside

  • Introduction to machine learning
  • Extracting structure from data
  • Deep learning and neural networks
  • How recommendation engines work

About the reader

Knowledge of Python is assumed.

About the authors

Douglas McIlwraith is a machine learning expert and data science practitioner in the field of online advertising. Dr. Haralambos Marmanis is a pioneer in the adoption of machine learning techniques for industrial solutions. Dmitry Babenko designs applications for banking, insurance, and supply-chain management.


Buy
combo $44.99 pBook + eBook + liveBook
eBook $35.99 pdf + ePub + kindle + liveBook

This second edition brings fresh life to an all-time classic.

Marius Butuc, Shopify

Covers the most essential areas of machine learning application in the real world. A great hands-on approach.

Radha Ranjan Madhav, Amazon

Great balance between theory and practice.

Dike E. Kalu, Fara Frica