Practical Recommender Systems
Kim Falk
  • MEAP began July 2015
  • Publication in Summer 2017 (estimated)
  • ISBN 9781617292705
  • 375 pages (estimated)
  • printed in black & white

Practical Recommender Systems goes behind the curtain to show you how recommender systems work and, more importantly, how to create and apply them for your site. After you've covered the basics of how recommender systems work, you'll discover how to collect user data and produce personalized recommendations. Next, you'll learn how and where to use the most popular recommendation algorithms and see examples of them in action on sites like Amazon and Netflix. Finally, this hands-on guide covers scaling problems and other issues you may encounter as your site grows.

Table of Contents

1. Introduction

1.1. Real-life recommender systems

1.1.1. Recommender systems are at home on the internet

1.1.2. The Netflix recommender system

1.1.3. Recommender system definition

1.2. Taxonomy of recommender systems

1.2.1. Domain

1.2.2. Purpose

1.2.3. Context

1.2.4. Personalization level

1.2.5. Whose opinions

1.2.6. Privacy and trustworthiness

1.2.7. Interface

1.2.8. Algorithms

1.3. Machine learning and the Netflix Prize

1.4. The MovieGEEKs website

1.4.1. Design and specification

1.4.2. Architecture

1.5. Summary

Part 1: Introduction to Recommender Systems

2. User behavior and how to collect it

2.1. How (I think) Netflix gathers evidence while you browse

2.1.1. The evidence Netflix collects

2.2. Finding useful user behavior

2.2.1. Capturing visitor impressions

2.2.2. What you can learn from a browser

2.2.3. The act of buying

2.2.4. Consuming products

Visitor ratings

2.2.5. Getting to know your customers the Netflix way

2.2.6. Identifying users

2.3. Getting visitor data from other sources

2.4. The collector

2.4.1. Building the project files

2.4.2. The snitch: client-side evidence collector

2.5. Integrating the collector into MovieGEEKs

2.6. What is a user in the system, and how to model them

2.7. Summary

3. Analytics primer and implementing a dashboard

3.1. Why adding a dashboard is a good idea

3.1.1. Answering "How are we doing?"

3.2. Doing the analytics

3.2.1. Web analytics

3.2.2. Basic statistics

3.2.3. Conversions

3.2.4. Analyzing the path up to conversion

3.2.5. Conversion path

3.3. The MovieGEEKs dashboard

3.3.1. Specification and design of the analytics dashboard

3.3.2. Analytics dashboard wireframe

3.3.3. Architecture

3.4. Summary and what's to come

4. On ratings and how to calculate them

4.1. User-item preferences

4.1.1. Definition of ratings

4.1.2. User-item matrix

4.2. What data can be trusted?

4.2.1. How we use trusted sources for recommendations

4.3. Revisiting explicit ratings

4.4. What are implicit ratings?

4.4.1. People suggestions

4.4.2. Considerations when calculating ratings

4.5. Calculating implicit ratings

4.5.1. Looking at the behavioral data

4.5.2. This could be considered a machine-learning problem

4.6. How to implement the implicit rating calculations

4.6.1. Adding the time aspect

4.7. Summary

5. Non-personalized recommendations

5.1. What is a non-personalized recommendation?

5.1.1. What is a recommendation, and what is a commercial?

5.1.2. What is a non-personalized recommendation?

5.2. How to make recommendations when you don't have any data

5.3. Top 10: a chart of items

5.4. Implementing the chart and, in the process, the groundwork for the recommender system component

5.4.1. The recommender system component

5.4.2. Code from GitHub

5.4.3. A recommender system

5.4.4. Adding the chart to MovieGEEKs

5.5. Seeded recommendations

5.5.1. Top 10 items bought by the same users as the item you are viewing

5.5.2. Association rules

5.5.3. Implementing association rules

5.5.4. Saving the association rules in the database

5.5.5. Using different events to create the association rules

5.6. Summary

6. The user (and content) who came in from the cold

6.1. What is a cold start?

6.1.1. Cold product

6.1.2. A cold visitor

6.1.3. Gray sheep

6.1.4. So what can we do about cold starts?

6.2. Keeping track of visitors

6.2.1. Persisting anonymous users

6.3. Three ways to address the cold-start problem with algorithms

6.3.1. Using association rules to create recommendations for cold users

6.3.2. Using domain knowledge and business rules

6.3.3. Using segments

6.3.4. A possible way to get around the gray sheep problem and introduce cold products

6.4. He who does not ask will not know

6.4.1. When the visitor is no longer new

6.5. Implementing first-time visitor greetings with association rules

6.5.1. Finding the collected items

6.5.2. Retrieving association rules and ordering them by confidence

6.5.3. Displaying the recommendations

6.5.4. Implementation evaluation

6.6. Summary

7. Finding similarities between users and between content

7.1. Why do we need to talk about similarity?

7.1.1. What is a similarity function?

7.2. Essential similarity functions

7.2.1. Jaccard distance

7.2.2. Lp-norms

7.2.3. Cosine similarity

7.2.4. Pearson similarity

7.2.5. Test-running Pearson similarity

7.2.6. Pearson is really similar to cosine

7.3. K-means clustering

7.3.1. The k-means clustering algorithm

7.3.2. Translating k-means clustering into Python

7.4. Implementing similarities

7.4.1. Implementing similarity on the MovieGEEKs site

7.4.2. Implementing clustering on the MovieGEEKs site

7.5. Summary

8. Collaborative Filtering in the Neighborhood

8.1. What is collaborative filtering?

8.1.1. When information became collaboratively filtered

8.1.2. Helping each other

8.1.3. The rating matrix

8.1.4. The collaborative filtering pipeline

8.1.5. User-user collaborative filtering

8.1.6. Data requirements

8.2. Calculating recommendations

8.3. Calculating the similarities

8.4. Amazon's algorithm to pre-calculate item similarity

8.5. Ways to select the neighborhood

8.6. Finding the right neighborhood

8.7. Ways to calculate predicted ratings

8.8. Prediction with item-based filtering

8.8.1. Compute item predictions

8.9. Cold-start problems

8.10. A few words on machine-learning terms

8.11. Collaborative filtering on the MovieGEEKs site

8.11.1. Item-based filtering

8.12. What is the difference between association-rule recommendations and collaborative filtering recommendations?

8.13. Summary

9. Content-based Filtering

9.1. Introduction

9.2. Descriptive example

9.3. Content-based filtering

9.4. Content Analyzer

9.4.1. Feature extraction for the item profile

9.4.2. Categorical data with small numbers

9.4.3. Converting the year to a comparable feature

9.5. Extracting metadata from descriptions

9.5.1. Preparing descriptions

9.5.2. The professional Netflix watchers

9.6. Finding important words with Term Frequency-Inverse Document Frequency (TF-IDF)

9.7. Topic modeling using Latent Dirichlet Allocation (LDA)

9.7.1. Which parameters can be tuned to tweak LDA?

9.8. Finding similar content

9.9. Creating the user profile

9.10. Content-based recommendations in MovieGEEKs

9.10.1. Loading data

9.10.2. Training the model

9.10.3. Creating item profiles

9.10.4. Creating user profiles

9.10.5. Showing recommendations

9.11. Pros and cons of content-based filtering

9.12. Summary

10. Finding hidden genres with Matrix Factorization

10.1. Introduction

10.2. Sometimes it's good to reduce the size of the data

10.3. Example of what we want to solve

10.4. Linear algebra

10.4.1. Matrix

10.4.2. What is factorization?

10.5. Constructing the factorization using SVD

10.5.1. Adding a new user by folding in

10.5.2. How to make recommendations with SVD

10.5.3. Baseline predictors

10.5.4. Problems with SVD

10.6. Constructing the factorization using FunkSVD

10.6.1. Root Mean Squared Error

10.6.2. Gradient Descent

10.6.3. Stochastic Gradient Descent

10.6.4. And finally, the factorization

10.6.5. Adding biases

10.6.6. When to stop

10.7. Making recommendations with FunkSVD

10.8. FunkSVD implementation in MovieGEEKs

10.8.1. Keeping the model up to date

10.9. Summary

11. Taking the best of all algorithms: implementing hybrid recommenders

11.1. The confused world of hybrids

11.2. Monolithic

11.2.1. Mixing content-based features with behavioral data to improve collaborative filtering recommenders

11.3. Mixed hybrid recommender

11.4. Ensemble recommenders

11.4.1. Switched ensemble recommender

11.4.2. Weighted ensemble recommender

11.5. Feature-Weighted Linear Stacking

11.5.1. Meta-features: weights as functions

11.5.2. The algorithm

11.6. Implementation

11.7. Summary

12. Ranking and Learning to Rank

12.1. Introduction

12.2. Learning-to-rank example at Foursquare

12.3. Re-ranking

12.4. What is Learning to Rank?

12.4.1. The three types of learning-to-rank algorithms

12.4.2. Ways to gauge the quality of a ranking

12.5. How to teach the ranking algorithm

12.6. Bayesian Personalized Ranking

12.6.1. BPR

12.6.2. Math magic (advanced section)

12.6.3. The BPR algorithm

12.6.4. Bayesian Personalized Ranking with Matrix Factorization

12.7. Implementation

12.8. Summary

13. Evaluating and testing your recommender

13.1. Business wants lift, cross-sales, up-sales, and conversions

13.2. Why is it important to evaluate?

13.3. What to measure

13.3.1. Understanding my taste: minimizing prediction error

13.3.2. Diversity

13.3.3. Coverage

13.3.4. Serendipity

13.4. Before even starting the offline evaluation

13.4.1. Verifying the algorithm

13.4.2. Regression testing

13.5. Offline evaluation

13.6. Types of evaluation

13.7. Offline experiments

13.7.1. Performing the experiment

13.7.2. Implementing the experiment

13.8. Controlled experiments

13.8.1. Family and friends

13.9. A/B testing

13.10. Continuous testing with exploit/explore

13.11. Summary

14. The future of recommender systems

About the Technology

Recommender systems are everywhere, helping you find everything from movies to jobs, restaurants to hospitals, even romance. Using behavioral and demographic data, these systems make predictions about what users will be most interested in at a particular time, resulting in high-quality, ordered, personalized suggestions. Recommender systems are practically a necessity for keeping your site content current, useful, and interesting to your visitors.

What's inside

  • Practical introduction to recommender system algorithms
  • Collaborative and content-based filtering
  • Creating individual recommendations from visitor data
  • Real-world examples of recommender systems

About the reader

This book assumes you're comfortable reading code in Python and have some experience with databases.

About the author

Kim Falk is a data scientist at Adform, where he works on recommender systems. He has experience providing recommendations for large entertainment companies and working with big data solutions.

Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it's ready, and receive the pBook long before it's in bookstores.
  • MEAP combo $49.99 pBook + eBook
  • MEAP eBook $39.99 PDF + ePub + Kindle

FREE domestic shipping on three or more pBooks