Practical Recommender Systems
Kim Falk
  • MEAP began July 2015
  • Publication in Fall 2017 (estimated)
  • ISBN 9781617292705
  • 375 pages (estimated)
  • Printed in black & white

Practical Recommender Systems goes behind the curtain to show you how recommender systems work and, more importantly, how to create and apply them for your site. After you've covered the basics of how recommender systems work, you'll discover how to collect user data and produce personalized recommendations. Next, you'll learn how and where to use the most popular recommendation algorithms and see examples of them in action on sites like Amazon and Netflix. Finally, this hands-on guide covers scaling problems and other issues you may encounter as your site grows.

Table of Contents

1. What is a recommender?

1.1. Real-life recommendations

1.1.1. Recommender systems are at home on the internet

1.1.2. The Netflix recommender system

1.1.3. Recommender system definition

1.2. Taxonomy of recommender systems

1.2.1. Domain

1.2.2. Purpose

1.2.3. Context

1.2.4. Personalization level

1.2.5. Whose opinions

1.2.6. Privacy and trustworthiness

1.2.7. Interface

1.2.8. Algorithms

1.3. Machine learning and the Netflix Prize

1.4. The MovieGEEKs website

1.4.1. Design and specification

1.4.2. Architecture

1.5. Building a recommender system

1.6. Summary

Part 1: Introduction to Recommender Systems

2. User behavior and how to collect it

2.1. How (I think) Netflix gathers evidence while you browse

2.1.1. The evidence Netflix collects

2.2. Finding useful user behavior

2.2.1. What you can learn from a browser

2.2.2. Act of buying

2.2.3. Consuming products

2.2.4. Visitor ratings

2.2.5. Getting to know your customers the Netflix way

2.3. Identifying users

2.4. Getting visitor data from other sources

2.5. The collector

2.5.1. Build the project files

2.5.2. The Data Model

2.5.3. The snitch - client-side evidence collector

2.6. Integrating the collector into MovieGEEKs

2.7. What is a user in the system, and how to model them

2.8. Summary

3. Analytics primer and implementing a dashboard

3.1. Why adding a dashboard is a good idea

3.1.1. Answering "How are we doing?"

3.2. Doing the analytics

3.2.1. Web analytics

3.2.2. The basic statistics

3.2.3. Conversions

3.2.4. Analyzing the path up to conversion

3.2.5. Conversion path

3.3. MovieGEEKs dashboard

3.3.1. Specification and design of the analytics dashboard

3.3.2. Analytics dashboard wireframe

3.3.3. Architecture

3.4. Summary and what's to come

4. On ratings and how to calculate them

4.1. User-item preferences

4.1.1. Definition of ratings

4.1.2. User-item matrix

4.2. Explicit or implicit ratings

4.2.1. How we use trusted sources for recs

4.3. Revisiting explicit ratings

4.4. What are implicit ratings

4.4.1. People suggestions

4.4.2. Considerations when calculating ratings

4.5. Calculating implicit ratings

4.5.1. Looking at the behavioral data

4.5.2. This could be considered a machine-learning problem

4.6. How to implement the calculation of implicit ratings

4.6.1. Adding the time aspect

4.7. Less frequent items provide more value

4.8. Summary

5. Non-personalized recommendations

5.1. What is a non-personalized recommendation

5.1.1. What is a recommendation and what is a commercial?

5.1.2. What is a non-personalized recommendation

5.2. How to make recommendations when you don’t have any data

5.3. Top 10 - a chart of items

5.4. Implementing the chart and, in the process, the groundwork for the recommender system component

5.4.1. The recommender system component

5.4.2. Code from GitHub

5.4.3. A recommender system

5.4.4. Adding the chart to MovieGEEKs

5.4.5. Making the content look more attractive

5.5. Seeded recommendations

5.5.1. Top 10 items bought by the same users who bought the one you are viewing

5.5.2. Association rules

5.5.3. Implementing association rules

5.5.4. Saving the association rules in the database

5.5.5. Using different events to create the association rules

5.6. Summary

6. The user (and content) who came in from the Cold

6.1. What is a cold start?

6.1.1. Cold product

6.1.2. A cold visitor

6.1.3. Gray sheep

6.1.4. Let’s look at some real-life examples

6.1.5. So, what can we do about cold starts?

6.2. Keeping track of visitors

6.2.1. Persisting anonymous users

6.3. Three ways to address the cold-start problem with algorithms

6.3.1. Using association rules to create recs for cold users

6.3.2. Using domain knowledge and business rules

6.3.3. Using segments

6.3.4. A possible way to get around the gray sheep problem and how to introduce cold products

6.4. He who does not ask will not know

6.5. When the visitor is not new any longer

6.6. Implementing first-time visitor greetings with association rules

6.6.1. Find the collected items

6.6.2. Retrieve the association rules and order them by confidence

6.6.3. Display the recs

6.6.4. Implementation evaluation

6.7. Summary

7. Finding similarities between users and between content

7.1. Why do we need to talk about similarity?

7.1.1. What is a similarity function?

7.2. Essential similarity functions

7.2.1. Jaccard distance

7.2.2. Lp-norms

7.2.3. Cosine similarity

7.2.4. Pearson similarity

7.2.5. Test running the Pearson similarity

7.2.6. Pearson is really similar to cosine

7.3. K-means clustering

7.3.1. k-means clustering algorithm

7.3.2. Translating k-means clustering into Python

7.4. Implementing similarities

7.4.1. Implementing similarity on the MovieGEEKs site

7.4.2. Implementing clustering on the MovieGEEKs site

7.5. Summary

8. Collaborative Filtering in the Neighborhood

8.1. What is collaborative filtering

8.1.1. When information became collaboratively filtered

8.1.2. Helping each other

8.1.3. The rating matrix

8.1.4. The collaborative filtering pipeline

8.1.5. User-user collaborative filtering

8.1.6. Data requirements

8.2. Calculating recommendations

8.3. Calculating the similarities

8.4. Amazon's algorithm to pre-calculate item similarity

8.5. Ways to select the neighborhood

8.6. Finding the right neighborhood

8.7. Ways to calculate predicted ratings

8.8. Prediction with item-based filtering

8.8.1. Compute item predictions

8.9. Cold start problems

8.10. A few words on machine learning terms

8.11. Collaborative filtering on the MovieGEEK site

8.11.1. Item-based filtering

8.12. What is the difference between association rule recs and collaborative recs?

8.13. Summary

9. Evaluating and testing your recommender

9.1. Business wants lift, cross-sales, up-sales, and conversions

9.2. Why is it important to evaluate?

9.3. What to measure

9.3.1. Understanding my taste - minimizing prediction error

9.3.2. Diversity

9.3.3. Coverage

9.3.4. Serendipity

9.4. Even before implementing the recommender

9.4.1. Verify the algorithm

9.4.2. Regression testing

9.5. Types of evaluation

9.6. Offline evaluation

9.7. What to do when the algorithm doesn’t produce any recommendations

9.8. Offline experiments

9.8.1. Performing the experiment

9.9. Implementing the experiment

9.9.1. What we will implement

9.10. Controlled experiments

9.10.1. Family and friends

9.11. A/B testing

9.12. Continuous testing with exploit/explore

9.13. Summary

10. Content-based Filtering

10.1. Introduction

10.2. Descriptive example

10.3. Content-based filtering

10.4. Content Analyzer

10.4.1. Feature extraction for the item profile

10.4.2. Categorical data with small numbers

10.4.3. Converting the year to a comparable feature

10.5. Extracting metadata from descriptions

10.5.1. Preparing descriptions

10.5.2. The professional Netflix watchers

10.6. Finding important words with Term Frequency - Inverse Document Frequency (TF-IDF)

10.7. Topic modeling using Latent Dirichlet Allocation (LDA)

10.7.1. What knobs can we turn to tweak the LDA?

10.8. Finding similar content

10.9. Creating the user profile

10.10. Content-based recommendations in MovieGEEKs

10.10.1. Loading data

10.10.2. Training the model

10.10.3. Creating item profiles

10.10.4. Creating user profiles

10.10.5. Showing recs

10.11. Evaluation of the content-based recommender

10.12. Pros and cons of content-based filtering

10.13. Summary

11. Finding hidden genres with Matrix Factorization

11.1. Introduction

11.2. Sometimes it’s good to reduce the size of the data

11.3. Example of what we want to solve

11.4. A whiff of linear algebra

11.4.1. Matrix

11.4.2. What is factorization?

11.5. Constructing the factorization using SVD

11.5.1. Adding a new user by folding in

11.5.2. How to do recommendations with SVD

11.5.3. Baseline Predictors

11.6. Constructing the factorization using FunkSVD

11.6.1. Root Mean Squared Error

11.6.2. Gradient Descent

11.6.3. Stochastic Gradient Descent

11.6.4. And finally, to the Factorization

11.6.5. Adding Biases

11.6.6. When to stop

11.7. Doing recommendations with FunkSVD

11.8. FunkSVD implementation in MovieGEEKs

11.8.1. Keeping the model up to date

11.8.2. Faster implementation

11.9. Evaluation

11.10. Summary

12. Taking the best of all algorithms - implementing hybrid recommenders

12.1. The confused world of hybrids

12.2. The Monolithic

12.2.1. Mixing content-based features with behavioral data to improve collaborative filtering recommenders

12.3. Mixed Hybrid Recommender

12.4. The Ensemble

12.4.1. Switched ensemble recommender

12.4.2. Weighted ensemble recommender

12.5. Feature-Weighted Linear Stacking

12.6. Meta-features - Weights as functions

12.6.1. The algorithm

12.7. Implementation

12.8. Summary

13. Ranking and Learning to Rank

13.1. Introduction

13.2. Learning to Rank example at Foursquare

13.3. Re-ranking

13.4. What is learning to rank?

13.4.1. The three types of learning to rank algorithms

13.5. Bayesian Personalized Ranking (BPR)

13.5.1. Math magic (advanced section)

13.5.2. The BPR algorithm

13.5.3. Bayesian Personalized Ranking with Matrix Factorization

13.6. Implementation of BPR

13.7. Evaluation

13.8. Summary

14. Future of Recommender Systems

14.1. This book in a few sentences

14.2. So which of the algorithms should you start out implementing?

14.3. Topics to learn next

14.3.1. Further reading

14.3.2. Algorithms

14.3.3. Context

14.3.4. Human-computer interaction

14.3.5. Choosing a good architecture

14.4. What is the future of recommender systems?

14.5. Final Thoughts

About the Technology

Recommender systems are everywhere, helping you find everything from movies to jobs, restaurants to hospitals, even romance. Using behavioral and demographic data, these systems make predictions about what users will be most interested in at a particular time, resulting in high-quality, ordered, personalized suggestions. Recommender systems are practically a necessity for keeping your site content current, useful, and interesting to your visitors.

What's inside

  • Practical introduction to recommender system algorithms
  • Collaborative and content-based filtering
  • Creating individual recommendations from visitor data
  • Real-world examples of recommender systems

About the reader

This book assumes you're comfortable reading code in Python and have some experience with databases.

About the author

Kim Falk is a data scientist at Adform, where he works on recommender systems. He has experience building recommendations for large entertainment companies and working with big data solutions.


Manning Early Access Program (MEAP) Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.
  • MEAP combo: $49.99 (pBook + eBook)
  • MEAP eBook: $39.99 (PDF + ePub + Kindle)
