An approachable and useful book.

*Machine Learning in Action* is unique book that blends the foundational theories of machine learning with the practical realities of building tools for everyday data analysis. You'll use the flexible Python programming language to build programs that implement algorithms for data classification, forecasting, recommendations, and higher-level features like summarization and simplification.

## preface

## acknowledgments

## about this book

## about the author

## about the cover illustration

# Part 1 Classification

## 1. Machine learning basics

### 1.1. What is machine learning?

### 1.2. Key terminology

### 1.3. Key tasks of machine learning

### 1.4. How to choose the right algorithm

### 1.5. Steps in developing a machine learning application

### 1.6. Why Python?

### 1.7. Getting started with the NumPy library

### 1.8. Summary

## 2. Classifying with k-Nearest Neighbors

### 2.1. Classifying with distance measurements

### 2.2. Example: improving matches from a dating site with kNN

### 2.3. Example: a handwriting recognition system

### 2.4. Summary

## 3. Splitting datasets one feature at a time: decision trees

### 3.1. Tree construction

### 3.2. Plotting trees in Python with Matplotlib annotations

### 3.3. Testing and storing the classifier

### 3.4. Example: using decision trees to predict contact lens type

### 3.5. Summary

## 4. Classifying with probability theory: naïve Bayes

### 4.1. Classifying with Bayesian decision theory

### 4.2. Conditional probability

### 4.3. Classifying with conditional probabilities

### 4.4. Document classification with naïve Bayes

### 4.5. Classifying text with Python

### 4.6. Example: classifying spam email with naïve Bayes

### 4.7. Example: using naïve Bayes to reveal local attitudes from personal ads

### 4.8. Summary

## 5. Logistic regression

### 5.1. Classification with logistic regression and the sigmoid function: a tractable step function

### 5.2. Using optimization to find the best regression coefficients

### 5.3. Example: estimating horse fatalities from colic

### 5.4. Summary

## 6. Support vector machines

### 6.1. Separating data with the maximum margin

### 6.2. Finding the maximum margin

### 6.3. Efficient optimization with the SMO algorithm

### 6.4. Speeding up optimization with the full Platt SMO

### 6.5. Using kernels for more complex data

### 6.6. Example: revisiting handwriting classification

### 6.7. Summary

## 7. Improving classification with the AdaBoost meta-algorithm

### 7.1. Classifiers using multiple samples of the dataset

### 7.2. Train: improving the classifier by focusing on errors

### 7.3. Creating a weak learner with a decision stump

### 7.4. Implementing the full AdaBoost algorithm

### 7.5. Test: classifying with AdaBoost

### 7.6. Example: AdaBoost on a difficult dataset

### 7.7. Classification imbalance

### 7.8. Summary

# Part 2 Forecasting numeric values with regression

## 8. Predicting numeric values: regression

### 8.1. Finding best-fit lines with linear regression

### 8.2. Locally weighted linear regression

### 8.3. Example: predicting the age of an abalone

### 8.4. Shrinking coefficients to understand our data

### 8.5. The bias/variance tradeoff

### 8.6. Example: forecasting the price of LEGO sets

### 8.7. Summary

## 9. Tree-based regression

### 9.1. Locally modeling complex data

### 9.2. Building trees with continuous and discrete features

### 9.3. Using CART for regression

### 9.4. Tree pruning

### 9.5. Model trees

### 9.6. Example: comparing tree methods to standard regression

### 9.7. Using Tkinter to create a GUI in Python

### 9.8. Summary

# Part 3 Unsupervised learning

## 10. Grouping unlabeled items using k-means clustering

### 10.1. The k-means clustering algorithm

### 10.2. Improving cluster performance with postprocessing

### 10.3. Bisecting k-means

### 10.4. Example: clustering points on a map

### 10.5. Summary

## 11. Association analysis with the Apriori algorithm

### 11.1. Association analysis

### 11.2. The Apriori principle

### 11.3. Finding frequent itemsets with the Apriori algorithm

### 11.4. Mining association rules from frequent item sets

### 11.5. Example: uncovering patterns in congressional voting

### 11.6. Example: finding similar features in poisonous mushrooms

### 11.7. Summary

## 12. Efficiently finding frequent itemsets with FP-growth

### 12.1. FP-trees: an efficient way to encode a dataset

### 12.2. Build an FP-tree

### 12.3. Mining frequent items from an FP-tree

### 12.4. Example: finding co-occurring words in a Twitter feed

### 12.5. Example: mining a clickstream from a news site

### 12.6. Summary

# Part 4 Additional tools

## 13. Using principal component analysis to simplify data

### 13.1. Dimensionality reduction techniques

### 13.2. Principal component analysis

### 13.3. Example: using PCA to reduce the dimensionality of semiconductor manufacturing data

### 13.4. Summary

## 14. Simplifying data with the singular value decomposition

### 14.1. .1 Applications of the SVD

### 14.2. Matrix factorization

### 14.3. SVD in Python

### 14.4. Collaborative filtering–based recommendation engines

### 14.5. Example: a restaurant dish recommendation engine

### 14.6. Example: image compression with the SVD

### 14.7. Summary

## 15. Big data and MapReduce

### 15.1. MapReduce: a framework for distributed computing

### 15.2. Hadoop Streaming

### 15.3. Running Hadoop jobs on Amazon Web Services

### 15.4. Machine learning in MapReduce

### 15.5. Using mrjob to automate MapReduce in Python

### 15.6. Example: the Pegasos algorithm for distributed SVMs

### 15.7. Do you really need MapReduce?

### 15.8. Summary

## Appendix A: Getting started with Python

## Appendix B: Linear algebra

## Appendix C: Probability refresher

## Appendix D: Resources

## index

© 2014 Manning Publications Co.

## About the book

A machine is said to learn when its performance improves with experience. Learning requires algorithms and programs that capture data and ferret out the interesting or useful patterns. Once the specialized domain of analysts and mathematicians, machine learning is becoming a skill needed by many.

*Machine Learning in Action* is a clearly written tutorial for developers. It avoids academic language and takes you straight to the techniques you'll use in your day-to-day work. Many (Python) examples present the core algorithms of statistical data processing, data analysis, and data visualization in code you can reuse. You'll understand the concepts and how they fit in with tactical tasks like classification, forecasting, recommendations, and higher-level features like summarization and simplification.

Readers need no prior experience with machine learning or statistical processing. Familiarity with Python is helpful.

## What's inside

- A no-nonsense introduction
- Examples showing common ML tasks
- Everyday data analysis
- Implementing classic algorithms like Apriori and Adaboos