Machine Learning in Action
Peter Harrington
  • April 2012
  • ISBN 9781617290183
  • 384 pages
  • printed in black & white

"An approachable and useful book."

Alexandre Alves, Oracle Corporation

Machine Learning in Action is a unique book that blends the foundational theories of machine learning with the practical realities of building tools for everyday data analysis. You'll use the flexible Python programming language to build programs that implement algorithms for data classification, forecasting, recommendations, and higher-level features like summarization and simplification.

About the book

A machine is said to learn when its performance improves with experience. Learning requires algorithms and programs that capture data and ferret out the interesting or useful patterns. Once the specialized domain of analysts and mathematicians, machine learning is becoming a skill needed by many.

Machine Learning in Action is a clearly written tutorial for developers. It avoids academic language and takes you straight to the techniques you'll use in your day-to-day work. Many examples, written in Python, present the core algorithms of statistical data processing, data analysis, and data visualization in code you can reuse. You'll understand the concepts and how they fit in with tactical tasks like classification, forecasting, recommendations, and higher-level features like summarization and simplification.
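To give a feel for the kind of reusable Python code the book presents, here is a minimal sketch of one of its earliest techniques, k-Nearest Neighbors (covered in chapter 2). This is an illustrative example in the book's spirit, not code taken from the book; the function and dataset names are our own.

```python
import numpy as np

def knn_classify(query, data, labels, k=3):
    """Classify `query` by majority vote among its k nearest rows of `data`."""
    # Euclidean distance from the query point to every training point
    dists = np.sqrt(((data - query) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = dists.argsort()[:k]
    # Majority vote among their labels
    votes = {}
    for i in nearest:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    return max(votes, key=votes.get)

# Tiny toy dataset: two well-separated clusters
data = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify(np.array([0.1, 0.2]), data, labels))  # 'B'
```

The book builds up examples like this from scratch with NumPy, then applies them to real datasets such as dating-site matches and handwritten digits.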


Table of Contents



about this book

about the author

about the cover illustration

Part 1 Classification

1. Machine learning basics

1.1. What is machine learning?

1.2. Key terminology

1.3. Key tasks of machine learning

1.4. How to choose the right algorithm

1.5. Steps in developing a machine learning application

1.6. Why Python?

1.7. Getting started with the NumPy library

1.8. Summary

2. Classifying with k-Nearest Neighbors

2.1. Classifying with distance measurements

2.2. Example: improving matches from a dating site with kNN

2.3. Example: a handwriting recognition system

2.4. Summary

3. Splitting datasets one feature at a time: decision trees

3.1. Tree construction

3.2. Plotting trees in Python with Matplotlib annotations

3.3. Testing and storing the classifier

3.4. Example: using decision trees to predict contact lens type

3.5. Summary

4. Classifying with probability theory: naïve Bayes

4.1. Classifying with Bayesian decision theory

4.2. Conditional probability

4.3. Classifying with conditional probabilities

4.4. Document classification with naïve Bayes

4.5. Classifying text with Python

4.6. Example: classifying spam email with naïve Bayes

4.7. Example: using naïve Bayes to reveal local attitudes from personal ads

4.8. Summary

5. Logistic regression

5.1. Classification with logistic regression and the sigmoid function: a tractable step function

5.2. Using optimization to find the best regression coefficients

5.3. Example: estimating horse fatalities from colic

5.4. Summary

6. Support vector machines

6.1. Separating data with the maximum margin

6.2. Finding the maximum margin

6.3. Efficient optimization with the SMO algorithm

6.4. Speeding up optimization with the full Platt SMO

6.5. Using kernels for more complex data

6.6. Example: revisiting handwriting classification

6.7. Summary

7. Improving classification with the AdaBoost meta-algorithm

7.1. Classifiers using multiple samples of the dataset

7.2. Train: improving the classifier by focusing on errors

7.3. Creating a weak learner with a decision stump

7.4. Implementing the full AdaBoost algorithm

7.5. Test: classifying with AdaBoost

7.6. Example: AdaBoost on a difficult dataset

7.7. Classification imbalance

7.8. Summary

Part 2 Forecasting numeric values with regression

8. Predicting numeric values: regression

8.1. Finding best-fit lines with linear regression

8.2. Locally weighted linear regression

8.3. Example: predicting the age of an abalone

8.4. Shrinking coefficients to understand our data

8.5. The bias/variance tradeoff

8.6. Example: forecasting the price of LEGO sets

8.7. Summary

9. Tree-based regression

9.1. Locally modeling complex data

9.2. Building trees with continuous and discrete features

9.3. Using CART for regression

9.4. Tree pruning

9.5. Model trees

9.6. Example: comparing tree methods to standard regression

9.7. Using Tkinter to create a GUI in Python

9.8. Summary

Part 3 Unsupervised learning

10. Grouping unlabeled items using k-means clustering

10.1. The k-means clustering algorithm

10.2. Improving cluster performance with postprocessing

10.3. Bisecting k-means

10.4. Example: clustering points on a map

10.5. Summary

11. Association analysis with the Apriori algorithm

11.1. Association analysis

11.2. The Apriori principle

11.3. Finding frequent itemsets with the Apriori algorithm

11.4. Mining association rules from frequent item sets

11.5. Example: uncovering patterns in congressional voting

11.6. Example: finding similar features in poisonous mushrooms

11.7. Summary

12. Efficiently finding frequent itemsets with FP-growth

12.1. FP-trees: an efficient way to encode a dataset

12.2. Build an FP-tree

12.3. Mining frequent items from an FP-tree

12.4. Example: finding co-occurring words in a Twitter feed

12.5. Example: mining a clickstream from a news site

12.6. Summary

Part 4 Additional tools

13. Using principal component analysis to simplify data

13.1. Dimensionality reduction techniques

13.2. Principal component analysis

13.3. Example: using PCA to reduce the dimensionality of semiconductor manufacturing data

13.4. Summary

14. Simplifying data with the singular value decomposition

14.1. Applications of the SVD

14.2. Matrix factorization

14.3. SVD in Python

14.4. Collaborative filtering–based recommendation engines

14.5. Example: a restaurant dish recommendation engine

14.6. Example: image compression with the SVD

14.7. Summary

15. Big data and MapReduce

15.1. MapReduce: a framework for distributed computing

15.2. Hadoop Streaming

15.3. Running Hadoop jobs on Amazon Web Services

15.4. Machine learning in MapReduce

15.5. Using mrjob to automate MapReduce in Python

15.6. Example: the Pegasos algorithm for distributed SVMs

15.7. Do you really need MapReduce?

15.8. Summary

Appendix A: Getting started with Python

Appendix B: Linear algebra

Appendix C: Probability refresher

Appendix D: Resources


© 2014 Manning Publications Co.

What's inside

  • A no-nonsense introduction
  • Examples showing common ML tasks
  • Everyday data analysis
  • Implementing classic algorithms like Apriori and AdaBoost

About the reader

Readers need no prior experience with machine learning or statistical processing. Familiarity with Python is helpful.

About the author

Peter Harrington is a professional developer and data scientist. He holds five US patents and his work has been published in numerous academic journals.
