Practical Data Science with R
Nina Zumel and John Mount
Foreword by Jim Porzak
  • March 2014
  • ISBN 9781617291562
  • 416 pages
  • printed in black & white

A unique and important addition to any data scientist’s library.

From the Foreword by Jim Porzak, Cofounder, Bay Area R Users Group


Practical Data Science with R, Second Edition is now available in the Manning Early Access Program. An eBook of this older edition is included at no additional cost when you buy the revised edition!


Practical Data Science with R lives up to its name. It explains basic principles without the theoretical mumbo-jumbo and jumps right to the real use cases you'll face as you collect, curate, and analyze the data crucial to the success of your business. You'll apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business intelligence, and decision support.

About the Technology

Business analysts and developers are increasingly collecting, curating, analyzing, and reporting on crucial business data. The R language and its associated tools provide a straightforward way to tackle day-to-day data science tasks without a lot of academic theory or advanced mathematics.

About the book

Practical Data Science with R shows you how to apply the R programming language and useful statistical techniques to everyday business situations. Using examples from marketing, business intelligence, and decision support, it shows you how to design experiments (such as A/B tests), build predictive models, and present results to audiences of all levels.

Table of Contents




about this book

about the cover illustration


Part 1 Introduction to data science

1. The data science process

1.1. The roles in a data science project

1.2. Stages of a data science project

1.3. Setting expectations

1.4. Summary

2. Loading data into R

2.1. Working with data from files

2.2. Working with relational databases

2.3. Summary

3. Exploring data

3.1. Using summary statistics to spot problems

3.2. Spotting problems using graphics and visualization

3.3. Summary

4. Managing data

4.1. Cleaning data

4.2. Sampling for modeling and validation

4.3. Summary

Part 2 Modeling methods

5. Choosing and evaluating models

5.1. Mapping problems to machine learning tasks

5.2. Evaluating models

5.3. Validating models

5.4. Summary

6. Memorization methods

6.1. KDD and KDD Cup 2009

6.2. Building single-variable models

6.3. Building models using many variables

6.4. Summary

7. Linear and logistic regression

7.1. Using linear regression

7.2. Using logistic regression

7.3. Summary

8. Unsupervised methods

8.1. Cluster analysis

8.2. Association rules

8.3. Summary

9. Exploring advanced methods

9.1. Using bagging and random forests to reduce training variance

9.2. Using generalized additive models (GAMs) to learn non-monotone relationships

9.3. Using kernel methods to increase data separation

9.4. Using SVMs to model complicated decision boundaries

9.5. Summary

Part 3 Delivering results

10. Documentation and deployment

10.1. The buzz dataset

10.2. Using knitr to produce milestone documentation

10.3. Using comments and version control for running documentation

10.4. Deploying models

10.5. Summary

11. Producing effective presentations

11.1. Presenting your results to the project sponsor

11.2. Presenting your model to end users

11.3. Presenting your work to other data scientists

11.4. Summary

Appendix A: Working with R and other tools

Appendix B: Important statistical concepts

Appendix C: More tools and ideas worth exploring


What's inside

  • Data science for the business professional
  • Statistical analysis using the R language
  • Project lifecycle, from planning to delivery
  • Numerous instantly familiar use cases
  • Keys to effective data presentations

About the reader

This book is accessible to readers without a background in data science. Some familiarity with basic statistics, R, or another scripting language is assumed.

About the authors

Nina Zumel and John Mount are cofounders of a San Francisco-based data science consulting firm. Both hold PhDs from Carnegie Mellon and blog on statistics, probability, and computer science at

