A detailed, well-explained guide to the real-life practice of data science techniques.
GET MORE WITH MANNING
An eBook copy of the previous edition, Practical Data Science with R (First Edition), is included at no additional cost. It will be automatically added to your Manning Bookshelf within 24 hours of purchase.
Practical Data Science with R, Second Edition takes a practice-oriented approach to explaining basic principles in the ever-expanding field of data science. You’ll jump right to real-world use cases as you apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business intelligence, and decision support.
Updates in this new edition include an introduction to the vtreat data preparation tool, a section on model explanation, and additional modeling techniques such as boosting and regularized regression.
Part 1: Introduction to Data Science
1. The data science process
1.1. The roles in a data science project
1.1.1. Project roles
1.2. Stages of a data science project
1.2.1. Defining the goal
1.2.2. Data collection and management
1.2.4. Model evaluation and critique
1.2.5. Presentation and documentation
1.2.6. Model deployment and maintenance
1.3. Setting expectations
1.3.1. Determining lower bounds on model performance
2. Starting with R and data
2.1. Starting with R
2.1.1. Installing R
2.1.2. R programming
2.2. Working with data from files
2.2.1. Working with well-structured data from files or URLs
2.2.2. Using R with less-structured data
2.3. Working with relational databases
2.3.1. A production-size example
3. Exploring data
3.1. Using summary statistics to spot problems
3.1.1. Typical problems revealed by data summaries
3.2. Spotting problems using graphics and visualization
3.2.1. Visually checking distributions for a single variable
3.2.2. Visually checking relationships between two variables
4. Managing data
4.1. Cleaning data
4.1.1. Domain-specific data cleaning
4.1.2. Treating missing values (NAs)
4.1.3. The vtreat package for automatically treating missing variables
4.2. Data transformations
4.2.2. Centering and scaling
4.2.3. Log transformations for skewed and wide distributions
4.3. Sampling for modeling and validation
4.3.1. Test and training splits
4.3.2. Creating a sample group column
4.3.3. Record grouping
4.3.4. Data provenance
5. Data engineering and data wrangling
Part 2: Modeling Methods and Machine Learning
6. Choosing and evaluating models
7. Starting with modeling
8. Linear and logistic regression
9. Unsupervised methods
10. Exploring advanced methods
Part 3: Delivering Results
11. Documentation and deployment
12. Producing effective presentations
Appendix A: Installing R and other tools
Appendix B: Important statistical concepts
Appendix C: More tools worth exploring
Appendix D: R glossary
About the technology
Business analysts and developers are increasingly collecting, curating, analyzing, and reporting on crucial business data. The R language and its associated tools provide a straightforward way to tackle day-to-day data science and machine learning tasks.
About the book
This invaluable addition to any data scientist’s library shows you how to apply the R programming language and useful statistical techniques to everyday business situations, as well as how to effectively present results to audiences of all levels. To answer the ever-increasing demand for machine learning and analysis, this new edition boasts additional R tools, modeling techniques, and more.
- Data science and statistical analysis for the business professional
- Numerous instantly familiar real-world use cases
- Keys to effective data presentations
- Modeling and analysis techniques like boosting, regularized regression, and quadratic discriminant analysis
- Additional R tools including data.table and vtreat
- A new section on interpreting predictions of complicated models
About the reader
While some familiarity with basic statistics and R is assumed, this book is accessible to readers with or without a background in data science.
About the author
Nina Zumel and John Mount are co-founders of Win-Vector LLC, a San Francisco-based data science consulting firm. Both hold PhDs from Carnegie Mellon and blog on statistics, probability, and computer science at win-vector.com.
Gives really good practical insight into the data science process, not just concentrating on the mechanical parts but also covering the business aspects.
Kept me engaged and fascinated.
Extensive, detailed, and fast-paced.