R in Action, Third Edition
Robert I. Kabacoff
  • MEAP began July 2019
  • Publication in Early 2021 (estimated)
  • ISBN 9781617296055
  • 625 pages (estimated)
  • printed in black & white
An eBook copy of the previous edition of this book is included at no additional cost.

If you are looking for the definitive guide to get you up and running with R, this is the book you need!

Jean-François Morin
Built specifically for statistical computing and graphics, the R language, along with its amazing collection of libraries and tools, is one of the most powerful options you can use to tackle data analysis for business, research, and other data-intensive domains. R in Action, Third Edition takes you hands-on with R, focusing on practical solutions and real-world applications that are most relevant to business developers. This revised and expanded third edition covers the new tidyverse approach to data analysis and R’s state-of-the-art graphing capabilities with the ggplot2 package.

About the Technology

The R language is the most powerful platform you can choose for modern data analysis. R is free and open source, and its community has created thousands of packages to tackle challenges from data-crunching to presentation. R’s graphical capabilities are also state-of-the-art, with a comprehensive and powerful feature set available for data visualization. R runs on all major operating systems and is used by businesses, researchers, and organizations worldwide.

About the book

R in Action, Third Edition teaches you to use the R language, including the popular tidyverse packages, through hands-on examples relevant to scientific, technical, and business developers. Focusing on practical solutions to real-world data challenges, R expert Rob Kabacoff takes you on a crash course in statistics, from dealing with messy and incomplete data to creating stunning visualizations. In this revised and expanded third edition, new coverage has been added for R’s state-of-the-art graphing capabilities with the ggplot2 package.
Table of Contents

Part 1: Getting Started

1 Introduction to R

1.1 Why use R?

1.2 Obtaining and installing R

1.3 Working with R

1.3.1 Getting started

1.3.2 Using RStudio

1.3.3 Getting help

1.3.4 The workspace

1.3.5 Projects

1.4 Packages

1.4.1 What are packages?

1.4.2 Installing a package

1.4.3 Loading a package

1.4.4 Learning about a package

1.5 Using output as input: reusing results

1.6 Working with large datasets

1.7 Working through an example

1.8 Summary

2 Creating a dataset

2.1 Understanding datasets

2.2 Data structures

2.2.1 Vectors

2.2.2 Matrices

2.2.3 Arrays

2.2.4 Data frames

2.2.5 Factors

2.2.6 Lists

2.2.7 Tibbles

2.3 Data input

2.3.1 Entering data from the keyboard

2.3.2 Importing data from a delimited text file

2.3.3 Importing data from Excel

2.3.4 Importing data from XML

2.3.5 Importing data from the Web

2.3.6 Importing data from SPSS

2.3.7 Importing data from SAS

2.3.8 Importing data from Stata

2.3.9 Accessing database management systems (DBMSs)

2.3.10 Importing data via Stat/Transfer

2.4 Annotating datasets

2.4.1 Variable labels

2.4.2 Value labels

2.5 Useful functions for working with data objects

2.6 Summary

3 Basic data management

3.1 A working example

3.2 Creating new variables

3.3 Recoding variables

3.4 Renaming variables

3.5 Missing values

3.5.1 Recoding values to missing

3.5.2 Excluding missing values from analyses

3.6 Date values

3.6.1 Converting dates to character variables

3.6.2 Going further

3.7 Type conversions

3.8 Sorting data

3.9 Merging datasets

3.9.1 Adding columns to a data frame

3.9.2 Adding rows to a data frame

3.10 Subsetting datasets

3.10.1 Selecting variables

3.10.2 Dropping variables

3.10.3 Selecting observations

3.10.4 The subset() function

3.10.5 Random samples

3.11 Using dplyr to manipulate data frames

3.11.1 Basic dplyr functions

3.11.2 Using pipe operators to chain statements

3.12 Using SQL statements to manipulate data frames

3.13 Summary

4 Getting started with graphs

4.1 Creating a graph with ggplot2

4.1.1 ggplot

4.1.2 Geoms

4.1.3 Grouping

4.1.4 Scales

4.1.5 Facets

4.1.6 Labels

4.1.7 Themes

4.2 ggplot2 details

4.2.1 Placing the data and mapping options

4.2.2 Graphs as objects

4.2.3 Exporting graphs

4.2.4 Common mistakes

4.3 Summary

5 Advanced data management

5.1 A data-management challenge

5.2 Numerical and character functions

5.2.1 Mathematical functions

5.2.2 Statistical functions

5.2.3 Probability functions

5.2.4 Character functions

5.2.5 Other useful functions

5.2.6 Applying functions to matrices and data frames

5.3 A solution for the data-management challenge

5.4 Control flow

5.4.1 Repetition and looping

5.4.2 Conditional execution

5.5 User-written functions

5.6 Reshaping data

5.6.1 Transpose

5.6.2 Converting between wide and long dataset formats

5.7 Aggregating data

5.8 Summary

Part 2: Basic Methods

6 Basic Graphs

6.1 Bar charts

6.1.1 Simple bar charts

6.1.2 Stacked, grouped and filled bar charts

6.1.3 Mean bar charts

6.1.4 Tweaking bar charts

6.2 Pie charts

6.3 Tree maps

6.4 Histograms

6.5 Kernel density plots

6.6 Box plots

6.6.1 Using parallel box plots to compare groups

6.6.2 Violin plots

6.7 Dot plots

6.8 Summary

7 Basic Statistics

7.1 Descriptive statistics

7.1.1 A menagerie of methods

7.1.2 Even more methods

7.1.3 Descriptive statistics by group

7.1.4 Summarizing data interactively with dplyr

7.1.5 Visualizing results

7.2 Frequency and contingency tables

7.2.1 Generating frequency tables

7.2.2 Tests of independence

7.2.3 Measures of association

7.2.4 Visualizing results

7.3 Correlations

7.3.1 Types of correlations

7.3.2 Testing correlations for significance

7.3.3 Visualizing correlations

7.4 T-tests

7.4.1 Independent t-test

7.4.2 Dependent t-test

7.4.3 When there are more than two groups

7.5 Nonparametric tests of group differences

7.5.1 Comparing two groups

7.5.2 Comparing more than two groups

7.6 Visualizing group differences

7.7 Summary

Part 3: Intermediate Methods

8 Regression

8.1 The many faces of regression

8.1.1 Scenarios for using OLS regression

8.1.2 What you need to know

8.2 OLS regression

8.2.1 Fitting regression models with lm()

8.2.2 Simple linear regression

8.2.3 Polynomial regression

8.2.4 Multiple linear regression

8.2.5 Multiple linear regression with interactions

8.3 Regression diagnostics

8.3.1 A typical approach

8.3.2 An enhanced approach

8.3.3 Multicollinearity

8.4 Unusual observations

8.4.1 Outliers

8.4.2 High-leverage points

8.4.3 Influential observations

8.5 Corrective measures

8.5.1 Deleting observations

8.5.2 Transforming variables

8.5.3 Adding or deleting variables

8.5.4 Trying a different approach

8.6 Selecting the “best” regression model

8.6.1 Comparing models

8.6.2 Variable selection

8.7 Taking the analysis further

8.7.1 Cross-validation

8.7.2 Relative importance

8.8 Summary

9 Analysis of Variance

9.1 A crash course on terminology

9.2 Fitting ANOVA models

9.2.1 The aov() function

9.2.2 The order of formula terms

9.3 One-way ANOVA

9.3.1 Multiple comparisons

9.3.2 Assessing test assumptions

9.4 One-way ANCOVA

9.4.1 Assessing test assumptions

9.4.2 Visualizing the results

9.5 Two-way factorial ANOVA

9.6 Repeated measures ANOVA

9.7 Multivariate analysis of variance (MANOVA)

9.7.1 Assessing test assumptions

9.7.2 Robust MANOVA

9.8 ANOVA as regression

9.9 Summary

10 Power Analysis

10.1 A quick review of hypothesis testing

10.2 Implementing power analysis with the pwr package

10.2.1 t-tests

10.2.2 ANOVA

10.2.3 Correlations

10.2.4 Linear models

10.2.5 Tests of proportions

10.2.6 Chi-square tests

10.2.7 Choosing an appropriate effect size in novel situations

10.3 Creating power analysis plots

10.4 Other packages

10.5 Summary

11 Intermediate graphs

11.1 Scatter plots

11.1.1 Scatter-plot matrices

11.1.2 High-density scatter plots

11.1.3 3D scatter plots

11.1.4 Spinning 3D scatter plots

11.1.5 Bubble plots

11.2 Line charts

11.3 Corrgrams

11.4 Mosaic plots

11.5 Summary

12 Resampling statistics and bootstrapping

12.1 Permutation tests

12.2 Permutation tests with the coin package

12.2.1 Independent two-sample and k-sample tests

12.2.2 Independence in contingency tables

12.2.3 Independence between numeric variables

12.2.4 Dependent two-sample and k-sample tests

12.2.5 Going further

12.3 Permutation tests with the lmPerm package

12.3.1 Simple and polynomial regression

12.3.2 Multiple regression

12.3.3 One-way ANOVA and ANCOVA

12.3.4 Two-way ANOVA

12.4 Additional comments on permutation tests

12.5 Bootstrapping

12.6 Bootstrapping with the boot package

12.6.1 Bootstrapping a single statistic

12.6.2 Bootstrapping several statistics

12.7 Summary

Part 4: Advanced Methods

13 Generalized Linear Models

13.1 Generalized linear models and the glm() function

13.1.1 The glm() function

13.1.2 Supporting functions

13.1.3 Model fit and regression diagnostics

13.2 Logistic regression

13.2.1 Interpreting the model parameters

13.2.2 Assessing the impact of predictors on the probability of an outcome

13.2.3 Overdispersion

13.2.4 Extensions

13.3 Poisson regression

13.3.1 Interpreting the model parameters

13.3.2 Overdispersion

13.3.3 Extensions

13.4 Summary

14 Principal Components and Factor Analysis

15 Time Series

16 Clustering

17 Classification

18 Advanced methods for missing data

Part 5: Expanding Your Skills

19 Advanced Graphics with ggplot2

20 Advanced R Programming

21 Creating a Package


Appendix A: Version control with git

Appendix B: Customizing the startup environment

Appendix C: Exporting data from R

Appendix D: Matrix Algebra in R

Appendix E: Packages used in this book

Appendix F: Working with large datasets

Appendix G: Updating an R installation

What's inside

  • A complete learning resource for R and the tidyverse
  • Clean, manage, and analyze data with R
  • Use the ggplot2 package for graphs and visualizations
  • Techniques for debugging programs and creating packages

About the reader

This book is designed for readers who need to solve practical data analysis problems using the R language and tools. Some background in mathematics and statistics is helpful, but no prior experience with R or computer programming is required.

About the author

Dr. Robert I. Kabacoff is a professor of quantitative analytics at Wesleyan University and a seasoned data scientist with more than 20 years of experience providing statistical programming and data analytic support in business, healthcare, and government settings. He has taught both undergraduate and graduate courses in data analysis and statistical programming, and he manages the Quick-R website at statmethods.net and the R for Data Visualization website at rkabacoff.github.io/datavis.

