Beyond Spreadsheets with R
A beginner's guide to R and RStudio
Dr. Jonathan Carroll
  • December 2018
  • ISBN 9781617294594
  • 352 pages
  • printed in black & white
pBook available Dec 13, 2018
ePub + Kindle available Dec 23, 2018

A useful guide to facilitate graduating from spreadsheets to more serious data wrangling with R.

John D. Lewis, DDN

Beyond Spreadsheets with R shows you how to take raw data and transform it for use in computations, tables, graphs, and more. You’ll build on simple programming techniques like loops and conditionals to create your own custom functions. You’ll come away with a toolkit of strategies for analyzing and visualizing data of all sorts using R and RStudio.

Table of Contents

1 Introducing Data and the R Language

1.1 Data: What, Where, How?

1.1.1 What is Data?

1.1.2 Seeing the World as Data Sources

1.1.3 Data Munging?

1.1.4 What You Can Do With Well-Handled Data

1.1.5 Data as an Asset

1.1.6 Reproducible Research and Version Control

1.2 Introducing R

1.2.1 The Origins of R

1.2.2 What It Is and What It Isn’t

1.3 How R Works

1.4 Introducing RStudio

1.4.1 Working with R within RStudio

1.4.2 Built-in Packages (Data and Functions)

1.5 In-built Documentation

1.5.1 Vignettes

1.6 Try It Yourself

Terminology

Summary

2 Getting to Know R Data Types

2.1 Types of Data

2.1.1 Numbers

2.1.2 Text (Strings)

2.1.3 Categories (Factors)

2.1.4 Dates and Times

2.1.5 Logicals

2.1.6 Missing Values

2.2 Storing Values (Assigning)

2.2.1 Naming Data (Variables)

2.2.2 Unchanging Data

2.2.3 The Assignment Operators (<- vs =)

2.3 Specifying the Data Type

2.4 Telling R to Ignore Something

2.5 Try It Yourself

Terminology

Summary

3 I Want To Make New Data Values

3.1 Basic Mathematics

3.2 Operator Precedence

3.3 String Concatenation (Joining)

3.4 Comparisons

3.5 Automatic Conversion (Coercion)

3.6 Try It Yourself

Terminology

Summary

4 Understanding the Tools We’ll Use: Functions

4.1 Functions

4.1.1 Under the Hood

4.1.2 Function Template

4.1.3 Arguments

4.1.4 Multiple Arguments

4.1.5 Default Arguments

4.1.6 Argument Name Matching

4.1.7 Partial Matching

4.1.8 Scope

4.2 Packages

4.2.1 How Does R (Not) Know About This Function?

4.3 Messages, Warnings, and Errors, Oh My!

4.3.1 Creating Messages, Warnings, and Errors

4.3.2 How To Diagnose Them

4.4 Testing

4.5 Project: Generalizing a function

4.6 Try It Yourself

Terminology

Summary

5 Combining data values

5.1 Simple Collections

5.1.1 Coercion

5.1.2 Missing Values

5.1.3 Attributes

5.1.4 Names

5.2 Sequences

5.2.1 Vector Functions

5.2.2 Vector Math Operations

5.3 Matrices

5.3.1 Indexing

5.4 Lists

5.5 data.frames

5.6 Classes

5.6.1 The tibble class

5.6.2 Structures as Function Arguments

5.7 Try It Yourself

Terminology

Summary

6 Selecting data values

6.1 Text Processing

6.1.1 Text Matching

6.1.2 Substrings

6.1.3 Text Substitutions

6.1.4 Regular Expressions

6.2 Selecting Components from Structures

6.2.1 Vectors

6.2.2 Lists

6.2.3 Matrices

6.3 Replacing Values

6.4 data.frames and dplyr

6.4.1 dplyr Verbs

6.4.2 Non-Standard Evaluation

6.4.3 Pipes

6.4.4 Subsetting data.frame The Hard Way

6.5 Replacing NA

6.6 Selecting Conditionally

6.7 Summarising Values

6.8 A Worked Example: Excel vs R

6.9 Try It Yourself

6.9.1 Solutions: no peeking

Terminology

Summary

7 Doing things with lots of data

7.1 Tidy Data Principles

7.1.1 The Working Directory

7.1.2 Stored Data Formats

7.1.3 Reading Data into R

7.1.4 Scraping Data

7.1.5 Inspecting Data

7.1.6 I Have Odd Values In My Data (Sentinel Values)

7.1.7 Converting to Tidy Data

7.2 Merging Data

7.3 Writing Data From R

7.4 Try It Yourself

Terminology

Summary

8 Doing things conditionally: Control structures

8.1 Looping

8.1.1 Vectorisation

8.1.2 Tidy repetition: Looping with purrr

8.1.3 for loops

8.2 Wider and Narrower Loop Scope

8.2.1 while loops

8.3 Conditional evaluation

8.3.1 if conditions

8.3.2 ifelse conditions

8.4 Try It Yourself

Terminology

Summary

9 Visualizing data: Plotting

9.1 Data Preparation

9.1.1 Tidy Data, Revisited

9.1.2 Importance of Data Types

9.2 ggplot2

9.2.1 General construction

9.2.2 Adding points

9.2.3 Style aesthetics

9.2.4 Adding lines

9.2.5 Adding bars

9.2.6 Other types of plots

9.2.7 Scales

9.2.8 Facetting

9.2.9 Additional options

9.3 Plots as Objects

9.4 Saving plots

9.5 Try It Yourself

Terminology

Summary

10 Doing more with your data with extensions

10.1 Writing Your Own Packages

10.1.1 Creating a Minimal Package

10.1.2 Documentation

10.2 Analysing Your Package

10.2.1 Unit Testing

10.2.2 Profiling

10.3 What To Do Next?

10.3.1 Regression

10.3.2 Clustering

10.3.3 Working With Maps

10.3.4 Interacting With APIs

10.3.5 Sharing Your Package

10.4 More Resources

Terminology

Summary

Appendixes

Appendix A: Installing R

Windows

Mac

Linux

From source

Appendix B: Installing RStudio

Installing RStudio

Packages used in this book

Appendix C: Graphics in base R

About the Technology

Spreadsheets are powerful tools for many tasks, but if you need to interpret, interrogate, and present data, they can feel like the wrong tool for the job. That’s when R programming is the way to go. The R programming language provides a comfortable environment for handling all types of data. And within the open-source RStudio development suite, you have at your fingertips easy-to-use ways to simplify complex manipulations and create reproducible processes for analysis and reporting.

About the book

With Beyond Spreadsheets with R you’ll learn how to go from raw data to meaningful insights using R and RStudio. Each carefully crafted chapter covers a unique way to wrangle data, from understanding individual values to interacting with complex collections of data, including data you scrape from the web. You’ll build on simple programming techniques like loops and conditionals to create your own custom functions. You’ll come away with a toolkit of strategies for analyzing and visualizing data of all sorts.

What's inside

  • How to start programming with R and RStudio
  • Understanding and implementing important R structures and operators
  • Installing and working with R packages
  • Tidying, refining, and plotting your data

About the reader

If you’re comfortable writing formulas in Excel, you’re ready for this book.

About the author

Dr. Jonathan Carroll is a data science consultant providing R programming services. He holds a PhD in theoretical physics.


An excellent book to help you understand how stored data can be used.

Hilde Van Gysel, Trebol Engineering

A great introduction to a data science programming language. Makes you want to learn more!

Jenice Tom, CVS Health

Handy to have when your data spreads beyond a spreadsheet.

Danil Mironov, Luxoft Poland