Data Munging with R
Dr. Jonathan Carroll
  • MEAP began June 2017
  • Publication in Spring 2018 (estimated)
  • ISBN 9781617294594
  • 375 pages (estimated)
  • printed in black & white

Data Munging with R shows you how to take raw data and transform it for use in computations, tables, graphs, and more. Whether you already have some programming experience or you're a spreadsheet whiz looking for a more powerful data manipulation tool, this book will help you get started. You'll discover the ins and outs of the data-oriented R programming language and its many task-specific packages. Following dozens of practical examples, you'll learn to fill in missing values, make predictions, and visualize data as graphs. By the time you're done, you'll be a master munger, with a robust, reproducible workflow and the skills to use data to strengthen your conclusions!

Table of Contents

1. Introducing Data and the R Language

1.1. Data — What, Where, How?

1.1.1. What is Data?

1.1.2. Seeing the World as Data Sources

1.1.3. What You Can Do With Well-Handled Data

1.1.4. Data as an Asset

1.1.5. Reproducible Research and Version Control

1.2. Introducing R

1.2.1. The Origins of R

1.2.2. What It Is and What It Isn't

1.3. How R Works

1.4. Introducing RStudio

1.4.1. Working with R within RStudio

1.4.2. Inbuilt Packages (Data and Functions)

1.5. In-built Documentation

1.5.1. Vignettes

1.6. Try It Yourself

1.7. Summary

2. Getting to Know R Data Types

2.1. Types of Data

2.1.1. Numbers

2.1.2. Text (Strings)

2.1.3. Categories (Factors)

2.1.4. Dates and Times

2.1.5. Logicals

2.1.6. Missing Values

2.2. Storing Values (Assigning)

2.2.1. Naming Data (Variables)

2.2.2. Unchanging Data

2.2.3. The Assignment Operators (<- vs =)

2.3. Specifying the Data Type

2.4. Telling R to Ignore Something

2.5. Summary

3. I Want To Make New Data Values

3.1. Basic Mathematics

3.2. Operator Precedence

3.3. String Concatenation (Joining)

3.4. Comparisons

3.5. Automatic Conversion (Coercion)

3.6. Try It Yourself

3.7. Summary

4. Understanding the Tools We’ll Use — Functions

4.1. Functions

4.1.1. Under the Hood

4.1.2. Function Template

4.1.3. Arguments

4.1.4. Multiple Arguments

4.1.5. Default Arguments

4.1.6. Argument Name Matching

4.1.7. Partial Matching

4.1.8. Scope

4.2. Packages

4.2.1. How Does R (Not?) Know About This Function?

4.3. Messages, Warnings, and Errors, Oh My!

4.3.1. How To Diagnose Them

4.4. Testing

4.5. Project 4.1 — Generalise a Function

4.6. Try It Yourself

4.7. Summary

5. I Want To Combine Data Values

5.1. Simple Collections

5.1.1. Coercion

5.1.2. Missing Values

5.1.3. Attributes

5.1.4. Names

5.2. Sequences

5.2.1. Vector Math Operations

5.3. Matrices

5.3.1. Indexing

5.4. Lists

5.5. data.frames

5.6. Classes

5.7. tibble

5.7.1. Structures as Function Arguments

5.8. Try It Yourself

5.9. Summary

6. I Want To Do Something With Lots of Data

7. I Just Want Part of My Data (Selections)

8. I Want To Do Something Repeatedly (Control Structures)

9. I Want To Make a Prediction From My Data (Prediction)

10. I Want To Visualise My Data (Plotting)

11. I Want To Do More With My Data (Extensions)

Appendixes

Appendix A: Installing R

Appendix B: Installing RStudio

About the Technology

Data munging, the manipulation of raw data, is a cornerstone of data science. Munging techniques include cleaning, sorting, parsing, filtering, and pretty much anything else you need to make data truly useful. The R language, with its intuitive RStudio environment, is the perfect data munging tool. R provides a rich ecosystem of community-driven packages and utilities for finance and accounting, marketing, web scraping, and all manner of data science tasks. And getting started with R is so easy, even managers have been known to use it for ad hoc data analysis!

What's inside

  • Learning to program
  • Critical R structures and operators
  • Handling R packages
  • Tidying and refining your data
  • Plotting your data

About the reader

If you have beginner programming skills or you're comfortable with writing spreadsheet formulas, you have everything you need to get the most out of this book.

About the author

Dr. Jonathan Carroll holds a PhD in theoretical astrophysics from the University of Adelaide and currently works in statistical modelling. He contributes packages to R, frequently answers questions on Stack Overflow, and is an avid science communicator.

Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it's ready, and receive the pBook long before it's in bookstores.