Data Wrangling with JavaScript
Ashley Davis
  • December 2018
  • ISBN 9781617294846
  • 432 pages
  • printed in black & white
ePub + Kindle available Dec 17, 2018

A thorough and comprehensive step-by-step guide to managing data with JavaScript.

Ethan Rive

Data Wrangling with JavaScript is a hands-on guide that will teach you how to create a JavaScript-based data processing pipeline, handle common and exotic data, and master practical troubleshooting strategies.

Table of Contents

1 Getting started: establishing your data pipeline

1.1 Why data wrangling?

1.2 What is data wrangling?

1.3 Why a book on JavaScript data wrangling?

1.4 What will you get out of this book?

1.5 Why use JavaScript for data wrangling?

1.6 Is JavaScript appropriate for data analysis?

1.7 Navigating the JavaScript ecosystem

1.8 Assembling your toolkit

1.9 Establishing your data pipeline

1.9.1 Setting the stage

1.9.2 The data wrangling process

1.9.3 Planning

1.9.4 Acquisition, storage and retrieval

1.9.5 Exploratory coding

1.9.6 Clean and prepare

1.9.7 Analysis

1.9.8 Visualization

1.9.9 Getting to production

Summary

2 Getting started with Node.js

2.1 Starting your toolkit

2.2 Building a simple reporting system

2.3 Getting the code and data

2.4 Installing Node.js

2.5 Working with Node.js

2.5.1 Creating a Node.js project

2.5.2 Creating a command line application

2.5.3 Creating a code library

2.5.4 Creating a simple web server

2.6 Asynchronous coding

2.6.1 Loading a single file

2.6.2 Loading multiple files

2.6.3 Error handling

2.6.4 Asynchronous coding with promises

2.6.5 Wrapping asynchronous operations in promises

2.6.6 Async coding with “async” and “await”

Summary

3 Acquisition, storage and retrieval

3.1 Building out your toolkit

3.2 Getting the code and data

3.3 The core data representation

3.3.1 The earthquakes web site

3.3.2 Data formats covered

3.3.3 Power and flexibility

3.4 Importing data

3.4.1 Loading data from text files

3.4.2 Loading data from a REST API

3.4.3 Parsing JSON text data

3.4.4 Parsing CSV text data

3.4.5 Importing data from databases

3.4.6 Importing data from MongoDB

3.4.7 Importing data from MySQL

3.5 Exporting data

3.5.1 We need some data to export!

3.5.2 Exporting data to text files

3.5.3 Exporting data to JSON text files

3.5.4 Exporting data to CSV text files

3.5.5 Exporting data to a database

3.5.6 Exporting data to MongoDB

3.5.7 Exporting data to MySQL

3.6 Building complete data conversions

3.7 Expanding the process

Summary

4 Working with unusual data

4.1 Getting the code and data

4.2 Importing custom data from text files

4.3 Importing data by scraping web pages

4.4 Working with binary data

4.4.1 Unpacking a custom binary file

4.4.2 Packing a custom binary file

4.4.3 Replacing JSON with BSON

Summary

5 Exploratory coding

5.1 Expanding your toolkit

5.2 Analyzing car accidents

5.3 Getting the code and data

5.4 Iteration and your feedback loop

5.5 A first pass at understanding our data

5.6 Working with a reduced data sample

5.7 Prototyping with Excel

5.8 Exploratory coding with Node.js

5.9 Exploratory coding in the browser

5.10 Putting it all together

Summary

6 Clean and prepare

6.1 Expanding our toolkit

6.2 Preparing the reef data

6.3 Getting the code and data

6.4 The need for data clean-up and preparation

6.5 Where does broken data come from?

6.6 How does data clean-up fit into the pipeline?

6.7 Identifying bad data

6.8 Kinds of problems

6.9 Responses to bad data

6.10 Techniques for fixing bad data

6.11 Cleaning our data set

6.11.1 Rewriting bad rows

6.11.2 Filtering rows of data

6.11.3 Filtering columns of data

6.12 Preparing our data for effective use

6.12.1 Aggregating rows of data

6.12.2 Combining data from different files using Globby

6.12.3 Splitting data into separate files

6.13 Building a data processing pipeline with Data-Forge

Summary

7 Dealing with huge data files

7.1 Expanding our toolkit

7.2 Fixing temperature data

7.3 Getting the code and data

7.4 When conventional data processing breaks down

7.5 The limits of Node.js

7.5.1 Incremental data processing

7.5.2 Incremental core data representation

7.5.3 Node.js file streams basics primer

7.5.4 Transforming huge CSV files

7.5.5 Transforming huge JSON files

7.5.6 Mix and match

Summary

8 Working with a mountain of data

8.1 Expanding our toolkit

8.2 Dealing with a mountain of data

8.3 Getting the code and data

8.4 Techniques for working with big data

8.5 More Node.js limitations

8.6 Divide and conquer

8.7 Working with large databases

8.7.1 Database setup

8.7.2 Opening a connection to the database

8.7.3 Moving large files to your database

8.7.4 Incremental processing with a database cursor

8.7.5 Incremental processing with data windows

8.7.6 Creating an index

8.7.7 Filtering using queries

8.7.8 Discarding data with projection

8.7.9 Sorting large data sets

8.8 Achieving better data throughput

8.8.1 Optimize your code

8.8.2 Optimize your algorithm

8.8.3 Processing data in parallel

Summary

9 Practical data analysis

9.1 Expanding your toolkit

9.2 Analyzing the weather data

9.3 Getting the code and data

9.4 Basic data summarization

9.4.1 Sum

9.4.2 Average

9.4.3 Standard deviation

9.5 Group and summarize

9.6 The frequency distribution of temperatures

9.7 Time series

9.7.1 Yearly average temperature

9.7.2 Rolling average

9.7.3 Rolling standard deviation

9.7.4 Linear regression

9.7.5 Comparing time series

9.7.6 Stacking time series operations

9.8 Understanding relationships

Summary

10 Browser-based visualization

10.1 Expanding your toolkit

10.2 Getting the code and data

10.3 Choosing a chart type

10.4 Line chart for New York City temperature

10.4.1 The most basic C3 line chart

10.4.2 Adding real data

10.4.3 Parsing the static CSV file

10.4.4 Adding years as the X axis

10.4.5 Creating a custom Node.js web-server

10.4.6 Adding another series to the chart

10.4.7 Adding a second Y axis to the chart

10.4.8 Rendering a time series chart

10.5 Other chart types with C3

10.5.1 Bar chart

10.5.2 Horizontal bar chart

10.5.3 Pie chart

10.5.4 Stacked bar chart

10.5.5 Scatter plot chart

10.6 Improving the look of our charts

10.7 Moving forward with your own projects

Summary

11 Server-side visualization

11.1 Expanding your toolkit

11.2 Getting the code and data

11.3 The headless browser

11.4 Using Nightmare for server-side visualization

11.4.1 Why Nightmare?

11.4.2 Nightmare and Electron

11.4.3 Our process: capturing visualizations with Nightmare

11.4.4 Prepare a visualization to render

11.4.5 Starting the web server

11.4.6 Procedurally start and stop the web server

11.4.7 Rendering the web page to an image

11.4.8 Before we move on…

11.4.9 Capturing the full visualization

11.4.10 Feeding the chart with data

11.4.11 Multi-page reports

11.4.12 Debugging code in the headless browser

11.4.13 Making it work on a Linux server

11.5 You can do much more with a headless browser

11.5.1 Web scraping

11.5.2 Other uses

11.6 Summary

12 Live data

12.1 We need an early warning system

12.2 Getting the code and data

12.3 Dealing with live data

12.4 Building a system for monitoring air quality

12.5 Setup for development

12.6 Live streaming data

12.6.1 HTTP POST for infrequent data submission

12.6.2 Sockets for high frequency data submission

12.7 Refactor for configuration

12.8 Data capture

12.9 An event-based architecture

12.10 Code restructure for event handling

12.10.1 Triggering SMS alerts

12.10.2 Automatically generating a daily report

12.11 Live data processing

12.12 Live visualization

Summary

13 Advanced visualization with D3

13.1 Advanced visualization

13.2 Getting the code and data

13.3 Visualizing space junk

13.4 What is D3?

13.5 The D3 data pipeline

13.6 Basic setup

13.7 SVG crash course

13.7.1 SVG Circle

13.7.2 Styling

13.7.3 SVG Text

13.7.4 SVG group

13.8 Building visualizations with D3

13.8.1 Element state

13.8.2 Selecting elements

13.8.3 Manually adding elements to our visualization

13.8.4 Scaling to fit

13.8.5 Procedural generation the D3 way

13.8.6 Loading a data file

13.8.7 Color coding the space junk

13.8.8 Adding interactivity

13.8.9 Adding a year-by-year launch animation

Summary

14 Getting to production

14.1 Production concerns

14.2 Taking our early warning system to production

14.3 Deployment

14.4 Monitoring

14.5 Reliability

14.5.1 System longevity

14.5.2 Practice defensive programming

14.5.3 Data protection

14.5.4 Testing and automation

14.5.5 Handling unexpected errors

14.5.6 Designing for process restart

14.5.7 Dealing with an ever-growing database

14.6 Security

14.6.1 Authentication and authorization

14.6.2 Privacy and confidentiality

14.6.3 Secret configuration

14.7 Scaling

14.7.1 Measurement before optimization

14.7.2 Vertical scaling

14.7.3 Horizontal scaling

Summary

Appendixes

Appendix A: JavaScript cheat sheet

Appendix B: Data-Forge cheat sheet

Appendix C: Getting Started with Vagrant

About the Technology

Why not handle your data analysis in JavaScript? Modern libraries and data handling techniques mean you can collect, clean, process, store, visualize, and present web application data while enjoying the efficiency of a single-language pipeline and data-centric web applications that stay in JavaScript end to end.

About the book

Data Wrangling with JavaScript promotes JavaScript to the center of the data analysis stage! With this hands-on guide, you’ll create a JavaScript-based data processing pipeline, handle common and exotic data, and master practical troubleshooting strategies. You’ll also build interactive visualizations and deploy your apps to production. Each valuable chapter provides a new component for your reusable data wrangling toolkit.

What's inside

  • Establishing a data pipeline
  • Acquisition, storage, and retrieval
  • Handling unusual data sets
  • Cleaning and preparing raw data
  • Interactive visualizations with D3

About the reader

Written for intermediate JavaScript developers. No data analysis experience required.

About the author

Ashley Davis is a software developer, entrepreneur, author, and the creator of Data-Forge and Data-Forge Notebook, software for data transformation, analysis, and visualization in JavaScript.


Do you still think that you need R and Python skills to do data analysis? This mind-shifting book explains that JavaScript is enough!

Ubaldo Pescatore

Does a fantastic job detailing the wrangling process, the tools involved, and the issues and concerns to expect without ever leaving the JavaScript domain.

Alex Basile

Excellent real-world examples for full-stack JavaScript developers.

Sai Kota