Data Wrangling with JavaScript
Ashley Davis
  • MEAP began January 2018
  • Publication in November 2018 (estimated)
  • ISBN 9781617294846
  • 430 pages (estimated)
  • printed in black & white

It's as if the author was really showing us every step. Reading seems like doing.

David Krief
If you're a JavaScript developer, you already know that working with data is a big deal. Why let the Python and R coders get all the glory? JavaScript isn't just good at data visualization: you can move your entire data wrangling pipeline to JavaScript and work more effectively. Data Wrangling with JavaScript teaches you core data munging techniques in JavaScript, along with many libraries and tools that will make your data tasks even easier.
Table of Contents

1 Getting started: establishing your data pipeline

1.1 Why data wrangling?

1.2 What is data wrangling?

1.3 Why a book on JavaScript data wrangling?

1.4 What will you get out of this book?

1.5 Why use JavaScript for data wrangling?

1.6 Is JavaScript appropriate for data analysis?

1.7 Navigating the JavaScript ecosystem

1.8 Assembling your toolkit

1.9 Establishing your data pipeline

1.9.1 Setting the stage

1.9.2 The data wrangling process

1.9.3 Planning

1.9.4 Acquisition, storage and retrieval

1.9.5 Exploratory coding

1.9.6 Clean and prepare

1.9.7 Analysis

1.9.8 Visualization

1.9.9 Getting to production

1.10 Summary

2 Getting started with Node.js

2.1 Starting your toolkit

2.2 Building a simple reporting system

2.3 Getting the code and data

2.4 Installing Node.js

2.5 Working with Node.js

2.5.1 Creating a Node.js project

2.5.2 Creating a command line application

2.5.3 Creating a code library

2.5.4 Creating a simple web server

2.6 Asynchronous coding

2.6.1 Loading a single file

2.6.2 Loading multiple files

2.6.3 Error handling

2.6.4 Asynchronous coding with promises

2.6.5 Wrapping asynchronous operations in promises

2.6.6 Async coding with “async” and “await”

2.7 Summary

3 Acquisition, storage and retrieval

3.1 Building out your toolkit

3.2 Getting the code and data

3.3 The core data representation

3.3.1 The earthquakes web site

3.3.2 Data formats covered

3.3.3 Power and flexibility

3.4 Importing data

3.4.1 Loading data from text files

3.4.2 Loading data from a REST API

3.4.3 Parsing JSON text data

3.4.4 Parsing CSV text data

3.4.5 Importing data from databases

3.4.6 Importing data from MongoDB

3.4.7 Importing data from MySQL

3.5 Exporting data

3.5.1 We need some data to export!

3.5.2 Exporting data to text files

3.5.3 Exporting data to JSON text files

3.5.4 Exporting data to CSV text files

3.5.5 Exporting data to a database

3.5.6 Exporting data to MongoDB

3.5.7 Exporting data to MySQL

3.6 Building complete data conversions

3.7 Expanding the process

3.8 Summary

4 Working with unusual data

4.1 Getting the code and data

4.2 Importing custom data from text files

4.3 Importing data by scraping web pages

4.4 Working with binary data

4.4.1 Unpacking a custom binary file

4.4.2 Packing a custom binary file

4.4.3 Replacing JSON with BSON

4.5 Summary

5 Exploratory coding

5.1 Expanding your toolkit

5.2 Analyzing car accidents

5.3 Getting the code and data

5.4 Iteration and your feedback loop

5.5 A first pass at understanding our data

5.6 Working with a reduced data sample

5.7 Prototyping with Excel

5.8 Exploratory coding with Node.js

5.9 Exploratory coding in the browser

5.10 Putting it all together

5.11 Summary

6 Clean and prepare

6.1 Expanding our toolkit

6.2 Preparing the reef data

6.3 Getting the code and data

6.4 The need for data clean-up and preparation

6.5 Where does broken data come from?

6.6 How does data clean-up fit into the pipeline?

6.7 Identifying bad data

6.8 Kinds of problems

6.9 Responses to bad data

6.10 Techniques for fixing bad data

6.11 Cleaning our data set

6.11.1 Rewriting bad rows

6.11.2 Filtering rows of data

6.11.3 Filtering columns of data

6.12 Preparing our data for effective use

6.12.1 Aggregating rows of data

6.12.2 Combining data from different files using Globby

6.12.3 Splitting data into separate files

6.13 Building a data processing pipeline with Data-Forge

6.14 Summary

7 Dealing with huge data files

7.1 Expanding our toolkit

7.2 Fixing temperature data

7.3 Getting the code and data

7.4 When conventional data processing breaks down

7.5 The limits of Node.js

7.6 Incremental data processing

7.6.1 Incremental core data representation

7.6.2 Node.js file streams basics primer

7.6.3 Transforming huge CSV files

7.6.4 Transforming huge JSON files

7.6.5 Mix and match

7.7 Summary

8 Working with a mountain of data

8.1 Expanding our toolkit

8.2 Dealing with a mountain of data

8.3 Getting the code and data

8.4 Techniques for working with big data

8.5 More Node.js limitations

8.6 Divide and conquer

8.7 Working with large databases

8.7.1 Database setup

8.7.2 Opening a connection to the database

8.7.3 Moving large files to your database

8.7.4 Incremental processing with a database cursor

8.7.5 Incremental processing with data windows

8.7.6 Creating an index

8.7.7 Filtering using queries

8.7.8 Discarding data with projection

8.7.9 Sorting large data sets

8.8 Achieving better data throughput

8.8.1 Optimize your code

8.8.2 Optimize your algorithm

8.8.3 Processing data in parallel

8.9 Summary

9 Practical data analysis

9.1 Expanding your toolkit

9.2 Analyzing the weather data

9.3 Getting the code and data

9.4 Basic data summarization

9.4.1 Sum

9.4.2 Average

9.4.3 Standard deviation

9.5 Group and summarize

9.6 The frequency distribution of temperatures

9.7 Time series

9.7.1 Yearly average temperature

9.7.2 Rolling average

9.7.3 Rolling standard deviation

9.7.4 Linear regression

9.7.5 Comparing time series

9.7.6 Stacking time series operations

9.8 Understanding relationships

9.9 Summary

10 Browser-based visualization

10.1 Expanding your toolkit

10.2 Getting the code and data

10.3 Choosing a chart type

10.4 Line chart for New York City temperature

10.4.1 The most basic C3 line chart

10.4.2 Adding real data

10.4.3 Parsing the static CSV file

10.4.4 Adding years as the X axis

10.4.5 Creating a custom Node.js web-server

10.4.6 Adding another series to the chart

10.4.7 Adding a second Y axis to the chart

10.4.8 Rendering a time series chart

10.5 Other chart types with C3

10.5.1 Bar chart

10.5.2 Horizontal bar chart

10.5.3 Pie chart

10.5.4 Stacked bar chart

10.5.5 Scatter plot chart

10.6 Improving the look of our charts

10.7 Moving forward with your own projects

10.8 Summary

11 Server-side visualization

11.1 Expanding your toolkit

11.2 Getting the code and data

11.3 The headless browser

11.4 Using Nightmare for server-side visualization

11.4.1 Why Nightmare?

11.4.2 Nightmare and Electron

11.4.3 Our process: capturing visualizations with Nightmare

11.4.4 Prepare a visualization to render

11.4.5 Starting the web server

11.4.6 Procedurally start and stop the web server

11.4.7 Rendering the web page to an image

11.4.8 Before we move on...

11.4.9 Capturing the full visualization

11.4.10 Feeding the chart with data

11.4.11 Multi-page reports

11.4.12 Debugging code in the headless browser

11.4.13 Making it work on a Linux server

11.5 There’s much more you can do with a headless browser

11.6 Summary

12 Live data

12.1 We need an early warning system

12.2 Getting the code and data

12.3 Dealing with live data

12.4 Building a system for monitoring air quality

12.5 Setup for development

12.6 Live streaming data

12.6.1 HTTP POST for infrequent data submission

12.6.2 Sockets for high frequency data submission

12.7 Refactor for configuration

12.8 Data capture

12.9 An event-based architecture

12.10 Code restructure for event handling

12.10.1 Triggering SMS alerts

12.10.2 Automatically generating a daily report

12.11 Live data processing

12.12 Live visualization

12.13 Summary

13 Advanced visualization with D3

13.1 Advanced visualization

13.2 Getting the code and data

13.3 Visualizing space junk

13.4 What is D3?

13.5 The D3 data pipeline

13.6 Basic setup

13.7 SVG crash course

13.7.1 SVG Circle

13.7.2 Styling

13.7.3 SVG Text

13.7.4 SVG group

13.8 Building visualizations with D3

13.8.1 Element state

13.8.2 Selecting elements

13.8.3 Manually adding elements to our visualization

13.8.4 Scaling to fit

13.8.5 Procedural generation the D3 way

13.8.6 Loading a data file

13.8.7 Color coding the space junk

13.8.8 Adding interactivity

13.8.9 Adding a year-by-year launch animation

13.9 Summary

14 Getting to production

14.1 Production concerns

14.2 Taking our early warning system to production

14.3 Deployment

14.4 Monitoring

14.5 Reliability

14.5.1 System longevity

14.5.2 Practice defensive programming

14.5.3 Data protection

14.5.4 Testing and automation

14.5.5 Handling unexpected errors

14.5.6 Designing for process restart

14.5.7 Dealing with an ever-growing database

14.6 Security

14.6.1 Authentication and authorization

14.6.2 Privacy and confidentiality

14.6.3 Secret configuration

14.7 Scaling

14.7.1 Measurement before optimization

14.7.2 Vertical scaling

14.7.3 Horizontal scaling

14.8 Summary


Appendix A: JavaScript cheat sheet

Appendix B: Data-Forge cheat sheet

Appendix C: Data wrangling toolset

Appendix D: Getting Started with Vagrant

About the Technology

JavaScript is capable of handling most common data collection, cleaning, analysis and presentation tasks just as easily as R or Python. With a growing ecosystem of tools and libraries available, and the flexibility to run on many platforms (web, desktop and mobile), JavaScript is a terrific all-round environment for all your data wrangling needs!

About the book

Data Wrangling with JavaScript teaches you the art of collecting, managing, cleaning, and analyzing data with JavaScript. In this practical book written with existing JavaScript developers in mind, you'll start by setting up your JavaScript and Node.js-based data wrangling pipeline. Then, you'll systematically work through core techniques for acquiring, storing, and retrieving data of all sorts, ranging from text and CSV files to databases and REST APIs. You'll explore JavaScript-based data tools like Globby and Data-Forge, manipulate huge datasets with Node.js, and deal with unusual data sources such as scraped web pages and custom binary files. Master data wrangler Ashley Davis guides you through the most important data analysis skills and teaches you how to explore, understand, and visualize your data. Because you'll be using real-world data at each step of the process, you'll be confident that you can apply your new skills immediately.

What's inside

  • Establishing a data pipeline
  • Acquisition, storage, and retrieval
  • How to handle unusual data sets
  • Cleaning and preparing raw data
  • Visualizing your results

About the reader

Written for developers with experience using JavaScript. No prior knowledge of data analytics is needed.

About the author

Ashley Davis is a software developer, entrepreneur, writer, and stock trader. He is the creator of Data-Forge, a data transformation and analysis toolkit for JavaScript inspired by Pandas and Microsoft LINQ.

Manning Early Access Program (MEAP) Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.

Not only did I learn the data pipeline around data analysis, but also details on concepts and tools I've either heard of or thought I knew. This book is not only insightful but also invaluable to developers that need to analyze data, especially in the JavaScript domain!

James Wang

Whoever said Python is the data language was deeply wrong! JavaScript's capabilities and tools for processing, visualizing and analyzing data are unrivaled.

Pablo Farias Navarro, Founder of Zenva

If you are after building production-ready, data-powered apps and dashboards, Ashley's book will provide you with battle-tested techniques, frameworks and good practices for developing professional server and client-side data projects from the ground up.

Pablo Farias Navarro, Founder of Zenva