The Art of Data Usability
Tryggvi Björgvinsson
  • MEAP began June 2017
  • Publication in Summer 2018 (estimated)
  • ISBN 9781617294716
  • 300 pages (estimated)
  • printed in black & white

Data is only valuable if it's useful. If you're responsible for making meaningful data available to business stakeholders, researchers, or even the general public, you need a predictable process for discerning the users' needs and delivering the right data in the right way. So how do you establish the correct priorities and measures of relevance? How do you continuously improve your data projects? This is the essential art of data usability.

Table of Contents

Part 1: Introduction

1. Learning from the past

1.1. Basing improvements on past insights

1.2. Finding the path to wisdom

1.2.1. Data

1.2.2. Information

1.2.3. Knowledge

1.2.4. Wisdom

1.2.5. DIKW hierarchy

1.3. Structuring a data project

1.3.1. Design for different situations

1.3.2. Set up the infrastructure

1.3.3. Collect and process data

1.3.4. Announce the results

1.3.5. Wrap things up

1.4. What to expect from this book

1.4.1. Book structure

1.4.2. Learn by doing

1.5. Summary

2. Creating the perfect world

2.1. Setting up your chapter environment

2.2. What exactly is quality?

2.2.1. Quality is about answering needs

2.2.2. Quality dimensions

2.3. Finding out what people want

2.3.1. Quality levels

2.3.2. Quality attributes

2.4. Managing quality

2.4.1. The scientific method

2.4.2. The quality cycle

2.5. Working with the quality cycle

2.5.1. How to generate example data

2.5.2. Create the script

2.5.3. Planning and designing metrics

2.5.4. Implementing controls and changes

2.5.5. Analyzing the implementation

2.5.6. Establishing a new baseline

2.6. The importance of documenting everything

2.6.1. If you like data, you’ll also like quality management

2.6.2. The components of quality management

2.7. The difference of data quality

2.7.1. Automating bureaucracy

2.7.2. The world is always changing

2.7.3. Constantly check whether you can improve

2.7.4. Example of an automated quality control

2.7.5. This book is about the automation of quality controls

2.8. Summary

Part 2: Designing a Data Project

3. Knowing what people want

3.1. Design data usability into the project from the start

3.2. Usability is about various people in different situations

3.2.1. Prioritizing can be done manually and semi-automatically

3.3. Identify and prioritize user groups

3.3.1. Know who your users are

3.3.2. Prioritize users based on needs

3.3.3. Use computer programs to aid us

3.3.4. Make it configurable

3.3.5. Think of your users as coordinates

3.3.6. Create the program

3.3.7. The program is flexible under different circumstances

3.4. Identify and prioritize situations

3.4.1. Find situations that may come up

3.4.2. Prioritize the situations

3.4.3. Reuse the prioritization program we created earlier

3.5. Identify needs of users in situations

3.5.1. Program to assist the mapping

3.5.2. Using the generated mapping file

3.5.3. Create the program

3.5.4. Mapping the needs

3.6. Prioritize needs of users in situations

3.6.1. Create the program

3.6.2. Find the needs with the highest priority

3.6.3. These are the priorities for your quality cycles

3.6.4. Find the appropriate level for your quality metrics

3.7. Summary

4. Applying continuous quality control

4.1. System monitoring tools

4.2. Finding a monitoring solution

4.2.1. Create a non-production monitoring tool to grasp the concepts

4.2.2. Real-world quality controls

4.2.3. How Nagios plugins work

4.2.4. Getting an explanation to the quality manager

4.3. Writing quality controls as monitoring plugins

4.3.1. Create the plugin

4.3.2. Controlling the example quality control

4.4. The book’s monitoring tool

4.4.1. Configuration for operating system compatibility

4.4.2. Limited functionality

4.4.3. Configuration of plugins

4.4.4. Example configuration file

4.4.5. Create the program

4.4.6. Monitor our test plugin

4.5. Summary

Part 3: Managing Data

5. Living with failures

5.1. Availability

5.2. Ensuring or improving availability

5.2.1. Create the hot backup solution

5.2.2. Try out the hot backup

5.2.3. Read the available file

5.3. Monitoring availability

5.3.1. Define the right reference period for your quality controls

5.3.2. How to monitor availability

5.3.3. Record state changes

5.3.4. Create the state recording program

5.3.5. Check availability based on state changes

5.3.6. Create the quality control plugin

5.3.7. Configuration for our monitoring tool

5.4. Summary

6. Coping with disasters

6.1. Recoverability

6.2. Integrity

6.2.1. Checksums

6.2.2. Test checksums with a data generating program

6.2.3. Create a checksum computing program

6.3. Ensuring or improving recoverability

6.3.1. Create the cold backup solution

6.3.2. Try out the cold backup

6.4. Monitoring recoverability

6.4.1. One usability attribute can be required for another

6.4.2. Create the quality control plugin

6.4.3. Configuration for our monitoring tool

6.5. Summary

7. Getting the right data

7.1. Validity

7.1.1. Schemas

7.1.2. CSV schemas

7.1.3. Using CSV on the Web

7.1.4. Create a CSV on the Web validator

7.1.5. Validating the CSV file

7.1.6. Create the validation program

7.2. Monitoring validity

7.2.1. Create the quality control plugin

7.2.2. Provide feedback to those who can improve the data

7.2.3. Configuration for our monitoring tool

7.3. Summary

Part 4: Collecting Data

8. Reloading the source data

8.1. Reproducibility

8.1.1. Original ETL program

8.1.2. Later ETL process

8.1.3. Comparison of ETL programs

8.1.4. Picking the right ETL process

8.1.5. Revisiting validation

8.1.6. Create the validation program

8.1.7. Create the reproducibility program

8.2. Monitoring reproducibility

8.2.1. Create test data

8.2.2. Create the quality control plugin

8.2.3. You need another quality control plugin

8.3. Summary

Part 5: Processing Data

9. Delivering on time

9.1. Timeliness

9.1.1. The difficulties of vague quality levels

9.1.2. Recurring attributes require manual work

9.1.3. Automating what we can

9.1.4. Creating the assisting program

9.1.5. Trying out the program

9.2. Monitoring timeliness

9.2.1. Floating reference period

9.2.2. Writing the quality control

9.2.3. Try out the quality control

9.2.4. Configuration for our monitoring tool

9.3. Summary

10. Getting the results you expect

Part 6: Dissemination of Data

11. Making yourself understood

12. Being transparent

Part 7: Closing a Data Project

13. The right to be forgotten

14. Opening up


Appendix A: Installing and using Python

A.1. Installation

A.1.1. GNU/Linux

A.1.2. Mac OS X

A.1.3. Microsoft Windows

A.2. Virtual environments

A.3. Create your first Python program

Appendix B: Data formats

B.1. CSV


About the book

The Art of Data Usability teaches you to think about data quality in context, presenting a methodology to maximize the usefulness of data for its intended consumers. In this practical guide, you'll master an iterative process for identifying and refining user data needs and reflecting those requirements in your data projects. You'll benefit from author Tryggvi Björgvinsson's years of experience delivering artful data projects as you learn to apply quality management principles to your projects, collect techniques to monitor the attributes you need from your data, and develop Python-based scripts to run against datasets. You'll also discover which parts of the process can be automated and develop an intuition for when good, old-fashioned whiteboarding is a better idea. With these best practices, crystal-clear instructions, and hands-on projects, you'll be able to get data that you can really trust!

What's inside

  • Attributes of quality data
  • Identifying user needs and requirements
  • Using Python for data quality monitoring
  • The correct way to disseminate your data
  • Best practices and methods to improve data usability

About the reader

Written for readers comfortable with data management and common data formats such as CSV and JSON.

About the author

Tryggvi Björgvinsson is the head of IT and dissemination at Statistics Iceland. He holds a Ph.D. in software engineering.

Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it’s in bookstores.