The Art of Data Usability
Tryggvi Björgvinsson
  • ISBN 9781617294716
  • 300 pages (estimated)
  • printed in black & white

Data is only valuable if it's useful. If you're responsible for making meaningful data available to business stakeholders, researchers, or even the general public, you need a predictable process for discerning the users' needs and delivering the right data in the right way. So how do you establish the correct priorities and measures of relevance? How do you continuously improve your data projects? This is the essential art of data usability.

Table of Contents

Part 1: Foundations

1. Learning from the past

1.1. Basing improvements on past insights

1.2. Understanding the DIKW hierarchy

1.2.1. Finding the path to wisdom

1.2.2. Data

1.2.3. Information

1.2.4. Knowledge

1.2.5. Wisdom

1.3. What to expect from this book

1.3.1. Overview of the usability work

1.4. Summary

2. Creating the perfect world

2.1. TL;DR

2.2. What exactly is quality?

2.2.1. Quality is about answering needs

2.2.2. Quality dimensions

2.3. Finding out what people want

2.3.1. Quality levels

2.3.2. Quality attributes

2.4. Managing quality

2.4.1. The scientific method

2.4.2. The quality cycle

2.5. Working with the quality cycle

2.5.1. Planning and designing metrics

2.5.2. Implementing controls and changes

2.5.3. Analyzing the implementation

2.5.4. Establishing a new baseline

2.6. If you like data, you’ll also like quality management

2.6.1. Document everything

2.6.2. The components of quality management

2.6.3. The difference of data quality

2.7. Automating bureaucracy

2.7.1. The world is always changing

2.7.2. Constant checking to improve

2.7.3. An automated quality control

2.8. Summary

3. The structure of a data project

3.1. TL;DR

3.2. Structuring a data project

3.3. Design

3.4. Management

3.4.1. Common quality attributes in data management

3.5. Collection

3.5.1. Common quality attributes in collection

3.6. Processing

3.6.1. Common quality attributes in processing

3.7. Dissemination

3.7.1. Common quality attributes in dissemination

3.8. Closing

3.9. Remember all stages when identifying user needs

3.10. Summary

4. Knowing what people want

4.1. TL;DR

4.2. Designing data usability into the project from the start

4.2.1. Prioritizing needs manually and semi-automatically

4.3. Identifying and prioritizing user groups

4.3.1. Knowing your users

4.3.2. Prioritizing users by importance

4.4. Identifying and prioritizing situations

4.4.1. Identifying situations that may arise

4.4.2. Prioritizing situations

4.5. Identify and prioritize needs

4.5.1. Identify needs of users in situations

4.5.2. Prioritize needs of users in situations

4.6. Assisted prioritizing

4.6.1. User configuration

4.6.2. Situations configuration file

4.6.3. Prioritization of entities

4.6.4. Mapping the needs

4.6.5. Prioritize the needs

4.6.6. These are the priorities for your quality cycles

4.6.7. Find the appropriate level for your quality metrics

4.7. Summary

5. Applying continuous quality control

5.1. TL;DR

5.2. System monitoring tools

5.3. Finding a monitoring solution

5.3.1. Understanding Nagios plugins

5.3.2. Returning detail messages

5.3.3. Writing quality controls as monitoring plugins

5.3.4. Create the plugin

5.4. Setting up our quality control

5.4.1. Controlling our quality control

5.5. Our monitoring tool

5.5.1. Creating configuration files

5.5.2. Monitoring the test plugin

5.6. Summary

6. Setting up your workflow

6.1. TL;DR

6.2. Methodologies and methods

6.2.1. You need to use a methodology

6.2.2. Agile usability

6.3. Tracking team progress with Kanban

6.3.1. Using digital or physical cards

6.3.2. Writing Kanban cards

6.3.3. Kanban card quality plan templates

6.3.4. A Kanban card for usefulness

6.4. Your quality Kanban board

6.4.1. Starting a work process

6.4.2. Implementing controls in order

6.4.3. Implementing changes

6.4.4. Analyzing changes

6.4.5. Establishing the baseline

6.4.6. Extending the work process

6.4.7. Integrating the monitoring solution

6.5. Summary

7. Maintaining quality controls after project end

7.1. TL;DR

7.2. Setting up quality maintenance

7.2.1. Dedicated maintenance monitor

7.3. The end of a project

7.3.1. Moving the traffic sensor project to maintenance

7.4. Moving quality attributes to maintenance

7.4.1. Filtering traffic data quality controls

7.4.2. Running the maintenance monitor

7.5. Summary

Part 2: Tips

8. The reference period

8.1. TL;DR

8.2. Availability

8.2.1. Ensuring or improving availability

8.2.2. Setting up the hot backup solution

8.3. Monitoring availability

8.3.1. Defining the quality controls reference period

8.3.2. Two quality controls for metrics

8.3.3. Recording state changes

8.4. Summary

9. Utilizing Warnings in Monitoring Solutions

9.1. TL;DR

9.2. Understandability

9.2.1. GitHub blog posts

9.2.2. Warnings instead of errors

9.2.3. Blog post

9.3. Monitoring understandability

9.3.1. Word lists

9.3.2. Creating the quality control

9.3.3. Checking understandability

9.3.4. Revisiting the use of warnings

9.4. Summary

10. Recurring attributes

10.1. TL;DR

10.2. Timeliness

10.2.1. The difficulties of vague quality levels

10.2.2. Recurring attributes require manual work

10.2.3. Automating what we can

10.2.4. Process dashboard

10.3. Monitoring timeliness

10.3.1. Floating reference period

10.3.2. Writing the quality control

10.3.3. Try out the quality control

10.4. Summary

11. The KISS of quality

11.1. TL;DR

11.2. Reproducibility

11.2.1. Internet memes

11.2.2. Original ETL program

11.2.3. Later ETL process

11.2.4. Comparison of ETL programs

11.2.5. Picking the right ETL process

11.2.6. Create the reproducibility program

11.3. Monitoring reproducibility

11.3.1. Create test data

11.3.2. Using software tests to control quality

11.3.3. Create tests

11.3.4. Create the quality control plugin

11.3.5. You need another quality control plugin

11.3.6. Try out the quality controls

11.4. Summary

12. Combining smaller controls into a meta-control

12.1. TL;DR

12.2. Completeness

12.2.1. What do I mean by metadata?

12.2.2. Example dataset

12.2.3. Filling in the metadata

12.2.4. Other resources

12.3. Monitoring completeness

12.3.1. Checking the metadata

12.3.2. Checking data values

12.3.3. Trying our quality controls out

12.3.4. Combining multiple checks into one

12.3.5. Try the meta-quality control

12.4. Summary

13. Improving an attribute with another attribute

13.1. TL;DR

13.2. Recoverability

13.3. Integrity

13.3.1. Checksums

13.3.2. Test checksums with a data generating program

13.4. Backing up our files

13.5. Monitoring recoverability

13.5.1. One usability attribute can be required for another

13.5.2. Create the quality control plugin

13.5.3. Try out the quality control

13.6. Summary

14. Checking what doesn’t exist

14.1. TL;DR

14.2. Forgetability

14.2.1. Inputting the data

14.3. Monitoring auditability

14.3.1. Try out the quality control

14.4. Summary

15. Hooking monitoring into a feedback process

15.1. TL;DR

15.2. Validity

15.2.1. Schemas

15.2.2. CSV schemas

15.2.3. Using data packages

15.3. Monitoring validity

15.3.1. Create the quality control

15.3.2. Try out the quality control

15.3.3. Provide feedback to those who can improve the data

15.4. Summary

16. Using the data you expect

16.1. TL;DR

16.2. Normality

16.2.1. Applicability of normal distribution to the public transport system

16.2.2. Bus data

16.3. Monitoring normality

16.3.1. The Empirical rule

16.3.2. Creating the quality control

16.3.3. Try out the quality control

16.4. Report forward

16.5. Summary


Appendix A: Docker

A.1. What is Docker?

A.1.1. Docker images and containers

A.1.2. Installing Docker

A.2. Running Docker images

A.2.1. Docker networks

A.2.2. Stopping and removing containers

Appendix B: Installing and using Python

B.1. Installation

B.1.1. GNU/Linux

B.1.2. Mac OS X

B.1.3. Microsoft Windows

B.2. Virtual environments

B.3. Chapter requirements

B.4. Create your first Python program

Appendix C: Data formats

C.1. CSV


About the book

The Art of Data Usability teaches you to think about data quality in context, presenting a methodology to maximize the usefulness of data for its intended consumers. In this practical guide, you'll master an iterative process for identifying and refining user data needs and reflecting those requirements in your data projects. You'll benefit from author Tryggvi Björgvinsson's years of experience delivering artful data projects as you learn to apply quality management principles to your projects, collect techniques to monitor the attributes you need from your data, and develop Python-based scripts to run against datasets. You'll also discover which parts of the process can be automated and develop an intuition for when good, old-fashioned whiteboarding is a better idea. With these best practices, crystal-clear instructions, and hands-on projects, you'll be able to get data that you can really trust!

What's inside

  • Attributes of quality data
  • Identifying user needs and requirements
  • Using Python for data quality monitoring
  • The correct way to disseminate your data
  • Best practices and methods to improve data usability

About the reader

Written for readers comfortable with data management and common data formats such as CSV and JSON.

About the author

Tryggvi Björgvinsson is the head of IT and dissemination at Statistics Iceland. He holds a Ph.D. in software engineering.

Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.
