I have been using gnuplot for 15 years, and it’s an indispensable part of my toolset: one of the handful of programs I can’t do without.
Initially, I used gnuplot as part of my academic research work as a theoretical condensed matter physicist. But much later, when I joined Amazon.com, I found myself using gnuplot again, this time to analyze the movement of workers in Amazon’s gargantuan warehouses and the distribution of packages to customers. Later yet, I found gnuplot helpful when analyzing web traffic patterns for the Walt Disney Company.
I find gnuplot indispensable because it lets me see data, and do so in an easy, uncomplicated manner. Using gnuplot, I can draw and redraw graphs and look at data in different ways. I can generate images of data sets containing millions of points, and I can script gnuplot to create graphs for me automatically.
These things matter. In one of my assignments, I was able to discover highly relevant information because I was able to generate literally hundreds of graphs. Putting all of them on a web page next to each other revealed blatant similarities (and differences) between different data sets, a fact that had never before been noticed, not least because everybody else was using tools (mostly Excel) that would only allow graphs to be created one at a time.
While at Amazon, I discovered something else: data is no longer confined to the science lab. In a modern corporation, data is everywhere. Any reasonably sophisticated organization is constantly collecting data: sales numbers, web traffic, inventory, turnover, database performance, supply chain details, you name it. Naturally, there’s a continuous and ever-increasing demand to make use of this data to improve the business.
What this means is that data analysis is no longer a specialist’s job: everybody has a need for it, even if only to monitor one’s own metrics or performance indicators. This isn’t a bad thing. The way inputs influence outputs is often not obvious, and placing decisions on a firmer, more rational footing is reasonable.
But what I also found at Amazon and elsewhere is that the people doing the data analysis often don’t have the right toolset, both in terms of actual software tools and in regard to methods and techniques.
In many ways, my experience in the corporate world has been an influence while writing this book. I believe that graphical methods, which are accessible to anyone regardless of mathematical or statistical training, are an excellent way to understand data and derive value from it (much better and more powerful than a five-day statistics class, and much more flexible and creative than a standard Six-Sigma program).
And I believe that gnuplot is a very good tool to use for this purpose. Its learning curve is flat: you can pick up the basics in an hour. It requires no programming skills. It handles a variety of input formats. It’s fast and it’s interactive. It’s mature. It’s also free and open source.
Gnuplot has always been popular with scientists all over; I hope to convince you that it can be useful to a much larger audience. Business analysts, operations managers, database and data warehouse administrators, programmers: anybody who wants to understand data with graphs.
I’d like to show you how to do it.