Data Munging with Perl

David Cross

2001 | 304 pages
ISBN: 1930110006

Out of Print
$23.50 eBook edition (PDF only)


DESCRIPTION

Your desktop dictionary may not include it, but 'munging' is a common term in the programmer's world. Many computing tasks require taking data from one computer system, manipulating it in some way, and passing it to another. Munging can mean manipulating raw data to achieve a final form. It can mean parsing or filtering data, or the many steps required for data recognition. Or it can be something as simple as converting hours worked plus pay rates into a salary cheque.

This book shows you how to process data productively with Perl. It discusses general munging techniques and how to think about data munging problems. You will learn how to decouple the various stages of munging programs, how to design data structures, and how to emulate the Unix filter model. If you need to work with complex data formats, it will teach you how to do that and how to build your own tools to process these formats. The book includes detailed techniques for processing HTML and XML. It also shows you how to build your own parsers to process data of arbitrary complexity.

If you are a programmer who munges data, this book will save you time. It will teach you systematic and powerful techniques using Perl. If you are not a Perl programmer, this book may just convince you to add Perl to your repertoire.

Translation rights for Data Munging with Perl have been granted for Japan, Brazil, and Germany. If you are interested in learning where to buy this book in a language other than English, please inquire at your local bookseller.

WHAT THE EXPERTS SAY ABOUT THIS BOOK

"I found the sample problems and the author's solutions to be very well done. I especially liked the design tips..."
--Pikes Peak Perl Mongers

"well worth the price, and a good starting point for more advanced forays."
--Use.Perl.com

"...a very good resource for programmers who want to learn more about data parsing, data filters, and data conversion..."
--ACM Computing Reviews

ABOUT THE AUTHOR

Dave Cross is the owner and Managing Director of Magnum Solutions Ltd., an internet and database consultancy based in London. He has 12 years' experience working in the IT industry. He is an active member of the Perl community, the founder of the London Perl Mongers, and a regular columnist for Perlmonth, the online Perl magazine.

SAMPLE CHAPTERS

The table of contents, two sample chapters, and the index from Data Munging with Perl are available in PDF format. You need Adobe's free Acrobat Reader software to view them. You may download Acrobat Reader here.

Download the Table of Contents
Download Chapter 2
Download Chapter 3
Download the Index

REVIEWS

"It is well written, informative, thought provoking, and will be as relevant five years from now as it is today. In short, what are you waiting for? Go and buy a copy."
--Dr. Dobb's Journal, May 2003

"I found the sample problems and the author's solutions to be very well done. I especially liked the design tips...I give the book a rating of 4.5 of a possible 5."
--Pikes Peak Perl Mongers

"From the word go, the author clearly sets out data munging basics and why perl is a good choice for this task...While 'Data Munging with Perl' is not presented as a traditional reference text, it tackles data munging problems cleanly enough to serve as one for the issues touched upon...unless you are already confident of your munging capabilities then this text deserves a place on your bookshelf."
--Edinburgh Perl Mongers

"...a very good resource for programmers who want to learn more about data parsing, data filters, and data conversion...It covers several different modules from CPAN (unlike many perl books on the market), as well as HTML and XML manipulation."
--ACM Computing Reviews

"well worth the price, and a good starting point for more advanced forays into (or extension work on) the topics discussed."
--Use.Perl.com

"Dave Cross makes an admirable effort at steering people away from regular expressions when not needed by pointing out some of the other useful facilities."
--DiverseBooks.com

"He not only lists the language's many capabilities for 'munging' data, but also explains when and where to use each capability...What I enjoyed most about Cross' book was his emphasis on good design. It would be easy for an author to simply list all of Perl's many functions and features related to data manipulation. Cross avoids that temptation, however, and examines ways to write fast, reusable, and useful programs...The book's chapters are concise, the coverage is comprehensive, and the examples are plentiful and relevant. I've been using Perl's data munging capabilities heavily for many years, and I still picked up some useful new insights from Cross' book."
--Web Techniques Magazine, June 01 issue
--Also online at http://www.webtechniques.com/archives/2001/06/book/

"Data Munging with Perl's 12 chapters address progressively more 'interesting' types of data, with strategies for dealing with each. Each section includes an introductory rationale followed by examples in Perl...shows some good Perl programming, and provides convincing evidence of the value of data structures beyond the halls of academia along the way."
--ERCB Rating *** Above Average
--Dr. Dobb's Electronic Review of Computer Books
--http://www.ercb.com/brief/brief.0232.html

"Data Munging with Perl is now chained to your reviewer's bookcase...it's that kind of book...the author covers a lot of 'idiomatic perl' - stuff that you won't find in a computer science textbook on algorithms but that generations of Perl programmers have found useful...Cross gets his teeth into the hardest topic of all: building your own parsers with Parse::RecDescent...If you're a jobbing programmer who wants to get things done, you need this book."
--Linux Format Rating 10/10
--Linux Format Magazine, November Issue

"Having collected several Perl books over the years, I thought that I'd probably not need anymore. However, Data Munging with Perl is definitely worth adding to your collection. It is well laid out and builds upon your existing Perl knowledge adding a wealth of additional information with regards to munging/mining/parsing data using Perl.... The examples given throughout the book are well thought out and add structure around the tasks in hand. The author (David Cross) certainly knows his stuff and it shows. This is a very good book, and one that I would certainly recommend to anyone."
--Also from Linux Format, online edition
--http://www.linuxformat.co.uk/reviews.php

"This book, written by Perlmonk David Cross, is an excellent, easy to read, and easy to follow guide into what Perl does best: Data Munging...This book will expand your Perl vocabulary by leaps and bounds...If you are a junior to intermediate level programmer, and you want to improve your Perl skills, pick up this book. You won't be disappointed."
--Pipeline User Group, Mechanicsburg, PA

"...runs the whole gamut of munging techniques and issues, introducing a range of fast and fun approaches to both mundane and exotic data manipulation tasks...where this book really excels is in the chapters detailing parsing or pattern matching more complex data sources...It's well worth checking out how to create your very own parser using generic Perl modules...an excellent, slim volume covering an important if none too sexy area of programming. The emphasis is very much on real world examples and practice, making this a highly useful working companion for many an arduous data processing job."
--Linux User Magazine, July - August issue
--http://www.linuxuser.co.uk/articles/issue12/lu12-Books.pdf

"With entire rows of bookstores full of books on learning CGI programming with Perl in 10 easy lessons, it's nice to see one highlighting Perl's data processing capabilities more generally... for experienced Perl programmers it is an excellent overview of Perl's many data manipulation capabilities."
--;login: Magazine of Usenix and Sage, July issue
--http://www.usenix.org/publications/login/2001-07/bookreviews.pdf

"Dave Cross's new book, published by Manning, which means it has a figure from an old guide to native dress of the peoples of the world on the cover instead of some kind of animal, tells everything you need to know about using Perl for what it is most suited for: manipulating data.

"Starting with the source/filter/sink theory of data manipulation and demonstrating every tip and technique with clear and efficient examples, without severe digressions into mythological whimsy, this book would make an excellent second text on the Perl language, or a suitable first for someone who is good with programming languages.

"Many of the techniques contained in it are of 'trade secret' quality; they are the sort of write-the-number-of-gallons-of-paint-it-took-to-paint-the-room-on-the-back-of-the-light-switch-cover practices that until now had to be learned or happened upon by every programmer, alone, or by example, rather than in the context of a coherent theory.

"The theoretical side, in which 'munging' is defined and most software activity is described in terms of it, is clear enough that the book might be an interesting read for management, to answer the question 'Just what is it about Perl that makes those who use it regularly so confoundedly fanatical?'

"If you've ever been mystified by a Perl wizard who found it easier to export the records from the fancy GUI database into a comma delimited text file and then sort and display the data with mysterious little programs rather than use the GUI's native report generator, and want to find out why, or if you would like to become such a person yourself, or if you already are such a person but would like to get better at it, this book is for you."

--David L. Nicol
--Kansas City Perl Mongers (Review from Amazon.com.)

SOURCE CODE

Source code from Data Munging with Perl is available as either a single ZIP file or a gzipped tar archive. Free unzip programs for most platforms are available at Info-Zip.

cross_src.zip (44 KB)
cross_src.tar.gz (19 KB)