Manning Early Access Program
Taming Text
EARLY ACCESS EDITION
How to Find, Organize, and Manipulate It
Grant S. Ingersoll, Thomas S. Morton, and Andrew L. Farris

MEAP Began: June 2008
Softbound print: March 2012 (est.) | 350 pages
ISBN: 193398838X

Pre-Order options*
Order today and start reading Taming Text today through MEAP        
  MEAP + Ebook only - $35.99
  MEAP + Print book (includes Ebook) when available - $44.99
* For more information, please see the MEAP FAQs page.

Table of Contents, MEAP Chapters & Resources

 1: Getting started taming text - FREE
 2: Foundations of Taming Text - AVAILABLE
 3: Searching - AVAILABLE
 4: Fuzzy String Matching - AVAILABLE
 5: Identifying people, places, and things - AVAILABLE
 6: Clustering text - AVAILABLE
 7: Classification, categorization and tagging - AVAILABLE
 8: An example application: Question answering - AVAILABLE
 9: Untamed text: Exploring the next frontier - AVAILABLE
 Appendix A: Example configuration files
 

DESCRIPTION

It is no secret that the world is drowning in text and data. This causes real problems for everyday users who need to make sense of all the information available, and for software engineers who want to make their text-based applications more useful and user-friendly. Whether you're building a search engine for a corporate website, automatically organizing email, or extracting important nuggets of information from the news, dealing with unstructured text can be a daunting task.

Taming Text is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. This book explores how to automatically organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. The book guides you through examples illustrating each of these topics, as well as the foundations upon which they are built.

WHAT'S INSIDE

Taming Text is written in a clear, concise style that translates much of the jargon of the field into terms that any developer can understand, even without an advanced degree in statistics or natural language processing. Throughout the book, concise examples are developed and explained using the concepts of each chapter. The examples in this book are in Java, but the concepts can be applied to any language.

About the Authors

Grant Ingersoll is an independent consultant developing search and natural language processing tools. Before becoming a consultant, he was a Senior Software Engineer at the Center for Natural Language Processing at Syracuse University. He has 11 years of hands-on experience developing Java applications, much of it spent on text processing applications. At the Center and, previously, at MNIS-TextWise, Grant worked on a number of text processing applications involving information retrieval, question answering, clustering, summarization, and categorization. Grant is a committer, as well as a speaker and trainer, on the Apache Lucene Java project and a co-founder of the Apache Mahout machine-learning project. He holds a master's degree in computer science from Syracuse University and a bachelor's degree in mathematics and computer science from Amherst College.

Thomas Morton writes software and performs research in the area of text processing and machine learning. He has been the primary developer and maintainer of the OpenNLP text processing project and the Maximum Entropy machine learning project for the last five years. He received his doctorate in Computer Science from the University of Pennsylvania in 2005, and has worked in several industry positions applying text processing and machine learning to enterprise-class development efforts. Currently he works as a software architect for Comcast Interactive Media in Philadelphia.

Drew Farris is a professional software developer and technology consultant whose interests focus on large-scale analytics, distributed computing, and machine learning. Previously, he worked at TextWise, where he implemented a wide variety of text exploration, management, and retrieval applications combining natural language processing, classification, and visualization techniques. He has contributed to a number of open source projects, including Apache Mahout, Lucene, and Solr, and holds a master's degree in Information Resource Management from Syracuse University's iSchool and a B.F.A. in Computer Graphics.

WHAT REVIEWERS ARE SAYING

“I think this will be an excellent book for developers who want to be introduced into the world of text, more specifically if they have to implement search functionality, if they need to match textual data (for instance: matching movies in different DVD databases), or if they programmatically have to derive meaning from documents.”
Bruno Lowagie

About the Early Access Version

This Early Access version of Taming Text enables you to receive new chapters as they are being written. You can also interact with the authors to ask questions, provide feedback and errata, and help shape the final manuscript on the Author Online forum.
