Manning Early Access Program
Taming Text
How to Find, Organize, and Manipulate It
EARLY ACCESS EDITION
Grant S. Ingersoll and Thomas S. Morton

MEAP Began: June 2008
Softbound print: Fall 2010 | 350 pages
ISBN: 193398838X

Pre-Order options*
Order now and start reading Taming Text today through MEAP
  MEAP + Ebook only - $27.50
  MEAP + Print book (includes Ebook) when available - $44.99
* For more information, please see the MEAP FAQs page.

Table of Contents, MEAP Chapters & Resources

 1: Getting started taming text - FREE
 2: Foundations of Taming Text - AVAILABLE
 3: Searching - AVAILABLE
 4: Fuzzy String Matching
 5: Identifying people, places, and things - AVAILABLE
 6: Keyword tagging
 7: Clustering text
 8: Putting it all together: Question answering
 9: Untamed text: Exploring the next frontier

Appendix A: Example configuration files
 

DESCRIPTION

It is no secret that the world is drowning in text and data. This causes real problems for everyday users who need to make sense of all the information available, and for software engineers who want to make their text-based applications more useful and user-friendly. Whether you're building a search engine for a corporate website, automatically organizing email, or extracting important nuggets of information from the news, dealing with unstructured text can be a daunting task.

Taming Text is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. This book explores how to automatically organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. The book guides you through examples illustrating each of these topics, as well as the foundations upon which they are built.

WHAT'S INSIDE:

Taming Text is written in a clear, concise style that translates much of the jargon of the field into terms that any developer can understand, even without an advanced degree in statistics or natural language processing. Throughout the book, concise examples are developed and explained using the concepts of each chapter. The examples in this book are in Java, but the concepts can be applied to any language.

About the Authors

Grant Ingersoll is an independent consultant developing search and natural language processing tools. Prior to being a consultant, he was a Senior Software Engineer at the Center for Natural Language Processing at Syracuse University. He has 11 years of hands-on experience developing Java applications, much of it spent on text processing applications. At the Center and, previously, at MNIS-TextWise, Grant worked on a number of text processing applications involving information retrieval, question answering, clustering, summarization, and categorization. Grant is a committer, as well as a speaker and trainer, on the Apache Lucene Java project and a co-founder of the Apache Mahout machine-learning project. He holds a master's degree in computer science from Syracuse University and a bachelor's degree in mathematics and computer science from Amherst College.

Thomas Morton writes software and performs research in the area of text processing and machine learning. He has been the primary developer and maintainer of the OpenNLP text processing project and Maximum Entropy machine learning project for the last 5 years. He received his doctorate in Computer Science from the University of Pennsylvania in 2005, and has worked in several industry positions applying text processing and machine learning to enterprise-class development efforts. He currently works as a software architect for Comcast Interactive Media in Philadelphia.

About the Early Access Version

This Early Access version of Taming Text enables you to receive new chapters as they are being written. You can also interact with the authors to ask questions, provide feedback and errata, and help shape the final manuscript on the Author Online.
