Contents


foreword
preface
preface to the first edition
acknowledgments
about this book
about the authors
JUnit primer

Part 1 Core Lucene

Chapter 1 Meet Lucene
Dealing with information explosion
What is Lucene?
Lucene and the components of a search application
Lucene in action: a sample application
Understanding the core indexing classes
Understanding the core searching classes
Summary
Chapter 2 Building a search index
How Lucene models content
Understanding the indexing process
Basic index operations
Field options
Boosting documents and fields
Indexing numbers, dates, and times
Field truncation
Near-real-time search
Optimizing an index
Other directory implementations
Concurrency, thread safety, and locking issues
Debugging indexing
Advanced indexing concepts
Summary
Chapter 3 Adding search to your application
Implementing a simple search feature
Using IndexSearcher
Understanding Lucene scoring
Lucene’s diverse queries
Parsing query expressions: QueryParser
Summary
Chapter 4 Lucene’s analysis process
Using analyzers
What’s inside an analyzer?
Using the built-in analyzers
Sounds-like querying
Synonyms, aliases, and words that mean the same
Stemming analysis
Field variations
Language analysis issues
Nutch analysis
Summary
Chapter 5 Advanced search techniques
Lucene’s field cache
Sorting search results
Using MultiPhraseQuery
Querying on multiple fields at once
Span queries
Filtering a search
Custom scoring using function queries
Searching across multiple Lucene indexes
Leveraging term vectors
Loading fields with FieldSelector
Stopping a slow search
Summary
Chapter 6 Extending search
Using a custom sort method
Developing a custom Collector
Extending QueryParser
Custom filters
Payloads
Summary

Part 2 Applied Lucene

Chapter 7 Extracting text with Tika
What is Tika?
Tika’s logical design and API
Installing Tika
Tika’s built-in text extraction tool
Extracting text programmatically
Tika’s limitations
Indexing custom XML
Alternatives
Summary
Chapter 8 Essential Lucene extensions
Luke, the Lucene Index Toolbox
Analyzers, tokenizers, and TokenFilters
Highlighting query terms
FastVectorHighlighter
Spell checking
Fun and interesting Query extensions
Building contrib modules
Summary
Chapter 9 Further Lucene extensions
Chaining filters
Storing an index in Berkeley DB
Synonyms from WordNet
Fast memory-based indices
XML QueryParser: Beyond “one box” search interfaces
Surround query language
Spatial Lucene
Searching multiple indexes remotely
Flexible QueryParser
Odds and ends
Summary
Chapter 10 Using Lucene from other programming languages
Ports primer
CLucene (C++)
Lucene.Net (C# and other .NET languages)
KinoSearch and Lucy (Perl)
Ferret (Ruby)
PHP
PyLucene (Python)
Solr (many programming languages)
Summary
Chapter 11 Lucene administration and performance tuning
Performance tuning
Threads and concurrency
Managing resource consumption
Hot backups of the index
Common errors
Summary

Part 3 Case studies

Chapter 12 Case study 1: Krugle
Krugle: Searching source code
Introducing Krugle
Appliance architecture
Search performance
Parsing source code
Substring searching
Query vs. search
Future improvements
Summary
Chapter 13 Case study 2: SIREn
Searching semistructured documents with SIREn
Introducing SIREn
SIREn’s benefits
Indexing entities with SIREn
Searching entities with SIREn
Integrating SIREn in Solr
Benchmark
Summary
Chapter 14 Case study 3: LinkedIn
Adding facets and real-time search with Bobo Browse and Zoie
Faceted search with Bobo Browse
Real-time search with Zoie
Summary


appendix A Installing Lucene
appendix B Lucene index format
appendix C Lucene/contrib benchmark
appendix D Resources
index