Contents
foreword
preface
preface to the first edition
acknowledgments
about this book
about the authors
JUnit primer
Part 1 Core Lucene
- Chapter 1 Meet Lucene
- Dealing with information explosion
- What is Lucene?
- Lucene and the components of a search application
- Lucene in action: a sample application
- Understanding the core indexing classes
- Understanding the core searching classes
- Summary
- Chapter 2 Building a search index
- How Lucene models content
- Understanding the indexing process
- Basic index operations
- Field options
- Boosting documents and fields
- Indexing numbers, dates, and times
- Field truncation
- Near-real-time search
- Optimizing an index
- Other directory implementations
- Concurrency, thread safety, and locking issues
- Debugging indexing
- Advanced indexing concepts
- Summary
- Chapter 3 Adding search to your application
- Implementing a simple search feature
- Using IndexSearcher
- Understanding Lucene scoring
- Lucene’s diverse queries
- Parsing query expressions: QueryParser
- Summary
- Chapter 4 Lucene’s analysis process
- Using analyzers
- What’s inside an analyzer?
- Using the built-in analyzers
- Sounds-like querying
- Synonyms, aliases, and words that mean the same
- Stemming analysis
- Field variations
- Language analysis issues
- Nutch analysis
- Summary
- Chapter 5 Advanced search techniques
- Lucene’s field cache
- Sorting search results
- Using MultiPhraseQuery
- Querying on multiple fields at once
- Span queries
- Filtering a search
- Custom scoring using function queries
- Searching across multiple Lucene indexes
- Leveraging term vectors
- Loading fields with FieldSelector
- Stopping a slow search
- Summary
- Chapter 6 Extending search
- Using a custom sort method
- Developing a custom Collector
- Extending QueryParser
- Custom filters
- Payloads
- Summary
Part 2 Applied Lucene
- Chapter 7 Extracting text with Tika
- What is Tika?
- Tika’s logical design and API
- Installing Tika
- Tika’s built-in text extraction tool
- Extracting text programmatically
- Tika’s limitations
- Indexing custom XML
- Alternatives
- Summary
- Chapter 8 Essential Lucene extensions
- Luke, the Lucene Index Toolbox
- Analyzers, tokenizers, and TokenFilters
- Highlighting query terms
- FastVectorHighlighter
- Spell checking
- Fun and interesting Query extensions
- Building contrib modules
- Summary
- Chapter 9 Further Lucene extensions
- Chaining filters
- Storing an index in Berkeley DB
- Synonyms from WordNet
- Fast memory-based indices
- XML QueryParser: Beyond “one box” search interfaces
- Surround query language
- Spatial Lucene
- Searching multiple indexes remotely
- Flexible QueryParser
- Odds and ends
- Summary
- Chapter 10 Using Lucene from other programming languages
- Ports primer
- CLucene (C++)
- Lucene.Net (C# and other .NET languages)
- KinoSearch and Lucy (Perl)
- Ferret (Ruby)
- PHP
- PyLucene (Python)
- Solr (many programming languages)
- Summary
- Chapter 11 Lucene administration and performance tuning
- Performance tuning
- Threads and concurrency
- Managing resource consumption
- Hot backups of the index
- Common errors
- Summary
Part 3 Case studies
- Chapter 12 Case study 1: Krugle
- Krugle: Searching source code
- Introducing Krugle
- Appliance architecture
- Search performance
- Parsing source code
- Substring searching
- Query vs. search
- Future improvements
- Summary
- Chapter 13 Case study 2: SIREn
- Searching semistructured documents with SIREn
- Introducing SIREn
- SIREn’s benefits
- Indexing entities with SIREn
- Searching entities with SIREn
- Integrating SIREn in Solr
- Benchmark
- Summary
- Chapter 14 Case study 3: LinkedIn
- Adding facets and real-time search with Bobo Browse and Zoie
- Faceted search with Bobo Browse
- Real-time search with Zoie
- Summary
appendix a Installing Lucene
appendix b Lucene index format
appendix c Lucene/contrib benchmark
appendix d Resources
index