Solr in Action
Trey Grainger and Timothy Potter
  • March 2014
  • ISBN 9781617291029
  • 664 pages
  • printed in black & white

The knowledge and techniques you need.

From the Foreword by Yonik Seeley, Creator of Solr

Solr in Action is a comprehensive guide to implementing scalable search using Apache Solr. This clearly written book walks you through well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries. It will give you a deep understanding of how to implement core Solr capabilities.

Table of Contents

about this book

Part 1 Meet Solr

1. Introduction to Solr

1.1. Why do I need a search engine?

1.2. What is Solr?

1.3. Why Solr?

1.4. Features overview

1.5. Summary

2. Getting to know Solr

2.1. Getting started

2.2. Searching is what it’s all about

2.3. Tour of the Solr administration console

2.4. Adapting the example to your needs

2.5. Summary

3. Key Solr concepts

3.1. Searching, matching, and finding content

3.2. Relevancy

3.3. Precision and Recall

3.4. Searching at scale

3.5. Summary

4. Configuring Solr

4.1. Overview of solrconfig.xml

4.2. Query request handling

4.3. Managing searchers

4.4. Cache management

4.5. Remaining configuration options

4.6. Summary

5. Indexing

5.1. Example microblog search application

5.2. Designing your schema

5.3. Defining fields in schema.xml

5.4. Field types for structured nontext fields

5.5. Sending documents to Solr for indexing

5.6. Update handler

5.7. Index management

5.8. Summary

6. Text analysis

6.1. Analyzing microblog text

6.2. Basic text analysis

6.3. Defining a custom field type for microblog text

6.4. Advanced text analysis

6.5. Summary

Part 2 Core Solr capabilities

7. Performing queries and handling results

7.1. The anatomy of a Solr request

7.2. Working with query parsers

7.3. Queries and filters

7.4. The default query parser (Lucene query parser)

7.5. Handling user queries (eDisMax query parser)

7.6. Other useful query parsers

7.7. Returning results

7.8. Sorting results

7.9. Debugging query results

7.10. Summary

8. Faceted search

8.1. Navigating your content at a glance

8.2. Setting up test data

8.3. Field faceting

8.4. Query faceting

8.5. Range faceting

8.6. Filtering upon faceted values

8.7. Multiselect faceting, keys, and tags

8.8. Beyond the basics

8.9. Summary

9. Hit highlighting

9.1. Overview of hit highlighting

9.2. How highlighting works

9.3. Improving performance using FastVectorHighlighter

9.4. PostingsHighlighter

9.5. Summary

10. Query suggestions

10.1. Spell-check

10.2. Autosuggesting query terms

10.3. Suggesting document field values

10.4. Suggesting queries based on user activity

10.5. Summary

11. Result grouping/field collapsing

11.1. Result grouping vs. field collapsing

11.2. Skipping duplicate documents

11.3. Returning multiple documents per group

11.4. Grouping by functions and queries

11.5. Paging and sorting grouped results

11.6. Grouping gotchas

11.7. Efficient field collapsing with the collapsing query parser

11.8. Summary

12. Taking Solr to production

12.1. Developing a Solr distribution

12.2. Deploying Solr

12.3. Hardware and server configuration

12.4. Data acquisition strategies

12.5. Sharding and replication

12.6. Solr core management

12.7. Managing clusters of servers

12.8. Querying and interacting with Solr

12.9. Monitoring Solr’s performance

12.10. Upgrading between Solr versions

12.11. Summary

Part 3 Taking Solr to the next level

13. SolrCloud

13.1. Getting started with SolrCloud

13.2. Core concepts

13.3. Distributed indexing

13.5. Collections API

13.6. Basic system-administration tasks

13.7. Advanced topics

13.8. Summary

14. Multilingual search

14.1. Why linguistic analysis matters

14.2. Stemming vs. lemmatization

14.3. Stemming in action

14.4. Handling edge cases

14.5. Available language libraries in Solr

14.6. Searching content in multiple languages

14.7. Language identification

14.8. Summary

15. Complex query operations

15.1. Function queries

15.3. Pivot faceting

15.4. Referencing external data

15.5. Cross-document and cross-index joins

15.6. Big data analytics with Solr

15.7. Summary

16. Mastering relevancy

16.1. The impact of relevancy tuning

16.2. Debugging the relevancy calculation

16.3. Relevancy boosting

16.4. Pluggable Similarity class implementations

16.5. Personalized search and recommendations

16.6. Creating a personalized search experience

16.7. Running relevancy experiments

16.8. Summary

Appendix A: Working with the Solr codebase

Appendix B: Language-specific field type configurations

Appendix C: Useful data import configurations


About the book

Whether you're handling big (or small) data, managing documents, or building a website, it is important to be able to quickly search through your content and discover meaning in it. Apache Solr is your tool: a ready-to-deploy, Lucene-based, open source, full-text search engine. Solr can scale across many servers to enable real-time queries and data analytics across billions of documents.

Solr in Action teaches you to implement scalable search using Apache Solr. This easy-to-read guide balances conceptual discussions with practical examples to show you how to implement all of Solr's core capabilities. You'll master topics like text analysis, faceted search, hit highlighting, result grouping, query suggestions, multilingual search, advanced geospatial and data operations, and relevancy tuning.

What's inside

  • How to scale Solr for big data
  • Rich real-world examples
  • Solr as a NoSQL data store
  • Advanced multilingual, data, and relevancy tricks
  • Coverage of versions through Solr 4.7

About the reader

This book assumes basic knowledge of Java and standard database technology. No prior knowledge of Solr or Lucene is required.

About the authors

Trey Grainger is a director of engineering at CareerBuilder. Timothy Potter is a senior member of the engineering team at LucidWorks. The authors work on the scalability and reliability of Solr, as well as on recommendation engine and big data analytics technologies.

combo $49.99 pBook + eBook
eBook $39.99 pdf + ePub + kindle

Readable and immediately applicable ... an excellent book.

John Viviano, InterCorp, Inc.

The go-to guide for Solr ... a definitive resource for both beginners and experts.

Scott Anthony, Business Instruments

A well-dosed combination of deep technical knowledge and real-world experience.

Alexandre Madurell, Piksel, Inc.