Hibernate Search is a library providing full-text search capabilities to Hibernate. It opens doors to more human-friendly and efficient search engines while still following the Hibernate and Java Persistence development paradigm. This library relieves you of the burden of keeping indexes up to date with the database, converts Lucene results into managed objects of your domain model, and eases the transition from an HQL-based query to a full-text query. Hibernate Search also helps you scale Lucene in a clustered environment.
Hibernate Search in Action aims not only at providing practical knowledge of Hibernate Search but also at uncovering some of the background behind Hibernate Search’s design.
We will start by describing full-text search technology and why this tool is invaluable in your development toolbox. Then you will learn how to start with Hibernate Search, how to prepare and index your domain model, and how to query your data. We will explore advanced concepts like typo recovery, phonetic approximation, and search by synonym. You will also learn how to improve performance when using Hibernate Search and use it in a clustered environment. The book will then guide you to more advanced Lucene concepts and show you how to access Lucene natively in case Hibernate Search does not cover some of your needs. We will also explore the notion of document scoring and how Lucene orders documents by relevance, as well as a few useful tools like term highlighters.
Even though this is an “in Action” book, the authors have included a healthy amount of theory on most of the topics. They feel that it is not only important to know “how” but also “why.” This knowledge will help you better understand the design of Hibernate Search. This book is a careful blend of theory, reference material on Hibernate Search, and practical knowledge. The latter is the meat of this book and is led by practical examples.
After reading it, you will be armed with sufficient knowledge to use Hibernate Search in all situations.
While this book can be read from cover to cover, we made sure you can read the sections you are interested in independently of the others. Feel free to jump to the subject you are most interested in. Chapter 2, which you should read first, will give you an overview of Hibernate Search and explain how to set it up. Check the road map section that follows for an overview of Hibernate Search in Action.
Most chapters start with background and theory on the subject they are covering, so feel free to jump straight to the practical knowledge if you are not interested in the introduction. You can always return to the theory.
This book is aimed at anyone wanting to know more about Hibernate Search and full-text search in general. Anyone curious about what full-text search technology can bring them and what benefits Hibernate Search provides will be interested.
Readers looking for a smooth and practical introduction to Hibernate Search will appreciate the step-by-step introduction of each feature and its concrete examples.
More advanced architects will find the sections describing concepts and features offered by Hibernate Search, as well as the chapter about clustering, to be of interest.
Regular Hibernate Search users will enjoy the in-depth descriptions of each subject and the ability to jump to the chapter covering the subject they are interested in. They will also appreciate the chapter focusing on performance optimizations.
The search guru will also enjoy the advanced chapters on Lucene describing scoring, access to the native Lucene APIs from Hibernate Search, and the Lucene contribution package.
Developers or architects using or planning to use Hibernate Search on their project will find useful knowledge (how-to, practical examples, architecture recommendations, optimizations).
Basic knowledge of Hibernate Core or Java Persistence is recommended, but some reviewers read the book with no knowledge of Hibernate, and some with a background in the .NET platform, and still found the book useful.
In the first part of the book, we introduce full-text search and Hibernate Search.
Chapter 1 describes the weakness of SQL as a tool to answer human queries and describes full-text search technology. This chapter also describes full-text search approaches, the issues with integrating them in a classic Java SE/EE application and why Hibernate Search is needed.
Chapter 2 is a getting-started guide to Hibernate Search. It describes how to set up and configure it in a Java application and how to define the mapping in your domain model. It then describes how Hibernate Search indexes objects and how to write full-text queries. We also introduce Luke, a tool to inspect Lucene indexes.
PART 2 focuses on mapping and indexing.
Chapter 3 describes the basics of domain model mapping. We will walk you through the steps of marking an entity and a property as indexed. You will understand the various mapping strategies.
Chapter 4 goes a step further into the mapping possibilities. Custom bridges are introduced as well as mapping of relationships.
Chapter 5 introduces where and how Hibernate Search indexes your entities. We will learn how to configure directory providers (the structure holding index data), how to configure analyzers, and what features they bring (text normalization, typo recovery, phonetic approximation, search by synonyms, and so on). Then we will see how Hibernate Search transparently indexes your entities and how to take control and manually trigger such indexing.
PART 3 of Hibernate Search in Action covers queries.
Chapter 6 covers the programmatic model used for queries, how it integrates into the Hibernate model and shares the same persistence context. You will also learn how to customize queries by defining pagination, projection, fetching strategies, and so on.
Chapter 7 goes into the meat of full-text queries. It describes what is expressible in a Lucene query and how to do it. We start by using the query parser, then move on to the full programmatic model. At this stage of the book, you will have a good understanding of the tools available to you as a search engine developer.
Chapter 8 describes Hibernate Search filters and gives examples where cross-cutting restrictions are useful. You will see how to best benefit from the built-in cache and explore use cases such as security filtering, temporal filtering, and category filtering.
PART 4 focuses on performance and scalability.
Chapter 9 brings together in one chapter all the knowledge related to Hibernate Search and Lucene optimization. All areas are covered: indexing, query time, index structure, and index sharding.
Chapter 10 describes how to cluster a Hibernate Search application. You will understand the underlying problems and be introduced to various solutions. The benefits and drawbacks of each will be explored. This chapter includes a full configuration example.
PART 5 goes beyond Hibernate Search and explores advanced knowledge of Lucene.
Chapter 11 describes ways to access the native Lucene APIs when working with Hibernate Search. While this knowledge is not necessary in most applications, it can come in handy in specific scenarios.
Chapter 12 takes a deep dive into Lucene scoring. If you always wanted to know how a full-text search engine orders results by relevance, this chapter is for you. This will be a gem if you need to customize the scoring algorithm.
Chapter 13 gives you an introduction to some of Lucene’s contribution projects like text highlighting, spell checking, and so on.
All source code in listings and in text is in a fixed-width font just like this to separate it from normal text. Additionally, Java class names, method names, and object properties are also presented using fixed-width font. Java method names generally don’t include the signature (the list of parameter types).
In almost all cases the original source code has been reformatted; we’ve added line breaks and reworked indentation to fit page space in the book. It was even necessary occasionally to add line continuation markers.
Annotations accompany all of the code listings and are followed by numbered bullets, also known as cueballs, which are linked to explanations of the code.
Hibernate Search and Hibernate Core are open source projects released under the GNU Lesser General Public License 2.1. You can download the latest versions (both source and binaries) at http://www.hibernate.org.
Apache Lucene is an open source project from the Apache Software Foundation released under the Apache License 2.0. Lucene JARs are included in the Hibernate Search distribution, but you can download additional contributions, documentation, and the source code at http://lucene.apache.org.
The source code used in this book as well as various online resources are freely available at http://book.emmanuelbernard.com/hsia or from a link on the publisher’s website at http://www.manning.com/HibernateSearchinAction.
Purchase of Hibernate Search in Action includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the lead author and from other users. To access the forum and subscribe to it, point your web browser to http://www.manning.com/HibernateSearchinAction or http://www.manning.com/bernard. This page provides information on how to get on the forum once you’re registered, what kind of help is available, and the rules of conduct on the forum.
Manning’s commitment to our readers is to provide a venue where a meaningful dialog between individual readers and between readers and the authors can take place. It’s not a commitment to any specific amount of participation on the part of the authors, whose contribution to the Author Online forum remains voluntary (and unpaid). We suggest you try asking the authors some challenging questions lest their interest stray!
The Author Online forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
EMMANUEL BERNARD graduated from Supelec (French “Grande Ecole”) then spent a few years in the retail industry as a developer and architect. That’s where he started to be involved in the ORM space. He joined the Hibernate team in 2003 and is now a lead developer at JBoss, a division of Red Hat.
Emmanuel is the cofounder and lead developer of Hibernate Annotations and Hibernate EntityManager (two key projects on top of Hibernate Core implementing the Java Persistence™ specification) and, more recently, Hibernate Search and Hibernate Validator.
Emmanuel is a member of the JPA 2.0 expert group and the spec lead of JSR 303: Bean Validation. He is a regular speaker at various conferences and JUGs, including JavaOne, JBoss World and Devoxx.
JOHN GRIFFIN has been in the software and computer industry in one form or another since 1969. He remembers writing his first FORTRAN IV program in a magic bus on his way back from Woodstock. Currently, he is the software engineer/architect for SOS Staffing Services, Inc. He was formerly the lead e-commerce architect for Iomega Corporation, lead SOA architect for Realm Systems and an independent consultant for the Department of the Interior among many other callings.
John has even spent time as an adjunct university professor. He enjoys being a committer to projects because he believes “it’s time to get involved and give back to the community.”
John is the author of XML and SQL Server 2000 published by New Riders Press in 2001 and a member of the ACM. John has also spoken at various conferences and JUGs.
He resides in Layton, Utah, with his wife Judy and their Australian Shepherds Clancy and Molly.
By combining introductions, overviews, and how-to examples, the In Action books are designed to help learning and remembering. According to research in cognitive science, the things people remember are things they discover during self-motivated exploration.
Although no one at Manning is a cognitive scientist, we are convinced that for learning to become permanent it must pass through stages of exploration, play, and, interestingly, retelling of what is being learned. People understand and remember new things, which is to say they master them, only after actively exploring them. Humans learn in action. An essential part of an In Action guide is that it is example-driven. It encourages the reader to try things out, to play with new code, and explore new ideas.
There is another, more mundane, reason for the title of this book: our readers are busy. They use books to do a job or to solve a problem. They need books that allow them to jump in and jump out easily and learn just what they want just when they want it. They need books that aid them in action. The books in this series are designed for such readers.
The illustration on the cover of Hibernate Search in Action is captioned “Scribe” and is taken from the 1805 edition of Sylvain Maréchal’s four-volume compendium of regional dress customs. This book was first published in Paris in 1788, one year before the French Revolution. Each illustration is colored by hand.
The colorful variety of Maréchal’s collection reminds us vividly of how culturally apart the world’s towns and regions were just 200 years ago. Isolated from each other, people spoke different dialects and languages. In the streets or the countryside, they were easy to place—sometimes with an error of no more than a dozen miles—just by their dress. Dress codes have changed everywhere with time and the diversity by region, so rich at the time, has faded away. It is now hard to tell apart the inhabitants of different continents, let alone different towns or regions. Perhaps we have traded cultural diversity for a more varied personal life—certainly a more varied and faster-paced technological life.
At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional life of two centuries ago, brought back to life by Maréchal’s pictures.