Lucene Vs Solr
Lucene is a search library written in Java. Solr is a web application built on top of Lucene; in short, Solr = Lucene + added features. A common question is when to choose Solr and when to choose Lucene.
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search. Choose Lucene in the following cases:
- To have more control. Lucene is a plain JAR that can be used in whatever way the application requires.
- When the application cannot depend on a web server.
- To use low-level APIs such as TermVector and TermDocs, for example to find the most frequently indexed term in a given period of time. TermVector gives information about the terms and their occurrences.
- Many contrib modules, such as the spell checker and hit highlighting, are available.
- Near-real-time search is supported.
- It is widely used in many open source projects, and many derivative search products are built on top of Lucene.
With Lucene, the application has to take care of the following itself:
- Incremental updates: Whenever new documents are added, the IndexReader needs to be reopened for the new documents to be reflected in search results.
- Warming the searcher: Whenever a searcher is opened or reopened, it should be warmed by running a few searches. This loads the caches so that subsequent searches are faster.
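The reopen-and-warm steps can be sketched in plain Java. This is a toy stand-in under stated assumptions, not Lucene code: `SnapshotSearcher` and `SearcherHolder` are hypothetical names, and a substring match over an in-memory list takes the place of a real index. It only illustrates the pattern: a searcher sees a point-in-time view, so new documents stay invisible until a fresh searcher is opened, warmed with a few queries, and swapped in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a point-in-time searcher: it sees only the documents it was opened with. */
final class SnapshotSearcher {
    private final List<String> docs;
    private int searchesRun = 0; // simple counter for illustration; not thread-safe

    SnapshotSearcher(List<String> docs) {
        this.docs = new ArrayList<>(docs); // copy, so later additions are not visible here
    }

    /** Counts documents containing the term (case-insensitive substring match, illustration only). */
    int count(String term) {
        searchesRun++;
        String needle = term.toLowerCase();
        int hits = 0;
        for (String doc : docs) {
            if (doc.toLowerCase().contains(needle)) {
                hits++;
            }
        }
        return hits;
    }

    int searchesRun() {
        return searchesRun;
    }
}

/** Hypothetical holder that swaps in a warmed searcher after the index changes. */
final class SearcherHolder {
    private final List<String> index = new ArrayList<>();
    private final List<String> warmupQueries;
    private final AtomicReference<SnapshotSearcher> current;

    SearcherHolder(List<String> warmupQueries) {
        this.warmupQueries = new ArrayList<>(warmupQueries);
        this.current = new AtomicReference<>(new SnapshotSearcher(index));
    }

    /** Adds a document; it is not searchable until reopen() is called. */
    synchronized void addDocument(String doc) {
        index.add(doc);
    }

    /** Opens a fresh searcher, warms it with a few queries, then publishes it. */
    synchronized void reopen() {
        SnapshotSearcher fresh = new SnapshotSearcher(index);
        for (String query : warmupQueries) {
            fresh.count(query); // warm-up search; the result is discarded
        }
        current.set(fresh); // readers pick up the warmed searcher from now on
    }

    SnapshotSearcher searcher() {
        return current.get();
    }
}
```

In Lucene itself this role is typically played by classes such as `SearcherManager` or by `DirectoryReader.openIfChanged`; their exact signatures vary between releases, so check the API documentation for the version in use.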
Solr's major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. This site is powered by Solr.
Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs. Choose Solr in the following cases:
- To index and search documents easily by writing only a little code.
- Solr is a standalone application, and it takes care of most of the work, such as incremental updates and warming up the reader.
- Solr can be scaled out to multiple nodes. Releases before Solr 4.0 support distributed search but not distributed indexing; SolrCloud, introduced in Solr 4.0, adds distributed indexing.
- To use faceted search and hit highlighting.
- For Java developers, the SolrJ library is available for communicating with the server through its API.
- Solr can be used from any programming language that supports HTTP with XML or JSON.
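As a rough illustration of the HTTP API, the sketch below builds a query URL that asks for JSON output, facet counts, and hit highlighting. The helper `SolrQueryUrl.build`, the base URL, the core name `articles`, and the field names `category` and `title` are all assumptions made for this example; the `/select` path and the `q`, `wt`, `facet`, `facet.field`, `hl`, and `hl.fl` parameters are standard Solr query parameters, although the exact URL layout depends on the Solr version and setup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds a Solr select URL; it does not contact any server. */
final class SolrQueryUrl {

    static String build(String baseUrl, String core, String query,
                        String facetField, String highlightField) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);                 // the search query
        params.put("wt", "json");               // response format
        params.put("facet", "true");            // enable faceting
        params.put("facet.field", facetField);  // field to compute facet counts on
        params.put("hl", "true");               // enable hit highlighting
        params.put("hl.fl", highlightField);    // field to highlight

        StringBuilder url = new StringBuilder(baseUrl)
                .append('/').append(core).append("/select?");
        boolean first = true;
        for (Map.Entry<String, String> param : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(encode(param.getKey())).append('=').append(encode(param.getValue()));
            first = false;
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `SolrQueryUrl.build("http://localhost:8983/solr", "articles", "title:lucene search", "category", "title")` produces a URL that any HTTP client, in any language, can fetch with a plain GET request. Java applications would more commonly use SolrJ instead of building URLs by hand.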
Summary: For more control, use Lucene. For faster development and an easier learning curve, choose Solr.