Jun 25, 2014
 

The home page of the solr site defines this software as an “open source enterprise search platform”. This post will review the functionality implemented by solr, and provide a step-by-step procedure to install solr on a Linux server.

Functionality

The home page of the official solr website also gives a brief description of the functionality available through this software:

Solr is a search engine with a REST-like HTTP API. Documents in XML, JSON, CSV or binary format can be fed to solr to be indexed. After a set of documents has been indexed, queries can be issued to solr to search the indexed content, and the responses are also returned in XML, JSON, CSV or binary format.

According to the home page of solr, the most relevant characteristics of this search platform are:

  • Advanced Full-Text Search Capabilities
  • Optimized for High Volume Web Traffic
  • Standards Based Open Interfaces – XML, JSON and HTTP
  • Comprehensive HTML Administration Interfaces
  • Server statistics exposed over JMX for monitoring
  • Linearly scalable, auto index replication, auto failover and recovery
  • Near Real-time indexing
  • Flexible and Adaptable with XML configuration
  • Extensible Plugin Architecture

In addition, solr builds on the Lucene search library and extends its capabilities, offering:

  • A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
  • Powerful Extensions to the Lucene Query Language
  • Faceted Search and Filtering
  • Geospatial Search with support for multiple points per document and geo polygons
  • Advanced, Configurable Text Analysis
  • Highly Configurable and User Extensible Caching
  • Performance Optimizations
  • External Configuration via XML
  • An AJAX based administration interface
  • Monitorable Logging
  • Fast near real-time incremental indexing and index replication
  • Highly Scalable Distributed search with sharded index across multiple hosts
  • JSON, XML, CSV/delimited-text, and binary update formats
  • Easy ways to pull in data from databases and XML files from local disk and HTTP sources
  • Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika
  • Apache UIMA integration for configurable metadata extraction
  • Multiple search indices

Prerequisites: Java JRE

Solr requires a Java runtime environment (JRE) version 1.7 or above.

If there is no compatible Java JRE on the system where solr is going to be installed, the procedure below can be followed to install one:

First, download the installation package from the Oracle website:

http://www.oracle.com/technetwork/java/javase/downloads/index.html

We chose to download “Server JRE (Java SE Runtime Environment) 8u5”. The downloaded file is named server-jre-8u5-linux-x64.tar.gz, and its size is 56 MB.

Once downloaded, the installation is done by simply uncompressing the file in the directory where you want to place it. No root privileges are required, because the package is self-contained and does not need to create files in system directories.

For this post, we have created a “solr” directory under the home directory of the login user. The JRE package will be uncompressed under that directory, and later the solr package itself will also be placed in it.
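As a rough sketch of this step, the commands below unpack the JRE under ~/solr and put it on the PATH of the current shell. The download location ~/Downloads is an assumption, and so is the presence of a bin/java inside the unpacked tree; the name of the unpacked directory is read from the archive rather than guessed.

```shell
# Assumed paths: adjust JRE_TGZ to wherever the archive was downloaded.
SOLR_BASE="$HOME/solr"
JRE_TGZ="$HOME/Downloads/server-jre-8u5-linux-x64.tar.gz"

if [ -f "$JRE_TGZ" ]; then
    mkdir -p "$SOLR_BASE"
    tar -xzf "$JRE_TGZ" -C "$SOLR_BASE"
    # Read the top-level directory name from the archive listing.
    JRE_DIR="$SOLR_BASE/$(tar -tzf "$JRE_TGZ" | head -n 1 | cut -d/ -f1)"
    # Use this JRE in the current shell session only.
    export JAVA_HOME="$JRE_DIR"
    export PATH="$JAVA_HOME/bin:$PATH"
    java -version
else
    echo "JRE archive not found: $JRE_TGZ" >&2
fi
```

The JAVA_HOME and PATH settings last only for the current session; they would need to go into a shell startup file to become permanent.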

Installing Solr

First, download the installation package from the official solr website. We have downloaded a file named solr-4.8.1.tgz (146 MB).

Uncompress the file downloaded and move the uncompressed directory tree under the “~/solr” directory previously created:
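A minimal sketch of this step, assuming the archive sits in the current directory and unpacks into a directory named solr-4.8.1:

```shell
SOLR_BASE="$HOME/solr"
SOLR_TGZ="solr-4.8.1.tgz"   # assumed to be in the current directory

if [ -f "$SOLR_TGZ" ]; then
    # Unpack the archive, then move the resulting tree under ~/solr.
    tar -xzf "$SOLR_TGZ"
    mkdir -p "$SOLR_BASE"
    mv solr-4.8.1 "$SOLR_BASE"/
else
    echo "solr archive not found: $SOLR_TGZ" >&2
fi
```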

Next, run the example jetty container included in the package:
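For the solr 4.x packages, the example container is started from the example directory with start.jar. The sketch below assumes the directory layout used earlier in this post and a java executable on the PATH:

```shell
SOLR_EXAMPLE="$HOME/solr/solr-4.8.1/example"

if [ -f "$SOLR_EXAMPLE/start.jar" ]; then
    # Runs jetty in the foreground; stop it with Ctrl-C.
    ( cd "$SOLR_EXAMPLE" && java -jar start.jar )
else
    echo "start.jar not found under $SOLR_EXAMPLE" >&2
fi
```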

After that, you can point your browser to the URL http://localhost:8983/solr/ and have a first look at the solr administration panel:

[Screenshot: solr administration panel]

Securing the solr installation

The default installation of solr does not include any security measures. If the server where solr has been installed is connected to the internet, any user can point a browser to http://SERVER:8983/solr/ and access the administration panel.

There are many possible ways to protect solr against unauthorized access. In the next sections two of the most typical cases are explained:

 

Limit access to jetty to only local connections

Perhaps the most common way to protect the solr installation is to edit the configuration of jetty (or of whichever container is being used to run solr), so that it rejects connection requests from computers other than localhost.

To do this, edit the file /web/jetty.xml, and change the lines “<Set name="Host">...</Set>” as follows:
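As an illustration, the replacement line could look like the fragment below. The value 127.0.0.1 (the loopback address) is an assumption about what the changed line should contain, and jetty-host-snippet.xml is only a hypothetical scratch file used to hold the fragment before it is copied into jetty.xml.

```shell
# Write the replacement line to a scratch file for review.
cat > jetty-host-snippet.xml <<'EOF'
<!-- Listen on the loopback interface only -->
<Set name="Host">127.0.0.1</Set>
EOF
cat jetty-host-snippet.xml
```

The container has to be restarted for the change to take effect.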

However, you may need to allow remote access to the administration panel in a controlled way. In most cases, there is another web server (usually an instance of apache) running on the same server as solr. You can configure a “reverse proxy” on that web server to forward the requests arriving at http://SERVER/solr/ to http://localhost:8983/solr/. In this type of setup, the reverse proxy can be protected with a username and password, using the standard procedure of the apache web server.

This is done by adding the following lines to the apache configuration file:
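A sketch of such a configuration, using standard apache directives, is written below to a hypothetical file named solr-proxy.conf. It assumes that mod_proxy, mod_proxy_http and the basic authentication modules are enabled; the AuthName text is an arbitrary label.

```shell
# Write the proxy and authentication directives to a scratch file.
cat > solr-proxy.conf <<'EOF'
# Forward /solr to the local jetty instance
ProxyPass        /solr http://localhost:8983/solr
ProxyPassReverse /solr http://localhost:8983/solr

# Ask for a username and password before proxying
<Location /solr>
    AuthType Basic
    AuthName "solr administration"
    AuthUserFile /path/to/htpasswd/.htpasswd
    Require valid-user
</Location>
EOF
cat solr-proxy.conf
```

These lines can be placed in the main configuration file or in a virtual host section, followed by a reload of apache.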

Under the directory “/path/to/htpasswd”, a file named .htpasswd will hold the username and password that will be granted access to the solr administration panel. This .htpasswd file can be generated using the “htpasswd” command included in the apache web server package. For instance:
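A minimal sketch, assuming a user named “admin” (a hypothetical choice) and the placeholder path used above:

```shell
HTPASSWD_FILE="/path/to/htpasswd/.htpasswd"
SOLR_USER="admin"   # hypothetical username

if command -v htpasswd >/dev/null 2>&1 && [ -d "$(dirname "$HTPASSWD_FILE")" ]; then
    # -c creates the file (overwriting an existing one);
    # the password is requested interactively.
    htpasswd -c "$HTPASSWD_FILE" "$SOLR_USER"
else
    echo "htpasswd not available or target directory missing" >&2
fi
```

To add more users to an existing file, the same command is used without the -c option.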

Protect access to jetty with username/password

Alternatively, if there is no other web server running on the same server where solr has been installed, the configuration of the jetty container itself can be edited to ask for a username and password:

1. In the configuration file example/etc/webdefault.xml, add:

2. In example/etc/jetty.xml, add:

3. Create a file example/etc/realm.properties, with the username, password and associated role:
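The three steps above can be sketched as follows. The fragments use the standard servlet security elements and jetty's HashLoginService; the realm name “Solr Realm”, the role name “admin-role” and the scratch file names for steps 1 and 2 are assumptions made for this illustration. The step 1 fragment is meant to go inside the web-app element of webdefault.xml, and the step 2 fragment inside the server Configure element of jetty.xml.

```shell
# Step 1: fragment for example/etc/webdefault.xml
cat > webdefault-fragment.xml <<'EOF'
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr authenticated application</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin-role</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>
EOF

# Step 2: fragment for example/etc/jetty.xml
cat > jetty-fragment.xml <<'EOF'
<Call name="addBean">
  <Arg>
    <New class="org.eclipse.jetty.security.HashLoginService">
      <Set name="name">Solr Realm</Set>
      <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      <Set name="refreshInterval">0</Set>
    </New>
  </Arg>
</Call>
EOF

# Step 3: content of example/etc/realm.properties (user: password, role)
cat > realm.properties <<'EOF'
admin: s3cr3t, admin-role
EOF
```

The realm name in the login service has to match the realm-name element of the login configuration, and the role in realm.properties has to match the role-name of the security constraint.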

Then, restart the container. The next time the administration panel is accessed, jetty will ask for credentials before proceeding. If the configuration is correct, you will be granted access with the username “admin” and the password “s3cr3t”.

Installing Solr in other containers

The jetty container included in the solr installation package may be adequate in many cases. But if the setup of your server requires installing solr in another type of Java container, the procedures to follow for the most widespread containers are documented in the Solr wiki.

Indexing documents in Solr

As we have already mentioned at the beginning of this post, the first step in using solr is indexing a set of documents. Searches performed in solr will be done against the content of the documents that have been previously indexed.

Solr accepts many different types of documents. Among them, structured documents can benefit most from all the advanced features of this search engine.

Below is a sample structured document, in JSON format:
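A document of this kind, modeled on the book entries in the books.json file shipped with the solr 4.x example, might look like the one below. It is written to a hypothetical file named sample-book.json, and the exact fields and values in your copy of the package may differ.

```shell
# Write an illustrative structured document to a scratch file.
cat > sample-book.json <<'EOF'
{
  "id" : "978-0641723445",
  "cat" : ["book","hardcover"],
  "name" : "The Lightning Thief",
  "author" : "Rick Riordan",
  "series_t" : "Percy Jackson and the Olympians",
  "sequence_i" : 1,
  "genre_s" : "fantasy",
  "inStock" : true,
  "price" : 12.50,
  "pages_i" : 384
}
EOF
cat sample-book.json
```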

In this sample, it is easy to identify the document as a book reference. The information in it is structured as a set of (name, value) attributes. This structure makes it possible to specify constraints on the values of these attributes in later searches.

In the example/exampledocs directory there are several files that contain sets of structured documents ready to be indexed in solr. Specifically, the file example/exampledocs/books.json includes the sample document above, together with other documents of the same type, grouped in a JSON array.

This file can be uploaded to solr using the “post.jar” utility included in the package, as follows:
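A minimal sketch of the call, assuming the directory layout used earlier in this post and a running solr instance on the default port:

```shell
EXAMPLEDOCS="$HOME/solr/solr-4.8.1/example/exampledocs"

if [ -f "$EXAMPLEDOCS/post.jar" ]; then
    # -Dtype sets the content type of the uploaded file.
    ( cd "$EXAMPLEDOCS" && java -Dtype=application/json -jar post.jar books.json )
else
    echo "post.jar not found under $EXAMPLEDOCS" >&2
fi
```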

Unless told otherwise, post.jar uploads documents to solr by issuing HTTP POST requests to the URL http://localhost:8983/solr/update. This and other default settings can be modified by adding parameters to the call to post.jar. For instance, if solr has been installed in a Tomcat container that uses port 8080, a “url” parameter can be added to tell post.jar which URL to use to connect to solr:
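A sketch of such a call is shown below. It assumes that solr is deployed in Tomcat under the context path /solr, and it reuses the directory layout from earlier in this post:

```shell
EXAMPLEDOCS="$HOME/solr/solr-4.8.1/example/exampledocs"
UPDATE_URL="http://localhost:8080/solr/update"   # assumed Tomcat context path /solr

if [ -f "$EXAMPLEDOCS/post.jar" ]; then
    # -Durl overrides the default update URL; -Dtype sets the content type.
    ( cd "$EXAMPLEDOCS" && java -Durl="$UPDATE_URL" -Dtype=application/json -jar post.jar books.json )
else
    echo "post.jar not found under $EXAMPLEDOCS" >&2
fi
```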

This example also shows that the “type” parameter is used to specify the format of the file to be indexed.

Issuing searches to solr

The solr admin panel includes a form to issue searches on the indexed documents. To display the search form, select “collection1” in the “Core selector” dropdown in the left menu, and then select “Query” in the submenu that appears under it:

[Screenshot: solr query form]

In the search form, there are many options that can be used to specify the search criteria. We entered the text “Lightning” in the “q” field. After clicking on “Execute Query”, the document that contains the word “Lightning” is displayed on the screen, in JSON format:
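The same search can also be issued outside the admin panel, as an HTTP request to the select handler of the core. The sketch below assumes the default “collection1” core, a solr instance listening on localhost:8983, and that curl is installed; the response is not reproduced here.

```shell
QUERY_URL="http://localhost:8983/solr/collection1/select?q=Lightning&wt=json&indent=true"

if command -v curl >/dev/null 2>&1; then
    # wt=json asks for a JSON response; indent=true makes it readable.
    curl --silent --max-time 10 "$QUERY_URL" \
        || echo "solr is not reachable at $QUERY_URL" >&2
else
    echo "curl is not installed" >&2
fi
```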

This concludes this basic introduction to solr. Naturally, there is still much more to explain about solr features, the available indexing options and advanced searches. These will be covered in other posts in this series. Stay tuned!

