In the Solr search engine, a collection is a set of documents that share the same field structure.
The Solr installation package includes a sample collection, “collection1”. In the simplest cases, this collection can be used out of the box or with small changes to its configuration files. In other cases, it may be desirable to create several collections to index different types of documents. This post explains how to create and configure a new collection in a Solr installation.
Create the configuration files for the collection
In the default installation of Solr, the configuration files for the sample “collection1” can be found under the directory “example/solr/collection1”.
To create a new collection “mycollection”, start by creating a directory “example/solr/mycollection”. This directory will contain two subdirectories, “conf” and “data”, and a file “core.properties”.
The “conf” subdirectory will contain the configuration files for the new collection. The files “solrconfig.xml” and “schema.xml” from the sample “collection1” can be copied to be used as templates.
$ cd example/solr
$ mkdir mycollection mycollection/conf mycollection/data
$ echo "name=mycollection" > mycollection/core.properties
$ cp collection1/conf/solrconfig.xml mycollection/conf
$ cp collection1/conf/schema.xml mycollection/conf
Note: Besides solrconfig.xml and schema.xml, there are other configuration files under collection1/conf, such as “spellings.txt”, “stopwords.txt” and “currency.xml”. These files are referenced in the field type definitions found in the sample “schema.xml”. If those field type definitions are kept after editing this file, the referenced files will need to be copied from collection1/conf to mycollection/conf.
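The copy described in this note can be sketched as a small shell helper. This is only a sketch under assumptions: the function name copy_referenced_files is made up for this example, and it assumes that auxiliary files are referenced in schema.xml as attribute values ending in .txt or .xml (for instance words="stopwords.txt"). Files referenced in any other way would have to be copied by hand.

```shell
# Hypothetical helper (not part of Solr): copy the auxiliary files that the
# new collection's schema.xml mentions from the sample conf directory.
# Usage: copy_referenced_files SRC_CONF_DIR DST_CONF_DIR
copy_referenced_files() {
  src=$1
  dst=$2
  # Assumption: referenced files appear as attribute values ending in
  # .txt or .xml, e.g. words="stopwords.txt" or currencyConfig="currency.xml".
  grep -oE '="[^"]+\.(txt|xml)"' "$dst/schema.xml" | tr -d '="' | sort -u |
  while read -r f; do
    if [ -f "$src/$f" ]; then
      # Keep subdirectories such as lang/ when the reference includes one.
      mkdir -p "$dst/$(dirname "$f")"
      cp "$src/$f" "$dst/$f"
      echo "copied $f"
    else
      echo "not found in $src: $f" >&2
    fi
  done
}

# Example, run from example/solr:
#   copy_referenced_files collection1/conf mycollection/conf
```

Running it again after editing schema.xml is harmless: files that are already present are simply overwritten with the same content.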
The configuration file schema.xml contains the definitions of the fields that can appear in a document, and the definitions of the data types of those fields.
This file needs to be edited to add, modify or delete those definitions, to match the type of documents that will be indexed in the collection.
Usually, the primitive data types string, boolean, int, float, date, etc. can be kept unchanged.
In addition, to enable full-text search, generic text types such as “text_general”, or language-specific text types such as “text_en” for English text or “text_es” for Spanish text, can be kept as they appear in the configuration file.
Finally, the sample field definitions need to be replaced with the definitions of the actual fields that will appear in documents in the new collection.
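Before replacing the sample field definitions, it can help to list them together with the data types they use. The helper below is a rough sketch, not an official tool: list_fields is a made-up name, and it assumes that every field element is written on a single line with the name attribute immediately followed by the type attribute, as in the sample schema.xml.

```shell
# Hypothetical helper (not part of Solr): print "name type" for each <field>
# element found in a schema.xml.
# Assumption: each <field> sits on one line with name="..." directly followed
# by type="..."; other layouts would need a real XML parser.
# Commented-out definitions that match this pattern are listed as well.
list_fields() {
  sed -n 's/.*<field name="\([^"]*\)" type="\([^"]*\)".*/\1 \2/p' "$1"
}

# Example, run from example/solr:
#   list_fields mycollection/conf/schema.xml
```

Each output line names one field to review and the type it depends on, which also shows which type definitions must stay in the file.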