Apache Solr 5.1.0

15 / Jun / 2015 by Ankit Kumar

Apache Solr is an open source search platform built upon a Java library called Lucene.

Solr is a popular search platform for Web sites because it can index and search multiple sites and return recommendations for related content based on the search query’s taxonomy. Solr is also a popular search platform for enterprise search because it can be used to index and search documents and email attachments.

Solr communicates over Hypertext Transfer Protocol (HTTP) and exchanges data as Extensible Markup Language (XML). It can also return responses as JavaScript Object Notation (JSON), Python, and Ruby. According to the Apache Lucene Project, Solr offers capabilities that have made it popular with administrators, including:

Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces – XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Linearly scalable, auto index replication, auto failover and recovery
Near Real-time indexing
Flexible and Adaptable with XML configuration
Extensible Plugin Architecture

Overview

In this post we show how to install Apache Solr 5.1.0, configure it, and index a data file. Apache Solr supports indexing from many source formats, including databases, PDF files, XML files, and CSV files.

We will index basic data from a CSV file.

The directory structure changed in Solr 5 compared with Solr 4.x. For details, see:

https://lucene.apache.org/solr/5_0_0/changes/Changes.html#v5.0.0.upgrading_from_solr_4.x

Download Apache Solr 5.1.0 from the link below and unpack it:

wget http://apache.mirrors.ionfish.org/lucene/solr/5.1.0/solr-5.1.0.zip
unzip solr-5.1.0.zip
cd solr-5.1.0

Apache Solr directory structure overview:

[Screenshot of the Solr directory structure]

Start Solr

bin/solr start -p 8983

-p specifies the port number (the default is 8983).
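One quick way to confirm that Solr is listening after startup is to try a TCP connection to the port. The sketch below is not part of the Solr distribution; the helper name is_port_open and the host localhost are assumptions made for illustration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not listening".
        return False


# Example (assumes Solr was started locally with -p 8983):
# print(is_port_open("localhost", 8983))
```

A successful connection only shows that something is listening on the port, so it is a first check rather than proof that Solr itself is healthy.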


Core Creation

When the Solr server runs in standalone mode, an index and its configuration are called a core. In SolrCloud mode they are called a collection. This post covers the standalone server and cores.

SolrCloud Collection mode

solr create_collection [-c collection] [-d confdir] [-n configName] [-shards #] [-replicationFactor #] [-p port]

Standalone Mode

solr create_core [-c core] [-d confdir] [-p port]

bin/solr create_core -c power1 -d sample_techproducts_configs -p 8983

This creates the core power1 on the Solr instance running on port 8983, using sample_techproducts_configs, an example configuration with many optional features enabled to demonstrate the full power of Solr.

Cores are created under the <ApacheSolr_Home>/server/solr/ folder (in this case, power1).
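The core can also be checked over HTTP through Solr's CoreAdmin STATUS action. The following is a minimal sketch: the function names are hypothetical, localhost is an assumption, and core_exists assumes the usual response shape in which an unknown core is reported as an empty entry.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_status_url(core: str, host: str = "localhost", port: int = 8983) -> str:
    """Build the CoreAdmin STATUS URL for one core."""
    params = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"http://{host}:{port}/solr/admin/cores?{params}"


def core_exists(core: str, host: str = "localhost", port: int = 8983) -> bool:
    """Ask a running Solr whether the core is loaded (performs an HTTP request)."""
    with urlopen(core_status_url(core, host, port), timeout=5) as resp:
        status = json.load(resp)["status"]
    # Assumption: an unknown core comes back as an empty entry,
    # so a loaded core is recognised by its "name" key.
    return bool(status.get(core, {}).get("name"))


# Only builds and prints the URL; call core_exists("power1") against a live server.
print(core_status_url("power1"))
```

Opening the printed URL in a browser shows the same status information as JSON.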

The core power1 now appears in the Core Selector drop-down of the Solr admin UI.

Core power1 directory structure:

[Screenshot of the power1 core directory structure]

Create a sample CSV file to index and save it as testing.csv.

Content of testing.csv:

cat,_version_,price_c,series_t,id,author,rank,price,sequence_i,name,author_s,inStock,genre_s
book,1503669092110827520,"7.99\,USD",A Song of Ice and Fire,0553573403,George R.R. Martin,1.2,7.99,1,A Game of Thrones,George R.R. Martin,true,fantasy
book,1503669092137041920,"7.99\,USD",A Song of Ice and Fire,0553579908,George R.R. Martin,2.3,7.99,2,A Clash of Kings,George R.R. Martin,true,fantasy
book,1503667880901738496,"17.99\,USD",Foundation Novels,0553293354,Isaac Asimov,,17.99,1,Adobe_ttn,Isaac Asimov,true,scifi
book,1503667880903835648,"6.99\,USD",the Chronicles of Amber,0380014300,Roger Zelazny,,6.99,1,Nine Princes In Amber,Roger Zelazny,true,fantasy
book,1503667880904884224,"5.99\,USD",The Chronicles of Prydain,080508049X,Lloyd Alexander,,5.99,2,The Black Cauldron,Lloyd Alexander,true,fantasy
book,1503667880905932800,"15.03\,USD",The Chronicles of Prydain,080508056X,Ankit,,15.03,2,The Black Cauldron,Ankit,true,fantasy
book,1503667880906981376,"16.99\,USD",snapper,0812550706,Orson Scott Card,,16.99,1,snapper_ttn,Orson Scott Card,true,scifi
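Before indexing, it can help to confirm that every row has the same number of values as the header, since a stray comma shifts values into the wrong fields. The sketch below uses Python's standard csv module on two rows copied from the file above. The function name check_csv is hypothetical, and Python's parser is used only as a sanity check, not as a model of how Solr reads CSV.

```python
import csv
import io

# Two rows copied from testing.csv above; in practice read the file instead.
SAMPLE = r'''cat,_version_,price_c,series_t,id,author,rank,price,sequence_i,name,author_s,inStock,genre_s
book,1503669092110827520,"7.99\,USD",A Song of Ice and Fire,0553573403,George R.R. Martin,1.2,7.99,1,A Game of Thrones,George R.R. Martin,true,fantasy
book,1503667880901738496,"17.99\,USD",Foundation Novels,0553293354,Isaac Asimov,,17.99,1,Adobe_ttn,Isaac Asimov,true,scifi
'''


def check_csv(text: str) -> list:
    """Parse CSV text and verify every row has as many values as the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for line_no, values in enumerate(reader, start=2):
        if len(values) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} values, got {len(values)}"
            )
        rows.append(dict(zip(header, values)))
    return rows


rows = check_csv(SAMPLE)
print(len(rows), "rows,", len(rows[0]), "columns")
```

The quoted price_c values keep their embedded comma in one field, which is why the column count still matches the header.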

Schema.xml

schema.xml is usually the first file you configure when setting up a new Solr installation.

The schema declares:

  • what kinds of fields there are
  • which field should be used as the unique/primary key
  • which fields are required
  • how to index and search each field

The sample_techproducts_configs configuration supplies a default schema.xml in server/solr/power1/conf. We need to adapt it to the content of testing.csv by defining the fields that the file uses.

Edit the schema.xml file in the server/solr/power1/conf folder and add the following lines after the uniqueKey element.

<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="store" type="location" indexed="true" stored="true"/>
<field name="rank" type="float" indexed="true" stored="true"/>
<field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
<field name="keywords" type="text_general" indexed="true" stored="true"/>
<field name="category" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_s" type="text_general" indexed="true" stored="true"/>

Note: Because we are using sample_techproducts_configs, some of these field and dynamicField names are already present in schema.xml. Check for an existing definition before adding a line.

A duplicate definition causes an error: Solr rejects the schema when the core is loaded, so indexing cannot proceed.
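Duplicates are easy to miss in a long schema.xml, so a small script can look for them. The following is a rough sketch using Python's standard XML parser on a hypothetical schema fragment in which name is deliberately declared twice. The function name duplicate_names is an assumption, not a Solr tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical schema fragment; "name" is deliberately declared twice.
SCHEMA = """
<schema name="example" version="1.5">
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="name" type="text_general" indexed="true" stored="true"/>
  <field name="name" type="text_general" indexed="true" stored="true"/>
  <dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
</schema>
"""


def duplicate_names(schema_xml: str) -> dict:
    """Return field and dynamicField names that are declared more than once."""
    root = ET.fromstring(schema_xml)
    result = {}
    for tag in ("field", "dynamicField"):
        counts = Counter(el.get("name") for el in root.iter(tag))
        result[tag] = sorted(name for name, count in counts.items() if count > 1)
    return result


print(duplicate_names(SCHEMA))
```

To check a real file, read its contents and pass the text to duplicate_names.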

Setting the indexed attribute to true means the field is indexed, so records can be searched on it. Setting it to false means the field is only stored and cannot be queried.

Setting the stored attribute to true means the field's original value is stored and can be returned in the output. Setting it to false means the field is only indexed and cannot be retrieved in the output.
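The effect of the two attributes can be summarised per field. The sketch below reads three hypothetical field definitions, chosen only to show each combination, and prints what each one allows. It describes the flags as explained above and does not query Solr.

```python
import xml.etree.ElementTree as ET

# Hypothetical field definitions, chosen only to show each flag combination.
FIELDS = """
<fields>
  <field name="name" type="text_general" indexed="true" stored="true"/>
  <field name="search_only" type="text_general" indexed="true" stored="false"/>
  <field name="display_only" type="string" indexed="false" stored="true"/>
</fields>
"""


def describe(field: ET.Element) -> str:
    """Summarise what the indexed/stored flags allow for one field."""
    indexed = field.get("indexed") == "true"
    stored = field.get("stored") == "true"
    can_search = "searchable" if indexed else "not searchable"
    can_return = "returned in results" if stored else "not returned in results"
    return f"{field.get('name')}: {can_search}, {can_return}"


for field in ET.fromstring(FIELDS).iter("field"):
    print(describe(field))
```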

For more on indexing in Solr, see the link below:

http://www.solrtutorial.com/basic-solr-concepts.html

After changing schema.xml, restart Solr:

bin/solr restart -p 8983

Indexing the content

Solr includes a simple command-line tool, bin/post, for POSTing various types of content to a Solr server. bin/post is a Unix shell script.

bin/post -c power1 testing.csv -p 8983

The general form is:

bin/post -c <core_name> <file_name_to_index> -p <port_number>

For more details, run bin/post -help.
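Where the shell script is not available, the same CSV upload can be sketched as a plain HTTP POST to the core's /update handler. This is an illustration under stated assumptions rather than a replacement for bin/post: the function names are hypothetical, localhost is assumed, and commit=true is added so the documents become searchable straight away.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_csv_update(core: str, csv_bytes: bytes,
                     host: str = "localhost", port: int = 8983) -> Request:
    """Build (but do not send) a POST of CSV data to the core's /update handler."""
    url = f"http://{host}:{port}/solr/{core}/update?" + urlencode({"commit": "true"})
    return Request(url, data=csv_bytes, headers={"Content-Type": "text/csv"})


def post_csv(core: str, path: str) -> int:
    """Send the file to a running Solr and return the HTTP status code."""
    with open(path, "rb") as f:
        request = build_csv_update(core, f.read())
    with urlopen(request, timeout=30) as response:
        return response.status


# Only builds the request; call post_csv("power1", "testing.csv") against a live server.
req = build_csv_update("power1", b"id,name\n1,example\n")
print(req.get_method(), req.full_url)
```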

Search Indexed File

Browse the indexed data using the Schema Browser option in the Solr admin dashboard.

Select the field whose values you want to inspect. Here we selected the author field.

Then select one of the values listed for the author field to see the matching documents. For example, we selected the author Ankit to get the corresponding details.

The results can be viewed in different formats, such as JSON, XML, and CSV.
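The same search can be run outside the dashboard by requesting the core's /select handler, with the wt parameter choosing the response format. The sketch below only builds the URLs. The function name select_url, the host localhost, and the rows value are assumptions for illustration.

```python
from urllib.parse import urlencode


def select_url(core: str, query: str, wt: str = "json", rows: int = 10,
               host: str = "localhost", port: int = 8983) -> str:
    """Build a URL for the core's /select handler."""
    params = urlencode({"q": query, "wt": wt, "rows": rows})
    return f"http://{host}:{port}/solr/{core}/select?{params}"


# The same author query in three response formats.
for fmt in ("json", "xml", "csv"):
    print(select_url("power1", "author:Ankit", wt=fmt))
```

Opening any of the printed URLs in a browser against the running server returns the matching documents in that format.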

 

 
