Getting up and Running With Cassandra

17 / Jan / 2011 by Sachin

Luckily, I got some time away from my usual project obligations to learn something new, and I devoted it to getting up and running with Cassandra. Getting started was a bit bumpy, as is almost always the case when you start with something entirely new.

Let's get started.

1) The first step, of course, is to get the Cassandra binary files from here

2) It is assumed that you have a JDK (version > 1.5) installed on your machine and that the JAVA_HOME variable is set. Now extract the tar file to the location you want (I extracted it in the /opt folder and created a soft link to it named cassandra).

3) Create the necessary directories for Cassandra in the /var/lib and /var/log folders and change their ownership to the user who will run Cassandra (replace "user" below with that username):


sudo mkdir -p /var/lib/cassandra

sudo chown -R "user" /var/lib/cassandra

sudo mkdir -p /var/log/cassandra

sudo chown -R "user" /var/log/cassandra
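If these directories are missing or not writable, Cassandra can fail at startup with "No such file or directory" style errors. As a quick sanity check before starting the server, here is a minimal Python sketch that reports whether each directory exists and is writable by the current user. It is my own helper, not something shipped with Cassandra, and the function name is made up for illustration.

```python
import os


def is_writable_dir(path):
    """Return True if path is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    # The two directories created in step 3.
    for directory in ("/var/lib/cassandra", "/var/log/cassandra"):
        status = "ok" if is_writable_dir(directory) else "missing or not writable"
        print(directory, "->", status)
```

If either line reports a problem, repeat the mkdir and chown commands above for that directory.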

Now go to the extracted Cassandra folder, move into the bin folder inside it (in my case /opt/cassandra/bin), and type the command


./cassandra -f

The -f switch will ensure that cassandra runs in the foreground and its logs will print to standard output.

Now a few lines of log output, looking a bit like a Java stack trace, will appear. I expected to see some kind of “server running” success message, but there was none. So if you are not seeing any FATAL or ERROR messages, you have succeeded in running Cassandra. :)

Notice in that output that Cassandra listens on port 9160 by default.
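Since there is no explicit success message, one way to confirm that something is actually listening on that port is to try opening a TCP connection to it. The following is a minimal sketch using only the Python standard library. The function name is my own, and it only checks that the port accepts connections, not that the process behind it is really Cassandra.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, and so on.
        return False


if __name__ == "__main__":
    # 9160 is the default port mentioned above.
    print("port 9160 open:", is_port_open("localhost", 9160))
```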

So, to connect to Cassandra, open a new terminal and start the Cassandra CLI, available in the same bin folder.


./cassandra-cli

You will see a message like

Welcome to cassandra CLI.

Type ‘help;’ or ‘?’ for help. Type ‘quit;’ or ‘exit;’ to quit.
[default@unknown]

Typing help; here will show you a list of the available commands.

Now, we will connect to the cassandra server instance we started earlier.


connect localhost/9160;

Don’t miss that semicolon ;). All statements here must end with a semicolon.

You will now be connected to a “Test Cluster”. It is the default cluster that comes with Cassandra. A cluster is a container that holds many ‘keyspaces’, and a keyspace is similar to a database in a relational DBMS. I had better leave the data model topic here and concentrate on the task at hand. I intend to take up the data model in a separate blog post.

So now we have a cluster to work in.

We will create a keyspace (a database in the relational DBMS world) and then enter and retrieve some data from it.

First of all, let's see which keyspaces are already available. Invoke


show keyspaces;

The keyspaces you see are used by Cassandra itself and are not intended to be used by us. So let's make our own keyspace.


create keyspace CustomKeySpace with replication_factor=1;

use CustomKeySpace;

What we have just done is create a new keyspace and start using it (quite similar to create database <database-name>; and use <database-name>;). Let's leave replication_factor aside for now.

[default@unknown]
[default@unknown] use CustomKeySpace;
Authenticated to keyspace: CustomKeySpace
[default@CustomKeySpace]

Notice the change from [default@unknown] to [default@CustomKeySpace]. It shows that you are the “default” user, who earlier was not using any keyspace and is now using CustomKeySpace. Another way to look at it is that the default user is “logged” into CustomKeySpace.

The next step is to create a table, except that it is called a column family here.


create column family user;

Let's enter some data into this column family now.


[default@CustomKeySpace] set user ['sachin'] ['lname']= 'Anand' ;
Value inserted.
[default@CustomKeySpace] set user ['sachin'] ['email']= 'sachin[at]intelligrape[dot]com' ;

Value inserted.

We have now created two columns for the user ‘sachin’: one is called ‘lname’ and contains the value ‘Anand’; the other is called ‘email’ and contains the value sachin[at]intelligrape[dot]com.

To count the number of columns for a record:


count user ['sachin'];
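One way to picture what set and count are doing is to think of a column family as a map from a row key to a set of named columns. The sketch below models that with plain Python dictionaries. It is only a rough mental model under that assumption, with helper names I made up, and it says nothing about how Cassandra stores data internally.

```python
from collections import defaultdict

# Stands in for the "user" column family: row key -> {column name: value}.
user = defaultdict(dict)


def set_column(column_family, row_key, column, value):
    """Roughly mimics: set user['sachin']['lname'] = 'Anand';"""
    column_family[row_key][column] = value


def count_columns(column_family, row_key):
    """Roughly mimics: count user['sachin'];"""
    return len(column_family.get(row_key, {}))


set_column(user, "sachin", "lname", "Anand")
set_column(user, "sachin", "email", "sachin[at]intelligrape[dot]com")
print(count_columns(user, "sachin"))  # 2
```

Note that each row can hold its own set of columns; nothing forces two rows in the same column family to share the same column names.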

To retrieve the values from the database, you guessed it, we will use get.


[default@CustomKeySpace] get user ['sachin'];
=> (column=656d61696c, value=73616368696e40696e74656c6c6967726170652e636f6d, timestamp=1295199962515000)
=> (column=666e616d65, value=4562656e, timestamp=1295199873677000)
Returned 2 results.

The problem here is that the column names and values are shown as hex codes. So we need to add some metadata to tell Cassandra what kind of values we are expecting. Here we go:


[default@CustomKeySpace] update column family user with column_metadata=[{column_name:lname, validation_class:UTF8Type},{column_name:email, validation_class:UTF8Type}];

Run the get command again:


[default@CustomKeySpace] get user ['sachin'];
=> (column=656d61696c, value=sachin@intelligrape.com, timestamp=1295199962515000)
=> (column=666e616d65, value=Anand, timestamp=1295199873677000)
Returned 2 results.
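The hex strings the CLI prints are simply the raw bytes of the stored text, so they can also be decoded by hand. Here is a minimal Python sketch, assuming the bytes are UTF-8 text (the helper name is my own), applied to the email column name and value from the first get output:

```python
def decode_hex(hex_string):
    """Turn a hex string as printed by the CLI back into readable text."""
    return bytes.fromhex(hex_string).decode("utf-8")


print(decode_hex("656d61696c"))  # email
print(decode_hex("73616368696e40696e74656c6c6967726170652e636f6d"))  # sachin@intelligrape.com
```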

So we have got the results we wanted. In this small article we learnt how to start Cassandra, connect to it, create a new keyspace, add a column family, and set and get data from it. I will be back with more on this for sure.

Thanks & Regards.

Sachin Anand

Email  : sachin[at]intelligrape[dot]com


comments (3)

  1. Deepak Rosario

    The column family creation syntax is as follows
    create column family <name>;
    Example:
    create column family user;

  2. Keith

    For future viewers – you can complete this tutorial with some updated info on CassandraCli: http://wiki.apache.org/cassandra/CassandraCli

    Instead of “column_family”, you just remove the underscore and use a space. I think the underscore was just a typo because the author doesn’t appear to use it elsewhere. Also, if you’re working with Node.js, I found this tutorial really helpful in conjunction with the racker/node-cassandra-client package (https://github.com/racker/node-cassandra-client) to get me up and running. The documentation on Cassandra/Node.js is *very* slim, but this article helped me get it up and running.

  3. Keith

    Hey! Great article on getting Cassandra up and running. I’m more of a programmer and less of a DB admin, so I initially had trouble with what would probably be an absolute no-brainer for someone with more apache experience:

    I knew I had to create the folders in the ‘var’ directory, but I thought this meant I needed to create a ‘var’ directory in ‘bin’ or the root ‘apache-cassandra-1.1.x’ folder. So I got lots of evil errors including

    log4j:ERROR setFile(null,true) call failed.
    java.io.FileNotFoundException: /var/log/cassandra/system.log (No such file or directory)
    java.io.IOException: unable to mkdirs /var/lib/cassandra/data/system/schema_columnfamilies

    I recalled seeing var folders in my apache2/MAMP setup, so I got the hunch that I may need to go to the MacHD root and find it there. Of course, it’s a hidden folder, so I started in terminal by doing a ‘ls -a’ command and found the var folder and then continued with your steps and voila!

    I did get an error at the point in your tutorial where we create column_family users; but I’m willing to bet it has something to do with me using a new version of Cassandra:

    [default@CustomKeySpace] create column_family user;
    Syntax error at position 7: no viable alternative at input ‘column_family’

    Again, thanks so much! I Google+ you and hope that others stumble across this – I searched for quite a while before I found it :(
