Diving deep with MongoDB Indexing

08 / Jan / 2016 by Ajay Sharma 0 comments

Whether you are a lazy programmer dreaming to achieve more by doing less or a super serious databases professional aspiring to make a mark in the land of DATA, Indexing is THE thing you can’t afford to live without. Why? Because of follows:

  1. Performance
  2. Unbelievably better performance
  3. Performance better than James Bond!!!!!

Yes database indexing is real and it exists in our physical world. As you read on this article you will find that indexing is rather simple and if applied intelligently can produce really awesome results.

Definition (As provided on Wikipedia):
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.

Wait! English please…
Suppose someone asks you to find all pages in a book that start with the letter “F”. What would be your approach to find such pages manually? You simply don’t have much choice but to skim through all the pages and look at initial letter on every page to find which pages start with the given letter. Unless, you guessed it right kiddo, unless the book is a dictionary! If the given book is a dictionary you would simply have to count the pages for words beginning with letter “F”. This implies that if we invest a little effort for classifying information based on some important filter criteria, the time taken to search important information decreases drastically.

Let’s take another example. Think of a User table (with usual columns “Username”, “Email”, “Country”, “Date_Of_Birth”) containing millions of records. A simple select query to get all records with usernames beginning with “F” can take too much time as the server has to search through all the records in the table. (You do know that databases don’t keep sorted data by default, right?) But if however you manage to keep an index on “Username” in an ascending or descending order (like in phone-book), you will be left with a much smaller sub-set to query your data on and hence the time required to find such documents (MongoDB equivalent of “Rows”) will be much much less.

But the really cool thing about indexing is that it can be applied on more than one field for any given collection. (Collection roughly corresponds to a “Table” and field for a “Column” in MongoDB terminology). You can think of this like a phone-book sorted by Contact Names as well as Phone Numbers. How cool is that? Right? Wanna know how do databases achieve indexing on multiple fields? Refer this link.

Now let’s dive even deeper into how indexing is applied in MongoDB.

Indexing on a single field:

db.Users.ensureIndex({"Username":1})

That’s right. It’s that freaking simple. The tricky part about indexing is not the part of applying them, it is the task of identifying efficient indices that make them such a huge star. Suppose we have above Users collection and we have also applied index on Username like above. Now will this index help if we are querying to get all “Email” fields starting with any given letter say “F”? No Monseigneur, not at all. Why? Obviously because the index in not applied on “Email” field but on “Username” field. Solution? Simple, we kill the Bat!!

db.Users.ensureIndex({"Email":1})

So now our Users collection is indexed on two fields- Username and Email.

What is the “1” in that line doing?
It is the direction of the index. Both the indices created above will arrange data in ascending alphabetic order. For reverse alphabetic order, replace “1” with “-1″ like following:

db.Users.ensureIndex({"Email":-1})

How to check if a collection is already indexed?

db.Users.stats()

Above command will provide all the statistics of the Users table including a detailed report of its indices like follows:

"ns" : "mydb.Users",
"count" : 10930,
"size" : 11412704,
... : ...
... : ...
... : ...
"indexSizes" : {
"_id_" : 367920,
"Username_1" : 245280,
"Email_1" : 228928
}

How and when to apply Compound Indexing in MongoDB?
I want to get all users with “Country” India and “Email” starting with “A”. Are my indices going to help me? Yes but not as much as you would want them to. This is the perfect time to understand how to choose an efficient index.

An index can only be helpful when it is applied after analyzing what kind of queries will be performed frequently on that collection. Suppose I am running an email server and I want to send birthday wishes to my users but to only those belonging to some specific countries. This task will be run daily. If my User collection is huge enough, I should surely apply an index on “Country” and “Date_Of_Birth” so as to fasten this operation. To achieve this I can apply following index:

db.Users.ensureIndex({"Country":1, "Date_Of_Birth":1})

Remember, above command creates one index that maps both “Country” and “Date_Of_Birth” together. No, it is not equivalent to following commands:

db.Users.ensureIndex({"Country":1})
db.Users.ensureIndex({"Date_Of_Birth":1})

Watch out!!! Don’t be tempted to think that “fields” once included in an index will optimize all queries consisting those “columns”. You need to arrange your queries in a manner favorable to indices so as to achieve any optimization using them. E.g. for the index applied on fields “Country” and “Date_Of_Birth”, the filter queries should first mention the “Country” filter and then the “Date_Of_Birth” filter and not the other way round.

How are indices analyzed for a query by MongoDB?
Do remember that compound indices are applicable in more than one ways. Their applicability progresses as per the subsets. Refer below diagram:
indexes

So above index will be relevant and helpful in three types of queries:

  1. Queries matching ‘Date_Of_Birth’
  2. Queries matching ‘Date_Of_Birth’ and ‘Name’ in that order
  3. Queries matching ‘Date_Of_Birth’,  ‘Name’ and ‘Email’ in that order

I want to get all users with “Username” starting with “F” and “Email” starting with “M”. Are my indices going to help me? In order to understand that you need to understand how MongoDB utilizes indices to find which index are most suitable to use for any particular query.
For every query MongoDB performs, it creates several plans to perform that query as per applicable indices.

MongoDB has a query optimizer and is very clever about choosing which index to use. When you first do a query, the query optimizer tries out a number of query plans concurrently. The first one to finish will be used, and the rest of the query executions are terminated. That query plan will be remembered for future queries on the same keys. The query optimizer periodically retries other plans, in case you’ve added new data and the previously chosen plan is no longer best.

How to check the efficiency of my index?
There will come a day (and I really wish it never comes) when the world around you will come close to an end as even your best queries will take too long to complete. On that my friend, your last hope would be to analyze your queries yourself. There is a very helpful and utterly easy to use command called explain
db.Users.find({'Name':'xyz', 'Country':'Canada'}).explain()

‘explain’ function does exactly what it is meant to do. It provides a detailed JSON document explaining all aspects of your query:

  • Indexes applied
  • Records Filtered
  • time taken
  • many other important things that at first might seem to be rubbish

What next?
Although this blog is aimed at providing a fair idea about MongoDB indexing, your curious souls might be interested in studying about Geospatial IndexingIndex Administration and other advanced stuff.

Happy Indexing!!

Get in touch with our MongoDB developers in case of any query.

FOUND THIS USEFUL? SHARE IT

Leave a comment -