How to Set Up Replica Sets in MongoDB (AWS EC2)

22 / Jun / 2015 by Neeraj Gupta

MongoDB is a popular NoSQL database that stores data in JSON-like documents with dynamic schemas, instead of storing data in tables and rows as a relational database does.

In this blog we will set up MongoDB replication on a single node to demonstrate high availability and the failover mechanism.

What is a MongoDB Replica Set?

We can achieve high availability and failover with the help of replica sets in MongoDB. In this configuration, the application/client reads from and writes to the primary node. Changes on the primary node are replicated to the secondary nodes using the oplog (operations log). Secondary nodes can be used for read-only operations, depending on the use case. The main advantage of a replica set is that it offers automatic failover if the primary node goes down.

Replica Set Components:

1. Primary Member: By default, all read and write operations go to the primary node.
2. Secondary Members: Any changes made on the primary node are replicated to the secondary nodes using the oplog.
3. Arbiter Members: An arbiter is an optional member of the replica set that holds no data and does not take part in replication. It only votes in the election of a new primary.
4. Election Process: If the primary node goes down, an automatic election takes place among the remaining members. If the set has an even number of data-bearing members, an arbiter acts as a tie-breaker so that the vote can reach a majority.
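
The tie-breaking role of an arbiter comes down to majority arithmetic: a new primary needs the votes of a strict majority of all voting members. Here is a minimal sketch in plain JavaScript (it runs in Node.js and does not need MongoDB). The function names majorityNeeded and faultTolerance are made up for this illustration.

```javascript
// Votes required to elect a primary: a strict majority of all voting members.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majorityNeeded(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(
    `${n} voting members: majority ${majorityNeeded(n)}, tolerates ${faultTolerance(n)} failure(s)`
  );
}
```

Two data-bearing members on their own tolerate no failure, while the same two members plus an arbiter (three votes) tolerate one. Going from three to four voting members adds nothing, which is why an odd number of voters is the usual choice.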

[Image: Introduction]

Note:
1. Data on a secondary node can be stale, because replication between the primary and secondary nodes is asynchronous. Reading from secondaries should therefore be enabled with care in a production environment.
2. This kind of setup requires code-level changes to direct read queries to secondary nodes and write queries to the primary node.
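
One common form of that code-level change is the connection string the application hands to its driver. The sketch below only builds the string in plain JavaScript (runnable in Node.js). The host aliases mongo1, mongo2 and mongo3, the database name test and the helper name buildReplicaSetUri are assumptions for illustration; replicaSet and readPreference are standard MongoDB connection string options.

```javascript
// Assumed host aliases and ports for the three members (illustrative only).
const members = ["mongo1:27009", "mongo2:27010", "mongo3:27011"];

// Build a MongoDB connection string that lists every member,
// names the replica set, and sets a read preference.
function buildReplicaSetUri(hosts, database, replicaSet, readPreference) {
  const options = new URLSearchParams({ replicaSet, readPreference });
  return `mongodb://${hosts.join(",")}/${database}?${options.toString()}`;
}

// "secondaryPreferred" asks the driver to read from a secondary when one
// is available; writes still go to the primary.
const uri = buildReplicaSetUri(members, "test", "mongo-data", "secondaryPreferred");
console.log(uri);
```

With a string like this the driver discovers which member is currently primary, so the application does not need to hard-code it.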

1. Set Up a Replica Set on a Single EC2 Instance

To set up a test environment in a few clicks, I am using AWS EC2 (Amazon Linux) for demonstration purposes. You can download the MongoDB package from the official MongoDB website. Once you have MongoDB downloaded, you need to create a data folder for each mongo instance. I am on an Amazon Linux t2.micro instance and created mongo-data0, mongo-data1 and mongo-data2.

Note: In a production environment it is not recommended to run multiple replica set members on a single machine with oplogSize 50. I am doing this only for demonstration purposes.

You also need to run each instance on a separate port so I chose 27009, 27010 and 27011. I started the three instances as follows (in three separate Terminals):

[js]mongod --dbpath mongo-data0 --port 27009 --rest --replSet mongo-data --logpath log.mongo-data0 --fork --logappend --smallfiles --oplogSize 50
mongod --dbpath mongo-data1 --port 27010 --rest --replSet mongo-data --logpath log.mongo-data1 --fork --logappend --smallfiles --oplogSize 50
mongod --dbpath mongo-data2 --port 27011 --rest --replSet mongo-data --logpath log.mongo-data2 --fork --logappend --smallfiles --oplogSize 50[/js]
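
Since the three command lines differ only in the data folder, port and log file, they can also be generated rather than typed by hand. This is a small sketch in plain JavaScript (runnable in Node.js) that only prints the commands and does not start anything. The helper name mongodCommand is hypothetical; the flags mirror the ones used in this post.

```javascript
// Replica set name and ports as chosen in this post.
const replSet = "mongo-data";
const ports = [27009, 27010, 27011];

// Build one mongod command line per member; index i selects
// the data folder mongo-data<i> and the log file log.mongo-data<i>.
function mongodCommand(index, port) {
  return [
    "mongod",
    `--dbpath mongo-data${index}`,
    `--port ${port}`,
    "--rest",
    `--replSet ${replSet}`,
    `--logpath log.mongo-data${index}`,
    "--fork --logappend --smallfiles --oplogSize 50",
  ].join(" ");
}

// Guard against the easy mistake of giving two members the same port.
if (new Set(ports).size !== ports.length) {
  throw new Error("each member needs its own port");
}

const commands = ports.map((port, index) => mongodCommand(index, port));
commands.forEach((cmd) => console.log(cmd));
```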

Command Description:

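--dbpath: directory where the instance stores its data files
--port: TCP port the instance listens on
--rest: enables the simple REST interface
--replSet: name of the replica set the instance belongs to (must be the same on all members)
--logpath: file to write log output to
--fork: runs mongod in the background as a daemon
--logappend: appends to the existing log file instead of overwriting it
--smallfiles: uses smaller default data file sizes
--oplogSize: maximum size of the oplog, in megabytes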

Commands to set up the replica set on a single machine:

[Screenshot: replica set startup commands]

Check the list of running mongo processes, using ps -ef | grep mongo (shown below):

[Screenshot: running mongo processes]

2. Access the Replica Set
If you want each node to have a separate name, add the following entry to /etc/hosts:
127.0.0.1 mongo1 mongo2 mongo3
Once done, you are ready to connect to the MongoDB replica set. Connect to the instance on port 27009 and initiate the set using the commands below in your terminal:

[js]mongo --port 27009
> rs.initiate()
> rs.add("shard1.neerajg.in:27010")
> rs.add("shard1.neerajg.in:27011")[/js]
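
As an alternative to adding members one at a time, rs.initiate() also accepts a configuration document that lists all members up front. The sketch below builds such a document in plain JavaScript (runnable in Node.js) and prints it. It reuses the host name and ports from this post; treat it as an illustration of the document's shape rather than the exact configuration used here.

```javascript
// The _id of the config must match the --replSet value passed to mongod.
const config = {
  _id: "mongo-data",
  members: [27009, 27010, 27011].map((port, index) => ({
    _id: index, // unique integer id per member
    host: `shard1.neerajg.in:${port}`, // host:port of the member
  })),
};

console.log(JSON.stringify(config, null, 2));
// In the mongo shell the same object could be passed as: rs.initiate(config)
```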

Note: In a production environment you will be running the replica set members on separate hosts; you can add them using the same command. For example:

[js]> rs.add("host1.neerajg.in:27010")
> rs.add("host2.neerajg.in:27010")[/js]

Access the primary member (running on port 27009):
a. Initiate the replica set on the primary mongo instance:

[Screenshot: initiating the replica set]

b. Check the status of the primary mongo instance again:

[Screenshot: replica set primary status]

c. Add the secondary members:

[Screenshot: adding members to the replica set]

d. Print replication info:

[Screenshot: replication info]

3. Add data to primary node:

At this point we can add objects to the database and they will be replicated to the other instances. You can read from the primary or from the secondaries.

[js]db.people.save( { name : "Neeraj", age: 26, emailid: "neeraj.gupta@tothenew.com" } )
db.people.find() [/js]

a. Add data to the primary mongo instance using the commands above:

[Screenshot: adding data on the primary]

You can then connect to a secondary node and read the data using the commands below.

[js]mongo --host mongo2 --port 27010
rs.slaveOk()
db.people.find()[/js]

Access the secondary member running on port 27010:

[Screenshot: first secondary member]

Access the secondary member running on port 27011:

[Screenshot: second secondary member]

4. Check the log files:

Accessing the log files:

a. Primary member logs (27009):

[Screenshot: primary member logs]

b. Secondary member logs (27010):

[Screenshot: secondary member logs]

5. Test failover capabilities:

To test the failover capabilities, do the following:

a. Either close the terminal running the primary mongod node, or find its process ID using the command below and kill the process.

[js]ps -ef | grep mongo

kill PID [/js]

[Screenshot: killing the primary process]

b. While doing this, keep an eye on the other two terminals. One of them will show messages such as "transition to primary complete".

[Screenshot: secondary transitioning to primary]

c. After a few seconds, connect to the remaining two members running on ports 27010 and 27011. One of them will be the primary and the other will be a secondary.

[Screenshot: secondary promoted to primary]
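
To find out which member won the election, you can look at the members array returned by rs.status() and pick the entry whose stateStr is PRIMARY. The sketch below shows that lookup in plain JavaScript (runnable in Node.js). The status object is a trimmed, hand-written stand-in for illustration, not real output, and the helper name findPrimary is hypothetical.

```javascript
// Illustrative, trimmed-down stand-in for an rs.status() result (not real output).
const status = {
  set: "mongo-data",
  members: [
    { name: "shard1.neerajg.in:27009", health: 0, stateStr: "(not reachable/healthy)" },
    { name: "shard1.neerajg.in:27010", health: 1, stateStr: "PRIMARY" },
    { name: "shard1.neerajg.in:27011", health: 1, stateStr: "SECONDARY" },
  ],
};

// Return the host:port of the current primary, or null while no primary is elected.
function findPrimary(replStatus) {
  const primary = replStatus.members.find((m) => m.stateStr === "PRIMARY");
  return primary ? primary.name : null;
}

console.log(findPrimary(status));
```

Which of the two remaining members becomes primary is decided by the election, so in your own run it may just as well be the member on port 27011.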

d. Operations supported:

i. Primary mode: read and write. To validate this, try adding new content to the database using the command below:

[js] db.people.save( { name : "Amit", age: 26, emailid: "ankit.kumar@tothenew.com" } ) [/js]

The above command will run successfully and add new content to the database (as shown in the screenshot above).

[js]db.people.find() [/js]

The above command will retrieve all the data stored in the people collection.

ii. Secondary mode: read only. To validate this, try adding new content to the database using the command below:

[js] db.people.save( { name : "Amit", age: 26, emailid: "ankit.kumar@tothenew.com" } ) [/js]

[Screenshot: write error on a secondary member]

NOTE: The above command will throw an error, as a secondary node runs in read-only mode.

[js]db.people.find() [/js]

The above command will retrieve all the data stored in the people collection.


comments (7)

  1. akash

    hi Neeraj,
    I used your blog and it helped me in setting up my cluster. I have one query: when one of my secondaries becomes primary and the node that was primary earlier restarts (considering it was part of an auto scaling group), how will the new primary add it back to the replica set?

  2. Shara

    I have a production environment running with MongoDB. Now I am planning to implement replication. Is there something I need to pre-configure before adding secondary members?

    Also, whenever I add a secondary member to the replica set, the primary becomes secondary. How can I avoid this?

  3. Neeraj Gupta

    Thanks for this.
    can you also add what would change to do the replication on a different EC2 instance ?
    There is no practical point of replicating on the same machine.

    1. Neeraj Gupta

      Thanks for writing back. The changes required in case you are setting up replica set on multiple machines are:

      1. Running mongod on all the EC2 instances with same replica set name
      2. Use private DNS endpoints for all the EC2 instances for communication
      3. And use new private DNS endpoints while adding replica set nodes.

      I am using single machine to run complete replica set for demo/testing purpose only, it is not recommended to run in production environment.
