Part 3: Replica Sets in the Wild

A primary with 8 secondaries

This post assumes that you know what replica sets are and some of the basic syntax.
In part 1, we set up a replica set from scratch, but real life is messier: you might want to migrate dev servers into production, add new slaves, prioritize servers, change things on the fly… that’s what this post covers.

Before we get started…

Replica sets don’t like localhost. They’re willing to go along with it… kinda, sorta… but it often causes issues. You can get around these issues by using the hostname instead. On Linux, you can find your hostname by running the hostname command:

$ hostname
wooster

On Windows, you have to download Linux or something. I haven’t really looked into it.

From here on out, I’ll be using my hostname instead of localhost.

Starting up with Data

This is pretty much the same as starting up without data, except you should backup your data before you get started (you should always backup your data before you mess around with your server configuration).

If, pre-replica-set, you were starting your server with something like:

$ ./mongod

…to turn it into the first member of a replica set, you’d shut it down and start it back up with the –replset option:

$ ./mongod --replSet unicomplex

Now, initialize the set with the one server (so far):

> rs.initiate()
{
        "info" : "Config now saved locally.  Should come online in about a minute.",
        "ok" : 1
}

Adding Slaves

You should always run MongoDB with slaves, so let’s add some.

Start your slave with the usual options you use, as well as –replSet. So, for example, we could do:

$ ./mongod --dbpath ~/dbs/slave1 --port 27018 --replSet unicomplex

Now, we add this slave to the replica set. Make sure db is connected to wooster:27017 (the primary server) and run:

> rs.add("wooster:27018")
{"ok" : 1}

Repeat as necessary to add more slaves.

Adding an Arbiter

This is very similar to adding a slave. In 1.6.x, when you start up the arbiter, you should give it the option –oplogSize 1. This way the arbiter won’t be wasting any space. (In 1.7.4+, the arbiter will not allocate an oplog automatically.)

$ ./mongod --dbpath ~/dbs/arbiter --port 27019 --replSet unicomplex --oplogSize 1

Now add it to the set. You can specify that this server should be an arbiter by calling rs.addArb:

> rs.addArb("wooster:27019")
{"ok" : 1}

Demoting a Primary

Suppose our company has the following servers available:

  1. Gazillion dollar super machine
  2. EC2 instance
  3. iMac we found on the street

Through an accident of fate, the iMac becomes primary. We can force it to become a slave by running the step down command:

> imac = connect("imac.example.com/admin")
connecting to: imac.example.com/admin
admin
> imac.runCommand({"replSetStepDown" : 1})
{"ok" : 1}

Now the iMac will be a slave.

Setting Priorities

It’s likely that we never want the iMac to be a master (we’ll just use it for backup). You can force this by setting its priority to 0. The higher a server’s priority, the more likely it is to become master if the current master fails. Right now, the only options are 0 (can’t be master) or 1 (can be master), but in the future you’ve be able to have a nice gradation of priorities.

So, let’s get into the nitty-gritty of replica sets and change the iMac’s priority to 0. To change the configuration, we connect to the master and edit its configuration:

> config = rs.conf()
{
        "_id" : "unicomplex",
        "version" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "prod.example.com:27017"
                },
                {
                        "_id" : 1,
                        "host" : "ec2.example.com:27017"
                },
                {
                        "_id" : 2,
                        "host" : "imac.example.com:27017"
                }
        ]
}

Now, we have to do two things: 1) set the iMac’s priority to 0 and 2) update the configuration version. The new version number is always the old version number plus one. (It’s 1 right now so the next version is 2. If we change the config again, it’ll be 3, etc.)

> config.members[2].priority = 0
0
> config.version += 1
2

Finally, we tell the replica set that we have a new configuration for it.

> use admin
switched to db admin
> db.runCommand({"replSetReconfig" : config})
{"ok" : 1}

All configuration changes must happen on the master. They are propagated out to the slaves from there. Now you can kill any server and the iMac will never become master.

This configuration stuff is a bit finicky to do from the shell right now. In the future, most people will probably just use a GUI to configure their sets and mess with server settings.

Next up: how to hook this up with sharding to get a fault-tolerant distributed database.

  • Great set of tutorials!
    Just wondering what the best way is to add a replica set to a shard? Is it just a case of following the steps above for each mongod shard instance?

  • kristina1

    That's part 4! Basically, do what this post describes to create a shard. Then connect to your mongos and do:

    db.runCommand({“addshard” : “unicomplex/localhost:27017”, “allowLocal” : true})

    …replacing localhost:27017 with the address of a member of the set. Now your replica set is a shard.

  • Thanks for that. Another question:
    Is it better to start the mongod and mongos instances using a config file and fork the process, or to do it as a single command as you show? I have setup init scripts to start the forked processes if the server is restarted. The issue I see with doing this is that the config file specifies which shards, replica sets… it will be connecting too. These will obviously change as you add/remove shards etc. What are your thoughts?

  • kristina1

    Config files are definitely fine and should be used in most “real” setups. I just do it all via command line on my blog because I think it's easier for people to play around with that way.

    As far as specifying servers: for replica sets, you just need to give one server in the set (but you can give as many as you want) so something it tries to connect to should be up. Everything that's replica-set-aware knows that some servers might not exist and as long as they can find something that does exist, they'll use it.

    Shard information is stored permanently in the configuration database, you don't need to pass that on the command line.

  • gatesvp

    Christina, I'm a big MongoDB fan and I'm really happy this tutorial is in place.

    But I have one really big gripe: can you please use different servers in your examples?

    Your team has created “multi-server” tutorials where you do everything on “localhost”. Unless you actually want people to run “replica sets” on a single computer, please don't give us an example where you do just that.

    To illustrate the problem, look at these commands:

    $ hostname -> wooster
    $ ./mongod –replSet unicomplex/wooster:27017
    $ ./mongod –dbpath ~/dbs/slave1 –port 27018 –replSet unicomplex/wooster:27017
    $ ./mongod –dbpath ~/dbs/arbiter –port 27019 –replSet unicomplex/wooster:27017 –oplogSize 1

    In the “real world” I'm going to run those two commands on completely different computers. This means that I'm going to be changing the server instead of the port. If I'm a Mongo rookie, that's a big difference.

    It's especially big, b/c it's quite likely that I want the default port *and* the default folder for the master and the slave. But I probably want a non-default port/folder for the arbiter (who's probably just another server in the data center).

    In fact, you even confused yourself with the whole port mess, the following line is wrong (port #):
    > rs.add({“host” : “wooster:27018”, “arbiterOnly” : true})

    Instead here's what this looks like for people actually implementing this solution:
    Computer 1:
    ./mongod –replSet unicomplex/wooster:27017 –dbpath /db/data
    ./mongo wooster:27017 –eval 'rs.initiate()'

    Computer 2:
    ./mongod –replSet unicomplex/wabbit:27017 –dbpath /db/data
    ./mongo wooster:27017/admin –eval 'rs.add(“wooster:27018”)'

    Computer 3:
    ./mongod –dbpath ~/dbs/arbiter –port 27019 –replSet unicomplex/wooster:27017 –oplogSize 1
    ./mongo wooster:27017/admin –eval 'rs.add({“host” : “wooster:27018”, “arbiterOnly” : true})'

    (BTW, switching server sets when talking about demotion is also needlessly confusing)

    FYI, if you look at 10Gen's sharding example, you basically have the same problems. It starts failing the moment you try to use it on multiple boxes b/c it can't find “localhost” for the config DB. When I tried it, it also failed at finding host names and needed IPs.

    Again if you want people to use this on multiple boxes, please give us samples that work on multiple boxes. Otherwise I'll have to start my own blog and steal your traffic 🙂

  • kristina1

    I've tried to stick with localhost because I think people like trying things out and often don't have access to multiple machines. I think (hope!) people aren't using my blog as instructions on going into production.

    However, I'm definitely open to using multiple machines if people prefer. I'll give it a try. (Not the next post, it's too complicated to translate from multi-server to single server and easier to go from single to multi. I think.)

    You can do the shell commands as you describe, but there's no need to. You can start the shell anywhere, connect to one of the mongods, and run the commands I describe.

    Thanks for pointing out the code error, I fixed it.

    And you're welcome to my traffic! The more posts on this stuff the better!

  • gatesvp

    Kristina, the single-node “sharding” example is on the company website, in the official docs.

    If we can't find production-quality instructions on the company website AND we can't find them on the employee blogs where are we going to find them?

    There are not a lot of people blogging about production deploys of replica sets (probably b/c we don't yet have a production build). So for right now, you're it.

    And let's face it, any modern computer can fire up three small linux VMs and test out sharding / replica sets on 3 virtual boxes. You don't need 3 actual computers to test out sharding, just a copy of Virtual Box and a few gigs of RAM.

  • kristina1

    The single node sharding is in the official docs for the same reason it's in my blog: it's easier for people to try out. I'm not paid to blog, I just enjoy writing about this stuff. If you give me a couple bucks, I'll write whatever the hell you want 🙂

    A few other people have blogged about this already: http://blog.boxedice.com/2010/08/03/automating-… and http://www.coffeepowered.net/2010/08/06/setting… (single node).

  • Mongospanish

    Excellent post, Kristina. I’ve traslated it to Spanish: http://mongospanish.blogspot.com/2010/08/conjuntos-de-replica-parte-1.html

    It’s very interesting and useful.

  • Anonymous

    Very cool, thanks!

  • Pingback: ehcache.net()

kristina chodorow's blog