Replica Sets Part 2: What are Replica Sets?

[Image caption: The US's current primary]

If you want to jump right into trying out replica sets, see Part 1: Master-Slave is so 2009.

Replica sets are basically just master-slave with automatic failover.

The idea is: you have a master and one or more slaves. If the master goes down, one of the slaves will automatically become the new master. The database drivers will always find the master so, if the master they’re using goes down, they’ll automatically figure out who the new master is and send writes to that server. This is much easier to manage (and faster) than manually failing over to a slave.

So, you have a pool of servers with one primary (the master) and N secondaries (slaves). If the primary crashes or disappears, the other servers will hold an election to choose a new primary.
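As a concrete (if hypothetical) sketch of what that looks like from a driver, here's an example using pymongo; the hostnames and the set name "foo" are placeholders for whatever you set up in part 1:

```python
from pymongo import MongoClient

# Sketch: connect to a replica set named "foo" (placeholder name) by giving
# the driver a few seed members. The driver figures out which member is
# currently primary and sends writes there; if that server dies and a new
# primary is elected, the driver finds the new primary on its own.
client = MongoClient(
    ["localhost:27017", "localhost:27018", "localhost:27019"],
    replicaSet="foo",
)

# This write goes to whichever server is primary right now.
client.test.things.insert_one({"x": 1})
```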

Elections

A server has to get a majority of the total votes in the set to be elected, not just a majority of the votes from the servers it can reach. This means that, if we have 50 servers and each server has 1 vote (the default; later posts will show how to change the number of votes a server gets), a server needs at least 26 votes to become primary. If no one gets 26 votes, no one becomes primary. The set can still handle reads, but not writes (as there's no master).
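The arithmetic is just "more than half of all the votes." A quick sketch in Python, using the numbers above (and assuming every server keeps its default single vote):

```python
def votes_needed(total_votes):
    # A strict majority of ALL the votes in the set, not just of the
    # servers that happen to be up and reachable.
    return total_votes // 2 + 1

print(votes_needed(50))  # 26
print(votes_needed(3))   # 2 -- e.g. master + slave + arbiter
print(votes_needed(2))   # 2 -- why a plain master/slave pair can't fail over (see below)
```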

(Part 3 will cover demoting a primary.)

If a server gets 26 or more votes, it becomes primary. All future writes will be directed to it until it loses an election, blows up, gets caught breaking into the DNC, etc.

The original primary is still part of the set. If you bring it back up, it will become a secondary server (until it gets the majority of votes again).

Three’s a crowd (in a good way)

One complication with this voting system is that you can’t just have a master and a slave.

If you just set up a master and a slave, the system has a total of 2 votes, so a server needs both votes to be elected master (1 out of 2 is not a majority). If one server goes down, the other server only has 1 out of 2 votes, so it will become (or stay) a slave. If the network is partitioned, the master suddenly doesn't have a majority of the votes (it only has its own 1 vote), so it'll be demoted to a slave. The slave also doesn't have a majority of the votes, so it'll stay a slave: you'd end up with two slaves until the servers can reach each other again.

It's a waste, though, to have two servers and no master up, so replica sets have a number of ways of avoiding this situation. One of the simplest and most versatile is using an arbiter: a special server that exists only to resolve disputes. It doesn't serve any data to the outside world; it's just a voter (it's so lightweight that it can even run on the same machine as another server). In part 1, localhost:27019 was the arbiter.
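In the configuration, the arbiter is just a member flagged as arbiterOnly. One way to express that configuration is with the replSetInitiate command from pymongo (part 1 did this differently, by specifying everything at startup); the hosts and the set name "foo" are placeholders, but the arbiterOnly flag is the important bit:

```python
from pymongo import MongoClient

# Connect directly to one of the (not-yet-initiated) servers.
conn = MongoClient("localhost:27017", directConnection=True)

config = {
    "_id": "foo",  # the replica set's name (placeholder)
    "members": [
        {"_id": 0, "host": "localhost:27017"},
        {"_id": 1, "host": "localhost:27018"},
        # The arbiter: it votes in elections but holds no data.
        {"_id": 2, "host": "localhost:27019", "arbiterOnly": True},
    ],
}

conn.admin.command("replSetInitiate", config)
```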

So, let's say we set up a master, a slave, and an arbiter, each with 1 vote (3 votes total), with the master and arbiter in one data center and the slave in another. If a network partition occurs, the master still has a majority of the votes (its own plus the arbiter's), so it stays master; the slave only has 1 vote, so it stays a slave. If the master fails and the network is not partitioned, the arbiter can vote for the slave, promoting it to master.

With this three-server setup, we get sensible, robust failover.
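If you want to peek at which server ended up in which role, the driver exposes what it has discovered. A sketch with pymongo (the hosts, set name, and example output are all assumptions):

```python
from pymongo import MongoClient

client = MongoClient(
    ["localhost:27017", "localhost:27018", "localhost:27019"],
    replicaSet="foo",
)
client.admin.command("ping")  # make sure we've connected and discovered the set

print(client.primary)      # e.g. ('localhost', 27017)
print(client.secondaries)  # e.g. {('localhost', 27018)}
print(client.arbiters)     # e.g. {('localhost', 27019)}
```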

Next up: dynamically configuring your replica set. In part 1, we fully specified everything on startup. In the real world you’ll want to be able to add servers dynamically and change the configuration.
