Resizing Your Oplog

The MongoDB replication oplog is, by default, 5% of your free disk space. The theory behind this is that, if you’re writing 5% of your disk space every x amount of time, you’re going to run out of disk in 19x time. However, this doesn’t hold true for everyone, sometimes you’ll need a larger oplog. Some common cases:

  • Applications that delete almost as much data as they create.
  • Applications that do lots of in-place updates, which consume oplog entries but not disk space.
  • Applications that do lots of multi-updates or remove lots of documents at once. These multi-document operations have to be “exploded” into separate entries for each document in the oplog, so that the oplog remains idempotent.

If you fall into one of these categories, you might want to think about allocating a bigger oplog to start out with. (Or, if you have a read-heavy application that only does a few writes, you might want a smaller oplog.) However, what if your application is already running in production when you realize you need to change the oplog size?

Usually if you’re having oplog size problems, you want to change the oplog size on the master. To change its oplog, we need to “quarantine” it so it can’t reach the other members (and your application), change the oplog size, then un-quarantine it.

To start the quarantine, shut down the master. Restart it without the --replSet option on a different port. So, for example, if I was starting MongoDB like this:

$ mongod --replSet foo # default port

I would restart it with:

$ mongod --port 10000

Replica set members look at the last entry of the oplog to see where to start syncing from. So, we want to do the following:

  1. Save the latest insert in the oplog.
  2. Resize the oplog
  3. Put the entry we saved in the new oplog.

So, the process is:

1. Save the latest insert in the oplog.

> use local
switched to db local
> // "i" is short for "insert"
>{op : "i"}).sort(
... {$natural : -1}).limit(1).next())

Note that we are saving the last insert here. If there have been other operations since that insert (deletes, updates, commands), that’s fine, the oplog is designed to be able to replay ops multiple times. We don’t want to use deletes or updates as a checkpoint because those could have $s in their keys, and $s cannot be inserted into user collections.

2. Resize the oplog

First, back up the existing oplog, just in case:

$ mongodump --db local --collection '' --port 10000

Drop the collection, and recreate it to be the size that you want:

> // size is in bytes
> db.runCommand({create : "", capped : true, size : 1900000}) 
{ "ok" : 1 }

3. Put the entry we saved in the new oplog.


Making this server primary again

Now shut down the database and start it up again with --replSet on the correct port. Once it is a secondary, connect to the current primary and ask it to step down so you can have your old primary back (in 1.9+, you can use priorities to force a certain member to be preferentially primary and skip this step: it’ll automatically switch back to being primary ASAP).

> rs.stepDown(10000)
// you'll get some error messages because reconfiguration 
// causes the db to drop all connections

Your oplog is now the correct size.

Edit: as Graham pointed out in the comments, you should do this on each machine that could become primary.

kristina chodorow's blog