Sharding with the Fishes

Sharding is the not-so-revolutionary way that MongoDB scales writes (it’s very similar to techniques described in the Big Table paper and by PNUTS) but many people are unfamiliar with what it is and how it works.  If you’ve seen a talk on MongoDB or looked at the website, you’ve probably seen a diagram of sharding that looks like this:

…which probably looks a bit like “I hope I don’t have to understand that.”

However, it’s actually quite simple: it’s exactly how the Mafia is structured (or, at least, how The Godfather taught me it was):

  • The shards are the peons: someone tells them to do something (e.g., a query or an insert), they do it and report back.
  • The mongos is the godfather. It knows what data each peon has and gives them orders.  It’s basically a router for the requests.
  • The config server is the consigliere. It knows where all of the data is at any given time and lets the boss know so that he can focus on bossing. The consigliere keeps the organization running smoothly.

For a concrete example, let’s say we have a collection of blog posts.  You choose a “shard key,” which is the value Mongo will use to split the data across shards.  We’ll choose “date” as the shard key, which means it will be split up based on values in the “date” field.  If we have four shards, they might contain data something like:

  • shard #1: beginning of time up to June 2009
  • shard #2: July 2009 to November 2009
  • shard #3: December 2009 to February 2010
  • shard #4: March 2010 through the end of time

Now that we’ve got our peons set up, let’s ask the godfather for some favors.

Queries

Say you query for all documents created from the beginning of this year (January 1st, 2010) up to the present.  Here’s what happens:

  1. You (the client) send the query to the godfather.
  2. The godfather knows which shards contain the data you’re looking for, so he sends the query to shards #3 and #4.
  3. shard #3 and shard #4 execute the query and return the results to the godfather.
  4. The godfather puts together the results he’s received and sends them back to the client.

Note how all of the sharding stuff is done a layer away from the client, so your application doesn’t have to be sharding-aware, it can just query the godfather as though it were a normal mongod instance.

Inserts

Suppose you want to insert a new document with today’s date.  Here’s the sequence of events:

  1. You send the document to the godfather.
  2. It sees today’s date and routes it to shard #4.
  3. shard #4 inserts the document.

Again, identical to a single-server setup from the client’s point of view.

So where’s the consigliere?

Suppose you start getting millions of documents inserted with the date September 2009.  Shard #2 begins to swell up like a bloated corpse.  The consigliere will notice this and, when shard #2 gets too big it will split the data on shard #2 and migrate some of it to other, emptier shards.  Then it will let the godfather know that now shard #2 contains July 2009-September 15th 2009 and shard #3 contains September 16th 2009-February 2010.

The consigliere is also responsibly for figuring out what to do when you add a new shard to the cluster.  It figures out if it should keep the new shard in reserve or migrate some data to it right away.  Basically, it’s the brains of the operation.

Whenever the consigliere moves around data, it lets the godfather know what the final configuration is so that the godfather can continue routing requests correctly.

Leave the gun.  Take the cannolis.

This scaling deliciousness is, unfortunately, still very alpha.  You can help us out by telling us where our documentation sucks (specifics are better than “it sucks”), testing it out on your machine, and voting for features you’d like to see.

  • Dan

    Three questions come out of this discussion on sharding:

    1) If you only have one big, kicking server, are you better off using it for a single mongod or would it be better to divide it up into four or five virtual machines with shards and mongos on them? And by “better off” I mean raw query response.

    2) Your “division by date” example is pretty simplistic. How about a division by type of data? Or perhaps chunks that would be normalized in a SQL database.

    and finally,
    3) Can the consigliere and the godfather live on the same box and either on a shard box?

  • Dan

    Three questions come out of this discussion on sharding:

    1) If you only have one big, kicking server, are you better off using it for a single mongod or would it be better to divide it up into four or five virtual machines with shards and mongos on them? And by “better off” I mean raw query response.

    2) Your “division by date” example is pretty simplistic. How about a division by type of data? Or perhaps chunks that would be normalized in a SQL database.

    and finally,
    3) Can the consigliere and the godfather live on the same box and either on a shard box?

  • John Bledsoe

    I think you meant Shard #4 should be March 2010.

  • John Bledsoe

    I think you meant Shard #4 should be March 2010.

  • @Dan:
    1) I’m actually not clear on this point myself. mongod figures out how much free space you have, but I think if you had the (slightly ridiculous) setup of a server with 100GB of storage and 2GB of ram and another with 8GB of storage and 8GB of ram, I’d imagine most of the data would end up on the 100/2 server. You could ask on the list (http://groups.google.com/group/mongodb-user/).

    2) As long as it’s a field, you can shard on it. It shards by differences in field value (so you wouldn’t want to shard on a boolean field, but other than that anything should work). Eventually you’ll be able to shard on compound fields, too, but I don’t think this is working yet.

    3) Yes, see http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-ServerLayout for an ugly diagram.

    @John Indeed I did, thanks.

  • @Dan:
    1) I’m actually not clear on this point myself. mongod figures out how much free space you have, but I think if you had the (slightly ridiculous) setup of a server with 100GB of storage and 2GB of ram and another with 8GB of storage and 8GB of ram, I’d imagine most of the data would end up on the 100/2 server. You could ask on the list (http://groups.google.com/group/mongodb-user/).

    2) As long as it’s a field, you can shard on it. It shards by differences in field value (so you wouldn’t want to shard on a boolean field, but other than that anything should work). Eventually you’ll be able to shard on compound fields, too, but I don’t think this is working yet.

    3) Yes, see http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-ServerLayout for an ugly diagram.

    @John Indeed I did, thanks.

  • Great word illustration!

    I’m looking forward to testing out sharding with the geospatial functionality. On an older project I ran into huge performance issues with ton loads of lat/long data and distance queries. I’m going to try to replicate the project using MongoDB.

  • Great word illustration!

    I’m looking forward to testing out sharding with the geospatial functionality. On an older project I ran into huge performance issues with ton loads of lat/long data and distance queries. I’m going to try to replicate the project using MongoDB.

  • @Fitz cool! Let us know how it goes.

  • @Fitz cool! Let us know how it goes.

  • Pingback: Mongo Smash! | deadkarma()

  • Thanks Kristina, very good illustration! I'm going to try that..

  • lol @ godfather analogy 🙂
    — desperately waiting for mongo sharding to be production ready…

    in the mean time, my app has to handle sharding based on ranges…

  • VJ

    Hello. We've just started using MongoDB 1.5 (CentOS) and are trying out sharding. I've installed mongo on 4 servers (A, B, C, D). Server A runs “mongod –configsvr” and “mongos –configdb”, and servers B, C, D run “mongod –shardsvr”. After setting up & enabling sharding, I inserted some test data, but I only see data on server B. I used a small chunk size (1 MB). Will I not see data on the other servers until I've inserted 1 MB of data?
    Thanks.

  • kristina1

    You won't see any sharding activity until you've inserted at least 200MB of data! If you just want to try it out and see some data move around, you can wipe everything and start again, starting mongos with the option –chunkSize 1. This will force it to start sharding data after 1 MB has been inserted. Also, make sure sharding is enabled at the db and collection levels.

  • VJ

    I am using a chunk size of 1 MB [“mongos –configdb localhost:20000 –chunkSize 1”]. I inserted 1 million records on server A, and I see 1 million records on server B using db.people.count().

  • kristina1

    It sounds like you may not have enabled sharding on the database and the collection. See http://www.mongodb.org/display/DOCS/A+Sample+Co…, particularly the enablesharding and shardcollection commands.

    Also: if you started mongos at some point without the –chunkSize, it might have registered a bigger chunk size in the configuration and won't go back, even though you're now using a smaller one.

  • VJ

    I just noticed these error msgs in the mongos.log on server A. Did we build mongo incorrectly?
    ========================================
    [Balancer] Wed Jul 14 14:45:54 balancer: chose [shard0] to [shard1] { _id: “test.people-name_MinKey”, lastmod: Timestamp 135000|0, ns: “test.people”, min: { name: MinKey }, max: { name: 1.0 }, shard: “shard0” }
    [Balancer] Wed Jul 14 14:45:54 moving chunk ns: test.people moving ( shard ns:test.people shard: shard0:68.67.172.15:10000 lastmod: 135|0 min: { name: MinKey } max: { name: 1.0 }) shard0:68.67.172.15:10000 -> shard1:68.67.172.16:10000
    [Balancer] Wed Jul 14 14:45:54 balancer: MOVE FAILED **** no such cmd { errmsg: “no such cmd”, bad cmd: { moveChunk: “test.people”, from: “68.67.172.15:10000”, to: “68.67.172.16:10000”, filter: { name: { $gte: MinKey, $lt: 1.0 } }, shardId: “test.people-name_MinKey”, configdb: “localhost:20000” }, ok: 0.0 }
    from: shard0 to: chunk: { _id: “test.people-name_MinKey”, lastmod: Timestamp 135000|0, ns: “test.people”, min: { name: MinKey }, max: { name: 1.0 }, shard: “shard0” }
    ========================================

  • VJ

    First of all, thanks for your prompt replies.

    I had started with a chunk size of 1 MB. Also, I followed the instructions to enable sharding on the collection.
    ========================================
    > db.runCommand({listshards:1})
    {
    “shards” : [
    {
    “_id” : “shard0”,
    “host” : “68.67.172.15:10000”
    },
    {
    “_id” : “shard1”,
    “host” : “68.67.172.16:10000”
    },
    {
    “_id” : “shard2”,
    “host” : “68.67.172.17:10000”
    }
    ],
    “ok” : 1
    }
    ========================================
    > db.printShardingStatus();
    — Sharding Status —
    sharding version: { “_id” : 1, “version” : 3 }
    shards:
    { “_id” : “shard0”, “host” : “68.67.172.15:10000” }
    { “_id” : “shard1”, “host” : “68.67.172.16:10000” }
    { “_id” : “shard2”, “host” : “68.67.172.17:10000” }
    databases:
    { “_id” : “admin”, “partitioned” : false, “primary” : “config” }
    { “_id” : “test”, “partitioned” : true, “primary” : “shard0”, “sharded” : { “test.people” : { “key” : { “name” : 1 }, “unique” : false } } }
    test.people chunks:
    { “name” : { $minKey : 1 } } –>> { “name” : 1 } on : shard0 { “t” : 135000, “i” : 0 }
    { “name” : 1 } –>> { “name” : 8012 } on : shard0 { “t” : 135000, “i” : 1 }
    { “name” : 8012 } –>> { “name” : 14514 } on : shard0 { “t” : 135000, “i” : 2 }
    :
    :
    :
    { “t” : 135000, “i” : 132 }
    { “name” : 987356 } –>> { “name” : “j” } on : shard0 { “t” : 135000, “i” : 133 }
    { “name” : “j” } –>> { “name” : { $maxKey : 1 } } on : shard0 { “t” : 135000, “i” : 134 }

    ========================================

  • kristina1

    Thanks for all the information! It looks like everything is set up right. You may have hit a bug, could you send all of this info to the user list? http://groups.google.com/group/mongodb-user

  • VJ

    I've submitted the info. Thanks.

  • Really enjoyed this, thanks a lot. So easy to understand!

  • Tom

    Awesome blog

  • Pingback: NoSQL Daily – Mon Sep 13 › PHP App Engine()

  • Rohith

    Great man I really enjoyed it

  • Pingback: Three Things To Watch Out For With NoSQL | Jeremiah Peschka()

  • Pingback: ehcache.net()

  • Pingback: Sharding Mongodb Overview | Software Development/Product Ideas/Games Development()

  • Bnelson

    The way you describe this makes me feel very very smart.  You should think about going into teaching full-time 🙂

  • Anonymous

    Haha, thank you!

  • Easy_010481

     i have a collection with documents in shape ({_id:{w1:”word 1″,W2:”word 2″}, value: 3}), i want sharding my collection. could i choose _id be shard key? my system is heavy in read..!! thanks

  • kristina1

    It really depends what your read load is like and what kinds of values w1 & W2 have.  You should ask & give as much info as you can on the mailing list: https://groups.google.com/group/mongodb-user.

  • Easy_010481

     thanks so much!!!
    Can you give me advice!!! when i finish config a cluster. i should create database directly on shard and then enablesharding or create on mongos and then enablesharding. which databases mongos can see on its cluster!!!

    Thanks!!

  • kristina1

    Please ask questions on the mailing list.

kristina chodorow's blog