Call for Schemas

The Return of the Mongoose Lemur

I just started working on MongoDB: The Definitive Guide, 2nd Edition! I’m planning to add:

  • Lots of ops info
  • Real-world schema design examples
  • Coverage of new features since 2010… so quite a few

However, I need your help on the schema design part! I want to include some real-world schemas people have used and why they worked (or didn’t). If you’re working on something 1) interesting and 2) non-confidential and you’d like to either share or get some free advice (or both), please email me (kristina at 10gen dot com) or leave a comment below. I’ll set up a little interview with you.

I am particularly looking for “cool” projects (video games, music, TV, sports), recognizable companies (Fortune 50 & HackerNews 500*), and geek elite (Linux development, research labs, robots, etc.). However, if you’re working on something you think is interesting that doesn’t fall into any of those categories, I’d love to hear about it!

* There isn’t really a HackerNews 500; I mean projects that people in the tech world recognize and think are pretty cool (DropBox, Github, etc.).

  • Steven Hatfield

    Hi Kristina,
    One problem that I’ve run into in the short time that I’ve been working with MongoDB is how to evolve a changing schema. I have a couple of collections, one of which holds references to the other. I can run renameCollection() to change the name of the collection being referenced, but that leaves the referring collection’s documents with broken ref links! A workaround for this kind of thing would be excellent to include in the book.
    Thanks,
    Steven

  • kristina1

    So noted, I’ll add that to the application administration section.  Thanks for the suggestion!
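As a rough illustration of the broken-link problem raised above, here is a minimal Python sketch of one possible workaround. It assumes the references are stored as plain DBRef-style subdocuments with "$ref" and "$id" keys; the function name, collection names, and sample document are all hypothetical. It only rewrites documents in memory and does not talk to a server.

```python
def retarget_refs(value, old_name, new_name):
    """Return a copy of value with DBRef-style links to old_name retargeted."""
    if isinstance(value, dict):
        # Copy the dict, recursing into every nested value.
        copy = {k: retarget_refs(v, old_name, new_name) for k, v in value.items()}
        # Assumption: a link is a dict holding both "$ref" and "$id".
        if "$id" in copy and copy.get("$ref") == old_name:
            copy["$ref"] = new_name
        return copy
    if isinstance(value, list):
        return [retarget_refs(v, old_name, new_name) for v in value]
    return value  # scalars are returned unchanged


# Hypothetical referring document, as it might look before the rename.
post = {
    "title": "Call for Schemas",
    "author": {"$ref": "users", "$id": 1},
    "reviewers": [{"$ref": "users", "$id": 2}, {"$ref": "labels", "$id": 3}],
}

# Pretend the "users" collection was renamed to "people".
updated = retarget_refs(post, "users", "people")
```

Against a live deployment, one would presumably run something like this over each document in the referring collection right after the rename and save the result back. That part is left out because it depends on the driver and the schema.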

  • Is Mike working on the second edition again? Also, may I suggest including Chocolatey directions for installing on Windows, in addition to the Windows directions.

  • kristina1

    Unfortunately, Mike decided not to work on the 2nd edition.

    I’ll make a note on the Windows thing. You mean like directions on installing it as a service instead of just unpack->run (like here: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/#mongodb-as-a-windows-service)?  Mind if I run that section by you when it’s done? 🙂

  • Chocolatey (http://chocolatey.org/) is like yum/apt-get/pkg-add for Windows. Alan Stevens wrote a mongo package for it so you can install mongo with “cinst mongo” from a command prompt.

    But yes, it would be g

  • kristina1

    Ah, I thought you were just using chocolatey as flavor (HA!) text.  

    I checked with the Windows people at 10gen and I don’t think it’s widely enough known to include yet, sorry.

  • Fair enough, hit me up when you want the section reviewed.

  • Aravind Udayashankara

    Hey, first I would like to thank you for contributing this wonderful book. It would be very helpful if you could add some more information on sharding, with some more practical examples or use cases like Twitter, because in general MongoDB is recommended in scenarios where a large amount of sharding is required.

  • kristina1

    Thank you!  I’ll definitely include some concrete examples for sharding.

  • Aravind Udayashankara

    Hi Kristina,

    I have the following sharding scenario. Let’s say I have the following setup:

    I have 1 server running config server

    I have 2 servers running mongod instances called node1 and node2

    I have 1 server running mongos

    I have added the servers node1 and node2 as shards from the mongos shell by using the command:

    db.runCommand( { addshard : "serverhostname[:port]" } );

    Later I created a database on node1 and enabled sharding for that database from mongos.

    I created 300 collections by connecting to mongos, but all of the collections are still going to only one server. What is wrong with the above setup? It should actually create some of the collections on the other shard server.

    NOTE: Since I am not in a position to upgrade the application layer, I can’t shard at the collection level. I have a script which creates several collections dynamically for some requirement.

    Please provide your suggestions on how we can shard at the database level so that collections get equally distributed among all shard servers.

    I am looking for a solution like this:

    If I create 300 collections, mongos should automatically create around 150 or 100 collections on each of the two shard servers.

  • kristina1

    See http://www.kchodorow.com/blog/2012/07/25/controlling-collection-distribution/.

  • Aravind Udayashankara

    Hey, this is a really good new feature to learn about, but my problem is that the collections are created dynamically, so I can’t manually shard them as and when they are created. Is there a way to automatically distribute collections that are created in a database for which sharding is enabled?

  • kristina1

    No, not yet.

  • Kristina, I’ve got one that might be interesting. I’ll umm just tell you about it at work. It was actually an app built with a relational db at first but it didn’t work well and we transformed it to mongo and it worked really well. Was for my old company http://www.readrboard.com/

  • whardier

    Here’s the example schema for my Informadiko generic search system.  This mainly covers the fundamental accounting and access privileges as well as all of the templating for each document repository (called collections in my software.. to be confusing).

    One of the tricks I’ve been using since I started working in this schema style is the ‘shortnames’ technique, as seen on line 54. This allows me to know the ObjectID of an _id key in a dict inside a list relative to the shortnames list. In the end this gives me some very vital shortcuts within my code, some of which may be rather obvious when combined with my project here: https://github.com/whardier/MongoDict. MongoDict wraps a dict class so that when loading into a dict via PyMongo, a shortcut list is generated on demand using this schema. This allows the developer to reference an item in a list by a lookup key rather than traversing it.
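The lookup-by-shortname idea can be sketched roughly as follows. This is not the gist's schema or MongoDict's actual API: the field names ("templates", "shortnames"), the helper functions, and the string ids standing in for ObjectIDs are all assumptions made for illustration.

```python
def shortcut_index(doc, names_field="shortnames"):
    """Map each shortname to its position in the parallel list."""
    return {name: pos for pos, name in enumerate(doc[names_field])}


def lookup(doc, shortname, list_field="templates", names_field="shortnames"):
    """Fetch an embedded item by shortname instead of scanning the list."""
    position = shortcut_index(doc, names_field)[shortname]  # KeyError if unknown
    return doc[list_field][position]


# Hypothetical document: "shortnames" is kept in the same order as "templates",
# so a shortname's position identifies the matching embedded item.
repository = {
    "shortnames": ["invoice", "receipt"],
    "templates": [
        {"_id": "id-001", "label": "Invoice"},  # plain strings stand in for ObjectIDs
        {"_id": "id-002", "label": "Receipt"},
    ],
}

receipt = lookup(repository, "receipt")
```

A wrapper class would likely build the index once and cache it rather than rebuilding it on every lookup; the sketch rebuilds it each time to stay short.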

    The Schema:
    https://gist.github.com/3610610

    Some other fun tricks involve how the timezone schema works for each user’s preference as well as the company’s preference. This has been very important when searching for time-based records/documents. With a simple function, the allowed timezones and the preferred timezones can be merged, sorted, and displayed as options. It has been very useful to be able to search from the east coast at one time and from the west coast at another.
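A merge function of that kind might look something like the sketch below. The ordering rule (preferred zones first, then the remaining allowed zones, each group sorted alphabetically) is an assumption, as are the function and parameter names; the original function is not shown in the comment.

```python
def timezone_options(preferred, allowed):
    """Merge two timezone lists into one de-duplicated list of options."""
    preferred_sorted = sorted(set(preferred))
    # Assumption: allowed zones that are not already preferred come afterwards.
    remaining = sorted(set(allowed) - set(preferred))
    return preferred_sorted + remaining


# Hypothetical preferences for one user and their company.
options = timezone_options(
    preferred=["America/New_York"],
    allowed=["America/Los_Angeles", "America/New_York", "America/Chicago"],
)
```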

    And of course, pay attention to the indexes. They have been paramount to the speed of this solution. One of the tricky indexes is on line 15, where we use the advanced $returnKey option when finding documents to hit ONLY the index and not the collection itself. It makes for a very simple Redis replacement at that point.

    company1_tv is actually company1.tv (as a domain) and company2_com is of course company2.com similarly.

  • kristina1

    Cool, that looks really interesting (I was looking for something like that for my blog a while ago, actually!).  Look forward to hearing about it.

kristina chodorow's blog