MongoDB: The Definitive Guide 2nd Edition is Out!

MongoDB: The Definitive Guide

The second edition of MongoDB: The Definitive Guide is now available from O’Reilly! It covers both developing with and administering MongoDB. The book is language-agnostic: almost all of the examples are in JavaScript.

Upgrading from a previous edition?

If you have read the first edition, the new edition covers a lot of new material (it is twice as long!). Everything has been updated and lots of sections on new features have been added.

Translations

I hear O’Reilly is working with their translators to get this edition translated into other languages, but I tend to be the last to hear about that so I don’t know what the schedule is.

So…

Pick up a copy and let me know what you think!

First Two Weeks at Google

I’ve been at Google two weeks now and I’m loving it so far. My team is great and the work’s very interesting, but I can’t talk about what I’m doing, so:

There’s a ton of orientation stuff new employees have to do, which hasn’t been much fun. There was a scavenger hunt (which is the kind of thing I usually loathe) but we did find some cool stuff. One of the places on the list was “the music room.” When we went in it was a soundproofed room with a half-dozen guitars along one wall, an electric drumset, synth, a mic, soundboard, amps… it was amazing.

Other things I’ve discovered so far are:

  • The library, which has big comfy armchairs and couches, shelves of physical books, and a sci-fi touch interface for checking out ebooks.
  • The juice bar: free smoothies anytime.
  • The workshop with metal/wood crafting supplies and a 3D printer.

I also got this, er, unique hat:

Google beanie

If anyone wants a propeller beanie, let me know.

Stock Option Basics

Here’s what I wish I’d known when I started working at 10gen. Disclaimer: don’t take this as financial advice, consult someone who actually knows what they’re talking about before making any financial decisions, this is for entertainment purposes only, etc. Also, the numbers used below do not match any startup that I know of, they’re just hypothetical.

Intro to Stock Options

Stock options are the option to buy X shares of stock in a company at a guaranteed price of $Y per share. Generally, they have some time constraints: they are doled out to you slowly over 3-5 years (the vesting schedule) and expire after a certain number of years if you don’t exercise them.

$Y is the strike price, the price of the stock when you’re given your stock grant. Basically, it is determined by taking the value of the company (say, $2,000,000) and dividing it by the number of shares that have been issued (say, 5,000,000). This would give you a strike price of $0.40.

If the company is successful, the stock price should be higher when you sell the shares. For example, say that the startup above is successful and their stock price rises to $5.00. Now you can buy your shares for $0.40 and sell them for $5.00, making a nice $4.60 profit on each share.

Except you can’t, because of taxes. If you are the kind of person who doesn’t know what a stock option is, you probably have common, or non-qualified stock options (companies prefer this type because you’ll be footing the tax bill instead of them). For non-qualified stock, you get taxed twice: when you buy (or exercise) your options and a second time when you sell the stock (these are called taxable events).

Let’s say you’re in the situation above: you have 10,000 shares with a strike price of $0.40 and you want to exercise your options. The current price is $5. You exercise your options for $4,000 ($0.40 * 10,000). However, according to the government, you just “made” $46,000 ($5*10,000 – $4,000), which you’ll now be taxed on. I have no idea how this tax rate is computed, but for me it was ridiculous. If my options matched this example (they don’t), I would have had to pay ~$30,000 in taxes (about 60% tax rate). Also, you have to hand the company a check for these taxes when you exercise the options; you can’t put it off until April.
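To make the arithmetic above concrete, here is a minimal Python sketch using the same hypothetical numbers. The function names are made up for illustration, and the flat 60% rate is only the rough figure quoted above, not a real tax computation.

```python
from fractions import Fraction

def strike_price(company_value, shares_issued):
    """Company valuation divided by the number of shares issued."""
    return Fraction(company_value, shares_issued)

def exercise_cost(num_options, strike):
    """What you pay the company to buy your shares."""
    return num_options * strike

def taxable_spread(num_options, strike, current_price):
    """The paper gain at exercise: current value minus what you paid."""
    return num_options * (current_price - strike)

strike = strike_price(2_000_000, 5_000_000)   # $0.40 per share
cost = exercise_cost(10_000, strike)          # $4,000
spread = taxable_spread(10_000, strike, 5)    # $46,000
# Assumption: a flat 60% rate, purely for illustration.
tax_due = spread * Fraction(60, 100)          # $27,600

print(f"strike ${float(strike):.2f}, cost ${float(cost):,.0f}, "
      f"spread ${float(spread):,.0f}, tax ${float(tax_due):,.0f}")
```

At a flat 60% the bill comes to $27,600, the same ballpark as the ~$30,000 above. Note that the tax is owed on top of the $4,000 exercise cost.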

So be careful: if you own a lot of options and the price rises a lot, you can “golden handcuff” yourself to a place because you cannot pay the taxes to actually buy your options.

The second taxable event is when you sell the stock. If you sell the stock within a year, you’re hammered again with short-term capital gains taxes. If you wait for more than a year to sell, you “only” get hit with long-term capital gains taxes.

Negotiating Options

When you get a job at a startup, often part of the offer will be stock options. If the startup is early stage, I’d recommend pretending that your options will be totally worthless forever. Is the salary acceptable on those conditions? (Almost every other startup I know of has failed in the time 10gen’s been around.)

In retrospect, I wish I had negotiated more stock options instead of more salary when I started at 10gen, but if I were joining an early-stage startup again, I would do the same thing: not sacrifice salary for options.

The exception is if you’re joining a startup at a later stage and you’re pretty sure they’ll be successful. In that case, you might want to negotiate for more options.

Option Expiration

Typically, options have an expiration date. Make sure you buy them before they expire (if you want them). Also, options are typically designed as an incentive to stay, so they don’t follow you after you leave the company. If you quit or are fired, you’ll have to buy any vested ones before or shortly after leaving.

Buying Unvested Options

You can buy unvested options, but I can’t see why you would unless you’re pretty sure the company’s going to succeed, you’ll be there until the options vest, and you’re trying to avoid the tax hit. If all those hold true, Max Schireson wrote a good blog post on what you need to know about that. In fact, go read his post regardless, because it’s a really good and more technical coverage of a lot of these points.

Dilution

In the example above, you have 10,000 shares out of 5,000,000, meaning you own (or could own) 0.2% of the company, you tech mogul. However, whenever there’s a round of funding, typically more shares are issued. Thus, instead of there being 5,000,000 shares, there are now 10,000,000 and you only own 0.1% of the company. Your company should tell you how many shares are outstanding (the number issued in total) if you ask. I think this number is typically confidential, but I’ve heard people advise that you should ask during salary negotiations, so YMMV.
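The dilution arithmetic is simple enough to check in a few lines. This is just a sketch with the hypothetical numbers from the example; ownership_percent is a name I made up.

```python
from fractions import Fraction

def ownership_percent(your_shares, shares_outstanding):
    """Your stake as a percentage of all outstanding shares."""
    return Fraction(your_shares, shares_outstanding) * 100

before = ownership_percent(10_000, 5_000_000)    # before the funding round
after = ownership_percent(10_000, 10_000_000)    # after 5,000,000 new shares
print(f"{float(before):.1f}% -> {float(after):.1f}%")
```

Your share count never changes; only the denominator does, which is why it is worth asking how many shares are outstanding.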

The other significant event is board meetings, where the board decides how much the company is worth. This changes what the current stock price is. Funding rounds also often have an effect on price.

Stockholder Privileges

Even if you’re at an early-stage startup, it might be worth buying a few shares to get stockholder rights. Find out how many shares you need to buy to get these, if you want a look at the books and so on.

Keeping Your Documents

Particularly at an early-stage startup, there may not be anyone keeping track of this stuff. When 10gen got a Finance/HR person, I asked them about the options grants they had on record and they were missing my original hiring grant. Luckily, I still had the paperwork. Keep everything safe somewhere, just in case.

Exercise Quickly

If you’re leaving a company, get on exercising your options as quickly as possible. You generally have a limited time period before your options disappear and your company might have arbitrary restrictions on when you can exercise (like “not within a month of a board meeting”). Also, it’s not exactly a speedy process: it took 10gen three weeks from when I signed the paperwork and gave them the checks to actually send the stock certificates. You don’t want to have to be dealing with HR at your last company while you’re dealing with new hire HR at your new job.

Recruiting in all the wrong places

A recruiter had an email exchange at me the other day. It started with the standard recruiter email:

Hi Kristina,

How are you enjoying 10Gen? Any interest in going to something earlier stage? I’m working with a client in <field I’m not interested in>. Obviously the specificities of that are fascinating. Let me know if you’d be interested in hearing more.

I ignored it. A few minutes later, I got a follow-up email from him:

Subject: Whoop, how humiliating

Just saw on LinkedIn you left 10gen last month. Do you have plans for what’s next?

Closer, but not quite. A few minutes later:

Ach! Forgive me. Congratulations on the Google job.

Should you ever need the services of an incompetent sourcer, please, don’t hesitate to reach out!

And I kind of wished that I needed a recruiter because that was pretty funny.

Programming a State Machine

A monster modeled after a dog in my neighborhood.

My attempts at game programming usually turn into impenetrable spaghetti code: “If the player walks through this door, then have him talk to the princess, unless he’s killed a guard, in which case the guards attack, or if he comes out of the secret passage…”

The game I’m working on now is pretty simple, but I’ve kept it really clean (so far) by using a state machine to keep track of what should happen when. Basically, each scene in the game is a state. There’s an overarching state machine which runs the current state on each tap. A state can either return itself or choose a new state to run next time.

In Objective-C (+cocos2d), a state looks like this:

@interface State : NSObject {
    CCLayer *ui;
}
 
-(id) init:(CCLayer*)layer;
-(Class) processTouch:(UITouch*)touch withEvent:(UIEvent*)event;
 
@end

The processTouch method either returns nil, which means “run me again next time,” or the class of the next state to run. The other half is a “machine” to run the states:

// -----------------
// Interface
// -----------------
 
@interface StateMachine : NSObject {
    State *currentState;
    CCLayer *ui;
}
 
-(id) init:(CCLayer*)layer;
-(BOOL) processTouch:(UITouch*)touch withEvent:(UIEvent*)event;
 
@end
 
// -----------------
// Implementation
// -----------------
 
@implementation StateMachine
 
// Initialize the state machine by setting currentState to the first state
-(id) init:(CCLayer*)layer {
    self = [super init];
 
    if (self) {
        ui = layer;
        currentState = [[FirstState alloc] init:ui];
    }
 
    return self;
}
 
// Run this from the UI's touch dispatcher: it runs the current state's processing code
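-(BOOL) processTouch:(UITouch*)touch withEvent:(UIEvent*)event {
    Class nextState = [currentState processTouch:touch withEvent:event];
 
    if (nextState != nil) {
        currentState = [(State*)[nextState alloc] init:ui];
    }
 
    // The method is declared BOOL, so it has to return something;
    // YES is assumed here to mean "touch handled"
    return YES;
}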
 
@end

Then, you might have an implementation like this for a swords & sorcery RPG:

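@interface PalaceDungeonState : State {
    Guard *guard;
}
@end
 
@implementation PalaceDungeonState
 
-(id) init:(CCLayer*)layer {
    self = [super init:layer];
 
    if (self) {
        // Use ui to render a dungeon
    }
 
    return self;
}
 
// Returns a Class (not a State*) to match the declaration in State
-(Class) processTouch:(UITouch*)touch withEvent:(UIEvent*)event {
    if (guard.alive) {
        [guard updatePosition];
    }
 
    CGPoint touched = [ui convertTouchToNodeSpace:touch];
 
    // You can't switch on a CGPoint: regionAtPoint: is a hypothetical
    // helper that maps the point to GUARD, STAIRWAY, etc.
    switch ([self regionAtPoint:touched]) {
    case GUARD:
         [guard dies];
         break;
    case STAIRWAY:
         return [PalaceStairwayState class];
    case SECRET_PASSAGE:
         return [SecretPassageState class];
    }
 
    return nil;
}
 
@end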

I’m not thrilled with doing so much work in the init, so for this type of game I’d probably move that to a start method that would be called by StateMachine on state changes.

Regardless, I’ve found this makes it a lot easier to make a complicated sequence of events while keeping my code readable.
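Here is a rough sketch of the start-method variant described above. It is written in Python rather than Objective-C so it can run without cocos2d, touches are plain strings instead of UITouch objects, and every name in it is illustrative rather than taken from the actual game.

```python
class State:
    def __init__(self, ui):
        self.ui = ui

    def start(self):
        """Called by the machine when this state becomes current."""

    def process_touch(self, touch):
        """Return None to run again next time, or the class of the next state."""
        return None


class StateMachine:
    def __init__(self, ui, first_state):
        self.ui = ui
        self.current = None
        self._enter(first_state)

    def _enter(self, state_class):
        # Construction stays cheap; the heavy setup happens in start()
        self.current = state_class(self.ui)
        self.current.start()

    def process_touch(self, touch):
        next_state = self.current.process_touch(touch)
        if next_state is not None:
            self._enter(next_state)
            return True
        return False


class PalaceStairwayState(State):
    pass


class PalaceDungeonState(State):
    def start(self):
        self.guard_alive = True

    def process_touch(self, touch):
        if touch == "guard":
            self.guard_alive = False
        elif touch == "stairway":
            return PalaceStairwayState
        return None


machine = StateMachine(ui=None, first_state=PalaceDungeonState)
machine.process_touch("guard")      # stays in the dungeon, guard dies
machine.process_touch("stairway")   # switches to PalaceStairwayState
```

Returning a class instead of an instance means a state never has to know how its successor is constructed; the machine owns that.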

Mad Art Skillz

With all this free time, I’ve been working on an iOS game. I’m not even close to done yet, but I’ve wrestled Objective-C into submission and now I’m working on some assets. It’s going to be musical, so here’s Beethoven:

beethoven

And here’s a demon (it plays the piano):

monster2

And the player character, Calliope:

calliope

If you ever need to make some vector art, Chris Hildenbrand’s blog, 2D Game Art for Programmers, is fantastic. It teaches you how to create awesome vector art using Inkscape (which is free). I had never done vector art before and his instructions are perfect for beginners.

Finished The Definitive Guide

Or at least the writing of it: it still has to be tech edited, “real” edited, illustrated, formatted, etc. The second edition is going to be about 400 pages (almost twice the length of the first edition), with majorly expanded sections on sharding, replication, and server administration.

Phew.

Now, some mea culpas:

To those of you who sent me schemas: I’m sorry if I never got back to you! I decided to go in a different direction and ended up not using any of them. Sorry to waste people’s time (but they were fascinating to read).

To those of you who sent in a schema and I asked for your mailing address: I forgot to forward those emails to my personal account before leaving 10gen so I’ve lost the addresses. Please resend your address to my personal email (k dot chodorow at gmail dot com).

Databases & Dragons

Here are some exercises to battle-test your MongoDB instance before going into production. You’ll need a Database Master (aka DM) to make bad things happen to your MongoDB install and one or more players to try to figure out what’s going wrong and fix it.

This was going to go into MongoDB: The Definitive Guide, but it didn’t quite fit with the other material, so I decided to put it here instead. Enjoy!

Tomb of Horrors

Try killing off different components of your system: mongos processes, config servers, primaries, secondaries, and arbiters. Try killing them in every way you can think of, too. Here are some ideas to get you started:

  • Clean shutdown: shut down from the MongoDB shell (db.shutdownServer()) or send SIGINT.
  • Hard shutdown: kill -9.
  • Shut down the underlying machine.
  • If you’re running on a virtual machine, stop the virtual machine.
  • If you’re running on physical hardware, unplug the machine.

A slightly more difficult twist is to make these servers unrecoverable: decommission the virtual machine, firewall a box from the network, or pick up a physical machine and hide it in a closet.

@markofu’s suggestion: make netcat bind to 27017 so mongod can’t start back up again:

$ while [ 1 ]; do echo -e "MongoDB shell version: 2.4.0\nconnecting to: test\n>"; nc -l 27017; done

DM’s guide: make sure no data is lost.

The Adventure of the Disappearing Data Center

Similar to above, but more organized. You can either have a data center go down (shut down all the servers there) or you can just configure your network not to let any connections in or out, which is a more evil way of doing it. If you do this via networking, once your players have dealt with the data center going down, you can bring it back and make them deal with that, too.

Note that any replica set with a majority in the “down” data center will still have a primary when it comes back online. If your players have reconfigured the remainder of the set in another data center to be primary, these members will be kicked out of the set.
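The majority rule behind this is easy to sketch. The snippet below is a simplification that assumes every member has exactly one vote (no non-voting members, no priorities), and the data center names are made up.

```python
def can_elect_primary(reachable_voters, total_voters):
    """A replica set needs a strict majority of voting members to elect a primary."""
    return reachable_voters > total_voters // 2

def data_centers_with_majority(members_by_dc):
    """Return the data centers that could keep a primary entirely on their own."""
    total = sum(members_by_dc.values())
    return [dc for dc, count in members_by_dc.items()
            if can_elect_primary(count, total)]

print(data_centers_with_majority({"ny": 3, "sf": 2}))
print(data_centers_with_majority({"ny": 2, "sf": 2}))
```

With three members in one data center and two in the other, only the larger side can elect a primary by itself. With an even split, neither side can, which is a good scenario to hand your players.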

Find the Rogue Query

There are several types of queries that you can run that will pound on your system. If you’d like to teach operators how to track these types of queries down and kill them, this is a good game to play.

To test a query that stresses disk IO, run a query on a large collection that probably isn’t all in memory, such as the oplog. If you have a large, application-specific collection, that’s even better, as it’ll raise fewer red flags with the players as to why it’s running. Make sure it has to return hundreds of gigabytes of data.

Kicking off a complex MapReduce can pin a single core. Similarly, if you can do complex aggregations on non-indexed keys, you can probably get multiple cores.

Stressing memory and CPU can be done by building background indexes on numerous databases at the same time.

To be really tricky, you could find a frequently-used query that uses an index and drop the index.

DM’s guide: players should re-heat the cache to speed up the application returning to normal.

THAC0, aka Bad System Settings

Try setting readahead to 65,000 and watch MongoDB’s RAM utilization go down and the disk IO go through the roof.

Set slaveDelay=30 on most of your secondaries and watch all of your application’s w: majority writes take 30 seconds.

Use rs.syncFrom() to create a replication chain where every server only has one server syncing from it (the longest possible replication chain). Then see how long it takes for w: majority writes to happen. How about if everyone is syncing directly from the primary?

LEEROY JENKINS!

What happens if your MongoDB instance gets more than it can handle? This is especially useful if you’re on a multi-tenant virtual machine: what’s going to happen to your application when one of your neighbors is behaving badly? However, it’s also good to test what might happen if you get a lot more traffic than you expect. You can use the Linux dd tool to write tons of garbage to the data volume (not the data directory!) and see what happens to your application.

Server Concealment

Try using a script to randomly turn the network on and off using iptables. For increased realism, keep in mind that you’re more likely to lose connectivity between data centers than within a data center, so be sure to test that.
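A minimal sketch of such a script might look like the following. It only drops inbound traffic from one peer, the address is a placeholder, and it defaults to a dry run that just returns the commands it would execute. Actually running it needs root, so treat it as a starting point rather than a finished tool.

```python
import random
import subprocess
import time

def block_rule(peer_ip, action):
    """Build the iptables command that appends ("-A") or deletes ("-D") a DROP rule."""
    return ["iptables", action, "INPUT", "-s", peer_ip, "-j", "DROP"]

def flap_network(peer_ip, rounds, dry_run=True, max_sleep=60.0):
    """Alternately block and unblock traffic from peer_ip at random intervals."""
    commands = []
    for _ in range(rounds):
        for action in ("-A", "-D"):  # block, then unblock
            cmd = block_rule(peer_ip, action)
            commands.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)
                time.sleep(random.uniform(0, max_sleep))
    return commands

# 10.0.0.5 is a made-up address standing in for a server in the other data center
for cmd in flap_network("10.0.0.5", rounds=2):
    print(" ".join(cmd))
```

Each round ends with the rule deleted, so the network is back to normal once the script finishes cleanly.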

Network issues will generally cause failovers and application errors. It can be very difficult to figure out what’s going on without good monitoring or looking at logs.

Guide to Tech Interviews

I’ve been interviewing people for programming jobs for five years and I’ve recently gotten a look at the interview process from the other side. Here are some suggestions for acing tech interviews.

Read Cracking the Coding Interview (available for free from here, Google Play, and various other places). It is incredible: it basically covers every question that a sane interviewer could ask. That’s really my main suggestion, but here are a few supplementary tips:

Pre-Prep

Go onto Glassdoor and see what people say about the interview process at the company. Often they’ll list questions they were asked and interviewers are not that creative: if there are any questions listed whatsoever, make sure you can code them up perfectly. Google, too, can be helpful: “<company> phone screen” or “<company> interview” often will give you other possible questions.

Once you’ve plumbed the internet, time to refresh the other stuff you might be asked. For data structures, make sure you know linked lists, trees, tries, heaps, sets, and hashtables.* For algorithms, make sure you still remember dynamic programming (I certainly didn’t).

*The answer is always “hashtables.” Use them early and often, they almost always make the problem easier.
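As an illustration of why, here is the classic “find two numbers that add up to a target” question (my choice of example, not one from the book), first brute-force and then with a hashtable. A sketch, not the one true answer:

```python
def two_sum_brute_force(nums, target):
    """Check every pair: O(n^2) time, no extra space."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return (i, j)
    return None

def two_sum_hashtable(nums, target):
    """Remember what we've seen: O(n) time, O(n) extra space."""
    seen = {}  # value -> index where we saw it
    for j, value in enumerate(nums):
        i = seen.get(target - value)
        if i is not None:
            return (i, j)
        seen[value] = j
    return None

print(two_sum_brute_force([2, 7, 11, 15], 9))
print(two_sum_hashtable([2, 7, 11, 15], 9))
```

The hashtable turns the inner loop into a single lookup, which is the usual shape of the trick.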

Non-Technical Questions

You might be a brilliant coder, but you also have to come up with comprehensible answers to stuff like, “Tell me about something you’ve debugged recently” or “Tell me about a project that shows your strengths.” I wasn’t sure how to prep for these questions in a general way, but Cracking the Coding Interview had a great system: make a table of the projects you’ve done and possible questions and fill it out. For example:

                         | Project 1 | Project 2 | Project 3 | Project 4
Most challenging         |           |           |           |
What you learned         |           |           |           |
Most interesting         |           |           |           |
Hardest bug              |           |           |           |
Enjoyed most             |           |           |           |
Conflicts with teammates |           |           |           |

This pretty much covers all of the “soft” questions you’re likely to get. I did mine using a private Noodlin board, which worked out well for me (but a Google Spreadsheet would probably work fine, too):

NoodlinMatrix

Note that mine’s a bit sparse and that’s fine. It’s actually even sparser than it looks, as a lot of the stories in the same column are very similar. I’d say come up with at least two projects for each row, though.

Then practice your responses out loud! At least for me, saying things out loud is very different than saying them in my head. I’d go off on random tangents complaining about things and then realize how it sounded halfway through.

Keep your responses short (1-5 minutes) and talk about your coding. For example, you could say: “All of the code was in a giant switch statement, so I abstracted it all out into <data structure> and then traversed it.” A bad answer would be “I used <some framework> and <hot new tech>.”

Prepping for Technical Questions

For the technical interview, try to answer all of the questions in Cracking the Coding Interview without looking at the solutions. If you can’t even get started on a problem, see if you can see any similarities between it and a more common problem. How would you solve it brute-force? There are answers at the back of the book if you’re really stuck.

For all of the questions you can get, come up with alternate answers. Can you optimize for space? Can you optimize for time? Generally you can find a fast solution that uses lots of space and a slow solution that uses very little. If you did it recursively, could you do it iteratively (and vice versa)?
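Fibonacci is the standard toy example for this kind of exercise. The sketch below shows a memoized recursive version (a small taste of dynamic programming) next to an iterative one that needs only two variables; it is an illustration I picked, not a question from the book.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_recursive(n):
    """Recursive with memoization: O(n) time, O(n) space for the cache."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    """Iterative: O(n) time, O(1) space."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_recursive(10), fib_iterative(10))
```

Being able to move between the two forms on the spot is what the “could you do it iteratively?” follow-up is testing.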

Make sure you’re actually coding up these answers, too. TopCoder is pretty good for this. Google around for SRM challenges covering specific areas, or check out the problems linked to in the algorithm tutorials. However, you can also just code up answers in a plaintext editor and run them on your own test cases.

Before the Phone Screen

Try to schedule interviews for when you peak: e.g., I can barely function before noon, so I tried to schedule all interviews for the late afternoon or evening (super handy if you’re applying to places in CA from NY).

Once your interview is set up, make sure you’re all set 15 minutes before the prearranged time. If they didn’t tell you it would be a tech interview, assume it will be and have your computer ready with an internet connection. Have a cellphone with a hands-free headset (I got this one on Amazon for $10 and it worked fine).

Now get a nice, big pad and write down some questions you’d like to ask them. Leave lots of free space on the pad to take notes. Try to keep taking notes and doodling while they’re talking: you absorb more info when you’re doodling than when you’re just listening.

Make sure you’re somewhere you can spread out, so you can lay your phone beside you and quickly switch between laptop and pad as needed. On one phone interview, I accidentally hit a button on my phone and started recording the conversation (I was so embarrassed and had no idea how to turn it off).

Finally, drugs! I make a cup of coffee about 15 minutes before the interview and drink most of it, but leave a little. I’ve found it’s nice to have a booster (or at least something for my mouth to do) if I get totally stuck on a question.

A minute or two before they’re scheduled to call, take some deep breaths, relax, and try to imagine you’re expecting a call from someone who you really like and with whom you enjoy talking about technical stuff (maybe a coworker or friend). They’re just calling to hear about the cool stuff you’ve been working on and get some tech advice on a problem they’ve been having.

Executing the Tech Questions

During the interview, take the coding in several phases:

  1. Understand the question. Hopefully it will only take a few minutes to clarify what they’re asking, but make sure you have a clear mental model. Having interviewed tons of people, the difference between people who really grasp the problem and those that do not is staggering.
  2. Get the algorithm down. This is the tough part.
    • As you think of edge cases, make notes about them (or clarify with the interviewer). You might realize you have to handle overflow 20 lines in, but have forgotten by line 30. Just add a // TODO: handle overflow and then make a pass through after to take care of all the TODOs.
    • If you aren’t sure you can keep the logic straight, make comments. For instance, for a connection pool problem, you might do something like:
      Connection* getConnection() {
          // Check if pool is empty
          // If so, create new connection
          // If not, return connection from pool
      }

      This lets your interviewer know what you’re thinking even if the code doesn’t come out right (and it turns the question into kind of a fill-in-the-blank exercise for you).

    • If you get to a part that is more complex than the surrounding area, just make it a function call, at least for the time being. For example, something like this:
      if (valid) {
          counter++;
      }
      else {
          while (x[i] == '\0') {
               int j;
               for (j = counter; j >= 0; --j) {
                    // oh crap, this is going to be even more complicated... 
               }
          }
      }

      should be turned into this:

      if (valid) {
          counter++;
      }
      else {
          counter += chompNulls(x, i);
      }

      Then go write chompNulls.

  3. Check your work. There is an almost irresistible temptation to say “okay, done” when all you have is a first draft. Once you’ve coded the last line, walk through your code twice before saying “done.”
    • First, check edge cases. Can you pass in null? Zero? Empty string? Negative numbers? Really short strings? Long strings? Answer at beginning/middle/end? You can probably find most bugs with a few quick checks here.
    • Make sure your method signature still makes sense. Often you realize halfway through that you’re returning something different than you thought or you don’t need a parameter.
    • Go through any test case they gave you, make sure it works. (If they gave multiple, you don’t have to go through them all, just choose one.)
  4. Report when you’re done. Keep your interviewer in the loop the whole time, of course, but it’s especially important once you stop programming and you’re just sitting there in silence (checking things over). “Now let me check the case where n is negative” will go a long way towards keeping the interviewer happy.

    Then let them know “Okay, I think that should do it” once you’re done (please don’t just sit there in silence).
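To tie steps 2 and 3 together, here is a sketch of what “go write chompNulls” and the edge-case pass afterwards might look like. The post never says what chompNulls actually does, so this Python version assumes it counts consecutive NUL characters starting at index i; that behavior and the checks below are assumptions.

```python
def chomp_nulls(x, i):
    """Count consecutive NUL characters in x starting at index i (assumed behavior)."""
    count = 0
    while i + count < len(x) and x[i + count] == "\0":
        count += 1
    return count

# Step 3: walk the edge cases before saying "done"
checks = [
    ("", 0),          # empty string
    ("ab", 0),        # no NULs at all
    ("a\0\0b", 1),    # NULs in the middle
    ("a\0\0", 1),     # NULs running to the end of the string
    ("abc", 10),      # index past the end
]
for text, start in checks:
    print(repr(text), start, chomp_nulls(text, start))
```

Pulling the messy loop into its own function makes each of these checks a one-liner, which is most of the reason to do it.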

The Long Haul

For all-day in-person interviews, be prepared to be mentally exhausted. If you have the time/dedication, try doing a “mock interview” where your friend asks you technical questions for five hours. Use this to identify when you start getting tired and what you do when you’re tired: stop checking edge cases? Make too many assumptions about the problem? Get cranky with the interviewer?

Whatever your response is to mental exhaustion, try to figure out ways to counteract it: take a 5-minute break after each interview (ask to use the restroom if they schedule them back-to-back), drink coffee, or boost your blood sugar with a quick snack.

Also, if you aren’t familiar with coding on a whiteboard, try answering some Cracking the Coding Interview questions longhand before you go in. Make sure you leave lots of whitespace between lines, even when you don’t think you’ll need it. I’ve found that if I concentrate on spreading things out, my brain unconsciously leaves more space where I’ll need to add things later (e.g., checking for edge cases, adding more conditions to an if, etc.). Your mileage may vary.

Finally, don’t get too hung up on making things flawless. In five years, I’ve had three interviewees that did perfectly. However, I’ve recommended hiring tons of other people (and I certainly didn’t perform flawlessly… more on that in my next post about the questions I was asked). So, don’t stress out too much if you mess up.

Afterwards

Take care of yourself. Do something special to celebrate getting it over with (even if you don’t feel like you did that well). Resist the urge to follow up with the company and try to put the whole thing out of your mind. You probably won’t be able to, but remind yourself that there is nothing you can do and concentrate on other things. (Also, feel free to write thank you notes, but I’ve never known a programmer at a geeky company who gave two shits whether you did or not.)

If you get a rejection, it really hurts. If you’re at work, take a few minutes to yourself to recover. Go out for lunch, call a friend, or just walk around a bit. Remember that you’re awesome, you did everything you could, it’s their loss, and it’s probably their flawed interview process’s fault (hint: everyone’s interview process is deeply flawed).

If you get an acceptance, congratulations! They love you. Now you just have to figure out what to do next. For negotiating (and more general interview/resume/cover letter) advice, I highly recommend Ask a Manager.

Find this useful? Please comment/upvote on Hacker News!

Goodbye 10gen, Hello Google

Mongo_map_from_Flash_Gordon

After five wonderful years, I’ve decided to leave 10gen and join Google. I’m going to miss working with all of my coworkers and the community tons, you guys are awesome.

I will hopefully continue blogging, but Snail in a Turtleneck will probably not be as MongoDB-focused anymore. If you’re looking for some good MongoDB reads, I recommend checking out Planet Mongo, an aggregator of MongoDB-related blogs, and in particular:

If I’ve missed any MongoDB blogs you find helpful, let me know in the comments and I’ll add them to the list!

I will be finishing the second edition of MongoDB: The Definitive Guide in a few weeks and it should be out by the end of the year. Once I’m done with that, hopefully I’ll have a few weeks to relax.

kristina chodorow's blog