MapReduce – The Fanfiction

MapReduce is really cool, useful, and powerful, but a lot of people find it hard to wrap their heads around. This post is a fairly silly, non-technical explanation using Star Trek.

The Enterprise found a new planet, as it tends to do.

Kirk wanted to beam down immediately and start surveying the planet, but Spock told him to wait a moment. “It usually takes us one hour to survey a planet, correct, Captain?  In less than 5 minutes, I can calculate whether the chance of encountering friendly alien females outweighs the risk of attack by brain-eating monsters.”

“Interesting idea, Spock,” said Kirk.  “Go ahead.”

The Data

“Logically,” thought Spock, “if we can survey a whole planet in one hour, we can survey 1/16th of a planet in 3.75 minutes.”  Spock divided the planet into 16 equal-size pieces and summoned 16 red shirts.

“You’ll be beamed down to the surface of the planet with this special data collection device called an ‘emitter.’  If you see a brain-eating monster, you press the ‘brain-eating monster’ button on your emitter.  If you see an attractive female alien, you press the ‘hot alien chick’ button.  Press either, neither, or both buttons, as your situation requires.”

The Map Step

The 16 red shirts were beamed down to the 16 parts of the planet.  As they found things, they pressed the buttons on their emitters.

Back on the Enterprise, Spock started getting lots of data pairs that looked like:

| type                 | location |
|----------------------|----------|
| Brain-eating monster | 2        |
| Hot alien chick      | 7        |
| Brain-eating monster | 14       |
| Brain-eating monster | 7        |

The Reduce Step

“Computer,” Spock said.  “Keep a counter for each type you get, starting at 0.  Then, for every data pair you receive, increment the counter for its type.”

“I dinnae understand,” said Scotty.  “What’s that, then?”

“I basically told the computer to initialize two variables, ‘Brain-eating monster’ and ‘Hot alien chick’, setting them both to zero.  Every time the computer gets a ‘Brain-eating monster’ emit, it increments the ‘Brain-eating monster’ variable.  Every time it gets a ‘Hot alien chick’ emit, it increments the ‘Hot alien chick’ variable.”

“Ah, I see,” said Scotty.  “But don’t you lose the location information?”

“Yes,” replied Spock.  “But I don’t actually care about location for this readout.  If I wanted the location, I could give the computer a slightly more complicated algorithm, but right now I just want the count.”
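Spock's instructions to the computer fit in a few lines of Python. This is only a sketch: `reduce_counts` and `reduce_locations` are hypothetical names, and the second function is just one guess at what the "slightly more complicated algorithm" might look like if he did care about location.

```python
from collections import defaultdict


def reduce_counts(pairs):
    """Count emits per type, throwing the location away."""
    counts = {}
    for kind, _location in pairs:
        if kind not in counts:
            counts[kind] = 0  # first time we see this type
        counts[kind] += 1
    return counts


def reduce_locations(pairs):
    """Assumed variant: remember where each type was seen instead of counting."""
    locations = defaultdict(set)
    for kind, location in pairs:
        locations[kind].add(location)
    return dict(locations)


# The four data pairs from Spock's readout.
pairs = [
    ("Brain-eating monster", 2),
    ("Hot alien chick", 7),
    ("Brain-eating monster", 14),
    ("Brain-eating monster", 7),
]

print(reduce_counts(pairs))
print(reduce_locations(pairs))
```

The counting version only ever holds one number per type, which is why it stays cheap no matter how many emits come in.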

The Result

After 3.75 minutes, Spock beamed up the red shirts who were still alive and reported to Kirk: “There are brain-eating monsters on 7/8ths of the planet, Captain.  1/16th of the planet has hot alien chicks.”

“Excellent work, Spock,” said Kirk.  “Let’s boldly go somewhere else.”

And so they did.

Captain’s log, star date 1419.7 (aka a summary of what we did)

  1. Goal – To generate a report on a planet.
  2. Data – 16 pieces of land with various attributes. Each piece of land could be represented by a JSON object such as:
    {
        "location" : 5,
        "contains" : ["Brain-eating monsters", "rocks", "poison gas"]
    }
  3. Map – Send attributes for each piece of data back to the processor. In JSON, each emit would look something like:
    {
        "Brain-eating monsters" : 5
    }
  4. Reduce – Sum up the data, grouping by type
  5. Result – How much of each attribute is on the planet
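The five steps in the log can be strung together in one short Python sketch. Assumptions are loud here: only the location 5 object comes from the log, the other pieces of land are invented, `map_land` and `reduce_emits` are hypothetical names, and the final fraction only makes sense if each piece of land lists an attribute at most once.

```python
import json
from collections import Counter

TOTAL_REGIONS = 16  # the planet was divided into 16 pieces

# Data: location 5 is from the log; locations 6 and 7 are made up.
land_json = """
[
  {"location": 5, "contains": ["Brain-eating monsters", "rocks", "poison gas"]},
  {"location": 6, "contains": ["Brain-eating monsters", "rocks"]},
  {"location": 7, "contains": ["Hot alien chicks"]}
]
"""


def map_land(piece):
    """Map: emit one {attribute: location} object per attribute."""
    for attribute in piece["contains"]:
        yield {attribute: piece["location"]}


def reduce_emits(emits):
    """Reduce: sum up the emits, grouping by attribute type."""
    counts = Counter()
    for emit in emits:
        for attribute in emit:
            counts[attribute] += 1
    return counts


pieces = json.loads(land_json)
emits = [emit for piece in pieces for emit in map_land(piece)]
counts = reduce_emits(emits)

# Result: the share of the planet's 16 pieces that has each attribute.
report = {attribute: n / TOTAL_REGIONS for attribute, n in counts.items()}
print(report)
```

With this made-up data, brain-eating monsters turn up on 2 of the 16 pieces, so they cover 1/8th of the planet.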

Further reading: Kyle Banker has an excellent (and more technical) explanation of MapReduce.

  • Why do you “initialize three variables” when you only need two? 😉

  • Because I have a crappy editor 🙂

    Thanks, fixed.

  • This is easily the most accessible explanation of Map/Reduce I’ve ever seen! Finally, I have something to which I can direct all of my treky-but-non-map-reduce-savvy friends.

  • @Matt thank you!

  • Nice story! 🙂

  • Raymond Blum

    …much better than – “Why MapReduce is better than my cat at doing dishes!”

    This really is an excellent explanation of MR as a technique. great job!

  • Samir

    Rock on buddy. Awesome explanation 🙂

  • Great article:)

  • Enjoy reading your post! Keep going!

  • Kaustubh P

    best explanation i have read till date. Now comes the implementation 😐

  • Anonymous

    For implementation, I generally point people to Kyle Banker’s excellent series on aggregation: http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/. Don’t worry, it’s not too bad once you understand the concept.

  • OK, this is, without hyperbole, the best technical introduction article in the history of Earth.

  • Anonymous

    Wow, thank you!

  • There’s a minor typo “but right not I just want the count” should read “but right now I just want the count” .. was just a minor distraction, takes nothing away from the story. Brilliant explanation of the use of MapReduce.

  • Anonymous

    Thank you!  Fixed.

  • Hauser

    Funny but useful. Thx 🙂

kristina chodorow's blog