SQL to MongoDB: An Updated Mapping

Rick Osborne's original chart.

The aggregation pipeline code has finally been merged into the main development branch and is scheduled for release in 2.2. It lets you combine simple operations (like finding the max or min, projecting out fields, taking counts or averages) into a pipeline of operations, making a lot of things that were only possible by using MapReduce doable with a “normal” query.

In celebration of this, I thought I’d re-do the very popular MySQL to MongoDB mapping using the aggregation pipeline, instead of MapReduce.

Here is the original SQL:

SELECT
  Dim1, Dim2,
  SUM(Measure1) AS MSum,
  COUNT(*) AS RecordCount,
  AVG(Measure2) AS MAvg,
  MIN(Measure1) AS MMin
  MAX(CASE
    WHEN Measure2 < 100
    THEN Measure2
  END) AS MMax
FROM DenormAggTable
WHERE (Filter1 IN (’A’,’B’))
  AND (Filter2 = ‘C’)
  AND (Filter3 > 123)
GROUP BY Dim1, Dim2
HAVING (MMin > 0)
ORDER BY RecordCount DESC
LIMIT 4, 8

We can break up this statement and replace each piece of SQL with the new aggregation pipeline syntax:

MongoDB Pipeline MySQL
aggregate: "DenormAggTable"
FROM DenormAggTable
{
    $match : {
        Filter1 : {$in : ['A','B']},
        Filter2 : 'C',
        Filter3 : {$gt : 123}
    }
}
WHERE (Filter1 IN (’A’,’B’))
  AND (Filter2 = ‘C’)
  AND (Filter3 > 123)
{
    $project : {
        Dim1 : 1,
        Dim2 : 1,
        Measure1 : 1,
        Measure2 : 1,
        lessThanAHundred : {
            $cond: [ 
                {$lt: ["$Measure2", 100] },
                "$Measure2", // if
                0]           // else
        }
    }
}
CASE
  WHEN Measure2 < 100
  THEN Measure2
END
{
    $group : {
        _id : {Dim1 : 1, Dim2 : 1},
        MSum : {$sum : "$Measure1"},
        RecordCount : {$sum : 1},
        MAvg : {$avg : "$Measure2"},
        MMin : {$min : "$Measure1"},
        MMax : {$max : "$lessThanAHundred"}
    }
}
SELECT
  Dim1, Dim2,
  SUM(Measure1) AS MSum,
  COUNT(*) AS RecordCount,
  AVG(Measure2) AS MAvg,
  MIN(Measure1) AS MMin
  MAX(CASE
    WHEN Measure2 < 100
    THEN Measure2
  END) AS MMax
 
GROUP BY Dim1, Dim2
{
    $match : {MMin : {$gt : 0}}
}
HAVING (MMin > 0)
{
    $sort : {RecordCount : -1}
}
ORDER BY RecordCount DESC
{
    $limit : 8
},
{
    $skip : 4
}
LIMIT 4, 8

Putting all of these together gives you your pipeline:

> db.runCommand({aggregate: "DenormAggTable", pipeline: [
{
    $match : {
        Filter1 : {$in : ['A','B']},
        Filter2 : 'C',
        Filter3 : {$gt : 123}
    }
},
{
    $project : {
        Dim1 : 1,
        Dim2 : 1,
        Measure1 : 1,
        Measure2 : 1,
        lessThanAHundred : {$cond: [{$lt: ["$Measure2", 100]}, {
            "$Measure2",
            0]
        }
    }
},
{
    $group : {
        _id : {Dim1 : 1, Dim2 : 1},
        MSum : {$sum : "$Measure1"},
        RecordCount : {$sum : 1},
        MAvg : {$avg : "$Measure2"},
        MMin : {$min : "$Measure1"},
        MMax : {$max : "$lessThanAHundred"}
    }
},
{
    $match : {MMin : {$gt : 0}}
},
{
    $sort : {RecordCount : -1}
},
{
    $limit : 8
},
{
    $skip : 4
}
]})

As you can see, the SQL matches the pipeline operations pretty clearly. If you want to play with it, it’ll be available soon to a the development nightly build.

If you’re at MongoSV today (December 9th, 2011), check out Chris Westin’s talk on the new aggregation framework at 3:45 in room B4.

  • Pingback: SQL to MongoDB: An Updated Mapping « Another Word For It

  • Anonymous

    This kinda stuff shouldn’t be in blog. It should go straight into tutorial! It’s that good! Thanks.

  • shawny

    I think:
    {
        $limit : 4
    },
    {
        $skip : 8
    }

    should be changed into:

    {    $limit : 8},{    $skip : 4}

  • Anonymous

    Whoops, thank you!  Fixed.

  • Juan Borda

    I would like to view this chart reviewed with the new aggregation framework introduced in mongo 2.1 (i can help too)

  • kristina1

    That’d be cool, let me know if you come up with a new version.

kristina chodorow's blog