Popping Timestamps into ObjectIds

ObjectIds contain a timestamp, which tells you when the document was created. Because the _id field is always indexed, that means you have a “free” index on your “created at” time (unless you have persnickety requirements for creation times, like resolutions of less than a second, synchronization across app servers, etc.).

Actually using this index can seem daunting (how do you use an ObjectId to query for a certain date?) so let’s run through an example.

First, let’s insert 100 sample docs, 10 docs/second.

> for (i=0; i<10; i++) { 
... print(i+": "+Date.now()); 
... for (j=0; j<10; j++) { 
...    db.foo.insert({x:i,y:j}); 
... } 
... sleep(1000); 
... }
0: 1324417241111
1: 1324417242112
2: 1324417243112
3: 1324417244113
4: 1324417245114
5: 1324417246115
6: 1324417247115
7: 1324417248116
8: 1324417249117
9: 1324417250117

Let’s find all entries created after 1324417246115 (when i=5).

The time is currently in milliseconds (that’s how JavaScript does dates), so we’ll have to convert it to seconds:

> secs = Math.floor(1324417246115/1000)
1324417246

(Your secs will be different than mine, of course.)

ObjectIds can be constructed from a 24-character string, each two characters representing a byte (e.g., “ff” is 255). So, we need to convert secs to hexidecimal, which luckily is super-easy in JavaScript:

> hexSecs = secs.toString(16)
4ef100de

Now, we create an ObjectId from this:

> id = ObjectId(hexSecs+"0000000000000000")
ObjectId("4ef100de0000000000000000")

If you get the wrong number of zeros here, you’ll get an error message that is, er, hard to miss.

Now, we query for everything created after this timestamp:

> db.foo.find({_id : {$gt : id}})
{ "_id" : ObjectId("4ef100de7d435c39c3016405"), "x" : 5, "y" : 0 }
{ "_id" : ObjectId("4ef100de7d435c39c3016406"), "x" : 5, "y" : 1 }
{ "_id" : ObjectId("4ef100de7d435c39c3016407"), "x" : 5, "y" : 2 }
{ "_id" : ObjectId("4ef100de7d435c39c3016408"), "x" : 5, "y" : 3 }
{ "_id" : ObjectId("4ef100de7d435c39c3016409"), "x" : 5, "y" : 4 }
{ "_id" : ObjectId("4ef100de7d435c39c301640a"), "x" : 5, "y" : 5 }
{ "_id" : ObjectId("4ef100de7d435c39c301640b"), "x" : 5, "y" : 6 }
{ "_id" : ObjectId("4ef100de7d435c39c301640c"), "x" : 5, "y" : 7 }
{ "_id" : ObjectId("4ef100de7d435c39c301640d"), "x" : 5, "y" : 8 }
{ "_id" : ObjectId("4ef100de7d435c39c301640e"), "x" : 5, "y" : 9 }
{ "_id" : ObjectId("4ef100df7d435c39c301640f"), "x" : 6, "y" : 0 }
{ "_id" : ObjectId("4ef100df7d435c39c3016410"), "x" : 6, "y" : 1 }
{ "_id" : ObjectId("4ef100df7d435c39c3016411"), "x" : 6, "y" : 2 }
{ "_id" : ObjectId("4ef100df7d435c39c3016412"), "x" : 6, "y" : 3 }
{ "_id" : ObjectId("4ef100df7d435c39c3016413"), "x" : 6, "y" : 4 }
{ "_id" : ObjectId("4ef100df7d435c39c3016414"), "x" : 6, "y" : 5 }
{ "_id" : ObjectId("4ef100df7d435c39c3016415"), "x" : 6, "y" : 6 }
{ "_id" : ObjectId("4ef100df7d435c39c3016416"), "x" : 6, "y" : 7 }
{ "_id" : ObjectId("4ef100df7d435c39c3016417"), "x" : 6, "y" : 8 }
{ "_id" : ObjectId("4ef100df7d435c39c3016418"), "x" : 6, "y" : 9 }
Type "it" for more

If we look at the explain for the query, you can see that it’s using the index:

> db.foo.find({_id:{$gt:id}}).explain()
{
	"cursor" : "BtreeCursor _id_",
	"nscanned" : 50,
	"nscannedObjects" : 50,
	"n" : 50,
	"millis" : 0,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"isMultiKey" : false,
	"indexOnly" : false,
	"indexBounds" : {
		"_id" : [
			[
				ObjectId("4ef100de0000000000000000"),
				ObjectId("ffffffffffffffffffffffff")
			]
		]
	}
}

We’re not quite done, because we’re actually not returning what we wanted: we’re getting all docs greater than or equal to the “created at” time, not just greater than. To fix this, we’d just need to add 1 to the secs before doing anything else. Or I can claim that we were querying for documents created after i=4 all along.

  • Great post. A while back I created an ObjectId 
    implementation JavaScript in Javascript
    . It requires a browser to store use html5 local storage or a cookie to store the machine id (a random number) since I could not get a hostname/ip/mac address.

    The .NET driver lets you query an Objectid, as well as the php driver (which you knew since you wrote it). Maybe the shell version of ObjectId should have a convenience functions to get the timestamp out of an ObjectId.

  • Anonymous

    Thanks! You actually can get the timestamp out of an ObjectId pretty easily in the shell: ObjectId().getTimestamp().  I thought it would be pedagogically clearer for people to start with a timestamp and convert it to an ObjectId, instead of ObjectId->timestamp->ObjectId.

  • This is neat trick. A plain listing by creation time can be done without an additional index this way.

    Although, what would be even awesome is if MongoDB could use multiple indexes for the same query (index intersection if I am not wrong). That is however beyond the scope of this post. 🙂

  • Anonymous

    ..deleted..

  • Zohar Sabari

    Hi, can you please explain why you needed to add 1, the condition says ‘$gt’ which means ‘greater than’ rather then ‘greater or equal’?

  • kristina1

    The timestamp prefix is the same for all documents created a second 4ef100de, so we’ll get (for example) 4ef100de7d435c39c3016405 because it’s strictly greater than 4ef100de0000000000000000, but still in the second 4ef100de. Thus, if you want to *only* get documents created in seconds 4ef100df, 4ef100e0, etc. you need to add 1 to the second.

  • Zohar Sabari

    Hi Kristina, thanks for the reply, I understand what you mean, I think that you meant to say above that you want to find all entries that were created in the second which follows the one entry i=5 was created, when I read ‘all entries created after 1324417246115 (when i=5)’ I had in mind also those that were crated in the same second but in a higher milliseconds. It a small semantic and doesn’t matter much as the article is very clear and explanatory 🙂

kristina chodorow's blog