Equivalent Mongodb idiom for batch queries

Go To StackoverFlow.com

0

What is the most efficient way to query mongo collections in batches? For instance in SQL I'd do something like:

SELECT * FROM Foo WHERE id > {{floor}} limit 1000;

I've found that I can do something as outlined here

but the big warning about skip makes me think I shouldn't use that. (Assume the collection is large enough for it to matter).

My collection doesn't have any indexes besides on the autogenerated _id field. I'm not sure if there is a way to use that and $gt $lt.

If it matters, I'll be using the Casbah driver, in case there is secret sauce built in.

2012-04-04 20:20
by Dan.StackOverflow


2

The literal translation of that SQL query to mongo is

db.foo.find({"id": {"$gt": floor}}).limit(1000)

This query can be served by an index on the id field (or by the default index on _id if you use that column instead of one named id).

There are no performance issues with limit(), but skip() for very large values can perform poorly because the server essentially needs to iterate through lots of records before it can begin to return results to you.

2012-04-04 21:10
by dcrosta
right, I only have the _id default index, and so I don't know the min id to start at. Though, I suppose if I start at 0 and do limit 1000, and then take the last _id and increment it, it could work... right - Dan.StackOverflow 2012-04-04 21:53
Indeed it appears I can do something like db.foo.find( {_id: { $gt:new ObjectId("000000000000000000000000") } } ).limit(1000) to get me started. Nice - Dan.StackOverflow 2012-04-04 22:06
For paginagtion, that approach (remembering the last, then starting just after it) is recommended. You can do db.foo.find({_id: {$gt: {last_object_id}}).sort({_id: 1}).limit(num_per_page). Rembmer to use the sort() or else you may get quirky results - dcrosta 2012-04-05 12:46
Ads