Using "Year" and "Month" fields of a collection as shard keys in MongoDB

0

We are currently implementing a solution where we have a collection which we want to shard based on the following fields:

Year (INT)

Month (INT)

We expect to generate around 2GB of data per year for the collection.

2012-04-05 15:04
by Nikola Stjelja
What is your question? Whether that is a good idea? Are year and month the creation time, or DOB, or what? - Eve Freeman 2012-04-05 15:33


6

If you don't mind me asking, why are you considering sharding? 2GB should easily fit on a single server.

That said, if you are definitely going to shard your collection, it is important to choose a non-incrementing shard key that is fine-grained enough that a chunk can never end up in a state where it cannot be split.

For example, if a collection were split by Month alone, there would be only 12 possible chunks. If January were a popular month for inserts, a million records (just to pick an arbitrary large number) could be inserted into that chunk, and it could never be split.
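To make the cardinality point concrete, here is a minimal sketch in plain Python (no MongoDB involved) that counts how many distinct shard key values a data set has. The document layout, the lowercase field names, and the helper `docs_per_key_value` are all made up for illustration. Documents that share a shard key value cannot be separated by a split, so the number of distinct values is an upper bound on the number of chunks.

```python
from collections import Counter
from itertools import product

def docs_per_key_value(docs, key_fields):
    """Count documents per distinct shard key value.

    Documents sharing a shard key value can never be separated by a split,
    so len(result) is an upper bound on the number of chunks.
    """
    return Counter(tuple(doc[f] for f in key_fields) for doc in docs)

# Hypothetical data: 5 years x 12 months, with January ten times as busy.
docs = [
    {"year": year, "month": month, "seq": i}
    for year, month in product(range(2012, 2017), range(1, 13))
    for i in range(1000 if month == 1 else 100)
]

by_month = docs_per_key_value(docs, ["month"])
by_year_month = docs_per_key_value(docs, ["year", "month"])

print(len(by_month))       # 12: at most 12 chunks, however much data arrives
print(len(by_year_month))  # 60: better, but only 12 new values per year
print(by_month[(1,)])      # 5000: every January document is stuck together
```

Adding Year to the key raises the bound, but it still grows by only 12 values a year, and all new inserts carry the newest value.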

It is also important not to choose a shard key that increments (or decrements). As new documents are inserted into the collection, each subsequent document will be added to the same chunk until this chunk reaches its size limit and must be split. The lower chunk may then be moved to a different server, creating a "waterfall" effect (one shard keeps filling up with chunks, which then get moved to other shards). Meanwhile, all new documents are constantly being written to the same shard, creating what is known as a "hot spot". If the insert rate is high enough, the disk could reach its IO limit attempting to write new documents while migrating existing data to another shard at the same time.
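The hot spot can be illustrated with a toy model of range-based routing, again in plain Python rather than against a real cluster. The split points, the `route` function, and the `hashed` helper are assumptions made for this sketch, and `hashed` is only a stand-in for any key that does not grow monotonically.

```python
import bisect
import hashlib
from collections import Counter

def route(key, split_points):
    """Return the index of the range chunk that owns `key`.

    Chunk i covers [split_points[i-1], split_points[i]); the last chunk is
    open-ended, which is where ever-growing keys land.
    """
    return bisect.bisect_right(split_points, key)

def hashed(key):
    """Map a key to a stable 32-bit value (stand-in for a non-incrementing key)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big")

# Four chunks over an incrementing key; existing data ends below 4000.
incrementing_splits = [1000, 2000, 3000]
# Four equal-width chunks over the 32-bit hashed key space.
hashed_splits = [2**30, 2**31, 3 * 2**30]

new_keys = range(4000, 5000)  # 1000 new inserts, each key larger than the last

incrementing_load = Counter(route(k, incrementing_splits) for k in new_keys)
hashed_load = Counter(route(hashed(k), hashed_splits) for k in new_keys)

print(incrementing_load)          # every insert lands in the last chunk
print(sorted(hashed_load.items()))  # inserts spread across all four chunks
```

With the incrementing key, all 1000 inserts are routed to chunk 3, so whichever shard holds that chunk takes every write. With the scattered key, the same inserts are divided among the four chunks.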

The MongoDB "Sharding Introduction" document has more details on how documents are stored in a sharded collection. http://www.mongodb.org/display/DOCS/Sharding+Introduction

Additionally, the MongoDB document "Choosing a Shard Key" covers what to consider when choosing a shard key. http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key

If possible, I recommend reading "Scaling MongoDB" by Kristina Chodorow: http://shop.oreilly.com/product/0636920018308.do It provides an excellent introduction to sharding, as well as more detailed explanations of the dos and don'ts of choosing shard keys that I mentioned above.

Here are links to some questions that other users have asked about sharding and choosing shard keys. (You may recognize some of the links and some of the authors.) Hopefully these resources will improve your understanding of how sharding works and, if you still decide to shard your collection, help you choose an efficient shard key.

"sharding imbalance" - http://groups.google.com/group/mongodb-user/browse_thread/thread/1328250382087448

"What is a best method to design the shard index for this dataset" - http://groups.google.com/group/mongodb-user/browse_thread/thread/5bda4a39d9be54f5

"low cardinality shard keys" - http://groups.google.com/group/mongodb-user/browse_thread/thread/3c96d1c254f113b1

"shard key analysis" - http://groups.google.com/group/mongodb-user/browse_thread/thread/9cf0b8657d4515e2

2012-04-05 16:12
by Marc