email as _id in a MongoDB user collection

I have a user collection in MongoDB. The _id is currently the standard MongoDB-generated ObjectId. I also have a unique index on a required 'email' field. This seems like a waste.

Is there any reason why I should not ditch the 'email' field and make that data the _id field?

Esau asked 1/5, 2014 at 0:59

I have read Neil's answer and I partially agree with it (although I am quite skeptical about the 'significant performance gains'). One thing your question does not say is what you are going to do with this email. Are you going to search by it, or is it just stored there? And one of the most important things not addressed in the previous answer: is it ever going to change?

It is not uncommon for users of your system to change their email address (the old one is lost or no longer used). If you use their email as the _id, you will not be able to change it easily, because the _id field cannot be modified in MongoDB. Instead you would have to copy the document, insert the copy under the new _id, and remove the original, and those steps are not atomic.

So I would count this as one big reason not to do it. Ultimately, though, you need to decide whether you will allow people to change their email addresses.
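To make the cost of an email change concrete, here is a minimal Python sketch of the copy/insert/remove sequence, assuming the email is stored as _id. The helper `change_email` and the class `InMemoryUsers` are hypothetical illustrations, not part of any library: `InMemoryUsers` only mimics the `find_one`, `insert_one` and `delete_one` methods of a PyMongo `Collection` so the example runs without a server. Note that the insert and the delete are two separate writes.

```python
from typing import Any, Dict, Optional


class InMemoryUsers:
    """Hypothetical stand-in for a collection keyed by _id.

    Implements only the three methods the helper below calls, with the same
    names as pymongo.collection.Collection, so the sketch runs offline.
    """

    def __init__(self) -> None:
        self._docs: Dict[Any, Dict[str, Any]] = {}

    def find_one(self, query: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        doc = self._docs.get(query["_id"])
        return dict(doc) if doc is not None else None

    def insert_one(self, document: Dict[str, Any]) -> None:
        # Mimics the unique index MongoDB always keeps on _id.
        if document["_id"] in self._docs:
            raise KeyError(f"duplicate _id {document['_id']!r}")
        self._docs[document["_id"]] = dict(document)

    def delete_one(self, query: Dict[str, Any]) -> None:
        self._docs.pop(query["_id"], None)


def change_email(users, old_email: str, new_email: str) -> None:
    """Move a user document whose _id is an email address to a new _id.

    _id cannot be updated in place, so the document is copied under the new
    key and the original is deleted afterwards. These are two independent
    writes: if the process dies between them, both documents remain.
    """
    doc = users.find_one({"_id": old_email})
    if doc is None:
        raise LookupError(f"no user with _id {old_email!r}")
    doc["_id"] = new_email
    users.insert_one(doc)                 # step 1: add the copy under the new email
    users.delete_one({"_id": old_email})  # step 2: remove the original
```

Passing a real PyMongo collection instead of the stand-in would perform the same two writes against MongoDB, with the same window between them.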

Plumbum answered 1/5, 2014 at 1:34

Comments:
That rare use case (an email change) can be avoided by adding a separate document / row / column holding the current email, no? You don't have to copy all the data; you just add the current email for notifications. - Chuvash
@NikolayFominyh it is not such a rare case. And I highly doubt that it is simpler to add a new document with extra information than to simply not use the email as your _id field. - Plumbum
@SalvadorDali, yeah. But you still don't have to copy all the data; that is the only point I'm making. Adding an alias requires some more logic in the app, but not as much as it might seem. :) - Chuvash

Generally speaking, no, there is no real reason not to, and in fact there are significant performance gains to be realized if you do use your "email" as the primary key.

  1. Where most of your lookups are actually on that primary key. Even if you create a unique index on a different field, MongoDB is optimized so that using the _id index is a no-brainer: it's always there.

  2. No additional space is used for an index. Again, when you look up by your primary key there is no need to load anything other than the default index, which naturally saves disk space as well as the I/O cost that would otherwise be incurred.

Perhaps the only really relevant consideration is sharding, and only if your use case is better suited to some different form of "bucketed" distribution, for example of "high/low" volume users. In that case some other form of primary key would be required to facilitate that.

The default ObjectId type that generally occupies the _id field is great because it maintains a natural insertion order and even makes it possible to do general range-based or time-based queries (within reason), since its leading bytes encode the creation time in seconds. So where you need a natural insertion order it is generally the best choice, and it is highly collision-safe.
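The time-based queries mentioned above work because the first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds, followed by 8 bytes of random value and counter. The following Python sketch works on the 24-character hex form directly so it needs no dependencies; the function names are hypothetical. In practice the `bson` package bundled with PyMongo provides `ObjectId.generation_time` and `ObjectId.from_datetime` for the same purpose.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time embedded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # Bytes 0-3 (hex chars 0-7): big-endian seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def objectid_lower_bound(moment: datetime) -> str:
    """Smallest ObjectId hex string whose embedded second equals that of `moment`.

    Any ObjectId generated during or after that second compares >= this value,
    which is what makes a time-based range query on _id possible.
    """
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    seconds = int(moment.timestamp())  # fractional seconds are dropped
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("timestamp does not fit in 4 bytes")
    # Timestamp prefix followed by 8 zero bytes (the random value and counter).
    return f"{seconds:08x}" + "0" * 16


created = objectid_timestamp("53619c54" + "0" * 16)
print(created.isoformat())  # 2014-05-01T00:59:00+00:00
```

A bound produced this way can be wrapped in an `ObjectId` and used in a filter such as `{"_id": {"$gte": ObjectId(bound)}}` to fetch documents created at or after that second. This is the capability given up when _id holds an email address instead.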

But if you are mainly looking for efficient lookup of primary key values, then anything that serves as a natural primary key is ideally put in the _id field of the collection, as long as it is reasonably guaranteed to be unique.

Centuplicate answered 1/5, 2014 at 1:11

© 2022 - 2024 — McMap. All rights reserved.