MongoDB and composite primary keys

Asked 19/4, 2014 at 0:38 Answered 28/4, 2014 at 11:42

Solved mongodb composite-primary-key primary-key-design

I'm trying to determine the best way to deal with a composite primary key in a mongo db. The main key for interacting with the data in this system is made up of 2 uuids. The combination of uuids is guaranteed to be unique, but neither of the individual uuids is.

I see a couple of ways of managing this:

Use an object for the primary key that is made up of 2 values (as suggested here)
Use a standard auto-generated mongo object id as the primary key, store my key in two separate fields, and then create a composite index on those two fields
Make the primary key a hash of the 2 uuids
Some other awesome solution that I currently am unaware of

What are the performance implications of these approaches?

For option 1, I'm worried about the insert performance due to having non sequential keys. I know this can kill traditional RDBMS systems and I've seen indications that this could be true in MongoDB as well.

For option 2, it seems a little odd to have a primary key that would never be used by the system. Also, it seems that query performance might not be as good as in option 1. In a traditional RDBMS a clustered index gives the best query results. How relevant is this in MongoDB?

For option 3, this would create one single id field, but again it wouldn't be sequential when inserting. Are there any other pros/cons to this approach?

For option 4, well... what is option 4?

Also, there's some discussion of possibly using CouchDB instead of MongoDB at some point in the future. Would using CouchDB suggest a different solution?

MORE INFO: some background about the problem can be found here

Disraeli answered 19/4, 2014 at 0:38 Comment(7)

probably the most important question is how are you going to be accessing this data? writing obviously inserts - updates? what about queries? deletes ever? – Kleenex 23/4, 2014 at 17:37

Mostly writes. Then updates (mostly in large bursts). A fair number of inserts (again, mostly in big bursts). Occasional deletes. – Disraeli 23/4, 2014 at 18:31

what field(s) will the updates be using? one or both of the uuids? – Kleenex 23/4, 2014 at 19:33

Updates, deletes and queries will be against both fields – Disraeli 23/4, 2014 at 19:57

together or individually? Or some of each? I.e. is each update going to be targeted to a single unique document (hence providing both uuid values)? – Kleenex 24/4, 2014 at 1:39

That's correct, each update will target a specific document (no multi document updates) – Disraeli 24/4, 2014 at 18:49

@Disraeli Hi, can you show me how you have implemented the composite primary key? What I am doing right now is

StringJoiner joiner = new StringJoiner("/"); 			joiner.add(info.getUserID()).add(idOfApp); 			String idName = joiner.toString();

, and it is working fine but using composite key is what I think is a far better approach and I tried it by using BasciDBObject and appending the two values of my field, but that did not work. Maybe I'm doing something wrong. So if you can show me the full implementation then that will be of great help. Thanks a lot in advance. – Gabby 12/4, 2017 at 6:17

You should go with option 1.

The main reason is that you say you are worried about performance - using the _id index which is always there and already unique will allow you to save having to maintain a second unique index.

For option 1, I'm worried about the insert performance do to having non sequential keys. I know this can kill traditional RDBMS systems and I've seen indications that this could be true in MongoDB as well.

Your other options do not avoid this problem, they just shift it from the _id index to the secondary unique index - but now you have two indexes, once that's right-balanced and the other one that's random access.

There is only one reason to question option 1 and that is if you plan to access the documents by just one or just the other UUID value. As long as you are always providing both values and (this part is very important) you always order them the same way in all your queries, then the _id index will be efficiently serving its full purpose.

As an elaboration on why you have to make sure you always order the two UUID values the same way, when comparing subdocuments { a:1, b:2 } is not equal to { b:2, a:1 } - you could have a collection where two documents had those values for _id. So if you store _id with field a first, then you must always keep that order in all of your documents and queries.

The other caution is that index on _id:1 will be usable for query:

db.collection.find({_id:{a:1,b:2}})

but it will not be usable for query

db.collection.find({"_id.a":1, "_id.b":2})

Kleenex answered 26/4, 2014 at 22:43 Comment(6)

Okay and thanks, I'll keep that in mind. I actually figured it out by myself.

BasicDBObject compositeKey = new BasicDBObject("deviceId", deviceID).append("id", id);			    			String newID = compositeKey.toJson();

where deviceID and id contains the value that I need to append. Thanks once again. – Gabby 12/4, 2017 at 12:50

As for the second caution you mentioned, do you think that if we go with the first option (create a composite ID from the two values), should we replicate the two values as well? Thus we would have the performance benefits of using the composite ID and also able to query the values. – Leaves 3/8, 2018 at 12:25

Guys are you sure the dot notation is not working? For me it does. Using MongoDB v4.2.6 dropbox.com/s/4z1jgch0lhnqebq/… – Motivity 27/4, 2020 at 20:7

Oh the queries will work they just won’t be able to use _id index. – Kleenex 27/4, 2020 at 20:17

One thing to note is that option 1 will be hard to change in the future if something about the primary key changes. It will be a lot easier to drop and add a new index than it will be to rewrite all your data with a different "_id". – Palliate 19/2, 2021 at 17:52

This is the best solution for "big data". Other answers seem to assume indexes are "free". They take up quite a bit of space. – Cleghorn 2/3, 2022 at 22:20

I have an option 4 for you:

Use the automatic _id field and add 2 single field indexes for both uuid's instead of a single composite index.

The _id index would be sequential (although that's less important in MongoDB), easily shardable, and you can let MongoDB manage it.
The 2 uuid indexes let you to make any kind of query you need (with the first one, with the second or with both in any order) and they take up less space than 1 compound index.
In case you use both indexes (and other ones as well) in the same query MongoDB will intersect them (new in v2.6) as if you were using a compound index.

Boltrope answered 28/4, 2014 at 11:42 Comment(2)

Sharding a sequential index is a bad idea, it will not be evenly shared accross all shards + you will have useless index space filled up because you have 2 index instead of 1 (big deal on large collections). – Futch 12/4, 2016 at 13:44

@JonathanMuller Not if you shard them using a hash. "Hashed keys work well with fields that increase monotonically like ObjectId" docs.mongodb.org/manual/core/sharding-shard-key/… – Boltrope 12/4, 2016 at 14:9

I'd go for the 2 option and there is why

Having two separate fields instead of the one concatenated from both uuids as suggested in 1st, will leave you the flexibility to create other combinations of indexes to support the future query requests or if turns out, that the cardinality of one key is higher then another.
having non sequential keys could help you to avoid the hotspots while inserting in sharded environment, so its not such a bad option. Sharding is the best way, for my opinion, to scale inserts and updates on the collections, since the write locking is on database level (prior to 2.6) or collection level (2.6 version)

Daggna answered 26/4, 2014 at 8:9 Comment(5)

Thanks, that's helpful! Just to clarify, you say that having non-sequential keys could help w/ sharded environments. However, option #2 is the one that would give you sequential ids. Am I not understanding something? – Disraeli 26/4, 2014 at 18:23

You right, the non sequential keys comment is related to the statement in your question, where you said, that sequential keys are helpful in RDBMS – Daggna 26/4, 2014 at 19:49

In most RDBMS, records are physically stored on disk ordered by their primary key. When doing a non sequential insert, all the records have to be physically moved. This is what makes inserts slow w/ a non sequential id. It's also why queries against the clustered key are so fast. (You likely already knew that, but I just wanted to put my question in context). Are you saying that this is not the case in MongoDB? My reading had led me to believe that was still a consideration. I know that the standard MongoDB ObjectIDs are sequential. I assumed that was part of the reason. – Disraeli 27/4, 2014 at 0:3

Clustered index in RDBMS implies that the data is kept in the same order as index. Which requires additional work to insert the data in "between". In mongo, there is no such a thing as clustered index and data is not required to be added to the disk at the same order as the _id index – Daggna 27/4, 2014 at 7:28

Ah, that's what I was missing. Thanks a ton! – Disraeli 27/4, 2014 at 15:34

I would've gone with option 2. You can still make an index that handles both the UUID fields, and performance should be the same as a compound primary key, except it'll be much easier to work with.

Also, in my experience, I've never regretted giving something a unique ID, even if it wasn't strictly required. Perhaps that's an unpopular opinion though.

Wattle answered 22/4, 2014 at 18:15 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags