Relations in Document-oriented database?

Asked 25/2, 2010 at 14:57 Answered 9/5, 2013 at 4:35

I'm interested in document-oriented databases, and I'd like to play with MongoDB. So I started a fairly simple project (an issue tracker), but am having hard times thinking in a non-relational way.

My problems:

I have two objects that relate to each other (e.g. issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}} - here I have a user related to the issue). Should I create another document 'user' and reference it in 'issue' document by its id (like in relational databases), or should I leave all the user's data in the subdocument?
If I have objects (subdocuments) in a document, can I update them all in a single query?

Comptom answered 25/2, 2010 at 14:57 Comment(6)

have a look ayende.com/blog/4465/… – Lethargy 3/5, 2013 at 9:30

This is an interesting video on this very topic: 10gen.com/presentations/schema-design-basics-1 – Loganloganberry 3/5, 2013 at 15:20

MongoDB by itself is a wonderful NoSQL document oriented database. It really comes alive when you couple it with Mongoose ORM. I just made a blind assumption that you are using Node.js, so if not, my apologies. MongoDB and Mongoose paired together make NoSQL capable of what you can accomplish in a RDBMS with 20x more flexibility and power. Mongoose provides the concept of a "population", which gives you the joins between documents you are looking for. Again, Mongoose is javascript/Node.js, but I'm sure there are other ORMs for other languages. – Knar 7/5, 2013 at 1:23

See @oyatek response to this question: #13325846 – Loner 8/5, 2013 at 20:20

@Loner no offence but I would not consider that a "substantial" answer, also not true all of the time – Tidewater 9/5, 2013 at 17:22

@Tidewater no offense taken, was just pointing out the similar question. – Loner 9/5, 2013 at 18:14

I'm totally new to document-oriented databases, and right now I'm trying to develop sort of a CMS using node.js and mongodb so I'm facing the same problems as you.

By trial and error I found this rule of thumb: I make a collection for every entity that may be a "subject" for my queries, while embedding the rest inside other objects.

For example, comments in a blog entry can be embedded, because usually they're bound to the entry itself and I can't think about a useful query made globally on all comments. On the other side, tags attached to a post might deserve their own collection, because even if they're bound to the post, you might want to reason globally about all the tags (for example making a list of trending topics).

Doak answered 5/11, 2011 at 11:12 Comment(2)

What is meant by " I make a collection for every entity that may be a "subject" for my queries, while embedding the rest inside other objects."? – Tidewater 9/5, 2013 at 10:58

Also why have you used the blog relation which is quite a different type of subdocument to the one explained in this question? – Tidewater 9/5, 2013 at 17:19

In my mind this is actually pretty simple. Embedded documents can only be accessed via their master document. If you can envision a need to query an object outside the context of the master document, then don't embed it. Use a ref.

For your example

issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}}

I would make issue and reporter each their own document, and reference the reporter in the issue. You could also reference a list of issues in reporter. This way you won't duplicate reporters in issues, you can query them each separately, you can query reporter by issue, and you can query issues by reporter. If you embed reporter in issue, you can only query the one way, reporter by issue.

If you embed documents, you can update them all in a single query, but you have to repeat the update in each master document. This is another good reason to use reference documents.

Loner answered 8/5, 2013 at 20:32 Comment(3)

Embedded documents can be retrieved without needing the master, i.e. to count all reporters who are qwer : db.tickets.aggregate({$match:{"reporter.username":"qwer"}},{$group:{_id:'$reporter.username', c:{$sum:1}}}) also: "If you embed documents, you cannot update them all in a single query." Yes you can, it could be a in place update; the problem is repitition of the update across multiples masters if the user is duplicated but even then it is a in-place $set update, you don't have to load each master document. – Tidewater 9/5, 2013 at 11:7

@Tidewater thanks for the clarification, I have updated my answer. Sorry for my confusion, I have worked with mongodb primarily via an ORM layer, so my understanding of the underpinnings is a little off. – Loner 9/5, 2013 at 18:19

Indeed many ORMs are orientated around flat structures and do not handle subdocuments very well or at all – Tidewater 9/5, 2013 at 18:43

The beauty of mongodb and other "NoSQL" product is that there isn't any schema to design. I use MongoDB and I love it, not having to write SQL queries and awful JOIN queries! So to answer your two questions.

1 - If you create multiple documents, you'll need make two calls to the DB. Not saying it's a bad thing but if you can throw everything into one document, why not? I recall when I used to use MySQL, I would create a "blog" table and a "comments" table. Now, I append the comments to the record in the same collection (aka table) and keep building on it.

2 - Yes ...

Constrain answered 4/3, 2010 at 18:47 Comment(4)

Sure, it makes sense to keep comments in the blog object, but normalisation/DRY still applies: it's a bad form to put the same data in multiple places so if you want to have users appear more than once throughout a collection, it's probably better to reference them by ID and put them in their own collection. The beauty is that you don't have to create special collections just to express many-to-many relationships. – Molini 11/3, 2010 at 20:38

Very good suggestion luckytaxi. The one thing to be aware of is there is a maximum document size of 2mb for 32-bit systems, and 4mb fir 64-bit systems. As long as you re certain a document will never reach this size, go for it. With loosely-related data such as users and bugs, I'd recommend keeping them in two separate collections. And if you really want to, you can also do some sort of "join" in Mongo, although I've never travelled down that road before. – Reverie 28/12, 2010 at 17:6

You do have to do JOINs in MongoDB in a relational schema, just client side, it does not do server side joins. – Tidewater 9/5, 2013 at 17:20

Actually there is always a "schema" to design and this schema is in your code. You can deal with that using some ORM framework and you don't need to use SQL at all. – Dinerman 27/2, 2018 at 23:18

Redid this answer since the original answer took the relation the wrong way round due to reading incorrectly.

issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}}

As to whether embedding some important information about the user (creator) of the ticket is a wise decision or not depends upon the system specifics.

Are you giving these users the ability to login and report issues they find? If so then it is likely you might want to factor that relation off to a user collection.

On the other hand, if that is not the case then you could easily get away with this schema. The one problem I see here is if you wish to contact the reporter and their job role has changed, that's somewhat awkward; however, that is a real world dilemma, not one for the database.

Since the subdocument represents a single one-to-one relation to a reporter you also should not suffer fragmentation problems mentioned in my original answer.

There is one glaring problem with this schema and that is duplication of changing repeating data (Normalised Form stuff).

Let's take an example. Imagine you hit the real world dilemma I spoke about earlier and a user called Nigel wants his role to reflect his new job position from now on. This means you have to update all rows where Nigel is the reporter and change his role to that new position. This can be a lengthy and resource consuming query for MongoDB.

To contradict myself again, if you were to only have maybe 100 tickets (aka something manageable) per user then the update operation would likely not be too bad and would, in fact, by manageable for the database quite easily; plus due to the lack of movement (hopefully) of the documents this would be a completely in place update.

So whether this should be embedded or not depends heavily upn your querying and documents etc, however, I would say this schema isn't a good idea; specifically due to the duplication of changing data across many root documents. Technically, yes, you could get away with it but I would not try.

I would instead split the two out.

If I have objects (subdocuments) in a document, can I update them all in a single query?

Just like the relation style in my original answer, yes and easily.

For example, let's update the role of Nigel to MD (as hinted earlier) and change the ticket status to completed:

db.tickets.update({'reporter.username':'Nigel'},{$set:{'reporter.role':'MD', status: 'completed'}})

So a single document schema does make CRUD easier in this case.

One thing to note, stemming from your English, you cannot use the positional operator to update all subdocuments under a root document. Instead it will update only the first found.

Again hopefully that makes sense and I haven't left anything out. HTH

Original Answer

here I have a user related to the issue). Should I create another document 'user' and reference it in 'issue' document by its id (like in relational databases), or should I leave all the user's data in the subdocument?

This is a considerable question and requires some background knowledge before continuing.

First thing to consider is the size of a issue:

issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}}

Is not very big, and since you no longer need the reporter information (that would be on the root document) it could be smaller, however, issues are never that simple. If you take a look at the MongoDB JIRA for example: https://jira.mongodb.org/browse/SERVER-9548 (as a random page that proves my point) the contents of a "ticket" can actually be quite considerable.

The only way you would gain a true benefit from embedding the tickets would be if you could store ALL user information in a single 16 MB block of contigious sotrage which is the maximum size of a BSON document (as imposed by the mongod currently).

I don't think you would be able to store all tickets under a single user.

Even if you was to shrink the ticket to, maybe, a code, title and a description you could still suffer from the "swiss cheese" problem caused by regular updates and changes to documents in MongoDB, as ever this: http://www.10gen.com/presentations/storage-engine-internals is a good reference for what I mean.

You would typically witness this problem as users add multiple tickets to their root user document. The tickets themselves will change as well but maybe not in a drastic or frequent manner.

You can, of course, remedy this problem a bit by using power of 2 sizes allocation: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes which will do exactly what it says on the tin.

Ok, hypothetically, if you were to only have code and title then yes, you could store the tickets as subdocuments in the root user without too many problems, however, this is something that comes down to specifics that the bounty assignee has not mentioned.

If I have objects (subdocuments) in a document, can I update them all in a single query?

Yes, quite easily. This is one thing that becomes easier with embedding. You could use a query like:

db.users.update({user_id:uid,'tickets.code':'asdf-1'}, {$set:{'tickets.$.title':'Oh NOES'}})

However, to note, you can only update ONE subdocument at a time using the positional operator. As such this means you cannot, in a single atomic operation, update all ticket dates on a single user to 5 days in the future.

As for adding a new ticket, that is quite simple:

db.users.update({user_id:uid},{$push:{tickets:{code:asdf-1,title:"Whoop"}}})

So yes, you can quite simply, depending on your queries, update the entire users data in a single call.

That was quite a long answer so hopefully I haven't missed anything out, hope it helps.

Tidewater answered 3/5, 2013 at 9:27 Comment(3)

nobody proposed embedding tickets into the user document that would be rather inane. I don't see why you discuss whether or not it would be a good idea. The question was about whether to embed user information (for user who reported it?) inside each ticket rather than just a reference. – Helmut 5/5, 2013 at 8:0

@AsyaKamsky hmm it seems re-reading the question I got the relation the wrong way round – Tidewater 5/5, 2013 at 8:35

@AsyaKamsky edited, am a bit tired but I think I may have covered it – Tidewater 5/5, 2013 at 12:47

The schema design in Document-oriented DBs can seems difficult at first, but building my startup with Symfony2 and MongoDB I've found that the 80% of the time is just like with a relational DB.

At first, think it like a normal db:

To start, just create your schema as you would with a relational Db:

Each Entity should have his own Collection, especially if you'll need to paginate the documents in it.

(in Mongo you can somewhat paginate nested document arrays, but the capabilities are limited)

Then just remove overly complicated normalization:

do I need a separate category table? (simply write the category in a column/property as a string or embedded doc)
Can I store comments count directly as an Int in the Author collection? (then update the count with an event, for example in Doctrine ODM)

Embedded documents:

Use embedded documents only for:

clearness (nested documents like: addressInfo, billingInfo in the User collection)
to store tags/categories ( eg: [ name: "Sport", parent: "Hobby", page: "/sport" ] )
to store simple multiple values (for eg. in User collection: list of specialties, list of personal websites)

Don't use them when:

the parent Document will grow too large
when you need to paginate them
when you feel the entity is important enough to deserve his own collection

Duplicate values across collection and precompute counts:

Duplicate some columns/attributes values from a Collection to another if you need to do a query with each values in the where conditions. (remember there aren't joins)

eg: In the Ticket collection put also the author name (not only the ID)

Also if you need a counter (number of tickets opened by user, by category, ecc), precompute them.

Embed references:

When you have a One-to-Many or Many-to-Many reference, use an embedded array with the list of the referenced document ids (see MongoDB DB Ref).

You'll need to use an Event again to remove an id if the referenced document get deleted. (There is an extension for Doctrine ODM if you use it: Reference Integrity)

This kind of references are directly managed by Doctrine ODM: Reference Many

Its easy to fix errors:

If you find late that you have made a mistake in the schema design, its quite simply to fix it with few lines of Javascript to run directly in the Mongo console.

(stored procedures made easy: no need of complex migration scripts)

Waring: don't use Doctrine ODM Migrations, you'll regret that later.

Ethiopia answered 3/5, 2013 at 10:25 Comment(6)

Paginating subdocuments is easily possible, with the aggregation framework you can also do complex sorting etc. Also MongoDBRef is a particular type of reference and most of the time it is less space consuming to use just an OjbectId, but yeah. – Tidewater 3/5, 2013 at 11:21

Pagination was possible since the $slice modifier, but the problem is that you can't use the same query commands you'll use to paginate a Collection. Surely the aggregation framework help, but take a look at the other issues with arrays: jira.mongodb.org/secure/QuickSearch.jspa The main problem is simply: they aren't Collections. – Ethiopia 3/5, 2013 at 12:46

For the MongoDBRef: in Doctrine ODM you can use the simple=true modifier to a Reference to store only the ObjectId. However, the DbRef is cleaner since it stores also the collection and the db of the referenced document. Also, if you don't have several millions of rows, why bother? – Ethiopia 3/5, 2013 at 12:49

Do note that referencing the behaviour of a certain ORM might not be a good idea, there is no reference to any specific language in this question as such you cannot rely on that functionality outside of that specific program. The DBref uses a lot more space and in turn working set, especially if you are dealing with 7 million rows in your working set (if) and you will quickly notice that using a DBRef everywhere might result you needing to change your schema. – Tidewater 3/5, 2013 at 13:1

There are issues with subdocuments however, most of those issues are mitigatable with certain projection and schema orientation mechanisms (i.e. storing batches of 100 elements in the root document and projecting only what you need). But it is true that using subdocuments extensively does increase the working set plus use of in-memory operators such as $push etc can result in performance problems. – Tidewater 3/5, 2013 at 13:3

I should also note that MongoDB has no sense of stored procedures, also migration scripts are not the same as stored procedures, they are completely different things for completely different tasks – Tidewater 3/5, 2013 at 13:23

I like MongoDB, but I have to say that I will use it a lot more soberly in my next project.

Specifically, I have not had as much luck with the Embedded Document facility as people promise.

Embedded Document seems to be useful for Composition (see UML Composition), but not for aggregation. Leaf nodes are great, anything in the middle of your object graph should not be an embedded document. It will make searching and validating your data more of a struggle than you'd want.

One thing that is absolutely better in MongoDB is your many-to-X relationships. You can do a many-to-many with only two tables, and it's possible to represent a many-to-one relationship on either table. That is, you can either put 1 key in N rows, or N keys in 1 row, or both. Notably, queries to accomplish set operations (intersection, union, disjoint set, etc) are actually comprehensible by your coworkers. I have never been satisfied with these queries in SQL. I often have to settle for "two other people will understand this".

If you've ever had your data get really big, you know that inserts and updates can be constrained by how much the indexes cost. You need fewer indexes in MongoDB; an index on A-B-C can be used to query for A, A & B, or A & B & C (but not B, C, B & C or A & C). Plus the ability to invert a relationship lets you move some indexes to secondary tables. My data hasn't gotten big enough to try, but I'm hoping that will help.

Vicissitude answered 9/5, 2013 at 4:35 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Original Answer

Recommended topics

Hot tags