The correct way to store a document reference in a one-to-one relationship in MongoDB

I have two MongoDB collections, user and customer, which are in a one-to-one relationship. I'm new to MongoDB, and I'm trying to insert documents manually even though I have Mongoose installed. I'm not sure which is the correct way to store a document reference in MongoDB.

I'm using a normalized data model, and here is a snippet of my Mongoose schema for customer:

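const mongoose = require("mongoose");
const Schema = mongoose.Schema;

const customerSchema = new Schema({
    /** Parent user object */
    user: {
        type: Schema.Types.ObjectId,
        ref: "User",
        required: true
    }
});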

user

{ 
    "_id" : ObjectId("547d5c1b1e42bd0423a75781"), 
    "name" : "john", 
    "email" : "[email protected]", 
    "phone" : "01022223333"
}

I want to reference this user document from the customer document. Which of the following is correct, (A) or (B)?

customer (A)

{ 
    "_id" : ObjectId("547d916a660729dd531f145d"), 
    "birthday" : "1983-06-28", 
    "zipcode" : "12345", 
    "address" : "1, Main Street", 
    "user" : ObjectId("547d5c1b1e42bd0423a75781")
}

customer (B)

{ 
    "_id" : ObjectId("547d916a660729dd531f145d"), 
    "birthday" : "1983-06-28", 
    "zipcode" : "12345", 
    "address" : "1, Main Street", 
    "user" : {
        "_id" : ObjectId("547d5c1b1e42bd0423a75781")
    }
}
Prink asked 2/12, 2014 at 10:43

Use variant A. As long as you don't want to denormalize any other data (like the user's name), there's no need to create a child object.

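This also avoids unexpected complexities with the index, because indexing an embedded object might not behave as you expect: an equality query on the whole subdocument only matches an exact copy of it, including field order.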

Even if you were to embed an object, _id would be an odd field name: _id is only reserved for first-class (top-level) database documents.
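To make the index point concrete, here is a small in-memory sketch. It is not MongoDB's query engine; it only models two documented rules: an equality filter on an entire embedded document matches only an exact, field-order-sensitive copy, while a dotted path such as "user._id" matches a single field. The ids are plain strings standing in for ObjectIds, and exactEqual, getPath and matches are hypothetical helpers.

```javascript
// Hypothetical in-memory model of two MongoDB matching rules (not the real matcher):
//  1. Equality on a whole embedded document requires an exact match, field order included.
//  2. A dotted path such as "user._id" compares just that one field.

// Order-sensitive equality: JSON.stringify keeps insertion order of (non-numeric) keys.
// Simplification: it ignores BSON types such as real ObjectIds or Dates.
function exactEqual(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Resolve a dotted path like "user._id" against a document.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// Match a flat filter of { path: value } pairs; every pair must match.
function matches(doc, filter) {
  return Object.entries(filter).every(([path, value]) => exactEqual(getPath(doc, path), value));
}

const userId = "547d5c1b1e42bd0423a75781"; // string stand-in for an ObjectId

const customerA = { _id: "c1", user: userId };                       // variant A
const customerB = { _id: "c2", user: { _id: userId } };              // variant B
const customerB2 = { _id: "c3", user: { _id: userId, name: "john" } }; // B plus a denormalized name

console.log(matches(customerA, { user: userId }));           // true
console.log(matches(customerB, { user: { _id: userId } }));  // true
console.log(matches(customerB2, { user: { _id: userId } })); // false: not an exact copy
console.log(matches(customerB2, { "user._id": userId }));    // true: dotted path
```

In practice this means that with variant B you would query and index "user._id" rather than "user", while variant A needs nothing beyond a plain index on user.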

Corny answered 2/12, 2014 at 10:52
Thank you. So variant B is a way of denormalizing other data of the user, right? Or is it a form of embedding an object rather than a reference? - Prink
Yes, though that's usually not required in a 1:1 scenario. But say you have a very large collection (like a newsfeed): then you might not want to fetch the names of all referenced users every time, and instead copy the user names into the newsfeed. That makes the feed eventually consistent, i.e. if a user changes their name, you need an async worker that updates the feed documents. I'd try to avoid that where possible. For instance, using an indexed $in query and joining client-side for subsets of the data often works. - Corny
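The indexed $in query plus client-side join mentioned in that comment can be sketched as follows. This is a hedged, in-memory illustration: findUsersByIds is a hypothetical stand-in for db.users.find({ _id: { $in: ids } }), the collections are plain arrays, and ids are strings rather than ObjectIds.

```javascript
// In-memory sketch of a client-side join driven by a single $in lookup.
// Assumption: findUsersByIds stands in for db.users.find({ _id: { $in: ids } }),
// which in a real app would hit MongoDB and use the _id index.

const users = [
  { _id: "u1", name: "john" },
  { _id: "u2", name: "jane" },
];

const feed = [
  { _id: "f1", user: "u1", text: "hello" },
  { _id: "f2", user: "u2", text: "hi" },
  { _id: "f3", user: "u1", text: "again" },
];

// Hypothetical stand-in for the $in query.
function findUsersByIds(ids) {
  const wanted = new Set(ids);
  return users.filter((u) => wanted.has(u._id));
}

function joinFeedWithUsers(items) {
  // Collect each referenced user id once.
  const ids = [...new Set(items.map((item) => item.user))];
  // One lookup for all referenced users instead of one per feed item.
  const byId = new Map(findUsersByIds(ids).map((u) => [u._id, u]));
  // Attach the current name at read time: nothing is copied into the feed,
  // so a name change is visible immediately and no async worker is needed.
  return items.map((item) => ({
    ...item,
    userName: byId.has(item.user) ? byId.get(item.user).name : null,
  }));
}

console.log(joinFeedWithUsers(feed).map((i) => i.userName)); // [ 'john', 'jane', 'john' ]
```

The trade-off versus copying names into the feed is one extra (indexed) query per page of results in exchange for always-consistent data.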

Remember these rules of thumb:

Embedding is better for...

  • Small subdocuments
  • Data that does not change regularly
  • When eventual consistency is acceptable
  • Documents that grow by a small amount
  • Data that you’ll often need to perform a second query to fetch
  • Fast reads

References are better for...

  • Large subdocuments
  • Volatile data
  • When immediate consistency is necessary
  • Documents that grow a large amount
  • Data that you’ll often exclude from the results
  • Fast writes

Variant A is better. You can also use populate with Mongoose.
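Mongoose's populate resolves a reference such as variant A's user field by running a separate query against the referenced model's collection and substituting the result. The snippet below is only a conceptual, in-memory sketch of that substitution for one customer, not Mongoose's implementation; usersById and populateUser are hypothetical names, and ids are strings standing in for ObjectIds.

```javascript
// Conceptual sketch of resolving a single reference, similar in spirit to
// Customer.findOne(...).populate("user") on variant A (not Mongoose's code).
// Assumption: usersById stands in for the users collection.

const usersById = new Map([
  ["547d5c1b1e42bd0423a75781", { _id: "547d5c1b1e42bd0423a75781", name: "john" }],
]);

function populateUser(customer) {
  // Variant A stores the bare id, so it is usable directly as the lookup key.
  const user = usersById.get(customer.user);
  // Like populate on a single ref, a dangling reference resolves to null.
  // A new object is returned; the stored customer keeps its plain id.
  return { ...customer, user: user ?? null };
}

const customer = {
  _id: "547d916a660729dd531f145d",
  zipcode: "12345",
  user: "547d5c1b1e42bd0423a75781",
};

console.log(populateUser(customer).user.name); // john
```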

Alita answered 2/12, 2014 at 10:52
Thanks, but my question is not about embedding vs. references. - Prink
@mnemosyn I took this list from the MongoDB book (MongoDB: The Definitive Guide), and for me it's right. - Alita
@mnemosyn It's true that the list is not complete. Sorry for my English :) - Alita

One-to-one relations

One-to-one relations are relations where each item corresponds to exactly one other item, e.g.:

  • an employee has a resume and vice versa
  • a building has a floor plan and vice versa
  • a patient has a medical history and vice versa

    
    // employee
    {
        _id: 25,
        name: 'john doe',
        resume: 30
    }

    // resume
    {
        _id: 30,
        jobs: [....],
        education: [...],
        employee: 25
    }

We can model the employee-resume relation with a collection of employees and a collection of resumes, where the employee points to the resume through a link: an ID that corresponds to an ID in the resume collection. Or, if we prefer, we can link in the other direction, with an employee key inside the resume document that points to the employee. Or we can embed: we could take the entire resume document and embed it right inside the employee document, or vice versa.

Whether to embed depends on how the application accesses the data and how frequently it does so. We need to consider:

  • frequency of access
  • the size of the items: what keeps growing and what does not. Every time we add something to a document, there is a point beyond which the document has to be moved within the collection (this applied to the MMAPv1 storage engine). A single document also cannot exceed 16MB, although that is unlikely to happen.
  • atomicity of data: at the time of writing there were no multi-document transactions in MongoDB (they were added in version 4.0), but operations on individual documents are atomic. So if we knew that we couldn't withstand any inconsistency and wanted to be able to update the employee plus the resume at once, we might decide to put them into the same document, embedding one in the other, so that we can update it all at once.
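To illustrate the atomicity point, assume the resume is embedded in the employee document under a resume field (instead of the linked ids shown earlier). Changes to both parts can then go into one update on one document, which MongoDB applies atomically. buildEmployeeUpdate is a hypothetical helper; its result is the kind of update document you would pass to db.employees.updateOne({ _id: 25 }, update).

```javascript
// Sketch: with the resume embedded in the employee document, one $set can
// change employee fields and resume fields together in a single document.
// buildEmployeeUpdate is hypothetical; it only builds the update object.

function buildEmployeeUpdate(employeeChanges, resumeChanges) {
  const $set = { ...employeeChanges };
  // Dotted paths address fields inside the embedded "resume" subdocument.
  for (const [field, value] of Object.entries(resumeChanges)) {
    $set[`resume.${field}`] = value;
  }
  return { $set };
}

const update = buildEmployeeUpdate({ name: "john doe" }, { education: ["BSc"] });

console.log(JSON.stringify(update));
// {"$set":{"name":"john doe","resume.education":["BSc"]}}
```

With two linked collections, the same change would need two separate updates, which are not atomic together unless they run inside a multi-document transaction.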
Argus answered 29/8, 2016 at 16:32

In MongoDB it's highly recommended to embed documents as much as you can, especially in your case, where you have 1-to-1 relations.

Why? You can't use atomic join operations in your queries, though that's not the main reason. The best reason is that each join operation (theoretically) needs a separate disk seek, which takes about 20 ms, whereas an embedded subdocument needs just one seek.

I believe the best schema for you uses just one id for all of your entities:

{
    _id : ObjectId("547d5c1b1e42bd0423a75781"),
    userInfo : 
    {
        "name" : "john", 
        "email" : "[email protected]", 
        "phone" : "01022223333"
    },
    customerInfo : 
    {
        "birthday" : "1983-06-28", 
        "zipcode" : "12345", 
        "address" : "1, Main Street"
    },
    staffInfo : 
    {
        ........
    }
}

Now if you just want the userInfo, you can use:

db.users.findOne({_id : ObjectId("547d5c1b1e42bd0423a75781")},{userInfo : 1}).userInfo;

It will give you just the userInfo:

{
    "name" : "john",
    "email" : "[email protected]",
    "phone" : "01022223333"
}

And if you just want the customerInfo, you can use:

db.users.findOne({_id : ObjectId("547d5c1b1e42bd0423a75781")},{customerInfo : 1}).customerInfo;

It will give you just the customerInfo:

{
    "birthday" : "1983-06-28",
    "zipcode" : "12345",
    "address" : "1, Main Street"
}

And so on.

This schema needs the fewest round trips, and you are actually using MongoDB's document-based model for the best performance you can achieve.

Carlotacarlotta answered 2/12, 2014 at 10:51
This is simply wrong. Making a user the child of a customer object makes no sense if the user is a first-class citizen in the application (which it usually is). How to model data in NoSQL databases depends on the way the data is used, the query patterns, etc. A simple rule like "always embed" is just bad advice. - Corny
I'm just looking for the correct usage in a normalized data model, as I mentioned in my question. - Prink
@mnemosyn, embedding a one-to-one relation is strongly advised, and with a proper index we'll have high-performance queries. In this case a user can be a subdocument of the customer, and the subdocument schema could be flawless. - Carlotacarlotta
This isn't the point: it creates a catastrophic object/document mapping and puts the wrong code (a kind of CustomerService) in charge of aspects of a user, i.e. it violates key principles regarding isolation and single responsibility. If it doesn't make sense from an object-oriented perspective, it's not a smart move in the DB either just because it can be done. Using two separate collections often makes a lot of sense, setting aside the fact that the OP already figured that out and that the "always embed" rule doesn't answer which WAY to embed. - Corny
Using two separate collections makes sense in tabular databases, not in NoSQL databases. Two collections bring extra (redundant) join-like operations for MongoDB, and as we know, MongoDB has no atomic join-like command. - Carlotacarlotta
While in SQL the db structure is dictated by the data structure, the idea of NoSQL is to use a structure that adjusts to the query patterns and reduces the object-relational mismatch, not to use one simple rule to kill every discussion. No atomic updates are needed here anyway. Your arguments feel like copy-paste to me; MongoDB has no join operations, so the only additional effort will be a query, which makes sense in a web environment where the user is required for authentication while the customer will be used in a much later query during the request. - Corny
No, it's not a copy-paste thing, it's NoSQL database design. Two collections need two read operations; embedding needs just one (it's a 1:1 relation). Anyway, there are a lot of db designs, and I advise @Prink to consider the embedding design (NoSQL databases need new thinking). - Carlotacarlotta
I presented only user and customer here. In fact, the user is shared among several tables/collections, i.e., I have another collection, staff, which is also in a 1:1 relationship with user. This means that customer and staff have their own functionalities, but they are the same under the heading of logged-in user. That's why I made it 1:1. - Prink
