MongoDB Many-to-Many Association
Asked Answered
A

5

164

How would you do a many-to-many association with MongoDB?

For example; let's say you have a Users table and a Roles table. Users have many roles, and roles have many users. In SQL land you would create a UserRoles table.

Users:
    Id
    Name

Roles:
    Id
    Name

UserRoles:
    UserId
    RoleId

How is same sort of relationship handled in MongoDB?

Ann answered 25/2, 2010 at 18:46 Comment(1)
See also answers to this question and this questionSqueal
P
107

Depending on your query needs you can put everything in the user document:

{name:"Joe"
,roles:["Admin","User","Engineer"]
}

To get all the Engineers, use:

db.things.find( { roles : "Engineer" } );

If you want to maintain the roles in separate documents then you can include the document's _id in the roles array instead of the name:

{name:"Joe"
,roles:["4b5783300334000000000aa9","5783300334000000000aa943","6c6793300334001000000006"]
}

and set up the roles like:

{_id:"6c6793300334001000000006"
,rolename:"Engineer"
}
Proparoxytone answered 25/2, 2010 at 19:19 Comment(6)
The latter would be better since I need to get a list of all available roles. The only bad part is I need to setup both ends of the association then. When doing the SQL way, adding a UserRole will make the User know about the Role and the Role know about the User. This way means I'll have to set the Role on the User, and the User on the Role. I guess that's fine though.Ann
Just because a database does not support sql does not mean that references are not useful tools NoSQL != NoReference see this explanation: mongodb.org/display/DOCS/Schema+DesignTout
This doesn't seem like a good idea. If you only have six roles, sure, but what if you had 20000 objects that could be linked to 20000 more objects (in a many-many relationship)? Even the MongoDB docs hint that you should avoid having mutable, huge arrays of references. docs.mongodb.org/manual/tutorial/…Quit
Obviously for many-to-many relationships with lots of objects you want to use a different solution (like the publisher/book example in the docs). In this case it works fine and would only complicate things if you create separate user-role documents.Proparoxytone
This works for most systems coz roles are usually a small set and we usually take a user and then look at his/her roles. But what if the roles are large? or what if I ask you to give me a list of users who have role == "Engineer"? Now you would have to query your entire users collection (visiting all users who dont have role Engineer as well) only to get 2 or 3 users who may have this role amongst million such users for example. A separate table or collection is much better.Berger
If i want to use the second way, How i do the query for get this: User: { name: 'Joe', roles: [ { _id: '6c6793300334001000000006', rolename: 'Engineer' }, { _id: '4b5783300334000000000aa9', rolename: 'Admin' } ] }Ferd
C
41

Instead of trying to model according to our years of experience with RDBMS's, I have found it much easier to model document-repository solutions using MongoDB, Redis, and other NoSQL data stores by optimizing for the read use cases, while being considerate of the atomic write operations that need to be supported by the write use cases.

For instance, the uses of a "Users in Roles" domain follow:

  1. Role - Create, Read, Update, Delete, List Users, Add User, Remove User, Clear All Users, Index of User or similar to support "Is User In Role" (operations like a container + its own metadata).
  2. User - Create, Read, Update, Delete (CRUD operations like a free-standing entity)

This can be modeled as the following document templates:

User: { _id: UniqueId, name: string, roles: string[] }
    Indexes: unique: [ name ]
Role: { _id: UniqueId, name: string, users: string[] }
    Indexes: unique: [ name ]

To support the high frequency uses, such as Role-related features from the User entity, User.Roles is intentionally denormalized, stored on the User as well as Role.Users having duplicate storage.

If it is not readily apparent in the text, but this is the type of thinking that is encouraged when using document repositories.

I hope that this helps bridge the gap with regard to the read side of the operations.

For the write side, what is encouraged is to model according to atomic writes. For instance, if the document structures require acquiring a lock, updating one document, then another, and possibly more documents, then releasing the lock, likely the model has failed. Just because we can build distributed locks doesn't mean that we are supposed to use them.

For the case of the User in Roles model, the operations that stretch our atomic write avoidance of locks is adding or removing a User from a Role. In either case, a successful operation results in both a single User and a single Role document being updated. If something fails, it is easy to perform cleanup. This is the one reason the Unit of Work pattern comes up quite a lot where document repositories are used.

The operation that really stretches our atomic write avoidance of locks is clearing a Role, which would result in many User updates to remove the Role.name from the User.roles array. This operation of clear then is generally discouraged, but if needed can be implemented by ordering the operations:

  1. Get the list of user names from Role.users.
  2. Iterate the user names from step 1, remove the role name from User.roles.
  3. Clear the Role.users.

In the case of an issue, which is most likely to occur within step 2, a rollback is easy as the same set of user names from step 1 can be used to recover or continue.

Chablis answered 11/9, 2013 at 17:10 Comment(0)
P
21

I've just stumbled upon this question and, although it's an old one, I thought it would be useful to add a couple of possibilities not mentioned in the answers given. Also, things have moved on a bit in the last few years, so it is worth emphasising that SQL and NoSQL are moving closer to each other.

One of the commenters brought up the wise cautionary attitude that “if data is relational, use relational”. However, that comment only makes sense in the relational world, where schemas always come before the application.

RELATIONAL WORLD: Structure data > Write application to get it
NOSQL WORLD: Design application > Structure data accordingly

Even if data is relational, NoSQL is still an option. For example, one-to-many relationships are no problem at all and are widely covered in MongoDB docs

A 2015 SOLUTION TO A 2010 PROBLEM

Since this question was posted, there have been serious attempts at bringing noSQL closer to SQL. The team led by Yannis Papakonstantinou at the University of California (San Diego) have been working on FORWARD, an implementation of SQL++ which could soon be the solution to persistent problems like the one posted here.

At a more practical level, the release of Couchbase 4.0 has meant that, for the first time, you can do native JOINs in NoSQL. They use their own N1QL. This is an example of a JOIN from their tutorials:

SELECT usr.personal_details, orders 
        FROM users_with_orders usr 
            USE KEYS "Elinor_33313792" 
                JOIN orders_with_users orders 
                    ON KEYS ARRAY s.order_id FOR s IN usr.shipped_order_history END

N1QL allows for most if not all SQL operations including aggregration, filtering, etc.

THE NOT-SO-NEW HYBRID SOLUTION

If MongoDB is still the only option, then I'd like to go back to my point that the application should take precedence over the structure of data. None of the answers mention hybrid embedding, whereby most queried data is embedded in the document/object, and references are kept for a minority of cases.

Example: can information (other than role name) wait? could bootstrapping the application be faster by not requesting anything that the user doesn't need yet?

This could be the case if user logs in and s/he needs to see all the options for all the roles s/he belongs to. However, the user is an “Engineer” and options for this role are rarely used. This means the application only needs to show the options for an engineer in case s/he wants to click on them.

This can be achieved with a document which tells the application at the start (1) which roles the user belongs to and (2) where to get information about an event linked to a particular role.

   {_id: ObjectID(),
    roles: [[“Engineer”, “ObjectId()”],
            [“Administrator”, “ObjectId()”]]
   }

Or, even better, index the role.name field in the roles collection, and you may not need to embed ObjectID() either.

Another example: is information about ALL the roles requested ALL the time?

It could also be the case that the user logs in to the dashboard and 90% of the time performs tasks linked to the “Engineer” role. Hybrid embedding could be done for that particular role in full and keep references for the rest only.

{_id: ObjectID(),
  roles: [{name: “Engineer”, 
           property1: value1,
           property2: value2
          },   
          [“Administrator”, “ObjectId()”]
         ]
}

Being schemaless is not just a characteristic of NoSQL, it could be an advantage in this case. It's perfectly valid to nest different types of objects in the “Roles” property of an user object.

Phyfe answered 4/12, 2015 at 22:2 Comment(0)
C
4

There are two approaches can be used:

1st approach

Add reference link into user document roles list (array):

{
  '_id': ObjectId('312xczc324vdfd4353ds4r32')
  user:faizanfareed,
  roles : [
           {'roleName':'admin', # remove this because when we will be updating some roles name we also need to be update in each user document. If not then ignore this.
             roleId: ObjectID('casd324vfdg65765745435v')
          },
            {'roleName':'engineer',
           roleId: ObjectID('casd324vfdvxcv7454rtr35vvvvbre')
           },
          ]
}

And (Base on requirements for queries) we can also add user reference id into role document users list (array):

{
  roleName:admin,
  users : [{userId: ObjectId('312xczc324vdfd4353ds4r32')}, .......]
}

But adding users id into role document size will be exceeded 16MB which is not good at all. We can use this approach if role document size not exceeded and size of users is bounded. If not required we can add roles id into user docs only.


2nd approach which is traditional

Create new collection in which each document contains id's of both user and role.

{
  '_id': ObjectId('mnvctcyu8678hjygtuyoe')
  userId: ObjectId('312xczc324vdfd4353ds4r32')
  roleId: ObjectID('casd324vfdg65765745435v')
            
}

Document size will not be exceeded but read operation is not easy in this approach.


Base on requirements go with 1st or 2nd approach.

Final comments on this : Go with 1st approach and add only roleId into user document array because no of roles will not be greater-than users. User document size will not be exceeded 16MB.

Crucible answered 20/12, 2020 at 7:6 Comment(0)
C
3

in case when employee and company is entity-object try to use following schema:

employee{
   //put your contract to employee
   contracts:{ item1, item2, item3,...}
}

company{
   //and duplicate it in company
   contracts:{ item1, item2, item3,...}
}
Camise answered 17/3, 2011 at 8:27 Comment(1)
this will have read performance but the updates should be atomic which requires some locks or sth similar , right?Eldenelder

© 2022 - 2024 — McMap. All rights reserved.