MongoDB limit in the embedded document

Asked 9/12, 2011 at 22:14 Answered 11/4 at 7:20

I need to build a messaging system in which a person can have a conversation with several users. For example, I start talking with user2, user3 and user4, and each of them can see the whole conversation. If the conversation is not private, any participant can add another person to it at any time.

Here is my idea. I am using MongoDB, and I want to model the dialog, rather than the individual message, as the document.

The schema is listed as follows:

{
_id : ...., // dialog Id
'private' : 0, // is the conversation private
'participants' : [1, 3, 5, 6], //people who are in the conversation
'msgs' :[
  {
   'mid' : ...// id of a message
   'pid': 1, // person who wrote a message
   'msg' : 'tafasd' //message
  },
  ....
  {
   'mid' : ...// id of a message
   'pid': 1, // person who wrote a message
   'msg' : 'tafasd' //message
  }
]
}

I can see some pros to this approach:

- in a big database it will be easy to find the messages for a particular conversation;
- it will be easy to add people to the conversation.

But here is a problem I can't solve: when a conversation becomes very long (take Skype as an example), the client does not show the whole conversation at once. It shows the most recent part and loads additional messages afterwards. In other situations skip and limit solve this, but how can I do it for an embedded array?

If this is impossible, what would you suggest?

Interject answered 9/12, 2011 at 22:14 Comment(0)

The MongoDB docs explain how to select a subrange of an array field with the $slice projection operator.

// dialogId is a placeholder for the _id of the dialog you want
db.dialogs.find({"_id": dialogId}, {msgs:{$slice: 5}}) // first 5 messages
db.dialogs.find({"_id": dialogId}, {msgs:{$slice: -5}}) // last 5 messages
db.dialogs.find({"_id": dialogId}, {msgs:{$slice: [20, 10]}}) // skip 20, limit 10
db.dialogs.find({"_id": dialogId}, {msgs:{$slice: [-20, 10]}}) // start 20 from the end, limit 10

You can use this technique to only select the messages that are relevant to your UI. However, I'm not sure that this is a good schema design. You may want to consider separating out "visible" messages from "archived" messages. It might make the querying a bit easier/faster.
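
To make the four forms concrete, here is a small sketch that imitates the projection on a plain JavaScript array. The helper name sliceProjection is made up for illustration and is not part of MongoDB; it only mirrors the documented behaviour of $slice for a single count and for a [skip, limit] pair with a positive limit.

```javascript
// Hypothetical helper: mimics what the $slice projection returns for an
// embedded array. It runs on a plain array, not against a database.
function sliceProjection(arr, spec) {
  if (typeof spec === "number") {
    // Positive n: first n elements. Negative n: last |n| elements.
    return spec >= 0 ? arr.slice(0, spec) : arr.slice(spec);
  }
  const [skip, limit] = spec;
  // A negative skip counts back from the end of the array (clamped at 0).
  const start = skip >= 0 ? skip : Math.max(arr.length + skip, 0);
  return arr.slice(start, start + limit);
}

// 30 messages numbered 1..30, oldest first.
const msgs = Array.from({ length: 30 }, (_, i) => i + 1);

console.log(sliceProjection(msgs, 5));         // messages 1 through 5
console.log(sliceProjection(msgs, -5));        // messages 26 through 30
console.log(sliceProjection(msgs, [20, 10]));  // messages 21 through 30
console.log(sliceProjection(msgs, [-20, 10])); // messages 11 through 20
```

So paging backwards through a conversation would mean asking for $slice: [-10, 10], then [-20, 10], and so on.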

Heroics answered 9/12, 2011 at 22:36 Comment(1)

No problem. If my response helped you with your problem, please mark the answer as selected. This will give me points, and make users more likely to answer your questions in the future :) – Heroics 9/12, 2011 at 23:47

There are caveats if your conversations will contain a great many messages:

Slicing the messages array gets noticeably slower as it grows, because MongoDB loads the whole array and slices it only just before returning the result to the driver.
There is a document size limit (16MB for now) that this approach could eventually hit.
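
As a rough feel for the size limit, the arithmetic can be sketched as follows. The average message size passed in is purely an assumption, and the estimate ignores the rest of the document, so treat the result as an upper bound rather than a measurement.

```javascript
// Rough upper bound on how many embedded messages fit in one document.
// avgMessageBytes is an assumed average BSON size per message sub-document.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // the 16MB limit mentioned above

function maxEmbeddedMessages(avgMessageBytes) {
  return Math.floor(MAX_DOCUMENT_BYTES / avgMessageBytes);
}

// With an assumed 200 bytes per message:
console.log(maxEmbeddedMessages(200));
```

A busy group chat can plausibly reach tens of thousands of messages, and the slicing slowdown would be felt well before the hard limit.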

My suggestions are:

Use two collections: one for conversations and the other for messages.
Use a DBRef in each message pointing to its conversation (index this field together with the message timestamp so you can select older ranges on user request).
Additionally, use a separate capped collection for every conversation. It will be easy to find by name if you build the name like "conversation_"

Result:

You will have to write every message twice, but into separate collections, which is normal.
When you want to show a conversation, you just select all the data from one collection in natural sort order, which is very fast.
Your capped collections will automatically keep the latest messages and delete old ones.
You can show older messages on user request by querying the main messages collection.
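
Selecting older ranges on user request can be sketched as a range query on the indexed pair. Everything named here is an assumption for illustration: the field names conversationId and ts, the helper olderPage, and the in-memory array standing in for the messages collection. For simplicity it uses a plain conversation id field rather than a DBRef. In the mongo shell the same page would be roughly db.messages.find({conversationId: id, ts: {$lt: before}}).sort({ts: -1}).limit(n), backed by an index on {conversationId: 1, ts: -1}.

```javascript
// Hypothetical in-memory stand-in for the messages collection.
const messages = [
  { conversationId: "c1", ts: 1, msg: "hi" },
  { conversationId: "c2", ts: 2, msg: "other dialog" },
  { conversationId: "c1", ts: 3, msg: "how are you?" },
  { conversationId: "c1", ts: 4, msg: "fine" },
  { conversationId: "c1", ts: 5, msg: "good to hear" },
];

// Returns up to pageSize messages of one conversation that are strictly
// older than beforeTs, newest first (the order a chat UI loads them in).
function olderPage(all, conversationId, beforeTs, pageSize) {
  return all
    .filter((m) => m.conversationId === conversationId && m.ts < beforeTs)
    .sort((a, b) => b.ts - a.ts)
    .slice(0, pageSize);
}

// First page: pass Infinity to start from the newest message.
const page1 = olderPage(messages, "c1", Infinity, 2);
// Next page: continue from the oldest timestamp already shown.
const page2 = olderPage(messages, "c1", page1[page1.length - 1].ts, 2);

console.log(page1.map((m) => m.ts), page2.map((m) => m.ts));
```

Paging by "older than the last timestamp shown" avoids skip, whose cost grows with the number of skipped documents.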

Julietajulietta answered 12/12, 2011 at 11:34 Comment(2)

@SalvadorDali You do not need to be afraid of a huge number of collections. Choosing the right one is very fast and there is no theoretical limit on that number. But you are right that it will be hard to support such a big number of collections. Now I'm going to suggest using one huge capped collection with an additional index on conversation. There will be two additional issues in that case: some old conversations will load without any previous messages, and it is not very good to have an index on a capped collection. – Julietajulietta 12/12, 2011 at 13:52

Maybe it will be easier to deal with a big number of collections if they are separated into another db. As for document size: it is not good even to have a bunch of huge documents of about 1MB each, because that reduces driver, replication and sharding performance. Personally, I would never store a conversation in one document. There are many possible issues: searching over messages, sharing or copying a single message, etc. – Julietajulietta 12/12, 2011 at 13:52

I think one can use the subset pattern here to avoid hitting the 16MB document size limit, as alluded to by @lig here. Here's how you would model it with the subset pattern:

dialogs collection:

{
 _id : ...., // dialog Id
 'private' : 0, // is the conversation private
 'participants' : [1, 3, 5, 6], //people who are in the conversation
 'messages' :[
  // Most recent messages embedded here (example: latest 100 messages)
  {
   'mid' : ...// id of a message
   'pid': 1, // person who wrote a message
   'msg' : 'How are you doing?' //message
  },
  ....
  {
   'mid' : ...// id of a message
   'pid': 2, // person who wrote a message
   'msg' : 'I am fine buddy!' //message
  }
 ]
}

Notice that this is identical to the original example, except that we no longer embed all the messages in the dialog document. We embed only a subset of them (the most recent) and store every message, the older ones as well as the most recent, in its own collection, the messages collection, like below:

messages collection:

{
  'mid' : ...// id of a message
  'pid': 1, // person who wrote a message
  'msg' : 'How are you doing?' //message
}

By keeping only the most frequently accessed messages directly within the dialogs documents, we reduce the working set and improve the performance. Meanwhile, the older messages peacefully reside in their own collection, ready to be fetched whenever needed.

One tradeoff of the subset pattern is that we must manage the subset ourselves, and if we need to pull in older messages or the full history, it takes additional trips to the database.
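
One way to manage the subset is to let the server trim the embedded array on every write, using $push with the $each and $slice modifiers. The sketch below only builds the update document and imitates its effect on a plain object. The helper names and the keep values are assumptions; in the shell the update would be passed to something like db.dialogs.updateOne({_id: dialogId}, update), alongside an insert of the same message into the messages collection.

```javascript
// Builds an update that appends a message and keeps only the newest `keep`.
function buildSubsetPush(message, keep = 100) {
  return { $push: { messages: { $each: [message], $slice: -keep } } };
}

// Hypothetical local imitation of that update, for illustration only.
function applySubsetPush(dialog, update) {
  const { $each, $slice } = update.$push.messages;
  const combined = dialog.messages.concat($each);
  return { ...dialog, messages: combined.slice($slice) };
}

// Push five messages while keeping only the newest three embedded.
let dialog = { _id: 1, participants: [1, 3], messages: [] };
for (let mid = 1; mid <= 5; mid++) {
  const update = buildSubsetPush({ mid, pid: 1, msg: `message ${mid}` }, 3);
  dialog = applySubsetPush(dialog, update);
}

console.log(dialog.messages.map((m) => m.mid));
```

Because the trim happens in the same update as the append, the embedded array never grows past the chosen size, at the cost of the second write to the messages collection.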

Additional resources:

I have written an article on using the Subset pattern on Medium here: https://medium.com/@desai.ashique/data-modelling-for-many-to-many-relationship-in-mongodb-48f1c80910b7

MongoDB blog on Subset Pattern: https://www.mongodb.com/blog/post/building-with-patterns-the-subset-pattern

Lenity answered 11/4 at 7:20 Comment(0)
