MongoDB / NOSQL: Best approach to handling read/unread status on messages

Asked 15/11, 2010 at 1:41 Answered 31/8, 2017 at 4:12

Solved mongodb database-design non-relational-database nosql

Suppose you have a large number of users (M) and a large number of documents (N) and you want each user to be able to mark each document as read or unread (just like any email system). What's the best way to represent this in MongoDB? Or any other document database?

There are several questions on StackOverflow asking this question for relational databases but I didn't see any with recommendations for document databases:

What's the most efficient way to remember read/unread status across multiple items?

Implementing an efficient system of "unread comments" counters

Typically the answers involve a table listing everything a user has read: (i.e. tuples of user id, document id) with some possible optimizations for a cut off date allowing mark-all-as-read to wipe the database and start again knowing that anything prior to that date is 'read'.
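
The cut-off optimization from those relational answers can be sketched in plain JavaScript with no database involved. The names `newReadState`, `readCutoff`, `readIds`, `markRead`, `markAllRead`, and `isRead` are made up for illustration, and treating a document stamped exactly at the cut-off as read is an assumption.

```javascript
// Per-user read state: a cut-off timestamp plus the ids read since then.
function newReadState() {
  return { readCutoff: 0, readIds: new Set() };
}

// Record a single document as read.
function markRead(state, docId) {
  state.readIds.add(docId);
}

// Mark-all-as-read: move the cut-off forward and wipe the per-document list.
function markAllRead(state, now) {
  state.readCutoff = now;
  state.readIds.clear();
}

// A document is read if it is at or before the cut-off, or listed explicitly.
function isRead(state, doc) {
  return doc.timestamp <= state.readCutoff || state.readIds.has(doc.id);
}
```

The per-user list only grows between two mark-all-as-read actions, which is what keeps the tuple table small in that design.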

So, MongoDB / NOSQL experts, what approaches have you seen in practice to this problem and how did they perform?

Yajairayajurveda answered 15/11, 2010 at 1:41 Comment(0)

{
  _id: messagePrefs_uniqueId,
  type: 'prefs',
  timestamp: unix_timestamp,
  ownerId: recipientId,
  messageId: messageId,
  read: true // or false
}

{
  _id: message_uniqueId,
  timestamp: unix_timestamp,
  type: 'message',
  contents: 'this is the message',
  senderId: senderId,
  recipients: [recipientId1, recipientId2]
}

Say you have 3 messages you want to retrieve preferences for, you can get them via something like:

db.messages.find({
  messageId: { $in: [messageId1, messageId2, messageId3] },
  ownerId: recipientId,
  type: 'prefs'
})

If all you need is read/unread, you can combine this with MongoDB's upsert capability so that a prefs object is only created once the user actually reads a message: build the prefs object with your own unique id and upsert it into MongoDB. If you want more flexibility (tags or folders, say), you'll probably want to create a prefs object for each recipient of the message instead. For example you could add:

tags: ['inbox','tech stuff']

to the prefs object and then to get all the prefs of all the messages tagged with 'tech stuff' you'd go something like:

db.messages.find({type: 'prefs', ownerId: recipientId, tags: 'tech stuff'})

You could then use the messageIds you find within the prefs to query and find all the messages that correspond:

db.messages.find({type: 'message', _id: { $in: [/* array of messageIds from prefs */] }})
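
The upsert idea mentioned earlier in this answer could look roughly like the sketch below. It only builds the three arguments for `updateOne`, so it runs without a database; `markReadArgs` is a hypothetical helper, and the field names follow the prefs object above. Unlike the answer's suggestion, this version lets MongoDB generate the `_id`; to supply your own, one option is to add it under `$setOnInsert`.

```javascript
// Builds the arguments for collection.updateOne(filter, update, options).
// With upsert: true, the prefs object is created on first read and
// simply updated on later reads.
function markReadArgs(ownerId, messageId, now) {
  return {
    // Equality fields in the filter are copied into the document on insert.
    filter: { type: 'prefs', ownerId: ownerId, messageId: messageId },
    update: {
      $set: { read: true },
      // Only applied when the upsert actually inserts a new document.
      $setOnInsert: { timestamp: now }
    },
    options: { upsert: true }
  };
}

// Usage against a live collection (not run here):
// const a = markReadArgs(recipientId, messageId, Math.floor(Date.now() / 1000));
// db.messages.updateOne(a.filter, a.update, a.options);
```

A unique index on `ownerId` plus `messageId` would probably be worth adding so that two concurrent upserts cannot create duplicate prefs objects.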

Counting how many messages each 'tag' contains efficiently can be a little tricky. If it's only a handful of tags, you can just add .count() to the end of the query for each tag. If it's hundreds or thousands, you might do better with a server-side map/reduce script, or an object that keeps track of message counts per tag per user.
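
As an alternative to the map/reduce script suggested above (not what the answer itself proposes), the per-tag counts for one user can be expressed as an aggregation pipeline. This is a sketch under the assumption that prefs objects carry a `tags` array as shown earlier; `tagCountPipeline` is a hypothetical helper that only builds the pipeline.

```javascript
// Builds an aggregation pipeline that counts one user's messages per tag.
function tagCountPipeline(ownerId) {
  return [
    // Keep only this user's prefs objects.
    { $match: { type: 'prefs', ownerId: ownerId } },
    // Emit one document per (prefs object, tag) pair.
    { $unwind: '$tags' },
    // Count how many prefs objects carry each tag.
    { $group: { _id: '$tags', count: { $sum: 1 } } }
  ];
}

// Usage against a live collection (not run here):
// db.messages.aggregate(tagCountPipeline(recipientId))
```

Each result document would have the tag in `_id` and the number of tagged messages in `count`.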

Asphaltite answered 15/11, 2010 at 2:43 Comment(6)

Thanks, so your recommendation is essentially the same kind of 'tuple/join' table as the relational case, right? Any particular reason you store both the messages and the prefs in the same collection? – Yajairayajurveda 15/11, 2010 at 3:11

The thing with MongoDB is that usually the flatter you can make your object the better. While it can store nested structures it's not the best at querying or getting into those structures later to alter them. So a lot of stuff may end up looking similar to a relational design, but with less abstraction due to not using tables. Also there is really no reason I store them in the same collection other than not liking to have a bazillion collections. If you do plan on having millions of messages it might be wise to use different collections so that you can set up the indexes to fit each object better. – Asphaltite 15/11, 2010 at 4:46

Have to downvote this, as it completely misses the actual topic. Adding users to the document is just plain stupid: when you have 1000 or 10000 users you have 10000 ids on the document, which are read every time you read the document and deliver abysmal performance as the number of users grows. – Bogus 31/5, 2023 at 8:34

@Bogus the question merely stated you'd have a large number of users, not that each message would have a large number of recipients. If you truly have messages with a subset of thousands or hundreds of thousands of recipients from your user pool, then yeah, you'll want to reconsider your approach. That seems atypical for an email or direct messaging system. – Asphaltite 1/6, 2023 at 15:13

The OP is pretty clear "large number of users (M) and a large number of documents (N) and you want each user to be able to mark each document", emphasis on bold parts. "Each User" means "every user". Generally with large number of users you can assume at least 10000, which is 80kb only for the marked keys, no other performance considered. Second the question mentions nothing about messages or receivers, the "like any email system" was describing/comparing the read/unread feature with something well-known that has a similar feature. – Bogus 2/6, 2023 at 16:59

If they did not think the answer was on topic then they didn't need to accept it. If you think you know a better answer then post it instead of whining in the comments here. – Asphaltite 3/6, 2023 at 5:18

If you're only storing a simple boolean value, like read/unread, another method is to embed an array in each Document that contains a list of the Users who have read it.

{
  _id: 'document#42',
  ...
  read_by: ['user#83', 'user#2702']
}

You should then be able to index that field, making for fast queries for Documents-read-by-User and Users-who-read-Document.

db.documents.find({read_by: 'user#83'})

db.documents.find({_id: 'document#42'}, {read_by: 1})

However, I find that I'm usually querying for all Documents that have not been read by a particular User, and I can't think of any solution that can make use of the index in this case. I suspect it's not possible to make this fast without having both read_by and unread_by arrays, so that every User is included in every Document (or join table), but that would have a large storage cost.
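
For reference, here is a minimal sketch of the two operations discussed in this answer: marking a Document as read and querying for unread ones. The helpers `markDocumentReadArgs` and `unreadByFilter` are hypothetical and only build the query objects. `$addToSet` appends the user id only if it is not already in the array, and `$ne` on an array field matches Documents whose array does not contain the value (including Documents with no `read_by` field at all). A negation like this is generally not selective, so it probably does not escape the index problem described above.

```javascript
// Arguments for db.documents.updateOne(filter, update):
// add the user to read_by without creating duplicates.
function markDocumentReadArgs(documentId, userId) {
  return {
    filter: { _id: documentId },
    update: { $addToSet: { read_by: userId } }
  };
}

// Filter for db.documents.find(...): Documents this user has not read.
function unreadByFilter(userId) {
  return { read_by: { $ne: userId } };
}

// Usage against a live collection (not run here):
// const m = markDocumentReadArgs('document#42', 'user#83');
// db.documents.updateOne(m.filter, m.update);
// db.documents.find(unreadByFilter('user#83'));
```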

Putrescible answered 31/8, 2017 at 4:12 Comment(3)

Regarding that last point about querying for unread messages but using a read_by field, correct me if I'm wrong but couldn't a $not clause achieve this, as in $not: {$in: [{id: 'user#83'}]}? – Arrivederci 24/10, 2017 at 13:53

I would keep that counter in a separate collection. The reason is that you will only reach for it when you really need it, whereas if it were part of the message document it could be returned even when not needed. Of course, if we assume that the user takes only what he needs from the database then that approach is fine, but in my experience it's better to keep things separated. – Festa 2/7, 2018 at 7:16

Depending on your application architecture, it may be important to consider: MongoDB arrays - atomic update or push element – Attentive 1/7 at 15:49
