Twitter-like app using MongoDB

I'm making an app that uses the classic "follow" mechanism (the one used by Twitter and a lot of other apps around the web). I'm using MongoDB. My system has one difference, though: a user can follow groups of users. That means that if you follow a group, you automatically follow all the users who are members of that group. Of course, users can belong to more than one group.

This is what I came up with:

  • when user A follows user B, id of user B gets added to an embedded array (called following) in user A's document
  • for unfollowing, I remove the id of the followed user from the following array
  • groups work in the same way: when user A follows group X, id of group X gets added to the following array. (I actually add a DBRef so I know if the connection is to a user or a group.)

  • when I have to check if user A follows group X, I just search for the group's id in user A's following array.

  • when I have to check if user A follows user B, things get a little trickier. Each user's document has an embedded array listing all the groups the user belongs to. So I use an $or condition to check if user A is either following user B directly or via a group. Like this:

    db.users.find({'$or':[{'following.ref.$id':$user_id,'following.ref.$ref':'users'},{'following.ref.$id':{'$in':$group_ids},'following.ref.$ref':'groups'}]})
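To make the intended check concrete, here is a minimal in-memory sketch of the same logic in plain JavaScript. The document shapes (`following` as a list of `{ ref, id }` pairs standing in for DBRefs, `groups` as a list of group ids) and the function name `follows` are assumptions made for illustration, not the actual schema.

```javascript
// Hypothetical in-memory version of the "does A follow B?" check.
// Each `following` entry mimics a DBRef: { ref: "users" | "groups", id: ... }.
function follows(userA, userB) {
  return userA.following.some(function (link) {
    if (link.ref === "users") {
      return link.id === userB._id;                 // direct follow
    }
    if (link.ref === "groups") {
      return userB.groups.indexOf(link.id) !== -1;  // follow via a group
    }
    return false;
  });
}

// Illustrative data only.
var userA = { _id: "userA", groups: [], following: [
  { ref: "users", id: "userB" },
  { ref: "groups", id: "groupX" }
] };
var userB = { _id: "userB", groups: [] };
var userD = { _id: "userD", groups: ["groupX", "groupY"] };
var userE = { _id: "userE", groups: ["groupY"] };

console.log(follows(userA, userB), follows(userA, userD), follows(userA, userE));
```

Note that the sketch tests the id and the collection name on the same array entry. In a MongoDB query, conditions on dotted array fields may be satisfied by different array elements, so `$elemMatch` is probably what you want if both conditions must hold for one entry.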

This works fine, but I think I have a few issues. For example how do I show a list of followers for a particular user, including pagination? I can't use skip() and limit() on an embedded document.

I could change the design and use a userfollow collection, which would do the same job as the embedded following array. The problem with this approach, which I tried, is that with the $or condition I used earlier, users following two groups containing the same user would be listed twice. To avoid this I could use group or MapReduce, which I actually did and it works, but I'd love to avoid this to keep things simpler. Maybe I just need to think outside the box. Or maybe I took the wrong approach with both tries. Has anyone already had to do a similar thing and come up with a better solution?

(This is actually a follow-up to this older question of mine. I decided to post a new question to explain my new situation better; I hope it's not a problem.)

Madness answered 28/10, 2010 at 12:29 Comment(2)
My vote is on using map to write the follower list out to a temporary collection. - Underplay
I heard Map/Reduce can be slow, so I can't do it on every page load. That means follower lists wouldn't be up-to-date, so I'd prefer to avoid this solution... - Madness

A user can follow another user in two ways: directly, or indirectly through a group, in which case the user directly follows the group. Let's begin with storing these direct relations between users and groups:

{
  _id: "userA",
  followingUsers: [ "userB", "userC" ],
  followingGroups: [ "groupX", "groupY" ]
}

Now, you'll want to be able to quickly find out which users user A is following, either directly or indirectly. To achieve this, you can denormalize the groups that user A is following. Let's say that group X and Y are defined as follows:

{
  _id: "groupX",
  members: [ "userC", "userD" ]
},
{
  _id: "groupY",
  members: [ "userD", "userE" ]
}

Based on these groups, and the direct relations user A has, you can generate subscriptions between users. The origin(s) of a subscription are stored with each subscription. For the example data the subscriptions would look like this:

// abusing exclamation mark to indicate a direct relation
{ ownerId: "userA", userId: "userB", origins: [ "!" ] },
{ ownerId: "userA", userId: "userC", origins: [ "!", "groupX" ] },
{ ownerId: "userA", userId: "userD", origins: [ "groupX", "groupY" ] },
{ ownerId: "userA", userId: "userE", origins: [ "groupY" ] }
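Because each followed user appears only once per owner, lists built from these documents contain no duplicates and can be paged in either direction (who a user follows, or who follows a user). The following is a minimal in-memory sketch of that idea; the `page` helper and its parameters are hypothetical and only mimic a filtered, sorted, skipped and limited query.

```javascript
// In-memory stand-in for the subscriptions (example data from above).
var subscriptions = [
  { ownerId: "userA", userId: "userB", origins: [ "!" ] },
  { ownerId: "userA", userId: "userC", origins: [ "!", "groupX" ] },
  { ownerId: "userA", userId: "userD", origins: [ "groupX", "groupY" ] },
  { ownerId: "userA", userId: "userE", origins: [ "groupY" ] }
];

// Hypothetical helper: one page of the documents whose fields equal those in
// `filter`, ordered by `sortField`. The input array is not modified.
function page(docs, filter, sortField, skip, limit) {
  return docs
    .filter(function (doc) {
      return Object.keys(filter).every(function (field) {
        return doc[field] === filter[field];
      });
    })
    .sort(function (a, b) {
      if (a[sortField] < b[sortField]) return -1;
      if (a[sortField] > b[sortField]) return 1;
      return 0;
    })
    .slice(skip, skip + limit);
}

// Users that userA follows: second page, two entries per page.
var followingPage2 = page(subscriptions, { ownerId: "userA" }, "userId", 2, 2);

// Followers of userD: first page, up to ten entries.
var followersOfD = page(subscriptions, { userId: "userD" }, "ownerId", 0, 10);

console.log(followingPage2, followersOfD);
```

If the subscriptions are stored as separate documents in their own collection, the rough shell equivalent of the first call would be `db.subscriptions.find({ ownerId: "userA" }).sort({ userId: 1 }).skip(2).limit(2)`.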

You can generate these subscriptions pretty easily, using a map-reduce-finalize call for an individual user. If a group is updated, you only have to re-run the map-reduce for all users that are following the group and the subscriptions will be up-to-date again.
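Before looking at the map-reduce version, it may help to see the derivation itself in plain JavaScript. This is only a sketch of what has to be computed for one user; the helper name `buildSubscriptions` and the `groupsById` lookup object are assumptions for illustration.

```javascript
// Hypothetical helper: derives the subscription documents for one user from
// the user's direct relations and the member lists of the followed groups.
function buildSubscriptions(user, groupsById) {
  var originsByUser = {};  // followed userId -> list of origins

  function add(userId, origin) {
    if (!originsByUser[userId]) originsByUser[userId] = [];
    originsByUser[userId].push(origin);
  }

  // Direct relations are marked with "!".
  user.followingUsers.forEach(function (userId) {
    add(userId, "!");
  });

  // Indirect relations carry the id of the group they come from.
  user.followingGroups.forEach(function (groupId) {
    groupsById[groupId].members.forEach(function (userId) {
      add(userId, groupId);
    });
  });

  return Object.keys(originsByUser).map(function (userId) {
    return { ownerId: user._id, userId: userId, origins: originsByUser[userId] };
  });
}

// Example data from above.
var userA = {
  _id: "userA",
  followingUsers: [ "userB", "userC" ],
  followingGroups: [ "groupX", "groupY" ]
};
var groupsById = {
  groupX: { _id: "groupX", members: [ "userC", "userD" ] },
  groupY: { _id: "groupY", members: [ "userD", "userE" ] }
};

var result = buildSubscriptions(userA, groupsById);
console.log(JSON.stringify(result, null, 2));
```

For the example data this yields the four subscriptions listed above, with user C and user D each carrying two origins.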

Map-reduce

The following map-reduce functions will generate the subscriptions for a single user.

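map = function () {
  // Keep the follower's id in a local so the nested callbacks can use it.
  var ownerId = this._id;

  // Direct relations: "!" marks a user that is followed directly.
  this.followingUsers.forEach(function (userId) {
    emit({ ownerId: ownerId, userId: userId }, { origins: [ "!" ] });
  });

  // Indirect relations: one emit per member of each followed group.
  this.followingGroups.forEach(function (groupId) {
    var group = db.groups.findOne({ _id: groupId });

    group.members.forEach(function (userId) {
      emit({ ownerId: ownerId, userId: userId }, { origins: [ group._id ] });
    });
  });
}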

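reduce = function (key, values) {
  // Merge the origins of all values emitted for this owner/user pair.
  var origins = [];

  values.forEach(function (value) {
    origins = origins.concat(value.origins);
  });

  // Return the same shape that map emits, so the result can be reduced again.
  return { origins: origins };
}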

finalize = function (key, value) {
  // Upsert (third argument) the subscription identified by ownerId and userId.
  db.subscriptions.update(key, { $set: { origins: value.origins }}, true);
  return value;
}

You can then run the map-reduce for a single user, by specifying a query, in this case for userA.

db.users.mapReduce(map, reduce, { finalize: finalize, query: { _id: "userA" }})

A few notes:

  • You should delete the previous subscriptions of a user, before running map-reduce for that user.
  • If you update a group, you should run map-reduce for all the users that follow the group.
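The two notes above can be sketched in memory as follows. Both helper names are hypothetical, and `userF` and `userG` are made-up users added only to make the example non-trivial. Against a real database, the corresponding steps would be roughly `db.subscriptions.remove({ ownerId: "userA" })` followed by the map-reduce call, and `db.users.find({ followingGroups: "groupX" })` to find the users to refresh.

```javascript
// Hypothetical helper: ids of the users whose subscriptions must be rebuilt
// after the given group changes, i.e. the users that follow that group.
function usersToRefresh(users, groupId) {
  return users
    .filter(function (user) {
      return user.followingGroups.indexOf(groupId) !== -1;
    })
    .map(function (user) {
      return user._id;
    });
}

// Hypothetical helper: drops all old subscriptions of `ownerId`, then adds the
// freshly generated ones. Without the drop, stale entries would survive.
function replaceSubscriptions(subscriptions, ownerId, fresh) {
  return subscriptions
    .filter(function (sub) {
      return sub.ownerId !== ownerId;
    })
    .concat(fresh);
}

var users = [
  { _id: "userA", followingGroups: [ "groupX", "groupY" ] },
  { _id: "userF", followingGroups: [ "groupY" ] },
  { _id: "userG", followingGroups: [] }
];

var stored = [
  { ownerId: "userA", userId: "userB", origins: [ "!" ] },
  { ownerId: "userF", userId: "userD", origins: [ "groupY" ] }
];

// Suppose userA unfollowed userB and now follows only userC directly.
var fresh = [ { ownerId: "userA", userId: "userC", origins: [ "!" ] } ];
var updated = replaceSubscriptions(stored, "userA", fresh);

console.log(usersToRefresh(users, "groupY"), updated);
```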

I should note that these map-reduce functions turned out to be more complex than what I had in mind, because MongoDB doesn't support arrays as return values of reduce functions. In theory, the functions could be much simpler, but wouldn't be compatible with MongoDB. However, this more complex solution can be used to map-reduce the entire users collection in a single call, if you ever have to.
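The reason for wrapping the array in an object is easier to see with a small experiment. MongoDB may call reduce more than once for the same key and feed an earlier result back in as one of the values, so the return value has to have the same shape as the emitted values. The sketch below runs the reduce function from this answer outside the database; `groupZ` is a made-up third group used only for illustration.

```javascript
// Same logic as the reduce function above: merges the origins of all values.
function reduce(key, values) {
  var origins = [];

  values.forEach(function (value) {
    origins = origins.concat(value.origins);
  });

  return { origins: origins };
}

var key = { ownerId: "userA", userId: "userD" };
var a = { origins: [ "groupX" ] };
var b = { origins: [ "groupY" ] };
var c = { origins: [ "groupZ" ] };  // hypothetical extra origin

// Reducing everything at once...
var allAtOnce = reduce(key, [ a, b, c ]);

// ...must give the same result as reducing a partial result again.
var inTwoSteps = reduce(key, [ reduce(key, [ a, b ]), c ]);

console.log(JSON.stringify(allAtOnce), JSON.stringify(inTwoSteps));
```

Because the output is again an object with an `origins` array, a partial result can be passed through reduce a second time without nesting or losing origins.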

Rhoads answered 28/10, 2010 at 14:25 Comment(15)
This sounds like a good solution, thanks. The pagination problem is still there though: I can't use skip()/limit() with embedded documents. Basically, as I said in the question, I need to list all the stuff a user is following (pretty much like Twitter does). - Madness
@Brainfeeder: You could store each subscription as a document in a separate collection to get around the skip/limit limitation. Then "userA" would be the ownerId for each of the subscriptions I mentioned, e.g. { ownerId: "userA", userId: "userB", origins: [ "!" ] }. - Rhoads
Exactly what I was thinking. Thanks a lot! - Madness
@Niels van der Rest: One thing: when should I run the Map/Reduce? Every time user A follows a user/group? Sorry, I'm still a bit confused about this as I'm new to NoSQL. - Madness
@Niels van der Rest: sorry to bother you, but what do you think about this related question I wrote a while ago? #3838966 Thanks. - Madness
Also, one more thing. They say Map/Reduce shouldn't be used in realtime, because of its slowness. Could this be a problem for my app? Especially in the future, when the number of users, followers and groups will hopefully be pretty high? - Madness
@Brainfeeder: That's only the case for map-reduce on entire collections. But your map-reduce will only target a single user at a time. You're not map-reducing the entire users collection, but only a single document, so it shouldn't be slow. I'll update my answer with an example and have a look at your other question. - Rhoads
@Brainfeeder: I've updated my answer, although it turned out to be a little more complex than what I had in mind. Regarding your other question: I agree with the posted answers. I've thought about that question before, but couldn't see any other efficient solution. In this question, map-reduce solves the need to differentiate between users and groups. But in the other question there is no such complexity, so using map-reduce wouldn't add any value :) - Rhoads
Thanks for the massive answer, it all makes much more sense now. I was actually trying to write the MapReduce myself but I didn't even think I could use the finalize function to update the subscriptions collection, so... Anyway, I'm curious about why the functions are more complex than you initially thought. I'm just wondering how they could be simpler if MongoDB supported arrays as a return value? Sorry if I'm feeling kind of devoid of thoughts on this at the moment. - Madness
@Brainfeeder: You're welcome. In theory, the map function should just emit the _id of the followed user as the key, and the origin (direct or group) as the value. MongoDB then merges all these origins into a single array for each key, which is exactly what we need. So for user C the merged array would be [ "!", "groupX" ]. But you aren't allowed to simply return this array from the reduce function; you can only return a single value or object. But this returned value is then added to the array as well, which makes the final result incorrect. - Rhoads
@Brainfeeder: To work around this, I had to introduce an object to hold the origins array and merge the values myself in the reduce function, using the concat() function. - Rhoads
@Niels van der Rest: Finally that makes sense. I'm glad I asked my last question, I learned quite a lot thinking about it. - Madness
@Niels van der Rest: I just realized I also need to list all the groups a user is following. What do you think about storing relations between users and groups in another collection (thus duplicating the followingGroups embedded array)? - Madness
@Brainfeeder: I can't think of a good reason why you want to save the group relations in a separate collection just yet. A query on groups based on the IDs in the followingGroups array should be fast enough, in case you need additional group info, such as a name or description. If you see that this query is forming a bottleneck, then you should think about map-reducing it into a separate collection, or duplicate the required group properties directly to the followingGroups array. - Rhoads
@Niels van der Rest: Yeah, I was a bit concerned about performance, but after all it's not like this structure is set in stone. I can always adapt it in the future if it turns out to be slow with a lot of users. So I'll go with what you suggested. Thanks again. - Madness
