I have a website with 500k users, running on SQL Server 2008. I now want to add activity streams for users and their friends. After testing a few approaches on SQL Server, it became apparent to me that an RDBMS is not a good fit for this kind of feature: it was slow, even after I heavily denormalized my data. After looking at NoSQL alternatives, I've decided to use MongoDB, with a data structure based on the activitystrea.ms JSON specification for activity streams.

So my question is: what would be the best schema design for an activity stream in MongoDB? With this many users it will almost certainly be very write-heavy, which is why I chose MongoDB for its write performance. I've thought of three possible structures; please tell me whether they make sense or whether I should use other schema patterns.
1 - Store each activity with all friends/followers in this pattern:
    {
      _id: 'activ123',
      actor: { id: 'person1' },
      verb: 'follow',
      object: { objecttype: 'person', id: 'person2' },
      updatedon: new Date(),
      consumers: [ 'person3', 'person4', 'person5', 'person6' /* ... and so on */ ]
    }
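To make the first option concrete, here is a minimal sketch of how I imagine a feed read working against this shape. It is plain JavaScript over an in-memory array rather than a real MongoDB call; the function name `feedForConsumer`, the collection name `activities` in the comment, and the sample documents and dates are all hypothetical.

```javascript
// Hypothetical sample documents in the shape of design 1.
const activities = [
  {
    _id: 'activ123',
    actor: { id: 'person1' },
    verb: 'follow',
    object: { objecttype: 'person', id: 'person2' },
    updatedon: new Date('2012-01-01T10:00:00Z'),
    consumers: ['person3', 'person4'],
  },
  {
    _id: 'activ124',
    actor: { id: 'person2' },
    verb: 'follow',
    object: { objecttype: 'person', id: 'person5' },
    updatedon: new Date('2012-01-02T10:00:00Z'),
    consumers: ['person3'],
  },
];

// In-memory stand-in for a query that matches one value inside the
// consumers array and sorts newest first. In the mongo shell the
// equivalent would be roughly (collection name assumed):
//   db.activities.find({ consumers: personId }).sort({ updatedon: -1 })
function feedForConsumer(personId, docs) {
  return docs
    .filter((doc) => doc.consumers.includes(personId))
    .sort((a, b) => b.updatedon - a.updatedon);
}

console.log(feedForConsumer('person3', activities).map((d) => d._id));
// [ 'activ124', 'activ123' ]
```

With this shape there is one write per activity, but every read has to search the consumers array, so I assume an index on that field would be essential.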
2 - Second design, using a collection named activity_stream_fanout:
    {
      _id: 'activ_fanout_123',
      personId: 'person3',
      activities: [
        {
          _id: 'activ123',
          actor: { id: 'person1' },
          verb: 'follow',
          object: { objecttype: 'person', id: 'person2' },
          updatedon: new Date()
        },
        {
          // activity feed 2
        }
      ]
    }
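As a rough sketch of what writing to this second shape would involve, the snippet below appends one activity to every follower's document. It uses an in-memory `Map` in place of the collection; the function name `pushToFollowers` and the `_id` scheme for the per-person documents are assumptions made for illustration only.

```javascript
// In-memory stand-in for the activity_stream_fanout collection,
// keyed by personId. Each value mirrors one document of design 2.
const fanoutStore = new Map();

// Append one activity to the embedded activities array of every follower,
// creating the per-person document on first use (an upsert, in effect).
// With MongoDB itself this would be an update using the $push operator.
function pushToFollowers(activity, followerIds, store) {
  for (const personId of followerIds) {
    if (!store.has(personId)) {
      // Hypothetical _id scheme, chosen only for this example.
      store.set(personId, {
        _id: 'activ_fanout_' + personId,
        personId,
        activities: [],
      });
    }
    store.get(personId).activities.push(activity);
  }
}

const followActivity = {
  _id: 'activ123',
  actor: { id: 'person1' },
  verb: 'follow',
  object: { objecttype: 'person', id: 'person2' },
  updatedon: new Date(),
};

pushToFollowers(followActivity, ['person3', 'person4'], fanoutStore);
console.log(fanoutStore.get('person3').activities.length); // 1
```

This makes the trade-off visible: reading a feed is a single document lookup, but each activity is copied once per follower and each person's document keeps growing.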
3 - Store the activity items in one collection and the consumers in another. In activities, you might have a document like:
    {
      _id: "123",
      actor: { person: "UserABC" },
      verb: "follow",
      object: { person: "someone_else" },
      updatedOn: Date(...)
    }
And then, for followers, I would have the following "notifications" documents:
    { activityId: "123", consumer: "someguy", updatedOn: Date(...) }
    { activityId: "123", consumer: "otherguy", updatedOn: Date(...) }
    { activityId: "123", consumer: "thirdguy", updatedOn: Date(...) }
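Here is a minimal sketch of how I picture the third option working end to end: one activity document, one small notification per consumer, and a two-step read. It runs on in-memory data only; `buildNotifications`, `readFeed`, and the sample date are hypothetical names and values, not part of any MongoDB API.

```javascript
// Build one small notification document per consumer of an activity.
function buildNotifications(activity, consumerIds) {
  return consumerIds.map((consumer) => ({
    activityId: activity._id,
    consumer,
    updatedOn: activity.updatedOn,
  }));
}

// Read a feed: pick the consumer's notifications, newest first,
// then look up each referenced activity (a second query in MongoDB).
function readFeed(consumer, notifications, activitiesById) {
  return notifications
    .filter((n) => n.consumer === consumer)
    .sort((a, b) => b.updatedOn - a.updatedOn)
    .map((n) => activitiesById.get(n.activityId));
}

// Hypothetical sample data in the shape of design 3.
const activity = {
  _id: '123',
  actor: { person: 'UserABC' },
  verb: 'follow',
  object: { person: 'someone_else' },
  updatedOn: new Date('2012-01-01T10:00:00Z'),
};
const activitiesById = new Map([[activity._id, activity]]);
const notifications = buildNotifications(activity, [
  'someguy',
  'otherguy',
  'thirdguy',
]);

console.log(notifications.length); // 3
console.log(readFeed('otherguy', notifications, activitiesById)[0].verb); // follow
```

Compared with the other two options, the activity itself is stored once and the per-consumer documents stay small, at the cost of a second lookup on every read.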
Your answers are greatly appreciated.