I'm trying to model a voting system in MongoDB, similar to the one on Reddit. Requirements:
- Votes are connected to objects
- It must be very fast to check whether a user has voted on an object. The application needs to know whether the logged-in user has voted on each object while it loops through a list of objects rendering vote buttons.
- Most importantly, it must be able to retrieve objects ordered by their aggregate scores over a given time period (last hour, day, month, etc.) with reasonable performance.
- It should support thousands of votes per object.
I see two approaches here (correct me if I'm wrong!):
- Embed an array of vote documents in each object. I'd probably store the ObjectId of the user that voted, the vote amount, and the vote time. The voterId would be the key for each embedded vote document (so votes would effectively be a sub-document keyed by voterId rather than a plain array) to allow for a quick hash lookup.
- Keep a separate votes collection with votes that reference objects.
I've also played with the idea of embedding votes into 'buckets' grouped by hour in a separate collection.
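To make the two approaches concrete, here is a rough sketch that uses plain Python dicts in place of MongoDB documents. The field names (`votes`, `voter_id`, `object_id`, `amount`, `at`) and both helper functions are hypothetical, chosen only for illustration; this is not driver code.

```python
from datetime import datetime, timezone

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# Approach 1: votes embedded in the object, keyed by voter id.
embedded_object = {
    "_id": "obj1",
    "votes": {
        "userA": {"amount": 1, "at": now},
        "userB": {"amount": -1, "at": now},
    },
}

def has_voted_embedded(obj, user_id):
    """Membership test on the already-loaded object; no extra query needed."""
    return user_id in obj["votes"]

# Approach 2: a separate votes collection whose documents reference objects.
votes_collection = [
    {"object_id": "obj1", "voter_id": "userA", "amount": 1, "at": now},
    {"object_id": "obj1", "voter_id": "userB", "amount": -1, "at": now},
    {"object_id": "obj2", "voter_id": "userB", "amount": 1, "at": now},
]

def voted_object_ids(votes, user_id, object_ids):
    """One pass for a whole page of objects: which of them has the user voted on?"""
    wanted = set(object_ids)
    return {
        v["object_id"]
        for v in votes
        if v["voter_id"] == user_id and v["object_id"] in wanted
    }
```

With the second layout, the per-page check would presumably become one query over the votes collection for the current user and the object ids on the page, rather than one lookup per object.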
Approach No. 1 would be very fast for requirement No. 2 (checking whether a user has voted), but I don't know if requirement No. 3 (ordering by score over a time period) is even possible in this scenario.
Approach No. 2 would be slower for requirement No. 2, and I'm not sure what the performance would be like for requirement No. 3 or how it would be achieved (map-reduce?).
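Whatever mechanism ends up running it, requirement No. 3 on a separate votes collection boils down to: filter votes by time, sum the amounts per object, and sort. A minimal in-memory sketch of that logic follows; the function name `top_objects` and the field names are hypothetical, and it says nothing about how fast the equivalent MongoDB operation would be.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def top_objects(votes, now, window, limit=10):
    """Return (object_id, score) pairs for votes cast within `window` of `now`."""
    cutoff = now - window
    scores = defaultdict(int)
    for vote in votes:
        if vote["at"] >= cutoff:
            scores[vote["object_id"]] += vote["amount"]
    # Highest score first; ties broken by object id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:limit]

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_votes = [
    {"object_id": "obj1", "amount": 1, "at": now - timedelta(minutes=10)},
    {"object_id": "obj1", "amount": 1, "at": now - timedelta(hours=2)},
    {"object_id": "obj2", "amount": 1, "at": now - timedelta(minutes=5)},
    {"object_id": "obj2", "amount": 1, "at": now - timedelta(minutes=20)},
    {"object_id": "obj3", "amount": -1, "at": now - timedelta(minutes=30)},
]

last_hour = top_objects(sample_votes, now, timedelta(hours=1))
# -> [("obj2", 2), ("obj1", 1), ("obj3", -1)]
last_day = top_objects(sample_votes, now, timedelta(days=1))
# -> [("obj1", 2), ("obj2", 2), ("obj3", -1)]
```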
Basically, it seems like I need to start with a reasonably fast solution for requirement No. 3, and then make sure that requirement No. 2 is not too slow. Ideas?
Potential Solution
Use the embedded method. Add a field to each object for hourly-score, daily-score, monthly-score, etc. Add three boolean fields: recently-voted, recent-hourly, and recent-daily. Create a script that runs a map-reduce over the objects to calculate and update these fields.
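As a sketch of what such an object might look like, again with plain Python dicts standing in for MongoDB documents. Field names use underscores instead of hyphens for convenience, and `new_object` and `record_vote` are hypothetical helpers. Setting the recently-voted flag at write time is my assumption about where that flag would get switched on.

```python
from datetime import datetime, timezone

def new_object(object_id):
    """An object with embedded votes, cached scores, and the three dirty flags."""
    return {
        "_id": object_id,
        "votes": {},              # keyed by voter id
        "hourly_score": 0,
        "daily_score": 0,
        "monthly_score": 0,
        "recently_voted": False,  # set on every vote, cleared by the hourly job
        "recent_hourly": False,   # set by the hourly job, cleared by the daily job
        "recent_daily": False,    # set by the daily job, cleared by the monthly job
    }

def record_vote(obj, voter_id, amount, at):
    """Store (or overwrite) this voter's vote and mark the object for rescoring."""
    obj["votes"][voter_id] = {"amount": amount, "at": at}
    obj["recently_voted"] = True

post = new_object("obj1")
record_vote(post, "userA", 1, datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
```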
The script would be run via cron in three variations:
- 10-minute interval: Calculate hourly-score for objects with a previous hourly-score > 0 OR objects that have recently-voted = true. Set recently-voted = false after running this script. Set recent-hourly = true.
- 3-hour interval: Calculate daily-score for any objects that have recent-hourly = true. Set recent-hourly = false. Set recent-daily = true.
- 24-hour interval: Calculate monthly-score for any objects that have recent-daily = true. Set recent-daily = false.
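The three jobs above can be sketched in memory as follows. This is only the flag-cascade logic, not a map-reduce job: the function names are hypothetical, field names use underscores instead of hyphens, and treating a month as 30 days is an assumption.

```python
from datetime import datetime, timedelta, timezone

def window_score(obj, now, window):
    """Sum the embedded vote amounts cast within `window` of `now`."""
    cutoff = now - window
    return sum(v["amount"] for v in obj["votes"].values() if v["at"] >= cutoff)

def run_hourly(objects, now):
    for obj in objects:
        # A previously positive score is rechecked so it can decay back to 0.
        if obj["hourly_score"] > 0 or obj["recently_voted"]:
            obj["hourly_score"] = window_score(obj, now, timedelta(hours=1))
            obj["recently_voted"] = False
            obj["recent_hourly"] = True

def run_daily(objects, now):
    for obj in objects:
        if obj["recent_hourly"]:
            obj["daily_score"] = window_score(obj, now, timedelta(days=1))
            obj["recent_hourly"] = False
            obj["recent_daily"] = True

def run_monthly(objects, now):
    for obj in objects:
        if obj["recent_daily"]:
            # Assumption: a "month" is approximated as 30 days.
            obj["monthly_score"] = window_score(obj, now, timedelta(days=30))
            obj["recent_daily"] = False

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
obj = {
    "_id": "obj1",
    "votes": {
        "userA": {"amount": 1, "at": now - timedelta(minutes=10)},
        "userB": {"amount": 1, "at": now - timedelta(hours=5)},
    },
    "hourly_score": 0, "daily_score": 0, "monthly_score": 0,
    "recently_voted": True, "recent_hourly": False, "recent_daily": False,
}
objects = [obj]

run_hourly(objects, now)   # hourly_score becomes 1
run_daily(objects, now)    # daily_score becomes 2
run_monthly(objects, now)  # monthly_score becomes 2
```

One thing the sketch makes visible: because the hourly job only rechecks objects whose previous hourly-score is greater than 0, an object whose hourly score went negative would not be reset unless it receives a new vote.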
The idea is to minimize unnecessary processing on objects that aren't relevant to the score calculation script being run (hourly should only be run on objects that have been voted on since the last time hourly was run, or on objects that have not been voted on since then but still have a positive score that needs to be reset to 0). Another nice benefit is that the *-score values don't have to be calculated from the object's votes alone. You could also include page views, for example. Thoughts on this approach?