Storing "votes" in a database
Asked Answered
L

5

14

I'm writing what will be an intranet application, and one of its features is roughly analogous to content voting - not unlike what SO, Amazon, and many other sites do.

Assuming each votable piece of content has a unique ID, and each user (they're authenticated) has a unique ID, the easiest way would seem to be to have a "votes" table...

ContentID int
UserID int
VoteValue int

But this creates one row per vote - with millions of pieces of content and tens of thousands of users, that table's gonna be huge huge huge. Is this the best way to do it? I mean, if an int takes 4 bytes, each row takes 12 bytes. If a million pieces of content get a hundred votes, that's 400MB+ in storage, yeah? Seems... like a lot :). Even if the VoteValue is a tinyint (which is probably fine) and only 1 byte, that's still a couple hundred megabytes in the table. I mean sheesh.

Is there a smarter way? Should I store this "votes" table in a separate database (ignoring potential data integrity issues) to partition it from the "main" data in terms of storage and performance?

(I do realize that in today's world 400MB ain't a ton - but it seems like a LOT just to store votes, yeah?)

Leapfrog answered 5/12, 2008 at 1:12 Comment(0)
C
7

Personally as long as you have good indexes in place, you are going about it the right way. Depending on your usage, for performance you might try to avoid hitting the votes table by storing secondary count information, but overall if you must track WHO has voted something, you need to do it in the way you have listed.

I wouldn't bother moving to another database, if you are REALLY concerned in SQL Server you could create a separate filegroup to hold it.....but most likely not necessary.

Choline answered 5/12, 2008 at 1:15 Comment(0)
M
11

Well, yes but you need to look at the bigger picture. With a million pieces of CONTENT:

(Size of Content) >> (Size of Votes) : where ">>" means "much greater."

If you have a million pieces of content then that might be a terabyte of data where as the votes are 400MB. Big deal right?

I would also add, if you are worried about scalability, check out this blog:

http://highscalability.com/

Magenmagena answered 5/12, 2008 at 1:19 Comment(0)
C
7

Personally as long as you have good indexes in place, you are going about it the right way. Depending on your usage, for performance you might try to avoid hitting the votes table by storing secondary count information, but overall if you must track WHO has voted something, you need to do it in the way you have listed.

I wouldn't bother moving to another database, if you are REALLY concerned in SQL Server you could create a separate filegroup to hold it.....but most likely not necessary.

Choline answered 5/12, 2008 at 1:15 Comment(0)
H
4

If you need to track whether a user has voted for a particular item, and if there are different values of vote (so 1 star to 5 stars, for example), then this is about as compact as it gets.

Don't forget that for sensible access speeds, you'll need to index the data (two indexes, probably - one with ContentID as the leading column, one with userID as the leading column).

You'll need to decide whether there is a reason not to store the table separately from other tables. What this means depends on the DBMS you use - with Informix, the table would be in the same database but stored in a different dbspace, and you might have the indexes stored in two other different dbspaces.

Haymow answered 5/12, 2008 at 1:19 Comment(0)
A
4

You will probably also want the ID of the author of the content in the table, for easier detection of voting abuse. (Yes, this is presumably redundant information. An alternative is regularly building a summary table to see who is voting on whom.)

For what it's worth, the perlmonks vote table looks like this:

 `vote_id` int(11) NOT NULL default '0',
 `voter_user` int(11) NOT NULL default '0',
 `voted_user` int(11) default NULL,
 `weight` int(11) NOT NULL default '0',
 `votetime` datetime NOT NULL default '0000-00-00 00:00:00',
 `ip` varchar(16) default NULL,
 PRIMARY KEY  (`vote_id`,`voter_user`),
 KEY `voter_user_idx` (`voter_user`,`votetime`),
 KEY `voted_user_idx` (`voted_user`,`votetime`)

(vote_id is the content id, ip is an IP address.)

Amos answered 5/12, 2008 at 3:25 Comment(0)
L
0

I'd say you need to figure out how these votes will be used and design specific queries for your data model first. This is not necessarily SQL model. If you're coming from SQL world, passing through official MongoDB tutorial helps to clear the mind for the start.

For example, if you only need to store and display votes for a single issue page, it may be convenient to store votes in a single string field of the issue, which will look like id1:id2:id3:. Assuming all ids are of the same length, there are some interesting properties:

  1. Count all votes for the issue:

    len(issue.votes)/len(id)

  2. Find I voted on the issue

    myid in issue.votes

  3. Find all issues you voted on:

    select issue.id from issues where issue.votes contains(myid)

  4. Find the most voted issues

    select issue.id from issues order by len(issue.votes) desc limit 10

This architecture allows to avoid expensive calculations on read in these specific cases, but updating issue.votes on vote might be more expensive than adding a row in a table. In this case 100 votes with 4 bytes per id + separator is 500 bytes string. In your proposed variant 100 votes are 800 bytes.

Disclaimer: I never implemented anything like this, it is just an idea.

Laborer answered 26/6, 2013 at 8:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.