Scoring algorithms: how to convert the number & % of "Likes" & "Dislikes" into a single score?

I have a website where users can "Like" and "Dislike" items.

So for each item, I have data such as the total number of "Likes" and the % of total votes that are "Likes".

I'd like to calculate just a single score to show to users. Using just the % wouldn't work: even though item_A might have 90% "Likes" while item_B has 80% "Likes", item_B should still rank ahead of item_A if item_B has 10,000 total votes while item_A only has 1,000 total votes.

Likewise, using just the total number of "Likes" wouldn't work: an item might have a large number of "Likes", but it shouldn't be ranked very high if its % of "Likes" is low.

What would be a good algorithm to create a single score out of the data above?

Ideally the score should be "meaningful" or "normalized" in some way. For example if I go to IMDB and I see that a movie has a score of 8/10, I'd immediately know that it is a good movie. On the other hand if I see a score of 1,370 I wouldn't necessarily know if that is good or bad.

Pulverable answered 2/12, 2010 at 2:15 Comment(2)
The algorithm you are trying to describe is not so simple to implement :) In the first stage of the project I would simply implement the simple 'percentage algorithm' and keep a close eye on the results. It is fairly simple (knowing some programming basics) to develop the algorithm accordingly. I do believe there is no uniform answer to your question (unfortunately). - Abert
en.wikipedia.org/wiki/Bayesian_average - Solleret
F
10

There are a couple of very good articles on how Reddit does this sort of ranking here, and here. In a nutshell, rank posts by the lower end of the 90% confidence interval of their scores. Entries with fewer votes have wider confidence intervals, and hence tend to rank lower than entries with more votes but the same average.
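
One common way to compute such a lower bound for like/dislike votes is the Wilson score interval. Assuming that is the kind of formula intended here, a minimal sketch might look like the following. The function name and the default z value (1.644854, roughly a two-sided 90% interval) are illustrative choices, not something stated in this answer.

```python
import math


def wilson_lower_bound(likes: int, dislikes: int, z: float = 1.644854) -> float:
    """Lower end of the Wilson score interval for the fraction of likes.

    z is the normal quantile for the chosen confidence level
    (assumed here: 1.644854, i.e. a two-sided 90% interval).
    """
    n = likes + dislikes
    if n == 0:
        # No votes at all: nothing is known, so rank at the bottom.
        return 0.0
    p = likes / n  # observed fraction of likes
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


# Few votes give a wide interval, so the lower bound sits well below the raw %.
small = wilson_lower_bound(9, 1)        # 90% likes, 10 votes
large = wilson_lower_bound(8000, 2000)  # 80% likes, 10,000 votes
perfect = wilson_lower_bound(10, 0)     # 100% likes, 10 votes
```

Under these assumptions the result is always between 0 and 1, so it can be shown as a percentage or rescaled to a 0-10 score. An item with 10 likes and 0 dislikes scores about 0.79 rather than 1.0, and the 9-of-10 item ranks below the 8,000-of-10,000 item even though its raw percentage is higher.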

Footworn answered 2/12, 2010 at 2:40 Comment(2)
But then the problem is how to calculate the confidence interval. Do you use the standard deviation of the sample, or the standard deviation of the entire set of votes, or some kind of their weighted average, or an arbitrary number? What is the confidence interval of the score of an item that has 10 "yes" votes and 0 "no" votes? - Solleret
@user434507 The formula for calculating the confidence interval is in the article. - Footworn
C
10

Bayesian rating is a good fit for what you want to do. It takes care of the "fewer votes but higher rating" issue.

Bayesian rating uses the Bayesian average, which adjusts an item's rating according to the “believability” of its votes. The more votes an item has, the closer its Bayesian rating is to its plain, unweighted rating. When there are very few votes, the Bayesian rating of an item will be closer to the average rating of all items.

Use this equation:

br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)

Legend:

avg_num_votes: The average number of votes of all items that have num_votes>0
avg_rating: The average rating across all items (again, only those that have num_votes>0)
this_num_votes: number of votes for this item
this_rating: the rating of this item

Note: avg_num_votes is used as the “magic” weight in this formula. The higher this value, the more votes it takes for an item to move its Bayesian rating away from avg_rating.
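
As a rough illustration of the equation above applied to likes and dislikes, here is a minimal sketch. It assumes that an item's rating is its fraction of likes (a value between 0 and 1); the `Item` class and the `bayesian_ratings` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    likes: int
    dislikes: int

    @property
    def num_votes(self) -> int:
        return self.likes + self.dislikes

    @property
    def rating(self) -> float:
        # Assumption: the "rating" of an item is its fraction of likes.
        return self.likes / self.num_votes if self.num_votes else 0.0


def bayesian_ratings(items: list[Item]) -> dict[str, float]:
    """Apply br = (avg_num_votes*avg_rating + n*r) / (avg_num_votes + n)."""
    voted = [it for it in items if it.num_votes > 0]
    if not voted:
        return {it.name: 0.0 for it in items}
    # Both averages are taken only over items that have at least one vote.
    avg_num_votes = sum(it.num_votes for it in voted) / len(voted)
    avg_rating = sum(it.rating for it in voted) / len(voted)
    return {
        it.name: (avg_num_votes * avg_rating + it.num_votes * it.rating)
        / (avg_num_votes + it.num_votes)
        for it in items
    }


items = [Item("A", 9, 1), Item("B", 800, 200), Item("C", 1, 9)]
scores = bayesian_ratings(items)
ranked = sorted(scores, key=scores.get, reverse=True)
```

With this sample data the ten-vote items A and C are pulled strongly toward the overall average rating, so B (80% likes from 1,000 votes) ranks above A (90% likes from 10 votes). An item with no votes simply receives avg_rating.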

You can read more here

Crine answered 2/12, 2010 at 20:41 Comment(0)
D
1

Perhaps you could use a percentage-based stat, but then color it according to volume? For example, red/orange/yellow for the highest levels of interest and blue/green/purple for the lowest, and then allow the user to sort by percentage or by color.

Downwards answered 2/12, 2010 at 2:33 Comment(0)
