How do I model product ratings in the database?
Asked Answered
L

3

23

What is the best approach to storing product ratings in a database? I have in mind the following two (simplified, and assuming a MySQL db) scenarios:

Create two columns in the products table to store the number and the sum of all votes respectively. Use the columns to get an average at run time or using a query.

This approach means I only need to access one table, simplifying things.

Normalize the data by creating an additional table to store the ratings.

This isolates the ratings data into a separate table, leaving the products table to furnish data on available products. Although it would require a join or a separate query for ratings.

Which approach is best, normalised or denormalised?

Longitudinal answered 23/5, 2010 at 17:29 Comment(0)
P
43

A different table for ratings is highly recommended to keep things dynamic. Don't worry about hundreds (or thousands or tens of thousands) of entries, that's all peanuts for databases.

Suggestion:

table products

  • id
  • name
  • etc

table products_ratings

  • id
  • productId
  • rating
  • date (if needed)
  • ip (if needed, e.g. to prevent double rating)
  • etc

Retrieve all ratings for product 1234:

SELECT pr.rating
FROM products_ratings pr
INNER JOIN products p
  ON pr.productId = p.id
  AND p.id = 1234

Average rating for product 1234:

SELECT AVG(pr.rating) AS rating_average -- or ROUND(AVG(pr.rating))
FROM products_ratings pr
INNER JOIN products p
  ON pr.productId = p.id
  AND p.id = 1234;

And it's just as easy to get a list of products along with their average rating:

SELECT
  p.id, p.name, p.etc,
  AVG(pr.rating) AS rating_average
FROM products p
INNER JOIN products_ratings pr
  ON pr.productId = p.id
WHERE p.id > 10 AND p.id < 20 -- or whatever
GROUP BY p.id, p.name, p.etc;
Preen answered 23/5, 2010 at 17:49 Comment(8)
Thanks Alec, your approach makes more sense. But I also have to think about how to best integrate it with a user review. So not only can a user rate a product, but also leave a comment. I guess I can convert the ratings table into a reviews table... essentially just extend it's functionality. ThanksLongitudinal
A problem just occurred to me with this approach: If I turn this into a 'reviews' table, then chances are most users will only vote, and not necessarily add a review. This will leave a lot of empty cells in a table where a review title and review text should go. Is this an issue?Longitudinal
@Mel: With the above approach, you should also create a separate reviews table and use a join, the same way as with ratings. So your typical query would fetch the product, its ratings and its reviews.Furred
@Mel: You could create another table like Tom suggested. However, adding a rating, or adding a rating and some text are very similar things. In this case I would combine them in a single table to prevent redundancy down the road, and because there's no real advantage to split those up. Empty columns don't take up space or influence speed; it's perfectly fine (as long as they have a proper function, which is the case here). It's the same as adding a 'notes' column to the product table, but not every product might have or need a note.Preen
Sir i had the same query, i want to ask should i made the rating_id and product_id as composite key, to reduce redundancy (a user can not rate singe product again and if it rate it again the the previous query get updated instead of new insertion. so how can i doitYaker
will actually using nosql databases and having the rating list as embedded data be a good idea!? which one is better for scaling ?Aenea
I see a problem here, say for example I need to list all the products and the ratings can be 0, also meaning there is no rating for a particular product. How should we handle this ?Legionnaire
how to get a weighted average with this?Tolerant
C
6

I know that my answer is not what you actually ask for, but you might want to have a chance of facilitating that new products with your system can almost never beat the old products. Say that you would get a product with 99% rating. It would be very difficult for new products to get high if you sort by products with the highest rating.

Checkered answered 23/5, 2010 at 17:35 Comment(1)
David, I circumvent that problem by taking the average rating (voteSum/voteCount). If I decided to emphasize newest products, I can sort by release date first, and then sort by rating. But generally speaking, I'm not concerned about how old a product is.Longitudinal
D
5

Do not store a record of each rating unless you absolutely need them specifically. An example of such a case could be a psychological experiment that tends to analyze specific properties of the raters themselves. So, yeah! You'd have to be just as crazy to store each rate in a separate record.

Now, coming to the solution, add two more columns to your product table: AverageRating and RateCount. What would you store in them? Well, suppose you have an already-calculated average of the two numbers: 2 and 3, which is 2.5; having a new rate of 10, you'll multiply the average (2.5) by the rate count (2 in this case). Now, you have 5. Add this result to the new rate value (10) and divide the result by 3.

Let's cover all the above in a simple formula,

(AverageRating * RateCount + NewRateValue) / (RateCount + 1)

So (2.5 * 2 + 10) / (2 + 1) = 5.

Calculate the average on the server-side (not in your database) and store the average in the AverageRating column and the rate count in the RateCount column.

Simple, right?!

Edit

This solution doesn't require storing each rating separately as long as no review, edit or delete operations are involved. Yet, for such cases; let's assume that you've got a review with a rating of 3 that the owning user would like to modify to 4. Then, the formula to recalculate the average rating would be like this,

(AverageRating * RateCount - OldRateValue + NewRateValue) / RateCount

References

https://math.stackexchange.com/a/106314

Disruption answered 19/10, 2021 at 13:17 Comment(2)
How it will work if a user updates his ratings. eg: 3 to 2?Hellen
Letting a user update their rating requires storing a separate record of that particular rating in the first place. My solution applies to scenarios that don't involve reviewing, editing or deleting.Disruption

© 2022 - 2024 — McMap. All rights reserved.