Sorting A List Of Songs By Popularity
Asked Answered
M

2

9

For student council this year, I'm on the "songs" committee, we pick the songs. Unfortunately, the kids at the dances always end up hating some of the stupid song choices. I thought I could make it different this year. Last thursday, I created a simple PHP application so kids could submit songs into the database, supplying a song name, artist, and genre (from a drop-down). I also implemented a voting feature similar to Reddit's. Click an upvote button, you've upvoted the song, incremented the upvote count. Same with downvotes.

Anywho, in the database, I have three tidbits of information I thought I could use to rate these songs, upvotes, downvotes, and a timestamp. For a while, the rank was created by simply having the songs with the higher "vote" count at the top. That is, the more upvotes, less downvotes (upvotes - downvotes) would be at the top of the list. That worked, for a while, but there were about 75 songs on the list by Sunday, and the songs that were submitted first were simply at the top of the list.

Sunday, I changed the rank algorithm to (upvotes - downvotes) / (CurrentTimestamp - CreationTimestamp), that is, the higher the vote count in the lesser amount of time, the higher the song would be on the list. This works, better, but still not how i'd like it.

What happens now, is that the instant a song is created and upvoted to a vote count of 1, it ends up at the top of the list somewhere. Songs who have vote counts in the negatives aren't viewed often because kids don't usually scroll to the bottom.

I guess I could sort the data so the lower songs appear at the top, so people are forced to see the lower songs. Honestly, I've never had to work on a "popularity" algorithm before, so, what are your thoughts?

Website's at http://www.songs.taphappysoftware.com - I don't know if I should put this here or not, might cause some unwanted songs at the dance :0

Mislike answered 14/9, 2010 at 1:16 Comment(4)
I think, the algorithm is fine. The problem is the UI. There are too many unnecessary information about a song. Why don't just put the rank then the song with a + and - sign beside it. present them in a tile view (flikr). With that, you show more songs without scrolling. You can also make the title and the instruction box a lot smaller and just give them a more attention-catching color to compensate for its size.Unijugate
I stand corrected. The ranking algo by @David Johnstone is better.Unijugate
Don't trust an algorithm with this. Use your own taste (you are on the songs committee after all). Or hire a DJ. Enough computer-generated playlists on the radio already.Clovis
Well, the songs committee has about 8 kids on it, to represent a high-school of about 1600 kids. I'm posting on Stack-Overflow, I'm obviously not a great dancer, and my music tastes aren't probably representative of the entire school. - We're putting ads up around the school for the website, and we're running a video announcement the next two Fridays. The website is only to get a good idea of what people want, that way the committee isn't completely shooting blind.Mislike
S
6

That's a very good question. There are a few similar questions that have been asked here.

This article is probably a good place to start. Apparently upvotes minus downvotes is a bad way to do it. The better way is to use complicated maths to assign a score to each and sort by that.

Here is a scoring function in Ruby from the article:

require 'statistics2'

def ci_lower_bound(pos, n, power)
    if n == 0
        return 0
    end
    z = Statistics2.pnormaldist(1-power/2)
    phat = 1.0*pos/n
    (phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)
end

pos is the number of positive rating, n is the total number of ratings, and power refers to the statistical power: pick 0.10 to have a 95% chance that your lower bound is correct, 0.05 to have a 97.5% chance, etc.

As a usability thing, I would sort the data by the score, but I would not show the score to the user. I would only show the number of upvotes and downvotes.

Spellman answered 14/9, 2010 at 1:23 Comment(22)
Yeah, the scores come out to be some crazy decimals, currently. I only show votes, and then if you hover over a table row the (+/-) count un-hides.Mislike
@Matt Egan: I wouldn't show the upvotes minus downvotes at all as it is doesn't help anything. Also, your rankings don't look quite right at the moment, as "Hey Leonardo" (+1/-8) is ranked above a few +1/-2 songs.Spellman
I haven't implemented this popularity rating yet. I'm about to lie down into bed, I'm going to look over some of my Statistics notes from Sophomore year so I can actually comprehend what I'm doing here. I like to understand what I'm doing. - Also, now that you point that out, I don't think this whole "time" rating is working too well either :/Mislike
What should I/he use for power? I'm not exactly sure what it means.Zandra
Using time to boost the ranking of new songs is not a bad idea (although you'd want to think carefully about what sort of mathematical function you would use) — you could make a useful "Sort by what's hot" using this in conjunction with the actual score (calculated using the lower bound of a confidence interval, as described in the article). I would love to see how this turns out.Spellman
@Justin L.: The short answer is to choose a desired confidence and use power = (100% - confidence) * 2, i.e., power = (100% - 95%) * 2 = 0.10. Understand that the equation is calculating the lower bound of what the true score of the item/song is. The long answer is to read through the Wikipedia page and related pages, but I can't help too much because statistics is not my forté.Spellman
Actually, if you want a "hotness" ranking, this is worth a read: evanmiller.org/rank-hotness-with-newtons-law-of-cooling.htmlSpellman
Yeah, I'm seriously wishing I had remembered my statistics unit from my mathematics class last year. I'm about to go talk to my teacher from last year between class changes, she might find it interesting as well. - Don't worry I'm working on it, I won't let you guys down :DMislike
Ah, I'm reading this article about Newton's Law Of Cooling, and I remember covering this next year. This seems like it would be nice to implement, and a little less intensive than Wilson's Score interval. - One question on the Newton's cooling, if the process of upvoting increases a song's "temperature" should downvoting decrease the temperature (I'm thinking by a smaller amount, e.g. weigh upvotes more)?Mislike
@Matt Egan: The exponential decay "hotness" algorithm is certainly simpler than the Wilson's score, although they do different things. In the end, you will still need to be able to determine how good each song is (to be fair, you don't need a computer to work this out). Implementing it shouldn't be too hard, especially if you have a stats package like Ruby has in the example. Re downvoting and temperature: I was thinking about that myself, and I think they should both increase "hotness" (maybe upvotes +10, downvotes +5?)Spellman
What's your reasoning behind having downvotes increase hotness?Mislike
@Matt Egan: I was thinking "hotness" should be a measure of how actively people are interacting with it, and not be a measure of how good it is. But it depends on how you want it to behave. For example, what do you want to happen if heaps of people all vote up? Or heaps of people all vote down? Or vote half-half? Or 75% up? Or 75% down? Your call :-) Either way, you can implement it now, and tweak the numbers as you see fit.Spellman
The only problem with the hotness, is that from the start, I wasn't recording temperature, I'm trying to come up with something else... Any ideas?Mislike
@Matt Egan: That's a good point. If you haven't been logging when each vote was made since the start then you would probably have to fudge starting temperatures — if you want to go down that path. Anyway, I think that using the Wilson's score would be best. At least this gives the best possible ranking by popularity. Protip: you only need to understand the inner workings of things well enough to implement them :-)Spellman
Thanks David! I'm still working on it, I'm just a person who likes to understand how things work. - This is also SUPER interesting/exciting, I've never had such a dataset to work with on such a "real" population.Mislike
@Matt Egan: By the way, here's a PHP implementation (including an implementation of Ruby's Statistics2.pnormaldist): derivante.com/2009/09/01/php-content-rating-confidenceSpellman
Thanks, I figured I didn't need am implementation of the pnormaldist, as I could choose a confidence interval and calculate the normal distribution using my ole trusty TI-84. - One more thing I just realized, the Wilson score doesn't factor time into this equation...Mislike
And, as a bonus, that article also contains code for how you can boost new items (based solely on age), which sounds exactly what you'd want.Spellman
Ah, looks like this link has a section about that precise thing :PMislike
Well, I got it working, for your viewing at : songs.taphappysoftware.com/index.php/songs/diagnostics - I'll have it working on the main table in a moment.Mislike
@Matt Egan: "Rank is determined by a complicated process involving confidence intervals and gravity over time." I like it! May I suggest not displaying the net number of votes, and always display the number of upvotes and downvotes? The net number of votes isn't a meaningful metric.Spellman
Done, done and done-er. Thanks for all your help David, it's be tremendous! Although, I am thinking about going back to a more understandable positive votes over time, except weighing positive votes more than downvotes... - We'll see :PMislike
E
0

How about sorting songs by posting time or number of votes (negative + positive)? If your goal is to give every song equal attention, this sounds good enough.

Extrorse answered 14/9, 2010 at 1:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.