people · sounds · ideas · resume
search · pictures · bits · times  
some interesting words: aphorism, proboscis, jeremiad, pariah, agronomist, felicitation  
  the blog
hot or not  · ratings ·  comments 

Two Variable Voting System

Once upon a time, I was asked to create a database of songs for a website. I figured out a good method of storing those - no problem. Then came the peculiar issue of listing all the songs. What do you do when people want to list the songs in the order of how good they are?

I want to be able to know what the best songs on the system are. So I will clearly need to have a rating scale defined (0-100 for now) and a table to store these votes in. The table would look something like: songid, rating, tracking-value (user id, ip, time, whatever). On a busy site this table would grow massively large in no time.

But -- I don't want to keep track of every single vote that comes to each and every song. I want to store only two variables, the number of ratings this song has received, and the total of all the ratings.

"SONG" table from database
ID C_ID Name Length Format URL Votes Ratings
1 1 Song One 2:31 audio/mp3 http://.../songone.mp3 10 1000
2 1 Song Two 4:32 audio/mp3 http://.../songtwo.mp3 3 300
3 1 Song Three 6:12 audio/mp3 http://.../songthree.mp3 90 8800
4 1 Song Four 3:21 audio/mp3 http://.../songfour.mp3 30 1000
5 1 Song Five 2:12 audio/mp3 http://.../songfive.mp3 1030 1030
· C_ID or ContainerID points to an ID in another table which could be the data for the Album, or perhaps a Webcast - whatever.
· Song One has been voted on by 10 people. They all ranked this song at a perfect 100. So Ratings = 10*100, or 1000.
· Song Four has been voted on 30 times. Each person voted differently on this song, but in the end, the 1000 means the average voter ranked Song Four as a 33.3.
· Everyone hates Song Five. Somehow it became very popular voting item. Maybe people have a bad songs mailing list? Perhaps a few people liked it, but enough people ranked it a 0 that the average ranking fell to 1.

Now we want to order all of these songs by their ratings (best to worst). Here are some ways to do this, and the reasons why I accept or reject them:

order byReasoningResulting Order
R(atings)This is just stupid. A highly seen "popular" song - which thousands hated, made it to slot 2!! That crappy song beat out Song One which atleast 10 people heard and gave a perfect score. Popular does not mean good.

Notice how a song with alot of weak votes will take out a song with great votes - just not enough of them.
3 (8800)
5 (1030)
1 (1000)
4 (1000)
2 (300)
V(otes)This is even worse. Now the crappiest song in all of creation is on the top of our charts!5 (1030)
3 (90)
4 (30)
1 (10)
2 (3)
    Ratings    
Votes
Okay, now we are using common sense... Wrong!! That's what you get for using common sense. The big problem here is that if we have 2 songs that everyone voted a perfect 100 on, but one of the songs had 100 more votes, this ordering value won't reflect it.

So why not just do a secondary sort by number of votes? That will not help, you will have songs with 1 vote of 100 showing up before songs with 200 votes that were all 100 and one vote that was 99. This just won't work.

Don't forget, you have to ignore a song with no ratings or votes because you would get a division by zero error.
1 (100)
2 (100)
3 (97.7)
4 (33.3)
5 (1)
R · (int(  V  ) +1)
V         100

A popularity scaling average. This is much smoother, but still a bit lumpy. This method plays averaging but favors the popular. It winds up grouping all songs with a number of votes in the same 100 ballpark. It still suffers from the basic averaging problem. All songs (in the same 100 votes ballpark) that have been given the same score over and over again - wether it received 10 or 99 votes, will be treated the same.

We also have a problem in that 100 votes of 70 will beat 80 votes of 100. Also a song of 90 votes, each 88 points, will be beaten by a song with exactly 12 votes, each worth 90 points. I assume that that we would prefer to see the "more sure" rating dominate the less resolute one.

1 (100)
2 (100)
3 (97.7)
4 (33.3)
5 (11)
    Ratings    
Votes + 1

A biased average. Great!!! This eliminates the problem of two songs ever being equal (unless they really have been rated that way). A song with 100 votes that were all for 100 points will have an ordering score of 99.0099. While a song with 10 votes, each for 100 points, will have a score of 90.90.

If we really wanted to, we could take a lesson from the popularity scaling average and apply that here. But I don't care about popularity for now.

3 (96.7033~)
1 (90.90)
2 (75)
4 (32.2691~)
5 (0.9990~)

This last way (the biased average) is the best way that I have found to rank items that don't have the same number of votes. It is what I use for the photos on my site. As described above, I only keep track of two values, the total number of votes, and the total of all the votes. I have had this system reffered to as "Comparing Apples with Oranges" because the number of votes for each item is not the same. For this reason, this method of biased-averaging allows you to rank any thing which you have collected some votes for against any other collection. Be sure to scale the values of the votes out first (if one system uses 0-10 voting, and the other 0-100 - multiply the first sets values by 10).

I just wanted to put all this down so that maybe someone searching on this topic will get a little help from my efforts. This really does seam like a simple solution - but somehow it wasn't obvious to me or the people that I asked about it. It just sort of came to me one day, when I slipped off the toilet while changing a light bulb (not really).

Let me know if you have a better way of doing this. If you have found something that proves this not to be the best possible way to rank items with a different number of votes, throw it at me.

Thanks for listening.

Add a Comment

Solve the math to post!

Visitor Comments

2008 Jan 22 Marques
I couldn't find any results on google, so I thought I'd add my own: The sets are ordered by sum/(count+1)
2008 Jan 22 Marques
aka Order by Ranking aka Rank by Stars
2008 Feb 19 Marques
Another explanation can be found here: wiki.answers.com IMDB / Bayesian Ratings question. Or at the bottom of the IMDB Top 250 page.