About YouTube view counts
A

3

6

I'm implementing an app that keeps track of how many times a post is viewed, but I'd like to track views in a 'smart' way: I don't want to increase the view counter just because a user refreshes their browser.

So I decided to increase the view counter only if the combination of IP address and user agent (browser) is unique. This is working so far.

But then I thought: if YouTube does it this way, and they have videos with thousands or even millions of views, their views table in the database would be flooded with IPs and user agents.

This brings me to the assumption that their video table has a counter cache for views (i.e. views_count). That is, when a user clicks on a video, the IP and user agent are stored, and the counter cache column in the video table is incremented.

Every time a video is clicked, YouTube would need to query the views table and count the number of entries. Wouldn't this affect performance drastically?

Is this how they do it? Or is there a better way?

Atchison answered 28/9, 2011 at 19:16 Comment(0)
P
2

First of all, AFAIK, YouTube uses BigTable, so don't worry about how they query the count; we don't know the exact structure of their database anyway.

Assuming you are on a relational model, create a view_count column, but do not update it on every refresh. Record the visits and periodically update the cached count.

Also, you can generate a hash from the IP, browser, date, and any other information you use to detect whether a view is unique, and store only that hash instead of the whole data.
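As a rough illustration, such a hash could be built with Python's hashlib. The function name visit_key, the choice of SHA-256, and the "|" separator are assumptions for this sketch; any stable hash over the same fields would serve.

```python
import hashlib
from datetime import date

def visit_key(ip, user_agent, day):
    # Join the identifying fields and hash them, so only a fixed-size
    # digest is stored instead of the raw IP and user agent.
    raw = f"{ip}|{user_agent}|{day.isoformat()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Including the date means the same visitor produces a new key each day, so a returning visitor counts again the next day while refreshes on the same day do not.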

You can also use the session or a cookie to record which videos have been viewed. Since it will expire, it won't be much of a memory problem; I don't believe anyone views thousands of videos in one session.

Proportion answered 28/9, 2011 at 20:2 Comment(5)
So you are suggesting that if I keep a record of all visits in a table in the DB, it shouldn't be a problem? Even if I have millions of rows? - Atchison
I suggest not keeping all of the records, but periodically aggregating and deleting them. - Proportion
So basically have some kind of cron job in the background to delete any view records that are older than 24 hours? - Atchison
Exactly. You can also use MySQL-backed memcached, since the increment operation in memcached is atomic, and losing a visit or two is usually not critical. - Proportion
Sessions sound like a good idea. But how would I prevent bots, crawlers, etc. from randomly increasing the count? - Atchison
F
2

I would leverage client-side browser fingerprinting to identify unique viewers. This library seems to be getting significant traction:

https://github.com/Valve/fingerprintJS

I would also recommend using Redis for anything to do with counts. Its atomic increment commands are easy to use and guarantee your counts never get messed up by race conditions.

This would be the command you would want to use for incrementing your counters:

http://redis.io/commands/incr

The key in this case would be the browser fingerprint hash sent to you from the client. You could then have a Redis "set" that would contain a list of all browser fingerprints known to be associated with a given user_id (the key for the set would be the user_id).

Finally, if you really need to, you can run a cron job or other async process that dumps the view counts for each user into the counter cache field in your relational database.
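A rough sketch of how a set plus a counter could fit together. To stay runnable without a server, it uses a small in-memory stand-in (FakeRedis, a hypothetical class) whose sadd and incr methods mimic the return values of the Redis SADD and INCR commands; it does not demonstrate Redis's atomicity. Keying the set by post_id rather than user_id, and the key names themselves, are assumptions made to match the question.

```python
class FakeRedis:
    """In-memory stand-in mimicking the return values of SADD and INCR."""

    def __init__(self):
        self._sets = {}
        self._counters = {}

    def sadd(self, key, member):
        # Like SADD with one member: 1 if newly added, 0 if already present.
        members = self._sets.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1

    def incr(self, key):
        # Like INCR: a missing key starts at 0; returns the new value.
        self._counters[key] = self._counters.get(key, 0) + 1
        return self._counters[key]

def record_view(store, post_id, fingerprint):
    # Count the view only the first time this fingerprint is seen for the post.
    if store.sadd(f"post:{post_id}:viewers", fingerprint) == 1:
        return store.incr(f"post:{post_id}:views")
    return None
```

With a real client such as redis-py, store would be a Redis connection object, whose sadd and incr methods return the same kinds of values.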

You could also take the approach where you store user_id, browser fingerprint, and timestamps in a relational database (mysql?) and counter cache them into your user table periodically (probably via cron).

Figueroa answered 5/9, 2013 at 9:4 Comment(0)
S
1

If you want to store all the IPs and browsers, then make sure you have enough DB storage space, add an index, and that's it. If not, you can use the Rails session to store the list of videos that a user has visited, and increment the view_count attribute of a video only when they visit a video that is not yet in the list.
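The session approach can be sketched in a framework-neutral way. This is Python rather than Rails; the session is modelled as a plain dict, and the names register_view, seen_video_ids, and view_counts are hypothetical.

```python
def register_view(session, view_counts, video_id):
    # The session keeps the IDs of videos this visitor has already seen.
    seen = session.setdefault("seen_video_ids", [])
    if video_id in seen:
        return False  # repeat view (e.g. a refresh): do not count it
    seen.append(video_id)
    # view_counts stands in for the view_count column on the videos table.
    view_counts[video_id] = view_counts.get(video_id, 0) + 1
    return True
```

A list is used rather than a set because session data is often serialized (for example to JSON), where lists survive a round trip unchanged.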

Splenetic answered 28/9, 2011 at 19:46 Comment(3)
With the latter, wouldn't that reach the limit of how much a session can store per user? Imagine a user viewing several thousand or more videos. - Atchison
I wouldn't worry about that. You would store a hash of ints (video IDs), which are 8 bytes each in the worst case. 1000 * 8 bytes =~ 8 KB, which is nothing in my opinion :) - Splenetic
Besides, it's not too common for a user to watch 1000+ videos in the same session. - Splenetic
