Storage of user data

A

5

12

When looking at how websites such as Facebook store profile images, the URLs seem to use randomly generated values. For example, the profile picture on Google's Facebook page has the following URL:

https://scontent-lhr3-1.xx.fbcdn.net/hprofile-xft1/v/t1.0-1/p160x160/11990418_442606765926870_215300303224956260_n.png?oh=28cb5dd4717b7174eed44ca5279a2e37&oe=579938A8

However, why not just organise it like so:

https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png

Clearly this would be much easier in terms of storage and simplicity. Am I missing something? Thanks.

Ambivert answered 21/3, 2016 at 23:54 Comment(2)

This may be of interest. It doesn't answer your question, but it gives insight into how Facebook's CDN URLs used to be constructed, and shows some of the issues with not obscuring/hashing parameters in URLs. lightbluetouchpaper.org/2009/02/11/new-facebook-photo-hacks – Tannertannery 24/3, 2016 at 13:9

I recently came across this video on YouTube that covers exactly that (among other things): Will YouTube Ever Run Out Of Video IDs? (I am neither the guy in that video nor in any way affiliated with him, I just think it is interesting to watch) – Burnisher 29/3, 2016 at 23:2

C

6

Simply put, I think it can boil down to two main reasons: Security and Cache:

Security - Adding these long, unpredictable hashes prevents others from guessing photo URLs and makes it pretty hard to download photos you aren't supposed to.

Consider what would happen if I could easily guess your profile photo URL and download it, even when you explicitly chose to share it only with friends.
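To make the "secret in the URL" idea concrete, here is a minimal sketch of a signed, expiring photo URL. This is not Facebook's actual scheme: the secret key, the `exp` and `sig` parameter names, and both helper functions are hypothetical and only illustrate the general technique of HMAC-signing a path.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real service would load this from secure config.
SECRET_KEY = b"replace-with-a-real-secret"


def sign_photo_path(path: str, expires_at: int) -> str:
    """Return the path with an expiry time and an HMAC signature appended."""
    message = f"{path}:{expires_at}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return f"{path}?exp={expires_at}&sig={signature}"


def is_valid(path: str, expires_at: int, signature: str, now: int) -> bool:
    """Accept the request only if the signature matches and the link is not expired."""
    message = f"{path}:{expires_at}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature) and now < expires_at
```

Without the key, someone who knows only the profile ID cannot produce a valid `sig`, so guessing the path alone is not enough to fetch the image.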

Cache - By adding "random" query params to each photo, you make sure each photo instance gets its own URL. Thus you can store the photo in the browser's cache for a long time, knowing that whenever you replace it with a new one, the new photo will have a fresh URL and the browser won't keep showing you the old photo.

If you were to keep the same URL for each user's profile photo (e.g. https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png), and then upload a new photo, one of two things can happen:

If you stored the photo in the browser's cache for a long time, the browser will keep showing you the cached version (as long as the URL is the same and the cache hasn't expired, there's no need to re-download the image).
If, instead, you only keep the image in cache for a short period of time, you end up hitting your server much more than actually needed, increasing the load and hurting performance.
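The cache-busting trade-off above can be sketched roughly like this. The host `cdn.example.com`, the `v` parameter, and the function name are all made up for illustration; the idea is simply that the URL changes whenever the image bytes change, so each URL can be cached for a very long time.

```python
import hashlib

# Response headers a server could send for a versioned URL: safe to cache for a
# year, because a new upload gets a different URL instead of replacing this one.
LONG_CACHE_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}


def versioned_photo_url(profile_id: str, image_bytes: bytes) -> str:
    """Build a photo URL whose query string depends on the image content."""
    digest = hashlib.sha256(image_bytes).hexdigest()[:16]
    return f"https://cdn.example.com/{profile_id}/50x50.png?v={digest}"
```

Uploading a new photo yields a new `v` value, so the browser treats it as a different resource and fetches it once, while the old cached copy is simply never requested again.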

I hope this clarifies it.

Compte answered 31/3, 2016 at 1:2 Comment(2)

+1 for cache busting. Security not so much...security through obscurity is weak, but it doesn't hurt either. – Witch 31/3, 2016 at 2:5

10x :) Regarding security - it's not about obscurity, it's about needing to know a secret in order to access the resource (which is a solid concept in security, and how a jsession or OAuth token works). Compared to the constant URL per user, as @PSidhu suggested, it's much harder to gain access to a profile photo unless I know the full URL with the "random" token. – Compte 31/3, 2016 at 2:17

P

7

Companies like Facebook have fairly intense CDNs. The URLs may look randomly generated, but they aren't: each individual route is deliberate and programmed to be handled in that manner.

They aren't after simplicity of storage like you would be if you were just using FTP to connect to a basic marketing website server. While you may put all your images in an /images folder, Facebook is much too complex for this: dozens of different types of applications access hundreds if not thousands of CDNs and servers worldwide.

If you ever build a web app, such as a Ruby on Rails app, and you work with a service such as AWS (Amazon Web Services), you'll also encounter what seem like nonsensical URLs. But it's all part of the fast delivery network provided within the architecture. Every time you "push" your app up to the server, new URLs are generated automatically for each unique resource: CSS files, JavaScript files, image files, etc. are all dynamically created. You don't have to type in each of these unique URLs individually each time you publish the app; the code simply knows where to look for them as part of the publishing process.

Example: you tell the web app to look for

//= require jquery

and it returns you http://example.com/assets/jquery-eb3e278249152b5b5d5170b73d9dbf52.js?body=1 in your header.

It doesn't matter that the URL is more complex than it should be; the application recognizes it, and that's all that matters.
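As a rough illustration of how an app can "know where to look", here is a small Python sketch of a fingerprint manifest. It is not the Rails asset pipeline itself: the function names and the sample file content are invented, and MD5 is used only as an assumption because it produces a 32-character digest like the one in the example URL.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint(name: str, content: bytes) -> str:
    """Insert a digest of the file content before the file extension."""
    path = PurePosixPath(name)
    digest = hashlib.md5(content).hexdigest()
    return f"{path.stem}-{digest}{path.suffix}"


def build_manifest(assets: dict[str, bytes]) -> dict[str, str]:
    """Map each logical asset name to its fingerprinted file name."""
    return {name: fingerprint(name, content) for name, content in assets.items()}


# At publish time the manifest is built once; templates then look up the
# logical name and emit the fingerprinted URL.
manifest = build_manifest({"jquery.js": b"/* library source */"})
print("/assets/" + manifest["jquery.js"])
```

The developer only ever writes the logical name (`jquery.js`); the lookup table supplies the long URL, which is why its complexity never has to be handled by hand.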

Porty answered 29/3, 2016 at 22:53 Comment(0)

V

3

With your route scheme, how would you prevent strangers from accessing the pictures of a private account? The hash also prevents bots from downloading all the pictures.

Venireman answered 24/3, 2016 at 12:1 Comment(0)

P

2

I get your pain :-) Rather than describe further how this problem comes about, let me talk about a solution. It's normal for hashed or even base64-encoded values to look like a mess to deal with in code, but once you have an identifier that explains them, not much of the mess remains!

I used to work at a company where we collated Facebook posts: using the Graph API, we fetched each post's Insights object and extracted information from it so it could easily be passed around within the UI and sent back to our Redis cache store. Once we had defined a data structure in TaffyDB for how an object would be organised, everything just made sense, thanks to its ability to query the useful bits out of a long, junk-looking stream of minified JavaScript. Refer: http://www.taffydb.com/

Phloem answered 30/3, 2016 at 23:17 Comment(0)

D

0

The extra values in the URL are useful to:

Track access. This is like when a newspaper appends "&homepage" vs. "&email" to an article URL, so their system knows how a reader found the page.
Avoid abuse and control access. Imagine that a user uploaded a small, popular pornographic image as a profile image. They could then hijack the CDN as a free web host for their porn site. The code in the URL, however, is used internally by the CDN to limit the number of views.

Derma answered 31/3, 2016 at 6:5 Comment(0)
