What is the URL schema of Tumblr images?
What is the URL schema of an image file at Tumblr? (I don't mean HTTP.) I've only figured out that the domain of the servers where images are stored is <n>.media.tumblr.com, where n is a number between 1 and 31, and that the name of the image file is prefixed with "tumblr_".

I'm asking because I want to find URLs that refer to the same image.

EDIT: I'm also processing URLs from other sources, not only Tumblr.

Marchetti answered 30/5, 2013 at 9:50 Comment(0)

Overview

When you upload an image to Tumblr, multiple sizes (of the same image) are generated and stored across their network.

Once uploaded, you can use template tags to request this image at the following sizes: 75, 100, 250, 400, 500 and 1280.

It's worth mentioning the following:

  1. The value in the template tag is the maximum size the requested image will be. Example: A 400 version of an image could be anywhere between 251px and 400px wide / high.
  2. There may not be a high res or 1280 version of an image available. If the original image is 500px or less, a 1280 version isn't generated.
  3. Photosets don't produce a 100 version.
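As a rough illustration of rules 2 and 3 above, the sketch below guesses which size tags should exist for an upload. The function name expected_sizes and its parameters are made up for this example, and the answer does not say whether "500px or less" refers to width or height, so the code simply takes the larger dimension as an assumption.

```python
from typing import List

# Size tags listed in the answer.
TEMPLATE_SIZES = (75, 100, 250, 400, 500, 1280)

def expected_sizes(original_px: int, in_photoset: bool = False) -> List[int]:
    """Guess which size tags exist for an image (hypothetical helper).

    original_px is assumed to be the larger dimension of the upload.
    """
    sizes = list(TEMPLATE_SIZES)
    if original_px <= 500:
        # Rule 2: no 1280 version for originals of 500px or less.
        sizes.remove(1280)
    if in_photoset:
        # Rule 3: photosets don't produce a 100 version.
        sizes.remove(100)
    return sizes
```

This only encodes the rules as stated; the actual pixel dimensions of each version (rule 1) still have to be read from the image itself.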

Image URL

The image URL will be one of the two below. The first form seems to be associated with images uploaded more than 6 months ago (this is a guess); the second seems to be used for newer images:

http://36.media.tumblr.com/tumblr_o4qxa0n2BP1r6ec7zo1_500.jpg

or

http://36.media.tumblr.com/83099a60d4e0cbeeb30d90394e222878/tumblr_o4qxa0n2BP1r6ec7zo1_500.jpg

URL Schema

This can be split into three parts: two variable and one constant.

  1. http://36
  2. .media.tumblr.com/83099a60d4e0cbeeb30d90394e222878/tumblr_o4qxa0n2BP1r6ec7zo1
  3. _500.jpg

Part 1: This is a server number and can differ for each image size. AFAIK there is no guarantee that an image size will be available on all servers. @Ally mentioned in the comments that you can remove this part from the URL and the image will still be found (later comments report that this stopped working and the server number is now required).
Part 2: This is the Tumblr subdomain, directory (if applicable) and partial file name. This will be the same for all sizes.
Part 3: This is the requested size (which matches the template tag) and file extension.
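For the original goal of spotting URLs that refer to the same image, the three parts can be pulled apart with a regular expression. This is only a sketch based on the URLs seen in this thread: the names TumblrImage, parse_tumblr_image_url and same_image_key are invented here, the directory is assumed to be 32 hex characters, and using the partial file name alone as the identity of an image is an assumption, not something Tumblr documents.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlparse

class TumblrImage(NamedTuple):
    server: Optional[int]     # part 1: server number, None if absent
    directory: Optional[str]  # part 2: directory, None on older URLs
    name: str                 # part 2: partial file name (may include _r1/_r2)
    size: str                 # part 3: size tag, e.g. "500"
    ext: str                  # part 3: file extension, e.g. "jpg"

# Assumed patterns, derived only from the example URLs in this thread.
_HOST = re.compile(r"^(?:(\d+)\.)?media\.tumblr\.com$")
_PATH = re.compile(r"^/(?:([0-9a-f]{32})/)?(tumblr_\w+?)_(\d+)\.(\w+)$")

def parse_tumblr_image_url(url: str) -> Optional[TumblrImage]:
    """Split a Tumblr image URL into its parts, or return None."""
    parts = urlparse(url if "://" in url else "http://" + url)
    host = _HOST.match(parts.hostname or "")
    path = _PATH.match(parts.path)
    if not host or not path:
        return None  # not a URL of the shape described above
    server = int(host.group(1)) if host.group(1) else None
    directory, name, size, ext = path.groups()
    return TumblrImage(server, directory, name, size, ext)

def same_image_key(url: str) -> Optional[str]:
    """Key that ignores server, directory and size (an assumption)."""
    img = parse_tumblr_image_url(url)
    return img.name if img else None
```

URLs from other sources simply yield None, so they can be passed through to whatever comparison is used for non-Tumblr images.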

Generating URLs for all sizes available using template tags.

The only foolproof method I have found is to use the corresponding template tags and assign them to a data- attribute.

Example Template Code:

<img src="{PhotoURL-100}" data-250u="{PhotoURL-250}" data-400u="{PhotoURL-400}" data-500u="{PhotoURL-500}" data-1280u="{block:HighRes}{PhotoURL-HighRes}{/block:HighRes}" />

Example Rendered Code:

<img src="http://36.media.tumblr.com/83099a60d4e0cbeeb30d90394e222878/tumblr_o4qxa0n2BP1r6ec7zo1_100.jpg" data-250u="http://36.media.tumblr.com/83099a60d4e0cbeeb30d90394e222878/tumblr_o4qxa0n2BP1r6ec7zo1_250.jpg" data-400u="http://36.media.tumblr.com/83099a60d4e0cbeeb30d90394e222878/tumblr_o4qxa0n2BP1r6ec7zo1_400.jpg" data-500u="http://36.media.tumblr.com/83099a60d4e0cbeeb30d90394e222878/tumblr_o4qxa0n2BP1r6ec7zo1_500.jpg" data-1280u="http://36.media.tumblr.com/83099a60d4e0cbeeb30d90394e222878/tumblr_o4qxa0n2BP1r6ec7zo1_1280.jpg" >

With this method, you can be certain you have the correct URLs and you know what sizes are available.

Hacking all size URLs based on just one URL.

Using this information the URL would become:

http://36.media.tumblr.com/83099a60d4e0cbeeb30d90394e222878/tumblr_o4qxa0n2BP1r6ec7zo1_500.jpg

You still wouldn't know whether the 1280 size has been generated, but it's a step closer. With this method you can replace the size value (part 3) with a new size and you should be able to get the image.
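The size swap can be sketched as a small string rewrite. The helper name size_variants is made up for this example, and the result is only a list of candidate URLs: nothing here checks that a given size actually exists or that it lives on the same server.

```python
import re
from typing import Dict, Iterable

# Size tags listed in the answer.
SIZES = (75, 100, 250, 400, 500, 1280)

# Part 3 of the URL: "_<size>.<extension>" at the very end.
_SIZE_SUFFIX = re.compile(r"_(\d+)(\.\w+)$")

def size_variants(url: str, sizes: Iterable[int] = SIZES) -> Dict[int, str]:
    """Return candidate URLs for each size by rewriting part 3 only."""
    if not _SIZE_SUFFIX.search(url):
        raise ValueError("URL does not end in _<size>.<ext>: " + url)
    return {
        size: _SIZE_SUFFIX.sub(lambda m, s=size: f"_{s}{m.group(2)}", url)
        for size in sizes
    }
```

Each candidate would still need to be requested to see whether it resolves, in particular the 1280 one.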

Onetime answered 30/5, 2013 at 12:29 Comment(27)
Thanks for the good explanation. The problem is that I only get the URL of the image. Is something like a reverse lookup possible? - Marchetti
It is possible, but it is not fail-safe. Originally I was taking the URL and replacing part 3 with a different size. However, it turned out that in some cases the image with the new size wasn't on the same server (part 1) as the previous size. You could probably test a URL for a response and, on a 404, increment the server number, rinse and repeat, but it is hacky. Can I ask why you can only get the img URL? - Onetime
I get the image URLs from different sources, not only Tumblr. Sometimes it's just a file on a web server. Do you think I should force users to give the post URL instead? - Marchetti
Generally speaking, you don't need the number for the server. http://media.tumblr.com/... should work. If you're dead set on using the server number though, I'm not sure how to grab that outside of Tumblr. I'll give it some thought. - Forecourse
Well, you learn something new every day! Thanks @Forecourse, I will update my answer. - Onetime
@JimmyT. I think Ally's comment may be what you need. Afaik, getting the post URL is of little help as the images are detached from the post ID etc. If you can get one URL and strip the server part, you should be able to get all the sizes. - Onetime
What about that directory? I've never seen that before. - Marchetti
@JimmyT. Sorry, you lost me. - Onetime
Your URLs contain this part: /c0d47ade54475ccb18a5e35a790f149d/. All URLs I've processed so far are images which are located directly on the server. Did Tumblr change something? A sample URL in the format I mean (a picture I googled just now): 24.media.tumblr.com/tumblr_mcu2jq7ruq1r84p84o1_500.jpg - Marchetti
Now I follow. Images I pull directly from Tumblr seem to have a directory. One random Tumblr image: http://media.tumblr.com/cd2dc02de75f51490ec84a954b73c3d4/tumblr_mniv5aSwOz1rd1n1oo1_250.jpg (directly from the dashboard). If I remove the directory from the URL above, it fails. Again, as I mentioned, the schema is a mythical beast. Also, the example image doesn't have a 1280 size (media.tumblr.com/tumblr_mcu2jq7ruq1r84p84o1_1280.jpg), something you would have to test for. Are your images all taken from Google Images? - Onetime
@Onetime No, most are taken from blogs; I just googled to get an example fast. But the pictures are often from reposts, maybe that's the cause of the missing directory? - Marchetti
@JimmyT. I dug back and I can find an image with no directory: 24.media.tumblr.com/tumblr_m1c3k5kyCH1ro5vpyo1_1280.jpg (I posted this myself). This is from a photoset. It seems photoset images don't get a directory, whilst single images do? Can you test this theory? - Onetime
@Onetime I tested your theory but it is not true. But I got curious, so I went through a big blog and checked the URLs of the images. I came to the conclusion that Tumblr must have changed the system 5 or 6 months ago, because before that time no picture was located in a directory. - Marchetti
@JimmyT. Yeah, that was my gut feeling, that there was a change. I will update my answer to include the directory info, but I think the safest way is to generate the links via the template tags. If that isn't possible, Ally's solution should fit your needs. Again, thanks for the help on this. - Onetime
However, it seems that the directory name has no special meaning. - Marchetti
Attention! I just realized that Tumblr no longer allows using part 1 without the server number. This answer just became outdated. - Refutative
@DavidMabodo It seems you may be correct. I will leave it a few days (as Tumblr servers can do funny things). If it's still the case at the end of the week I will update the answer. - Onetime
@Onetime Just pinging you: two months later it is still the same. - Refutative
@DavidMabodo Sorry, I haven't had time to test (I wanted to include the new sizes as well), but it seems the server numbers are now required. I will try to get a 2015 update done at some point. However, feel free to edit the answer if you want to. - Onetime
Any way to get the original uploaded file? The resizes lose their EXIF data. - Thurmond
@TomRoggero Afaik, no. Once the file is uploaded, the original is no longer available. However, Tumblr provides template tags to display the EXIF data. tumblr.com/docs/en/custom_themes#photo-posts - Onetime
FYI: as @DavidMabodo said, the server number is still mandatory. Could you refresh the text a bit? It's not critical though. All the rest of the information seems to be valid. - Clamber
@Clamber Done and done. Thanks to everyone for the comments / input, it made maintaining this answer easier. - Onetime
Also, the link can have _r1 or _r2 before _<resolution>.jpg. Deleting this substring sometimes does nothing, sometimes changes the MD5 of the image, and sometimes gives an Nginx 404 error. - Arcuate
Does it mean that Tumblr does not store images greater than 1200px? I would like to download the original sizes... - Weiss
@Weiss I don't believe Tumblr stores or provides a link to the original file. - Onetime
This answer is now outdated. New images are following a different schema. - Parallelepiped

Do keep in mind that original files (in their full resolution) are stored with the '_raw' suffix, instead of _1280, _500, _250, etc.

They are usually stored on data.tumblr.com currently (their CDN domain).

The path in the URL scheme is generated from the original (raw) file's SHA1 checksum.

Smith answered 6/5, 2018 at 17:50 Comment(2)
Do you have an example of what you're saying? I'm unable to make it work... EDIT: I think it changed just yesterday... - Sort
Yes, unfortunately, since two days ago Tumblr is now denying access to _raw files. One more reason to never use this trash site. - Smith
