Prevent image hotlinking in Google Image Search

Google recently introduced a new interface for its Image Search. Since January 25, 2013, full-size images are shown directly inside Google, without sending visitors to the source site. I came across a site that has apparently developed a sophisticated approach to stop users from grabbing images from Google by adding some sort of watermark dynamically. To see this, search the new Google Image Search interface for images from "fansshare.com". This link should work: Google Image Search. If not, simply enter "site:fansshare.com" in the Google search input field. Be sure to be on the new search interface, though.

How does fansshare.com achieve this? I couldn't figure it out ...

Update:

fansshare.com adds a GET param to all of their image URLs, like ?rnd=69. Example image URL: http://fansshare.com/media/content/570_Jessica-Biel-talks-Kate-Beckinsale-Total-Recall-fight-5423.jpg?rnd=62

This image URL works for a few calls or seconds, after which a redirect takes place to a cached, watermarked image: http://fansshare.com/cached/?version=media/content/570_Jessica-Biel-talks-Kate-Beckinsale-Total-Recall-fight-5423.jpg&rnd=5810
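
For reference, the relationship between the two URLs above can be written down as a small Python sketch. It only reproduces the observed mapping from the original image URL to the cached one; when and why fansshare.com triggers the redirect is not known, and the function name `cached_url` is hypothetical.

```python
from urllib.parse import quote, urlsplit


def cached_url(original_url: str, rnd: int) -> str:
    """Map an image URL to the /cached/ form observed in the redirect.

    Hypothetical helper: it mirrors the two example URLs only and says
    nothing about how the site picks the moment to redirect.
    """
    parts = urlsplit(original_url)
    # The image path is passed along without its leading slash.
    version = quote(parts.path.lstrip("/"), safe="/")
    return f"{parts.scheme}://{parts.netloc}/cached/?version={version}&rnd={rnd}"


original = (
    "http://fansshare.com/media/content/"
    "570_Jessica-Biel-talks-Kate-Beckinsale-Total-Recall-fight-5423.jpg?rnd=62"
)
print(cached_url(original, 5810))
```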

Edit:

We have finally managed to fully mimic FansShare's hotlink protection and we've published our findings in the following, extensive blog post:

http://pixabay.com/en/blog/posts/hotlinking-protection-and-watermarking-for-google-32/

Icecap answered 10/2, 2013 at 8:31 Comment(6)
When someone requests an image, they send some information such as their referrer, their user agent and so on. Google does all this too when it requests images on your site in order to index them in Google Image Search. So, wait for GIS to index your image, check the log to see what its request looked like, and then you can write code that checks for that and returns a watermarked version of the image :)Marva
Seems to be way more behind this: You don't see the watermark in Google's thumbnails. It only appears when enlarging the image on Google. I'll update my question with some more info now ...Icecap
Perhaps it makes two queries, one for the thumbnail one for the enlarged version? Or perhaps the thumbnail is so shrunk that you don't notice the watermark anymore?Marva
No, the thumbnail is certainly not the same. Take a look, it's the "ugly type" of watermark - not unobtrusive :-) And you cannot tell Google to take another image for its thumb. They create the thumbnail on their own.Icecap
@Patashu, from my tests the referer rarely is passed on the big image request. Probably, because Google does a lot of stuff on click, other images are loaded as well. One possibility for that may be that the referer is attached only on the first request after the click, and not the subsequent ones.Commixture
fansshare is mainly working with GET params (?rnd=) ... they give the images some sort of expiration date. It's strange, however, how they are able to determine whether it's best to redirect to a watermarked image or to the page of origin itself. Great work in any case and just what we need to kick Google in its a.. :-DIcecap
C
6

There is a solution, but as with the other solutions, Google may interpret it as cloaking and ban your site at will. This is a long one and will probably need further tinkering to work for your case. (Sorry in advance for the length.)

Setup

For the sake of the example, let's just say that:

  • site: www.thesite.com and
  • ImageURL base: images.thesite.com

(but the ImageURL base could just as easily be www.thesite.com/wp-content/uploads)

Target

Our target is to (1) show the full-size image only with a watermark/overlay when it is requested from Google Image Search and (2) not break anything that previously worked.

Solution

So the theoretical solution is the following.

1) Check the User-Agent, and if it contains Googlebot, serve the "trap" URL. The trap URL is your current image URL, slightly changed so that you can treat requests to it differently. So instead of the current normal URL:

http://images.thesite.com/wallpapers/awesome.jpg

you should output this for Googlebot:

http://cacheimages.thesite.com/wallpapers/awesome.jpg

(where cacheimages is anything you want)
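
A minimal sketch of step 1 in Python, assuming the page template calls a helper when it prints an image URL. The helper name `image_url_for` and the constants are hypothetical; the hosts are the example ones from the setup above.

```python
from urllib.parse import urlsplit, urlunsplit

BOT_MARKER = "googlebot"                 # substring looked for in the User-Agent
TRAP_HOST = "cacheimages.thesite.com"    # example trap host from the setup


def image_url_for(user_agent: str, normal_url: str) -> str:
    """Return the trap URL for Googlebot and the normal URL for everyone else."""
    if BOT_MARKER in (user_agent or "").lower():
        parts = urlsplit(normal_url)
        # Same path and query, only the host is swapped for the trap host.
        return urlunsplit(parts._replace(netloc=TRAP_HOST))
    return normal_url


url = "http://images.thesite.com/wallpapers/awesome.jpg"
print(image_url_for("Mozilla/5.0 (compatible; Googlebot/2.1)", url))
print(image_url_for("Mozilla/5.0 (Windows NT 6.1) Firefox/18.0", url))
```

Checking only the User-Agent string is easy to spoof, so treat this as an illustration of the idea rather than reliable bot detection.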

2) Now the main dish: you should route the requests for http://cacheimages.thesite.com/ to a script that acts as follows:

 If the request comes from a bot (check the User-Agent header)
     Serve the normal image without watermark
 Else (the request seems to be from a normal user)
     Check the referer:
     If it's from Google (but NOT http://www.google.com/blank.html)
          Redirect to the post of the image (Note 1)
     Else if the referer is your own site
          Show the raw normal image
     Else (any other referer, including http://www.google.com/blank.html)
          Show the watermarked image (Note 2)

Note 1: This happens when people click "View original image" or the image itself.

Note 2: This happens when people try to see the full-size image from the Google Image Search results (and if they somehow arrive at the trap URL of an image).
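
The decision logic of step 2 can be sketched in Python as a pure function that maps the request headers to one of three actions. This is an illustration under assumptions, not the linked implementation: the function name `decide`, the action strings, the bot pattern and the host set are all hypothetical, and the Google host pattern is deliberately loose.

```python
import re
from urllib.parse import urlsplit

BOT_RE = re.compile(r"googlebot", re.IGNORECASE)  # assumption: only Googlebot counts as a bot
OWN_HOSTS = {"www.thesite.com", "images.thesite.com"}  # example hosts from the setup
# Loose match for google.com, google.de, google.co.uk, ... (hostname only).
GOOGLE_HOST_RE = re.compile(r"(?:^|\.)google\.(?:com|[a-z]{2,3})(?:\.[a-z]{2})?$")


def decide(user_agent: str, referer: str) -> str:
    """Return 'raw', 'redirect_to_post' or 'watermark' for a trap-URL request."""
    if BOT_RE.search(user_agent or ""):
        return "raw"  # bots get the normal image without watermark
    ref = urlsplit(referer or "")
    host = (ref.hostname or "").lower()
    if GOOGLE_HOST_RE.search(host) and ref.path != "/blank.html":
        return "redirect_to_post"  # Note 1: "View original image" or a click
    if host in OWN_HOSTS:
        return "raw"  # image embedded in our own pages
    return "watermark"  # Note 2: blank.html, missing or foreign referer


print(decide("Mozilla/5.0", "http://www.google.com/blank.html"))  # watermark
print(decide("Mozilla/5.0", "http://www.google.com/"))            # redirect_to_post
```

Keeping the decision separate from the actual image serving makes it easy to test against logged User-Agent and Referer values before wiring it into a web server.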

3) You could HTTP-redirect the old image URLs to the new ImageURL base when the User-Agent is Googlebot, so that the overlay/watermark trick starts working on old images sooner (or even use Google Webmaster Tools if you use subdomains for images) and you are sure to preserve the SEO juice.

Further actions

You could make further changes if you want to be serious about it.

  1. Instead of showing the watermarked image directly, redirect to a more dynamic URL such as http://cacheimages.thesite.com/preview?p=/wallpapers/awesome.jpg&r=23535, or use the more modern HTTP header for preventing indexing: X-Robots-Tag: noindex
  2. Of course, cache the watermarked images.
  3. Check the Accept HTTP header for cases that I haven't thought of, and serve the image or redirect to the image's post accordingly.
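
Item 1 could look roughly like the following Python sketch, which builds either the redirect to the preview URL or the response headers for a watermarked image. The function names and the choice of a 302 status are assumptions; the preview URL and the X-Robots-Tag value are the ones given in the list.

```python
from urllib.parse import urlencode

PREVIEW_BASE = "http://cacheimages.thesite.com/preview"  # example URL from item 1


def preview_redirect(image_path: str, nonce: int):
    """Build a (status, headers) pair redirecting to the dynamic preview URL."""
    # safe="/" keeps the slashes of the image path readable in the query string.
    query = urlencode({"p": image_path, "r": nonce}, safe="/")
    return 302, {"Location": f"{PREVIEW_BASE}?{query}"}


def watermark_headers() -> dict:
    """Headers for a watermarked response that should not be indexed."""
    return {"Content-Type": "image/jpeg", "X-Robots-Tag": "noindex"}


status, headers = preview_redirect("/wallpapers/awesome.jpg", 23535)
print(status, headers["Location"])
print(watermark_headers())
```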

Note

You may also have to think about international traffic, so instead of checking for google.com you will want to check for a pattern such as google\.[a-z.-]+/
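
A hedged Python sketch of such a check follows. A bare substring search of that kind also matches referers such as http://notgoogle.com/, so this version parses the referer and tests only its hostname. The function name and the exact pattern are assumptions, and the pattern is a loose approximation of Google's country domains, not an authoritative list.

```python
import re
from urllib.parse import urlsplit

# Loose approximation: google.com, google.de, google.co.uk, google.com.br, ...
GOOGLE_HOST_RE = re.compile(r"(?:^|\.)google\.(?:com|[a-z]{2,3})(?:\.[a-z]{2})?$")


def is_google_referer(referer: str) -> bool:
    """True if the referer's hostname looks like a Google search domain."""
    host = (urlsplit(referer or "").hostname or "").lower()
    return bool(GOOGLE_HOST_RE.search(host))


for ref in (
    "http://www.google.de/imgres?q=test",
    "http://www.google.co.uk/",
    "http://notgoogle.com/",
    "http://example.com/?q=google.com/",
):
    print(ref, is_google_referer(ref))
```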

Conclusion

This could be adapted to any system. I made it for one that has its images on a subdomain, so it probably won't be exactly the same for other systems like WordPress. Also, I am sure Google will change its image search in the next couple of months to fix this issue.

An untested sample implementation of the idea can be found on GitHub.

Disclaimers

This hasn't been tested thoroughly and you could get banned; it's provided merely for research and educational purposes. I cannot be held responsible for any damages etc.

Commixture answered 13/2, 2013 at 18:13 Comment(10)
+1 Very interesting solution. Personally, I prefer delivering the same content to Google as to anybody else - simply to prevent getting banned. I'm currently working on a similar solution, in which instead of using a different trap URL, a GET param is appended to the image. The param contains the current timestamp and after a few minutes, a watermarked version of it is being served under this URL. By that, we could use your solution, however with the same URL for everyone. But the URL param changes on each opening of the page ... I'll look more into fansshare.com's solution today ...Icecap
I marked this as correct answer, but it probably is dangerous. Thanks for the lengthy explanation!Icecap
As far as I can see, depending on the individual post, fansshare.com uses both a CDN subdomain AND a rather random GET param. Either one does the trick. So basically it should be enough to append some sort of "timed GET param" for applying your solution, while showing the same URL to bots and humans. This could reduce the risk of getting banned for cloaking. However, due to the GET param, the URLs of the images change periodically. A redirect to the original image could fix the issue, though, enabling caching in browsers and preventing confusion for search engines.Icecap
Sev, isn't the referrer for the inside Image Search hotlinked image "google.com", as well? How can you distinguish between this request and the one caused by clicking on "Original image"?Icecap
About the hotlinked image requests: When you search with SSL (HTTPS), no referer is attached to the request. When you are on plain HTTP, Google attaches a special page as the referer for security (http://www.google.com/blank.html). On "Original Image" you get the real referer: http://www.google.com/ (with no /blank.html)Commixture
Perfect! Thanks!! Additionally, I might add that it's required to prevent all caching of watermarked images in the client's browser. Otherwise, clicking "View original image" will only fetch the watermarked image from the browser's cache. We've already successfully implemented it with minor adaptations on pixabay.com.Icecap
Here's an extensive description on how it's been done on Pixabay: pixabay.com/en/blog/posts/…Icecap
@Nasmon I didn't see this in your nginx examples (unless it's right under my nose), but how do you avoid showing the ?i GET parameter to the search bots?Renfro
I just tried to do this with my site, but it seems Google Images stopped indexing my images?! Does anyone know why?Panama
I believe Google no longer sends "blank.html" as the referrer. This approach seems to have stopped working. Can anyone confirm this?Icecap
I
2

A couple of new WordPress plugins are available to address Google and Bing hotlinking images:

http://wordpress.org/extend/plugins/imaguard/
http://wordpress.org/extend/plugins/google-break-dance/

Interferon answered 11/2, 2013 at 5:53 Comment(4)
Perfect - I'll go through their source code, since we need a solution for NGINX/Django. Do you know of any that already exists?Icecap
Checked the code and read more about those WordPress plugins: definitely not the same technique as fansshare's. According to bloggers, those plugins even cause images to be de-indexed from Google.Icecap
My images didn't get de-indexed by using Imaguard. Imaguard forces a redirect for non-whitelisted sites or local referrers. I just whitelisted Googlebot so that it can crawl my images, and the redirect only happens when Google tries to hotlink the large preview. Maybe people forget to whitelist Google's crawlers...Counterpoise
Try also github.com/mompracem/direct-images-redirect, but mind that it's a new one and may have bugs (there's currently one affecting services like Facebook and other platforms, which won't be able to generate a thumbnail when accessing your content)Lettering
L
0

Hi there, here's a new plugin to address this issue on WordPress:

https://github.com/mompracem/direct-images-redirect

Instead of using watermarked images, it just redirects users who try to access an image directly to the post or page that the image was originally attached to.

It's a new plugin and therefore might have some bugs. Please test it and report issues on GitHub, thank you.

Lettering answered 23/2, 2013 at 14:4 Comment(0)
Z
0

Hm ... sending Googlebot a different image or URL than regular users is not OK! Images should be silently redirected.

For WordPress blogs, I think WP-PICShield is one of the best options!

  • Caching support
  • Pass-through image requests
  • Anti-IFRAME protection
  • Custom image transparency
  • Custom PNG watermark
  • Hostname over images as URL and/or as QR barcode
  • Redirect direct links to: attachment, single/gallery, or home
  • Protection against unauthorized requests
  • Avoids memory errors for big files
  • Allows online translators
  • Allows share buttons for social sites: Facebook, Pinterest, Tumblr, Twitter, Google Plus
  • Allows WordPress via RPC and Twitter via OAuth
  • Manual clear-cache script that avoids the PHP execution time limit
  • Allows a remote IP list
  • CDN tools and help

and more...

Zed answered 24/2, 2013 at 1:59 Comment(0)
