How to proxy files from S3 through a Rails application to avoid leeching?

In order to avoid hot-linking, S3 bandwidth leeching, etc., I would like to make my bucket private and serve the files through a Rails app. The concept sounds very easy in general, but I am not entirely sure which approach would be best for the situation.

I am using Paperclip for general asset management. Is there any built-in way to achieve this type of proxy?

In general I can easily parse the URLs from Paperclip and point them back to my own controller. What should happen from that point? Should I simply use Net::HTTP to download the image and then serve it with send_data? In between I want to log the referer and set proper Cache-Control headers, since I have a reverse proxy in front of the app. Is Net::HTTP + send_data a reasonable way to do this?
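Roughly what I have in mind (the model and attachment names are placeholders, and I'm assuming Paperclip's expiring_url helper so the app itself can read from a private bucket):

    # Hypothetical proxy action: fetch the file from S3, log the referer,
    # set Cache-Control for the reverse proxy, and hand the bytes to the client.
    require 'net/http'

    class AssetsController < ApplicationController
      def show
        asset = Asset.find(params[:id])   # model with a Paperclip attachment `image`
        Rails.logger.info "asset=#{asset.id} referer=#{request.referer}"

        # The plain S3 URL won't work against a private bucket, so sign it
        # server-side before fetching the object over HTTP.
        uri  = URI.parse(asset.image.expiring_url(60))
        body = Net::HTTP.get_response(uri).body

        expires_in 1.hour, public: true   # Cache-Control for the reverse proxy in front
        send_data body, type: asset.image_content_type, disposition: 'inline'
      end
    end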

Maybe the whole idea is really bad for reasons I am not aware of at the moment? In general I believe that revealing direct S3 links to a public bucket is dangerous and could yield serious problems in case of leeching / hot-linking...

Update:

If you have any other ideas which can reduce the S3 bill and prevent hot-linking / leeching in any way, please share, even if they are not directly related to Rails.

Douce answered 27/10, 2010 at 22:2 Comment(2)
Are you currently having a problem with leeching? I would be reluctant to do something that will dramatically slow my application, just to solve a problem I might have in the future.Zeitler
It's not only a matter of solving a potential problem. I just don't want to wake up one day with an overwhelming S3 bill that I cannot afford to pay... I am not so sure this will slow the application "dramatically" when assets are kept in memcache / the reverse proxy.Douce

Use a private bucket (or private files) and serve signed URLs to the files stored on S3.

The signature includes an expiration time (e.g. 10 minutes from now, whatever you would like to set), as well as a cryptographic hash. S3 will refuse to serve files if the signature is invalid, or if the expiration time has passed.

This is useful because only you can create valid URLs to your private files in S3, and you can control how long the URLs remain valid. This prevents leeching, because leechers can't sign their own URLs and, if they get a URL that you signed, that URL will expire very shortly and cannot be used after that.
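With the asker's Paperclip setup this can be as small as a controller action that redirects to an expiring URL (a sketch; the model and attachment names are assumptions, and expiring_url is Paperclip's signed-URL helper for S3 storage):

    # Sketch: never expose the raw S3 link, only hand out short-lived signed URLs.
    # Assumes a Paperclip attachment `image` on an Asset model in a private bucket.
    class SignedAssetsController < ApplicationController
      def show
        asset = Asset.find(params[:id])
        redirect_to asset.image.expiring_url(10.minutes.to_i)  # valid for 10 minutes
      end
    end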

Chirrupy answered 6/12, 2010 at 19:39 Comment(0)

Since there wasn't a nuts-and-bolts answer above, here's a small code sample of how to stream a file that's stored on S3.

# Requires the classic aws-s3 gem (and an established connection, e.g.
# AWS::S3::Base.establish_connection!); `path` is the object key and
# `bucket` is the bucket name.
render :text => proc { |response, output|
  AWS::S3::S3Object.stream(path, bucket) do |segment|
    output.write segment
    output.flush # not sure if this is needed
  end
}

Depending on your web server this may (Mongrel) or may not (WEBrick) work, so don't get too frustrated if it doesn't stream in development.

Stearic answered 22/7, 2011 at 22:7 Comment(0)

Provide temporary pre-signed URLs:

    def show
      redirect_to Aws::S3::Presigner.new.presigned_url(
        :get_object,
        bucket: 'mybucket',
        key: '/folder/file.pdf',
        expires_in: 60)
    end

S3 still distributes the content, so you offload that work from Rails (which is very slow at serving files); S3 also handles HTTP caching and HEAD requests, and you can put Amazon's CDN (CloudFront) in front of it.

Zobias answered 21/4, 2016 at 18:58 Comment(2)
This is smart: redirecting means that if you put the links in emails or somewhere, they will always get a new link with a new expiry once redirected.Triptych
This isn't answering the question, and there are some UX issues that cannot be mitigated: e.g. if a user hits reload on the document after the expiration time, they get a document that contains information about the expiration, which often leaves the user more puzzled. There are a lot of ways the UX can be improved if you choose to proxy instead of redirect...Cannular

I'd probably avoid doing this -- at least until I had no other choice.

You need to take into account that you'll probably also add to the bandwidth bill if you download the image each time. Also, processing each image through a script will require more CPU and RAM. Not the greatest outlook -- IMHO.

I would probably enable the access logs for Amazon S3 and write a small tool to analyze usage and change the permissions on the bucket/object in case usage goes through the roof. Run this as a cronjob every 10 minutes or so and you should be safe.
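A very rough sketch of such a watchdog (bucket names, log prefix and the threshold are made up, the log parsing is naive, and it uses the current aws-sdk-s3 gem rather than anything that existed back then):

    #!/usr/bin/env ruby
    # Cron watchdog sketch: sum the "bytes sent" field from today's S3 access
    # logs and lock the bucket down if traffic looks like leeching.
    require 'aws-sdk-s3'

    CONTENT_BUCKET = 'my-assets-bucket'   # bucket being leeched (assumption)
    LOG_BUCKET     = 'my-assets-logs'     # bucket the access logs are delivered to
    LOG_PREFIX     = "logs/#{Time.now.utc.strftime('%Y-%m-%d')}"
    THRESHOLD      = 5 * 1024**3          # 5 GB per day, pick your own number

    s3    = Aws::S3::Client.new
    total = 0

    s3.list_objects_v2(bucket: LOG_BUCKET, prefix: LOG_PREFIX).contents.each do |obj|
      log = s3.get_object(bucket: LOG_BUCKET, key: obj.key).body.read
      log.each_line do |line|
        # Naive parse: the quoted request URI is followed by status, error code, bytes sent.
        next unless line =~ /" (\d{3}) \S+ (\d+) /
        total += Regexp.last_match(2).to_i
      end
    end

    if total > THRESHOLD
      # Crude kill switch: make the bucket private until a human has a look
      # (objects with their own public ACLs would need per-object changes too).
      s3.put_bucket_acl(bucket: CONTENT_BUCKET, acl: 'private')
      warn "S3 traffic #{total} bytes exceeded threshold, bucket set to private"
    end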

You could also use s3stat. They also offer a free plan.

Edit: As per my recommendation for Varnish, I'm adding a link to a blog entry about preventing hotlinking using Varnish.

Plenipotentiary answered 2/11, 2010 at 2:37 Comment(2)
I know the downsides of course, that's why I am asking for ideas for a solution. In general it is not all that bad. Keep in mind that images can be cached extensively in most cases. While doing some research I came to the conclusion that the most effective way would be a reverse proxy in front of S3 instead of the Rails application. I can do everything there as well - check the referer, do some extensive log analysis. Such a front end would reduce the S3 bill dramatically and the performance shouldn't be any lower in the end.Douce
Yeah, that sounds good -- I'd recommend Varnish. :) Just keep in mind that even though the images are cached, you also run a more complicated setup by adding another service. And it possibly also requires more processing power (e.g. an instance or at least some resources) for the proxy.Plenipotentiary
