How can I mount an S3 bucket to an EC2 instance and write to it with PHP?

I'm working on a project that is hosted on Amazon Web Services. The server setup consists of two EC2 instances, one Elastic Load Balancer, and an additional Elastic Block Store (EBS) volume on which the web application resides. The project is supposed to use S3 to store files that users upload. For the sake of this question, I'll call the S3 bucket static.example.com.

I have tried using s3fs (https://code.google.com/p/s3fs/wiki/FuseOverAmazon), RioFS (https://github.com/skoobe/riofs) and s3ql (https://code.google.com/p/s3ql/). s3fs will mount the filesystem but won't let me write to the bucket (I asked this question on SO: How can I mount an S3 volume with proper permissions using FUSE). RioFS will mount the filesystem and will let me write to the bucket from the shell, but files that are saved using PHP don't appear in the bucket (I opened an issue with the project on GitHub). s3ql will mount the bucket, but none of the files that are already in the bucket appear in the filesystem.

These are the mount commands I used:

s3fs static.example.com -ouse_cache=/tmp,allow_other /mnt/static.example.com
riofs -o allow_other http://s3.amazonaws.com static.example.com /mnt/static.example.com
s3ql mount.s3ql s3://static.example.com /mnt/static.example.com

I've also tried using this S3 class: https://github.com/tpyo/amazon-s3-php-class/ and this FuelPHP-specific S3 package: https://github.com/tomschlick/fuel-s3. I was able to get the FuelPHP package to list the available buckets and files, but saving files to the bucket failed silently (no error was raised).

Have you ever mounted an S3 bucket on a local Linux filesystem and used PHP to write a file to the bucket successfully? What tool(s) did you use? If you used one of the tools mentioned above, what version did you use?

EDIT: I have been informed that the issue I opened with RioFS on GitHub has been resolved. Although I decided to use the S3 REST API rather than mount the bucket as a volume, RioFS may be a viable option these days.

Petard asked 7/5, 2013 at 21:07
Why the downvote? Do I need to be more/less specific? – Petard
Why aren't you using the S3 API instead of trying to use it as a filesystem? – Herniorrhaphy
I'll resist the temptation to vote as duplicate of your closed question since you've clearly made a good prior effort, and I think one of them should stay open. Whilst people should have gotten the joke on the other question, in my experience if a question is asked in frustration, it tends to get heavily downvoted here. C'est la vie, I guess! – Wonacott
(Btw, if you can edit some code into the question, then so much the better) – Wonacott
@Wonacott I figured people were more reacting to the title of the post (I definitely am frustrated!) rather than the content, which is why I tried again. Thanks! – Petard

Have you ever mounted an S3 bucket on a local linux filesystem?

No. It's fun for testing, but I wouldn't let it near a production system. It's much better to use a library to communicate with S3. Here's why:

  1. It won't hide errors. A filesystem only has a few error codes it can send you to indicate a problem. An S3 library will give you the exact error message from Amazon so you understand what's going on, log it, handle corner cases, etc.
  2. A library will use less memory. Filesystem layers will cache lots of data that you may never use again. A library puts you in control to decide what to cache and what not to cache.
  3. Expansion. If you ever need to do anything fancy (set an ACL on a file, generate a signed link, versioning, lifecycle, change durability, etc.), then you'll have to dump your filesystem abstraction and use a library anyway.
  4. Timing and retries. Some fraction of requests randomly error out and can be retried. Sometimes you may want to retry a lot, sometimes you would rather error out quickly. A filesystem doesn't give you granular control, but a library will.
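
To make point 4 concrete, here is a minimal sketch of an application-level retry policy. `withRetries()` is a hypothetical helper, not part of any S3 library: `$operation` stands for one S3 request that throws on failure, `$maxAttempts` lets you choose between retrying a lot and failing fast, and `$shouldRetry` decides per exception whether another attempt makes sense.

```php
<?php
// Minimal retry-with-backoff sketch. withRetries() is a hypothetical helper,
// not part of any S3 library: $operation performs one request and throws on
// failure, $shouldRetry decides whether a given exception is worth retrying.
function withRetries(
    callable $operation,
    int $maxAttempts = 5,
    int $baseDelayMs = 100,
    ?callable $shouldRetry = null
) {
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (Exception $e) {
            $retryable = $shouldRetry === null || $shouldRetry($e);
            if (!$retryable || $attempt >= $maxAttempts) {
                // Give up and surface the original error instead of hiding it.
                throw $e;
            }
            // Exponential backoff with "full jitter": wait a random time in
            // [0, baseDelayMs * 2^(attempt - 1)] milliseconds, then try again.
            $capMs = $baseDelayMs * (2 ** ($attempt - 1));
            usleep(random_int(0, $capMs) * 1000);
        }
    }
}
```

In real code `$operation` would wrap an SDK call such as `putObject`, and `$shouldRetry` could retry throttling and 5xx responses while failing fast on permission errors. The AWS SDK for PHP also has built-in retries (its `retries` client option), so a wrapper like this is mainly useful for policy the SDK doesn't know about.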

The bottom line is that S3 under FUSE is a leaky abstraction. S3 doesn't have (or need) directories. Filesystems weren't built for billions of files. Their permission models are incompatible. You are wasting a lot of the power of S3 by trying to shoehorn it into a filesystem.
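
To illustrate the "no directories" point: a bucket is a flat set of keys, and a "directory" only appears when a listing groups keys by a delimiter (the `Prefix`/`Delimiter` request parameters and `CommonPrefixes` result of S3's object-listing API). The hypothetical `listKeys()` helper below roughly mimics that grouping over an in-memory array, which is the kind of thing a FUSE layer has to fake on every directory read.

```php
<?php
// Sketch: S3 stores a flat set of keys; "directories" only exist as common
// prefixes in a listing. listKeys() is a hypothetical helper that mimics the
// Prefix/Delimiter grouping of an S3 object listing over an in-memory array.
function listKeys(array $keys, string $prefix = '', string $delimiter = '/'): array
{
    $contents = [];
    $commonPrefixes = [];
    foreach ($keys as $key) {
        if (strncmp($key, $prefix, strlen($prefix)) !== 0) {
            continue; // key lies outside the requested prefix
        }
        $rest = substr($key, strlen($prefix));
        $pos = $delimiter === '' ? false : strpos($rest, $delimiter);
        if ($pos === false) {
            $contents[] = $key; // an object directly "under" the prefix
        } else {
            // Everything up to and including the delimiter is rolled up into
            // one common prefix: the pseudo-"directory".
            $commonPrefixes[$prefix . substr($rest, 0, $pos + strlen($delimiter))] = true;
        }
    }
    sort($contents);
    $prefixes = array_keys($commonPrefixes);
    sort($prefixes);
    return ['Contents' => $contents, 'CommonPrefixes' => $prefixes];
}
```

A side effect of this model is that renaming a "directory" means copying and deleting every object under its prefix, which a filesystem layer has to hide from you.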

Two random PHP libraries for talking to S3:

https://github.com/KnpLabs/Gaufrette

https://aws.amazon.com/sdkforphp/ - this one is useful if you expand beyond just using S3, or if you need to do any of the fancy requests mentioned above.

Spearmint answered 7/5, 2013 at 21:07
I don't know why I even tried to mount the S3 bucket on the local filesystem...probably because it was somebody else's idea first. – Petard
I checked out Gaufrette, thinking it would make S3 integration easier, but really, Amazon's PHP SDK is quite easy to use by itself. – Fleming

Quite often, it is advantageous to write files to the EBS volume, then route subsequent public requests for those files through the CloudFront CDN.

That way, if the app must transform the file, it is much easier to do so on the local drive and system, and then have requests for the transformed files pull from the origin via CloudFront.

For example, if a user uploads an avatar image that needs several sizes and crops, your app can create these on the local volume, while all public requests for the file go through a CloudFront origin-pull request. That way, you have maximum flexibility to keep the original file (or an optimized version of it), and any subsequent request is either served an existing version from a CloudFront edge location, or CloudFront routes the request back to the app, which creates any necessary variants.
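
The avatar flow can be sketched as an origin-side handler: on a CloudFront miss for a given size, the app builds that variant once on the local volume and serves the stored copy afterwards. `variantPath()` and `ensureVariant()` are hypothetical names, the `name-WIDTHxHEIGHT.ext` naming is an assumption modeled on WordPress's convention, and `$resize` is a stand-in for real image code (GD, Imagick, etc.).

```php
<?php
// Hypothetical origin-side helpers: build a resized variant once on the local
// (EBS) volume when CloudFront forwards a miss, then reuse the stored file.
// Naming pattern "name-WIDTHxHEIGHT.ext" is an assumption for illustration.
function variantPath(string $originalPath, int $width, int $height): string
{
    $info = pathinfo($originalPath);
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';
    return $info['dirname'] . '/' . $info['filename'] . "-{$width}x{$height}" . $ext;
}

function ensureVariant(string $originalPath, int $width, int $height, callable $resize): string
{
    $target = variantPath($originalPath, $width, $height);
    if (!is_file($target)) {
        // First request for this size: generate it from the original.
        // $resize(string $src, string $dst, int $w, int $h) is a placeholder
        // for the real resize/crop code.
        $resize($originalPath, $target, $width, $height);
    }
    return $target;
}
```

Because the original stays on disk, a variant that is lost or pruned is simply rebuilt on the next miss.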

An elementary example of the above is WordPress, which creates multiple resized/cropped versions of every uploaded image in addition to keeping the original (subject to file size restrictions and/or plugin transformations). CDN-capable WordPress plugins such as W3 Total Cache rewrite requests to pull through the CDN, so the app only needs to create each variant on its first request. Adding URL versioning for browser caching (http://domain.tld/file.php?x123) further refines and leverages CDN functionality.
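
A sketch of the URL rewrite plus versioning idea: map a local file to a CDN URL and append a version derived from the file's contents, so a changed file gets a new URL and stale edge or browser copies are bypassed. `cdnUrl()`, the `v` query parameter, and the content-hash choice are assumptions for illustration, not how W3 Total Cache itself works; the CloudFront hostname in the usage is a placeholder.

```php
<?php
// Hypothetical CDN URL rewrite with cache-busting versioning. $cdnBase is the
// CDN hostname (e.g. a CloudFront distribution), $docRoot the local web root.
function cdnUrl(string $cdnBase, string $docRoot, string $relativePath): string
{
    $relativePath = '/' . ltrim($relativePath, '/');
    $localFile = rtrim($docRoot, '/') . $relativePath;
    // Version from a content hash (not mtime) so every web server behind the
    // load balancer produces the same URL for the same file contents.
    $version = is_file($localFile) ? substr(md5_file($localFile), 0, 8) : 'missing';
    // Percent-encode each path segment but keep the slashes.
    $encodedPath = implode('/', array_map('rawurlencode', explode('/', $relativePath)));
    return rtrim($cdnBase, '/') . $encodedPath . '?v=' . $version;
}
```

For this to bust caches, the CloudFront behavior must include the query string in its cache key.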

If you are concerned about rapid growth of the EBS volume's size or inode usage, you can automate pruning of seldom-requested or aged files.
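
A pruning job for aged files might look like this, assuming only regenerable derived files (such as resized variants) live under the pruned directory, since anything deleted there must be rebuildable on the next CDN miss. `pruneOldFiles()` is hypothetical and uses modification time; judging "seldom-requested" would need access times (unreliable on volumes mounted with `noatime`) or CDN/access logs instead.

```php
<?php
// Hypothetical pruning job: delete regular files under $dir whose modification
// time is older than $maxAgeDays. Intended only for derived files (e.g. resized
// variants) that the origin can regenerate on the next CDN miss.
function pruneOldFiles(string $dir, int $maxAgeDays, ?int $now = null): array
{
    $cutoff = ($now ?? time()) - $maxAgeDays * 86400;
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    // Collect first, delete afterwards, so the directory tree is not modified
    // while it is still being iterated.
    $expired = [];
    foreach ($files as $file) {
        if ($file->isFile() && $file->getMTime() < $cutoff) {
            $expired[] = $file->getPathname();
        }
    }
    $deleted = [];
    foreach ($expired as $path) {
        if (unlink($path)) {
            $deleted[] = $path;
        }
    }
    sort($deleted);
    return $deleted;
}
```

Run from cron (e.g. nightly), this keeps the volume bounded while the originals stay untouched in a separate directory.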

Catacomb answered 8/2, 2015 at 19:08
