How stable is s3fs to mount an Amazon S3 bucket as a local directory [closed]
How stable is s3fs for mounting an Amazon S3 bucket as a local directory in Linux? Is it recommended/stable for high-demand production environments?

Are there any better/similar solutions?

Update: Would it be better to use EBS and mount it via NFS to all other AMIs?
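
For context, a typical s3fs invocation looks something like the sketch below (the bucket name, mount point, and credential-file path are placeholders; `passwd_file` and `use_cache` are commonly used s3fs options):

```shell
# Store the credentials s3fs will use (format: ACCESS_KEY:SECRET_KEY)
echo 'ACCESS_KEY_ID:SECRET_ACCESS_KEY' > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# Mount the bucket, with a local disk cache to soften S3 round-trips
mkdir -p /mnt/mybucket
s3fs mybucket /mnt/mybucket -o passwd_file=~/.passwd-s3fs -o use_cache=/tmp/s3fs
```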

Sirrah answered 29/5, 2012 at 14:27 Comment(4)
it's a shame questions like this end up being closed. but at least they are not deleted...Sirrah
Yeah give us a gosh darn opinion section or something, sheesh. Sometimes you need an opinion...Highball
Again a good question closed... It's like "closed" is a tag for good questions!Ayotte
"StackOverflow, where your question is good enough to remain on the site to get clicks, but not good enough to remain open."Senghor

There's a good article on s3fs here; after reading it, I resorted to an EBS share instead.

It highlights a few important considerations when using s3fs, namely related to the inherent limitations of S3:

  • no file can be over 5GB
  • you can't partially update a file, so changing a single byte re-uploads the entire file
  • operations on many small files are very efficient (each is a separate S3 object, after all), but large files are very inefficient
  • though S3 supports partial/chunked downloads, s3fs doesn't take advantage of this, so if you want to read just one byte of a 1GB file, you'll have to download the entire GB
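
To make the second and fourth points concrete, here is a back-of-the-envelope sketch (the 1 GiB figure is just an illustration, not from the article) of the write amplification you pay when s3fs rewrites a whole object for a one-byte change:

```shell
# Illustration of s3fs's whole-object rewrite: touching 1 byte of a
# 1 GiB file still re-uploads (and, for reads, re-downloads) 1 GiB.
file_size=$(( 1024 * 1024 * 1024 ))  # 1 GiB object
changed_bytes=1                      # bytes actually modified
transferred=$file_size               # s3fs must PUT the whole object
echo "changed $changed_bytes byte(s), transferred $transferred bytes"
echo "write amplification: $(( transferred / changed_bytes ))x"
```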

Whether s3fs is a feasible option therefore depends on what you are storing. If you're storing, say, photos, where you write an entire file or read an entire file and never incrementally change it, then it's fine, although one may ask: if you're doing this, why not just use S3's API directly?

If you're talking about application data (say, database files or log files) where you want to make small incremental changes, then it's a definite no - S3 just doesn't work that way; you can't incrementally change a file.

The article mentioned above does talk about a similar application - s3backer - which gets around the performance issues by implementing a virtual filesystem over S3. That solves the performance problems but introduces a few issues of its own:

  • high risk of data corruption, due to the delayed writes
  • too-small block sizes (e.g., the 4K default) can add significant extra costs (e.g., $130 for 50GB with 4K blocks' worth of storage)
  • too-large block sizes can add significant data transfer and storage fees
  • memory usage can be prohibitive: by default it caches 1,000 blocks; with the default 4K block size that's not an issue, but most users will probably want to increase the block size
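
The "$130 for 50GB" figure roughly checks out if you assume the S3 request pricing of the era, about $0.01 per 1,000 PUT requests (a pricing assumption; check current rates): storing 50GB at 4K per block means one PUT per block:

```shell
# Rough check of the "$130 for 50GB with 4K blocks" figure, assuming
# ~$0.01 per 1,000 PUT requests (a pricing assumption of the era).
store_bytes=$(( 50 * 1024 * 1024 * 1024 ))  # 50 GiB to store
block_size=4096                             # s3backer's 4K default
blocks=$(( store_bytes / block_size ))      # one PUT per block
cost=$(awk -v n="$blocks" 'BEGIN { printf "%.2f", n / 1000 * 0.01 }')
echo "$blocks PUTs => ~\$$cost in request fees"
```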

I resorted to an EBS-mounted drive shared from an EC2 instance. It's the most performant option, but it has one big problem: an EBS-backed NFS share is a single point of failure. If the machine sharing the EBS volume goes down, you lose access on every machine that mounts the share.

This is a risk I was able to live with and was the option I chose in the end.
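
The EBS-over-NFS setup described above amounts to something like the following sketch (the device name, paths, hostname, and client subnet are all placeholders for your own values):

```shell
# On the instance that owns the EBS volume:
mount /dev/xvdf /mnt/ebs-share                  # mount the attached EBS volume
echo '/mnt/ebs-share 10.0.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra                                    # publish the NFS export

# On each client instance:
mount -t nfs nfs-server:/mnt/ebs-share /mnt/shared
```

Note that every client depends on the exporting instance staying up - the single point of failure mentioned above.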

Condole answered 30/5, 2012 at 16:41 Comment(5)
answering your question: "if you're doing this, then why not just use S3's API Directly?" I will use S3's API, but I need a quick solution where my already-working single-server app finds using S3 transparent. I can't live with a single point of failure (EBS). Is it transparent to s3fs if S3 fails and another is automatically mounted by Amazon? I'm not sure how that works...Sirrah
In that case I would probably just store it on the hard drive of the server in a temp folder. Otherwise you're effectively uploading to S3 twice!! Putting the file in an S3-based temp folder will take just as long as using the API to upload to S3 directlyCondole
what I mean is that in the near future I'll stop using s3fs and start using the S3 API; in the meantime i'll use s3fs to get the system running... thank you very much...Sirrah
There are people using google's gsutil to access S3 directly as well, and now there is efs.Prehensile
SFTP Gateway on the Amazon Web Services marketplace could be another option - aws.amazon.com/marketplace/pp/B072M8VY8M/…Perigordian

This is an old question so I'll share my experience over the past year with S3FS.

Initially, it had a number of bugs and memory leaks (I had a cron-job to restart it every 2 hours) but with the latest release 1.73 it's been very stable.

The best thing about S3FS is that you have one less thing to worry about, and you get some performance benefits for free.

Most of your S3 requests are going to be PUT (~5%) and GET (~95%). If you don't need any post-processing (thumbnail generation, for example), you shouldn't be hitting your web server in the first place; upload directly to S3 (using CORS).

Assuming you are hitting the server, you probably need to do some post-processing on images. With the S3 API you'll be uploading to the server, then uploading to S3. If the user wants to crop, you'll need to download again from S3, re-upload to the server, crop, and then upload to S3 again. With S3FS and local caching turned on, this orchestration is taken care of for you and saves re-downloading files from S3.

On caching: if you are caching to an ephemeral drive on EC2, you get the performance benefits that come with it and can purge your cache without having to worry about anything. Unless you run out of disk space, you should have no reason to purge your cache. This makes traversal operations like searching and filtering much easier.

The one thing I do wish it had is full sync with S3 (rsync-style). That would make it an enterprise version of Dropbox or Google Drive for S3, but without having to contend with the quotas and fees that come with those.
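
For what it's worth, an rsync-style one-way sync can be approximated outside S3FS with s3cmd or the AWS CLI (the bucket name and paths below are placeholders):

```shell
# One-way rsync-style sync of a local directory into S3 with s3cmd;
# --delete-removed also removes keys whose local files are gone
s3cmd sync --delete-removed /var/www/uploads/ s3://mybucket/uploads/

# Equivalent with the AWS CLI
aws s3 sync /var/www/uploads/ s3://mybucket/uploads/ --delete
```

Both only copy files that changed, so they are close in spirit to rsync, though neither does delta transfers within a file.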

Toffeenosed answered 20/10, 2013 at 12:9 Comment(1)
with regards to "full sync", are you referring to something like this (s3tools.org/s3cmd-sync) integrated into S3FS?Pasho

© 2022 - 2024 — McMap. All rights reserved.