What does s3fs cache in /tmp?

I'm using s3fs to upload a lot of files to a mounted S3 bucket. It works fine, except that my local disk usage is also growing a lot (the contents of the /tmp directory).

My command is:

$ su ec2-user -c '/usr/bin/s3fs my-bucket-name -o use_cache=/tmp /home/ec2-user/dir'

I'm using the use_cache parameter, but what is actually cached? Are these files that still need to be uploaded to S3 and are being held on my local machine? Can I just delete them during the upload/mount or not? And will my upload go quicker if I turn the cache off (if it's there for other purposes)?

Groggery answered 21/1, 2019 at 14:45

From the s3fs wiki (which is a bit hard to find).

If enabled via "use_cache" option, s3fs automatically maintains a local cache of files in the folder specified by use_cache. Whenever s3fs needs to read or write a file on s3 it first downloads the entire file locally to the folder specified by use_cache and operates on it. When fuse release() is called, s3fs will re-upload the file to s3 if it has been changed. s3fs uses md5 checksums to minimize downloads from s3. Note: this is different from the stat cache (see below).

Local file caching works by calculating and comparing md5 checksums (ETag HTTP header).
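To make the checksum comparison concrete, here is a rough Python sketch of the idea, not s3fs's actual code. The function names are made up for illustration. It assumes the object was uploaded in a single part without SSE-KMS, because only then is the ETag a plain MD5 of the content.

```python
import hashlib


def local_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_copy_is_current(path, etag):
    """Compare a cached file against an ETag header value.

    Assumption: a single-part upload, so the ETag is the MD5 of the
    content. Multipart ETags carry a "-<parts>" suffix and are not a
    plain MD5, so they are reported as "not current" here.
    """
    etag = etag.strip('"')  # the header value is usually quoted
    if "-" in etag:
        return False
    return local_md5(path) == etag.lower()
```

If the two values match, the cached copy can be reused and the download skipped; otherwise the file has to be fetched again.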

The folder specified by use_cache is just a local cache. It can be deleted at any time. s3fs re-builds it on demand. Note: this directory grows unbounded and can fill up a file system dependent upon the bucket and reads to that bucket.
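Since the cache grows without bound but can be deleted at any time, one option is to trim it periodically (from cron, for example). Below is a minimal Python sketch that removes the least recently accessed files until the directory fits under a size limit. `prune_cache` and the limit are hypothetical names and values, and the cache path in the usage line assumes s3fs stores cached files under `<use_cache>/<bucket-name>`, which you should verify on your own system.

```python
import os


def prune_cache(cache_dir, max_bytes):
    """Delete least recently accessed files until the total size of
    cache_dir is at most max_bytes. Returns the removed paths.

    Note: sizes are apparent file sizes (st_size), which may differ
    from actual disk usage for sparse files.
    """
    entries = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished while scanning
            entries.append((st.st_atime, st.st_size, path))

    total = sum(size for _atime, size, _path in entries)
    removed = []
    for _atime, size, path in sorted(entries):  # oldest access time first
        if total <= max_bytes:
            break
        try:
            os.remove(path)
        except OSError:
            continue  # could not delete; leave it and move on
        total -= size
        removed.append(path)
    return removed
```

For the mount in the question this might be called as `prune_cache("/tmp/my-bucket-name", 10 * 1024**3)` to keep roughly 10 GiB. Recent s3fs-fuse versions also document `ensure_diskfree` and `del_cache` options; check `man s3fs` for your version before relying on them.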

Ditheism answered 21/1, 2019 at 15:0 Comment(2)
What is the point of going to the trouble of mounting S3 if this happens? So if this is a 200GB SQL Server BAK file, I need to have 200GB free locally, and more if SSMS thinks it is doing me a favour by reading BAK files I was not intending to select. I guess I am not using the cache then. - Atrip
You're correct. I wouldn't suggest using the cache for files that are infrequently accessed, as hopefully your .BAK files are, or that are primarily read from cache to S3. For pulling frequently accessed files into a local cache, it makes sense on the write side. - Ditheism
