I'm implementing a FUSE filesystem intended to provide access via familiar POSIX calls to files which are actually stored behind a RESTful API. The filesystem caches files once they've been retrieved for the first time so that they are more readily available in subsequent accesses.
I'm running the filesystem in multi-threaded mode (which is FUSE's default) but finding that the getattr calls seem to be serialised, even though other calls can take place in parallel.
When opening a file FUSE always calls getattr first, and the client I'm supporting needs the file size returned by this initial call to be accurate (I don't have any control over this behaviour). This means that if I don't have the file cached I need to actually get the information via the RESTful API calls. Sometimes these calls happen over a high latency network, with a round trip time of approximately 600ms.
As a result of the apparent sequential nature of the getattr call, any access to a file which is not currently cached will cause the entire filesystem to block any new operations while this getattr is serviced.
I've come up with a number of ways to work round this, but all seem ugly or long-winded, really I just want the getattr calls to run in parallel like all the other calls seem to.
Looking at the source code I don't see why getattr should be behaving like this, FUSE does lock the tree_lock mutex, but only for read, and there are no writes happening at the same time.
For the sake of posting something simple in this question I've knocked up an incredibly basic implementation which just supports getattr and allows easy demonstration of the issue.
#ifndef FUSE_USE_VERSION
#define FUSE_USE_VERSION 22
#endif
#include <fuse.h>
#include <iostream>
static int GetAttr(const char *path, struct stat *stbuf)
{
std::cout << "Before: " << path << std::endl;
sleep(5);
std::cout << "After: " << path << std::endl;
return -1;
}
static struct fuse_operations ops;
int main(int argc, char *argv[])
{
ops.getattr = GetAttr;
return fuse_main(argc, argv, &ops);
}
Using a couple of terminals to call ls on a path at (roughly) the same time shows that the second getattr call only starts once the first has finished, this causes the second ls to take ~10 seconds instead of 5.
Terminal 1
$ date; sudo ls /mnt/cachefs/file1.ext; date
Tue Aug 27 16:56:34 BST 2013
ls: /mnt/cachefs/file1.ext: Operation not permitted
Tue Aug 27 16:56:39 BST 2013
Terminal 2
$ date; sudo ls /mnt/cachefs/file2.ext; date
Tue Aug 27 16:56:35 BST 2013
ls: /mnt/cachefs/file2.ext: Operation not permitted
Tue Aug 27 16:56:44 BST 2013
As you can see, the time difference from the two date
outputs from before the ls
differs only by one second, but the two from after the ls
differs by 5 seconds, which corresponds to the delay in GetAttr
function. This suggests that the second call is blocked somewhere deep in FUSE.
Output
$ sudo ./cachefs /mnt/cachefs -f -d
unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
INIT: 7.10
flags=0x0000000b
max_readahead=0x00020000
INIT: 7.8
flags=0x00000000
max_readahead=0x00020000
max_write=0x00020000
unique: 1, error: 0 (Success), outsize: 40
unique: 2, opcode: LOOKUP (1), nodeid: 1, insize: 50
LOOKUP /file1.ext
Before: /file1.ext
After: /file1.ext
unique: 2, error: -1 (Operation not permitted), outsize: 16
unique: 3, opcode: LOOKUP (1), nodeid: 1, insize: 50
LOOKUP /file2.ext
Before: /file2.ext
After: /file2.ext
unique: 3, error: -1 (Operation not permitted), outsize: 16
The above code and examples are nothing like the actual application or how the application is used, but demonstrates the same behaviour. I haven't shown this in the example above, but I've found that once the getattr call completes, subsequent open calls are able to run in parallel, as I would have expected.
I've scoured the docs to try and explain this behaviour and tried to find someone else reporting a similar experience but can't seem to find anything. Possibly because most implementations of getattr would be so quick you wouldn't notice or care if it was being serialised, or maybe because I'm doing something silly in the configuration. I'm using version 2.7.4 of FUSE, so it's possible this was an old bug that has since been fixed.
If anyone has any insight in to this it would be greatly appreciated!