FUSE - avoid calculating size in GetAttr

I'm implementing a FUSE file system for a remote service. When the user opens a file I do a network call to get the file's contents. It appears that the file's size must be reported through GetAttr in order for open to work.
In order to know the file's size, I have to issue a network call, and since GetAttr is called for every entry when doing ls, I'm concerned about this design (if a user does ls in a directory with many items, it will have to get all the files, even if the user didn't want to open any of them).

How can I work around this problem? My thoughts were:

  • Use a lower level method for reading that doesn't rely on reported size? I thought using Read instead of Open could help, however I couldn't get it to work without a size.
  • If I could distinguish GetAttr calls that originated from Open from other calls (including ls), I could issue the network calls only when needed.

I use Go and go-fuse, but I think it shouldn't matter because it's a general FUSE question.

Also, the FUSE documentation is very minimal (practically missing). It would be nice if someone familiar with the matter could explain the call flow for ls, cd and cat: which FUSE functions are called, and in which order.
For example, why are there both Open and Read?

Update:
I've been browsing SSHFS which is considered the canonical example for a FUSE filesystem, and it seems that it also gets the file over network on getattr: https://github.com/libfuse/sshfs/blob/master/sshfs.c#L3167
What do you think?

Colwin answered 17/9, 2017 at 19:20 Comment(0)

The problem you are seeing occurs because the kernel is buffering your reads, and when it does so, it uses the size of the inode to calculate exactly how many bytes it has to copy to userspace (https://elixir.bootlin.com/linux/v4.19.7/source/mm/filemap.c#L2137). There are two workarounds:

  1. Return a huge st_size from GetAttr.

  2. When you open the file, set the direct_io flag so that reads bypass the page cache.
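To make the difference concrete, here is a toy model in Go. It is not kernel code and does not use go-fuse or libfuse; `remote`, `handlerRead`, `bufferedRead` and `readAll` are names made up for illustration. It only shows the idea: on the buffered path the request is clamped to the size reported by GetAttr, while on a direct_io-style path the read goes straight to the handler and ends when the handler returns 0 bytes.

```go
package main

import "fmt"

// remote stands in for file contents that live on the remote service.
var remote = []byte("hello from the remote service\n")

// handlerRead models the filesystem's Read callback: it copies data
// starting at off and returns 0 once off is at or past the end.
func handlerRead(dst []byte, off int64) int {
	if off < 0 || off >= int64(len(remote)) {
		return 0
	}
	return copy(dst, remote[off:])
}

// bufferedRead models the page-cache path: the request is clamped to the
// size reported by GetAttr before the handler is consulted.
func bufferedRead(reportedSize int64, dst []byte, off int64) int {
	if off >= reportedSize {
		return 0
	}
	if limit := reportedSize - off; int64(len(dst)) > limit {
		dst = dst[:limit]
	}
	return handlerRead(dst, off)
}

// readAll drains a read function the way cat does: until it returns 0.
func readAll(read func(dst []byte, off int64) int) []byte {
	var out []byte
	buf := make([]byte, 8)
	for off := int64(0); ; {
		n := read(buf, off)
		if n == 0 {
			return out
		}
		out = append(out, buf[:n]...)
		off += int64(n)
	}
}

func main() {
	// GetAttr reported size 0: the buffered path never reaches the data.
	buffered := readAll(func(d []byte, o int64) int { return bufferedRead(0, d, o) })
	// direct_io style: the handler alone decides where the file ends.
	direct := readAll(handlerRead)
	fmt.Printf("buffered, size 0: %d bytes\n", len(buffered))
	fmt.Printf("direct_io style:  %d bytes\n", len(direct))
}
```

In this model a reported size of 0 yields an empty read on the buffered path, which is why workaround 1 overstates the size and workaround 2 avoids depending on it.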

Overgrowth answered 8/12, 2018 at 6:21 Comment(1)
Setting the file size to something huge confuses some programs so that they attempt to allocate memory for the huge size and fail. – Lowborn

I don't know go-fuse's API, so the information below is based on libfuse's API.

SSHFS implements getattr in the function sshfs_getattr, and it looks like it sends a network request to get the file's size information.

When you run cd, the .access callback is called to check that the directory exists.

When you run ls, the .readdir callback is called first to get the directory's entries, then .getattr is called for each file in that directory.

When you run cat, .getattr is called first for the file and for its path. Then come .open => .read => .release.

FUSE lacks documentation, so it's best to write a small example first and add some printf calls to those callbacks to see what is going on.

  1. In .open, you can create private data and store it in fuse_file_info::fh. This fuse_file_info::fh can then be used in later .read callbacks.
  2. You can set all size information to zero in the .getattr callback. Then, in .open, set fuse_file_info::direct_io to 1. In .read, first read the data from the network; once you reach the end of the file, return 0 from .read.
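The two steps above can be sketched roughly as follows. This is a library-independent model in Go, not go-fuse or libfuse code: the `fs` type, its methods, and the `fetch` function (a stand-in for the network call) are all hypothetical. The numeric handle plays the role that fuse_file_info::fh plays in libfuse, and the sketch assumes direct_io is set so that the size reported by getattr is not used to bound reads.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchCalls counts how often the (hypothetical) network call is made.
var fetchCalls int

// fetch is a stand-in for downloading a file from the remote service.
func fetch(path string) []byte {
	fetchCalls++
	return []byte("contents of " + path)
}

// fs keeps per-open state keyed by a numeric handle.
type fs struct {
	mu      sync.Mutex
	next    uint64
	handles map[uint64][]byte
}

func newFS() *fs { return &fs{handles: make(map[uint64][]byte)} }

// getattrSize reports size 0 and never touches the network.
func (f *fs) getattrSize(path string) int64 { return 0 }

// open downloads the content once and returns a handle for later reads.
func (f *fs) open(path string) uint64 {
	data := fetch(path)
	f.mu.Lock()
	defer f.mu.Unlock()
	f.next++
	f.handles[f.next] = data
	return f.next
}

// read serves bytes from the handle's data; 0 signals end of file.
func (f *fs) read(fh uint64, dst []byte, off int64) int {
	f.mu.Lock()
	defer f.mu.Unlock()
	data := f.handles[fh]
	if off < 0 || off >= int64(len(data)) {
		return 0
	}
	return copy(dst, data[off:])
}

// release drops the per-open state.
func (f *fs) release(fh uint64) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.handles, fh)
}

func main() {
	f := newFS()
	for _, p := range []string{"a", "b", "c"} { // roughly what ls triggers
		f.getattrSize(p)
	}
	fh := f.open("a") // roughly what cat triggers
	var out []byte
	buf := make([]byte, 4)
	for off := int64(0); ; {
		n := f.read(fh, buf, off)
		if n == 0 {
			break
		}
		out = append(out, buf[:n]...)
		off += int64(n)
	}
	f.release(fh)
	fmt.Printf("%q after %d network call(s)\n", out, fetchCalls)
}
```

The point of the design is that listing a directory costs no network traffic; only opening a file does.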

This doc helped me a lot when I wrote my filesystem.

Auspicious answered 14/4, 2020 at 16:29 Comment(1)
Setting the file size to zero confuses some programs so that they don't even try to read the file. – Lowborn
