Retrieve filename from file descriptor in C

Is it possible to get the filename of a file descriptor (Linux) in C?

Propagandize asked 27/7, 2009 at 15:14
I guess the accepted answer should go to zneak, as his solution is more portable and has no known access problems. - Vulcanize
It's not supported on Ubuntu 14.04 (kernel 3.16.0-76-generic). I'm guessing it's not supported on Linux at all. - Chrissa
For macOS, see this answer to another question by D.Nathanael. - Sweeney

You can use readlink on /proc/self/fd/NNN where NNN is the file descriptor. This will give you the name of the file as it was when it was opened — however, if the file was moved or deleted since then, it may no longer be accurate (although Linux can track renames in some cases). To verify, stat the filename given and fstat the fd you have, and make sure st_dev and st_ino are the same.

Of course, not all file descriptors refer to files; for those you'll see odd strings such as pipe:[1538488]. Since all real filenames are absolute paths, you can tell which ones these are easily enough. Further, as others have noted, a file can have multiple hard links pointing to it, and this will only report the one it was opened through. If you want to find all names for a given file, you'll have to traverse the entire filesystem.
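
A small sketch of that absolute-path test, again assuming Linux /proc. The function name fd_link_target is hypothetical. Note that a deleted file's link target still starts with '/' (Linux appends " (deleted)"), so this check only separates filesystem paths from pseudo-names like pipe:[1538488]; use the stat/fstat comparison to detect stale names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux-only): reads the /proc/self/fd/<fd> link into buf
 * and sets *is_path to true only when the target is an absolute path.
 * Pipes, sockets and similar objects appear as strings such as
 * "pipe:[1538488]" that never start with '/'. Returns 0 on success, -1 on error. */
int fd_link_target(int fd, char *buf, size_t size, bool *is_path)
{
    char proc_path[64];
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);

    ssize_t n = readlink(proc_path, buf, size - 1);
    if (n < 0 || (size_t)n >= size - 1)
        return -1;
    buf[n] = '\0';

    /* Real files always show up as absolute paths; pseudo-files don't. */
    *is_path = (buf[0] == '/');
    return 0;
}
```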

Twicetold answered 27/7, 2009 at 17:38
Interesting warning about this: on ext4, the same st_ino number gets reused with alarming regularity. I tracked down a bug in a program I was working with to a stale file descriptor: it did a stat on the filename and an fstat on the descriptor and concluded they were the same file. In fact, the file had been rewritten and renamed over the top twice. - Photoengraving
As long as the original file still has references to it (an open fd is such a reference), its inode number cannot be reused. Any software that uses an inode number after closing the file, or before opening it, is inherently subject to race conditions. - Ledesma
@R, indeed, which is precisely why I suggested using fstat, which requires having the handle open :) - Twicetold
Danger, Will Robinson! This does not always work: if you do setuid() tricks, /proc/self/fd may not be accessible to your process. See: permalink.gmane.org/gmane.linux.kernel/1302546 - Nela
@Twicetold: And what if /proc is not mounted? - Spillar
@user2284570, then mount /proc :) Most introspection of this sort requires procfs to be mounted somewhere. - Twicetold
@Twicetold: That requires root access on my shared NetBSD hosting provider (there is no virtualisation). - Spillar
@user2284570, this answer is Linux-specific. I don't know whether NetBSD supports procfs at all; if your shared host doesn't provide it, it's probably because NetBSD doesn't support it and uses another mechanism instead. You may want to post another question with a NetBSD focus to see if anyone knows how NetBSD exposes this information (you might also try zneak's answer below, since OS X is more similar to BSD than Linux is). - Twicetold
@Twicetold: NetBSD supports /proc, but mounting it isn't mandatory. Each time I've raised this, the answer has been "switch to a higher-cost provider and you'll get /proc". So I'm looking for a solution that doesn't need /proc. - Spillar
Agreed, using fcntl is much more portable and should be preferred. - Vulcanize
This will not work with memory-mapped files, as there is no procfs entry for the associated file descriptor. - Kitten
What is the equivalent of this on Windows? - Footrace

I had this problem on Mac OS X. We don't have a /proc virtual file system, so the accepted solution cannot work.

We do, instead, have a F_GETPATH command for fcntl:

 F_GETPATH          Get the path of the file descriptor Fildes.  The argu-
                    ment must be a buffer of size MAXPATHLEN or greater.

So, to get the file associated with a file descriptor, you can use this snippet:

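#include <sys/syslimits.h> // PATH_MAX (<sys/param.h> defines MAXPATHLEN as PATH_MAX)
#include <fcntl.h>         // fcntl(), F_GETPATH
#include <stdio.h>         // perror()

char filePath[PATH_MAX];
if (fcntl(fd, F_GETPATH, filePath) != -1)
{
    // do something with the file path
}
else
{
    perror("fcntl(F_GETPATH)"); // e.g. EBADF if fd is not an open descriptor
}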

Since I never remember where MAXPATHLEN is defined, I thought PATH_MAX from syslimits would be fine.
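
If the same code has to build on both macOS and Linux, one option is to pick the mechanism at compile time. This is a sketch under assumptions: the helper name path_from_fd is hypothetical, the feature-test macros at the top reflect assumed build flags (other BSDs may need their own macro to expose F_GETPATH), and the Linux branch relies on /proc being mounted.

```c
#define _POSIX_C_SOURCE 200809L   /* assumption: readlink() under strict -std flags */
#define _DARWIN_C_SOURCE          /* assumption: keeps F_GETPATH visible on macOS */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096             /* assumption: fallback if <limits.h> lacks it */
#endif

/* Hypothetical helper: fills buf (PATH_MAX bytes) with the path of fd.
 * Uses fcntl(F_GETPATH) where the platform defines it, otherwise the Linux
 * /proc/self/fd symlink. Returns 0 on success, -1 on failure. */
int path_from_fd(int fd, char buf[PATH_MAX])
{
#ifdef F_GETPATH
    return fcntl(fd, F_GETPATH, buf) == -1 ? -1 : 0;
#else
    char proc_path[64];
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(proc_path, buf, PATH_MAX - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';   /* readlink() does not NUL-terminate */
    return 0;
#endif
}
```

The #ifdef keeps a single call site; the hard-link and rename caveats from the other answers still apply to either branch.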

Hawker answered 24/11, 2012 at 18:52
@uchuugaka, probably not. Use getsockname. - Hawker
getsockname gives an IP address. - Benediction
What do you expect? Unless it's a UNIX socket, it has no file associated with it. - Hawker
Hmm, but everything is a file... there is a descriptor... it responds to many of the same things as "files"... I was hoping for the convenience anyway. There's probably a way to fudge it with file descriptors... - Benediction
It'd be better to test for == -1 and handle the failure in the if statement, then do something with filePath below. - Archery
@Benediction Yes, everything is a file, but not everything is a directory entry with a name and a location inside the filesystem tree. A file is represented by an inode, and it can exist without any directory entry referring to it. - Holt
In <sys/param.h>: #define MAXPATHLEN PATH_MAX - Gibeon
I just tested this, and it stays correct if the file is moved and you call it again (that is, you get the new path of the file). However, this is not supported on Linux (tested on Ubuntu 14.04: F_GETPATH is not defined). - Chrissa
F_GETPATH exists neither on Linux nor even on FreeBSD (from which macOS was once derived). - Mayramays

In Windows, with GetFileInformationByHandleEx, passing FileNameInfo, you can retrieve the file name.

Seismism answered 27/7, 2009 at 15:20

As Tyler points out, there's no way to do what you require "directly and reliably", since a given FD may correspond to 0 filenames (in various cases) or to more than 1 (the latter situation is generally described as multiple "hard links"). If you still need the functionality despite all the limitations (on speed, AND on the possibility of getting 0, 2, ... results rather than 1), here's how you can do it: first, fstat the FD. This tells you, in the resulting struct stat, which device the file lives on, how many hard links it has, whether it's a special file, etc. This may already answer your question: e.g., if it has 0 hard links, you KNOW there is in fact no corresponding filename on disk.
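
The fstat step above might look like this minimal sketch; the helper name link_count is hypothetical. A result of 0 is the "no corresponding filename on disk" case, which is exactly what you get for a file that was unlinked while still open.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper for the first step: fstat() the descriptor and report
 * its hard-link count and whether it is a regular file. A count of 0 means
 * no name for the file remains anywhere on disk, so searching is pointless.
 * Returns the link count, or -1 if fstat() fails. */
long link_count(int fd, bool *is_regular)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    *is_regular = S_ISREG(st.st_mode);
    return (long)st.st_nlink;
}
```

The same struct stat also carries st_dev and st_ino, which any later directory search needs to compare against.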

If the stats give you hope, then you have to "walk the tree" of directories on the relevant device until you find all the hard links (or just the first one, if you don't need more than one and any one will do). For that purpose, you use readdir (and opendir etc., of course), recursively opening subdirectories until you find a struct dirent whose inode number matches the one in the original struct stat. At that point, if you want the whole path rather than just the name, you'll need to walk the chain of directories backwards to reconstruct it.
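
A sketch of the tree walk, with one swap stated plainly: instead of hand-written recursion over opendir()/readdir(), it uses POSIX nftw(), which does the recursion and passes each entry's full path to the callback, so no backward reconstruction is needed. The names find_names and found_path are hypothetical. FTW_PHYS avoids following symlinks and FTW_MOUNT keeps the walk on the filesystem it starts on, so start it at the mount point of the file's device.

```c
#define _XOPEN_SOURCE 700   /* for nftw() */
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* State shared with the nftw() callback (nftw() has no user-data argument). */
static dev_t target_dev;
static ino_t target_ino;
static char found_path[4096];   /* assumption: large enough for typical paths */
static int found_count;

static int match_inode(const char *path, const struct stat *sb,
                       int type, struct FTW *info)
{
    (void)type; (void)info;
    if (sb->st_dev == target_dev && sb->st_ino == target_ino) {
        if (found_count++ == 0)   /* remember the first name found */
            snprintf(found_path, sizeof found_path, "%s", path);
    }
    return 0;   /* keep walking so every hard link is counted */
}

/* Hypothetical helper: walks the tree under root looking for every name of
 * the file open on fd. Returns the number of names found (the first one is
 * copied into found_path), or -1 on error. */
int find_names(int fd, const char *root)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    target_dev = st.st_dev;
    target_ino = st.st_ino;
    found_count = 0;
    found_path[0] = '\0';
    if (nftw(root, match_inode, 16, FTW_PHYS | FTW_MOUNT) != 0)
        return -1;
    return found_count;
}
```

Returning a nonzero value from match_inode would stop the walk at the first hit, which is the "any one will do" variant.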

If this general approach is acceptable, but you need more detailed C code, let us know, it won't be hard to write (though I'd rather not write it if it's useless, i.e. you cannot withstand the inevitably slow performance or the possibility of getting != 1 result for the purposes of your application;-).

Dibri answered 27/7, 2009 at 15:57

Before writing this off as impossible I suggest you look at the source code of the lsof command.

There may be restrictions, but lsof seems able to map file descriptors to file names. This information exists in the /proc filesystem, so your program should be able to get at it too.
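
On Linux, the core of what lsof shows for a single process can be approximated by listing /proc/<pid>/fd and reading each link. This is a sketch, not how lsof itself is implemented; the name list_fds is hypothetical, and the listing includes the descriptor that opendir() itself uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-only sketch: every descriptor in /proc/<pid>/fd with its link target.
 * Pass "self" for the calling process. Writes "fd -> target" lines to out and
 * returns how many were written, or -1 if the fd directory can't be opened. */
int list_fds(const char *pid, FILE *out)
{
    char dirpath[64];
    snprintf(dirpath, sizeof dirpath, "/proc/%s/fd", pid);
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')        /* skip "." and ".." */
            continue;
        char proc_path[320], target[4096];
        snprintf(proc_path, sizeof proc_path, "%s/%s", dirpath, ent->d_name);
        ssize_t n = readlink(proc_path, target, sizeof target - 1);
        if (n < 0)
            continue;                      /* descriptor closed in the meantime */
        target[n] = '\0';
        fprintf(out, "%s -> %s\n", ent->d_name, target);
        count++;
    }
    closedir(dir);
    return count;
}
```

For other processes, reading /proc/<pid>/fd is subject to the usual permission checks.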

Sis answered 27/7, 2009 at 17:30

You can use fstat() to get the file's inode number from struct stat. Then, using readdir(), you can compare that inode with the ones in a directory's entries (struct dirent), assuming you know the directory (otherwise you'll have to search the whole filesystem), and find the corresponding file name. Nasty?
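
For the known-directory case, a minimal sketch could look like the following. The name name_in_dir is hypothetical, and one deliberate deviation: each entry is lstat()'ed rather than relying on d_ino alone, because struct dirent carries no device number.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper for a known directory: finds an entry of dirpath that
 * is the same file as fd (same st_dev and st_ino) and copies its name into
 * name. Returns 0 if found, 1 if no entry matches, -1 on error. */
int name_in_dir(int fd, const char *dirpath, char *name, size_t size)
{
    struct stat target;
    if (fstat(fd, &target) != 0)
        return -1;
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int result = 1;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        char full[4096];
        snprintf(full, sizeof full, "%s/%s", dirpath, ent->d_name);
        /* lstat() rather than d_ino alone, so the device is compared too. */
        struct stat st;
        if (lstat(full, &st) == 0 && st.st_dev == target.st_dev &&
            st.st_ino == target.st_ino) {
            snprintf(name, size, "%s", ent->d_name);
            result = 0;
            break;
        }
    }
    closedir(dir);
    return result;
}
```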

Truism answered 27/7, 2009 at 15:52

Impossible. A file descriptor may have multiple names in the filesystem, or it may have no name at all.

Edit: I'm assuming you are talking about a plain old POSIX system without any OS-specific APIs, since you didn't specify an OS.

Glyceryl answered 27/7, 2009 at 15:15
Then my answer applies. Linux has no facilities to do this. Linux (POSIX) file descriptors don't necessarily refer to files, and even when they do, they refer to inodes, not file names. A descriptor can point to a deleted file (which therefore has no name; this is a common way of making temp files), or it may point to an inode with multiple names (hard links). - Glyceryl
Try taking a look at the lsof source code. :) That's what I did when I had this same question myself a while back. lsof works on black magic and sacrificial goats; you cannot hope to duplicate its behavior. To be more specific, lsof is tightly coupled with the Linux kernel and doesn't do what it does through any API available to userland code. - Glyceryl
Linux has a non-portable proc API for this. There are indeed limitations, but saying it's impossible is just plain false. - Twicetold
@Tyler, lsof runs in userspace. Therefore, there is an API for whatever it does available to userland code :) - Twicetold
@Tyler OP is running on Linux. And lsof runs fine on a number of Unix variants, including Sun, HP-UX, etc. - Sis
@Duck, the portability there is probably why lsof's source has so much black magic; each UNIX variant does it differently. The Linux proc interfaces aren't too bad, really, albeit rather sparsely documented. - Twicetold
@Twicetold That, and it has so many options it can make any sane person weep. - Sis
@Twicetold I know it runs in userspace, but that does not mean there is a well-defined, stable API for what it does. I'm not saying that you can't do what lsof does (obviously you can), but that you can't really do it without committing to doing and maintaining all of the same ridiculous things that lsof does to remain semi-portable. - Glyceryl
@Tyler, the Linux proc API is stable. It isn't portable, but if all you care about is Linux, the kernel developers are very careful not to break apps that depend on it. - Twicetold
I see a lot of mentions of reading the lsof source as if all will be revealed. Read the lsof source... you will enjoy the 'dialects' folder. - Wallen

There is no official API to do this on OpenBSD, but with some very convoluted workarounds it is still possible, using the following code. Note that you need to link with -lkvm and -lc. The code that uses FTS to traverse the filesystem is from this answer.

#include <string>
#include <vector>

#include <climits>  // _POSIX2_LINE_MAX
#include <cstdio>
#include <cstdlib>  // strtoul()
#include <cstring>

#include <sys/stat.h>
#include <fts.h>

#include <sys/sysctl.h>
#include <kvm.h>

using std::string;
using std::vector;

string pidfd2path(int pid, int fd) {
  string path; char errbuf[_POSIX2_LINE_MAX];
  kvm_t *kd = nullptr; kinfo_file *kif = nullptr; int cntp = 0;
  kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf); if (!kd) return "";
  if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, pid, sizeof(struct kinfo_file), &cntp))) {
    for (int i = 0; i < cntp; i++) {
      if (kif[i].fd_fd == fd) {
        FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
        // fts_open() expects a NULL-terminated array of root paths.
        char root_path[] = "/"; char *root[] = { root_path, nullptr };
        file_system = fts_open(root, FTS_PHYSICAL | FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
        if (file_system) {
          while ((parent = fts_read(file_system))) {
            child = fts_children(file_system, 0);
            // Visit every child, including the first one in the list.
            for (; child != nullptr; child = child->fts_link) {
              if (!S_ISSOCK(child->fts_statp->st_mode)) {
                if (child->fts_statp->st_dev == kif[i].va_fsid) {
                  if (child->fts_statp->st_ino == kif[i].va_fileid) {
                    path = child->fts_path + string(child->fts_name);
                    goto finish;
                  }
                }
              }
            }
          }
          finish:
          fts_close(file_system); 
        }
      }
    }
  }
  kvm_close(kd);
  return path;
}

int main(int argc, char **argv) {
  if (argc == 3) {
    printf("%s\n", pidfd2path((int)strtoul(argv[1], nullptr, 10), 
      (int)strtoul(argv[2], nullptr, 10)).c_str());
  } else {
    printf("usage: \"%s\" <pid> <fd>\n", argv[0]);
  }
  return 0;
}

If the function fails to find the file (for example, because it no longer exists), it returns an empty string. If the file was moved (in my experience, when moving it to the trash), the new location is returned instead, provided FTS hasn't already searched that location. The search is slower on filesystems with more files.

The deeper the search goes into the directory tree of your entire filesystem without finding the file, the more likely you are to hit a race condition, though that is still very unlikely given how fast this runs. I'm aware my OpenBSD solution is C++ and not C. Feel free to convert it to C; most of the code logic will stay the same. If I have time, I'll try to rewrite it in C soon.

Like macOS, this solution returns an arbitrary hard link (citation needed), for portability with Windows and other platforms that can only return one hard link. If you don't care about being cross-platform and want all the hard links, you could drop the goto that ends the search and return a vector instead.

DragonFly BSD and NetBSD use the same solution (the exact same code) as the macOS answer to this question, which I verified manually. If a macOS user wants the path of a file descriptor opened by any process (by passing in a process ID, rather than being limited to the calling process) and wants all hard links rather than an arbitrary one, see this answer. It should be far more performant than traversing your entire filesystem, similar in speed to Linux and the other more straightforward solutions. FreeBSD users can find what they are looking for in this question, because the OS-level bug mentioned there has since been fixed in newer OS versions.

Here's a more generic solution that can only retrieve the path of a file descriptor opened by the calling process. It should work out of the box on most Unix-likes, with the same caveats as the previous solution regarding hard links and race conditions, though it runs slightly faster because it has fewer conditionals and loops:

#include <string>

#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>

using std::string;

string fd2path(int fd) {
  string path;
  // fstat() the descriptor once, up front, instead of once per directory entry.
  struct stat info = {};
  if (fstat(fd, &info) != 0 || S_ISSOCK(info.st_mode)) return path;
  // fts_open() expects a NULL-terminated array of root paths.
  char root_path[] = "/";
  char *root[] = { root_path, nullptr };
  FTS *file_system = fts_open(root, FTS_PHYSICAL | FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
  if (file_system) {
    while (fts_read(file_system) != nullptr) {
      // Check every child of the current directory, including the first one.
      for (FTSENT *child = fts_children(file_system, 0); child; child = child->fts_link) {
        if (child->fts_statp->st_dev == info.st_dev &&
            child->fts_statp->st_ino == info.st_ino) {
          path = child->fts_path + string(child->fts_name);
          goto finish;
        }
      }
    }
    finish:
    fts_close(file_system);
  }
  return path;
}

An even quicker solution, also limited to the calling process: wrap all your calls to fopen() and open() (and Windows-only wide-character equivalents such as _wfopen_s(), needed for non-ASCII paths, which won't work on UWP) in helper functions that record each file descriptor alongside the absolute version of the path passed in, using whatever C equivalent of std::unordered_map you prefer. On Unix-likes you can get the absolute path with realpath(); on Windows, use GetFullPathNameW() (the W suffix selects the wide-character, UTF-16 variant). realpath() also resolves symbolic links (which are far less common on Windows), and both realpath() and GetFullPathNameW() turn a relative path into an absolute one. You will likely have to write the map yourself, using malloc()'d and eventually free()'d arrays of ints and C strings.

This is faster than any solution that searches the filesystem dynamically, but it has a different and unappealing limitation: it won't notice files that are moved around on your filesystem, nor whether a file was replaced after you opened it and stored its path, so its results can be out of date. You can at least check whether the file was deleted with your own existence test. Let me know if you would like to see a code example of this, though because files can change location, I do not recommend this solution.

Sadie answered 16/1, 2022 at 7:51
10 levels of nesting with a goto?!?!?! Read this: Invert "if" statement to reduce nesting - Ela
@AndrewHenle I guess it's a matter of personal preference, but I generally don't like having multiple return points that each require more calls to functions that free memory. That makes it easier to accidentally double free, to leak memory by not freeing in the right places, or to free too much or too little. Even in a failure case where it doesn't find the file because it was deleted, it takes less than a second on my system, and I have several tens of thousands of files in my filesystem right now. - Sadie
