flock(): removing locked file without race condition?
R

3

36

I'm using flock() for inter-process named mutexes: a process can decide to hold a lock on "some_name", which is implemented by locking a file named "some_name" in a temp directory:

lockfile = "/tmp/some_name.lock";
fd = open(lockfile, O_CREAT | O_RDWR, 0644);  /* O_CREAT requires a mode argument */
flock(fd, LOCK_EX);

do_something();

unlink(lockfile);
flock(fd, LOCK_UN);

The lock file should be removed at some point, to avoid filling the temp directory with hundreds of files.

However, there is an obvious race condition in this code; example with processes A, B and C:

A opens file
A locks file
B opens file
A unlinks file
A unlocks file
B locks file (B holds a lock on the deleted file)
C opens file (a new file is created)
C locks file (two processes hold the same named mutex !)
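
The sequence above can be replayed deterministically in a single process, since each open() call gets its own open file description and flock() locks belong to the open file description rather than to the process. The following is only a sketch: the function name `replay_race` and the path passed to it are hypothetical, and it assumes a Linux-like system where flock() behaves this way.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: replays steps A, B and C with three descriptors.
 * Returns 1 if "B" and "C" both end up holding exclusive locks on
 * different inodes, 0 if not, -1 on error. */
int replay_race(const char *lockfile)
{
    struct stat st_b, st_c;

    unlink(lockfile);                                     /* clean start */

    int fd_a = open(lockfile, O_CREAT | O_RDWR, 0644);    /* A opens file */
    if (fd_a < 0) return -1;
    if (flock(fd_a, LOCK_EX) != 0) { close(fd_a); return -1; } /* A locks */

    int fd_b = open(lockfile, O_CREAT | O_RDWR, 0644);    /* B opens file */
    if (fd_b < 0) { close(fd_a); return -1; }

    unlink(lockfile);                                     /* A unlinks file */
    flock(fd_a, LOCK_UN);                                 /* A unlocks file */

    /* B locks the deleted file. */
    int b_locked = flock(fd_b, LOCK_EX | LOCK_NB) == 0;

    /* C opens the name again, which creates a brand-new file, and locks it. */
    int fd_c = open(lockfile, O_CREAT | O_RDWR, 0644);
    if (fd_c < 0) { close(fd_a); close(fd_b); return -1; }
    int c_locked = flock(fd_c, LOCK_EX | LOCK_NB) == 0;

    /* The old inode is still open through fd_b, so the new file differs. */
    fstat(fd_b, &st_b);
    fstat(fd_c, &st_c);
    int different = st_b.st_ino != st_c.st_ino;

    close(fd_a);
    close(fd_b);
    close(fd_c);
    unlink(lockfile);
    return b_locked && c_locked && different;
}
```

A return value of 1 means both non-blocking lock attempts succeeded, which is exactly the "two holders of one named mutex" situation from the list.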

Is there a way to remove the lock file at some point without introducing this race condition?

Retention answered 17/7, 2013 at 19:46 Comment(2)
Problem is you are trying to implement a fine-grained locking strategy (files represent resources), but you have contention on a shared coarse-grained resource (the file system). You either need a global lock on the lock file directory before you update your fine-grained locks, or you redesign your locking strategy altogether.Luckless
Can you elaborate your need? Eg why not have the programs use 1 well-known filename if they are all locking the same conceptual resource?Knitter
A
37

Sorry if I reply to a dead question:

After locking the file, look up the path again, then compare the inode number of the locked descriptor (fstat) with the inode number the path currently refers to (stat), like this:

lockfile = "/tmp/some_name.lock";

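    while(1) {
        fd = open(lockfile, O_CREAT | O_RDWR, 0644);
        flock(fd, LOCK_EX);

        /* Does the path still name the file we just locked? */
        fstat(fd, &st0);
        if(stat(lockfile, &st1) == 0 &&
           st0.st_dev == st1.st_dev &&
           st0.st_ino == st1.st_ino) break;

        /* The file was unlinked or replaced while we waited: retry. */
        close(fd);
    }

    do_something();

    unlink(lockfile);
    flock(fd, LOCK_UN);
    close(fd);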

This prevents the race condition, because if a program holds a lock on a file that is still on the file system, every other program that has a leftover file will have a wrong inode number.

I actually proved it in the state-machine model, using the following properties:

If P_i has a descriptor locked on the filesystem then no other process is in the critical section.

If P_i is after the stat with the right inode or in the critical section it has the descriptor locked on the filesystem.
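
As a rough illustration, the same inode check can be wrapped in a pair of helpers with basic error handling. This is a sketch under assumptions, not the answer's exact code: the names `named_lock_acquire` and `named_lock_release` are made up here, EINTR from flock() is not retried, and a stat() failure for any reason is treated as "file gone, try again".

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: blocks until the caller holds an exclusive flock()
 * on the file that `path` currently names. Returns the locked descriptor,
 * or -1 on error. */
int named_lock_acquire(const char *path)
{
    for (;;) {
        struct stat st_fd, st_path;
        int fd = open(path, O_CREAT | O_RDWR, 0644);
        if (fd < 0) return -1;
        if (flock(fd, LOCK_EX) != 0 || fstat(fd, &st_fd) != 0) {
            close(fd);
            return -1;
        }
        /* The lock only counts if the path still names the locked inode. */
        if (stat(path, &st_path) == 0 &&
            st_fd.st_dev == st_path.st_dev &&
            st_fd.st_ino == st_path.st_ino)
            return fd;
        close(fd);   /* unlinked or replaced while we waited: retry */
    }
}

/* Hypothetical helper: removes the name while still holding the lock,
 * then releases the lock by closing the descriptor. */
int named_lock_release(const char *path, int fd)
{
    int rc = unlink(path);
    close(fd);       /* closing the only descriptor drops the flock() */
    return rc;
}
```

A caller would do `int fd = named_lock_acquire("/tmp/some_name.lock");`, run the critical section, and then call `named_lock_release("/tmp/some_name.lock", fd);`.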

Ahvenanmaa answered 11/9, 2013 at 15:25 Comment(9)
You can unlock file after it is unlinked and closed? Manual says that flock "Apply or remove an advisory lock on the open file specified by fd".Demonstration
Shouldn't you also close the fd at the end to prevent a leak?Collie
Not portable! E.g. on Windows st_ino is always 0.Headache
What is the state machine model?Ovoviviparous
this solution does not work, the question is about 3 processes A, B , C.Tammitammie
@Demonstration — You can unlock the file after it is deleted. You cannot unlock the file after you close the file descriptor because (a) closing the file descriptor releases your locks on it and (b) you need the file descriptor to do the unlocking but the file descriptor is no longer valid after close() — though the file descriptor might be reused, leading to trackability issues. Note, though, that the loop doesn't exit via the bottom — it exits via the break before the close().Peccable
I agree this doesn't work. A takes lock, B waits; A releases lock, B acquires it, A unlinks the file; C comes in, sees no file, creates a new one. How can C figure out that B has still the old file (with different inode) locked? C doesn't know that the file existed before and had been unlinked.Jahn
@Jahn No, you messed up the execution sequence of A: A unlinks the file first, then releases the lock, not releases first.Bush
@KelvinHu, I suppose you're right. In my scenario, B would grab the lock but find the file either removed, or a different file created by C, and would give up the lock again (by closing the fd). Thus C would probably take the lock on its freshly-created file, and B would need to line up again. I think the example should handle the case that the stat() call fails.Jahn
C
9
  1. In Unix it is possible to delete a file while it is open: the inode is kept until every process that has it open has closed it or exited
  2. In Unix it is possible to check that a file has been removed from all directories by checking its link count, which becomes zero

So instead of comparing the ino-value of the old/new file paths you can simply check the nlink count on the file that is already open. It assumes that it is just an ephemeral lock file and not a real mutex resource or device.

lockfile = "/tmp/some_name.lock";

for(int attempt = 0; attempt < timeout; ++attempt) {
    int fd = open(lockfile, O_CREAT | O_RDONLY, 0444);
    if (fd < 0) return false;  // could not open or create the lock file
    int done = flock(fd, LOCK_EX | LOCK_NB);
    if (done != 0) { 
        close(fd);
        sleep(1);     // lock held by another proc
        continue;
    }
    struct stat st0;
    fstat(fd, &st0);
    if(st0.st_nlink == 0) {
       close(fd);     // lockfile deleted, create a new one
       continue;
    }
    do_something();
    unlink(lockfile); // nlink := 0 before releasing the lock
    flock(fd, LOCK_UN);
    close(fd);        // release the inode if no other proc has it open
    return true;
}
return false;
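
The link-count behaviour that this answer relies on can be checked in isolation. The sketch below uses a hypothetical helper `link_counts` and assumes a local file system with a single hard link to the file; some network file systems may report a different count after unlink().

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates `path`, then reports st_nlink of the open
 * descriptor before and after the name is unlinked.
 * Returns 0 on success, -1 on error. */
int link_counts(const char *path, long *before, long *after)
{
    struct stat st;
    int fd = open(path, O_CREAT | O_RDONLY, 0444);
    if (fd < 0) return -1;

    if (fstat(fd, &st) != 0) goto fail;
    *before = (long)st.st_nlink;      /* one directory entry points here */

    if (unlink(path) != 0) goto fail; /* remove the only name */

    if (fstat(fd, &st) != 0) goto fail;
    *after = (long)st.st_nlink;       /* inode still alive, but nameless */

    close(fd);
    return 0;
fail:
    close(fd);
    return -1;
}
```

On a typical Linux file system the two values come out as 1 and 0, which is the condition the loop above tests with `st0.st_nlink == 0`.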
Curtain answered 27/6, 2018 at 20:18 Comment(3)
@Bob how so? This code never unlinks a file that it doesn't have an exclusive lock on, and when that check is done, no other process can have such an exclusive lock, since it has the lock itself.Stonybroke
@JosephSible-ReinstateMonica This code never unlinks a file that it doesn't have an exclusive lock on The flock() is by file descriptor (actually file description), the unlink() is by name. I see nothing that prevents a rename() call on a file even if flock() put an exclusive lock on the file description.Kassala
@AndrewHenle Sure, someone coming in and doing a rename() will break things. But this code doesn't do that, and it shouldn't be a surprise at all that you can break other programs if you mess with their lock files behind their backs.Stonybroke
P
4

If you use these files for locking only, and do not actually write to them, then I suggest you treat the existence of the directory entry itself as an indication for a held lock, and avoid using flock altogether.

To do so, you need to construct an operation which creates a directory entry and reports an error if it already existed. On Linux and with most file systems, passing O_EXCL to open will work for this. But some platforms and some file systems (older NFS in particular) do not support this. The man page for open therefore suggests an alternative:

Portable programs that want to perform atomic file locking using a lockfile, and need to avoid reliance on NFS support for O_EXCL, can create a unique file on the same file system (e.g., incorporating hostname and PID), and use link(2) to make a link to the lockfile. If link(2) returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.
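
Both variants can be sketched in a few lines. The function names below (`excl_trylock`, `excl_unlock`, `link_trylock`) are hypothetical, and the unique file name format is just one possible choice following the man page's hint about hostname and PID.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Variant 1: O_EXCL. Returns 1 if the lock was taken, 0 if the lock file
 * already exists, -1 on any other error. */
int excl_trylock(const char *lockfile)
{
    int fd = open(lockfile, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0)
        return errno == EEXIST ? 0 : -1;
    close(fd);   /* the directory entry is the lock; no descriptor needed */
    return 1;
}

/* Releasing the lock means removing the directory entry. */
int excl_unlock(const char *lockfile)
{
    return unlink(lockfile);
}

/* Variant 2: link(2), as described in the man page excerpt.
 * Returns 1 if the lock was taken, 0 if not, -1 on setup errors. */
int link_trylock(const char *lockfile)
{
    char unique[4096];
    char host[256];
    struct stat st;

    if (gethostname(host, sizeof host) != 0) return -1;
    host[sizeof host - 1] = '\0';
    int n = snprintf(unique, sizeof unique, "%s.%s.%ld",
                     lockfile, host, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof unique) return -1;

    /* Create the unique file on the same file system as the lock file. */
    int fd = open(unique, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) return -1;
    close(fd);

    int got = 0;
    if (link(unique, lockfile) == 0)
        got = 1;
    else if (stat(unique, &st) == 0 && st.st_nlink == 2)
        got = 1;      /* link() reported failure but the link exists */

    unlink(unique);   /* the lock file name keeps the inode if we won */
    return got;
}
```

In both variants the holder releases the lock with unlink() on the lock file name, and a failed attempt returns immediately instead of waiting.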

So this is a locking scheme that is officially documented, which suggests a certain level of support and that it is considered good practice. But I have seen other approaches as well. bzr, for example, uses directories instead of symlinks in most places. Quoting from its source code:

A lock is represented on disk by a directory of a particular name, containing an information file. Taking a lock is done by renaming a temporary directory into place. We use temporary directories because for all known transports and filesystems we believe that exactly one attempt to claim the lock will succeed and the others will fail. (Files won't do because some filesystems or transports only have rename-and-overwrite, making it hard to tell who won.)
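
A rough C sketch of the directory-rename idea follows. It is not bzr's implementation: the helper names `dir_trylock` and `dir_unlock`, the `info` file name and the temporary name patterns are all assumptions. It relies on POSIX rename() refusing to replace a non-empty directory.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if the lock was taken, 0 if it is held
 * by someone else, -1 on error. */
int dir_trylock(const char *lockdir)
{
    char tmp[4096], info[4096 + 8];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp.XXXXXX", lockdir);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;
    if (mkdtemp(tmp) == NULL) return -1;

    /* The info file keeps the directory non-empty, so a later rename()
     * onto a held lock fails instead of silently replacing it. */
    snprintf(info, sizeof info, "%s/info", tmp);
    FILE *f = fopen(info, "w");
    if (f == NULL) { rmdir(tmp); return -1; }
    fprintf(f, "pid %ld\n", (long)getpid());
    fclose(f);

    if (rename(tmp, lockdir) == 0) return 1;   /* we won */

    int busy = (errno == ENOTEMPTY || errno == EEXIST);
    unlink(info);                              /* clean up our temp dir */
    rmdir(tmp);
    return busy ? 0 : -1;
}

/* Hypothetical helper. Moves the lock directory out of the way first,
 * then deletes it, so the lock name never refers to an empty directory. */
int dir_unlock(const char *lockdir)
{
    char tmp[4096], info[4096 + 8];
    int n = snprintf(tmp, sizeof tmp, "%s.released.%ld",
                     lockdir, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;
    if (rename(lockdir, tmp) != 0) return -1;
    snprintf(info, sizeof info, "%s/info", tmp);
    unlink(info);
    return rmdir(tmp);
}
```

The temporary directory must live on the same file system as the lock directory, otherwise rename() fails with EXDEV.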

One downside to the above approaches is that they won't block: a failed locking attempt results in an error rather than waiting until the lock becomes available. You will have to poll for the lock, which might be problematic under lock contention. In that case, you might want to depart further from your filesystem-based approach and use third-party implementations instead. But general questions on how to do IPC mutexes have already been asked, so I suggest you search for [ipc] [mutex] and have a look at the results, this one in particular. By the way, these tags might be useful for your post as well.

Puttier answered 18/7, 2013 at 4:55 Comment(2)
The problem this approach suffers is that if a process dies while holding a lock, there is no sound way to automatically reap the lock. In contrast, in the flock approach, if a process dies while holding the lock, the file will remain on the file system, but it will no longer be flock'ed; other processes may now acquire the lock.Bim
@davidg: That's a valid point. Some implementations write a timestamp and/or the pid of the locking process to the file which gets linked to the lock file name, or to a well-known file in the lock directory. That way, you can check whether a process with that ID is still alive, and you can also expire locks after a given time.Puttier
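
A minimal sketch of the PID check mentioned in this comment, assuming the lock holder and the checker run on the same host. The helper names `lock_write_owner` and `lock_is_stale` are hypothetical, and PID reuse can make a dead owner look alive, so this is a heuristic rather than a guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: records the caller's PID in the lock file. */
int lock_write_owner(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    fprintf(f, "%ld\n", (long)getpid());
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: returns 1 if the recorded owner no longer exists,
 * 0 if it appears to be alive, -1 if the file cannot be read or parsed. */
int lock_is_stale(const char *path)
{
    long pid;
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    int ok = fscanf(f, "%ld", &pid) == 1;
    fclose(f);
    if (!ok || pid <= 0) return -1;

    /* Signal 0 sends nothing; it only performs the existence and
     * permission checks. */
    if (kill((pid_t)pid, 0) == 0) return 0;
    return errno == ESRCH ? 1 : 0;   /* EPERM: exists, owned by another user */
}
```

A process that finds a stale lock still has to remove it without racing against other processes doing the same, which this sketch does not address.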
