I have a fairly complicated Python program. Internally it has a logging system that uses an exclusive (LOCK_EX) fcntl.flock to manage global locking. Effectively, whenever a log message is dumped, the global file lock is acquired, the message is written to a file (different from the lock file), and the global file lock is released.
The program also forks itself several times (after log management is set up). Generally everything works.
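A minimal sketch of the pattern described above, not the actual code: the paths, the log() helper, and the choice to open the lock file once before forking are all assumptions made for illustration.

```python
import fcntl
import os
import tempfile

# Hypothetical paths; the real program uses its own lock and log files.
WORK_DIR = tempfile.mkdtemp()
LOCK_PATH = os.path.join(WORK_DIR, "global.lock")
LOG_PATH = os.path.join(WORK_DIR, "app.log")

# Assumption: the lock file is opened once, before any fork,
# so every child inherits this descriptor.
lock_fd = os.open(LOCK_PATH, os.O_RDWR | os.O_CREAT, 0o644)

def log(message):
    """Take the global lock, append one line to the log file, release."""
    fcntl.flock(lock_fd, fcntl.LOCK_EX)
    try:
        with open(LOG_PATH, "a") as f:
            f.write(f"[{os.getpid()}] {message}\n")
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)

def main():
    child_pids = []
    for i in range(3):
        pid = os.fork()
        if pid == 0:
            # Child process: log one message and exit immediately.
            log(f"child {i}")
            os._exit(0)
        child_pids.append(pid)
    log("parent")
    for pid in child_pids:
        os.waitpid(pid, 0)
```

Calling main() forks three children that each log one line through the shared lock, while the parent logs its own line and then waits for them.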
If the parent process is killed (and the children stay alive), I occasionally get a deadlock. All processes block on fcntl.flock() forever. Trying to acquire the lock externally also blocks forever. I have to kill the child processes to fix the problem.
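The external check can be made without hanging by adding LOCK_NB. The helper below is a hypothetical sketch (lock_is_free is not part of the program); it only reports whether an exclusive lock could be taken at that moment.

```python
import fcntl
import os

def lock_is_free(lock_path):
    """Return True if an exclusive flock could be taken right now."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB makes flock raise instead of blocking when the lock is held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

In the deadlocked state this would be expected to return False rather than block, which at least makes the condition scriptable.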
What is baffling, though, is that lsof lock_file shows no process as holding the lock! So I cannot figure out why the kernel treats the file as locked when no process is reported as holding it.
Does flock have issues with forking? Is the dead parent somehow holding the lock even though it is no longer in the process table? How do I go about resolving this issue?