I'm reading lines from a group of log files, following them as they are written, using pyinotify.
I open and read the files with Python's built-in methods:
file = open(self.file_path, 'r')
# ... later
line = file.readline()
This is generally stable and can handle the file being deleted and re-created: pyinotify reports the unlink and the subsequent link.
However, some log files are not deleted. Instead they are truncated, and new content is written to the beginning of the same file.
I'm having trouble reliably detecting when this has occurred, since pyinotify reports only a write. The only evidence I currently get is that pyinotify reports a write and readline() returns an empty string. But it is possible that two subsequent writes could trigger the same behavior.
I have thought of comparing the file's size to file.tell(), but according to the documentation tell() produces an opaque number, and it appears this can't be trusted to be a number of bytes.
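A minimal sketch of that comparison, assuming the file is opened in binary mode ('rb'), where tell() is a plain byte offset rather than the opaque value documented for text files. The helper name was_truncated is hypothetical, and I have not checked how this interacts with the ordering of pyinotify events:

```python
import os


def was_truncated(f):
    """Return True if the file is now shorter than our read position.

    Assumption: f was opened in binary mode ('rb'), so f.tell() is a
    byte offset that can be compared with the size reported by fstat.
    """
    size = os.fstat(f.fileno()).st_size
    return size < f.tell()
```

If this returns True, the reader could call f.seek(0) and start again from the top. It would still miss a truncation where the new content has already grown past the old read position by the time the check runs.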
Is there a simple way to detect a file has been truncated while reading from it?
Edit:
Truncating a file can be simulated with simple shell commands:
echo hello > test.log
echo hello >> test.log
# Truncate test.log
echo goodbye > test.log
To complement this, a simple Python script can be used to confirm that file.tell() does not decrease when the file is truncated:
foo = open('./test.log', 'r')
line = foo.readline()
while line != '':
    print(foo.tell())
    print(line)
    line = foo.readline()
# Put a breakpoint on the following line and
# truncate the file before it executes
print(foo.tell())
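The same check can be made reproducible without a debugger. This is only a sketch: the function name tell_after_truncate and the temporary path are made up for illustration, and it truncates the file from a second handle in the same process rather than from a shell.

```python
import os
import tempfile


def tell_after_truncate():
    """Read a file to EOF, truncate it via another handle, then re-check tell()."""
    path = os.path.join(tempfile.mkdtemp(), "test.log")
    with open(path, "w") as writer:
        writer.write("hello\nhello\n")

    reader = open(path, "r")
    while reader.readline() != "":
        pass
    before = reader.tell()

    # Equivalent of `echo goodbye > test.log`: mode "w" truncates on open.
    with open(path, "w") as writer:
        writer.write("goodbye\n")

    after = reader.tell()
    line = reader.readline()
    reader.close()
    return before, after, line


if __name__ == "__main__":
    print(tell_after_truncate())
```

I'd expect after to equal before, and the readline() that follows the truncation to come back empty, which is the ambiguous situation described above.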
If tell() returns a smaller number than the last time you called it, and you haven't seeked on your own, then something strange has happened. If you can confidently deduce that that "strange thing" is a file truncation, then I think you'll be good. - This whole idea kinda freaks me out. I'd go well out of my way to not have to read from a file that some other process might do anything but append to. – Meatus
tell() will NOT move when the file is truncated. In context this is log monitoring. The whole idea is to read from a file another process is writing to. – Romaromagna
[...] tell() is telling you (ha!) if it were to give you a smaller number. If it won't, it won't. And as I said, I'm not the guy who's going to have had any experience in writing code against files that can have their contents wiped out while I'm reading them. Best of luck in figuring this out! – Meatus
[...] tail -f does? – Meatus