How can I tail a log file in Python?
M

14

104

I'd like to make the output of tail -F or something similar available to me in Python without blocking or locking. I've found some really old code to do that here, but I'm thinking there must be a better way or a library to do the same thing by now. Anyone know of one?

Ideally, I'd have something like tail.getNewData() that I could call every time I wanted more data.

Madancy answered 21/9, 2012 at 1:13 Comment(13)
subprocess.call(["tail", "-F", filename])Kegan
See this answer.Nutrilite
@Nutrilite that answer is not a "following" tail.Drabble
Avaris: No. That just tails. I need tail -F so that either it constantly gives me all new lines, or I can constantly get them every time I call some getData() function.Madancy
Monkut: No, and that's the same thing Avaris said.Madancy
Ah, right. I think I missed the -F part.Nutrilite
Would your hypothetical get_new_data method (PEP-8 name) need to return all data since the last call, or just the current tail (possibly losing some data)?Drabble
Keith: all new data since the last call.Madancy
@Eli: all new data since the last call is more than you can expect from the native tail. For example, if I rename the file being tailed and rename it back to the original name, tail will output all lines (on my platform). If it is really important to process each log line just once, you should think about implementing other mechanisms.Jablonski
@PauloScardine: tail -F is meant for reading log files which can be rotated from time to time, so it's not surprising that it would print out all lines from the beginning.Smutch
@nneonneo: sure, the unix way is piping several small and very specialized commands together to accomplish any potentially complex task; following this tradition, the OP would be writing the log processing routine to read from stdin and pipe tail -F into it. I guessed his reason to implement it in Python was to ditch the dependency on external commands, but seems like I was wrong.Jablonski
The duplicate doesn't seem appropriate because the other question doesn't ask for a following (-f) tail.Smutch
The last time I wanted this I ended up doing tail -F filename | python script.py as I didn't need stdin for any other purpose and that gave best performanceMichaud
M
15

So, this is coming quite late, but I ran into the same problem again, and there's a much better solution now. Just use pygtail:

Pygtail reads log file lines that have not been read. It will even handle log files that have been rotated. Based on logcheck's logtail2 (http://logcheck.org)

Madancy answered 4/8, 2015 at 1:56 Comment(1)
Please notice that it doesn't quite behave like tail, but it may be useful, depending on what one wants to do.Trifid
P
82

Non Blocking

If you are on Linux (Windows does not support calling select on files), you can use the subprocess module along with the select module.

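import time
import subprocess
import select

filename = 'some.log'  # placeholder: path of the log file to follow

# universal_newlines=True makes the pipe return str instead of bytes
f = subprocess.Popen(['tail', '-F', filename],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     universal_newlines=True)
p = select.poll()
p.register(f.stdout)

while True:
    if p.poll(1):  # wait up to 1 ms for output from tail
        print(f.stdout.readline())
    time.sleep(1)

This polls the output pipe for new data and prints it when it is available. Normally the time.sleep(1) and print(f.stdout.readline()) calls would be replaced with useful code.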

Blocking

You can use the subprocess module without the extra select module calls.

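import subprocess

filename = 'some.log'  # placeholder: path of the log file to follow

f = subprocess.Popen(['tail', '-F', filename],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     universal_newlines=True)
while True:
    line = f.stdout.readline()  # blocks until tail writes a full line
    print(line)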

This will also print new lines as they are added, but it will block until the tail program is closed, probably with f.kill().
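
Because the polling loop above reads at most one line per iteration, a busy log can outrun it. One way around that is to drain everything the pipe holds on each call. The following is a minimal sketch, assuming a Unix-like system where select.poll is available; PipeLineReader and get_lines are hypothetical names. It reads the raw file descriptor with os.read so that no data stays hidden in Python's own buffer.

```python
import os
import select


class PipeLineReader:
    """Collect complete lines from a pipe without blocking (Unix only)."""

    def __init__(self, fd):
        self.fd = fd
        self.poller = select.poll()
        self.poller.register(fd, select.POLLIN)
        self.partial = b''  # bytes of a line that is not finished yet
        self.eof = False

    def get_lines(self, timeout_ms=0):
        """Return every complete line available right now (maybe none)."""
        while not self.eof and self.poller.poll(timeout_ms):
            chunk = os.read(self.fd, 65536)
            if not chunk:  # the writer closed the pipe
                self.eof = True
                break
            self.partial += chunk
            timeout_ms = 0  # after the first read, only take what is ready
        # Everything before the last newline is complete; keep the rest
        *lines, self.partial = self.partial.split(b'\n')
        return [line + b'\n' for line in lines]
```

With a Popen object such as f above, this would be created as PipeLineReader(f.stdout.fileno()) and used instead of f.stdout.readline(); mixing the two on the same pipe is likely to lose data.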

Picard answered 21/9, 2012 at 2:9 Comment(14)
Well, technically, f.stdout is a pipe, not a file (but I believe Windows is still incapable of using select on it).Smutch
In the "Blocking" solution, instead of print line, use sys.stdout.write(line) to take care of extra newlines that print will insert.Niello
line = f.stdout.readline().strip() would also remove the extra newlineBuddleia
@Buddleia Are there extra newlines printed that shouldn't be? Anyway, I believe .strip() would also remove leading whitespace that might be significant.Picard
The newline from the original log file - usually when iterating line by line, you'd prefer just the "line" string. But if the leading spaces are important - you're right, and @Mayank Jaiswal solution would be a good optionBuddleia
@Buddleia It has been a while since I worked on this piece of code, but if .readline() keeps the newline, and print adds a new one, it should be easy to fix by using sys.stdout.write() instead of print.Picard
I think that in the non-blocking solution, if there's a permissions problem opening the log file, the program just hangs with no warning or error output.Leaseholder
This only reads at most a single line per second, which is an issue if the log is growing by more than one line per second.Cadena
Answering my own comment: Replacing the contents of the if block if p.poll(1): with a loop that calls f.stdout.readline() until nothing is returned seems to work. Here is a gist with these modifications: gist.github.com/dwaltrip/bd3321880180f556ba0f9d1c4962b6f7Cadena
Ok... my gist doesn't work exactly as I expected. line = f.stdout.readline() always waits until the next line is read, if the file is present. It seems to be a non-blocking wait, though (cpu time is very short), so this might not actually be an issue. However, line = f.stdout.readline() does instantaneously return a bytes string of length 0 if the file is missing.Cadena
this worked on Python 3.8.5 just now (changed the print statement of course). also changed line = f.stdout.readline() to line = str(f.stdout.readline(), 'utf-8').strip() to clean up the print.Caruncle
This seems to skip the last log line which only shows up on subsequent poll. Why is that and how can I fix it?Signature
@Signature It has been a while since I used Python, but I believe this only returns lines that end with a line-ending character. It's probably not trivial to change that behavior.Picard
You can fix the "one line per second" issue by replacing the "if" with a "while."Carmacarmack
J
56

Using the sh module (pip install sh):

from sh import tail
# runs forever
for line in tail("-f", "/var/log/some_log_file.log", _iter=True):
    print(line)

[update]

Since sh.tail with _iter=True is a generator, you can:

import sh
tail = sh.tail("-f", "/var/log/some_log_file.log", _iter=True)

Then you can "getNewData" with:

new_data = next(tail)

Note that if the tail buffer is empty, it will block until there is more data (from your question it is not clear what you want to do in this case).

[update]

This works if you replace -f with -F, but in Python it would be locking. I'd be more interested in having a function I could call to get new data when I want it, if that's possible. – Eli

A container generator that places the tail call inside a while True loop and catches any I/O exceptions will have almost the same effect as -F.

def tail_F(some_file):
    while True:
        try:
            for line in sh.tail("-f", some_file, _iter=True):
                yield line
        except sh.ErrorReturnCode_1:
            yield None

If the file becomes inaccessible, the generator will yield None. However, it still blocks until there is new data if the file is accessible. It remains unclear to me what you want to do in this case.

Raymond Hettinger's approach seems pretty good:

import os

def tail_F(some_file):
    first_call = True
    while True:
        try:
            with open(some_file) as f:
                if first_call:
                    # On the first pass, skip what is already in the file
                    f.seek(0, 2)
                    first_call = False
                latest_data = f.read()
                while True:
                    if '\n' not in latest_data:
                        latest_data += f.read()
                        if '\n' not in latest_data:
                            # No complete line yet
                            yield ''
                            if not os.path.isfile(some_file):
                                break
                            continue
                    latest_lines = latest_data.split('\n')
                    if latest_data[-1] != '\n':
                        # Keep the unfinished last line for the next round
                        latest_data = latest_lines[-1]
                    else:
                        latest_data = f.read()
                    for line in latest_lines[:-1]:
                        yield line + '\n'
        except IOError:
            yield ''

This generator will return '' if the file becomes inaccessible or if there is no new data.

[update]

The second to last answer circles around to the top of the file it seems whenever it runs out of data. – Eli

I think the second will output the last ten lines whenever the tail process ends, which with -f is whenever there is an I/O error. The tail --follow --retry behavior is not far from this for most cases I can think of in unix-like environments.

Perhaps if you update your question to explain what your real goal is (the reason why you want to mimic tail --retry), you will get a better answer.

The last answer does not actually follow the tail and merely reads what's available at run time. – Eli

Of course, tail will display the last 10 lines by default... You can position the file pointer at the end of the file using file.seek; I will leave a proper implementation as an exercise for the reader.

IMHO the file.read() approach is far more elegant than a subprocess based solution.
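
Putting the pieces of this answer together (open the file, seek(0, 2) to skip the existing content, then read whatever has been appended), a non-blocking object in the spirit of the tail.getNewData() asked for in the question might look like the following. This is only a minimal sketch: FileFollower and get_new_data are hypothetical names, the file is read in binary mode, and rotation is not handled.

```python
import os


class FileFollower:
    """Return lines appended to a file since the previous call."""

    def __init__(self, path, from_end=True):
        self.f = open(path, 'rb')
        if from_end:
            self.f.seek(0, os.SEEK_END)  # skip what is already in the file
        self.partial = b''  # start of a line whose newline has not arrived

    def get_new_data(self):
        """Never blocks; returns a (possibly empty) list of complete lines."""
        data = self.partial + self.f.read()
        # Everything before the last newline is complete; keep the rest
        *lines, self.partial = data.split(b'\n')
        return [line + b'\n' for line in lines]

    def close(self):
        self.f.close()
```

Each call to get_new_data() returns all complete lines written since the previous call, or an empty list if nothing new has arrived.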

Jablonski answered 21/9, 2012 at 1:14 Comment(10)
This works if you replace -f with -F, but in Python it would be locking. I'd be more interested in having a function I could call to get new data when I want it, if that's possible.Madancy
I think a container generator placing the tail call inside a while True loop and catching eventual I/O exceptions will have the same effect of -F.Jablonski
The second to last answer circles around to the top of the file it seems whenever it runs out of data. The last answer does not actually follow the tail and merely reads what's available at run time.Madancy
@Eli: a seek(0, 2) will move the file pointer to the end of the file.Jablonski
Just curious: what, to you, seems more elegant about a file.read() approach? tail properly handles showing the last 10 lines of the file (even if the lines are huge), reading new lines forever, waking up when new lines arrive (in a platform-dependent fashion), and opening new files when needed. In a word, the utility is quite well-designed for what it is meant to do -- reimplementing it does not seem nearly as elegant. (I will, however, admit that the sh module is pretty nifty.)Smutch
@nneonneo: 1) I try to avoid subprocess as much as I can; 2) not depending on tail being available in the deployment environment/platform is a plus; 3) I can't change the external tail command behavior as easily as I can change my own code; 4) I agree with all your points, yet the file.open based solution seems to be compact enough that I don't feel like reinventing the wheel.Jablonski
The sh module looks kind of cool, but I don't see what benefits you get over the subprocess module, especially because the subprocess module comes standard with python and sh does not.Picard
@Matt: the sh module is nothing but a fancy layer above subprocess, and I dislike both sh and raw subprocess (my first answer isn't my favorite); After some thought, I now believe that if the OP want to get rid of the external command "tail", he should go the file.read() route, otherwise, he should make his program read from stdin and tail -F | his_application. If there are several logfiles to be processed, having one instance for each file is simpler and safer than having a single process with several threads - there is no good reason to complicate the design adding several complex modulesJablonski
In the Hettinger approach, why are you using read() and searching for \n? Why not use readline()? readline() seems to me to be a more platform independent way of getting each line.Tindall
@BillR: I assumed (wrongly) that readline would block until a newline, but you are right, readline is the right method for this job.Jablonski
L
40

Purely pythonic solution using non-blocking readline()

Adapting Ijaz Ahmad Khan's answer to yield lines only when they are completely written (that is, when they end with a newline character) gives a Pythonic solution with no external dependencies:

import time
from typing import Iterator

def follow(file, sleep_sec=0.1) -> Iterator[str]:
    """ Yield each line from a file as they are written.
    `sleep_sec` is the time to sleep after empty reads. """
    line = ''
    while True:
        tmp = file.readline()
        if tmp is not None and tmp != "":
            line += tmp
            if line.endswith("\n"):
                yield line
                line = ''
        elif sleep_sec:
            time.sleep(sleep_sec)


if __name__ == '__main__':
    with open("test.txt", 'r') as file:
        for line in follow(file):
            print(line, end='')
Laborsaving answered 19/1, 2019 at 0:55 Comment(8)
Not only is Iljaz Ahmad's and this solution more Pythonic, but it also prevents spawning a new process, which saves resources and might scale better depending on the situation.Logy
@Logy This answer is indeed far better than any of the ones before it that suggest spawning an instance of tail -f. Upvoted.Threepence
'else if' is not Python - edited & agree this is better than sh and tailFresh
I got the error NameError: name 'Iterator' is not defined why?Dolce
Likely 'Iterator' is not valid in your version of python as a type hint. Don't worry, it's completely unnecessary. just delete -> Iterator[str] and the function ought to workPuffer
Too much load for busy files, this answer is better: https://mcmap.net/q/204604/-how-can-i-tail-a-log-file-in-pythonNullity
High load when file is idle, but easily fixed by changing if tmp is not None to if tmp != "".Cletuscleve
Thanks @Cletuscleve I incorporated that edit.Laborsaving
S
31

The only portable way to tail -f a file appears to be, in fact, to read from it and retry (after a sleep) if the read returns 0. The tail utilities on various platforms use platform-specific tricks (e.g. kqueue on BSD) to efficiently tail a file forever without needing sleep.

Therefore, implementing a good tail -f purely in Python is probably not a good idea, since you would have to use the least-common-denominator implementation (without resorting to platform-specific hacks). Using a simple subprocess to open tail -f and iterating through the lines in a separate thread, you can easily implement a non-blocking tail operation in Python.

Example implementation:

import threading, queue, subprocess

fn = 'some.log'  # placeholder: path of the file to follow
tailq = queue.Queue(maxsize=10)  # buffer at most 10 lines

def tail_forever(fn):
    p = subprocess.Popen(["tail", "-f", fn], stdout=subprocess.PIPE)
    while True:
        line = p.stdout.readline()
        tailq.put(line)
        if not line:
            break

threading.Thread(target=tail_forever, args=(fn,)).start()

print(tailq.get())  # blocks
print(tailq.get_nowait())  # raises queue.Empty if there are no lines to read
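
To fetch everything that has accumulated since the last call, rather than one line at a time, the queue can be drained in a loop until it reports that it is empty. A small sketch, assuming Python 3's queue module; drain is a hypothetical helper name:

```python
import queue


def drain(q):
    """Return all items currently in the queue without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items
```

Calling drain(tailq) then behaves like a non-blocking "give me all new lines" function. With a bounded queue, the reader thread simply waits in put() while the queue is full.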
Smutch answered 21/9, 2012 at 1:59 Comment(6)
If the OP main concern is not getting rid of the dependency on the external command (tail), he should follow the unix tradition of writing the log processor application to read from stdin and piping tail -F into it. I can't see why adding the complexity of threading, Queue and subprocess will result in any advantage over the traditional approach.Jablonski
When did he say he was writing a log processor?Smutch
English is not my native idiom but I guess it can be inferred from the question title (How can I tail a log file in Python?).Jablonski
Do you know how tail -F works in Linux efficiently? Does it use sleep or a more efficient event system?Hohenlinden
tail uses a combination of inotify and select on Linux; see the source code: github.com/coreutils/coreutils/blob/master/src/tail.c#L1453Smutch
I found this particularly useful. the log file in my case was on a NAS running freeBSD, so I used ssh and let the bsd kernel make tail -f efficient: Which it was! Could see the lines come in as they arrived at the NAS. The need was for low latency, and this worked. (The writes were coming from a VM writing a virtual com port to a raw file on the NAS, logging data from a scientific instrument).Tupelo
D
18

All the answers that use tail -f are not pythonic.

Here is the pythonic way: ( using no external tool or library)

import time

def follow(thefile):
    while True:
        line = thefile.readline()
        if not line or not line.endswith('\n'):
            time.sleep(0.1)
            continue
        yield line



if __name__ == '__main__':
    logfile = open("run/foo/access-log","r")
    loglines = follow(logfile)
    for line in loglines:
        print(line, end='')
Decumbent answered 2/11, 2018 at 15:10 Comment(4)
If a log file is appended in 2 syscalls, this way of "following" the file will sometimes return 2 parts of the line, instead of the full line itselfEssene
I've posted an answer to address the bug @Essene pointed out: https://mcmap.net/q/204604/-how-can-i-tail-a-log-file-in-pythonLaborsaving
Consider another python program is writing to this file using a writer, Is there any way we could stop this operation programmatically when the writer stops writing?Endow
Yes, you can use a mechanism like locking: acquire the lock before writing to the file and release it when done.Decumbent
W
13

Ideally, I'd have something like tail.getNewData() that I could call every time I wanted more data

We've already got one and itsa very nice. Just call f.read() whenever you want more data. It will start reading where the previous read left off and it will read through the end of the data stream:

f = open('somefile.log')
p = 0
while True:
    f.seek(p)
    latest_data = f.read()
    p = f.tell()
    if latest_data:
        print(latest_data)
        print(str(p).center(10).center(80, '='))

For reading line-by-line, use f.readline(). Sometimes, the file being read will end with a partially read line. Handle that case with f.tell() finding the current file position and using f.seek() for moving the file pointer back to the beginning of the incomplete line. See this ActiveState recipe for working code.
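
The tell()/seek() handling of a partially written line described above can be sketched as follows. This is an illustration that assumes the file is opened in binary mode; read_complete_lines is a hypothetical helper, not the code from the recipe mentioned.

```python
def read_complete_lines(f):
    """Read every complete line from the current position of f.

    If the last line has no trailing newline yet, the file pointer is
    moved back to the start of that line so it is read again next time.
    """
    lines = []
    while True:
        start = f.tell()          # remember where this line begins
        line = f.readline()
        if not line:              # nothing more to read for now
            break
        if not line.endswith(b'\n'):
            f.seek(start)         # incomplete line: rewind and stop
            break
        lines.append(line)
    return lines
```

Calling it repeatedly on the same open file returns only the lines completed since the previous call.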

Watchful answered 21/9, 2012 at 2:16 Comment(6)
The point was I wanted to follow the file. If I open a file, f.read() only goes until the end of what the file was at run time. It won't read anything new added after that.Madancy
I tested it out before posting. I just did: blah = open('some_file', r) while 1: sleep(1) print blah.read() And tried writing to the file. No luck.Madancy
@Eli: you must be on Windows, then. This is important information missing from your question.Jablonski
@Paulo: That's important information missing from the answer. If no operating system is specified, you build something that works generally, or at least something that works for *nix. You never assume Windows.Madancy
Why never assume Windows? Python is closer to Windows than to *nix, e.g. UTF-16 vs UTF-8Michaud
Can you please explain what this print does: str(p).center(10).center(80, '=')Seersucker
T
7

You could use the 'tailer' library: https://pypi.python.org/pypi/tailer/

It has an option to get the last few lines:

import tailer

# Get the last 3 lines of the file
tailer.tail(open('test.txt'), 3)
# ['Line 9', 'Line 10', 'Line 11']

And it can also follow a file:

# Follow the file as it grows
for line in tailer.follow(open('test.txt')):
    print(line)

If one wants tail-like behaviour, that one seems to be a good option.

Trifid answered 2/2, 2016 at 10:12 Comment(3)
It didn't follow() the same file after it's removed / recreated, so didn't work for me :/Regular
@JoseAlban it's just not the library's responsibility to watch for file deletion/creation, use pypi module make-all-the-things-work-by-themselves insteadPhilippe
@Kentzo 's answer covers that oversight: https://mcmap.net/q/204604/-how-can-i-tail-a-log-file-in-pythonTrifid
O
5

Another option is the tailhead library, which provides Python versions of the tail and head utilities as well as an API that can be used in your own module.

Originally based on the tailer module, its main advantage is the ability to follow files by path, i.e. it can handle the situation where a file is recreated. It also has some bug fixes for various edge cases.
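
Following a file by path rather than by open handle generally means checking, on each poll, whether the path still refers to the file that is open. Here is a minimal sketch of that idea using only the standard library. It is not tailhead's actual code: reopen_if_replaced is a hypothetical helper, and the inode comparison assumes a POSIX-style filesystem.

```python
import os


def reopen_if_replaced(f, path):
    """Return a handle on `path`, reopening it if the file was replaced
    (rotated or recreated) or truncated; otherwise return `f` unchanged."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return f  # the path is missing for now; keep the old handle
    opened = os.fstat(f.fileno())
    replaced = (on_disk.st_ino, on_disk.st_dev) != (opened.st_ino, opened.st_dev)
    truncated = on_disk.st_size < f.tell()
    if replaced or truncated:
        f.close()
        return open(path, 'rb')  # start reading the new file from the top
    return f
```

A follower would read whatever is left in the old handle first and then call f = reopen_if_replaced(f, path) before sleeping.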

Ouzo answered 23/2, 2016 at 6:59 Comment(0)
A
1

Python is "batteries included" - it has a nice solution for it: https://pypi.python.org/pypi/pygtail

Reads log file lines that have not been read. Remembers where it finished last time, and continues from there.

import sys
from pygtail import Pygtail

for line in Pygtail("some.log"):
    sys.stdout.write(line)
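
The "remembers where it finished" behaviour can be approximated with the standard library by storing the byte offset in a side file. The sketch below only illustrates the idea and is not Pygtail's implementation; read_unread_lines and the offset file path are hypothetical, and log rotation is ignored.

```python
import os


def read_unread_lines(log_path, offset_path):
    """Return lines added to log_path since the offset saved in offset_path."""
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as state:
            offset = int(state.read().strip() or 0)
    with open(log_path, 'rb') as log:
        if offset > os.fstat(log.fileno()).st_size:
            offset = 0  # the file shrank (was truncated): start over
        log.seek(offset)
        lines = log.readlines()
        # Leave an unfinished last line for the next run
        if lines and not lines[-1].endswith(b'\n'):
            offset = log.tell() - len(lines.pop())
        else:
            offset = log.tell()
    with open(offset_path, 'w') as state:
        state.write(str(offset))
    return lines
```

Because the position lives on disk, the function can be called from separate runs of a script, for example from a cron job, and each run sees only the lines added since the previous one.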
Armindaarming answered 22/3, 2017 at 15:40 Comment(5)
Having to install a package to get a functionality is quite the opposite of "batteries included".Shipwreck
well, not all packages are installed by default, fortunately. But you don't need to write (and debug and maintain) any tricky code using subprocess, as answers with much higher karma suggest.Armindaarming
@Madancy - yes, pygtail is mentioned in your answer but has no example how easy it is to use. And BTW I upvoted your answer, so please don't be too upset :-)Armindaarming
how to use --full-lines option in your Pygtail exampleHouseline
pygtail does not appear to have been updated since 2015 and is still reported as being in Beta. The whole point about "batteries included" is that it is all the stuff in the standard library that is maintained, documented and can be relied upon.Incompliant
R
0

If you are on Linux, you can implement a non-blocking tail in Python in the following way.

import subprocess
subprocess.call('xterm -title log -hold -e \"tail -f filename\"&', shell=True, executable='/bin/csh')
print("Done")
Rollet answered 8/10, 2015 at 6:46 Comment(1)
On Linux, with X running, and csh installed. That's a LOT of unnecessary dependencies!Lettyletup
R
0

A simple tail function from the PyPI package tailread.

You can also install it with pip install tailread.

Recommended for tail access to large files.

from io import BufferedReader


def readlines(bytesio, batch_size=1024, keepends=True, **encoding_kwargs):
    '''bytesio: file path or BufferedReader
       batch_size: size to be processed
    '''
    path = None
    
    if isinstance(bytesio, str):
        path = bytesio
        bytesio = open(path, 'rb')
    elif not isinstance(bytesio, BufferedReader):
        raise TypeError('The first argument to readlines must be a file path or a BufferedReader')

    bytesio.seek(0, 2)
    end = bytesio.tell()

    buf = b""
    for p in reversed(range(0, end, batch_size)):
        bytesio.seek(p)
        lines = []
        remain = min(end-p, batch_size)
        while remain > 0:
            line = bytesio.readline()[:remain]
            lines.append(line)
            remain -= len(line)

        cut, *parsed = lines
        for line in reversed(parsed):
            if buf:
                line += buf
                buf = b""
            if encoding_kwargs:
                line = line.decode(**encoding_kwargs)
            yield from reversed(line.splitlines(keepends))
        buf = cut + buf
    
    if path:
        bytesio.close()

    if encoding_kwargs:
        buf = buf.decode(**encoding_kwargs)
    yield from reversed(buf.splitlines(keepends))


for line in readlines('access.log', encoding='utf-8', errors='replace'):
    print(line)
    if 'line 8' in line:
        break

# line 11
# line 10
# line 9
# line 8

Reproach answered 8/6, 2022 at 4:1 Comment(0)
L
-1

You can also use the awk command.
See more at: http://www.unix.com/shell-programming-scripting/41734-how-print-specific-lines-awk.html
awk can be used to print the last line, the last few lines, or any line in a file.
It can be called from Python.

Lazaretto answered 8/2, 2014 at 17:59 Comment(0)
S
-2
# -*- coding:utf-8 -*-
import sys
import time


class Tail():
    def __init__(self, file_name, callback=sys.stdout.write):
        self.file_name = file_name
        self.callback = callback

    def follow(self, n=10):
        try:
            # Open in binary mode: text-mode files cannot seek relative to the end
            with open(self.file_name, 'rb') as f:
                self._file = f
                self._file.seek(0, 2)
                # Remember the length of the file in bytes
                self.file_length = self._file.tell()
                # Print the last n lines
                self.showLastLine(n)
                # Keep reading the file and print whatever is appended
                while True:
                    line = self._file.readline()
                    if line:
                        self.callback(line.decode('UTF-8', errors='replace'))
                    else:
                        # Sleep only when there is nothing new to read
                        time.sleep(1)
        except Exception as e:
            print('Failed to open the file; check that it exists and that you have permission to read it')
            print(e)

    def showLastLine(self, n):
        # Assume roughly 100 bytes per line; any starting guess works
        len_line = 100
        # n defaults to 10 and can also be passed in through follow()
        read_len = len_line * n
        # last_lines holds the lines to print
        while True:
            # If the chunk to read is at least as long as the file,
            # read the whole file and stop
            if read_len >= self.file_length:
                self._file.seek(0)
                last_lines = self._file.read().splitlines()[-n:]
                break
            # Read the last read_len bytes and count the newlines in them
            self._file.seek(-read_len, 2)
            last_words = self._file.read(read_len)
            # count is the number of newlines
            count = last_words.count(b'\n')
            if count > n:
                # More than n newlines: the last n lines are complete
                last_lines = last_words.splitlines()[-n:]
                break
            # Not enough lines yet: double the chunk size and try again
            read_len *= 2
        for line in last_lines:
            self.callback(line.decode('UTF-8', errors='replace') + '\n')


if __name__ == '__main__':
    py_tail = Tail('test.txt')
    py_tail.follow(1)
Strict answered 17/9, 2021 at 7:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.