How to avoid Python fileinput buffering [duplicate]

Possible Duplicate:
Setting smaller buffer size for sys.stdin?

I have a Python (2.4/2.7) script using fileinput to read from standard input or from files. It's easy to use, and works well except for one case:

tail -f log | filter.py

The problem is that my script buffers its input, whereas (at least in this case) I want to see its output right away. This seems to stem from the fact that fileinput uses readlines() to grab up to bufsize bytes before it yields anything. I tried a bufsize of 1 and it didn't seem to help (which was somewhat surprising).
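
For concreteness, the loop in question looks something like this (bufsize is the Python 2 fileinput parameter; passing 1 made no visible difference for me):

import fileinput
import sys

# Typical filter loop; with "tail -f log | filter.py" nothing shows up
# until fileinput's internal readlines() call has filled its buffer.
for line in fileinput.input(bufsize=1):
    sys.stdout.write(line)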

I did find that I can write code like this which does not buffer:

import sys

while 1:
    line = sys.stdin.readline()   # returns as soon as a complete line arrives
    if not line: break
    sys.stdout.write(line)

The problem with doing it this way is that I lose the fileinput functionality (namely that it automatically opens all the files passed to my program, or stdin if none, and it can even decompress input files automatically).

So how can I have the best of both? Ideally something where I don't need to explicitly manage my input file list (including decompression), and yet which doesn't delay input when used in a "streaming" way.
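
For illustration, here's roughly what I'd have to write by hand to get that behaviour (file list from argv, stdin fallback, decompression by extension). It's exactly the boilerplate I was hoping fileinput would save me; treat it as a rough, untested sketch:

import sys
import gzip
import bz2

def unbuffered_input(files=None):
    # Rough stand-in for fileinput.input(): same file-list/stdin handling
    # and .gz/.bz2 decompression, but reads via readline() so nothing is
    # held back waiting for a buffer to fill.
    names = files or sys.argv[1:] or ['-']
    for name in names:
        if name == '-':
            stream = sys.stdin
        elif name.endswith('.gz'):
            stream = gzip.open(name, 'r')
        elif name.endswith('.bz2'):
            stream = bz2.BZ2File(name, 'r')
        else:
            stream = open(name, 'r')
        while True:
            line = stream.readline()
            if not line:
                break
            yield line
        if stream is not sys.stdin:
            stream.close()

A filter is then just:

for line in unbuffered_input():
    sys.stdout.write(line)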

Child answered 17/5, 2011 at 16:4 Comment(4)
close the stdin filehandle and reopen it with buffering = 0 (I haven't tried it, so I'm not going to post it as an answer; a rough sketch is below these comments)Estrange
#3670823Billon
You might be mischaracterizing the situation somewhat by saying fileinput uses readlines(). By default, readlines() doesn't return until it hits EOF, whereas 'for line in fileinput.input():' and 'for line in sys.stdin:' will eventually return something once they have enough characters buffered. You could be right that fileinput uses readlines() internally, though, if it passes a bufsize argument.Lodger
I just filed bug report bugs.python.org/issue26290 "fileinput and 'for line in sys.stdin' do strange mockery of input buffering" which includes the behavior you've observed. Summary: fileinput is broken in both 2.7 and 3.4, "for line in sys.stdin:" is broken in 2.7 but fixed in 3.4, readline works properly in both 2.7 and 3.4.Lodger
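
For reference, the usual spelling of the reopen-stdin idea from the first comment is something like this; it rebinds sys.stdin to an unbuffered file object rather than literally closing it first (untested here, as noted there):

import os
import sys

# Rebind sys.stdin to an unbuffered file object on the same underlying descriptor.
sys.stdin = os.fdopen(sys.stdin.fileno(), 'r', 0)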

Try running python -u; the man page says it will "force stdin, stdout and stderr to be totally unbuffered".

You can just alter the hashbang path at the first line of filter.py.
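
For example, assuming the interpreter lives at /usr/bin/python (note that an env-style hashbang usually can't carry the extra -u argument on Linux):

#!/usr/bin/python -u

Or, without touching the script, run it through the interpreter explicitly:

tail -f log | python -u filter.py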

Anglofrench answered 17/5, 2011 at 16:18 Comment(4)
Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option.Estrange
Yeah for the reason tMC stated, this doesn't work. I did try it though.Child
Then don't use line-based I/O. Use plain stdin.read().Anglofrench
readline() (singular) works just fine. It's only readlines() (plural) that does the buffering I don't want. I imagine raw read() would work too, but it's not necessary in this case.Child
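
For reference, the raw-read approach mentioned two comments up could look roughly like this; it swaps in os.read(), which returns as soon as any bytes are available (not needed here, per the previous comment, but it sidesteps line buffering entirely):

import os
import sys

while True:
    data = os.read(sys.stdin.fileno(), 4096)   # returns with whatever is available
    if not data:
        break
    sys.stdout.write(data)
    sys.stdout.flush()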

Have you tried:

import fileinput

def hook_nobuf(filename, mode):
    # Open each named input file with bufsize=0 (unbuffered)
    return open(filename, mode, 0)

fi = fileinput.FileInput(openhook=hook_nobuf)

I haven't tested it, but from reading what the openhook parameter does and what passing 0 to open() as the bufsize argument does, this should do the trick.
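
Put together as a complete filter.py, that would be something like this (note that, if I'm reading fileinput.py right, the hook is only consulted for named files, not for piped stdin):

import sys
import fileinput

def hook_nobuf(filename, mode):
    return open(filename, mode, 0)   # bufsize=0: unbuffered

for line in fileinput.FileInput(openhook=hook_nobuf):
    sys.stdout.write(line)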

Performing answered 17/5, 2011 at 16:14 Comment(3)
This has no effect. Again the problem seems to be that fileinput uses the readlines() method and buffers internally.Child
Well, I think that's your answer then. Either don't use fileinput, or starting with fileinput.py as a base, rewrite it to not buffer internally. Looking at the code, there doesn't seem to be any way to make it not do at least SOME buffering just by passing parameters to it.Performing
I'm new to Python; it seems shocking that this use case isn't well covered (writing text filters in Python would otherwise feel completely natural).Child
