unbuffered read from stdin in python

I'm writing a python script that reads input through a pipe from another command, like so:

batch_job | myparser

My script myparser processes the output of batch_job and writes to its own stdout. My problem is that I want to see the output immediately (the output of batch_job is processed line by line), but there appears to be the notorious stdin buffering (allegedly 4 KB; I haven't verified) delaying everything.
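For illustration, the parser's read loop looks something like this minimal sketch (the upper-casing is a placeholder for the real processing):

#!/usr/bin/python
import sys

for line in sys.stdin:              # lines arrive in delayed bursts, not one at a time
    sys.stdout.write(line.upper())  # placeholder for the real parsing
    sys.stdout.flush()              # flushing output does not remove the delay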

The problem has been discussed already here, here and here.

I tried the following:

  • open stdin using os.fdopen(sys.stdin.fileno(), 'r', 0) (sketched just after this list)
  • using -u in my hashbang: #!/usr/bin/python -u
  • setting export PYTHONUNBUFFERED=1 right before calling the script
  • flushing my output after each line that was read (just in case the problem was coming from output buffering rather than input buffering)
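
For concreteness, the first of those attempts presumably looked something like this (a reconstruction, not the actual script):

import os
import sys

# Rewrap stdin with buffer size 0; as described above, this did not
# remove the delay.
stdin = os.fdopen(sys.stdin.fileno(), 'r', 0)
for line in stdin:
    sys.stdout.write(line)
    sys.stdout.flush()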

My python version is 2.4.3; I have no way of upgrading or installing any additional programs or packages. How can I get rid of these delays?

Piracy answered 23/10, 2015 at 14:41 Comment(8)
Are you sure the buffering is happening in Python, on stdin, and not on the batch job's stdout? Sometimes applications check the device type of stdout, and base their buffering on what it is, so just because it might appear to be line buffering when writing to a terminal doesn't mean it will do the same when piped to another process.Ait
That's an interesting suggestion, I will try to verify. What I can say is that the application is itself a shell script.Piracy
It also creates a log file with identical content to what's normally written to the terminal. I observe that this log file is updated faster, i.e. it already contains the lines that my script is still waiting for.Piracy
Possible duplicate of Setting smaller buffer size for sys.stdin?Nils
@DenilsonSá: no, I had already looked at that question. The answer that was marked as the solution there uses the -u option, which, as I explained, didn't work in my case.Piracy
I do think it may be a dup of your second "here" #6034281, which is incorrectly marked as a dup of yet another one: #3670823. A good workaround is to use readline (however, first use strace to confirm that the bad behavior is within python, rather than output buffering in your batch_job... it could be either or both!). See that other Q for more info.Talky
Why don't you just launch batch_job from within your myparser as a subprocess? Then you get full control of STDOUT/STDIN (see the sketch after these comments). The way you have it set up depends not only on Python but also on shell buffering itself.Viv
As already pointed out by others, it's more likely to be output buffering in batch_job. Have you tried running it with stdbuf -o0 -e0 as suggested in the question you linked (unix.stackexchange.com/a/25378)?Ironsmith
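A sketch of the subprocess approach suggested in the comments (Python 2.4-compatible; the batch_job invocation and the pass-through processing are placeholders):

import subprocess
import sys

# Launch batch_job ourselves so we own the read end of the pipe.
proc = subprocess.Popen(['batch_job'], stdout=subprocess.PIPE)
# readline-based iteration returns each line as it arrives, unlike
# "for line in proc.stdout" on Python 2.
for line in iter(proc.stdout.readline, ''):
    sys.stdout.write(line)
    sys.stdout.flush()
proc.wait()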

I've encountered the same issue with legacy code. It appears to be a problem with the implementation of Python 2's file object's __next__ method: it uses a Python-level buffer that -u/PYTHONUNBUFFERED=1 doesn't affect, because those only unbuffer the stdio FILE*s themselves, and file.__next__'s buffering is unrelated to them. Similarly, stdbuf/unbuffer can't change any of this buffering, because Python replaces the default buffer made by the C runtime: the last thing file.__init__ does for a newly opened file is call PyFile_SetBufSize, which uses setvbuf/setbuf (the APIs) to replace the default stdio buffer.

The problem is seen when you have a loop of the form:

for line in sys.stdin:

where the first call to __next__ (made implicitly by the for loop to get each line) ends up blocking to fill its internal buffer before producing a single line.
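
A quick way to see this from the shell (a sketch along the lines of the demo in the comments below; any slow, line-at-a-time producer will do):

(for i in 1 2 3 4 5; do echo $i; sleep 1; done) | python2 -c 'import sys
for line in sys.stdin:
    print line,'

Nothing is printed until the producer exits after five seconds, even though a line was available every second.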

There are three possible fixes:

  1. (Only on Python 2.6+) Rewrap sys.stdin with the io module (backported from Python 3 as a built-in) to bypass file entirely in favor of the (frankly superior) Python 3 design. That design issues a single system call at a time to populate the buffer, without blocking until the full requested read occurs: if it asks for 4096 bytes and gets 3, it checks whether a line is available and produces it if so. So:

    import io
    import sys

    # Add buffering=0 argument if you won't always consume stdin completely, so you
    # can't lose data in the wrapper's buffer. It'll be slower with buffering=0 though.
    with io.open(sys.stdin.fileno(), 'rb', closefd=False) as stdin:
        for line in stdin:
            sys.stdout.write(line)  # do stuff with the line
            sys.stdout.flush()


    This will typically be faster than option 2, but it's more verbose, and requires Python 2.6+. It also allows for the rewrap to be Unicode friendly, by changing the mode to 'r' and optionally passing the known encoding of the input (if it's not the locale default) to seamlessly get unicode lines instead of (ASCII only) str.
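
    A sketch of that Unicode-friendly variant (the UTF-8 encoding is an assumption for illustration; omit encoding to use the locale default):

    import io
    import sys

    # Text mode ('r') plus an explicit encoding yields unicode lines.
    with io.open(sys.stdin.fileno(), 'r', encoding='utf-8', closefd=False) as stdin:
        for line in stdin:
            sys.stdout.write(line.encode('utf-8'))  # line is a unicode object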

  2. (Any version of Python) Work around the problem with file.__next__ by using file.readline instead. Despite nearly identical intended behavior, readline doesn't do its own (over)buffering; it delegates to C stdio's fgets (default build settings) or to a manual loop calling getc/getc_unlocked into a buffer that stops exactly when it hits end of line. Combining it with two-arg iter gives nearly identical code without excess verbosity (it will probably be slower than the prior solution, depending on whether fgets is used under the hood and how the C runtime implements it):

    import sys

    # '' is the sentinel that ends the loop; readline returns '' at EOF
    for line in iter(sys.stdin.readline, ''):
        sys.stdout.write(line)  # do stuff with line
        sys.stdout.flush()

  3. Move to Python 3, which doesn't have this problem. :-)
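
    For comparison, a minimal sketch of the same loop on Python 3, where no workaround is needed:

    import sys

    # Python 3's io-based stdin produces each line as soon as it arrives.
    for line in sys.stdin:
        print(line, end='', flush=True)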

February answered 20/11, 2020 at 18:10 Comment(1)
Note: Obviously, if batch_job has buffered output, you need to unbuffer it or make sure it does manual flushes so there is anything for the Python program to see. But I've definitely seen cases where the prior process was unbuffered and Python 2's for line in sys.stdin: was responsible for the buffering (non-Python 2 programs substituted into the pipeline, using raw I/O or plain C stdio, don't have the problem).February

On Linux, in bash, what you are looking for seems to be the stdbuf command.

If you want no buffering (i.e. an unbuffered stream), try this:

# batch_job | stdbuf -o0 myparser

If you want line buffering, try this:

# batch_job | stdbuf -oL myparser
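
(If, as the comments under the question suggest, the buffering is actually in batch_job's own stdout, stdbuf would go on the left side of the pipe instead, provided batch_job doesn't adjust its own buffering:)

# stdbuf -oL batch_job | myparser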
Supinator answered 18/9, 2018 at 15:26 Comment(5)
This won't help. The problem isn't output buffering by Python (if it were, the -u flag or doing export PYTHONUNBUFFERED=1 before calling the script would fix it; stdbuf [the command line tool] doesn't work on programs that modify the default stdio buffering with setvbuf/setbuf [the APIs] in any event, and Python can and does do this); it's Python buffering the input. And the buffering on the input is done in a Python user-mode buffer that stdbuf (the command line tool) can't affect.February
@February Well, it actually works. I tested this by feeding data between two python programs, with and without stdbuf -o0, and the difference is very clear. So, that is the fact. And it is unfair of you to downvote based on your speculations, and without trying it.Supinator
It may work in some scenarios, but not on Python 2.x in the scenarios where -u/PYTHONUNBUFFERED=1 doesn't help already. You're likely being fooled by a test case that isn't the same as the OP's (e.g. in your case, your input pipe was also Python; the OP's only had it for the output from the pipe). Simple bash one-liner example that does not work: (for ((i = 0; i < 10; ++i)); do echo $i && sleep 1; done) | stdbuf -o0 python2 -c 'for line in __import__("sys").stdin: print line,'; you get no output for 10 seconds. Cause is the buffering in file.__next__, which stdbuf doesn't affect.February
Replace __import__("sys").stdin with iter(__import__("sys").stdin.readline, "") and you'll get an output every second. If you can show me a single example where stdbuf on the right side of the pipe solves problems not solved by the various things the OP tried, I'll happily convert my downvote to an upvote. But I don't think such scenarios exist (as stdbuf's man page notes "If COMMAND adjusts the buffering of its standard streams ('tee' does for e.g.) then that will override corresponding settings changed by 'stdbuf'."; Python 2 does that).February
@February The input was from a C program, and yes, it was python2. It was a deployed imaging system: a MIMO array feeding from C into image display and AI in python.Supinator

You can unbuffer the output:

unbuffer batch_job | myparser
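
For context: unbuffer ships with the expect package and runs the command under a pseudo-terminal, so batch_job line-buffers as if writing to a terminal. If a stage in the middle of the pipeline needed the same treatment, unbuffer's -p flag reads from a pipe as well (some_filter is a placeholder here):

batch_job | unbuffer -p some_filter | myparser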
Tidings answered 27/11, 2020 at 10:03 Comment(0)
