Read streaming input from subprocess.communicate()
I'm using Python's subprocess.communicate() to read stdout from a process that runs for about a minute.

How can I print out each line of that process's stdout in a streaming fashion, so that I can see the output as it's generated, but still block on the process terminating before continuing?

subprocess.communicate() appears to give all the output at once.

Milagro answered 26/4, 2010 at 18:23 Comment(1)
related: Getting realtime output using subprocessChartreuse

Please note, I think J.F. Sebastian's method (below) is better.


Here is a simple example (with no checking for errors):

import subprocess
proc = subprocess.Popen('ls',
                        shell=True,
                        stdout=subprocess.PIPE,
                        )
while proc.poll() is None:
    output = proc.stdout.readline()
    print output,

If ls ends too fast, then the while loop may end before you've read all the data.

You can catch the remainder in stdout this way:

output = proc.communicate()[0]
print output,
Conservatory answered 26/4, 2010 at 18:54 Comment(5)
does this scheme fall victim to the buffer blocking problem that the python doc refers to?Milagro
@Heinrich, the buffer blocking problem is not something I understand well. I believe (just from googling around) that this problem only occurs if you don't read from stdout (and stderr?) inside the while loop. So I think the above code is okay, but I can't say for sure.Conservatory
This actually does suffer from a blocking problem, a few years ago I had no end to the trouble where readline would block 'til it got a newline even if the proc had ended. I don't remember the solution, but I think it had something to do with doing the reads on a worker thread and just looping while proc.poll() is None: time.sleep(0) or something to that effect. Basically- you need to either ensure that the output newline is the last thing that the process does (because you can't give the interpreter time to loop again) or you need to do something "fancy."Kline
@Heinrich: Alex Martelli writes about how to avoid the deadlock here: #1446127Conservatory
The buffer blocking is simpler than it sometimes sounds: parent blocks waiting for child to exit + child blocks waiting for parent to read and free some space in the communication pipe which is full = deadlock. It is that simple. The smaller the pipe the more likely to happen.Howell
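The deadlock described in the comment above can be made concrete. In this sketch the child command is a made-up Python one-liner that writes far more than a typical 64 KiB pipe buffer can hold, so the parent must drain stdout *before* calling wait():

```python
import subprocess
import sys

# Hypothetical child: writes ~8 MB, far more than the pipe can hold.
child_code = ("import sys\n"
              "for i in range(100000): sys.stdout.write('x' * 80 + '\\n')")
p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdout=subprocess.PIPE)
data = p.stdout.read()   # drain the pipe first...
p.wait()                 # ...only then wait for the child to exit
print(len(data))
```

Calling p.wait() before reading would hang once the pipe fills: the parent waits for the child to exit while the child blocks writing to a full pipe, which is exactly the cycle described above.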

To get subprocess' output line by line as soon as the subprocess flushes its stdout buffer:

#!/usr/bin/env python2
from subprocess import Popen, PIPE

p = Popen(["cmd", "arg1"], stdout=PIPE, bufsize=1)
with p.stdout:
    for line in iter(p.stdout.readline, b''):
        print line,
p.wait() # wait for the subprocess to exit

iter() is used to read lines as soon as they are written, to work around the read-ahead bug in Python 2.

If the subprocess's stdout uses block buffering instead of line buffering in non-interactive mode (which delays the output until the child's buffer is full or is flushed explicitly by the child), then you could try to force unbuffered output using the pexpect or pty modules, or the unbuffer, stdbuf, or script utilities; see Q: Why not just use a pipe (popen())?
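For a Python child specifically, one way to force unbuffered output is the PYTHONUNBUFFERED environment variable (or the -u flag); non-Python children would need stdbuf, unbuffer, or a pty as mentioned above. A sketch (Python 3, with an invented child command for illustration):

```python
import os
import sys
from subprocess import Popen, PIPE

env = dict(os.environ, PYTHONUNBUFFERED="1")  # child flushes every print
child = [sys.executable, "-c",
         "import time\n"
         "for i in range(3):\n"
         "    print('tick', i)\n"
         "    time.sleep(0.1)"]
p = Popen(child, stdout=PIPE, env=env)
with p.stdout:
    for line in iter(p.stdout.readline, b''):
        print(line.decode(), end='')  # arrives as the child prints, not at exit
p.wait()
```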


Here's Python 3 code:

#!/usr/bin/env python3
from subprocess import Popen, PIPE

with Popen(["cmd", "arg1"], stdout=PIPE, bufsize=1,
           universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')

Note: unlike Python 2, which outputs the subprocess's bytestrings as-is, Python 3 uses text mode (cmd's output is decoded using the locale.getpreferredencoding(False) encoding).

Chartreuse answered 17/7, 2013 at 11:15 Comment(23)
what does the b'' mean?Idette
b'' is a bytes literal in Python 2.7 and Python 3.Chartreuse
bufsize=1 is the key! At least it works when cmd is fastboot.Glim
@JinghaoShi: bufsize=1 should have no effect other than performance on Python 2 where bufsize=0 by default.Chartreuse
@J.F.Sebastian: Yeah, I was also confused. Yet it did make a difference in my case.Glim
@JinghaoShi: bufsize=1 may make a difference if you also write (using p.stdin) to the subprocess e.g., it can help to avoid a deadlock while doing an interactive (pexpect-like) exchange -- assuming there are no buffering issues in child process itself. If you are only reading then as I said the difference is only in performance: if it is not so then could you provide a minimal complete code example that shows it?Chartreuse
@J.F.Sebastian: My code is the same with what you have in the answer. The sub process I was running is Android fastboot tool to flash a phone. And I didn't write to p.stdin during the process. Before setting bufisze to 1, I can only see the output after the flashing finishes (~2 min).Glim
This prints all of the read lines with a b'...' around them.Roentgenoscope
@NateGlenn: no, it won't. Notice: it uses Python 2 syntax: print line,. You are likely to see b'' in Python 3. I've added Python 3 code.Chartreuse
Thanks. I'm writing an ST3 plugin and don't use Python normally.Roentgenoscope
I am also having same issue where my prints from subprocess are buffered till the end of the subprocess. I tired to use your way of getting prints on the terminal as they are encountered but it is still not resolving the issue and still doing the buffer. My post is at #31321914Passel
@user2966197: -u works for a python subprocess. In general, I recommend that you read the links in the paragraph above that starts with "If subprocess' stdout uses a block buffering.."Chartreuse
@J.F.Sebastian I am using python 3.4, bufsize=1 really helps when using subprocess.Popen().communicate() as it essentially tells the process to flush stdout / stderr as soon as there is 1 byte of data. default is -1Meteor
@Meteor it is a wrong assumption. If bufsize had any effect on the buffering inside the child process then I wouldn't need to mention stdbuf, pty, etc. If you are using .communicate() then bufsize should not have any effect on the parent too (unless its implementation is defective).Chartreuse
@J.F.Sebastian so is the idea that we put all the code that we want to execute after the subprocess is finished inside the with context manager (after the for loop) and if so, how do we get something like the return code? (as neither wait nor communicate are being called)Frosting
@Frosting why do you think it is necessary? Put the code after the with-statement, to make sure the process is reaped.Chartreuse
could you listen in for stderr at the same time?Milldam
@ealeon: yes. It requires techniques that can read stdout/stderr concurrently unless you merge stderr into stdout (by passing stderr=subprocess.STDOUT to Popen()). See also, threading or asyncio solutions linked there.Chartreuse
@J.F.Sebastian In python 3.5.2, I see exactly the same behavior when I replace the loop starting for line in p.stdout: by pass. Is there some way to actually process the lines? Suppose I only want to print some of the lines, for example?Knitter
@Knitter if stdout=PIPE doesn't capture the output (you still see it on the screen) then your program might print to stderr or directly to the terminal instead. To merge stdout&stderr, pass stderr=subprocess.STDOUT (see my previous comment). To capture output printed directly to your tty, you could use pexpect, pty solutions.. Here's a more complex code example.Chartreuse
with Popen("./Portability.py", stdout=PIPE, stderr=STDOUT, bufsize=1, universal_newlines=True) as p, \ open('Portability.log', 'ab') as file: for line in p.stdout: # b'\n'-separated lines print(line, end='') #new addition sys.stdout.buffer.write(line) # pass bytes as is file.write(line) ERROR IS : Traceback (most recent call last): File "./Portability_Tests.py", line 67, in <module> sys.stdout.buffer.write(line) # pass bytes as is TypeError: a bytes-like object is required, not 'str'Dungeon
This was driving me nuts. If you are calling python in the subprocess, you need to pay attention to jfs's warning about buffering. To turn off buffering of the python subprocess, just pass env={"PYTHONUNBUFFERED": "1"} to subprocess.Popen. As it stands, jfs's Python 3 code will not work for a python subprocess. It won't crash, but you won't get the lines as soon as they are emitted.Elielia
@leducvin: you can use -u flag if you control the command line docs.python.org/3/using/cmdline.html#cmdoption-uChartreuse

I believe the simplest way to collect output from a process in a streaming fashion is like this:

import sys
from subprocess import *
proc = Popen('ls', shell=True, stdout=PIPE)
while True:
    data = proc.stdout.readline()   # Alternatively proc.stdout.read(1024)
    if len(data) == 0:
        break
    sys.stdout.write(data)   # sys.stdout.buffer.write(data) on Python 3.x

The readline() or read() function should only return an empty string on EOF, after the process has terminated - otherwise it will block if there is nothing to read (readline() includes the newline, so on empty lines, it returns "\n"). This avoids the need for an awkward final communicate() call after the loop.

With very long lines, read() may be preferable to reduce maximum memory usage; the number passed to it is arbitrary, but excluding it results in reading the entire pipe output at once, which is probably not desirable.
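The chunked variant mentioned in the code comment above might look like this; 1024 is an arbitrary bound, and seq merely stands in for a real command:

```python
import sys
from subprocess import Popen, PIPE

proc = Popen(["seq", "1", "5"], stdout=PIPE)
while True:
    chunk = proc.stdout.read(1024)  # bounded read: at most 1024 bytes at a time
    if not chunk:                   # b'' means EOF
        break
    sys.stdout.buffer.write(chunk)
proc.wait()
```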

Crockery answered 25/4, 2013 at 4:39 Comment(3)
data = proc.stdout.read() blocks until all data is read. You might be confusing it with os.read(fd, maxsize) that can return earlier (as soon as any data is available).Chartreuse
You're correct, I was mistaken. However if a reasonable number of bytes is passed as an argument to read() then it works fine, and likewise readline() works fine as long as the maximum line length is reasonable. Updated my answer accordingly.Crockery
The shell=True is obviously useless here and should be taken out, though that requires you to pass the command as a list. See also Actual meaning of shell=True in subprocessNianiabi

If you want a non-blocking approach, don't use process.communicate(). If you set the subprocess.Popen() argument stdout to PIPE, you can read from process.stdout and check whether the process is still running with process.poll().
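A minimal sketch of this pattern (note that readline() itself still blocks until a full line or EOF arrives, so this is only loosely "non-blocking"); echo is a stand-in for an arbitrary command:

```python
from subprocess import Popen, PIPE

proc = Popen(["echo", "hello"], stdout=PIPE)
while proc.poll() is None:          # process still running
    line = proc.stdout.readline()
    if line:
        print(line.decode(), end='')
rest = proc.stdout.read()           # drain whatever remains after exit
if rest:
    print(rest.decode(), end='')
```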

Shrapnel answered 26/4, 2010 at 18:29 Comment(1)
non-blocking approach is not straightforwardChartreuse

If you're simply trying to pass the output through in real time, it's hard to get simpler than this:

import subprocess

# This will raise a CalledProcessError if the program returns a nonzero code.
# You can use call() instead if you don't care about that case.
subprocess.check_call(['ls', '-l'])

See the docs for subprocess.check_call().

If you need to process the output, sure, loop on it. But if you don't, just keep it simple.

Edit: J.F. Sebastian points out both that the defaults for the stdout and stderr parameters pass through to sys.stdout and sys.stderr, and that this will fail if sys.stdout and sys.stderr have been replaced (say, for capturing output in tests).

Mucoid answered 22/9, 2015 at 15:34 Comment(6)
It won't work if sys.stdout or sys.stderr are replaced with file-like objects that have no real fileno(). If sys.stdout, sys.stderr are not replaced then it is even simpler: subprocess.check_call(args).Chartreuse
Thanks! I'd realized the vagaries of replacing sys.stdout/stderr, but somehow never realized that if you omit the arguments, it passes stdout and stderr to the right places. I like call() over check_call() unless I want the CalledProcessError.Mucoid
python -mthis: "Errors should never pass silently. Unless explicitly silenced." that is why the example code should prefer check_call() over call().Chartreuse
Heh. A lot of the programs I wind up call()ing return nonzero error codes in non-error conditions, because they are terrible. So on our case, a nonzero error code is not actually an error.Mucoid
yes. There are programs such as grep that may return non-zero exit status even if there is no error -- they are exceptions. By default zero exit status indicates success.Chartreuse
Sure. Anyhow, now the example code uses check_call() and explains when you might want call() instead.Mucoid
import subprocess

myCommand = "ls -l"
cmd = myCommand.split()
# "universal newline support": interprets \n, \r\n and \r equally, each as a newline.
p = subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
while True:
    line = p.stderr.readline()
    if not line:        # readline() returns '' only at EOF
        break
    print(line.rstrip('\r\n'))
Antibiosis answered 12/11, 2017 at 23:22 Comment(2)
it is always good to explain what your solution does just to make people understand betterRequite
You should consider using shlex.split(myCommand) instead of myCommand.split(). It honors spaces in quoted arguments, as well.Crowell
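To illustrate the shlex.split() suggestion in the comment above:

```python
import shlex

cmd = 'grep -r "hello world" .'
print(cmd.split())        # naive split breaks the quoted argument apart
print(shlex.split(cmd))   # shlex keeps "hello world" as one argument
```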

Adding another python3 solution with a few small changes:

  1. Allows you to catch the exit code of the shell process (I have been unable to get the exit code while using the with construct)
  2. Also pipes stderr out in real time
import subprocess
import sys
def subcall_stream(cmd, fail_on_error=True):
    # Run a shell command, streaming output to STDOUT in real time
    # Expects a list style command, e.g. `["docker", "pull", "ubuntu"]`
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=1, universal_newlines=True)
    for line in p.stdout:
        sys.stdout.write(line)
    p.wait()
    exit_code = p.returncode
    if exit_code != 0 and fail_on_error:
        raise RuntimeError(f"Shell command failed with exit code {exit_code}. Command: `{cmd}`")
    return exit_code
Der answered 15/10, 2020 at 23:5 Comment(0)
