Read streaming input from subprocess.communicate()
I'm using Python's subprocess.communicate() to read stdout from a process that runs for about a minute.

How can I print out each line of that process's stdout in a streaming fashion, so that I can see the output as it's generated, but still block on the process terminating before continuing?

subprocess.communicate() appears to give all the output at once.

Milagro answered 26/4, 2010 at 18:23 Comment(1)
related: Getting realtime output using subprocessChartreuse

Please note, I think J.F. Sebastian's method (below) is better.


Here is a simple example (with no checking for errors):

import subprocess
proc = subprocess.Popen('ls',
                        shell=True,
                        stdout=subprocess.PIPE,
                        )
while proc.poll() is None:
    output = proc.stdout.readline()
    print output,

If ls ends too fast, then the while loop may end before you've read all the data.

You can catch the remainder in stdout this way:

output = proc.communicate()[0]
print output,
Conservatory answered 26/4, 2010 at 18:54 Comment(5)
does this scheme fall victim to the buffer blocking problem that the python doc refers to?Milagro
@Heinrich, the buffer blocking problem is not something I understand well. I believe (just from googling around) that this problem only occurs if you don't read from stdout (and stderr?) inside the while loop. So I think the above code is okay, but I can't say for sure.Conservatory
This actually does suffer from a blocking problem, a few years ago I had no end to the trouble where readline would block 'til it got a newline even if the proc had ended. I don't remember the solution, but I think it had something to do with doing the reads on a worker thread and just looping while proc.poll() is None: time.sleep(0) or something to that effect. Basically- you need to either ensure that the output newline is the last thing that the process does (because you can't give the interpreter time to loop again) or you need to do something "fancy."Kline
@Heinrich: Alex Martelli writes about how to avoid the deadlock here: #1446127Conservatory
The buffer blocking is simpler than it sometimes sounds: parent blocks waiting for child to exit + child blocks waiting for parent to read and free some space in the communication pipe which is full = deadlock. It is that simple. The smaller the pipe the more likely to happen.Howell
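The deadlock described in the comment above can be made concrete. In this sketch the child command is a made-up Python one-liner that writes far more than a typical 64 KiB pipe buffer can hold, so the parent must drain stdout *before* calling wait():

```python
import subprocess
import sys

# Hypothetical child: writes ~8 MB, far more than the pipe can hold.
child_code = ("import sys\n"
              "for i in range(100000): sys.stdout.write('x' * 80 + '\\n')")
p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdout=subprocess.PIPE)
data = p.stdout.read()   # drain the pipe first...
p.wait()                 # ...only then wait for the child to exit
print(len(data))
```

Calling p.wait() before reading would hang once the pipe fills: the parent waits for the child to exit while the child blocks writing to a full pipe, which is exactly the cycle described above.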

To get subprocess' output line by line as soon as the subprocess flushes its stdout buffer:

#!/usr/bin/env python2
from subprocess import Popen, PIPE

p = Popen(["cmd", "arg1"], stdout=PIPE, bufsize=1)
with p.stdout:
    for line in iter(p.stdout.readline, b''):
        print line,
p.wait() # wait for the subprocess to exit

iter() is used to read lines as soon as they are written, to work around the read-ahead bug in Python 2.

If the subprocess's stdout uses block buffering instead of line buffering in non-interactive mode (which delays the output until the child's buffer is full or is flushed explicitly by the child), then you could try to force unbuffered output using the pexpect or pty modules, or the unbuffer, stdbuf, or script utilities; see Q: Why not just use a pipe (popen())?
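For a Python child specifically, one way to force unbuffered output is the PYTHONUNBUFFERED environment variable (or the -u flag); non-Python children would need stdbuf, unbuffer, or a pty as mentioned above. A sketch (Python 3, with an invented child command for illustration):

```python
import os
import sys
from subprocess import Popen, PIPE

env = dict(os.environ, PYTHONUNBUFFERED="1")  # child flushes every print
child = [sys.executable, "-c",
         "import time\n"
         "for i in range(3):\n"
         "    print('tick', i)\n"
         "    time.sleep(0.1)"]
p = Popen(child, stdout=PIPE, env=env)
with p.stdout:
    for line in iter(p.stdout.readline, b''):
        print(line.decode(), end='')  # arrives as the child prints, not at exit
p.wait()
```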


Here's Python 3 code:

#!/usr/bin/env python3
from subprocess import Popen, PIPE

with Popen(["cmd", "arg1"], stdout=PIPE, bufsize=1,
           universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')

Note: unlike Python 2, which outputs the subprocess's bytestrings as-is, Python 3 uses text mode (cmd's output is decoded using the locale.getpreferredencoding(False) encoding).

Chartreuse answered 17/7, 2013 at 11:15 Comment(23)
what does the b'' mean?Idette
b'' is a bytes literal in Python 2.7 and Python 3.Chartreuse
bufsize=1 is the key! At least it works when cmd is fastboot.Glim
@JinghaoShi: bufsize=1 should have no effect other than performance on Python 2 where bufsize=0 by default.Chartreuse
@J.F.Sebastian: Yeah, I was also confused. Yet it did make a difference in my case.Glim
@JinghaoShi: bufsize=1 may make a difference if you also write (using p.stdin) to the subprocess e.g., it can help to avoid a deadlock while doing an interactive (pexpect-like) exchange -- assuming there are no buffering issues in child process itself. If you are only reading then as I said the difference is only in performance: if it is not so then could you provide a minimal complete code example that shows it?Chartreuse
@J.F.Sebastian: My code is the same with what you have in the answer. The sub process I was running is Android fastboot tool to flash a phone. And I didn't write to p.stdin during the process. Before setting bufisze to 1, I can only see the output after the flashing finishes (~2 min).Glim
This prints all of the read lines with a b'...' around them.Roentgenoscope
@NateGlenn: no, it won't. Notice: it uses Python 2 syntax: print line,. You are likely to see b'' in Python 3. I've added Python 3 code.Chartreuse
Thanks. I'm writing an ST3 plugin and don't use Python normally.Roentgenoscope
I am also having same issue where my prints from subprocess are buffered till the end of the subprocess. I tired to use your way of getting prints on the terminal as they are encountered but it is still not resolving the issue and still doing the buffer. My post is at #31321914Passel
@user2966197: -u works for a python subprocess. In general, I recommend that you read the links in the paragraph above that starts with "If subprocess' stdout uses a block buffering.."Chartreuse
@J.F.Sebastian I am using python 3.4, bufsize=1 really helps when using subprocess.Popen().communicate() as it essentially tells the process to flush stdout / stderr as soon as there is 1 byte of data. default is -1Meteor
@Meteor it is a wrong assumption. If bufsize had any effect on the buffering inside the child process then I wouldn't need to mention stdbuf, pty, etc. If you are using .communicate() then bufsize should not have any effect on the parent too (unless its implementation is defective).Chartreuse
@J.F.Sebastian so is the idea that we put all the code that we want to execute after the subprocess is finished inside the with context manager (after the for loop) and if so, how do we get something like the return code? (as neither wait nor communicate are being called)Frosting
@Frosting why do you think it is necessary? Put the code after the with-statement, to make sure the process is reaped.Chartreuse
could you listen in for stderr at the same time?Milldam
@ealeon: yes. It requires techniques that can read stdout/stderr concurrently unless you merge stderr into stdout (by passing stderr=subprocess.STDOUT to Popen()). See also, threading or asyncio solutions linked there.Chartreuse
@J.F.Sebastian In python 3.5.2, I see exactly the same behavior when I replace the loop starting for line in p.stdout: by pass. Is there some way to actually process the lines? Suppose I only want to print some of the lines, for example?Knitter
@Knitter if stdout=PIPE doesn't capture the output (you still see it on the screen) then your program might print to stderr or directly to the terminal instead. To merge stdout&stderr, pass stderr=subprocess.STDOUT (see my previous comment). To capture output printed directly to your tty, you could use pexpect, pty solutions.. Here's a more complex code example.Chartreuse
with Popen("./Portability.py", stdout=PIPE, stderr=STDOUT, bufsize=1, universal_newlines=True) as p, \ open('Portability.log', 'ab') as file: for line in p.stdout: # b'\n'-separated lines print(line, end='') #new addition sys.stdout.buffer.write(line) # pass bytes as is file.write(line) ERROR IS : Traceback (most recent call last): File "./Portability_Tests.py", line 67, in <module> sys.stdout.buffer.write(line) # pass bytes as is TypeError: a bytes-like object is required, not 'str'Dungeon
This was driving me nuts. If you are calling python in the subprocess, you need to pay attention to jfs's warning about buffering. To turn off buffering of the python subprocess, just pass env={"PYTHONUNBUFFERED": "1"} to subprocess.Popen. As it stands, jfs's Python 3 code will not work for a python subprocess. It won't crash, but you won't get the lines as soon as they are emitted.Elielia
@leducvin: you can use -u flag if you control the command line docs.python.org/3/using/cmdline.html#cmdoption-uChartreuse

I believe the simplest way to collect output from a process in a streaming fashion is like this:

import sys
from subprocess import *
proc = Popen('ls', shell=True, stdout=PIPE)
while True:
    data = proc.stdout.readline()   # Alternatively proc.stdout.read(1024)
    if len(data) == 0:
        break
    sys.stdout.write(data)   # sys.stdout.buffer.write(data) on Python 3.x

The readline() or read() function should only return an empty string on EOF, after the process has terminated - otherwise it will block if there is nothing to read (readline() includes the newline, so on empty lines, it returns "\n"). This avoids the need for an awkward final communicate() call after the loop.

With very long lines, read() may be preferable to reduce maximum memory usage; the number passed to it is arbitrary, but excluding it results in reading the entire pipe output at once, which is probably not desirable.
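The chunked variant mentioned in the code comment above might look like this; 1024 is an arbitrary bound, and seq merely stands in for a real command:

```python
import sys
from subprocess import Popen, PIPE

proc = Popen(["seq", "1", "5"], stdout=PIPE)
while True:
    chunk = proc.stdout.read(1024)  # bounded read: at most 1024 bytes at a time
    if not chunk:                   # b'' means EOF
        break
    sys.stdout.buffer.write(chunk)
proc.wait()
```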

Crockery answered 25/4, 2013 at 4:39 Comment(3)
data = proc.stdout.read() blocks until all data is read. You might be confusing it with os.read(fd, maxsize) that can return earlier (as soon as any data is available).Chartreuse
You're correct, I was mistaken. However if a reasonable number of bytes is passed as an argument to read() then it works fine, and likewise readline() works fine as long as the maximum line length is reasonable. Updated my answer accordingly.Crockery
The shell=True is obviously useless here and should be taken out, though that requires you to pass the command as a list. See also Actual meaning of shell=True in subprocessNianiabi

If you want a non-blocking approach, don't use process.communicate(). If you set the subprocess.Popen() argument stdout to PIPE, you can read from process.stdout and check whether the process is still running with process.poll().
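A minimal sketch of this pattern (note that readline() itself still blocks until a full line or EOF arrives, so this is only loosely "non-blocking"); echo is a stand-in for an arbitrary command:

```python
from subprocess import Popen, PIPE

proc = Popen(["echo", "hello"], stdout=PIPE)
while proc.poll() is None:          # process still running
    line = proc.stdout.readline()
    if line:
        print(line.decode(), end='')
rest = proc.stdout.read()           # drain whatever remains after exit
if rest:
    print(rest.decode(), end='')
```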

Shrapnel answered 26/4, 2010 at 18:29 Comment(1)
non-blocking approach is not straightforwardChartreuse

If you're simply trying to pass the output through in real time, it's hard to get simpler than this:

import subprocess

# This will raise a CalledProcessError if the program returns a nonzero code.
# You can use call() instead if you don't care about that case.
subprocess.check_call(['ls', '-l'])

See the docs for subprocess.check_call().

If you need to process the output, sure, loop on it. But if you don't, just keep it simple.

Edit: J.F. Sebastian points out both that the defaults for the stdout and stderr parameters pass through to sys.stdout and sys.stderr, and that this will fail if sys.stdout and sys.stderr have been replaced (say, for capturing output in tests).

Mucoid answered 22/9, 2015 at 15:34 Comment(6)
It won't work if sys.stdout or sys.stderr are replaced with file-like objects that have no real fileno(). If sys.stdout, sys.stderr are not replaced then it is even simpler: subprocess.check_call(args).Chartreuse
Thanks! I'd realized the vagaries of replacing sys.stdout/stderr, but somehow never realized that if you omit the arguments, it passes stdout and stderr to the right places. I like call() over check_call() unless I want the CalledProcessError.Mucoid
python -mthis: "Errors should never pass silently. Unless explicitly silenced." that is why the example code should prefer check_call() over call().Chartreuse
Heh. A lot of the programs I wind up call()ing return nonzero error codes in non-error conditions, because they are terrible. So on our case, a nonzero error code is not actually an error.Mucoid
yes. There are programs such as grep that may return non-zero exit status even if there is no error -- they are exceptions. By default zero exit status indicates success.Chartreuse
Sure. Anyhow, now the example code uses check_call() and explains when you might want call() instead.Mucoid
import subprocess

myCommand = "ls -l"
cmd = myCommand.split()
# "universal newline support": interprets \n, \r\n and \r equally, each as a newline.
p = subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
while True:
    line = p.stderr.readline()
    if not line:        # readline() returns '' only at EOF
        break
    print(line.rstrip('\r\n'))
Antibiosis answered 12/11, 2017 at 23:22 Comment(2)
it is always good to explain what your solution does just to make people understand betterRequite
You should consider using shlex.split(myCommand) instead of myCommand.split(). It honors spaces in quoted arguments, as well.Crowell
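To illustrate the shlex.split() suggestion in the comment above:

```python
import shlex

cmd = 'grep -r "hello world" .'
print(cmd.split())        # naive split breaks the quoted argument apart
print(shlex.split(cmd))   # shlex keeps "hello world" as one argument
```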

Adding another python3 solution with a few small changes:

  1. Allows you to catch the exit code of the shell process (I have been unable to get the exit code while using the with construct)
  2. Also pipes stderr out in real time
import subprocess
import sys
def subcall_stream(cmd, fail_on_error=True):
    # Run a shell command, streaming output to STDOUT in real time
    # Expects a list style command, e.g. `["docker", "pull", "ubuntu"]`
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=1, universal_newlines=True)
    for line in p.stdout:
        sys.stdout.write(line)
    p.wait()
    exit_code = p.returncode
    if exit_code != 0 and fail_on_error:
        raise RuntimeError(f"Shell command failed with exit code {exit_code}. Command: `{cmd}`")
    return exit_code
Der answered 15/10, 2020 at 23:5 Comment(0)
