read subprocess stdout line by line

My Python script uses subprocess to call a Linux utility that is very noisy. I want to store all of the output in a log file and show some of it to the user. I thought the following would work, but the output doesn't show up in my application until the utility has produced a significant amount of output.

# fake_utility.py, just generates lots of output over time
import time

i = 0
while True:
    print(hex(i) * 512)
    i += 1
    time.sleep(0.5)

In the parent process:

import subprocess

proc = subprocess.Popen(['python', 'fake_utility.py'], stdout=subprocess.PIPE)
for line in proc.stdout:
    # the real code does filtering here
    print("test:", line.rstrip())

The behavior I really want is for the filter script to print each line as it is received from the subprocess, like tee does but within Python code.

What am I missing? Is this even possible?


Sequel answered 10/5, 2010 at 16:47 Comment(6)
you could use print line, instead of print line.rstrip() (note: comma at the end).Unmeaning
related: Python: read streaming input from subprocess.communicate()Unmeaning
Update 2 states that it works with python 3.0+ but uses the old print statement, so it does not work with python 3.0+.Runlet
None of the answers listed here worked for me, but #5412280 did!Autotoxin
interesting the code that only works in python3.0+ uses 2.7 syntax for print.Lamed
the update does not work. you're only printing line by line, not receiving them one by one.Bridegroom

I think the problem is with the statement for line in proc.stdout, which reads the entire input before iterating over it. The solution is to use readline() instead:

# filters output
import subprocess

proc = subprocess.Popen(['python', 'fake_utility.py'], stdout=subprocess.PIPE)
while True:
    line = proc.stdout.readline()
    if not line:
        break
    # the real code does filtering here
    print "test:", line.rstrip()

Of course you still have to deal with the subprocess' buffering.

Note: according to the documentation the solution with an iterator should be equivalent to using readline(), except for the read-ahead buffer, but (or exactly because of this) the proposed change did produce different results for me (Python 2.5 on Windows XP).
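
For completeness, a minimal Python 3 sketch of the same loop, using the iter() idiom from the comments below (on Python 3 the pipe yields bytes, so the sentinel is b''):

import subprocess

proc = subprocess.Popen(['python', 'fake_utility.py'], stdout=subprocess.PIPE)
# iter() keeps calling readline() until it returns the sentinel b'' at EOF
for line in iter(proc.stdout.readline, b''):
    # the real code does filtering here
    print("test:", line.rstrip().decode())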

Superscription answered 11/5, 2010 at 18:48 Comment(13)
for file.readline() vs. for line in file see bugs.python.org/issue3907 (in short: it works on Python3; use io.open() on Python 2.6+)Unmeaning
The more pythonic test for an EOF, per the "Programming Recommendations" in PEP 8 (python.org/dev/peps/pep-0008), would be 'if not line:'.Stilbite
there is no open() used in this script; where do you put io.open()? is there a workaround for 2.5?Immiscible
@naxa: for pipes: for line in iter(proc.stdout.readline, ''):.Unmeaning
@J.F.Sebastian: did you try this solution on Python3? I have code that previously ran on Python 2(.7) using the iter(proc.stdout.readline, '') approach, and now that I switched to Python 3.4 that code went pear-shaped, the loop does not return and RAM usage oscillates between ~0 and 3 GB.Cymograph
@Jan-PhilipGehrcke: yes. 1. you could use for line in proc.stdout on Python 3 (there is no the read-ahead bug) 2. '' != b'' on Python 3 -- don't copy-paste the code blindly -- think what it does and how it works.Unmeaning
@J.F.Sebastian: sure, the iter(f.readline, b'') solution is rather obvious (and also works on Python 2, if anyone is interested). The point of my comment was not to blame your solution (sorry if it appeared like that, I read that now, too!), but to describe the extent of the symptoms, which are quite severe in this case (most of the Py2/3 issues result in exceptions, whereas here a well-behaved loop changed to be endless, and garbage collection struggles fighting the flood of newly created objects, yielding memory usage oscillations with long period and large amplitude).Cymograph
@Jan-PhilipGehrcke: whether to use '' or b'' depends on universal_newlines parameter that enables text mode. It is not obvious. There are parameters that are different on Python 2 and 3. You should be careful if you write single source Python 2/3 compatible code that uses subprocess module.Unmeaning
@J.F.Sebastian: I agree that there is a lot to consider when using subprocess, but usage of b'' fits most application scenarios, because the well-chosen default in both, Python 2 and 3 is to treat subprocess.PIPE as a byte stream, and to not implicitly perform de/encoding operations. I'd say b'' is recommendable even on Python 2, because it is semantically better (explicit). Indeed, b'' would be wrong with universal_newlines=True on Python 3 (which renders stdout/err attributes to be TextIOWrapper objects). On Python 2, b'' works independent of universal_newlines.Cymograph
How can you see if proc has terminated before trying to read another line from its stdout?Hydropic
Does this care how frequently or infrequently the called process sends output? Could it run indefinitely for months only printing a line every 30 seconds? I don't understand how readline() can determine when the program output is actually finished...Donough
I recommmend to add sys.stdout.flush() before breaking, otherwise things mix up.Bawbee
@JasonMock if not line: will also break on the first empty line (which is not necessarily at the end of the stream). if line is not None: should work properly.Supersaturated

Bit late to the party, but was surprised not to see what I think is the simplest solution here:

import io
import subprocess

proc = subprocess.Popen(["prog", "arg"], stdout=subprocess.PIPE)
for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):  # or another encoding
    # do something with line

(This requires Python 3.)

Unsnarl answered 22/1, 2016 at 3:56 Comment(9)
I'd like to use this answer but I am getting: AttributeError: 'file' object has no attribute 'readable' py2.7Discommodity
Works with python 3Bastia
Clearly this code is not valid for multiple reasons: py2/py3 compatibility and real risk of getting ValueError: I/O operation on closed fileMariandi
@Mariandi neither of those things make it "not valid". If you're writing a library that still needs to support Python 2, then don't use this code. But many people have the luxury of being able to use software released more recently than a decade ago. If you try to read on a closed file you'll get that exception regardless of whether you use TextIOWrapper or not. You can simply handle the exception.Unsnarl
you are maybe late to the party but you answer is up to date with current version of Python, tyPaganini
This logic works fine but i am getting extra '\n' at every line. Is there a way to suppress that?Cadet
@Cadet \n is the newline character. it's conventional in Python for the newline to not be removed when splitting by lines - you'll see the same behaviour if you iterate over a file's lines or use a readlines() method. You can get the line without it with just line[:-1] (TextIOWrapper operates in "universal newlines" mode by default, so even if you're on Windows and the line ends with \r\n, you'll only have \n at the end, so -1 works). You can also use line.rstrip() if you don't mind any other whitespace-like characters at the end of the line also being removed.Unsnarl
I got AttributeError: 'file' object has no attribute 'readable' on python 3.7, but it was because I was using subprocess.run instead of subprocess.Popen.Tensity
This was a good answer, but the io.TextIOWrapper isn't really needed anymore - for modern Python versions, just create the Popen in text mode.Murage

Indeed, once you have sorted out the iterator, buffering in the subprocess can become your problem. You can tell the Python interpreter in the subprocess not to buffer its output.

proc = subprocess.Popen(['python', 'fake_utility.py'], stdout=subprocess.PIPE)

becomes

proc = subprocess.Popen(['python', '-u', 'fake_utility.py'], stdout=subprocess.PIPE)

I have needed this when calling python from within python.

Slavery answered 29/8, 2014 at 16:36 Comment(0)

A function that allows iterating over both stdout and stderr concurrently, in realtime, line by line

In case you need to get the output stream for both stdout and stderr at the same time, you can use the following function.

The function uses Queues to merge both Popen pipes into a single iterator.

Here we create the function read_popen_pipes():

from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor


def enqueue_output(file, queue):
    for line in iter(file.readline, ''):
        queue.put(line)
    file.close()


def read_popen_pipes(p):

    with ThreadPoolExecutor(2) as pool:
        q_stdout, q_stderr = Queue(), Queue()

        pool.submit(enqueue_output, p.stdout, q_stdout)
        pool.submit(enqueue_output, p.stderr, q_stderr)

        while True:

            if p.poll() is not None and q_stdout.empty() and q_stderr.empty():
                break

            out_line = err_line = ''

            try:
                out_line = q_stdout.get_nowait()
            except Empty:
                pass
            try:
                err_line = q_stderr.get_nowait()
            except Empty:
                pass

            yield (out_line, err_line)

read_popen_pipes() in use:

import subprocess as sp

with sp.Popen(my_cmd, stdout=sp.PIPE, stderr=sp.PIPE, text=True) as p:
    for out_line, err_line in read_popen_pipes(p):
        # Do stuff with each line, e.g.:
        print(out_line, end='')
        print(err_line, end='')

rc = p.poll()  # status code
Eruption answered 18/7, 2019 at 11:57 Comment(0)

You want to pass these extra parameters to subprocess.Popen:

bufsize=1, universal_newlines=True

Then you can iterate as in your example. (Tested with Python 3.5)
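
A minimal sketch of what that looks like, reusing fake_utility.py from the question (bufsize=1 requests line buffering; universal_newlines=True enables text mode):

import subprocess

proc = subprocess.Popen(['python', 'fake_utility.py'],
                        stdout=subprocess.PIPE,
                        bufsize=1, universal_newlines=True)
# text mode yields str lines, so this iterates as in the question
for line in proc.stdout:
    print("test:", line.rstrip())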

Scuta answered 16/10, 2015 at 18:57 Comment(1)
@nicoulaj It should work if using the subprocess32 package.Caliphate

The subprocess module has come a long way since 2010, and most of the answers here are quite outdated.

Here is a simple way that works on modern Python versions:

from subprocess import Popen, PIPE, STDOUT

with Popen(args, stdout=PIPE, stderr=STDOUT, text=True) as proc:
    for line in proc.stdout:
        print(line)
rc = proc.returncode

About using Popen as a context manager (supported since Python 3.2): on exit of the with block, the standard file descriptors are closed and the process is waited on, with its returncode attribute set. See subprocess.py:Popen.__exit__ in the CPython sources.

Murage answered 22/9, 2023 at 15:55 Comment(0)

You can also read all of the lines without an explicit loop. Works in Python 3.6. Note that readlines() blocks until the child closes its stdout, so this does not stream the output.

import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE)
list_of_byte_strings = process.stdout.readlines()
Erg answered 27/12, 2018 at 23:20 Comment(2)
Or to convert into strings: list_of_strings = [x.decode('utf-8').rstrip('\n') for x in iter(process.stdout.readlines())]Gamber
@ndtreviv, you can pass text=True to Popen or use its "encoding" kwarg if you want the output as strings, no need to convert it yourselfBoren

Python 3.5 added run() to the subprocess module, returning a CompletedProcess object (the capture_output and text parameters used below were added in Python 3.7). With this you are fine using proc.stdout.splitlines():

proc = subprocess.run(command, shell=True, capture_output=True, text=True, check=True)
for line in proc.stdout.splitlines():
    print("stdout:", line)

See also How to Execute Shell Commands in Python Using the Subprocess Run Method

Wahkuna answered 22/3, 2020 at 9:4 Comment(3)
This solution is short and effective. One problem, compared to the original question: it does not print each line "as it is received," which I think means printing the messages in realtime just as if running the process directly in the command line. Instead it only prints the output after the process finishes running.Lyon
Thanks @Lyon for mentioning that. I use pipelines extensively and rely on streaming data and would have wrongly chosen this for its brevity.Synchronize
This does not answer the question. It buffers entire output of subprocess into memory.Murage

I tried this with Python 3 and it worked, source

When you use Popen to spawn the new process, you tell the operating system to PIPE the stdout of the child process so the parent process can read it; here, stderr is redirected to STDOUT, i.e. merged into the same pipe.

In output_reader we read each line of the child's stdout by wrapping readline in an iterator that yields a line from the child process whenever a new line is ready.

import subprocess
import threading
import time


def output_reader(proc):
    for line in iter(proc.stdout.readline, b''):
        print('got line: {0}'.format(line.decode('utf-8')), end='')


def main():
    proc = subprocess.Popen(['python', 'fake_utility.py'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)

    t = threading.Thread(target=output_reader, args=(proc,))
    t.start()

    try:
        time.sleep(0.2)
        # The parent keeps doing its own work while the reader
        # thread prints the child's output in the background
        for i in range(10):
            print(hex(i) * 512)
            time.sleep(0.5)
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=0.2)
            print('== subprocess exited with rc =', proc.returncode)
        except subprocess.TimeoutExpired:
            print('subprocess did not terminate in time')
    t.join()
Stepmother answered 21/1, 2018 at 12:0 Comment(3)
That's great, but it seems to use the normal Popen. Instead of just showing a code snippit, you should really describe how it is set apart and what it does. There is a lot in there that surprises the heck out of the reader, and we're supposed to keep to the principle of least surprise.Denotative
Thank you @MaartenBodewes, I added more details to the answer, please let me know if you have more commentsStepmother
Much better, upvoted. I'll remove my comment, you can do the same :)Denotative

I came here with the same problem, and found that none of the provided answers really worked for me. The closest was adding sys.stdout.flush() to the child process, which works but means modifying that process, which I didn't want to do.

Setting the bufsize=1 in the Popen() didn't seem to have any effect for my use case. I guess the problem is that the child process is buffering, regardless of how I call the Popen().

However, I found this question with similar problem (How can I flush the output of the print function?) and one of the answers is to set the environment variable PYTHONUNBUFFERED=1 when calling Popen. This works how I want it to, i.e. real-time line-by-line reading of the output of the child process.
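
A minimal sketch of that approach (assuming the child is the fake_utility.py script from the question):

import os
import subprocess

# Copy the current environment and force unbuffered output in the child
env = dict(os.environ, PYTHONUNBUFFERED='1')

proc = subprocess.Popen(['python', 'fake_utility.py'],
                        stdout=subprocess.PIPE, env=env)
for line in proc.stdout:
    print("test:", line.decode().rstrip())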

Bragdon answered 19/9, 2023 at 9:1 Comment(0)

The following modification of Rômulo's answer works for me on Python 2 and 3 (2.7.12 and 3.6.1):

import os
import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE)
while True:
  line = process.stdout.readline()
  if not line:  # b'' (or '' on Python 2) signals EOF
    break
  os.write(1, line)
Albescent answered 2/4, 2017 at 17:14 Comment(0)

I was having a problem building the arg list for Popen when updating servers; the following code resolves this a bit.

import getpass
from subprocess import Popen, PIPE

username = 'user1'
ip = '127.0.0.1'

print ('What is the password?')
password = getpass.getpass()
cmd1 = f"""sshpass -p {password} ssh {username}@{ip}"""
cmd2 = f"""echo {password} | sudo -S apt update"""
cmd3 = " && "
cmd4 = f"""echo {password} | sudo -S apt upgrade -y"""
cmd5 = " && "
cmd6 = "exit"
commands = [cmd1, cmd2, cmd3, cmd4, cmd5, cmd6]

command = " ".join(commands)

cmd = command.split()

with Popen(cmd, stdout=PIPE, bufsize=1, universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')

To run the update on a local computer, the following example does the same.

import getpass
from subprocess import Popen, PIPE

print ('What is the password?')
password = getpass.getpass()

cmd1_local = f"""apt update"""
cmd2_local = f"""apt upgrade -y"""
commands = [cmd1_local, cmd2_local]

with Popen(['echo', password], stdout=PIPE) as auth:
    for cmd in commands:
        cmd = cmd.split()
        with Popen(['sudo','-S'] + cmd, stdin=auth.stdout, stdout=PIPE, bufsize=1, universal_newlines=True) as p:
            for line in p.stdout:
                print(line, end='')
Dipper answered 21/4, 2022 at 0:22 Comment(0)

An improved version of https://mcmap.net/q/99104/-read-subprocess-stdout-line-by-line, suitable for Python 3.10.

A function to iterate over both stdout and stderr of the process in parallel.

Improvements:

  • A unified queue maintains the order of entries across stdout and stderr.
  • Yield all available lines in stdout and stderr at once - useful when the calling process is slower.
  • Sleep in the loop to keep the reader from utilizing 100% of the CPU.

import time
from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor

def enqueue_output(file, queue, level):
    for line in file:
        queue.put((level, line))
    file.close()


def read_popen_pipes(p, blocking_delay=0.5):

    with ThreadPoolExecutor(2) as pool:
        q = Queue()

        pool.submit(enqueue_output, p.stdout, q, 'stdout')
        pool.submit(enqueue_output, p.stderr, q, 'stderr')

        while True:
            if p.poll() is not None and q.empty():
                break

            lines = []
            while not q.empty():
                lines.append(q.get_nowait())

            if lines:
                yield lines

            # without this sleep, the loop would run as fast as possible and utilize 100% of the CPU
            time.sleep(blocking_delay)

Usage:

with subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=1, universal_newlines=True) as p:
    for lines in read_popen_pipes(p):
        # lines - all the log entries since the last loop run.
        print('ext cmd', lines)
        # process lines
Reverberatory answered 17/8, 2023 at 21:19 Comment(0)

On Linux (and presumably OSX), sometimes the parent process doesn't see the output immediately because the child process is buffering its output (see this article for a more detailed explanation).

If the child process is a Python program, you can disable this by setting the environment variable PYTHONUNBUFFERED to 1 as described in this answer.

If the child process is not a Python program, you can sometimes trick it into running in line-buffered mode by creating a pseudo-terminal like so:

import os
import pty
import subprocess

# Open a pseudo-terminal
master_fd, slave_fd = pty.openpty()

# Open the child process on the slave end of the PTY
with subprocess.Popen(
        ['python', 'fake_utility.py'],
        stdout=slave_fd,
        stdin=slave_fd,
        stderr=slave_fd) as proc:

    # Close our copy of the slave FD (without this we won't notice
    # when the child process closes theirs)
    os.close(slave_fd)

    # Convert the master FD into a file-like object
    with open(master_fd, 'r') as stdout:
        try:
            for line in stdout:
                # Do the actual filtering here
                print("test:", line.rstrip())
        except OSError:
            # This happens when the child process closes its STDOUT,
            # usually when it exits
            pass

If the child process needs to read from STDIN, you can get away without the stdin=slave_fd argument to subprocess.Popen(), as the child process should be checking the status of STDOUT (not STDIN) when it decides whether or not to use line-buffering.

Finally, some programs may actually directly open and write to their controlling terminal instead of writing to STDOUT. If you need to catch this case, you can use the setsid utility by replacing ['python', 'fake_utility.py'] with ['setsid', 'python', 'fake_utility.py'] in the call to subprocess.Popen().
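
As a sketch, that last variant would look like this (assuming the setsid utility is installed; under setsid, the first terminal the child opens - the PTY slave - becomes its controlling terminal, so writes to /dev/tty also land on the PTY):

import os
import pty
import subprocess

master_fd, slave_fd = pty.openpty()

# Start the child in its own session; the PTY slave becomes its
# controlling terminal, capturing even direct /dev/tty output
proc = subprocess.Popen(['setsid', 'python', 'fake_utility.py'],
                        stdout=slave_fd, stdin=slave_fd, stderr=slave_fd)
os.close(slave_fd)

with open(master_fd, 'r') as stdout:
    try:
        for line in stdout:
            print("test:", line.rstrip())
    except OSError:
        pass  # the child closed its end of the PTY

proc.wait()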

Tolbooth answered 26/3, 2024 at 3:44 Comment(0)
