Run command and get its stdout, stderr separately in near real time like in a terminal
I am trying to find a way in Python to run other programs in such a way that:

  1. The stdout and stderr of the program being run can be logged separately.
  2. The stdout and stderr of the program being run can be viewed in near-real time, so that if the child process hangs, the user can see it (i.e. we do not wait for execution to complete before printing the stdout/stderr to the user).
  3. Bonus criterion: the program being run does not know it is being run via Python, and thus will not do unexpected things (like chunk its output instead of printing it in real time, or exit because it demands a terminal to view its output). This pretty much means we will need to use a pty, I think; see the snippet below for what the child can detect.
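
To illustrate the third criterion, here is a minimal snippet showing the check most programs perform before deciding how to behave (run it directly and with its output piped):

import sys
# isatty() is how a child decides whether it is talking to a real terminal;
# when output is captured through a pipe this returns False, and many programs
# silently switch to block buffering or suppress progress output
if sys.stdout.isatty():
    print 'stdout is a terminal: typically line-buffered'
else:
    print 'stdout is redirected: expect block buffering / chunked output'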

Here is what I've got so far. Method 1:

def method1(command):
    ## subprocess.communicate() will give us the stdout and stderr separately,
    ## but we will have to wait until the end of command execution to print anything.
    ## This means if the child process hangs, we will never know....
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            shell=True, executable='/bin/bash')
    stdout, stderr = proc.communicate() # record both, but no way to print stdout/stderr in real time
    print ' ######### REAL-TIME ######### '
    ########         Not Possible
    print ' ########## RESULTS ########## '
    print 'STDOUT:'
    print stdout
    print 'STDERR:'
    print stderr

Method 2:

def method2(command):
    ## Using pexpect to run our command in a pty, we can see the child's stdout in real time,
    ## however we cannot see the stderr from "curl google.com", presumably because it is not connected to a pty?
    ## Furthermore, I do not know how to log it beyond writing out to a file (p.logfile). I need the stdout and stderr
    ## as strings, not files on disk! On the upside, pexpect would give a lot of extra functionality (if it worked!)
    proc = pexpect.spawn('/bin/bash', ['-c', command])
    print ' ######### REAL-TIME ######### '
    proc.interact()
    print ' ########## RESULTS ########## '
    ########         Not Possible

Method 3:

def method3(command):
    ## This method is very much like method1, and would work exactly as desired
    ## if only proc.xxx.read(1) wouldn't block waiting for something. Which it does. So this is useless.
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            shell=True, executable='/bin/bash')
    print ' ######### REAL-TIME ######### '
    out, err, outbuf, errbuf = '', '', '', ''
    firstToSpeak = None
    while proc.poll() is None:
        stdout = proc.stdout.read(1) # blocks
        stderr = proc.stderr.read(1) # also blocks
        if firstToSpeak is None:
            if stdout != '':
                firstToSpeak = 'stdout'; outbuf, errbuf = stdout, stderr
            elif stderr != '':
                firstToSpeak = 'stderr'; outbuf, errbuf = stdout, stderr
        else:
            if (stdout != '') or (stderr != ''):
                outbuf += stdout; errbuf += stderr
            else:
                out += outbuf; err += errbuf
                if firstToSpeak == 'stdout':
                    sys.stdout.write(outbuf + errbuf); sys.stdout.flush()
                else:
                    sys.stdout.write(errbuf + outbuf); sys.stdout.flush()
                firstToSpeak = None
    print ''
    print ' ########## RESULTS ########## '
    print 'STDOUT:'
    print out
    print 'STDERR:'
    print err

To try these methods out, you will need to import sys, subprocess and pexpect.

pexpect is pure Python and can be installed with:

sudo pip install pexpect

I think the solution will involve Python's pty module - which is something of a black art; I cannot find anyone who knows how to use it. Perhaps SO knows :) As a heads-up, I recommend you use 'curl www.google.com' as a test command, because it prints its status out on stderr for some reason :D


UPDATE-1:
OK, so the pty library is not fit for human consumption: the docs, essentially, are the source code. Any presented solution that is blocking and not async is not going to work here. The Threads/Queue method by Padraic Cunningham works great, although adding pty support is not possible - and it's 'dirty' (to quote Freenode's #python). It seems like the only solution fit for production-standard code is the Twisted framework, which even supports pty as a boolean switch to run processes exactly as if they were invoked from the shell (a rough sketch is below). But adding Twisted into a project requires a total rewrite of all the code. This is a total bummer :/
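
For reference, a rough sketch of the Twisted approach (untested; as far as I can tell, usePTY=True hands the child a single pty, so keeping stdout and stderr separate means leaving it off):

from twisted.internet import protocol, reactor

class LogBoth(protocol.ProcessProtocol):
    ## without a pty, Twisted delivers the two streams to separate callbacks
    def outReceived(self, data):
        print 'STDOUT:', data
    def errReceived(self, data):
        print 'STDERR:', data
    def processEnded(self, reason):
        reactor.stop()

reactor.spawnProcess(LogBoth(), '/usr/bin/curl', ['curl', 'www.google.com'])
reactor.run()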

UPDATE-2:

Two answers were provided: one addresses the first two criteria and will work well where you just need both the stdout and stderr, using threads and a queue. The other answer uses select, a non-blocking method for reading file descriptors, and a pty, a method to "trick" the spawned process into believing it is running in a real terminal, just as if it had been run from Bash directly - but this may or may not have side-effects. I wish I could accept both answers, because the "correct" method really depends on the situation and why you are subprocessing in the first place, but alas, I could only accept one.

Ludwigg answered 10/8, 2015 at 18:20 Comment(6)
could pyinvoke be of any use ?Decoteau
Moffat's sh module is a subprocess replacement allowing execution of external programs as functions with redirection of stderr and stdout to files or functions. Documentation for this is at amoffat.github.io/sh/#redirection. It can be installed with 'pip install sh' and its GitHub site is github.com/amoffat/sh.Whirl
@GIRISHRAMNANI: I don't see how pyinvoke would help here. Without pty=True you have buffering issues. With pty=True, you can't separate stdout and stderr.Romelda
@TrisNefzger: sh's callbacks might work to get stdout, stderr separately and incrementally but I don't see that it solves the buffering issue with stderr (I see only tty_in, tty_out, though it might be enough in most cases and there could be issues with stderr!=STDOUT)Romelda
@J.F.Sebastian: _out_bufsize controls buffering for stderr as well as stdout and setting it to 0 disables buffering, based on amoffat.github.io/sh/#buffer-sizesWhirl
@TrisNefzger: unless sh performs some ugly non-portable hacks (like stdbuf utility) then _out_bufsize controls a wrong buffer. It is likely that it controls the buffer in the parent python process. Like bufsize, it won't fix the buffering behavior in the child. Look at the picture that shows libc stdio buffers in my answer (the parent process (the shell) buffers are not shown)Romelda

The stdout and stderr of the program being run can be logged separately.

You can't use pexpect because both stdout and stderr go to the same pty and there is no way to separate them after that.
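
A quick way to see the merging for yourself (a minimal sketch):

import pexpect
# both streams come back interleaved on the one pty; there is no way to tell,
# after the fact, which bytes were stdout and which were stderr
child = pexpect.spawn('/bin/bash', ['-c', 'echo to-stdout; echo to-stderr >&2'])
print(child.read())  # e.g. b'to-stdout\r\nto-stderr\r\n'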

The stdout and stderr of the program being run can be viewed in near-real time, so that if the child process hangs, the user can see it (i.e. we do not wait for execution to complete before printing the stdout/stderr to the user)

If the output of a subprocess is not a tty then it is likely that it uses block buffering, and therefore if it doesn't produce much output then it won't be "real time": e.g., if the buffer is 4K then your parent Python process won't see anything until the child process prints 4K chars and the buffer overflows, or the buffer is flushed explicitly (inside the subprocess). This buffer is inside the child process and there are no standard ways to manage it from outside. Here's a picture that shows the stdio buffers and the pipe buffer for the command1 | command2 shell pipeline:

pipe/stdio buffers
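
A minimal demonstration of the effect: the child prints one short line and then sleeps; because its stdout is a pipe, the line only arrives when the buffer is flushed at exit:

#!/usr/bin/env python3
import sys
import time
from subprocess import Popen, PIPE

# the child block-buffers because its stdout is a pipe, not a tty; running it
# with -u (or giving it a pty) would make the line arrive immediately
with Popen([sys.executable, '-c',
            "import time; print('hello'); time.sleep(3)"], stdout=PIPE) as p:
    start = time.time()
    line = p.stdout.readline()  # blocks ~3 seconds -- not "near real time"
    print('%r after %.1f seconds' % (line, time.time() - start))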

The program being run does not know it is being run via Python, and thus will not do unexpected things (like chunk its output instead of printing it in real time, or exit because it demands a terminal to view its output).

It seems you meant the opposite, i.e., it is likely that your child process chunks its output instead of flushing each output line as soon as possible if the output is redirected to a pipe (when you use stdout=PIPE in Python). It means that the default threading or asyncio solutions won't work as-is in your case.
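
For reference, a minimal sketch of such a default asyncio solution (Python 3.7+; both streams are read concurrently, but lines still arrive in chunks if the child block-buffers):

#!/usr/bin/env python3
import asyncio

async def tee(stream, label):
    # relay lines as soon as the event loop sees them
    while True:
        line = await stream.readline()
        if not line:  # EOF
            break
        print(label, line.decode(), end='')

async def main():
    p = await asyncio.create_subprocess_exec(
        'curl', 'www.google.com',
        stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    await asyncio.gather(tee(p.stdout, 'OUT '), tee(p.stderr, 'ERR '))
    await p.wait()

asyncio.run(main())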

There are several options to workaround it:

  • the command may accept a command-line argument such as grep --line-buffered or python -u to disable block buffering.

  • stdbuf works for some programs (it uses LD_PRELOAD to adjust the child's libc stdio buffering, so it won't help statically linked programs or runtimes such as Python that do their own buffering); i.e., you could run ['stdbuf', '-oL', '-eL'] + command using the threading or asyncio solution above and you should get stdout and stderr separately, with lines appearing in near-real time:

    #!/usr/bin/env python3
    import os
    import sys
    from select import select
    from subprocess import Popen, PIPE
    
    with Popen(['stdbuf', '-oL', '-e0', 'curl', 'www.google.com'],
               stdout=PIPE, stderr=PIPE) as p:
        readable = {
            p.stdout.fileno(): sys.stdout.buffer, # log separately
            p.stderr.fileno(): sys.stderr.buffer,
        }
        while readable:
            for fd in select(readable, [], [])[0]:
                data = os.read(fd, 1024) # read available
                if not data: # EOF
                    del readable[fd]
                else: 
                    readable[fd].write(data)
                    readable[fd].flush()
    
  • finally, you could try pty + select solution with two ptys:

    #!/usr/bin/env python3
    import errno
    import os
    import pty
    import sys
    from select import select
    from subprocess import Popen
    
    masters, slaves = zip(pty.openpty(), pty.openpty())
    with Popen([sys.executable, '-c', r'''import sys, time
    print('stdout', 1) # no explicit flush
    time.sleep(.5)
    print('stderr', 2, file=sys.stderr)
    time.sleep(.5)
    print('stdout', 3)
    time.sleep(.5)
    print('stderr', 4, file=sys.stderr)
    '''],
               stdin=slaves[0], stdout=slaves[0], stderr=slaves[1]):
        for fd in slaves:
            os.close(fd) # no input
        readable = {
            masters[0]: sys.stdout.buffer, # log separately
            masters[1]: sys.stderr.buffer,
        }
        while readable:
            for fd in select(readable, [], [])[0]:
                try:
                    data = os.read(fd, 1024) # read available
                except OSError as e:
                    if e.errno != errno.EIO:
                        raise #XXX cleanup
                    del readable[fd] # EIO means EOF on some systems
                else:
                    if not data: # EOF
                        del readable[fd]
                    else:
                        readable[fd].write(data)
                        readable[fd].flush()
    for fd in masters:
        os.close(fd)
    

    I don't know what the side-effects of using different ptys for stdout and stderr are. You could try whether a single pty is enough in your case, e.g., set stderr=PIPE and use p.stderr.fileno() instead of masters[1] (a rough sketch of that variant follows). A comment in the sh source suggests that there are issues if stderr is not in {STDOUT, pipe}.
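
    A rough sketch of that single-pty variant (an assumption to test, not a recommendation): one pty for stdin/stdout, a plain pipe for stderr:

    #!/usr/bin/env python3
    import errno
    import os
    import pty
    import sys
    from select import select
    from subprocess import Popen, PIPE

    master, slave = pty.openpty() # one pty: the child sees a terminal on stdout
    with Popen(['curl', 'www.google.com'],
               stdin=slave, stdout=slave, stderr=PIPE) as p:
        os.close(slave) # only the child keeps the slave end open
        readable = {
            master: sys.stdout.buffer,            # child's stdout, via the pty
            p.stderr.fileno(): sys.stderr.buffer, # child's stderr, via a pipe
        }
        while readable:
            for fd in select(readable, [], [])[0]:
                try:
                    data = os.read(fd, 1024) # read available
                except OSError as e:
                    if e.errno != errno.EIO:
                        raise
                    data = b'' # EIO means EOF on some systems
                if not data: # EOF
                    del readable[fd]
                else:
                    readable[fd].write(data)
                    readable[fd].flush()
    os.close(master)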

Romelda answered 11/8, 2015 at 23:3 Comment(6)
This is another fantastic answer, and I feel like we are close to a universal solution to what should be a common problem. While I test out your code, I will just mention that I thought the whole point of a pty was to 'trick' the connecting program into thinking it was talking to an actual terminal, and thus NOT buffering its output as it might if the output were being redirected to a file or another process. i.e. using ptys prevents chunking and other non-typical behaviour, allowing Python to subprocess commands as if they were run directly from the shell.Ludwigg
@user3329564: yes, most programs use line-buffered stdout if you provide pty. What part of my answer suggests otherwise? The child process is free to change (or not to change) this and other aspects of its behavior e.g., curl does not report progress if the output is a tty.Romelda
sometimes your code calls write with b'' (an empty binary array) and also some other kind of "null" string (maybe EOF) actually doing nothing... maybe add a guard like: if data: write()Meadowsweet
@MasterYogurt it is impossible. Do you see if not data in the code?Romelda
maybe it was something else then, a null character of sorts which doesn't print... anyway, after implementing something close to this, I then found a better solution which does not require reading stderr. Thanks anyway.Meadowsweet
@MasterYogurt: a null character looks like b'\x00' -- there are no invisible characters in the bytes representation.Romelda

If you want to read from stderr and stdout and get the output separately, you can use a thread with a queue; not overly tested, but something like the following (Python 3):

import threading
import queue
from subprocess import Popen, PIPE

def run(fd, q):
    for line in iter(fd.readline, ''):
        q.put(line)
    q.put(None)


def create(fd):
    q = queue.Queue()
    t = threading.Thread(target=run, args=(fd, q))
    t.daemon = True
    t.start()
    return q, t


process = Popen(["curl", "www.google.com"], stdout=PIPE, stderr=PIPE,
                universal_newlines=True)

std_q, std_thread = create(process.stdout)
err_q, err_thread = create(process.stderr)

# each queue is terminated by a single None sentinel, so drain each exactly once;
# stdout lines are printed as they arrive, stderr lines once stdout reaches EOF
# (looping on Thread.is_alive() here can deadlock if a sentinel was already consumed)
for line in iter(std_q.get, None):
    print(line)
for line in iter(err_q.get, None):
    print(line)
std_thread.join()
err_thread.join()
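
As noted in the comments below, this does not address the child-side buffering problem; if GNU coreutils is available, one mitigation is to point the same code at a stdbuf-wrapped command (a sketch, untested):

# hypothetical variant: stdbuf line-buffers the child's libc stdio streams
# so that lines reach the queues promptly
process = Popen(["stdbuf", "-oL", "-eL", "curl", "www.google.com"],
                stdout=PIPE, stderr=PIPE, universal_newlines=True)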
Henghold answered 10/8, 2015 at 19:40 Comment(3)
a single queue would be enough. The answer does not address the hard buffering problem at all.Romelda
@J.F.Sebastian, how would you know which is stderr and which is stdout if you used a single queue?Henghold
I believe that you can answer this question yourself. Think for a couple of minutes before following the link with a code example.Romelda

While J.F. Sebastian's answer certainly solves the heart of the problem, I'm running Python 2.7 (which wasn't in the original criteria), so I'm just throwing this out there for any other weary travellers who just want to cut/paste some code. I haven't tested this thoroughly yet, but on all the commands I have tried it seems to work perfectly :) You may want to change .decode('ascii') to .decode('utf-8') - I'm still testing that bit out.

#!/usr/bin/env python2.7
import errno
import os
import pty
import sys
from select import select
import subprocess
stdout = ''
stderr = ''
command = 'curl google.com ; sleep 5 ; echo "hey"'
masters, slaves = zip(pty.openpty(), pty.openpty())
p = subprocess.Popen(command, stdin=slaves[0], stdout=slaves[0], stderr=slaves[1], shell=True, executable='/bin/bash')
for fd in slaves: os.close(fd)

readable = { masters[0]: sys.stdout, masters[1]: sys.stderr }
try:
    print ' ######### REAL-TIME ######### '
    while readable:
        for fd in select(readable, [], [])[0]:
            try:
                data = os.read(fd, 1024)
            except OSError as e:
                if e.errno != errno.EIO: raise
                del readable[fd] # EIO means EOF on some systems
            else:
                ## 'else', not 'finally': on EIO, data is left unset and the fd
                ## has already been removed, so a 'finally' branch here would crash
                if not data: del readable[fd] # EOF
                else:
                    if fd == masters[0]: stdout += data.decode('ascii')
                    else: stderr += data.decode('ascii')
                    readable[fd].write(data)
                    readable[fd].flush()
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise
finally:
    p.wait()
    for fd in masters: os.close(fd)
    print ''
    print ' ########## RESULTS ########## '
    print 'STDOUT:'
    print stdout
    print 'STDERR:'
    print stderr
Ludwigg answered 12/8, 2015 at 14:40 Comment(1)
Thank you for your answer, I'm using Python 2.7 too. BUT could you please change except: pass into something else or remove the line entirely? If someone changes the code and gets e.g. a SyntaxError, it is not shown!Moulder
