How to write to stdout AND to log file simultaneously with Popen?

I am using Popen to call a shell script that continuously writes its stdout and stderr to a log file. Is there any way to stream the log file to the screen while it is being written, or alternatively, to make the shell script write to both the log file and stdout at the same time?

I basically want to do something like this in Python:

cat file 2>&1 | tee -a logfile #"cat file" will be replaced with some script

This pipes stderr and stdout together to tee, which writes them both to stdout and to my logfile.

I know how to write stdout and stderr to a logfile in Python. Where I'm stuck is how to duplicate these back to the screen:

subprocess.Popen("cat file", shell=True, stdout=logfile, stderr=logfile)

Of course, I could just do something like this, but is there any way to do it without tee and shell file descriptor redirection?

subprocess.Popen("cat file 2>&1 | tee -a logfile", shell=True)
Byrdie answered 20/3, 2013 at 21:37 Comment(2)
related: Python subprocess get children's output to file and terminal?Philbo
Related: How to replicate tee behavior in Python when using subprocess?Poinciana

You can use a pipe to read the data from the program's stdout and write it to all the places you want:

import sys
import subprocess

logfile = open('logfile', 'w')
proc = subprocess.Popen(['cat', 'file'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in proc.stdout:
    sys.stdout.write(line)  # echo each line to the screen
    logfile.write(line)     # and write it to the log file
proc.wait()

UPDATE

In Python 3, the universal_newlines parameter controls how pipes are used. If False, pipe reads return bytes objects and may need to be decoded (e.g., line.decode('utf-8')) to get a string. If True, Python does the decoding for you:

Changed in version 3.3: When universal_newlines is True, the class uses the encoding locale.getpreferredencoding(False) instead of locale.getpreferredencoding(). See the io.TextIOWrapper class for more information on this change.
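
For example, a minimal sketch of the same loop under Python 3 with universal_newlines=True ('cat file' again stands in for the real command):

import sys
import subprocess

# universal_newlines=True makes the pipe yield str (decoded with the
# locale's preferred encoding), so no explicit .decode() is needed
with open('logfile', 'w') as logfile:
    proc = subprocess.Popen(['cat', 'file'], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    for line in proc.stdout:
        sys.stdout.write(line)
        logfile.write(line)
    proc.wait()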

Slurp answered 20/3, 2013 at 21:46 Comment(21)
You could also create a file like object that encapsulates this functionality and then use that in place of stdout/stderr in the call to Popen.Etching
@sr2222 - I like that idea too.... except now that I think about it..., they are operating system pipes, not python objects, so does that even work?Slurp
Why do you call proc.wait() AFTER having referenced proc.stdout? I'm a little confused where to use proc.wait() in general.Byrdie
@imagineerThis - The code reads stdout until it is closed and then waits for the program to exit. You read before wait so that you don't risk the pipe filling up and hanging the program. You wait after read for the final program exit and return code. If you don't wait, you'll get a zombie process (at least on linux).Slurp
Ok, I think I got the basic idea. Had to let it sit for a while.Byrdie
If I don't use wait(), then what will become a zombie process? The subprocess?Byrdie
@imagineerThis - yes, the subprocess joins the living dead. In unix/linux, when a process exits, it stays in the pid table until its parent reads its return code via the wait() function or until its parent exits. Run 'ps ax' and it will say 'defunct' with Z (zombie) in the STAT field. With a maximum 64k processes, zombies can kill your system if you have too many wandering the streets. Normally parents babysit their child processes but sometimes you want the children to be independent - so you daemonize them. Look at the python daemon module or google "fork off and die" for details.Slurp
awesome. thanks. I happen to be watching the amc show, the walking dead, right now. love the analogy to zombies.Byrdie
Doesn't the one line need to be "sys.stdout.write(line)"?Beset
you might need iter(proc.stdout.readline, '') (due to a bug with the read-ahead buffer) and add bufsize=1 to print lines as soon as they are flushed by the child process. Call proc.stdout.close() to avoid fd leaks.Philbo
@J.F.Sebastian I vaguely remember pipe issues being sorted out in the 2.6 time frame and since my usage is idiomatic I think I'll stubbornly stick with it. Flushing is a great idea if delays in writing the log file are intolerable, but it is resource intensive and should be used only when needed (the user's 'cat' example doesn't need it, btw). Finally, file handles close automatically when processes exit - explicit closes are not needed for scripts that keep files open until exit.Slurp
@tdelaney: no, it is not fixed. try the script: import time; print(1); time.sleep(1); print(2). Your version won't print 1 until the script exits. The word flush in my comment refers to buffers inside a child process that you have no direct control over. If the child doesn't flush its stdout then the output will be delayed. It might be fixed using pexpect, pty modules or stdbuf, unbuffer, script commands.Philbo
@J.F.Sebastian - wait, you were referring to some read-ahead bug supposedly worked-around by iter - the pty issue is completely different, exists for the 'tee' example given by the user as well as your work-around. And its not a bug, its the way the system is supposed to work. If you want a line buffered pipeline, you need to play the pty game but that's not the default.Slurp
@tdelaney: yes, it is two independent issues. iter workarounds the bug (just click the first link in my previous comment, to see it yourself). pty solves different problem (you should understand it if your read the last two links in the previous comment)Philbo
@tdelaney: I agree with @J.F.Sebastian that there's an issue here, at least for Python 2. See bugs.python.org/issue3907 for the relevant bug report, which is closed as 'works for me', with the advice to use the newer io module in Python 2.6 and later. Unfortunately, even in Python 2.7 the subprocess module doesn't use the io module for its PIPEs; as far as Python is concerned they're just file objects, and are still subject to the same difference between for line in my_pipe and for line in iter(my_pipe.readline, '').Perlman
@Silas Ray - how do you "create a file like object that encapsulates this functionality and then use that in place of stdout/stderr in the call to Popen"? I wan to my Popen (sub)process to run asynchronously from my main code path, but still write to 2 (or more) different files (and not just the terminal) concurrently.Imidazole
@JohnnyUtahh - Silas's suggestion doesn't work. The child process writes to the file descriptor, not any read or write on the file object. You can take my example, make it into a function and call it as a thread.Slurp
With python 3.4, I had to replace both write(line) by write(line.decode('utf-8')) to remove error TypeError:must be str,not bytesChirk
@Chirk Thanks for the note. The post could use a 3.x update. The pipes will return bytes that need decoding unless you set universal_newlines=True. I'll update the answer.Slurp
@Chirk The slightly better fix is to replace write(line) with write(line.decode(sys.stdout.encoding)). That guarantees to match the encoding of the string with stdout. Of course, universal_newlines=True makes it all moot.Trengganu
@Trengganu - Good point. Its not clear to me which encoding to pick. The program is writing to a pipe. Its reasonable to think that it comes up with the same encoding that python does, but then maybe not.Slurp
P
18

To emulate: subprocess.call("command 2>&1 | tee -a logfile", shell=True) without invoking the tee command:

#!/usr/bin/env python2
from subprocess import Popen, PIPE, STDOUT

p = Popen("command", stdout=PIPE, stderr=STDOUT, bufsize=1)
with p.stdout, open('logfile', 'ab') as file:
    for line in iter(p.stdout.readline, b''):
        print line,  #NOTE: the comma prevents duplicate newlines (softspace hack)
        file.write(line)
p.wait()

To fix possible buffering issues (if the output is delayed), see links in Python: read streaming input from subprocess.communicate().

Here's Python 3 version:

#!/usr/bin/env python3
import sys
from subprocess import Popen, PIPE, STDOUT

with Popen("command", stdout=PIPE, stderr=STDOUT, bufsize=1) as p, \
     open('logfile', 'ab') as file:
    for line in p.stdout: # b'\n'-separated lines
        sys.stdout.buffer.write(line) # pass bytes as is
        file.write(line)
Philbo answered 5/1, 2016 at 5:22 Comment(18)
you should mention that you can find the return code in p.returncode after it's done.Mandalay
@kdubs: it is unrelated to the question. Why do you think I "should mention" it?Philbo
while I agree he didn't ask for that, it seems one ought to check the return status. I was hoping to find it here. would seem to make the answer complete. perhaps "should" was strong.Mandalay
@Mandalay I agree that it is a good idea to check the exit status (that is why subprocess.check_call(), subprocess.check_output() functions exist which do it for you). I could have added if p.wait() != 0: raise subprocess.CalledProcessError(p.returncode, "command") but it would distract from the main point: how to emulate tee utility in Python.Philbo
Above Python 3 version: prints on screen after execution not liveIschia
@UjjawalKhare wrong. it is easy to see the output can be produced while the process is still running. It is likely that you've encountered the explicitly mentioned buffering issues.Philbo
@Philbo below is my script which calles another script using above example and produces entire output after execution completed while expectation is that it should produce live output as well as write in file at the same time.Ischia
./My_Tests.py ALL TC 1 : PASSED TC 2 : PASSED Tatal Passed Tests : 2 Tatal Failed Tests : 0 Inside My_Tests.py : with Popen("./tests.py", stdout=PIPE, stderr=STDOUT, bufsize=1) as p, \ open('My_Test.log', 'ab') as file: for line in p.stdout: # b'\n'-separated lines sys.stdout.buffer.write(line) # pass bytes as is file.write(line)Ischia
@UjjawalKhare if you don't understand that the code in the answer may produce output while the process is still running, ask a separate question.Philbo
@Philbo : just posted. please checkthe url : #47975733Ischia
@jfs: Would be looking for your answer, your code does what i want but i dont know how to fix the buffering issues. Please help with question asked in above linkIschia
@jfk My apologies. Dont take me oterwise. I said tgat for me it prints on screen after execution of script and you said its a buffer issue and now i am keen to learn how to fix this. A guidence from your side shall be highly appriciated.Ischia
@Philbo could you please help me ?Ischia
Let us continue this discussion in chat.Ischia
I don't know why but i tried above all suggestions and for my Python3.5 its not doing both the jobs simultaneously. It executes the entire code and then in the end prints the logs. Not sure If this is the buffer issue (as claimed by @Philbo above) and if yes, then i don't know how to fix so that it can print on terminal as well as write in file simultaneously. I have also posted a question and everyone is welcome to help me fix this. #47975733Ischia
@Philbo thanks, your solution was correct i just had to use it as a function and split my own script and then call this function everytime. Due to all tests were part of same file so it was printing all the outputs in the end. But now after splitting my script in chunks your solution works perfectly. 1000 likes for you. 😀Ischia
@UjjawalKhare here is a solution that prints interactive commands immediately: https://mcmap.net/q/25430/-how-to-write-to-stdout-and-to-log-file-simultaneously-with-popenWanderlust
Love the with Popen(...) as p : I didn't know you could put it in a context manager. It's a good trick to know.Woolfolk
W
5

Write to terminal byte by byte for interactive applications

This method write any bytes it gets to stdout immediately, which more closely simulates the behavior of tee, especially for interactive applications.

main.py

#!/usr/bin/env python3
import os
import subprocess
import sys
with subprocess.Popen(sys.argv[1:], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) as proc, \
        open('logfile.txt', 'bw') as logfile:
    while True:
        byte = proc.stdout.read(1)
        if byte:
            sys.stdout.buffer.write(byte)
            sys.stdout.flush()
            logfile.write(byte)
            # logfile.flush()
        else:
            break
exit_status = proc.returncode

sleep.py

#!/usr/bin/env python3
import sys
import time
for i in range(10):
    print(i)
    sys.stdout.flush()
    time.sleep(1)

First we can do a non-interactive sanity check:

./main.py ./sleep.py

And we see it counting to stdout on real time.

Next, for an interactive test, you can run:

./main.py bash

Then the characters you type appear immediately on the terminal as you type them, which is very important for interactive applications. This is what happens when you run:

bash | tee logfile.txt

Also, if you want the output to show on the ouptut file immediately, then you can also add a:

logfile.flush()

but tee does not do this, and I'm afraid it would kill performance. You can test this out easily with:

tail -f logfile.txt

Related question: live output from subprocess command

Tested on Ubuntu 18.04, Python 3.6.7.

Wanderlust answered 30/8, 2018 at 6:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.