SGE script: print to file during execution (not just at the end)?
8
I have an SGE script that executes some Python code, submitted to the queue using qsub. The Python script has a few print statements that update me on the progress of the program. When I run the script from the command line, the print statements go to stdout. For the SGE script, I use the -o option to redirect the output to a file. However, the output only reaches the file after the Python script has finished running. This is annoying because (a) I can no longer see real-time updates on the program, and (b) if my job does not terminate correctly (for example, if it gets kicked off the queue), none of the updates are printed. How can I make the script write to the file each time I want to print something, instead of lumping it all together at the end?

Strident answered 26/3, 2012 at 17:40 Comment(0)
7

I think you are running into an issue with buffered output. Python uses a library to handle its output, and that library knows it's more efficient to write a block at a time when it's not talking to a tty.

There are a couple of ways to work around this. You can run python with the "-u" option (see the python man page for details), for example, with something like this as the first line of your script:

#! /usr/bin/python -u

but this doesn't work if you are using the "/usr/bin/env" trick because you don't know where python is installed.
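One way around that limitation is to set the PYTHONUNBUFFERED environment variable, which has the same effect as -u and works regardless of where the interpreter lives. A hypothetical job-script sketch (the script and log filenames are just examples):

```shell
#!/bin/bash
#$ -o job_output.log
# PYTHONUNBUFFERED makes the interpreter run unbuffered, like -u,
# without needing to hard-code the path to python.
export PYTHONUNBUFFERED=1
python my_script.py
```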

Another way is to reopen the stdout with something like this:

import sys 
import os 

# reopen stdout file descriptor with write mode 
# and 0 as the buffer size (unbuffered) 
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0) 

Note the bufsize parameter of os.fdopen being set to 0 to force it to be unbuffered (this works on Python 2; Python 3 raises ValueError for unbuffered text-mode streams). You can do something similar with sys.stderr.
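On Python 3, where unbuffered text streams aren't allowed, line buffering is the closest equivalent. A minimal sketch, assuming Python 3.7+ for reconfigure():

```python
import io
import sys

# Python 3 sketch: unbuffered text mode raises ValueError, so enable
# line buffering instead -- every newline flushes the stream.
if hasattr(sys.stdout, 'reconfigure'):  # TextIOWrapper, Python 3.7+
    sys.stdout.reconfigure(line_buffering=True)

# The same behaviour, shown on an explicit in-memory stream:
w = io.TextIOWrapper(io.BytesIO(), line_buffering=True)
w.write('hello\n')               # the newline triggers a flush
print(w.buffer.getvalue())       # -> b'hello\n'
```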

Courtenay answered 26/3, 2012 at 18:4 Comment(1)
Thanks! Didn't realize this had anything to do with Python. Also found this post helpful: https://mcmap.net/q/20922/-disable-output-buffering – Strident
6

As others mentioned, stdout is not always written immediately when it is not connected to a tty, for performance reasons.

If you have a specific point at which you want the stdout to be written, you can force that by using

import sys
sys.stdout.flush()

at that point.
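For example, a hypothetical progress loop that flushes after each update so the redirected output file stays current:

```python
import sys
import time

# Hypothetical progress loop: flushing after each print pushes the
# line into the redirected output file while the job is still running.
for step in range(3):
    time.sleep(0.1)              # stand-in for real work
    print('finished step %d' % step)
    sys.stdout.flush()
```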

Endoplasm answered 18/12, 2015 at 17:24 Comment(0)
3

I just encountered a similar issue with SGE, and no suggested method to "unbuffer" the file IO seemed to work for me. I had to wait until the end of program execution to see any output.

The workaround I found was to wrap sys.stdout in a custom object that re-implements the write method. Instead of actually writing to stdout, this new method opens the file where IO is redirected, appends the desired data, and then closes the file. It's a bit ugly, but I found it solved the problem, since actually opening and closing the file forces the IO to be interactive.

Here's a minimal example:

import os, sys, time

class RedirIOStream:
  def __init__(self, stream, REDIRPATH):
    self.stream = stream
    self.path = REDIRPATH
  def write(self, data):
    # instead of actually writing, just append to file directly!
    myfile = open( self.path, 'a' )
    myfile.write(data)
    myfile.close()
  def __getattr__(self, attr):
    return getattr(self.stream, attr)


if not sys.stdout.isatty():
  # Detect redirected stdout and std error file locations!
  #  Warning: this will only work on LINUX machines
  STDOUTPATH = os.readlink('/proc/%d/fd/1' % os.getpid())
  STDERRPATH = os.readlink('/proc/%d/fd/2' % os.getpid())
  sys.stdout=RedirIOStream(sys.stdout, STDOUTPATH)
  sys.stderr=RedirIOStream(sys.stderr, STDERRPATH)


# Simple program to print a message every 3 seconds
def main():
  n_msg = 10
  tstart = time.time()
  for x in range(n_msg):
    time.sleep(3)
    print('  %d/%d after %.0f sec' % (x, n_msg, time.time() - tstart))

if __name__ == '__main__':
  main()
Agone answered 29/10, 2012 at 17:45 Comment(0)
3

This is SGE buffering the output of your process; it happens whether it's a Python process or any other.

In general you can decrease or disable the buffering in SGE by changing the source and recompiling, but that's not a great idea: all that data is going to be slowly written to disk, affecting your overall performance.

Phocis answered 11/3, 2013 at 17:33 Comment(0)
1

Why not print to a file instead of stdout?

outFileID = open('output.log', 'w', buffering=1)  # line-buffered
print('INFO: still working!', file=outFileID)
print('WARNING: blah blah!', file=outFileID)

and use

tail -f output.log
Sweepback answered 24/12, 2015 at 1:57 Comment(0)
0

This works for me:

import os
import sys

class ForceIOStream:
    def __init__(self, stream):
        self.stream = stream

    def write(self, data):
        self.stream.write(data)
        self.stream.flush()
        if not self.stream.isatty():
            os.fsync(self.stream.fileno())

    def __getattr__(self, attr):
        return getattr(self.stream, attr)


sys.stdout = ForceIOStream(sys.stdout)
sys.stderr = ForceIOStream(sys.stderr)

and the issue has to do with NFS not syncing data back to the master until a file is closed or fsync is called.

Planogamete answered 20/7, 2015 at 19:59 Comment(0)
0

I hit this same problem today and solved it by just writing to disk instead of printing:

with open('log-file.txt','w') as out:
  out.write(status_report)
Uterus answered 28/7, 2017 at 0:34 Comment(0)
0

print() has supported the keyword argument flush since Python 3.3 (documentation). So, to force-flush the stream:

print('Hello World!', flush=True)
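For instance, a hypothetical progress indicator that stays live even when stdout is redirected to a file:

```python
import time

# Each dot appears immediately because flush=True bypasses the
# block buffering used for non-tty output.
for _ in range(5):
    time.sleep(0.1)          # stand-in for real work
    print('.', end='', flush=True)
print(' done')
```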
Tying answered 12/4, 2022 at 12:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.