I need to launch a pdftk process while serving a web request in Django, and wait for it to finish. My current pdftk code looks like this:
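import subprocess

# Concatenate the two input PDFs into a single output file.
proc = subprocess.Popen(["/usr/bin/pdftk",
                         "/tmp/infile1.pdf",
                         "/tmp/infile2.pdf",
                         "cat", "output", "/tmp/outfile.pdf"])
# Block until pdftk exits.
proc.communicate()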
This works fine as long as I'm executing under the dev server (running as user www-data). But as soon as I switch to mod_wsgi, changing nothing else, the code hangs at proc.communicate(), and "outfile.pdf" is left as an open file handle of zero length.
I've tried several variants of the subprocess invocation (as well as plain old os.system) -- setting stdin/stdout/stderr to PIPE or to various file handles changes nothing. Using "shell=True" prevents proc.communicate() from hanging, but then pdftk fails to create the output file, under both the dev server and mod_wsgi. This discussion seems to indicate there might be some deeper voodoo going on with OS signals and pdftk that I don't understand.
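For illustration, here is a minimal sketch of the kind of variant described above, with stdin/stdout/stderr redirected; it is not the exact code that was tested. The helper name run_and_capture is hypothetical. Attaching stdin to os.devnull and passing close_fds=True are assumptions about what might matter under a web server, not a confirmed fix for the hang.

```python
import os
import subprocess


def run_and_capture(args):
    """Run a command, wait for it to exit, return (returncode, stdout, stderr)."""
    # Give the child an empty stdin so it cannot block reading the parent's input.
    devnull = open(os.devnull, "rb")
    try:
        proc = subprocess.Popen(args,
                                stdin=devnull,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                # Avoid leaking the server's open descriptors (POSIX).
                                close_fds=True)
        # communicate() drains both pipes and waits for the process to exit.
        out, err = proc.communicate()
    finally:
        devnull.close()
    return proc.returncode, out, err
```

It would be called with the same argument list as the Popen call above, for example run_and_capture(["/usr/bin/pdftk", "/tmp/infile1.pdf", "/tmp/infile2.pdf", "cat", "output", "/tmp/outfile.pdf"]). Logging the returned exit code and stderr would at least show whether pdftk reports an error in the cases where it does terminate.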
Are there any workarounds to get a subprocess call like this to work properly under WSGI? I'm avoiding PyPDF for combining PDF files because I have to combine so many files (several hundred) that it runs out of memory (PyPDF needs to keep every source PDF file open in memory while combining them).
I'm doing this under a recent Ubuntu, with Python 2.6 and 2.7.