Multiple inputs and outputs in python subprocess communicate

I need to do something like this post, but I need to create a subprocess that can be given input and give output many times. The accepted answer of that post has good code...

from subprocess import Popen, PIPE, STDOUT

p = Popen(['grep', 'f'], stdout=PIPE, stdin=PIPE, stderr=STDOUT)    
grep_stdout = p.communicate(input=b'one\ntwo\nthree\nfour\nfive\nsix\n')[0]
print(grep_stdout.decode())

# four
# five

...that I would like to continue like this:

grep_stdout2 = p.communicate(input=b'spam\neggs\nfrench fries\nbacon\nspam\nspam\n')[0]
print(grep_stdout2.decode())

# french fries

But alas, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/subprocess.py", line 928, in communicate
    raise ValueError("Cannot send input after starting communication")
ValueError: Cannot send input after starting communication

The proc.stdin.write() method does not let you collect output, if I understand correctly. What is the simplest way to keep the lines open for ongoing input/output?

Edit: ====================

It looks like pexpect is a useful library for what I am trying to do, but I am having trouble getting it to work. Here is a more complete explanation of my actual task. I am using hfst to get grammar analyses of individual (Russian) words. The following demonstrates its behavior in a bash shell:

$ hfst-lookup analyser-gt-desc.hfstol
> слово
слово   слово+N+Neu+Inan+Sg+Acc 0.000000
слово   слово+N+Neu+Inan+Sg+Nom 0.000000

> сработай
сработай    сработать+V+Perf+IV+Imp+Sg2 0.000000
сработай    сработать+V+Perf+TV+Imp+Sg2 0.000000

> 

I want my script to be able to get the analyses of one form at a time. I tried code like this, but it is not working.

import pexpect

analyzer = pexpect.spawnu('hfst-lookup analyser-gt-desc.hfstol')
for newWord in ['слово','сработай'] :
    print('Trying', newWord, '...')
    analyzer.expect('> ')
    analyzer.sendline( newWord )
    print(analyzer.before)

# Trying слово ...
# 
# Trying сработай ...
# слово
# слово слово+N+Neu+Inan+Sg+Acc 0.000000
# слово слово+N+Neu+Inan+Sg+Nom 0.000000
# 
# 

I obviously have misunderstood what pexpect.before does. How can I get the output for each word, one at a time?

Annunciate answered 19/2, 2015 at 20:13 Comment(8)
"The proc.stdin.write() method not enable you to collect output, " You can still get output, you just have to get it from proc.stdout and proc.stderr.Marquand
Is this Windows or Linux? On Linux, the pexpect module is a good choice for subprocess interaction.Reinhardt
what are you trying to do?Boutique
mandatory reading if you want "multiple input and outputs": Q: Why not just use a pipe (popen())?Lamellicorn
@PadraicCunningham I am trying to get a proof of concept example that I can expand to use in a script that will need to interact with my process 1,000,000+ times. My current script uses subprocess.check_output, which is much too slow, since it has to initiate the process for each interaction.Annunciate
@Reinhardt This is on OS X.Annunciate
the sequence should be: 0. wait for the first prompt 1. send word 2. wait for the prompt, get response (.after?) for the word 3. repeat 1-2Lamellicorn
сработай :D сработай ;[Jecho
A
22

This answer should be attributed to @J.F.Sebastian. Thanks for the comments!

The following code got my expected behavior:

import pexpect

analyzer = pexpect.spawn('hfst-lookup analyser-gt-desc.hfstol', encoding='utf-8')
analyzer.expect('> ')

for word in ['слово', 'сработай']:
    print('Trying', word, '...')
    analyzer.sendline(word)
    analyzer.expect('> ')
    print(analyzer.before)
Annunciate answered 24/2, 2015 at 8:1 Comment(5)
AttributeError: module 'pexpect' has no attribute 'spawnu'Jedediah
@Jedediah it looks like pexpect.spawnu was deprecated in favor of using spawn(encoding='utf-8'). I updated the answer accordingly. However, it is still in the source code (github.com/pexpect/pexpect/blob/master/pexpect/…), so I wonder if you've installed pexpect correctly.Annunciate
Expect is good for simulating interaction with a terminal, for programs that change their behavior when they run in a pipeline. However, it's not good if you communicate with binary data.Prestissimo
@Prestissimo for binary data, just omit the encoding='utf-8' argument. By default, spawn expects binary.Annunciate
I don't think it's that simple because of the newline processing. I haven't checked it right now, but I believe that even without the encoding, pexpect will convert end-of-line characters to Windows-style line endings (\r\n), and you might also have problems if your data does not end with a newline.Prestissimo
R
34

Popen.communicate() is a helper method that does a one-time write of data to stdin and pulls data from stdout and stderr concurrently (with threads on Windows, with select/poll-style multiplexing on POSIX). It closes stdin when it's done writing data and reads stdout and stderr until those pipes close. You can't do a second communicate because the child has already exited by the time it returns.
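
To make the one-shot behavior concrete, here is a minimal sketch. It assumes a small Python child (the hypothetical child_code below) standing in for grep so that it runs anywhere; the names first_output, exited and second_call_error are only for illustration.

```python
import sys
from subprocess import Popen, PIPE, STDOUT

# Hypothetical stand-in for `grep f`: echo back every input line containing "f".
child_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'f' in line:\n"
    "        print(line, end='')\n"
)

p = Popen([sys.executable, "-c", child_code],
          stdin=PIPE, stdout=PIPE, stderr=STDOUT)
first_output = p.communicate(input=b"one\nfour\nfive\n")[0].decode()

# communicate() closed stdin and waited for the child, so it has already exited.
exited = p.returncode is not None

# A second call with input is rejected instead of reaching the (dead) child.
try:
    p.communicate(input=b"french fries\n")
    second_call_error = None
except ValueError as exc:
    second_call_error = str(exc)

print(repr(first_output), exited, second_call_error)
```

The second call fails with the same ValueError as in the question, which is why a long-lived session needs something other than communicate().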

An interactive session with a child process is quite a bit more complicated.

One problem is whether the child process even recognizes that it should be interactive. In the C libraries that most command line programs use for interaction, programs run from terminals (e.g., a Linux console or "pty" pseudo-terminal) are interactive and flush their output frequently, but those run from other programs via pipes are non-interactive and flush their output infrequently.
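
A rough way to see the difference, assuming a Unix-like system where the standard pty module is available: the sketch below runs the same tiny child twice, once with stdout on a pipe and once with stdout on a pseudo-terminal, and lets the child report what it sees. The variable names are illustrative only.

```python
import os
import pty
import subprocess
import sys

# The child only reports whether its stdout is attached to a terminal.
child = [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"]

# 1. stdout is a pipe: the child sees a non-terminal.
via_pipe = subprocess.run(child, stdout=subprocess.PIPE).stdout.decode().strip()

# 2. stdout is the slave end of a pseudo-terminal: the child sees a terminal.
master_fd, slave_fd = pty.openpty()
proc = subprocess.Popen(child, stdout=slave_fd)
os.close(slave_fd)                      # the parent keeps only the master end
via_pty = os.read(master_fd, 1024).decode().strip()
proc.wait()
os.close(master_fd)

print("pipe:", via_pipe)
print("pty: ", via_pty)
```

On such a system the child should report False through the pipe and True through the pty. This check is what many programs base their buffering decision on, and it is the reason pexpect runs the child on a pty.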

Another is how you should read and process stdout and stderr without deadlocking. For instance, if you block reading stdout, but stderr fills its pipe, the child will halt and you are stuck. You can use threads to pull both into internal buffers.
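
A minimal sketch of the thread approach, assuming a hypothetical child that writes far more to stderr than a pipe buffer holds before it writes anything to stdout (exactly the case where a plain blocking read of stdout would hang). The helper drain() is a name made up for this example.

```python
import sys
import threading
from subprocess import Popen, PIPE

def drain(stream, sink):
    """Read a pipe line by line until EOF, appending each line to sink."""
    for line in iter(stream.readline, b""):
        sink.append(line)
    stream.close()

# Hypothetical noisy child: a large write to stderr, then one line on stdout.
child_code = (
    "import sys\n"
    "sys.stderr.write('x' * 200000 + '\\n')\n"
    "print('done')\n"
)

p = Popen([sys.executable, "-c", child_code], stdout=PIPE, stderr=PIPE)

out_lines, err_lines = [], []
readers = [
    threading.Thread(target=drain, args=(p.stdout, out_lines)),
    threading.Thread(target=drain, args=(p.stderr, err_lines)),
]
for t in readers:
    t.start()
for t in readers:
    t.join()        # both pipes reached EOF
p.wait()

print(out_lines, sum(len(chunk) for chunk in err_lines))
```

Because each pipe has its own reader, neither one can fill up and stall the child while the parent is blocked on the other.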

Yet another is how you deal with a child that exits unexpectedly.

For "unixy" systems like Linux and OS X, the pexpect module is written to handle the complexities of an interactive child process. For Windows, there is no good tool that I know of to do it.

Reinhardt answered 19/2, 2015 at 20:33 Comment(4)
you can deadlock even with only stdin/stdout e.g., how do you know when to read from grep's stdout after you've written something? Also, the child process may bypass stdin/stdout completely (typical example: a password prompt). See the link in the comment above. In many cases these issues could be resolved using threads, fcntl, async.io: select/poll/epoll/kqueue/iocp and/or pty. And sometimes it is enough to be carefulLamellicorn
@J.F.Sebastian - and then there are programs that read the terminal type to colorize or full-screen output. It can be a challenge.Reinhardt
if a program does not provide options to override such behavior (such as --color); it can be considered a bug. The default behavior should be for an interactive user – less typing, concise output. Full screen shouldn't be used unless it is absolutely necessary.Lamellicorn
Note to Alice Carvalho who edited my post - you have an interesting contribution dealing with pexpect updates, but it's not part of my answer. So, a comment or an alternate answer would be better.Reinhardt
J
12

Whenever you want to send input to the process, use proc.stdin.write(). Whenever you want to get output from the process, use proc.stdout.read(). Both stdin and stdout arguments to the constructor need to be set to PIPE.

Jeremiahjeremias answered 19/2, 2015 at 20:17 Comment(2)
It works perfectly. The proc takes one input and sends out one output.Venable
This should be used with caution regarding line separators and flushing the pipe, as it may create deadlocks. More can be found in this great blog post: eli.thegreenplace.net/2017/…Parlour
T
2

HFST has Python bindings: https://pypi.python.org/pypi/hfst

Using those should avoid the whole flushing issue, and will give you a cleaner API to work with than parsing the string output from pexpect.

From the Python REPL, you can get some docs on the bindings with

dir(hfst)
help(hfst.HfstTransducer)

or read https://hfst.github.io/python/3.12.2/QuickStart.html

Snatching the relevant parts of the docs:

import hfst

# Pass the transducer file itself, not the hfst-lookup command line.
istr = hfst.HfstInputStream('analyser-gt-desc.hfstol')
transducers = []
while not (istr.is_eof()):
    transducers.append(istr.read())
istr.close()
print("Read %i transducers in total." % len(transducers))
if len(transducers) == 1:
    out = transducers[0].lookup_optimize("слово")
    print("got %s" % (out,))
else:
    pass # or handle >1 fst in the file, though I'm guessing you don't use that feature
Tournedos answered 22/11, 2017 at 8:34 Comment(0)
A
2

Write one line to stdin, read one line from stdout, loop

This is a common pattern for interacting with CLI programs that take one input line at a time and immediately print the corresponding output line.

Based on this example by Ustad Eli:

main.py

from subprocess import Popen, PIPE, STDOUT
import time

p = Popen(['./upline.py'], stdout=PIPE, stdin=PIPE)

p.stdin.write('Hello world\n'.encode())
p.stdin.flush()
print(p.stdout.readline().decode()[:-1])

time.sleep(1)

p.stdin.write('bonne journeé\n'.encode())
p.stdin.flush()
print(p.stdout.readline().decode()[:-1])

time.sleep(1)

p.stdin.write('goodbye world\n'.encode())
p.stdin.flush()
print(p.stdout.readline().decode()[:-1])

upline.py

#!/usr/bin/env python

import sys

for line in sys.stdin:
    print(line.upper(), end='', flush=True)

Output:

HELLO WORLD
BONNE JOURNEÉ
GOODBYE WORLD

HELLO WORLD appears instantly, then it waits one second, then BONNE JOURNEÉ (quick UTF-8 test), then another second later GOODBYE WORLD.
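
If you need this many times, the write/flush/readline triple can be folded into a small helper. The following is only a sketch: ask() and UPLINE are hypothetical names, the child is an inlined copy of upline.py so that the snippet is self-contained, and it assumes the child answers every input line with exactly one output line.

```python
import sys
from subprocess import Popen, PIPE

# Same behaviour as upline.py, inlined so the sketch is self-contained.
UPLINE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.upper(), end='', flush=True)\n"
)

def ask(proc, text):
    """Send one line to the child and return its one-line reply."""
    proc.stdin.write(text + "\n")
    proc.stdin.flush()                  # without this the child may never see the line
    return proc.stdout.readline().rstrip("\n")

p = Popen([sys.executable, "-c", UPLINE], stdin=PIPE, stdout=PIPE,
          text=True, encoding="utf-8")
replies = [ask(p, word) for word in ["Hello world", "goodbye world"]]

p.stdin.close()                         # EOF ends the child's loop
p.wait()
p.stdout.close()
print(replies)
```

If the child ever prints zero lines or several lines for one input, readline() will block or fall out of step, so this only suits strict one-line-in, one-line-out programs.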

Tested on Python 3.11.4, Ubuntu 23.04.

Abamp answered 2/11, 2023 at 10:57 Comment(0)
