Script output is buffered into one message, despite separate echo statements?
I have a shell script with three echo statements:

echo 'first message'

echo 'second message'

echo 'third message'

I then run this script in node and collect the output via this code:

var child = require('child_process').spawn('./test.sh');
child.stdout.on('data', data => {
   data = JSON.stringify(data.toString('utf8'));
   console.log(data);
});

But the singular output is "first message\nsecond message\nthird message\n", which is a problem. I expected three outputs, not one smushed together due to some form of buffering. And I can't just split on newlines, because the individual outputs may contain newlines.

Is there any way to distinguish the messages of individual echo statements? (or other output commands, e.g. printf, or anything that causes data to be written to stdout or stderr)

Edit: I have tried unbuffer and stdbuf; neither works for this, as simple testing demonstrates. Here is an example of the stdbuf attempt, which I tried with a variety of argument values, essentially all possible options.

 var child = process.spawn('stdbuf', ['-i0', '-o0', '-e0', './test.sh']);

To be clear, this problem happens when I run a python script from node, too, with just three simple print statements. So it's language-agnostic, it's not about bash scripting in particular. It's about successfully detecting the individual outputs of a script in any language on a unix-based system. If this is something C/C++ can do and I have to hook into that from node, I'm willing to go there. Any working solution is welcome.


Edit: I initially solved the problem for myself by piping the script's output to sed and using s/$/uniqueString/ to insert an identifier at the end of each individual output, then splitting the received data on that identifier.

The answer I gave the bounty to will work on single-line outputs, but breaks on multi-line outputs. A mistake in my testing led me to think that was not the case, but it is. The accepted answer is the better solution and will work on outputs of any size. But if you can't control the script and have to handle user-created scripts, then my sed solution is the only thing I've found that works. And it works quite well.
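A minimal sketch of the splitting half of that delimiter approach. The marker string `@@MSG@@` is hypothetical (the question doesn't say what unique string was used); any string guaranteed not to occur in the payload would work:

```javascript
// Split a received stdout chunk on a unique marker string appended
// after each message, dropping empty fragments (e.g. a trailing one).
function splitMessages(chunk, marker) {
  return chunk.split(marker).filter(part => part.length > 0);
}

// Multi-line messages survive, because we split on the marker, not on '\n'.
const received = 'first\nline two@@MSG@@second@@MSG@@third\n@@MSG@@';
console.log(splitMessages(received, '@@MSG@@'));
```

Note this only works if the marker can never appear inside a message itself.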

Argyres answered 8/5, 2019 at 1:17 Comment(2)
You know the easiest solution is you can just add sleep 1 between each echo statement and it works.Splenic
It may technically allow you to detect separate echo statements, but it comes at the cost of arbitrarily slowing your program down, potentially by orders of magnitude. It's a terrible solution and strongly recommended against in the presence of many superior alternatives.Argyres
B
1

I ran into the same problem on a previous project. I used the interpretation switch on the echo statement and then split the string on a non-printable character.

Example:

echo -e 'one\u0016'

echo -e "two\u0016"

echo -e 'three\u0016'

Result:

"one\u0016\ntwo\u0016\nthree\u0016\n"

And the corresponding Javascript:

var spawn = require('child_process').spawn;
var child = spawn('./test.sh');
child.stdout.on('data', data => {
   var value = data.toString('utf8');
   var values = value.split("\u0016\n").filter(item => item);
   console.log(values);
});
Bullivant answered 16/5, 2019 at 2:25 Comment(1)
Yep, same principle I used, except I did it with sed.Argyres
P
5

You can use the readline interface provided as part of the Node APIs; more information here: https://nodejs.org/api/readline.html#readline_event_line. Keep using spawn as you do now, but pass the child's stdout to readline so that it can parse the lines. Not sure if this is what you intend to do. Here is some sample code:

var process = require('child_process');
const readline = require('readline');

var child = process.spawn('./test.sh');

// Use readline interface
const readlinebyline = readline.createInterface({ input: child.stdout });

// Called when a line is received
readlinebyline.on('line', (line) => {
    // `line` is already a string, so just JSON-encode it
    console.log(JSON.stringify(line));
});

Output:

"first message"
"second message"
"third message"

If you get an error like TypeError: input.on is not a function, make sure you have execute permission on the test.sh script: chmod +x test.sh.

Protestantism answered 10/5, 2019 at 14:57 Comment(10)
I actually get TypeError: input.on is not a function from readline.js:189. Not sure what to make of that.Argyres
what is your node version?Protestantism
10.15.2. I'm reading various StackOverflow questions trying to see if there's a solution to that error, like this one.Argyres
Did you use my example as it is or did you make some modifications?Protestantism
No modifications.Argyres
I tried using 10.15.2 on mac with the above example and it seems to work for meProtestantism
@Argyres Can you try chmod +x test.sh first in your shell to see if you have permissions?Protestantism
@Argyres Did you try what I mentioned above?Protestantism
Oddly, when I test again I get different results. I'm not sure why, but now readline is behaving as expected and reading one line at a time, so a single echo statement with multiple lines of output fires multiple times instead of once.Argyres
Perhaps the stream module would be better, or something like that. Web servers receive individual writes to sockets....surely there's a way to do this with one of the various node interfaces.Argyres
W
1

The C standard library that underlies bash and Python is what buffers stdout (line-buffered on a terminal, block-buffered on a pipe). stdbuf and unbuffer can deal with that, but not with the buffering done by the operating system.

Linux, for example, buffers data in the kernel for the pipe between your node.js process and the bash process (writes of up to PIPE_BUF, 4096 bytes, are guaranteed to be atomic).

Truth is, there's no honest way for a process on one end of the pipe (node.js) to see individual writes (echo calls) on the other end. This isn't the right design (you could communicate via individual files instead of stdout).

If you insist, you can try to fool the OS scheduler: if the writer pauses long enough between writes, the scheduler will wake the reader process (node.js), which then reads whatever is currently in the OS buffer.

I tested this on Linux:

$ cat test.sh 
echo 'first message'
sleep 0.1
echo 'second message'
sleep 0.1
echo 'third message'
$ cat test.js 
const  child_process  = require('child_process');
var child = child_process.spawn(`./test.sh`);
child.stdout.on('data', data => {
   data = JSON.stringify(data.toString('utf8'));
   global.process.stdout.write(data); // notice global object
});
$ node test.js
"first message\n""second message\n""third message\n"
Whang answered 12/5, 2019 at 1:27 Comment(3)
Even I thought of this solution, but I thought making changes to external files is not a good approach.Yasukoyataghan
Can you elaborate a little on "This isn't the right design (you could communicate via individual files instead of stdout)." ? I'm intrigued by it, but adding sleep statements to the script is definitely not the right solution here.Argyres
@Argyres "isn't the right design" the writer (echo) needs to frame individual messages - which is what the accepted solution does.Whang
L
1

If you expect the output from test.sh to always be sent line by line, then IMHO your best choice is to use readline:

const readline = require('readline');
const {spawn} = require('child_process');

const child = spawn('./test.sh');
const rl = readline.createInterface({
    input: child.stdout
});

rl.on('line', (input) => {
    console.log(`Received: ${input}`);
});
Lette answered 16/5, 2019 at 22:32 Comment(1)
Correct solution, and thank you, but the bounty winner submitted his answer earlier. Upvoted though! ¯\_(ツ)_/¯Argyres
G
0

Do not use console.log:

const  process_module  = require('child_process');

var child = process_module.spawn('./test.sh');
child.stdout.on('data', data => {
   process.stdout.write(data);
});

UPDATE (just to show the difference between process module and process global object):

const process = require('child_process');

var child = process.spawn(`./test.sh`);
child.stdout.on('data', data => {
   global.process.stdout.write(data); // notice global object
});

The files I've used to test this script are:

Python:

#!/usr/bin/env python

print("first message")
print("second message")
print("third message")

Bash:

#!/usr/bin/env bash

echo 'first message'
echo 'second message'
echo 'third message'

The output:

first message
second message
third message

Make sure, they are executable scripts with:

chmod a+x test.sh
chmod a+x test.py
Guardianship answered 10/5, 2019 at 16:10 Comment(5)
Did you mean process_module.stdout.write(data)?Argyres
Either way, stdout is undefined for me on the child process.Argyres
No, you read wrong, I'm using the global process variable for writing: process.stdout.write(data). The process_module is different, that's why I named differently. I've updated the answer to show you the difference.Guardianship
Sorry, changing process.stdout.write(data); to process.stdout.write("message: " + data); makes it clear that it is not firing once for each echo statement.Argyres
It's because of the message size, the data event is not supposed to be called per every script sentence, but blocks of data.Guardianship
S
0

There is a very simple solution to this. Simply add a sleep 1 to your bash script and the .on('data') handler won't combine the outputs.

So script like this:

#!/bin/bash
echo 'first message'
sleep 1
echo 'second message'
sleep 1
echo 'third message'

And your exact script (with a fix for the missing require('child_process')):

var process = require('child_process');
var child = process.spawn('./test.sh');
child.stdout.on('data', data => {
   data = JSON.stringify(data.toString('utf8'));
   console.log(data);
});
Splenic answered 16/5, 2019 at 22:51 Comment(0)
I
0

If you're trying to split and interpret each message, this might help (I don't have much experience with node, sorry if I got something wrong):

test.sh:

#!/bin/bash
echo -n 'first message'
echo -ne '\0'
echo -n 'second message'
echo -ne '\0'
echo -n 'third message'
echo -ne '\0'

node:

const { spawn } = require('child_process');

const child = spawn('./test.sh');
let data_buffer = Buffer.alloc(0); // accumulates chunks across 'data' events
const data_array = [];
child.stdout.on('data', data => {
  // Note: `+=` on a Buffer silently converts it to a string,
  // so concatenate Buffers explicitly instead.
  data_buffer = Buffer.concat([data_buffer, data]);
  let i;
  while ((i = data_buffer.indexOf(0)) !== -1) { // find the next NUL byte
    const s = data_buffer.slice(0, i);
    data_array.push(s);
    data_buffer = data_buffer.slice(i + 1);
    const json = JSON.stringify(s.toString('utf8'));
    console.log('--8<-------- split ------------');
    console.log('index: ' + i);
    console.log('received: ' + s);
    console.log('json: ' + json);
    console.log(data_array);
  }
});

This essentially uses NUL-delimited strings instead of newline-delimited ones. Another option would be to use IFS, but I failed to get that working. This method saves you from needing readline.

One thing to note is that you have to accumulate the received data in a variable that outlives the callback, since you can't control how the chunks of data arrive (I don't know if there's a way to control that). Having said that, you can bound its size by cutting off the already-interpreted part, hence the second slice.

For this to work, of course you have to make sure you don't have any null characters in your data. But you can change the delimiting character if you do.
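If the delimiter byte can occur in the data, length-prefixed framing avoids the problem entirely. This is a sketch of that alternative, not part of the answer above; the `frame`/`unframe` helpers are hypothetical names, using a 4-byte big-endian length header per message:

```javascript
// Wrap a message in a 4-byte length header followed by the payload.
function frame(msg) {
  const body = Buffer.from(msg, 'utf8');
  const header = Buffer.alloc(4);
  header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Yield each complete message found in the accumulated buffer.
function* unframe(buffer) {
  let offset = 0;
  while (buffer.length - offset >= 4) {
    const len = buffer.readUInt32BE(offset);
    if (buffer.length - offset - 4 < len) break; // message not fully received yet
    yield buffer.slice(offset + 4, offset + 4 + len).toString('utf8');
    offset += 4 + len;
  }
}

const stream = Buffer.concat([frame('one'), frame('two\nlines'), frame('three')]);
console.log([...unframe(stream)]);
```

The payload can then contain any byte value, including NUL and newlines, at the cost of needing a cooperating writer that emits the headers.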

This approach is, I think, more thorough.

If you needed python3:

#!/usr/bin/python3
print("first message", end = '\x00')
print("second message", end = '\x00')
print("third message", end = '\x00')
Incapable answered 17/5, 2019 at 0:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.