In C++, calling fork when cin is a bash heredoc causes repeated input fragments

I am implementing a shell-like program in C++. It has a loop that reads from cin, forks, and waits for the child.

This works fine if the input is interactive or if it's piped from another program. However, when the input is a bash heredoc, the program rereads parts of the input (sometimes indefinitely).

I understand that the child process inherits the parent's file descriptors, including the shared file offset. However, the child in this example does not read anything from cin, so I would not expect it to touch the offset. I'm stumped about why this is happening.
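
To illustrate the shared offset described above, here is a minimal sketch that is separate from the program in question. The helper names offset_after_child_seek and demo_shared_offset are hypothetical, and the temporary file only stands in for redirected input. The child moves the offset with lseek and leaves via _exit; the parent then queries the position.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks, lets the child seek on `fd`, and returns the
// offset the parent observes once the child has exited.
off_t offset_after_child_seek(int fd, off_t child_target)
{
    assert(fd >= 0);
    pid_t pid = fork();
    if (pid == 0) {
        // Child: both processes share one open file description, so this
        // seek moves the offset for the parent as well.
        lseek(fd, child_target, SEEK_SET);
        _exit(0); // leave without running any stdio cleanup
    }
    if (pid < 0) {
        return -1; // fork failed
    }
    waitpid(pid, nullptr, 0);
    return lseek(fd, 0, SEEK_CUR); // query the offset without moving it
}

// Hypothetical demo: a temporary file stands in for redirected input.
off_t demo_shared_offset()
{
    std::FILE* f = std::tmpfile();
    if (f == nullptr) {
        return -1;
    }
    std::fputs("hello world\ngoodbye world\n", f);
    std::fflush(f); // the descriptor offset is now at the end of the file
    off_t seen = offset_after_child_seek(fileno(f), 3);
    std::fclose(f);
    return seen;
}
```

Under these assumptions demo_shared_offset() returns 3, the position chosen by the child, even though the parent never seeks on its own.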


test.cpp:

#include <iostream>
#include <string>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    std::string line;
    while (std::getline(std::cin, line)) {
        pid_t pid = fork();
        if (pid == 0) { // child
            break; // exit immediately
        }
        else if (pid > 0) { // parent
            waitpid(pid, nullptr, 0);
        }
        else { // error
            perror("fork");
        }

        std::cout << getpid() << ": " << line << "\n";
    }
    return 0;
}

I compile it as follows:

g++ test.cpp -std=c++11

Then I run it with:

./a.out <<EOF
hello world
goodbye world
EOF

Output:

7754: hello world
7754: goodbye world
7754: goodbye world

If I add a third line, "foo bar", to the heredoc, the program gets stuck in an infinite loop:

13080: hello world
13080: goodbye world
13080: foo bar
13080: o world
13080: goodbye world
13080: foo bar
13080: o world
[...]

Versions:

  • Linux kernel: 4.4.0-51-generic
  • Ubuntu: 16.04.1 LTS (xenial)
  • bash: GNU bash, version 4.3.46(1)-release (x86_64-pc-linux-gnu)
  • gcc: g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

Softhearted asked 5/12, 2016 at 0:59

Comments (13):
What happens if you call std::ios::sync_with_stdio(false); at the beginning and explicitly flush after each write to stdout (e.g. change "\n" to std::endl)? - Okra
By stracing the child process, I can see exactly what's going on: the child lseeks backwards on file descriptor 0 before exiting, which affects the parent process. Unfortunately, I don't know why the C library does that, so I'm not going to post the details as an answer. This also happens with an explicit exit(0) but not with _exit(0). - Preemption
The answer here might be useful: #33900048 - Abfarad
@Hurkyl Yes, that fixes it! I guess that because the C library is seeking on stdin (per @SamVarshavchik), turning off synchronization changes what the child process does when it exits. - Softhearted
@Abfarad Thanks, that's really helpful. So maybe the "right" way to fix this is calling fclose(stdin) etc. at the beginning of the child, to prevent exit from seeking in the first place. - Softhearted
@Kevin: Interesting! I wonder if the problem, then, is in the C library, or in the interaction between the C and C++ libraries? What if you close cin after forking (I don't remember if you can do that; if not, you could set its badbit or something)? What if you rewrite the program to use the C I/O routines? - Okra
... also, it's probably worth testing with a larger heredoc: large enough to overflow a whole cin buffer, maybe twice. - Okra
I couldn't reproduce this, but you absolutely should be calling _exit or quick_exit in the forked child if the child doesn't exec. The parent builds up cout buffer state and the children inherit it. If the children exit normally, they will attempt to flush their copy of the cout buffer, which the parent will also flush. If this happens, you will get duplicates in your output. - Vertebrate
@PSkocik You should just flush buffers before a fork if this is a concern. - Sharkey
@SamVarshavchik AFAIU, failure to rewind stdin was considered a glibc bug back when. - Sharkey
@Hurkyl The behavior persists when I add std::cin.setstate(std::ios::failbit); to the beginning of the child. It also happens when I rewrite the program to use getline and printf (compiled as both C++11 and C11). And the problem goes away when I make the child sleep instead of calling exit. So it seems like exit is the culprit here. - Softhearted
Don't use nullptr with C library functions. nullptr is C++; use NULL when you call a C function. - Impatience
@KevinChen This has nothing to do with C++ streams. You need to call close(0) before exiting. - Sharkey
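
The _exit suggestion from the comments above can be sketched as follows. This is only an illustration, not code from the thread: the helper name read_lines_forking is hypothetical, and it takes a C stdio stream instead of cin so that it can be tried on any seekable file. The child leaves with _exit, which skips the C library's exit-time stream cleanup, so nothing in the child should seek on the shared descriptor.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: reads `in` line by line, forks once per line, and
// returns the lines as the parent saw them. Lines longer than the buffer
// would be split, which is acceptable for a sketch.
std::vector<std::string> read_lines_forking(std::FILE* in)
{
    assert(in != nullptr);
    std::vector<std::string> seen;
    char buf[4096];
    while (std::fgets(buf, sizeof buf, in)) {
        pid_t pid = fork();
        if (pid == 0) {
            // Child: _exit() does not flush or sync stdio streams, so the
            // offset of the descriptor behind `in` is left untouched.
            _exit(0);
        } else if (pid > 0) {
            waitpid(pid, nullptr, 0);
        } else {
            std::perror("fork");
        }
        buf[std::strcspn(buf, "\n")] = '\0'; // strip the trailing newline
        seen.emplace_back(buf);
    }
    return seen;
}
```

Given a seekable stream holding the three lines from the question, the returned vector should contain each line exactly once.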

I was able to reproduce this problem, not only using the heredoc but also using a standard file redirection.

Here is the test script that I used. In both the first and second cases, I got a duplication of the second line of input.

./a.out < Input.txt
echo

cat Input.txt | ./a.out
echo

./a.out <<EOF
hello world
goodbye world
EOF

Closing stdin in the child before it exits seems to eliminate the problem in both cases.

#include <iostream>
#include <string>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    std::string line;
    while (std::getline(std::cin, line)) {
        pid_t pid = fork();
        if (pid == 0) { // child
            close(STDIN_FILENO);
            break; // exit after first closing stdin
        }
        else if (pid > 0) { // parent
            waitpid(pid, nullptr, 0);
        }
        else { // error
            perror("fork");
        }

        std::cout << getpid() << ": " << line << "\n";
    }
    return 0;
}
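
A possible way to see why the child's exit matters is to compare where stdio thinks the stream is with where the descriptor actually is. The sketch below is an assumption-laden illustration: the Offsets struct and the offsets_of helper are hypothetical names, and it uses a C stdio stream instead of cin. Because stdio reads ahead, the descriptor offset is usually past the logical position after one line has been read. A C library that seeks the descriptor back to the logical position at exit, as the strace observation in the comments suggests, would then rewind the offset the parent shares. With the descriptor closed in the child first, such a seek would presumably just fail.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical type: the two notions of "current position" for a stream.
struct Offsets {
    long  logical; // where stdio will continue reading (ftell)
    off_t kernel;  // where the underlying descriptor is (lseek)
};

// Hypothetical helper: reports both positions for a seekable stdio stream.
Offsets offsets_of(std::FILE* in)
{
    assert(in != nullptr);
    Offsets o;
    o.logical = std::ftell(in);
    o.kernel = lseek(fileno(in), 0, SEEK_CUR); // query only, does not move
    return o;
}
```

For the two-line input from the question, the logical position after reading the first line is 12, while the descriptor offset would typically already be at the end of the file, assuming the whole input fits into the stdio buffer.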

Osborn answered 25/12, 2016 at 2:41
