Sub-shell differences between bash and ksh

I always believed that a sub-shell was not a child process, but another shell environment in the same process.

I ran a simple command made up only of built-ins:

(echo "Hello";read)

On another terminal:

ps -t pts/0
  PID TTY          TIME CMD
20104 pts/0    00:00:00 ksh

So there is no child process in KornShell (ksh).

Enter bash, which appears to behave differently given the same command:

  PID TTY          TIME CMD
 3458 pts/0    00:00:00 bash
20067 pts/0    00:00:00 bash

So there is a child process in bash.
From reading the bash man page, it is clear that another process is created for a subshell; however, bash fakes $$, which is sneaky.
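A minimal bash sketch of the same observation, assuming bash 4.0 or later (which provides BASHPID): inside a subshell, $$ keeps reporting the top-level shell's PID, while BASHPID reports the PID of the process actually running the code. The variable names sub_dollar and sub_bashpid are illustrative only.

```shell
#!/usr/bin/env bash
# Assumes bash >= 4.0 for BASHPID.
# At top level, $$ and BASHPID normally agree.
echo "top level: \$\$=$$ BASHPID=$BASHPID"

# Inside ( ), bash forks: $$ is inherited unchanged, BASHPID is the child's PID.
( echo "subshell:  \$\$=$$ BASHPID=$BASHPID" )

# Capture both values from a command substitution, which bash also runs in a child.
read -r sub_dollar sub_bashpid <<< "$(echo "$$ $BASHPID")"
echo "comsub:    \$\$=$sub_dollar BASHPID=$sub_bashpid"
```

BASHPID is bash-specific (mksh R41 and later also provide it), so this sketch will not run as written under ksh93.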

Is this difference between bash and ksh expected, or am I reading the symptoms incorrectly?

Edit: Running strace -f on bash and ksh on Linux shows that bash calls clone twice for the sample command (it does not call fork), so bash might be using threads (I tried ltrace, but it dumped core!). KornShell calls neither fork, vfork, nor clone.

Alkmaar asked 4/2, 2013 at 12:31

ksh93 works unusually hard to avoid subshells. Part of the reason is that it avoids stdio and makes extensive use of sfio, which allows builtins to communicate directly. Another reason is that ksh can, in theory, have a very large number of builtins: if built with SHOPT_CMDLIB_DIR, all of the cmdlib builtins are included and enabled by default. I can't give a comprehensive list of the places where subshells are avoided, but it's typically in situations where only builtins are used and where there are no redirects.

#!/usr/bin/env ksh

# doCompat arr
# "arr" is an indexed array name to be assigned an index corresponding to the detected shell.
# 0 = Bash, 1 = Ksh93, 2 = mksh
function doCompat {
    ${1:+:} return 1
    if [[ ${BASH_VERSION+_} ]]; then
        shopt -s lastpipe extglob
        eval "${1}[0]="
    else
        case "${BASH_VERSINFO[*]-${!KSH_VERSION}}" in
            .sh.version)
                nameref v=$1
                v[1]=
                if builtin pids; then
                    function BASHPID.get { .sh.value=$(pids -f '%(pid)d'); }
                elif [[ -r /proc/self/stat ]]; then
                    function BASHPID.get { read -r .sh.value _ </proc/self/stat; }
                else
                    function BASHPID.get { .sh.value=$(exec sh -c 'echo $PPID'); }
                fi 2>/dev/null
                ;;
            KSH_VERSION)
                nameref "_${1}=$1"
                eval "_${1}[2]="
                ;&
            *)
                if [[ ! ${BASHPID+_} ]]; then
                    echo 'BASHPID requires Bash, ksh93, or mksh >= R41' >&2
                    return 1
                fi
        esac
    fi
}

function main {
    typeset -a myShell
    doCompat myShell || exit 1 # stripped-down compat function.
    typeset x

    print -v .sh.version
    x=$(print -nv BASHPID; print -nr " $$"); print -r "$x" # comsubs are free for builtins with no redirections 
    _=$({ print -nv BASHPID; print -r " $$"; } >&2)        # but not with a redirect
    _=$({ printf '%s ' "$BASHPID" $$; } >&2); echo         # nor for expansions with a redirect
    _=$(printf '%s ' "$BASHPID" $$ >&2); echo # but if expansions aren't redirected, they occur in the same process.
    _=${ { print -nv BASHPID; print -r " $$"; } >&2; }     # However, ${ ;} is always subshell-free (obviously).
    ( printf '%s ' "$BASHPID" $$ ); echo                   # Basically the same rules apply to ( )
    read -r x _ <<<$(</proc/self/stat); print -r "$x $$"   # These are free in {{m,}k,z}sh. Only Bash forks for this.
    printf '%s ' "$BASHPID" $$ | cat # Sadly, pipes always fork. It isn't possible to precisely mimic "printf -v".
    echo
} 2>&1

main "$@"

out:

Version AJM 93v- 2013-02-22
31732 31732
31735 31732
31736 31732 
31732 31732 
31732 31732
31732 31732 
31732 31732
31738 31732

Another neat consequence of all this internal I/O handling is that some buffering issues simply go away. Here's a funny example of reading lines with the tee and head builtins (don't try this in any other shell).

 $ ksh -s <<\EOF
integer -a x
builtin head tee
printf %s\\n {1..10} |
    while head -n 1 | [[ ${ { x+=("$(tee /dev/fd/{3,4})"); } 3>&1; } ]] 4>&1; do
        print -r -- "${x[@]}"
    done
EOF
1
0 1
2
0 1 2
3
0 1 2 3
4
0 1 2 3 4
5
0 1 2 3 4 5
6
0 1 2 3 4 5 6
7
0 1 2 3 4 5 6 7
8
0 1 2 3 4 5 6 7 8
9
0 1 2 3 4 5 6 7 8 9
10
0 1 2 3 4 5 6 7 8 9 10
Shouldst answered 9/3, 2013 at 14:50
Alkmaar: Many thanks for this comprehensive answer. It is a pity I can only 'tick' one answer as accepted, and can only upvote you once.
Shouldst: Heh, I know the reputation system is messed up... thanks for asking an interesting question :P
Alkmaar: @MarkReed: done. Sorry for the downvote you ended up with. I'm really grateful to both of you.

In ksh, a subshell might or might not result in a new process. I don't know what the conditions are, but the shell was optimized for performance on systems where fork() was more expensive than it typically is on Linux, so it avoids creating a new process whenever it can. The specification says a "new environment", but that environmental separation may be done in-process.
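To illustrate the "new environment" requirement, here is a minimal sketch that should behave the same in bash and ksh93: whether or not the shell actually forks, assignments and cd inside ( ) are discarded when the subshell ends, whereas a { ...; } group runs in the current environment. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Uses only syntax shared by bash and ksh93.
start_dir=$PWD
x=outer

# Subshell: works on a copy of the environment, forked or not.
( x=inner; cd / && echo "inside ( ): x=$x PWD=$PWD" )
echo "after ( ):  x=$x PWD=$PWD"

# Brace group: no new environment, so the assignment persists.
{ y=changed; }
echo "after { }:  y=$y"
```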

Another loosely related difference is the use of new processes for pipes. In ksh and zsh, if the last command in a pipeline is a builtin, it runs in the current shell process, so this works:

$ unset x
$ echo foo | read x
$ echo $x
foo
$

In bash, by default every command in a multi-command pipeline, including the last, runs in a subshell, so the above doesn't work:

$ unset x
$ echo foo | read x
$ echo $x

$

As @dave-thompson-085 points out, you can get the ksh/zsh behavior in bash versions 4.2 and newer if you turn off job control (set +o monitor) and turn on the lastpipe option (shopt -s lastpipe). But my usual solution is to use process substitution instead:

$ unset x
$ read x < <(echo foo)
$ echo $x
foo
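For completeness, a minimal sketch of the lastpipe route, assuming bash 4.2 or later running as a script (non-interactive, so job control is already off; set +m just makes that explicit):

```shell
#!/usr/bin/env bash
# lastpipe needs bash >= 4.2 and only applies while job control is off.
set +m
shopt -s lastpipe

unset x
# With lastpipe, the final command of the pipeline (read) runs in the current shell.
echo foo | read -r x
echo "x=$x"
```

Note that shopt -s lastpipe affects every later pipeline in the script, which may be one reason to prefer the process-substitution form for a one-off read.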
Protuberancy answered 4/2, 2013 at 13:33
Alkmaar: I was really after what the conditions are. Can you give a reference for the claim that the shell was optimized to avoid fork?
Protuberancy: No reference other than common knowledge; fork used to be a very expensive operation. Since ksh is open-source these days, I would just look at the code to see when it decides to fork. All Korn's book says is "A subshell is a separate environment that is a copy of the parent shell environment. ... A subshell environment need not be a separate process."
Dowel: In bash 4.2 and up, if you shopt -s lastpipe, job control is off (usually implying a non-interactive shell), and the pipeline is not backgrounded, bash does try to run the last 'joint' in the current shell. unix.stackexchange.com/questions/143958/… gnu.org/software/bash/manual/html_node/…

The bash manpage reads:

Each command in a pipeline is executed as a separate process (i.e., in a subshell).

While this sentence is about pipes, it strongly implies a subshell is a separate process.
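One way to check this in bash itself, assuming bash 4.0 or later (for BASHPID) with lastpipe left off: expand BASHPID on each side of a pipe and compare both with $$. The variable names are illustrative.

```shell
#!/usr/bin/env bash
shopt -u lastpipe   # default: every pipeline element runs in its own subshell

# The left side reports its BASHPID; the right side appends its own.
# Process substitution lets read store both values in the current shell.
read -r left_pid right_pid < <(
  echo "$BASHPID" | { read -r l; echo "$l $BASHPID"; }
)
echo "\$\$=$$ left=$left_pid right=$right_pid"
```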

Wikipedia's disambiguation page also describes a subshell in child-process terms. A child process is certainly itself a process.

The ksh manpage (at a glance) isn't direct about its own definition of a subshell, so it does not imply one way or the other that a subshell is a different process.

Learning the Korn Shell says that they are different processes.

I'd say you're missing something (or the book is wrong or out of date).

Thaine answered 4/2, 2013 at 13:3
Alkmaar: OK, if they are different processes, then why is only one shown in the ps output?
Alkmaar: In addition, strace shows that ksh does not call fork, vfork, or clone (on Linux) for the sample subshell.
Alkmaar: To be fair, it could be version-specific. I have asked on [ksh-users] and I'll post any replies here.
Shouldst: @Alkmaar It somewhat depends on your OS and how ksh was built. I believe it uses posix_spawn if available.

The Korn shell does not necessarily use a subshell for command substitution; command substitutions are usually handled in the same process. Exceptions include I/O operations.

To go a bit further: in a very old ksh93 script, I had a command that assigned a variable like this:

my_variable=(`cat ./my_file`)

In other words, the backticked command substitution is wrapped in parentheses. "my_file" is a list of 4-digit octal numbers, one per line.

Written this way in ksh93t and later, the output is split at the newlines into separate array elements, so you can step through the numbers in the variable using a counter. For example, the following code gives one 4-digit octal number from the list above, after which you would increment the counter:

data_I_want=${my_variable[$my_counter]}

In ksh93, the command for the variable can also be done with this:

my_variable=($(cat ./my_file))

and, finally, to eliminate the "useless use of cat",

my_variable=($(<./my_file))

If the command is written without the outer parentheses, it is a plain scalar assignment: only trailing newlines are stripped (as POSIX requires), so the whole output, embedded newlines included, lands in element 0. The first use of the variable therefore returns all of the numbers from the file, and subsequent indices return null values.

The outer parentheses do not create a subshell here; they turn the statement into an indexed-array assignment. The command substitution's output is split into fields on the default IFS (space, tab, newline), and each number becomes its own element, so no IFS changes are needed.
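A minimal sketch of the scalar-versus-array difference, which should run in both bash and ksh93. It writes hypothetical sample data to a temporary file made with mktemp (the tmp variable and the octal values are made up for illustration) in place of ./my_file:

```shell
#!/usr/bin/env bash
tmp=$(mktemp)                       # stands in for ./my_file
printf '%s\n' 0644 0755 0600 > "$tmp"

# Scalar assignment: trailing newlines are removed, embedded ones are kept,
# and the whole output ends up in element 0.
scalar=$(<"$tmp")

# Array assignment: the unquoted output is split on IFS (space, tab, newline)
# and each field becomes its own element.
my_variable=($(<"$tmp"))

echo "scalar has ${#scalar[@]} element(s); array has ${#my_variable[@]}"
for ((my_counter = 0; my_counter < ${#my_variable[@]}; my_counter++)); do
  data_I_want=${my_variable[$my_counter]}
  echo "element $my_counter: $data_I_want"
done
rm -f "$tmp"
```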

Sorry for bumping something so old, but it seemed worthwhile to include this, as I haven't seen this particular behavior discussed elsewhere.

Pau answered 5/2, 2021 at 19:31
Pau: Thanks to Robert for cleaning up my code blocks.
