When does command substitution spawn more subshells than the same commands in isolation?

Asked 24/1, 2014 at 11:5 Answered 25/1, 2014 at 4:49

bash shell optimization subshell command-substitution

Yesterday it was suggested to me that using command substitution in bash causes an unnecessary subshell to be spawned. The advice was specific to this use case:

# Extra subshell spawned
foo=$(command; echo $?)

# No extra subshell
command
foo=$?
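A minimal sketch of the two forms, using a hypothetical stand-in function `demo_cmd` in place of `command`. Apart from the subshell question, the forms differ in one practical way: `$(...)` captures everything the command writes to stdout, so in the first form `foo` holds that output followed by the status, not the status alone.

```shell
#!/usr/bin/env bash
# demo_cmd is a hypothetical stand-in for `command`: it prints a line and
# returns status 3.
demo_cmd() { echo "some output"; return 3; }

# Form 1: the status is echoed inside a command substitution, so foo also
# receives demo_cmd's stdout (trailing newlines are stripped).
foo=$(demo_cmd; echo $?)
echo "form 1: ${foo//$'\n'/ | }"

# Form 2: demo_cmd runs in the current shell and $? is read right after it.
demo_cmd
bar=$?
echo "form 2: $bar"
```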

As best I can figure this appears to be correct for this use case. However, a quick search trying to verify this leads to reams of confusing and contradictory advice. It seems popular wisdom says ALL usage of command substitution will spawn a subshell. For example:

The command substitution expands to the output of commands. These commands are executed in a subshell, and their stdout data is what the substitution syntax expands to. (source)

This seems simple enough until you keep digging, in which case you'll start finding suggestions that this is not the case.

Command substitution does not necessarily invoke a subshell, and in most cases won't. The only thing it guarantees is out-of-order evaluation: it simply evaluates the expressions inside the substitution first, then evaluates the surrounding statement using the results of the substitution. (source)

This seems reasonable, but is it true? This answer to a subshell-related question tipped me off that man bash has this to note:

Each command in a pipeline is executed as a separate process (i.e., in a subshell).
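A small illustration of that man-page sentence, assuming bash with its default settings (in particular the `lastpipe` option off): because the `while` loop is a pipeline segment, it runs in a subshell and its counter updates never reach the parent shell.

```shell
#!/usr/bin/env bash
# The while loop is the last segment of a pipeline, so (with lastpipe off,
# the default) it runs in a subshell and its increments are lost.
n_piped=0
printf 'a\nb\nc\n' | while read -r line; do
  n_piped=$((n_piped + 1))
done
echo "counted through a pipeline: $n_piped"

# Fed by a here-string instead of a pipe, the same loop runs in the current
# shell, so the increments are kept.
n_direct=0
while read -r line; do
  n_direct=$((n_direct + 1))
done <<< $'a\nb\nc'
echo "counted without a pipeline: $n_direct"
```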

This brings me to the main question. What, exactly, will cause command substitution to spawn a subshell that would not have been spawned anyway to execute the same commands in isolation?

Please consider the following cases and explain which ones incur the overhead of an extra subshell:

# Case #1
command1
var=$(command1)

# Case #2
command1 | command2
var=$(command1 | command2)

# Case #3
command1 | command2 ; var=$?
var=$(command1 | command2 ; echo $?)

Do each of these pairs incur the same number of subshells to execute? Is there a difference in POSIX vs. bash implementations? Are there other cases where using command substitution would spawn a subshell where running the same set of commands in isolation would not?

Troche asked 24/1, 2014 at 11:5

This question has some related info: Command substitution vs process substitution – Troche 24/1, 2014 at 11:17

I don't think I would take your second quote as any kind of authoritative information about how bash is implemented. However, I would note that subshell != process; a subshell (in the sense of a new scope for variables) is not required to spawn a new process to run it. (This is the third point made in the accepted answer to your linked question.) – Monocarpic 24/1, 2014 at 12:18

@Monocarpic If I had taken that as authoritative, I probably wouldn't be asking here. The point is to get the issue demystified a bit in a peer-reviewed environment. I realize that an answer will probably need to clarify what a subshell even is in order to give a sensible explanation of when "unnecessary" resources are being used. – Troche 24/1, 2014 at 12:22

The second quote's comment "There aren't any builtins that explicitly mean 'subshell';" is at best misleading and in my view flat-out wrong (and has caused me not to read the rest of the article). The ( ... ) explicitly creates a sub-shell; the commands within the parentheses must be executed in a sub-shell (meaning that any changes made to variables etc must not affect the main shell). That used to be done by forking and letting the child execute the contents of the sub-shell script while the parent waits for it to complete. A shell might avoid that if it has good enough scoping abilities. – Courtly 24/1, 2014 at 17:34
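A quick check of the semantics this comment describes: whatever ( ... ) changes stays inside it. This sketch only shows the scoping rule; whether a given shell pays for it with a fork is the separate question the thread is about.

```shell
#!/usr/bin/env bash
# Changes made inside ( ... ) are confined to the subshell: neither the
# variable assignment nor the cd is visible to the parent afterwards.
x=outer
start_dir=$PWD
( x=inner; cd /; echo "inside:  x=$x, PWD=$PWD" )
echo "outside: x=$x, PWD=$PWD"
```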

@JonathanLeffler to be fair, if you read the exchange further down he recants, although in a somewhat obtuse way. – Monorail 28/1, 2014 at 17:14

@kojiro: yes, he does, but because the posting leaves an ambiguous message, rather than being rewritten to present the final view unambiguously, or point to where the final view is presented unambiguously, it leaves the posting as 'of dubious merit' as a source of information. – Courtly 28/1, 2014 at 17:59

See https://mcmap.net/q/537190/-bash-script-is-super-slow – Sales 24/12, 2016 at 17:19

Update and caveat:

This answer has a troubled past in that I confidently claimed things that turned out not to be true. I believe it has value in its current form, but please help me eliminate other inaccuracies (or convince me that it should be deleted altogether).

I've substantially revised - and mostly gutted - this answer after @kojiro pointed out that my testing methods were flawed (I originally used ps to look for child processes, but that's too slow to always detect them); a new testing method is described below.

I originally claimed that not all bash subshells run in their own child process, but that turns out not to be true.

As @kojiro states in his answer, some shells - other than bash - DO sometimes avoid creation of child processes for subshells, so, generally speaking in the world of shells, one should not assume that a subshell implies a child process.

As for the OP's cases in bash (assuming that the command{n} instances are simple commands):

# Case #1
command1         # NO subshell
var=$(command1)  # 1 subshell (command substitution)

# Case #2
command1 | command2         # 2 subshells (1 for each pipeline segment)
var=$(command1 | command2)  # 3 subshells: + 1 for command subst.

# Case #3
command1 | command2 ; var=$?         # 2 subshells (due to the pipeline)
var=$(command1 | command2 ; echo $?) # 3 subshells: + 1 for command subst.;
                                     #   note that the extra command doesn't add 
                                     #   one

It looks like using command substitution ($(...)) always adds an extra subshell in bash - as does enclosing any command in (...).

I believe, but am not certain, that these results are correct; here's how I tested (bash 3.2.51 on OS X 10.9.1) - please tell me if this approach is flawed:

- Made sure only 2 interactive bash shells were running: one to run the commands, the other to monitor.
- In the 2nd shell, monitored the fork() calls in the 1st with sudo dtruss -t fork -f -p {pidOfShell1} (the -f is necessary to also trace fork() calls "transitively", i.e., to include those created by subshells themselves).
- Used only the builtin : (no-op) in the test commands (to avoid muddling the picture with additional fork() calls for external executables); specifically:
  - :
  - $(:)
  - : | :
  - $(: | :)
  - : | :; :
  - $(: | :; :)
- Only counted those dtruss output lines that contained a non-zero PID (as each child process also reports the fork() call that created it, but with PID 0).
- Subtracted 1 from the resulting number, as running even just a builtin from an interactive shell apparently involves at least 1 fork().
- Finally, assumed that the resulting count represents the number of subshells created.
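On Linux, where dtruss is not available, a rough alternative is to let each command report its own parent PID from /proc. This is a different technique from the fork-counting above, offered only as a sketch: it assumes Linux (/proc/&lt;pid&gt;/status) and bash 4+ (for $BASHPID), the helper names probe and run_case are hypothetical, and it only sees processes that actually run a probe, so an intermediate process is inferred from a parent PID that is neither the top-level shell nor another probe.

```shell
#!/usr/bin/env bash
# Assumes Linux (/proc/<pid>/status) and bash >= 4 (for $BASHPID).
log=$(mktemp)

# probe (hypothetical helper): append "pid ppid" for the process that runs it.
# Only builtins are used, so the probe itself never forks.
probe() {
  local key val
  while read -r key val; do
    if [[ $key == PPid: ]]; then
      echo "$BASHPID $val" >> "$log"
      return
    fi
  done < "/proc/$BASHPID/status"
}

# run_case (hypothetical helper): evaluate one construct in the current shell
# and classify the processes in which its probes ran.
run_case() {
  local pid ppid where
  : > "$log"
  eval "$1"
  echo "$1"
  while read -r pid ppid; do
    if [[ $pid == "$$" ]]; then
      where="the top-level shell itself"
    elif [[ $ppid == "$$" ]]; then
      where="a direct child of the top-level shell"
    else
      where="a grandchild (via intermediate process $ppid)"
    fi
    echo "  probe ran in $where"
  done < "$log"
}

run_case 'probe'
run_case 'x=$(probe)'
run_case 'probe | probe'
run_case 'x=$(probe | probe)'
rm -f "$log"
```

In the last case, both probes should report the same non-top-level parent: that parent is the command-substitution shell, i.e. the "+ 1" in the case #2 count.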

Below is what I still believe to be correct from my original post: when bash creates subshells.

bash creates subshells in the following situations:

- for an expression surrounded by parentheses ( (...) )
  - except directly inside [[ ... ]], where parentheses are only used for logical grouping.
- for every segment of a pipeline (|), including the first one
  - Note that every subshell involved is a clone of the original shell in terms of content (process-wise, a subshell can itself be forked from another subshell before any commands are executed). Thus, modifications made by subshells in earlier pipeline segments do not affect later ones. (By design, the commands in a pipeline are launched simultaneously; sequencing happens only through their connected stdin/stdout pipes.)
  - bash 4.2+ has the shell option lastpipe (OFF by default), which causes the last pipeline segment NOT to run in a subshell, provided job control is not active (so it typically has no effect at an interactive prompt).
- for command substitution ($(...))
- for process substitution (<(...))
  - typically creates 2 subshells; in the case of a simple command, @konsolebox came up with a technique to create only 1: prepend the simple command with exec (<(exec ...)).
- for background execution (&)

Combining these constructs will result in more than one subshell.
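A minimal sketch of the lastpipe exception, assuming bash 4.2+ running a script (where job control is off, the condition under which the bash manual says lastpipe takes effect). Only the last segment moves into the current shell; earlier segments still run in subshells.

```shell
#!/usr/bin/env bash
# Assumes bash >= 4.2, run non-interactively so that job control is off.
shopt -s lastpipe

# The final segment (read) now runs in the current shell, so the assignment
# to value survives the pipeline.
echo hello | read -r value
echo "value=$value"

# The first segment is still a subshell: its assignment to n is lost.
n=0
{ n=1; echo done; } | read -r status_msg
echo "n=$n, status_msg=$status_msg"
```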

Rave answered 24/1, 2014 at 16:55

How did you test for the creation of a new process? – Forensics 24/1, 2014 at 17:28

I don't think your results are valid, because ps isn't fast enough to capture some of those subshells you create above. For example, you say an expression surrounded by parentheses does not run in a child process for simple commands, but try continuously outputting ps while running for i in {0..999999}; do ( : ); done. You won't see every new process, but you'll see some, and the number of PIDs the system goes through increases rapidly. – Monorail 25/1, 2014 at 4:52

@kojiro: Excellent catch, thanks. Not that it matters any longer, but how would you test in the absence of $BASHPID? I'll update my answer. – Rave 25/1, 2014 at 5:32

@Rave That's tough. My first attempts were on Mavericks (which still has Bash 3, sigh), and I ended up trying long loops with many tiny subshells in them. On Linux with Bash 3 you may be able to use /proc/self, but since OS X doesn't have /proc, Bash 3 doesn't have BASHPID, and ( sh -c 'echo $$PPID' ) violates some invariants of the question, I don't feel there's a clean solution. – Monorail 25/1, 2014 at 13:10

@kojiro: Thanks; I've come up with something based on sudo dtruss -t fork -f -p {pid} on OSX; would you mind taking a look at the updated answer and tell me whether that looks sound? – Rave 25/1, 2014 at 20:30

In Bash, a subshell always executes in a new process space. You can verify this fairly trivially in Bash 4, which provides the $BASHPID variable alongside the $$ special parameter:

$$ Expands to the process ID of the shell. In a () subshell, it expands to the process ID of the current shell, not the subshell.
BASHPID Expands to the process id of the current bash process. This differs from $$ under certain circumstances, such as subshells that do not require bash to be re-initialized

In practice:

$ type echo
echo is a shell builtin
$ echo $$-$BASHPID
4671-4671
$ ( echo $$-$BASHPID )
4671-4929
$ echo $( echo $$-$BASHPID )
4671-4930
$ echo $$-$BASHPID | { read; echo $REPLY:$$-$BASHPID; }
4671-5086:4671-5087
$ var=$(echo $$-$BASHPID ); echo $var
4671-5006

About the only case where the shell can elide an extra subshell is when you pipe to an explicit subshell:

$ echo $$-$BASHPID | ( read; echo $REPLY:$$-$BASHPID; )
4671-5118:4671-5119

Here, the subshell implied by the pipe is explicitly applied, but not duplicated.
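One way to confirm that the explicit subshell is not duplicated, sketched under the assumption of Linux (/proc) and bash 4+, with parent_pid as a hypothetical helper: if bash forked again for the ( ... ), the process running parent_pid would be a grandchild of the top-level shell, and its parent PID would differ from $$.

```shell
#!/usr/bin/env bash
# Assumes Linux (/proc/<pid>/status) and bash >= 4 (for $BASHPID).
# parent_pid (hypothetical helper): print the parent PID of whichever process
# runs it, using builtins only so that it does not fork.
parent_pid() {
  local key val
  while read -r key val; do
    if [[ $key == PPid: ]]; then
      echo "$val"
      return
    fi
  done < "/proc/$BASHPID/status"
}

tmp=$(mktemp)
# The ( ... ) is the pipeline's last segment; record its process's parent.
echo x | ( parent_pid > "$tmp" )
read -r p < "$tmp"
rm -f "$tmp"
echo "top-level shell: $$; parent of the ( ... ) segment: $p"
```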

This differs from some other shells, which try very hard to avoid forking. Therefore, while I find the argument made in js-shell-parse misleading, it is true that not all shells always fork for all subshells.

Monorail answered 25/1, 2014 at 4:49
