bash: limiting subshells in a for loop with file list
I've been trying to get a for loop to run a bunch of commands more or less simultaneously, and was attempting to do it via subshells. I've managed to cobble together the script below to test, and it seems to work OK.

#!/bin/bash
for i in {1..255}; do
  (
    #commands
  )&

done
wait

The only problem is that my actual loop is going to be for i in files*, and then it just crashes, I assume because it has started too many subshells to handle. So I added

#!/bin/bash
for i in files*; do
  (
    #commands
  )&
if (( $i % 10 == 0 )); then wait; fi
done
wait

which now fails. Does anyone know a way around this, either by using a different command to limit the number of subshells, or by providing a number to use in place of $i?

Cheers

Dehypnotize answered 19/12, 2014 at 11:30 Comment(2)
what are you doing with files? - Encomiast
This - "then it just crashes" - is very hard to believe. I'm pretty sure it produces some error message instead. - Delmardelmer

You may find it useful to count the number of jobs with jobs, e.g.:

wc -w <<<$(jobs -p)
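
For instance, a quick check with two placeholder background jobs:

sleep 5 & sleep 5 &
wc -w <<<$(jobs -p)    # outputs 2: two background jobs exist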

So, your code would look like this:

#!/bin/bash
for i in files*; do
  (
    #commands
  )&
  if (( $(wc -w <<<$(jobs -p)) % 10 == 0 )); then wait; fi
done
wait

As @chepner suggested:

In bash 4.3, you can use wait -n to proceed as soon as any job completes, rather than waiting for all of them
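
A minimal sketch of that approach (assuming bash >= 4.3; the cap of 10 jobs mirrors the loop above):

#!/bin/bash
max_jobs=10
for i in files*; do
  (
    #commands
  )&
  # once the cap is reached, block until any one job finishes
  while (( $(jobs -pr | wc -l) >= max_jobs )); do
    wait -n
  done
done
wait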

Tenpins answered 19/12, 2014 at 11:51 Comment(0)

xargs/parallel

Another solution would be to use tools designed for concurrency:

printf '%s\0' files* | xargs -0 -P6 -n1 yourScript

The -P6 sets the maximum number of concurrent processes that xargs will run at once. Make it 10 if you like.

I suggest xargs because it is likely already on your system. If you want a really robust solution, look at GNU Parallel.
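
If GNU Parallel is installed, a roughly equivalent call might look like this (the limit of 6 and the yourScript name are assumptions carried over from the xargs example):

parallel -j6 yourScript ::: files*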

Filenames in array

For another answer more explicit to your question: get the counter from the array index.

files=( files* )
for i in "${!files[@]}"; do
    commands "${files[i]}" &
    (( i % 10 )) || wait    # wait whenever the index is a multiple of 10
done

(The parentheses around the compound command aren't important because backgrounding the job will have the same effects as using a subshell anyway.)

Function

Just different semantics:

simultaneous() {
    while [[ $1 ]]; do
        for i in {1..10}; do
            [[ ${@:i:1} ]] || break      # fewer than 10 args left: stop early
            commands "${@:i:1}" &
        done
        shift 10 || shift "$#"           # drop the batch just started
        wait                             # let the whole batch finish
    done
}
simultaneous files*
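
A quick way to see the batching, with a hypothetical stand-in for commands:

commands() { echo "started $1"; sleep 1; }
simultaneous a b c d e f g h i j k l    # starts a..j, waits, then starts k and l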
Splurge answered 19/12, 2014 at 14:25 Comment(1)
Another plug for wait -n to start a new job sooner. - Monomolecular

Define the counter explicitly

#!/bin/bash
for f in files*; do
  (
    #commands
  )&
  (( i++ % 10 == 0 )) && wait
done
wait

There's no need to initialize i, as it will default to 0 the first time you use it. There's also no need to reset the value, as i % 10 will be 0 for i = 10, 20, 30, etc.
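
A quick check of that arithmetic, if the post-increment looks suspicious:

unset i
(( i++ % 10 == 0 )) && echo "wait runs here; i is now $i"
# an unset variable evaluates to 0 in arithmetic, and i++ yields the old
# value, so the test succeeds on the first pass and again at i = 10, 20, ...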

Monomolecular answered 19/12, 2014 at 14:20 Comment(6)
I like this one. It's cheaper. +1 - Tenpins
Looking at it more closely, I don't think it's quite sufficient: you can reach i == 10 even though fewer than 10 background jobs are still running (some may have finished). - Tenpins
If the goal is to keep the cores as busy as possible, I'd use a job scheduler like parallel rather than writing one from scratch in bash. This is just a way to keep too many jobs from starting at once, rather than keeping as many as possible running. - Monomolecular
In bash 4.3, you could use wait -n to wait for any single job to complete before starting the next one. However, that suffers from a race condition (unavoidable, I think) in which wait -n is called after a job completes, which could result in a period of time where a new job could be added, but we're waiting for another job to complete. - Monomolecular
If you really want to keep all the cores busy, start more processes than you can run at once, and let the OS do the scheduling. This is especially desirable for I/O-bound jobs, which might sit idle while other processes could run. - Monomolecular
I'm fairly new to all this, but the idea was to run this target prediction program. I needed to run the same command with each of a list of input files against each file in a list of target files. The thing was that each individual run didn't take much processing power, but took a lot of time, so I figured I could shorten the total time by running several at once, but couldn't work out an easy way to do it. - Dehypnotize

If you have Bash ≥ 4.3, you can use wait -n:

#!/bin/bash

max_nb_jobs=10

for i in file*; do
    # Wait until there are less than max_nb_jobs jobs running
    while mapfile -t < <(jobs -pr) && ((${#MAPFILE[@]}>=max_nb_jobs)); do
        wait -n
    done
    {
        # Your commands here: no useless subshells! use grouping instead
    } &
done
wait

If you don't have wait -n available, you can use something like this:

#!/bin/bash

set -m

max_nb_jobs=10

sleep_jobs() {
   # This function sleeps until there are less than $1 jobs running
   local n=$1
   while mapfile -t < <(jobs -pr) && ((${#MAPFILE[@]}>=n)); do
      coproc read
      trap "echo >&${COPROC[1]}; trap '' SIGCHLD" SIGCHLD
      [[ $COPROC_PID ]] && wait $COPROC_PID
   done
}

for i in files*; do
    # Wait until there are less than 10 jobs running
    sleep_jobs "$max_nb_jobs"
    {
        # Your commands here: no useless subshells! use grouping instead
    } &
done
wait

The advantage of proceeding like this is that we make no assumptions about how long the jobs take to finish: a new job is launched as soon as there's room for it. Moreover, it's all pure Bash, so it doesn't rely on external tools, and (maybe more importantly) you can use your Bash environment (variables, functions, etc.) without exporting anything (arrays can't easily be exported, so that can be a huge pro).
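
As a small, hypothetical illustration of that last point: a function and an array defined in the script are visible inside a backgrounded { ...; } group without any export (all names below are made up):

#!/bin/bash
targets=( t1 t2 t3 )                        # a plain array: cannot be exported
process() { printf '%s vs %s\n' "$1" "${targets[*]}"; }

for i in file_a file_b; do
    { process "$i"; } &                     # the background group inherits both
done
wait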

Labanna answered 19/12, 2014 at 14:32 Comment(0)
