How Do I Fix It?
I'm going to assume that my-worker is a long-lived process, and that you want an easy way to spin up and tear down multiple parallel instances of my-worker.
If this is the case, you probably don't want all-my-workers to be a task. You'd want the following instead:
description "all-my-workers"
start on runlevel [2345]
console log
env NUM_INSTANCES=1
env STARTING_PORT=42002
pre-start script
    for i in $(seq 1 $NUM_INSTANCES); do
        start my-worker N=$i PORT=$(($STARTING_PORT + $i))
    done
end script
pre-stop script
    for i in $(seq 1 $NUM_INSTANCES); do
        stop my-worker N=$i PORT=$(($STARTING_PORT + $i)) || true
    done
end script
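If you want to check what the pre-start loop will do before installing the job, the same loop can be run as an ordinary shell script that prints the commands instead of executing them. This is only a sketch: NUM_INSTANCES=3 is a hypothetical value chosen to show more than one instance, and plan_starts is a made-up helper name. Note that the first instance gets PORT=42003, not 42002, because the loop counts from 1.

```shell
#!/bin/sh
# Dry run of the pre-start loop from all-my-workers: print the
# commands it would issue instead of calling Upstart.
# NUM_INSTANCES=3 is a hypothetical value (the job file uses 1).
NUM_INSTANCES=3
STARTING_PORT=42002

# plan_starts is a made-up helper name for this sketch.
plan_starts() {
    for i in $(seq 1 "$NUM_INSTANCES"); do
        # Same arithmetic as the job file: instance i gets STARTING_PORT + i.
        echo "start my-worker N=$i PORT=$((STARTING_PORT + i))"
    done
}

plan_starts
```

If you expected the first worker to listen on 42002 itself, the port expression in both the pre-start and pre-stop loops would need to be $(($STARTING_PORT + $i - 1)).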
Then you can run start all-my-workers to start all of the my-worker instances, and stop all-my-workers to stop them. Effectively, all-my-workers becomes a parent job that manages starting and stopping its child jobs.
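For reference, a child job that fits this parent might look roughly like the following. The question's actual my-worker job is not reproduced here, so this is a hypothetical sketch: the binary path /usr/local/bin/my-worker and its --port flag are placeholders. The instance $N stanza is what allows several copies to run at once, each identified by the N value passed to start and stop.

```
# my-worker.conf: hypothetical sketch, not the job from the question
description "my-worker"

# Stop this instance whenever the parent job is being stopped.
stop on stopping all-my-workers

# Each value of N names a separate instance, so N=1, N=2, ... can coexist.
instance $N

respawn

# Placeholder binary and flag; PORT comes from the parent's start command.
exec /usr/local/bin/my-worker --port "$PORT"
```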
Why?
You cited two SO answers showing this idea of a parent job managing child jobs. They show:
- A task with a script stanza
- A job with a pre-start stanza
Your parent job is a task with a pre-start stanza (and no script or exec stanza), and that's why you're running into this odd behavior.
script vs pre-start
This Ask Ubuntu answer, which cites this deprecated documentation, contains two very important statements:
All job files must have either an exec or script stanza. This specifies what will be run for the job.
Additional shell code can be given to be run before or after the binary or script specified with exec or script. These are not expected to start the process, in fact, they can't. They are intended for preparing the environment and cleaning up afterwards.
In summary, any background processes spawned by the pre-start stanza are ignored (i.e., not monitored) by Upstart. Instead, you must use exec or script to spawn a process that Upstart will monitor.
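To make the division of labour concrete, here is a minimal hypothetical job in which pre-start only prepares the environment and exec launches the process that Upstart actually tracks. The job name, directory and binary path are placeholders chosen for illustration.

```
# example-worker.conf: hypothetical, for illustration only
description "pre-start prepares, exec runs"
start on runlevel [2345]
stop on runlevel [!2345]

# Runs to completion before the main process starts.
# Anything backgrounded here is not tracked by Upstart.
pre-start script
    mkdir -p /var/run/example-worker
end script

# The main process: this is the PID Upstart monitors.
exec /usr/local/bin/example-worker
```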
What happens if you omit the exec/script stanza? Upstart will sit and wait for a process to be spawned. Thus, you might as well have written a while-true loop:
script
    while true; do
        true
    done
end script
The only difference is that the while-true loop is a live-lock, whereas a missing exec/script stanza results in a deadlock.
Jobs vs. Tasks
Knowing the above, the Upstart documentation for tasks finally leads us to what's going on:
Without the 'task' keyword, the events that cause the job to start will be unblocked as soon as the job is started. This means the job has emitted a starting(7) event, run its pre-start, begun its script/exec, and post-start, and emitted its started(7) event.
With task, the events that lead to this job starting will be blocked until the job has completely transitioned back to stopped. This means that the job has run up to the previously mentioned started(7) event, and has also completed its post-stop, and emitted its stopped(7) event.
(Some of the specifics about events and states will make more sense if you read the documentation about starting and stopping jobs.)
In simpler terms:
- With a normal Upstart job, the exec/script stanza is expected to block indefinitely because it's launching a long-lived process. Thus, Upstart stops blocking once the pre-start stanza has finished and the main process has been launched.
- With a task, the exec/script stanza is expected to block for a finite period because it's launching a short-lived process. Thus, Upstart blocks until after the exec/script stanza has completed.
But what happens if there is no exec/script stanza? Upstart sits and waits indefinitely for something to be launched, but that's never going to happen.
- In the case of a job, that's fine: Upstart doesn't block while waiting for a process to spawn, and calling stop is apparently enough to make it stop waiting.
- In the case of a task, though, Upstart will just sit and hang forever, or until you interrupt it. Because it still hasn't found a spawned process, the job is still technically running. That is why, after interrupting, you can query the status and see all-my-workers start/running.
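The blocking difference can be observed with a throwaway pair of jobs. This is a hypothetical experiment: the job names demo-task and demo-service are made up, and each would live in its own file under /etc/init. With the task, start demo-task should not return until sleep exits; with the normal job, start demo-service should return as soon as sleep has been launched.

```
# /etc/init/demo-task.conf (hypothetical name)
# A task: start blocks until the job is back in the stopped state.
task
exec sleep 10

# /etc/init/demo-service.conf (hypothetical name)
# A normal job: start returns once the main process is running.
exec sleep 10
```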
For Interest's Sake
If, for some reason, you really wanted to make your parent job into a task, you would actually need two tasks: one to start the my-worker instances and one to stop them. You would also need to delete the "stop on stopping all-my-workers" stanza from my-worker.
start-all-my-workers:
description "starts all-my-workers"
start on runlevel [2345]
task
console log
env NUM_INSTANCES=1
env STARTING_PORT=42002
script
    for i in $(seq 1 $NUM_INSTANCES); do
        start my-worker N=$i PORT=$(($STARTING_PORT + $i))
    done
end script
stop-all-my-workers:
description "stops all-my-workers"
start on runlevel [!2345]
task
console log
env NUM_INSTANCES=1
env STARTING_PORT=42002
script
    for i in $(seq 1 $NUM_INSTANCES); do
        stop my-worker N=$i PORT=$(($STARTING_PORT + $i)) || true
    done
end script