I've got an Upstart task that starts multiple instances of a service, based on "Starting multiple upstart instances automatically" and "Restarting Upstart instance processes". It works and starts all instances, but after it successfully starts them it just hangs. If I Ctrl-C out and then check the instances with either service status or ps, they're all successfully started, so I don't know what it's doing when it's hanging.

Here's my script:

description "all-my-workers"

start on runlevel [2345]

task

console log

env NUM_INSTANCES=1
env STARTING_PORT=42002

pre-start script
  for i in `seq 1 $NUM_INSTANCES`;
  do
    start my-worker N=$i PORT=$(($STARTING_PORT + $i))
  done
end script

When I run service all-my-workers start I get this:

vagrant@vagrant-service:/etc/init$ sudo service all-my-workers start

And then it just hangs there and doesn't prompt me again. As I said I can Ctrl-C out and see the running workers:

vagrant@vagrant-service:/etc/init$ sudo service all-my-workers status
all-my-workers start/running
vagrant@vagrant-service:/etc/init$ sudo service my-worker status N=1
my-worker (1) start/running, process 21938

And in ps:

worker    21938  0.0  0.1   4392   612 ?        Ss   21:46   0:00 /bin/sh -e /proc/self/fd/9
worker    21941  0.2  7.3 174076 27616 ?        Sl   21:46   0:00 python /var/lib/my-system/script/start_worker.py

I don't think the problem is in the my-worker.conf but just in case:

description "my-worker"

stop on stopping all-my-workers

setuid worker
setgid worker

respawn

instance $N

console log

env SCRIPT_PATH="/var/lib/my-system/script/"

script
    export PROVIDER=vagrant
    export REGION=all
    export ENVIRONMENT=cert

    . /var/lib/my-system/.virtualenvs/my-system/bin/activate

    python $SCRIPT_PATH/start_worker.py

    END
end script

Thanks a bunch!

How Do I Fix It?

I'm going to assume that my-worker is a long-lived process, and that you want an easy way to spin up and tear down multiple parallel instances of my-worker.

If this is the case, you probably don't want all-my-workers to be a task. You'd want the following instead:

description "all-my-workers"

start on runlevel [2345]

console log

env NUM_INSTANCES=1
env STARTING_PORT=42002

pre-start script
    for i in `seq 1 $NUM_INSTANCES`;
    do
        start my-worker N=$i PORT=$(($STARTING_PORT + $i))
    done
end script

pre-stop script
    for i in `seq 1 $NUM_INSTANCES`;
    do
        stop my-worker N=$i PORT=$(($STARTING_PORT + $i)) || true
    done
end script

Then you can run start all-my-workers to start all of the my-worker instances and stop all-my-workers to stop them. Effectively, all-my-workers becomes a parent job that manages starting and stopping its child jobs.
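To see what the loops above will actually ask Upstart to do before touching a real system, here is a minimal dry-run sketch. It only prints the commands and never calls Upstart. The helper name plan_workers is made up for illustration, and the instance count of 3 is an assumed example value (the question uses 1).

```shell
#!/bin/sh
# Dry run: print the commands the pre-start / pre-stop loops would issue.
# plan_workers is a hypothetical helper, not part of Upstart.
plan_workers() {
    verb=$1             # "start" or "stop"
    num_instances=$2    # same role as NUM_INSTANCES in the job file
    starting_port=$3    # same role as STARTING_PORT in the job file
    for i in $(seq 1 "$num_instances"); do
        echo "$verb my-worker N=$i PORT=$((starting_port + i))"
    done
}

plan_workers start 3 42002
# start my-worker N=1 PORT=42003
# start my-worker N=2 PORT=42004
# start my-worker N=3 PORT=42005
```

Note that because the loop counts from 1, the first instance gets STARTING_PORT + 1 (42003 here), not STARTING_PORT itself.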

Why?

You cited two SO answers showing this idea of a parent job managing child jobs. They show:

A task with a script stanza
A job with a pre-start stanza

Your parent job is a task with a pre-start stanza, and that's why you're running into this odd behavior.

script vs pre-start

This Ask Ubuntu answer, which cites this deprecated documentation, contains two very important statements (with emphasis added):

All job files must have either an exec or script stanza. This specifies what will be run for the job.

Additional shell code can be given to be run before or after the binary or script specified with exec or script. These are not expected to start the process, in fact, they can't. They are intended for preparing the environment and cleaning up afterwards.

In summary, any background processes spawned by the pre-start stanza are ignored (i.e., not monitored) by Upstart. Instead, you must use exec or script to spawn a process which Upstart will monitor.
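As a rough illustration of that split, a job file might look like the fragment below. This is only a sketch: the job name example-daemon and both paths are hypothetical and do not come from the question.

```
# /etc/init/example-daemon.conf (hypothetical job, for illustration only)
description "example-daemon"

# Preparation only: anything left running from here is not supervised
# as the job's main process.
pre-start script
    mkdir -p /var/run/example-daemon
end script

# The main process: this is what Upstart monitors and reports in status.
exec /usr/local/bin/example-daemon
```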

What happens if you omit the exec/script stanza? Upstart will sit and wait for a process to be spawned. Thus, you might as well have written a while-true loop:

script
    while true; do
        true
    done
end script

The only difference is that the while-true loop is a live-lock whereas an empty stanza results in a dead-lock.

Jobs vs. Tasks

Knowing the above, the Upstart documentation for tasks finally leads us to what's going on:

Without the 'task' keyword, the events that cause the job to start will be unblocked as soon as the job is started. This means the job has emitted a starting(7) event, run its pre-start, begun its script/exec, and post-start, and emitted its started(7) event.

With task, the events that lead to this job starting will be blocked until the job has completely transitioned back to stopped. This means that the job has run up to the previously mentioned started(7) event, and has also completed its post-stop, and emitted its stopped(7) event.

(Some of the specifics about events and states will make more sense if you read the documentation about starting and stopping jobs.)

In simpler terms:

With a normal Upstart job, the exec/script stanza is expected to block indefinitely because it's launching a long-lived process. Thus, Upstart stops blocking once it has finished the pre-start stanza.
With a task, the exec/script stanza is expected to block for a "finite" period because it's launching a short-lived process. Thus, Upstart blocks until after the exec/script stanza has completed.

But what happens if there is no exec/script stanza? Upstart sits and waits indefinitely for something to be launched, but that's never going to happen.

In the case of a job, that's fine because Upstart doesn't block while waiting for a process to spawn, and calling stop is apparently enough to make it stop waiting.
In the case of a task, though, Upstart will just sit and hang forever -- or until you interrupt it. However, because it still hasn't found a spawned process, it is still technically running. That is why you're able to query the status after interrupting and see all-my-workers start/running.
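The cases above can be condensed into a small lookup. The following is a toy model written in shell, not anything Upstart itself runs; the function name start_returns and the labels it accepts are made up for illustration, and it only encodes the behavior described in this answer.

```shell
#!/bin/sh
# Toy model of when the `start` command returns, per the rules above.
# start_returns is a hypothetical helper, not part of Upstart.
start_returns() {
    kind=$1   # "job" or "task"
    main=$2   # "long-lived", "short-lived", or "none" (no exec/script stanza)
    case "$kind:$main" in
        job:*)            echo "returns once the job has started" ;;
        task:short-lived) echo "returns after the main script exits" ;;
        task:*)           echo "blocks until the job is stopped" ;;
    esac
}

# The situation in the question: a task with only a pre-start stanza.
start_returns task none
# blocks until the job is stopped
```

Dropping the task keyword moves the parent job to the first branch, and moving the loop into a script stanza moves a task to the second branch.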

For Interest's Sake

If, for some reason, you really wanted to make your parent job into a task, you would actually need two tasks: one to start the my-worker instances and one to stop them. You would also need to delete the stop on stopping all-my-workers stanza from my-worker.

start-all-my-workers:

description "starts all-my-workers"

start on runlevel [2345]

task

console log

env NUM_INSTANCES=1
env STARTING_PORT=42002

script
    for i in `seq 1 $NUM_INSTANCES`;
    do
        start my-worker N=$i PORT=$(($STARTING_PORT + $i))
    done
end script

stop-all-my-workers:

description "stops all-my-workers"

start on runlevel [!2345]

task

console log

env NUM_INSTANCES=1
env STARTING_PORT=42002

script
    for i in `seq 1 $NUM_INSTANCES`;
    do
        stop my-worker N=$i PORT=$(($STARTING_PORT + $i)) || true
    done
end script
