How to wait for child process to set variable in parent process?

 use Parallel::ForkManager;    
 my $number_running = 0;
 my $pm = new Parallel::ForkManager(30); 
 $pm->run_on_start( sub { ++$number_running; } );
 $pm->run_on_finish( sub { --$number_running; } );
 for (my $i=0; $i<=100; $i++)
 {
     if ($number_running == 5) { while ($number_running > 0) {} }  # waits forever
     $pm->start and next;
     print $i;
     $pm->finish;
 }

The above code uses Parallel::ForkManager to execute code in a for loop using parallel processes. It is counting how many child processes are running and setting the $number_running variable accordingly. Once 5 child processes are running, I would like it to wait until 0 child processes are running before continuing.

The first line in the for loop is designed to achieve this but it waits forever on that line. It's like the change to the variable made by the child processes is not available to that line of code. What am I doing wrong? Note: I am aware of wait_all_children but I don't want to use it.

Viewfinder answered 22/8, 2016 at 4:9 Comment(3)
You shouldn't spin in an idle loop like that. Your parent process will use up all the CPU time it can get, endlessly testing the value of $number_running. - Autrey
How can I wait until $number_running is decremented to zero by the child processes? - Viewfinder
@Viewfinder Child processes cannot affect $number_running used in the parent -- child and parent processes cannot write to each other's variables. The parent has to decrement this variable for itself. Please see explanations added to my answer. - Uhlan

Short: The callback run_on_finish runs only when the parent reaps an exited child, and in the posted code that doesn't happen as each child exits, so $number_running doesn't get reduced and thus can't control the loop. Ways to fix this:

  • use reap_finished_children in order to communicate as individual children exit, so that run_on_finish indeed gets to run as each child exits

  • use wait_for_available_procs to wait for the whole batch to finish before starting a new set of processes

As for the title ("How to wait for child process to set variable in parent process?"), a child process cannot set anything in the parent, nor can the parent set anything in the child. They must communicate to coordinate their actions, using some form of inter-process communication (IPC). This module provides some of that, and a few methods useful for this question are outlined above.
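The separation between parent and child memory can be seen with plain fork, without the module. This is a minimal sketch for a Unix-like system; the sub name counter_after_child_increments is made up for the illustration. The child increments its own copy of the counter and can report it back only through a channel such as its exit code.

```perl
use strict;
use warnings;

$| = 1;   # unbuffered output, so forked children don't re-flush old text

# Hypothetical helper: returns the counter as the parent sees it after the
# child has exited, followed by the value the child reported on exit.
sub counter_after_child_increments {
    my $counter = 0;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: this changes only the child's private copy of $counter.
        $counter++;
        exit($counter);        # the exit code is the only thing sent back
    }

    waitpid($pid, 0);              # parent: block until the child exits
    my $child_value = $? >> 8;     # exit code of the child
    return ($counter, $child_value);
}

my ($in_parent, $from_child) = counter_after_child_increments();
print "parent sees $in_parent, child reported $from_child\n";
```

The parent's copy stays at 0 even though the child counted to 1, which is why the counter in the question can only ever be changed by code that runs in the parent.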


The callback run_on_start runs with every new process and the counter is incremented. But the callback run_on_finish is never triggered so the counter is never decremented. Thus once it reaches 5 the code sits in the while loop. Note that a parent and children are separate processes which thus don't know about each other's variables and cannot change them.

The callback run_on_finish is commonly triggered by calling wait_all_children after all processes have been forked. It also runs when the maximum number of processes is running and one of them exits: start then calls wait_one_child (which calls on_finish, see below).

Or, this can be done at will by calling the reap_finished_children method:

This is a non-blocking call to reap children and execute callbacks independent of calls to start or wait_all_children. Use this in scenarios where start is called infrequently but you would like the callbacks executed quickly.

This resolves the main concern of how to communicate as individual children exit (as clarified in comments), without using wait_all_children.

Here is an example of how to use it so that the callback runs right as a child exits. A good deal of the code is merely for diagnostics (prints).

use warnings;
use strict;
use feature 'say';
use Parallel::ForkManager;    
$| = 1;

my $total_to_process = 3;  # only a few for this test
my $number_running   = 0;    
my @ds;

my $pm = Parallel::ForkManager->new(30);

$pm->run_on_start( sub {
    ++$number_running;
    say "Started $_[0], total: $number_running";
});
$pm->run_on_finish( sub {
    --$number_running;
    my ($pid, $code, $iden, $sig, $dump, $rdata) = @_;
    push @ds, "gone-$pid";
    say "Cleared $pid, ", ($rdata->[0] // ''), ($code ? " exit $code" : '');
});

foreach my $i (1 .. $total_to_process)
{
    $pm->start and next;
    run_job($i);
    $pm->finish(10*$i, [ "kid #$i" ]);
}
say "Running: ", map { "$_ " } $pm->running_procs;  # pid's of children

# Reap right as each process exits, retrieve and print info
my $curr = $pm->running_procs;
while ($pm->running_procs) 
{
    $pm->reap_finished_children;    # may be fewer now
    if ($pm->running_procs < $curr) {
        $curr = $pm->running_procs;
        say "Remains: $number_running. Data: @ds";
    }
    sleep 1;  # or use Time::HiRes::sleep 0.1;
}

sub run_job {
    my ($num) = @_;
    my $sleep_time = ($num == 1) ? 1 : ($num == 2 ? 10 : 20);
    sleep $sleep_time;
    say "\tKid #$num slept for $sleep_time, exiting";
}

Use of this method is equivalent to calling waitpid -1, POSIX::WNOHANG in a loop after fork. The example forks fewer than the maximum (30) processes so that the output is easier to read and so that it demonstrates that the callback runs right as a child exits. Change these numbers to see its full operation.

Each child process exits with code 10*$i, so that child processes can be tracked in the output. The data returned in an anonymous array [...] is a string identifying the child process. The callback reduces $number_running during the reap_finished_children call, so the count is already updated when that call returns. This is the reason for having the $curr variable, again for diagnostics.

This prints

Started 4656, total: 1
Started 4657, total: 2
Started 4658, total: 3
Running: 4656 4658 4657 
        Kid #1 slept for 1, exiting
Cleared 4656, kid #1 exit 10
Remains: 2. Data: gone-4656
        Kid #2 slept for 10, exiting
Cleared 4657, kid #2 exit 20
Remains: 1. Data: gone-4656 gone-4657
        Kid #3 slept for 20, exiting
Cleared 4658, kid #3 exit 30
Remains: 0. Data: gone-4656 gone-4657 gone-4658

The direct question is how to wait for the whole batch to finish before starting a new one. This can be done directly with wait_for_available_procs($n):

Wait until $n available process slots are available. If $n is not given, defaults to 1.

If $MAX (the maximum given to the constructor) is used for $n, that many slots become available only once the whole batch has completed. What to use for $n can also be decided at runtime.
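Applied to the loop from the question, this could look like the sketch below. It assumes an installed Parallel::ForkManager that provides wait_for_available_procs; the sub run_in_batches, the job counts, and the sleep standing in for real work are made up for the illustration. The maximum is set equal to the batch size so that asking for $batch free slots means waiting for every running child.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

$| = 1;

# Hypothetical helper: run $total jobs in batches of $batch and return the
# parent's counter value as observed after each batch has drained.
sub run_in_batches {
    my ($total, $batch) = @_;
    my $number_running = 0;
    my @seen_after_wait;

    my $pm = Parallel::ForkManager->new($batch);
    $pm->run_on_start(  sub { ++$number_running } );
    $pm->run_on_finish( sub { --$number_running } );

    for my $i (1 .. $total) {
        if ($number_running == $batch) {
            # Blocks while reaping children (the callback runs for each one)
            # until all $batch slots are free again.
            $pm->wait_for_available_procs($batch);
            push @seen_after_wait, $number_running;
        }
        $pm->start and next;
        sleep 1;              # child: stand-in for real work
        $pm->finish;
    }

    # Drain the last, possibly partial, batch the same way.
    $pm->wait_for_available_procs($batch);
    push @seen_after_wait, $number_running;
    return @seen_after_wait;
}

my @seen = run_in_batches(12, 5);
print "counter after each batch: @seen\n";
```

The counter is decremented in the parent, by the callback, while wait_for_available_procs is reaping, so there is no busy loop and the counter should be back at zero each time the call returns.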


Some details of module's operation

When a child exits, the SIGCHLD signal is sent to the parent, and the parent must reap the child in order to know that it is gone (and to avoid zombies, in the first place). This is done by calling wait or waitpid, in the main code or in the SIGCHLD handler (but only in one place). See fork, Signals in perlipc, waitpid and wait.
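Reaping by hand, in the main code rather than in a signal handler, can be sketched in core Perl as follows. This is an illustration for a Unix-like system, not the module's code; the sub name fork_and_reap and the exit codes are made up.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";   # exports WNOHANG

$| = 1;

# Hypothetical helper: fork one child per exit code given, then poll for
# exits. Returns a hash of pid => exit code as collected by the parent.
sub fork_and_reap {
    my @exit_codes = @_;
    my %running;

    for my $code (@exit_codes) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        exit($code) if $pid == 0;    # child: nothing to do but exit
        $running{$pid} = 1;          # parent: remember the child
    }

    my %collected;
    while (%running) {
        # Non-blocking: a pid if some child has exited, 0 if children are
        # still running, -1 if there are no children left at all.
        my $pid    = waitpid(-1, WNOHANG);
        my $status = $?;
        last if $pid == -1;
        if ($pid == 0) {
            select(undef, undef, undef, 0.05);   # short pause, no busy loop
            next;
        }
        next unless delete $running{$pid};       # not one of ours
        $collected{$pid} = $status >> 8;         # a callback would run here
    }
    return %collected;
}

my %done = fork_and_reap(10, 20, 30);
print "reaped $_ with exit code $done{$_}\n" for sort keys %done;
```

Only at the point where waitpid returns a pid does the parent learn that the child is gone, so that is the only point where a parent-side counter can be updated.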

We see from P::FM's source that this is done in wait_one_child (via _waitpid sub)

sub wait_one_child { my ($s,$par)=@_;  
  my $kid;
  while (1) {
    $kid = $s->_waitpid(-1,$par||=0);
    last if $kid == 0 || $kid == -1; # AS 5.6/Win32 returns negative PIDs
    redo if !exists $s->{processes}->{$kid};
    my $id = delete $s->{processes}->{$kid};
    $s->on_finish( $kid, $? >> 8 , $id, $? & 0x7f, $? & 0x80 ? 1 : 0);
    last;
  }
  $kid;
};  

which is used in wait_all_children

sub wait_all_children { my ($s)=@_;
  while (keys %{ $s->{processes} }) {
    $s->on_wait;
    $s->wait_one_child(defined $s->{on_wait_period} ? &WNOHANG : undef);
  };
}

The method reap_finished_children used above is a synonym for wait_children, which reaps already-exited children through the same wait_one_child, but without blocking.

The method wait_one_child, which does the actual reaping, is used by start when the maximum number of processes is running, to wait for one of them to exit. This is how the module knows when it can start another process and respect its maximum. (It is also used by a few other routines that wait for processes.) And this is when run_on_finish gets triggered, by $s->on_finish( $kid, ... )

sub on_finish {
  my ($s,$pid,@par)=@_;
  my $code=$s->{on_finish}->{$pid} || $s->{on_finish}->{0} or return 0;
  $code->($pid,@par);
};

The callback is in the coderef $code, retrieved from the object's on_finish key, which itself is set in the sub run_on_finish. This is how the callback is set up, once that sub runs.
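The registration and lookup can be mimicked with a toy class. This is a simplified stand-in, not Parallel::ForkManager itself: the package name ToyManager and the pid 1234 are made up, and storing the default callback under key 0 on registration is an assumption inferred from the lookup shown above.

```perl
use strict;
use warnings;

# Toy stand-in for the module's bookkeeping; NOT Parallel::ForkManager.
package ToyManager;

sub new { return bless { on_finish => {} }, shift }

# Register a callback for one pid, or the default one (assumed key: 0).
sub run_on_finish {
    my ($self, $code, $pid) = @_;
    $self->{on_finish}{ $pid || 0 } = $code;
}

# Look up a pid-specific callback first, then the default; run it if found.
sub on_finish {
    my ($self, $pid, @args) = @_;
    my $code = $self->{on_finish}{$pid} || $self->{on_finish}{0} or return 0;
    return $code->($pid, @args);
}

package main;

my $number_running = 2;
my $toy = ToyManager->new;
$toy->run_on_finish( sub { --$number_running } );

# Registering the callback changes nothing by itself; the counter moves
# only when something calls on_finish.
print "before: $number_running\n";
$toy->on_finish(1234, 0);    # 1234 is a made-up pid
print "after:  $number_running\n";
```

The counter changes only at the moment on_finish is called, which in the real module happens inside wait_one_child, that is, only when the parent reaps a child.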

The methods available to the user for this are wait_all_children and reap_finished_children.

Since none of these is used in the posted code, $number_running is never updated, so the while is an infinite loop. Recall that the variable $number_running in the parent cannot be directly changed by child processes.

Uhlan answered 22/8, 2016 at 5:31 Comment(10)
run_on_finish is executed every time a child process finishes. $number_running is updated by the child processes. I have tested this myself. I don't want to use wait_for_children. I want to know how to wait for $number_running to be decremented to zero by the child processes. - Viewfinder
@Viewfinder 1. The child has nothing to do with the counter. Child code is what we type after start. 2. You want to wait (or waitpid) for children, that is the first rule of forking. 3. The program is informed of child's exit via signals, which it must catch (and handle) in order to actually know that the child exited. The other way would be for a child to pipe back (or send a SIGUSR signal) to say that it is exiting, but we type child code so that is not the case. 4. I finally got to look at the source code, this wait-ing is indeed done in wait_all_children. I'll update my post. - Uhlan
@Viewfinder The while is clearly infinite, which can only be because $number_running doesn't get updated, even though the child processes do exit very shortly after they are created. This can only be because the callback doesn't run. When I comment out wait_all_children I see the confirmation of it -- the Cleared PID lines start being printed only after all processes have been forked. - Uhlan
OK, I can see in the source that finish is not calling on_finish, but I don't understand why that can't happen. Why can't it do that? - Viewfinder
@Viewfinder I just tried it -- set the limit to 10 but ran only 5 processes (and removed wait_all_children). It never printed Cleared ..., the callback never executed. Please watch though, this can result in zombies. Some systems take care of that, but it cannot be expected in general. - Uhlan
@Viewfinder If I may ask, why do you not like wait_all_children? Its code is standard and clear, this is more or less what I do by hand in my signal handlers. - Uhlan
I just want to see that the callback can occur at the end of a child process because I may want to use it for other reasons. I don't see why a callback can't be called at the end of a child process. - Viewfinder
Can $pm->reap_finished_children be used in this situation? Would it achieve my aim? - Viewfinder
@Viewfinder I just got around to completing this -- the post is entirely rewritten. There is a direct way to track each child, just as you expected, and it is by the method you mentioned. There is also a direct way to wait for a batch to finish. Let me know whether it needs edits. (I will tweak and add anyway, it's just too much to get right at once, for me.) I suggest that we clean up many of the comments, as they do not directly relate to the post now. Let me know if you are OK with that. - Uhlan
@Viewfinder I hope that this provides both things that you needed, as I understood them. Please let me know if that isn't the case, or if more code/source commentary would be useful. - Uhlan
