How to pass a variable from a child process (fork by Parallel::ForkManager)?

My query:

In the following code I tried to pass the value printed by print $commandoutput[0] into the subroutine further down. I tried using shift for this, but it did not work. Can you please show me the right way to do it?

Code:

my $max_forks = 4;

#createThreads();
my %commandData;
my @arr = (
   'bhappy',  'bload -m all -l -res CPUSTEAL',
   'bqueues', 'bjobs -u all -l -hfreq 101'
);

#print @arr;
my $fork = new Parallel::ForkManager($max_forks);
$fork->run_on_start(
   sub {
      my $pid = shift;
   }
);
$fork->run_on_finish(
   sub {
      my ( $pid, $exit, $ident, $signal, $core ) = @_;
      if ($core) {
         print "PID $pid core dumped.\n";
      }
      else { }
   }
);
my @Commandoutput;
my $commandposition = 0;
for my $command (@arr) {
   $fork->start and next;
   my @var = split( " ", $command );
   $commandoutput[$commandposition] = `$command`;
   $commandposition++;
   $line = $commandoutput[0];

# print $line;
   $fork->finish;
}
$fork->wait_all_children;

#print Dumper(\%commandData);
print $commandoutput[0];

Here I tried to store the value of $commandoutput[0] in a variable inside the subroutine. I am stuck on how to pass a variable from outside the subroutine into it.

sub gen_help_data
{
  my $lines=shift;
  print $lines;
}
Erlking answered 27/1, 2017 at 9:47 Comment(0)

The code between start and finish runs in a separate process and the child and parent cannot write to each other's variables (even if with the same name). Forking creates an independent process with its own memory and data. To pass data between these processes we need to use an "Inter-Process-Communication" (IPC) mechanism.

This module does provide a ready and simple way to pass data back from a child to the parent. See Retrieving data structures from child processes in docs.

You first need to supply to finish a reference to the data structure that the child wants to return. In your case, you want to return a scalar $commandoutput[0] so do

$fork->finish(0, \$commandoutput[0]);

This reference is then found in the callback as the last, sixth, parameter. The one your code left out. So in the callback you need

my %ret_data;  # to store data from different child processes

$fork->run_on_finish(
    sub { 
        my ($pid, $exit, $ident, $signal, $core, $dataref) = @_; 
        $ret_data{$pid} = $dataref;
    }
);

Here $dataref refers to the parent's copy of the child's $commandoutput[0]; it is stored in %ret_data under the child's process ID as the key. Once wait_all_children returns, every callback has run and all the data can be found in %ret_data

foreach my $pid (keys %ret_data) {
    say "Data from $pid => ${$ret_data{$pid}}";
}

Here we dereference $ret_data{$pid} as a scalar reference, since your code returns that.
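
Applied to the question's code, the whole flow might look like the sketch below. It is only an illustration under a few assumptions: harmless echo commands stand in for the real ones, the command string is passed to start as the identifier so that results can be keyed by command rather than by PID, and the hash name %output_for is made up for this example. gen_help_data is the subroutine from the question.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Stand-in commands (assumption): replace with the real ones, e.g. 'bqueues'.
my @commands = ('echo first', 'echo second', 'echo third');

my $pm = Parallel::ForkManager->new(4);

my %output_for;    # command string => captured output, filled in the parent

$pm->run_on_finish(sub {
    my ($pid, $exit, $ident, $signal, $core, $dataref) = @_;
    # $dataref is undef if the child returned nothing (e.g. it crashed).
    $output_for{$ident} = $$dataref if defined $dataref;
});

for my $command (@commands) {
    # The argument to start() comes back as $ident in the callback.
    $pm->start($command) and next;
    my $out = `$command`;      # runs in the child
    $pm->finish(0, \$out);     # hand a reference to the output to the module
}
$pm->wait_all_children;        # all callbacks have run once this returns

# Back in the parent: pass each result to the subroutine.
gen_help_data($output_for{$_}) for @commands;

sub gen_help_data {
    my $lines = shift;
    print $lines;
}
```

Keying by the identifier instead of the PID means the parent can look up the output of a particular command without caring which process ran it or in which order the children finished.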

Note that the data is passed by writing it out to temporary files (serialized with Storable), which can be slow when there is a lot of data or many children.
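
To make the file-based hand-off concrete, here is a minimal sketch of the same idea done by hand with a plain fork, Storable, and File::Temp. It is not the module's actual source, only an illustration of the mechanism; the file name child_result.sto is made up for the example.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);       # directory is removed when the parent exits
my $file = "$dir/child_result.sto";     # hypothetical file name

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    store([ 1 .. 5 ], $file);   # child: serialize the data structure to disk
    exit 0;
}

waitpid($pid, 0);                # parent: wait until the child is done
my $dataref = retrieve($file);   # read the data back as a fresh structure
print "child returned: @$dataref\n";
```

Every returned structure costs a serialization, a file write, and a file read, which is why returning large amounts of data from many children adds up.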


Here is a full example, where each child returns an array reference by passing it to finish, which is then retrieved in the callback. For a different example see this post.

use warnings;
use strict;
use feature 'say';

use Parallel::ForkManager;    
my $pm = Parallel::ForkManager->new(4); 

my %ret_data;

$pm->run_on_finish( sub { 
    my ($pid, $exit, $ident, $signal, $core, $dataref) = @_; 
    $ret_data{$pid} = $dataref;
});

foreach my $i (1..8)
{
    $pm->start and next;
    my $ref = run_job($i);
    $pm->finish(0, $ref);
}
$pm->wait_all_children;

foreach my $pid (keys %ret_data) {
    say "$pid returned: @{$ret_data{$pid}}";
}

sub run_job { 
    my ($i) = @_;
    return [ 1..$i ];  # make up return data: arrayref with list 1..$i
}

Prints

15037 returned: 1 2 3 4 5 6 7
15031 returned: 1 2
15033 returned: 1 2 3 4
15036 returned: 1 2 3 4 5 6
15035 returned: 1 2 3 4 5
15038 returned: 1 2 3 4 5 6 7 8
15032 returned: 1 2 3
15030 returned: 1

On modern systems as little data as possible is copied when a new process is forked, for performance reasons (copy-on-write). So variables that a child "inherits" by forking aren't copied right away, and the child does in fact read the values that the parent's variables had when it was forked.

However, any data that a child writes in memory is inaccessible to the parent (and what parent writes after forking is unknown to the child). If that data is written to a variable "inherited" from a parent at forking then a data copy happens so that the child's new data is independent.

There are certainly subtleties and complexities in how data is managed, with apparently a number of pointers maintained even as data changes in the child. I'd guess that this is mostly to simplify data management, and to reduce copying; there appears to be far finer granularity in data management than at a "variable" level.

But these are implementation details and in general child and parent can't poke at each other's data.
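
A small sketch with a plain fork shows this separation. The variable name @data is made up for the example: the child can read what existed before the fork, but its change never reaches the parent.

```perl
use strict;
use warnings;

my @data = (1, 2, 3);          # exists before the fork, so the child sees it

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: reads the inherited values, then changes only its own copy.
    push @data, 4;
    print "child sees:  @data\n";
    exit 0;
}

waitpid($pid, 0);              # wait for the child to finish
print "parent sees: @data\n";  # still the original three elements
```

The child prints 1 2 3 4 while the parent still prints 1 2 3, which is exactly why the question's print $commandoutput[0] in the parent shows nothing.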

Bamboozle answered 27/1, 2017 at 10:5 Comment(1)
@examplefile Great -- let me know if more explanation would be useful. - Bamboozle

I think you're misunderstanding what a fork does. When you successfully fork, you're creating a subprocess, independent from the process you started with, to continue doing work. Because it's a separate process, it has its own memory, variables, etc., even though some of these started out as copies from the parent process.

So you're setting $commandoutput[0] in each subprocess, but then, when that subprocess dies, so does the content of its copy of @commandoutput.

You can either run each command serially, or you can use threads (which come with a host of other issues; your code would need significant redesign to work even with threads), or you can use events (POE, AnyEvent, etc., which would be another significant redesign). Or you could run each command with its output written to a temporary file and then, once all the children are done, read each file and continue. This also comes with issues, but generally fewer than the other options.
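
The temporary-file option could be sketched roughly as follows, using only a plain fork and File::Temp. The echo commands are stand-ins and the file naming scheme is an assumption made for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my @commands = ('echo first', 'echo second');   # stand-ins (assumption)
my $dir = tempdir(CLEANUP => 1);                 # removed when the parent exits

my @jobs;   # one { pid, file } record per child
for my $i (0 .. $#commands) {
    my $file = "$dir/out_$i.txt";
    my $pid  = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: run the command and save its output to this child's file.
        my $out = `$commands[$i]`;
        open my $fh, '>', $file or die "Cannot write $file: $!";
        print {$fh} $out;
        close $fh or die "Cannot close $file: $!";
        exit 0;
    }
    push @jobs, { pid => $pid, file => $file };
}

waitpid($_->{pid}, 0) for @jobs;   # wait for every child

# Parent: read the files back in the original command order.
my @commandoutput;
for my $job (@jobs) {
    open my $fh, '<', $job->{file} or die "Cannot read $job->{file}: $!";
    local $/;                       # slurp the whole file at once
    push @commandoutput, scalar <$fh>;
    close $fh;
}
print $commandoutput[0];
```

Unlike the question's code, @commandoutput is filled in the parent here, so it is available to any subroutine called afterwards.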

Viola answered 27/1, 2017 at 9:56 Comment(2)
If there's a need for data sharing, I'm a big fan of doing it "worker threads" style with Thread::Queue, like this. - Oystercatcher
@Oystercatcher One can nicely return data from a child in P::FM, see my answer. - Bamboozle