Perl daemonize with child daemons
Asked Answered
A

1

6

I have to employ daemons in my code. I need a control daemon that constantly checks the database for the tasks and supervises child daemons. The control daemon must assign tasks to the child daemons, control tasks, create new children if one of them dies, etc. The child daemons check database for tasks for them (by PID). How should I implement daemons for this purpose?

Achromat answered 10/10, 2014 at 9:31 Comment(0)
F
14

Daemon is just a code word for "background process that runs a long time". So the answer is 'it depends'. Perl has two major ways of doing multiprocessing:

Threading

You run a subroutine as a thread, in parallel with the main program code. (Which may then just monitor thread states).

The overhead of creating a thread is higher, but it's better suited for 'shared memory' style multiprocessing, e.g. when you're passing significant quantities of data back and forth. There's several libraries that make passing information between threads positively straightforward. Personally I quite like Thread::Queue, Thread::Semaphore and Storable.

In particular - Storable has freeze and thaw which lets you move complex data structures (e.g. objects/hashes) around in queues, which is very useful.

Basic threading example:

#!/usr/bin/perl

use strict;
use warnings;

use threads;

use Thread::Queue;

my $nthreads = 5;

my $process_q = Thread::Queue->new();
my $failed_q  = Thread::Queue->new();

#this is a subroutine, but that runs 'as a thread'.
#when it starts, it inherits the program state 'as is'. E.g.
#the variable declarations above all apply - but changes to
#values within the program are 'thread local' unless the
#variable is defined as 'shared'.
#Behind the scenes - Thread::Queue are 'shared' arrays.

sub worker {
    #NB - this will sit a loop indefinitely, until you close the queue.
    #using $process_q -> end
    #we do this once we've queued all the things we want to process
    #and the sub completes and exits neatly.
    #however if you _don't_ end it, this will sit waiting forever.
    while ( my $server = $process_q->dequeue() ) {
        chomp($server);
        print threads->self()->tid() . ": pinging $server\n";
        my $result = `/bin/ping -c 1 $server`;
        if ($?) { $failed_q->enqueue($server) }
        print $result;
    }
}

#insert tasks into thread queue.
open( my $input_fh, "<", "server_list" ) or die $!;
$process_q->enqueue(<$input_fh>);
close($input_fh);

#we 'end' process_q  - when we do, no more items may be inserted,
#and 'dequeue' returns 'undefined' when the queue is emptied.
#this means our worker threads (in their 'while' loop) will then exit.
$process_q->end();

#start some threads
for ( 1 .. $nthreads ) {
    threads->create( \&worker );
}

#Wait for threads to all finish processing.
foreach my $thr ( threads->list() ) {
    $thr->join();
}

#collate results. ('synchronise' operation)
while ( my $server = $failed_q->dequeue_nb() ) {
    print "$server failed to ping\n";
}

Storable

When it comes to Storable, this is worth a separate example I think, because it's handy to move data around.

use Storable qw ( freeze thaw );
use MyObject;    #home made object.
use Thread::Queue;

my $work_q = Thread::Queue->new();

sub worker_thread {
    while ( my $packed_item = $work_q->dequeue ) {
        my $object = thaw($packed_item);
        $object->run_some_methods();
        $object->set_status("processed");
        #maybe return $object via 'freeze' and a queue?
    }
}

my $thr       = threads->create( \&worker_thread );
my $newobject = MyObject->new("some_parameters");
$work_q->enqueue( freeze($newobject) );
$work_q->end();
$thr->join();

Because you're passing the object around within the queue, you're effectively cloning it between threads. So bear in mind that you may need to freeze it and 'return' it somehow once you've done something to it's internal state. But it does mean you can do this asynchronously without needing to arbitrate locking or shared memory. You may also find it useful to be able to 'store' and 'retrieve' and object - this works as you might expect. (Although I daresay you might need to be careful about availability of module versions vs. defined attributes if you're retrieving a stored object)

Forking

Your script clones itself, leaving a 'parent' and 'child' - the child then generally diverges and does something different. This uses the Unix built in fork() which as a result is well optimised and generally very efficient - but because it's low level, means it's difficult to do lots of data transfer. You'll end up some slightly complicated things to do Interprocess communication - IPC. (See perlipc for more detail). It's efficient not least because most fork() implementations do a lazy data copy - memory space for your process is only allocated as it's needed e.g. when it's changed.

It's therefore really good if you want to delegate a lot of tasks that don't require much supervision from the parent. For example - you might fork a web server, because the child is reading files and delivering them to a particular client, and the parent doesn't much care. Or perhaps you would do this if you want to spend a lot of CPU time computing a result, and only pass that result back.

It's also not supported on Windows.

Useful libraries include Parallel::ForkManager

A basic example of 'forking' code looks a bit like this:

#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my $concurrent_fork_limit = 4;

my $fork_manager = Parallel::ForkManager->new($concurrent_fork_limit);

foreach my $thing ( "fork", "spoon", "knife", "plate" ) {
    my $pid = $fork_manager->start;
    if ($pid) {
        print "$$: Fork made a child with pid $pid\n";
    } else {
        print "$$: child process started, with a key of $thing ($pid)\n";
    }
    $fork_manager->finish;
}

$fork_manager->wait_all_children();

Which is right for you?

So it's hard to say without a bit more detail about what you're trying to accomplish. This is why StacKOverflow usually likes to show some working, approaches you've tried, etc.

I would generally say:

  • if you need to pass data around, use threads. Thread::Queue especially when combined with Storable is very good for it.

  • if you don't, forks (on Unix) are generally faster/more efficient. (But fast alone isn't usually enough - write understandable stuff first, and aim for speed second. It doesn't matter much usually).

Avoid where possible spawning too many threads - they're fairly intensive on memory and creation overhead. You're far better off using a fixed number in a 'worker thread' style of programming, than repeatedly creating new, short lived threads. (On the other hand - forks are actually very good at this, because they don't copy your whole process).

I would suggest in the scenario you give - you're looking at threads and queues. Your parent process can track child threads via threads -> list() and join or create to keep the right number. And can feed data via a central queue to your worker threads. Or have multiple queues - one per 'child' and use that as a task assignment system.

Florida answered 10/10, 2014 at 10:24 Comment(6)
I need a fixed number of workers demons (5). They should check out some variables from the database, and then get the message from the mail service. I think that the fork is a better way for me, the demons will work only with messages (but attachments can be large).Achromat
ForkManager will let you have 5 active 'tasks' - you could set a variable just before forking with the ID of what to fetch and process. Then fork, do the processing, exit and join that fork. ForkManager will then keep the number of forks to 5, but it'll spawn a new one for each message, which may do the trick.Florida
Maybe the question sounds stupid, but the realisation of the child's and the main daemon must be in separate files? And if in one, how is a child will understand what code for them?Achromat
Parent and child is a specific terminology to denote one spawning the other within their code. If you've completely separate scripts, then you still probably want to look at that perlipc link - that discusses ways of communicating between processes. Personally, I find multithreading one of the most versatile forms of IPC.Florida
As for forking and Parallel:ForkManager, it helps with passing data back, like this. It's in the docs. Threads are of course better for that, as you say.Okinawa
@Okinawa That discouraged is a bit controversial. It's mostly down to usage - some languages threads are a lightweight thing, and in perl, they aren't.Florida

© 2022 - 2024 — McMap. All rights reserved.