Memory considerations for long-running PHP scripts

I want to write a worker for beanstalkd in php, using a Zend Framework 2 controller. It starts via the CLI and will run forever, asking for jobs from beanstalkd like this example.

In simple pseudo-like code:

while (true) {
    $data   = $beanstalk->reserve();

    $class  = $data->class;
    $params = $data->params;

    $job    = new $class($params);
    $job();
}

The $job object has an __invoke() method, of course. However, some of these jobs might run for a long time. Some might consume a considerable amount of memory. Some might have the $beanstalk object injected so they can start new jobs themselves, or hold a Zend\Di\Locator instance to pull objects from the DIC.

I am worried about this setup in production environments over the long term: circular references might occur, and (at this moment) I do not explicitly trigger any garbage collection, while this process might run for weeks/months/years *.

*) In beanstalk, reserve is a blocking call, and if no job is available, this worker will wait until it gets a response back from beanstalk.

My question: how will PHP handle this over the long term, and should I take any special precautions to keep this from blocking?

These are the things I considered that might help (please correct me if I am wrong, and add more if possible):

  1. Use gc_enable() before starting the loop
  2. Use gc_collect_cycles() in every iteration
  3. Unset $job in every iteration
  4. Explicitly unset references in __destruct() of a $job (see the sketch after this list)
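
A minimal sketch of points 3 and 4, assuming a hypothetical ProducerJob class that holds a reference back to the beanstalk client (the class and property names are illustrative only):

class ProducerJob
{
    protected $pheanstalk;
    protected $params;

    public function __construct($pheanstalk, array $params)
    {
        $this->pheanstalk = $pheanstalk;
        $this->params     = $params;
    }

    public function __invoke()
    {
        // ... do the actual work, possibly queueing new jobs via $this->pheanstalk
    }

    public function __destruct()
    {
        // point 4: break the reference back to the beanstalk client explicitly
        $this->pheanstalk = null;
        $this->params     = null;
    }
}

while (true) {
    $data = $beanstalk->reserve();

    $job = new ProducerJob($beanstalk, (array) $data->params);
    $job();

    unset($job); // point 3: drop the last reference so the destructor can run
}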

(NB: Update from here)

I ran some tests with arbitrary jobs. The jobs I included were: "simple", which just sets a value; "longarray", which creates an array of 1,000 values; "producer", which gets $pheanstalk injected by the loop and adds three simple jobs to the queue (so there is now a reference from the job back to beanstalk); and "locatoraware", which is given a Zend\Di\Locator and instantiates all job types (though it does not invoke them). I added 10,000 jobs to the queue, then reserved all jobs from the queue.

Results for "simplejob" (memory consumption per 1,000 jobs, with memory_get_usage())

0:     56392
1000:  548832
2000:  1074464
3000:  1538656
4000:  2125728
5000:  2598112
6000:  3054112
7000:  3510112
8000:  4228256
9000:  4717024
10000: 5173024

Picking a random job type each iteration, measuring the same as above. Distribution of job types:

["Producer"] => int(2431)
["LongArray"] => int(2588)
["LocatorAware"] => int(2526)
["Simple"] => int(2456)

Memory:

0:     66164
1000:  810056
2000:  1569452
3000:  2258036
4000:  3083032
5000:  3791256
6000:  4480028
7000:  5163884
8000:  6107812
9000:  6824320
10000: 7518020

The execution code from above is updated to this:

$baseMemory = memory_get_usage();
gc_enable();

for ( $i = 0; $i <= 10000; $i++ ) {
    $data = $beanstalk->reserve();

    $class = $data->class;
    $params = $data->params;

    $job = new $class($params);
    $job();

    $job = null;
    unset($job);

    if ( $i % 1000 === 0 ) {
        gc_collect_cycles();
        echo sprintf( '%8d: ', $i ), memory_get_usage() - $baseMemory, "<br>";
    }
}

As everybody can see, memory consumption in PHP is not levelled off and kept to a minimum; it increases over time.

Philander answered 2/4, 2012 at 13:56 Comment(1)
This is an interesting question; I added some related research about using gc_collect_cycles in #38850891. – Jerlenejermain

I ended up benchmarking my current code base line by line, after which I came to this:

$job = $this->getLocator()->get($data->name, $params);

It uses Zend\Di dependency injection, whose instance manager tracks instances throughout the complete process. So even after a job had been invoked and could have been removed, the instance manager still kept it in memory. Not using Zend\Di to instantiate the jobs immediately resulted in static memory consumption instead of linear growth.
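
As a sketch of what the change amounted to (the "after" part mirrors the loop from the question):

// Before: the Zend\Di locator keeps a reference to every instance it hands out,
// so invoked jobs were never freed and memory grew linearly.
$job = $this->getLocator()->get($data->name, $params);

// After: instantiate the job class directly; once $job is unset at the end of
// the iteration, nothing else holds a reference to it.
$class = $data->name;
$job   = new $class($params);
$job();
unset($job);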

Philander answered 7/4, 2012 at 21:4 Comment(2)
I am also facing a similar issue. Do you think the methods below don't help? gc_enable() before starting the loop; gc_collect_cycles() in every iteration; unset $job in every iteration; explicitly unset references in __destruct() of a $job. – Electrum
Just make sure you do not keep an instance of the class inside the container. I ended up using the ServiceManager and setting its shared behaviour to false. – Philander
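
A sketch of that ServiceManager configuration, assuming ZF2's Zend\ServiceManager (the service name and job class here are hypothetical):

use Zend\ServiceManager\ServiceManager;

$services = new ServiceManager();

// register the job as an invokable, but mark it as not shared so every get()
// returns a fresh instance that can be garbage collected after the iteration
$services->setInvokableClass('SimpleJob', 'Application\Job\SimpleJob');
$services->setShared('SimpleJob', false);

// or disable sharing for everything this manager creates
$services->setShareByDefault(false);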

I've usually restarted the script regularly, though you don't have to do it after every job is run (unless you want to; it's useful for clearing memory). You could, for example, run up to 100 jobs or more at a time, or until the script has used, say, 20 MB of RAM, and then exit the script to be instantly re-run.
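
A minimal sketch of that approach, reusing the $beanstalk loop from the question (the limits are arbitrary):

$maxJobs   = 100;              // restart after this many jobs ...
$maxMemory = 20 * 1024 * 1024; // ... or once roughly 20 MB is in use

for ($i = 0; $i < $maxJobs; $i++) {
    $data = $beanstalk->reserve();

    $class = $data->class;
    $job   = new $class($data->params);
    $job();
    unset($job);

    if (memory_get_usage(true) > $maxMemory) {
        break; // stop early when memory use grows too large
    }
}

exit(0); // a wrapping shell script or supervisor restarts the worker straight away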

My blogpost at http://www.phpscaling.com/2009/06/23/doing-the-work-elsewhere-sidebar-running-the-worker/ has some example shell scripts of re-running the scripts.

Bumbailiff answered 2/4, 2012 at 17:8 Comment(1)
Also here, memory considerations are covered by using bash to control the sequence instead of PHP itself. I was hoping for a PHP-only solution, but it seems that might not be possible. The exit-code strategy seems to give more control over the flow, however. – Philander

For memory safety, don't do the job loop in PHP itself. Instead, just create a simple bash script to do the looping:

while true ; do
    php do_jobs.php
done

where do_jobs.php contains something like:

// ...

$data   = $beanstalk->reserve();

$class  = $data->class;
$params = $data->params;

$job    = new $class($params);
$job();

// ...

Simple, right? ;)

Vicechairman answered 2/4, 2012 at 15:20 Comment(2)
I would like to keep the control within PHP. When something goes wrong during job execution, bash is unaware of this and just starts the next job, so you get less control in this situation. Also, with a ZF2 CLI app, you invoke a controller directly through (for example) app.php worker reserve --watch default --sleep-between 100 --log ./data/log/worker, which is what I'd like to do. – Philander
Do your job sequences have dependencies on other jobs (in the loop)? If they do, then you have to use a full PHP solution. If each job is independent, IMHO the bash & PHP combination is the best bet to avoid PHP memory leaks. – Vicechairman
