I want to write a worker for beanstalkd in php, using a Zend Framework 2 controller. It starts via the CLI and will run forever, asking for jobs from beanstalkd like this example.
In simple pseudo-like code:
while (true) {
$data = $beanstalk->reserve();
$class = $data->class;
$params = $data->params;
$job = new $class($params);
$job();
}
The $job
has here an __invoke()
method of course. However, some things in these jobs might be running for a long time. Some might run with a considerable amount of memory. Some might have injected the $beanstalk
object, to start new jobs themselves, or have a Zend\Di\Locator
instance to pull objects from the DIC.
I am worried about this setup for production environments on the long term, as perhaps circular references might occur and (at this moment) I do not explicitly "do" any garbage collection while this action might run for weeks/months/years *.
*) In beanstalk, reserve
is a blocking call and if no job is available, this worker will wait until it gets any response back from beanstalk.
My question: how will php handle this on the long term and should I take any special precaution to keep this from blocking?
This I did consider and might be helpful (but please correct if I am wrong and add more if possible):
- Use gc_enable() before starting the loop
- Use gc_collect_cycles() in every iteration
- Unset
$job
in every iteration - Explicitly unset references in
__destruct()
from a$job
(NB: Update from here)
I did run some tests with arbitrary jobs. The jobs I included were: "simple", just set a value; "longarray", create an array of 1,000 values; "producer", let the loop inject $pheanstalk
and add three simplejobs to the queue (so there is now a reference from job to beanstalk); "locatoraware", where a Zend\Di\Locator
is given and all job types are instantiated (though not invoked). I added 10,000 jobs to the queue, then I reserved all jobs in a queue.
Results for "simplejob" (memory consumption per 1,000 jobs, with memory_get_usage()
)
0: 56392
1000: 548832
2000: 1074464
3000: 1538656
4000: 2125728
5000: 2598112
6000: 3054112
7000: 3510112
8000: 4228256
9000: 4717024
10000: 5173024
Picking a random job, measuring the same as above. Distribution:
["Producer"] => int(2431)
["LongArray"] => int(2588)
["LocatorAware"] => int(2526)
["Simple"] => int(2456)
Memory:
0: 66164
1000: 810056
2000: 1569452
3000: 2258036
4000: 3083032
5000: 3791256
6000: 4480028
7000: 5163884
8000: 6107812
9000: 6824320
10000: 7518020
The execution code from above is updated to this:
$baseMemory = memory_get_usage();
gc_enable();
for ( $i = 0; $i <= 10000; $i++ ) {
$data = $bheanstalk->reserve();
$class = $data->class;
$params = $data->params;
$job = new $class($params);
$job();
$job = null;
unset($job);
if ( $i % 1000 === 0 ) {
gc_collect_cycles();
echo sprintf( '%8d: ', $i ), memory_get_usage() - $baseMemory, "<br>";
}
}
As everybody notices, the memory consumption is in php not leveraged and kept to a minimum, but increases over time.
gc_collect_cycles
#38850891 – Jerlenejermain