What do I use when a cron job isn't enough? (php)

I'm trying to figure out the most efficient way to run a pretty hefty PHP task thousands of times a day. It needs to make an IMAP connection to Gmail, loop over the emails, save the info to the database, and save the images locally.

Running this task every so often from cron isn't that big of a deal, but I need to run it every minute, and I know that eventually the cron jobs will start running on top of each other and cause memory issues.

What is the next step up when you need to efficiently run a task multiple times a minute? I've been reading about beanstalkd & Pheanstalk and I'm not entirely sure whether they'll do what I need. Thoughts???

Gambado asked 21/4, 2010 at 6:10 Comment(1)
This is not directly related to your question, but use cURL to retrieve the images. It caches DNS requests, whereas file_get_contents() and the other native file functions do not. I once needed to write a script to retrieve images, and pretty much all of the execution time was network latency, so that can help a little to reduce it. - Antitrust
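
For what it's worth, a minimal sketch of fetching one image with cURL; the URL and target path are placeholders, and reusing the same handle across many downloads is what lets cURL reuse DNS lookups and connections:

<?php
  // Sketch: download one image with cURL and save it locally.
  // For many images, keep $ch alive between downloads so DNS/connection reuse helps.
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, "http://example.com/image.jpg");
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
  $data = curl_exec($ch);
  if ($data !== false)
    file_put_contents("images/image.jpg", $data);
  curl_close($ch);
?>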

I'm not a PHP guy but ... what prevents you from running your script as a daemon? I've written many a Perl script that does just that.

Pohai answered 21/4, 2010 at 6:15 Comment(7)
I've never written a daemon before, but I will start doing some more research now. Thanks for the suggestion. - Gambado
Basically ... you just wrap everything in a while(1) and run the script in the background. If it's important that it finish what it's doing rather than just being killed, look into signal handling so you can clean up before exiting. Bonus points for forking rather than requiring that it be run from the shell in the background :) - Pohai
I would suggest two scripts: the first one spawns another process which runs the daemon, then just waits a couple of seconds and checks whether the daemon is still running. If not, it can re-launch it. I don't really trust PHP for running such a long time, so I think it's better to take precautions. - Antitrust
PHP scripts have no problem with long run times; we have scripts here that run for weeks without problems. You don't have to like PHP (I don't), but the language has matured a lot and is now quite stable. - Kalasky
@Brian Roach: I believe PHP has traditionally had more memory leak issues than Perl. That, and of course valuing your sanity :) - Squeegee
@mike, even if you do it in PHP you can take a look at Perl for the basic concepts: search.cpan.org/~ehood/Proc-Daemon-0.03/Daemon.pm (double forking and other system tricks to make it more robust; see the sketch after these comments). - Adonis
That's just a safety measure for robustness. I don't know whether it's recommended or not with Perl, but I'll do the same. ;) - Antitrust
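
Picking up the comments above, here is a rough sketch of a daemonised PHP worker - assuming the pcntl and posix extensions are available - where do_one_pass() is a hypothetical stand-in for the actual IMAP/image work:

<?php
  // Sketch only: fork, detach from the terminal, then loop until told to stop.
  $pid = pcntl_fork();
  if ($pid < 0) exit(1);   // fork failed
  if ($pid > 0) exit(0);   // parent exits; the child lives on as the daemon
  posix_setsid();          // become session leader, detached from the terminal

  $running = true;
  pcntl_signal(SIGTERM, function () use (&$running) { $running = false; });

  while ($running)
  {
    do_one_pass();             // hypothetical: fetch mail, save images, etc.
    sleep(60);
    pcntl_signal_dispatch();   // deliver any pending signals for a clean shutdown
  }
?>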

Either create a locking mechanism so the scripts won't overlap. This is quite simple: since the scripts only run once a minute, a simple .lock file will suffice:

<?php
  // If a previous run is still going, bail out; otherwise record our PID.
  if (file_exists("foo.lock")) exit(0);
  file_put_contents("foo.lock", getmypid());

  do_stuff_here();

  // Remove the lock so the next run can start.
  unlink("foo.lock");
?>

This will make sure the scripts don't run in parallel; you just have to make sure the .lock file is deleted when the program exits, so you should have a single point of exit (except for the exit at the beginning).
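
If a stale foo.lock left behind by a crash is a concern, a variation on the same idea using flock() may be worth considering, since the OS releases the lock automatically when the process exits; the lock file name here is just an example:

<?php
  // Sketch: the lock is tied to the process, so a crash can't leave it stuck.
  $fp = fopen("/tmp/fetch_mail.lock", "c");
  if (!flock($fp, LOCK_EX | LOCK_NB))
    exit(0);                 // another instance is still running

  do_stuff_here();

  flock($fp, LOCK_UN);
  fclose($fp);
?>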

A good alternative - as Brian Roach suggested - is a dedicated server process that runs all the time and keeps the connection to the IMAP server open. This reduces overhead a lot and is not much harder than writing a normal PHP script:

<?php
  connect();
  while (is_world_not_invaded_by_aliens())
  {
    get_mails();
    get_images();
    sleep(time_to_next_check());
  }
  disconnect();
?>
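
For a more concrete (but still hedged) sketch of that loop using PHP's imap extension - the server string, credentials and the save_* helpers below are placeholders, not working code:

<?php
  // Sketch: keep one IMAP connection open and poll the mailbox once a minute.
  $mailbox = "{imap.gmail.com:993/imap/ssl}INBOX";
  $imap    = imap_open($mailbox, "user@gmail.com", "password");

  $running = true;   // a signal handler (see the daemon sketch above) could clear this
  while ($running)
  {
    // Re-open the connection if the server dropped it while we slept.
    if (!imap_ping($imap))
      $imap = imap_open($mailbox, "user@gmail.com", "password");

    // Only look at messages we haven't processed yet.
    $new = imap_search($imap, 'UNSEEN');
    if ($new !== false)
    {
      foreach ($new as $msgno)
      {
        $header = imap_headerinfo($imap, $msgno);
        save_to_database($header);       // hypothetical helper
        save_images($imap, $msgno);      // hypothetical helper
      }
    }
    sleep(60);
  }
  imap_close($imap);
?>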
Kalasky answered 21/4, 2010 at 6:28 Comment(1)
I think the daemon is going to be my best bet, and keeping the IMAP connection open should make things a lot quicker. Thanks for the advice! - Gambado

I've got a number of scripts like this, where I don't want to run them from cron in case they stack up.

#!/bin/sh
# Run one fetch, wait a minute, then restart this script in place.
php -f fetchFromImap.php
sleep 60
exec $0

The exec $0 part starts the script running again, replacing itself in memory, so it will run forever without issues. Any memory the PHP script uses is cleaned up whenever it exits, so that's not a problem either.

A simple line will start it, and put it into the background:

cd /x/y/z ; nohup ./loopToFetchMail.sh &

or it can similarly be started when the machine boots, by various means (such as cron's '@reboot ....' entry).
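
For example, a crontab entry along these lines (using the same placeholder path as above) would launch the loop at boot:

@reboot cd /x/y/z && ./loopToFetchMail.sh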

Perlie answered 21/4, 2010 at 19:50 Comment(0)

fcron (http://fcron.free.fr/) will not start a new job if the old one is still running. You could use an '@ 1 command' entry and not worry about race conditions.
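
For example, an fcrontab entry might look like the line below; the '@ 1' frequency is taken from this answer (check fcron's documentation for the exact time unit), and the script path is a placeholder:

@ 1 /usr/bin/php -f /path/to/fetchFromImap.php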

Sateen answered 16/6, 2010 at 10:36 Comment(0)
