APScheduler shut down randomly

The scheduler was running fine in production, then all of a sudden it shut down. It looks like the database may have been offline for a bit (the web apps never missed a beat, so the outage was transient).

The log reported:

[2019-11-25 07:59:14,907: INFO/ercscheduler] Scheduler has been shut down
[2019-11-25 07:59:14,908: DEBUG/ercscheduler] Looking for jobs to run
[2019-11-25 07:59:14,909: WARNING/ercscheduler] Error getting due jobs from job store 'default': (psycopg2.OperationalError) could not connect to server: Network is unreachable
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 6432?

(Background on this error at: http://sqlalche.me/e/e3q8)
[2019-11-25 07:59:14,909: DEBUG/ercscheduler] Next wakeup is due at 2019-11-25 13:59:24.908318+00:00 (in 10.000000 seconds)
[2019-11-25 07:59:14,909: INFO/ercscheduler] listener closed
[2019-11-25 07:59:14,909: INFO/ercscheduler] server has terminated
[2019-11-25 08:00:10,747: INFO/ercscheduler] Adding job tentatively -- it will be properly scheduled when the scheduler starts
[2019-11-25 08:00:10,797: INFO/ercscheduler] Adding job tentatively -- it will be properly scheduled when the scheduler starts
[2019-11-26 15:27:48,392: INFO/ercscheduler] Adding job tentatively -- it will be properly scheduled when the scheduler starts
[2019-11-26 15:27:48,392: INFO/ercscheduler] Adding job tentatively -- it will be properly scheduled when the scheduler starts

How can I make the scheduler more fault tolerant? Currently I have to restart the daemon to get it going again.

Interknit asked 26/11, 2019 at 21:55

I found an issue very similar to yours on the APScheduler GitHub repo: https://github.com/agronholm/apscheduler/issues/109

A fix for this issue appears to have been merged in version 3.3.

All you have to do is upgrade to at least 3.3. If you want to change the default retry interval of 10 seconds, set jobstore_retry_interval when you create the scheduler instance.

If you cannot upgrade, I would try monkey patching the corresponding method in APScheduler.

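from apscheduler.schedulers.background import BackgroundScheduler

# Keep a reference to the original method so the patch can delegate to it.
_original_process_jobs = BackgroundScheduler._process_jobs


def monkey_patched_process_jobs(self):
    # Alter the way job processing is done here. As a sketch, this version
    # delegates to the original method and, if it raises, logs the error and
    # returns 10, i.e. the number of seconds the main loop should wait
    # before calling _process_jobs again, instead of letting the loop die.
    try:
        return _original_process_jobs(self)
    except Exception:
        self._logger.exception('Error processing jobs; retrying in 10 seconds')
        return 10


# Replace the method with the patched one
BackgroundScheduler._process_jobs = monkey_patched_process_jobs

scheduler = BackgroundScheduler()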

Keep in mind that this is not ideal; I would only resort to monkey patching if I were unable to upgrade because of breaking changes.


How this functionality works under the hood

This is a snippet from BaseScheduler._process_jobs in the APScheduler Git repo:

try:
    due_jobs = jobstore.get_due_jobs(now)
except Exception as e:
    # Schedule a wakeup at least in jobstore_retry_interval seconds
    self._logger.warning('Error getting due jobs from job store %r: %s',
                         jobstore_alias, e)
    retry_wakeup_time = now + timedelta(seconds=self.jobstore_retry_interval)
    if not next_wakeup_time or next_wakeup_time > retry_wakeup_time:
        next_wakeup_time = retry_wakeup_time

    continue
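
The effect of that branch can be reproduced without a database. The sketch below copies the wakeup arithmetic from the snippet into a standalone function and simulates a job store that fails on every pass; the names next_wakeup_after_error and simulate_outage are hypothetical and not part of APScheduler. Each failed pass schedules another wakeup one retry interval later, so in this logic the retries keep going for as long as the scheduler's loop is running rather than stopping after the first 10 seconds.

```python
from datetime import datetime, timedelta, timezone


def next_wakeup_after_error(now, next_wakeup_time, retry_interval=10.0):
    """Same arithmetic as the snippet: after a job store error, wake up no
    later than now + retry_interval, but keep an earlier wakeup if one is
    already planned."""
    retry_wakeup_time = now + timedelta(seconds=retry_interval)
    if not next_wakeup_time or next_wakeup_time > retry_wakeup_time:
        next_wakeup_time = retry_wakeup_time
    return next_wakeup_time


def simulate_outage(start, failed_passes, retry_interval=10.0):
    """Hypothetical driver: the job store raises on every pass. Each pass
    starts at the previous wakeup and schedules the next one."""
    now = start
    wakeups = []
    for _ in range(failed_passes):
        now = next_wakeup_after_error(now, None, retry_interval)
        wakeups.append(now)
    return wakeups


# 'now' reconstructed from the question's log: the logged next wakeup
# (13:59:24.908318 UTC) minus the 10-second retry interval.
start = datetime(2019, 11, 25, 13, 59, 14, 908318, tzinfo=timezone.utc)
for wakeup in simulate_outage(start, failed_passes=3):
    print(wakeup.isoformat())
```

The second half of the condition matters when several job stores are configured: an earlier wakeup already planned because of a healthy store is not postponed by a failing one.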

self.jobstore_retry_interval is set in the following manner:

self.jobstore_retry_interval = float(config.pop('jobstore_retry_interval', 10))
Piceous answered 2/12, 2019 at 15:4
Comment from Interknit: I would assume I would see "Error getting due jobs from job store" in the log if this were the case. Is it possible that my Postgres DB was offline for more than (say) 10 seconds and, since I am using the default of 10 seconds, the scheduler just stopped processing? I want the scheduler to NEVER stop attempting to connect to Postgres, so that unless the DB crashed, the scheduler would restart. Is this possible?
