I am trying to use APScheduler to run periodic jobs with an IntervalTrigger. I've intentionally set the maximum number of running instances to one because I don't want jobs to overlap.
The problem is that after some time the scheduler starts reporting that the maximum number of running instances for a job has been reached, even after it previously reported that the job finished successfully. I found this in the logs:
2015-10-28 22:17:42,137 INFO Running job "ping (trigger: interval[0:01:00], next run at: 2015-10-28 22:18:42 VET)" (scheduled at 2015-10-28 22:17:42-04:30)
2015-10-28 22:17:44,157 INFO Job "ping (trigger: interval[0:01:00], next run at: 2015-10-28 22:18:42 VET)" executed successfully
2015-10-28 22:18:42,335 WARNING Execution of job "ping (trigger: interval[0:01:00], next run at: 2015-10-28 22:18:42 VET)" skipped: maximum number of running instances reached (1)
2015-10-28 22:19:42,171 WARNING Execution of job "ping (trigger: interval[0:01:00], next run at: 2015-10-28 22:19:42 VET)" skipped: maximum number of running instances reached (1)
2015-10-28 22:20:42,181 WARNING Execution of job "ping (trigger: interval[0:01:00], next run at: 2015-10-28 22:20:42 VET)" skipped: maximum number of running instances reached (1)
2015-10-28 22:21:42,175 WARNING Execution of job "ping (trigger: interval[0:01:00], next run at: 2015-10-28 22:21:42 VET)" skipped: maximum number of running instances reached (1)
2015-10-28 22:22:42,205 WARNING Execution of job "ping (trigger: interval[0:01:00], next run at: 2015-10-28 22:22:42 VET)" skipped: maximum number of running instances reached (1)
As you can see in the logs, the ping job was reported as executed successfully, but the very next execution is skipped, and so is every one after it.
This is the code I use to schedule jobs:
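from apscheduler.executors.pool import ThreadPoolExecutor
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.schedulers.background import BackgroundScheduler

executors = {'default': ThreadPoolExecutor(10)}
jobstores = {'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')}
self.scheduler = BackgroundScheduler(executors=executors, jobstores=jobstores)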
...
self.scheduler.add_job(func=func,
                       trigger=trigger,
                       kwargs=kwargs,
                       id=plan_id,
                       name=name,
                       misfire_grace_time=misfire_grace_time,
                       replace_existing=True)
The function being run starts one thread per network node to execute the ping command, waits for all of them, and saves the results to a file:
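from threading import Thread

# Start one worker thread per link, then wait for all of them to finish.
threads = []
for link in links:
    thread = Thread(target=ping_test, args=(link, count, interval, timeout))
    threads.append(thread)
    thread.start()
for thread in threads:
    thread.join()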
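For anyone who wants to reproduce the fan-out without my network code, here is a minimal, self-contained sketch of the same start/join pattern. `fake_ping` and `run_pings` are hypothetical stand-ins (not my real `ping_test`), and the `join_timeout` argument is an addition for diagnosis only: it reports any worker thread still alive after a deadline instead of blocking forever.

```python
import time
from threading import Thread


def fake_ping(link, count, interval, timeout, results):
    """Hypothetical stand-in for ping_test: sleeps briefly, then records a result."""
    time.sleep(0.01)
    results[link] = "ok"


def run_pings(links, count=1, interval=1, timeout=1, join_timeout=5.0):
    """Start one thread per link, join with a shared deadline, return (results, stuck)."""
    results = {}
    threads = []
    for link in links:
        thread = Thread(
            target=fake_ping,
            args=(link, count, interval, timeout, results),
            name="ping-%s" % link,
            daemon=True,  # a stuck worker must not keep the process alive
        )
        threads.append(thread)
        thread.start()

    deadline = time.monotonic() + join_timeout
    for thread in threads:
        # Wait only for the time remaining until the shared deadline.
        thread.join(max(0.0, deadline - time.monotonic()))

    # Any thread still alive here did not finish within join_timeout.
    stuck = [t.name for t in threads if t.is_alive()]
    return results, stuck


if __name__ == "__main__":
    results, stuck = run_pings(["node-a", "node-b"])
    print(results, stuck)
```

If `stuck` ever comes back non-empty with the real worker in place of `fake_ping`, that would point at a hanging thread rather than at the scheduler.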
Notice that the timeout passed to ping_test is much lower than the trigger interval, so it should be impossible for the job to still be executing when the next run triggers.
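That claim can be checked rather than assumed. Below is a small sketch of a timing wrapper that logs how long each run takes and warns when a run exceeds the trigger interval. The names `warn_if_slower_than`, `job_timing` and the placeholder `ping` body are made up for illustration; wrapping the real job function this way is an assumption about how it could be used, not part of my current code.

```python
import functools
import logging
import time

log = logging.getLogger("job_timing")  # hypothetical logger name


def warn_if_slower_than(interval_seconds):
    """Decorator: log each run's duration; warn if it exceeds interval_seconds."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so failed runs are timed too.
                elapsed = time.monotonic() - start
                wrapper.last_elapsed = elapsed
                if elapsed > interval_seconds:
                    log.warning("%s took %.2fs, longer than the %.0fs interval",
                                func.__name__, elapsed, interval_seconds)
                else:
                    log.info("%s finished in %.2fs", func.__name__, elapsed)
        wrapper.last_elapsed = None  # duration of the most recent run, in seconds
        return wrapper
    return decorator


@warn_if_slower_than(60)
def ping():
    """Placeholder for the real job body."""
    time.sleep(0.01)
    return "done"
```

If the warning never fires while the "maximum number of running instances reached" messages keep appearing, that would suggest the job body itself is not what is overrunning.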
Any insights on this problem are highly appreciated.