Recover from trigger ERROR state after Job constructor threw an exception?

6

8

When using Quartz.NET to schedule jobs, I occasionally get an exception while a job is being instantiated. This, in turn, causes Quartz to put the job's trigger into the ERROR state. Once that happens, the trigger stops firing until someone intervenes manually (in my case by restarting the service, since I'm using in-memory job scheduling).

How can I prevent the error state from being set, or at the very least, tell Quartz to retry triggers that are in the error state?

The exception is caused by flaky network calls that fetch configuration data, which is then passed to the job's constructor. I'm using a custom IJobFactory to do this.

I've seen other references to this without resolutions:

Organo asked 28/8, 2015 at 14:35 Comment(1)
FYI: We observe the exact same behaviour in the Java implementation of Quartz. – Koerlin
5

Actually, the best way to reset a trigger from the ERROR state is:

private final SchedulerFactoryBean schedulerFactoryBean;
Scheduler scheduler = schedulerFactoryBean.getScheduler();

TriggerKey triggerKey = TriggerKey.triggerKey(triggerName, triggerGroup);
if (scheduler.getTriggerState(triggerKey).equals(Trigger.TriggerState.ERROR)) {
    scheduler.resetTriggerFromErrorState(triggerKey);
}

Note:

You should never modify the records in a table from a third-party library or software manually. All changes should be made through the API to that library if there is any functionality.

JobStoreSupport.resetTriggerFromErrorState

Cleanthes answered 21/1, 2021 at 11:13 Comment(3)
Please note that my answer was related to the .NET implementation, which has no such API yet. – Delano
@GiacomoDeLiberali sorry if you took my comment personally; it is really just a note to keep in mind. – Cleanthes
resetTriggerFromErrorState will immediately fire cron triggers, which might not be the right way to do it. Clients sometimes have use cases where they don't want any trigger to fire immediately after recovering from the error state. – Valaria
16

For the record, I consider this a design flaw of Quartz. If a job can't be constructed once, that doesn't mean it can never be constructed. This is a transient error and should be treated as such. Stopping all future scheduled runs violates the principle of least astonishment.

Anyway, my hack solution is to catch any exception thrown during job construction and, instead of rethrowing it or returning null, return a custom IJob that simply logs an error. This isn't perfect, but at least it doesn't prevent future triggering of the job.

public IJob NewJob(TriggerFiredBundle bundle, IScheduler scheduler)
{
    try
    {
        var job = this.container.Resolve(bundle.JobDetail.JobType) as IJob;
        return job;
    }
    catch (Exception ex)
    {
        this.logger.Error(ex, "Exception creating job. Giving up and returning a do-nothing logging job.");
        return new LoggingJob(this.logger);
    }
}
Organo answered 31/8, 2015 at 16:14 Comment(0)
8

When an exception occurs while a trigger is instantiating the IJob class, the trigger's TRIGGER_STATE changes to ERROR, and a trigger in this state will no longer fire.

To re-enable the trigger you need to change its state back to WAITING, after which it can fire again. Here is an example of how you can re-enable a trigger that is in the error state.

var triggerKey = new TriggerKey("triggerName", "triggerGroup");
if (scheduler.GetTriggerState(triggerKey) == TriggerState.Error)
{
    scheduler.ResumeTrigger(triggerKey);
}
Jevons answered 11/11, 2016 at 17:33 Comment(0)
2

How can I prevent the error state from being set, or at the very least, tell Quartz to retry triggers that are in the error state?

Unfortunately, in the current version you cannot retry those triggers. As the Quartz documentation puts it:

It should be extremely rare for this method to throw an exception - basically only the case where there is no way at all to instantiate and prepare the Job for execution. When the exception is thrown, the Scheduler will move all triggers associated with the Job into the ERROR state, which will require human intervention (e.g. an application restart after fixing whatever configuration problem led to the issue with instantiating the Job).

Sinter answered 28/8, 2015 at 16:30 Comment(2)
I can catch the exception in my JobFactory and retry the creation. But is there any way, using the TriggerFiredBundle, to skip the further processing of the trigger? – Organo
What I meant to say was skip that current firing of the job. – Organo
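The "catch and retry" idea from the comment above could look roughly like the helper below. This is only a sketch in plain Java with no Quartz types: `Retry` and `withRetries` are hypothetical names, and the caller is assumed to wrap whatever call builds the job (for example the container resolve inside a custom job factory).

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Quartz): retries a flaky call a few times.
final class Retry {
    private Retry() {}

    /**
     * Calls action up to maxAttempts times, sleeping delayMillis between
     * failed attempts. Rethrows the last exception if every attempt fails.
     */
    static <T> T withRetries(Callable<T> action, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry after an interrupt; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```

As far as I can tell the job factory is invoked on the scheduler's own thread, so the attempt count and delay should stay small; a long retry loop here would hold up other triggers. If every attempt fails, the exception still propagates, so this reduces the chance of the trigger ending up in the error state but does not remove it.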
2

Simply put, you should follow good object-oriented practices: constructors should not throw exceptions. Try moving the configuration fetch to the job's execution phase (the Execute method), where retries will be handled correctly. This might mean passing a service or func to the constructor that the job can use to pull the data.
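A minimal sketch of that shape, written in plain Java so it compiles without the library: `SimpleJob` is a stand-in for Quartz's job interface, and `ReportJob` and its `Supplier<String>` configuration source are hypothetical names chosen for illustration.

```java
import java.util.function.Supplier;

// Stand-in for Quartz's job interface so the sketch needs no library on the classpath.
interface SimpleJob {
    void execute() throws Exception;
}

// Hypothetical job: the constructor only stores the supplier, so constructing
// the job cannot fail because of a network error.
final class ReportJob implements SimpleJob {
    private final Supplier<String> configSource;
    private String lastConfig;

    ReportJob(Supplier<String> configSource) {
        this.configSource = configSource;
    }

    @Override
    public void execute() {
        // The flaky call happens here, during execution, not during construction.
        lastConfig = configSource.get();
    }

    String getLastConfig() {
        return lastConfig;
    }
}
```

With this arrangement a failed fetch surfaces as a failed run of the job rather than as a failure to instantiate it, which, to my understanding, is the case Quartz treats as recoverable.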

Landscape answered 31/8, 2015 at 8:53 Comment(1)
I'm using an IoC container to construct the job, along with its constructor arguments. As such, I'm not throwing exceptions in my constructors. I'd consider this an oversight of Quartz.NET. It is possible that my job can't be built some of the time, and that should be treated as a transient error, not a fatal one. I could return a null object, but that would just push the errors further down. I have a rather hack-ish solution that seems to work, which I will post. – Organo
1

To change the trigger state to WAITING, the author also suggests that one option is to update the database manually:

[...] You might need to update database manually, but yeah - if jobs cannot be instantiated it's considered quite bad thing and Quartz will flag them as broken.

I created another job scheduled at app startup that updates the triggers in error state to recover them.

UPDATE QRTZ_TRIGGERS SET [TRIGGER_STATE] = 'WAITING' WHERE [TRIGGER_STATE] = 'ERROR'

More information in this GitHub discussion.

Delano answered 18/1, 2021 at 15:25 Comment(0)
