I'm currently using code like this to detect if a SQL server job is running. (this is SQL Server 2005, all SP's)
return (select isnull(
(select top 1 CASE
WHEN current_execution_status = 4 THEN 0
ELSE 1
END
from openquery(devtestvm, 'EXEC msdb.dbo.sp_help_job')
where current_execution_status = 4 and
name = 'WQCheckQueueJob' + cast(@Index as varchar(10))
), 1)
)
No problems there, and generally speaking, it works just fine.
But.... (always a but)
On occasion, I'll invoke this, get back a "job is not running" result, at which point I'll try and start the job, via
exec msdb.dbo.sp_start_job @JobName
and SQL will return that "SQLAgent has refused to start the job because it already has a pending request".
Ok. Also not a problem. It's conceivable that there's a slight window where the target job could get started before this code can start it, but after checking if it's started. However, I can just wrap that up in a try catch and just ignore the error, right?
begin try
if dbo.WQIsQueueJobActive(@index) = 0 begin
exec msdb.dbo.sp_start_job @JobName
break
end
end try begin catch
-- nothing here
end catch
here's the problem, though.
9 times out of 10, this works just fine. SQL agent will raise the error, it's caught, and processing just continues on, since the job is already running, no harm no foul.
But occasionally, I'll get a message in the Job History view (keep in mind the above code to detect if a specific job is running and start it if not is actually running from another job) saying that the job failed because "SQLAgent has refused to start the job because it already has a pending request".
Of course, this is the exact error that TRY CATCH is supposed to be handling!
When this happens, the executing job just dies, but not immediately from what I can tell, just pretty close. I've put logging all over the place and there's no consistency. One time it fails, it'll be at place a, the next time at place b. In some cases, Place A and place B have nothing but a
select @var = 'message'
in between them. Very strange. Basically, the job appears to be unceremoniously dumped and anything left to execute in the job is +not+ executed at all.
However, if I remove the "exec StartJob" (or have it invoked exactly one time, when I KNOW that the target job can't already be running), everything works perfectly and all my processing in the job runs through.
The purpose behind all this is to have a job started as a result of a trigger (among other things), and, if the job is already started, there's really no need to "start it again".
Anyone ever run into behavior like this with SQL Agent's Job handling?
EDIT: Current flow of control is like so:
- Change to a table (update or insert)...
- fires trigger which calls...
- a stored proc which calls...
- sp_Start_Job which...
- starts a specific job which...
- calls another stored proc (called CheckQueue) which...
- performs some processing and...
- checks several tables and depending on their contents might...
- invoke sp_start_job on another job to start up a second, simultaneous job to process the additional work (this second job calls the CheckQueue sproc also but the two invocations operate on completely separate sets of data)