What does Kubernetes cronjobs `startingDeadlineSeconds` exactly mean?

The limitations section of the Kubernetes CronJob documentation states:

Jobs may fail to run if the CronJob controller is not running or broken for a span of time from before the start time of the CronJob to start time plus startingDeadlineSeconds, or if the span covers multiple start times and concurrencyPolicy does not allow concurrency.

My understanding is that if startingDeadlineSeconds is set to 10 and the CronJob couldn't start at its scheduled time for some reason, the controller can still attempt to start it as long as those 10 seconds haven't passed. After the 10 seconds, however, it definitely won't be started. Is this correct?

Also, if I have concurrencyPolicy set to Forbid, does Kubernetes count it as a failure when a CronJob tries to schedule a job while one is already running?

Countless answered 27/6, 2018 at 14:36 Comment(0)

After investigating the Kubernetes code base, this is how the CronJob controller works:

  1. Every 10 seconds, the CronJob controller checks the list of CronJobs in the given Kubernetes client.

  2. For every CronJob, it checks how many schedules were missed between lastScheduleTime and now. If there are more than 100 missed schedules, it doesn't start the job and records the event:

    "FailedNeedsStart", "Cannot determine if job needs to be started. Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew."

It is important to note that if the field startingDeadlineSeconds is set (not nil), the controller counts only the schedules missed within the last startingDeadlineSeconds seconds, rather than since lastScheduleTime. For example, if startingDeadlineSeconds = 200, it counts how many schedules were missed in the last 200 seconds. The exact implementation of counting how many missed schedules can be found here.
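
The counting window can be sketched in Python. This is only an illustration of the behaviour described above, not the controller's actual code: `count_missed` and its parameters are hypothetical names, the schedule is simplified to a fixed interval instead of a cron expression, and taking the later of the two window starts is my reading of the description.

```python
from datetime import datetime, timedelta

MAX_MISSED = 100  # threshold quoted in the FailedNeedsStart event message


def count_missed(last_schedule_time, now, interval, starting_deadline_seconds=None):
    """Count schedule times that fall in (window_start, now].

    Assumption: the schedule fires every `interval` after `last_schedule_time`
    (a stand-in for a real cron expression).
    """
    window_start = last_schedule_time
    if starting_deadline_seconds is not None:
        # With a deadline set, only look back that many seconds from now.
        deadline_start = now - timedelta(seconds=starting_deadline_seconds)
        window_start = max(window_start, deadline_start)

    missed = 0
    t = last_schedule_time + interval
    while t <= now:
        if t > window_start:
            missed += 1
        t += interval
    return missed


# A job scheduled every minute that last ran at 05:00 and is checked at 07:00:30.
last_run = datetime(2018, 6, 27, 5, 0, 0)
now = datetime(2018, 6, 27, 7, 0, 30)
every_minute = timedelta(minutes=1)

no_deadline = count_missed(last_run, now, every_minute)          # 120 missed
with_deadline = count_missed(last_run, now, every_minute, 200)   # 3 missed

too_many_without_deadline = no_deadline > MAX_MISSED    # True: job is not started
too_many_with_deadline = with_deadline > MAX_MISSED     # False: job can proceed
```

With these example values, the unset deadline trips the 100-schedule limit, while a 200-second deadline keeps the count small enough for the controller to continue.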

  3. If there are no more than 100 missed schedules from the previous step, the CronJob controller checks that the current time is not after scheduledTime + startingDeadlineSeconds, i.e. that it's not too late to start the job (the deadline has not passed). If it isn't too late, the CronJob controller keeps attempting to start the job. If it is already too late, it doesn't start the job and records the event:

    "Missed starting window for {cronjob name}. Missed scheduled time to start a job {scheduledTime}"

It is also important to note that if the field startingDeadlineSeconds is not set, there is no deadline at all. The CronJob controller will then attempt to start the job without checking whether it is too late.

Therefore, to answer the questions above:
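
The deadline check itself is small enough to sketch. The function name `too_late` and its parameters are hypothetical, and the code only mirrors the rule described above (an unset deadline means never too late); it is not taken from the Kubernetes source.

```python
from datetime import datetime, timedelta


def too_late(scheduled_time, now, starting_deadline_seconds=None):
    """Return True if `now` is past scheduled_time + startingDeadlineSeconds."""
    if starting_deadline_seconds is None:
        # No deadline configured: the run is never considered too late.
        return False
    deadline = scheduled_time + timedelta(seconds=starting_deadline_seconds)
    return now > deadline


scheduled = datetime(2018, 6, 27, 12, 0, 0)

within_window = too_late(scheduled, scheduled + timedelta(seconds=8), 10)    # False
past_window = too_late(scheduled, scheduled + timedelta(seconds=15), 10)     # True
no_deadline_set = too_late(scheduled, scheduled + timedelta(hours=1))        # False
```

With startingDeadlineSeconds = 10, a run scheduled for 12:00:00 can still be started at 12:00:08 but is skipped at 12:00:15.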

1. If the startingDeadlineSeconds is set to 10 and the cronjob couldn't start for some reason at its scheduled time, then it can still be attempted to start again as long as those 10 seconds haven't passed, however, after the 10 seconds, it for sure won't be started, is this correct?

The CronJob controller will attempt to start the job, and it will be scheduled successfully if 10 seconds have not yet passed since its scheduled time. However, if the deadline has passed, the job won't be started for this run, and it will be counted as a missed schedule in later executions.

2. If I have concurrencyPolicy set to Forbid, does K8s count it as a fail if a cronjob tries to be scheduled, when there is one already running?

Yes, it will be counted as a missed schedule, since missed schedules are calculated as described in point 2 above.
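
To illustrate how skipped runs accumulate under Forbid, here is a toy model of the sync loop. It assumes, as this answer describes, that a skipped run leaves lastScheduleTime unchanged so the skipped times stay in the missed window. The function name, the integer-second timeline, and the 10-second tick are illustrative choices, not the real controller.

```python
def first_start_after_block(schedule, last_schedule, job_finished_at, tick=10, horizon=3600):
    """Return (sync_time, unmet_times) for the first sync that may start a job.

    All times are plain integers (seconds). `schedule` lists the scheduled
    start times; a job started at `last_schedule` runs until `job_finished_at`.
    """
    for now in range(last_schedule + tick, horizon + 1, tick):
        # Schedule times that have come due since the last successful start.
        unmet = [t for t in schedule if last_schedule < t <= now]
        if not unmet:
            continue  # nothing is due yet
        if now < job_finished_at:
            continue  # Forbid: a job is still active, so this sync skips the run
        return now, unmet
    return None


# A job scheduled every 60 seconds; the run started at t=0 finishes at t=200.
result = first_start_after_block(schedule=[60, 120, 180, 240],
                                 last_schedule=0,
                                 job_finished_at=200)
# result == (200, [60, 120, 180]): three schedule times piled up while blocked
```

In this model the three skipped times are exactly what the missed-schedule count from point 2 would see once the long-running job finishes.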

Countless answered 28/6, 2018 at 10:11 Comment(6)
Hesham, I have a job which missed its schedule more than 100 times. I have set concurrencyPolicy to Forbid and did not set startingDeadlineSeconds. How can I avoid the "Too many missed start time (> 100)" error, given that the job may take more or less time depending on the data size? - Marketing
E.g. suppose a job is scheduled to run every minute starting at 5 AM and concurrencyPolicy is set to Forbid. If the first job (the 5 AM one) completes at 7 AM, we have missed 119 schedules over 2 hours, which is > 100, and the CronJob will throw the error. If I set startingDeadlineSeconds to 200 seconds, will it check how many schedules were missed in the last 200 seconds, or in (2 hours + 200 seconds)? - Marketing
@Marketing Setting startingDeadlineSeconds to 200 will count the missed schedules that happened in the last 200 seconds, so only 3 missed schedules in your case, since it's scheduled to run every minute. This means the controller will start scheduling it again, as desired. - Countless
This design is very very confusing. - Isochronal
Just checking, when you say "... is set (not nil), then it means there is no deadline...", is this backwards? Based on it being repeated text from earlier in the answer, and based on the behavior you describe, it seems like that should be "... is unset (nil), then it means there is no deadline..." - Ole
@Ole From the docs of 4.7: If not specified, there is no deadline. I will correct the answer's text. - Ancona
