Is there a way to set a maximum running time for AWS Batch jobs (or queues)? This is a standard setting in most batch managers, which avoids wasting resources when a job hangs for whatever reason.
As of April 2018, AWS Batch supports setting a job timeout, either when submitting a job or in the job definition.
You specify an attemptDurationSeconds parameter, which must be at least 60 seconds, either in your job definition, or when you submit the job. When this number of seconds has passed following the job attempt's startedAt timestamp, AWS Batch terminates the job. On the compute resource, your job's container receives a SIGTERM signal to give your application a chance to shut down gracefully; if the container is still running after 30 seconds, a SIGKILL signal is sent to forcefully shut down the container.
Source: https://docs.aws.amazon.com/batch/latest/userguide/job_timeouts.html
POST /v1/submitjob HTTP/1.1
Content-type: application/json
{
...
"timeout": {
"attemptDurationSeconds": number
}
}
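For anyone calling the API from Python, here is a minimal sketch of the same request using boto3's `submit_job`. The job name, queue and job definition values are placeholders, and the helper names `build_submit_job_request` and `submit_with_timeout` are made up for this example; only the `timeout` / `attemptDurationSeconds` structure and the 60-second minimum come from the documentation quoted above.

```python
def build_submit_job_request(job_name, job_queue, job_definition,
                             attempt_duration_seconds):
    """Build keyword arguments for a Batch SubmitJob call with a timeout."""
    # The documentation quoted above requires at least 60 seconds.
    if attempt_duration_seconds < 60:
        raise ValueError("attemptDurationSeconds must be at least 60")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "timeout": {"attemptDurationSeconds": int(attempt_duration_seconds)},
    }


def submit_with_timeout(batch_client, **request):
    """Submit the job through a Batch client and return the new job ID."""
    return batch_client.submit_job(**request)["jobId"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   client = boto3.client("batch")
#   request = build_submit_job_request("demo", "my-queue", "my-job-def", 3600)
#   job_id = submit_with_timeout(client, **request)
```

A timeout passed at submission time applies to that submission only; putting it in the job definition gives every job a default.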
AFAIK there is no feature to do this. However, a workaround was suggested in the forum for a similar question.
One idea is to call Batch as an Activity from Step Functions and ping back on a schedule (e.g. every minute) from that job. If it stops responding, you can detect that situation as a Timeout in the activity and act accordingly (terminate the job, etc.). It is not an ideal solution (especially if the job continues to ping back as a "zombie"), but it's a start. You'd also likely have to store activity tokens in a database to trace them back to the Batch job ID.
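A rough sketch of the "ping back" part, assuming the job somehow receives the activity's task token and that the activity state sets `HeartbeatSeconds` so that a missed heartbeat surfaces as a timeout. `send_task_heartbeat` is the boto3 Step Functions call; the helper name `start_heartbeat` and the way the token reaches the job are assumptions of this example.

```python
import threading


def start_heartbeat(sfn_client, task_token, interval_seconds=60.0):
    """Send a Step Functions heartbeat every interval_seconds in the background.

    Returns (stop_event, thread); call stop_event.set() when the work is done.
    """
    stop_event = threading.Event()

    def beat():
        # Event.wait returns False on timeout and True once stop_event is set,
        # so the loop sends one heartbeat per interval until it is stopped.
        while not stop_event.wait(interval_seconds):
            sfn_client.send_task_heartbeat(taskToken=task_token)

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    return stop_event, thread


# Usage inside the job (assumes boto3 and a task token passed to the job):
#   import boto3
#   stop, thread = start_heartbeat(boto3.client("stepfunctions"), token)
#   ... do the real work ...
#   stop.set()
```

Because the heartbeat runs in its own thread, it keeps pinging even if the main work is stuck, which is exactly the "zombie" caveat mentioned above. Sending the heartbeat from the work loop itself would avoid that.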
Alternatively, you can split that setup into two steps: schedule a Batch job from a Lambda in the first state, then pass the Batch job ID to the second step, which polls Batch (from another Lambda) for the job's state using Retry with IntervalSeconds (e.g. once every minute, or even with exponential backoff) and MaxAttempts calculated from your timeout. This way, you don't need any external state storage mechanism, long polling or even a "ping back" from the job (it CAN be a zombie), but the downside is more steps.
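To make "MaxAttempts calculated based on your timeout" concrete, here is a small sketch that builds the Retry entry for the polling state. `ErrorEquals`, `IntervalSeconds`, `MaxAttempts` and `BackoffRate` are the standard Amazon States Language retry fields; the error name `JobStillRunning` is hypothetical (it would be whatever error your polling Lambda raises while the job has not finished), and `retry_policy` is a name invented for this example.

```python
def retry_policy(timeout_seconds, interval_seconds=60, backoff_rate=1.0,
                 error_name="JobStillRunning"):
    """Build a Step Functions Retry entry whose waits add up to the timeout."""
    if timeout_seconds <= 0 or interval_seconds <= 0:
        raise ValueError("timeout and interval must be positive")
    if backoff_rate < 1.0:
        raise ValueError("backoff_rate must be at least 1.0")

    attempts = 0
    waited = 0.0
    delay = float(interval_seconds)
    # Add retries until the cumulative wait reaches the timeout.
    while waited < timeout_seconds:
        waited += delay
        delay *= backoff_rate
        attempts += 1

    return {
        "ErrorEquals": [error_name],
        "IntervalSeconds": interval_seconds,
        "MaxAttempts": attempts,
        "BackoffRate": backoff_rate,
    }
```

For a one-hour timeout polled once a minute, this gives 60 attempts; with a backoff rate of 2.0 it gives 6. Once the retries are exhausted, a Catch on the same state can route to a step that terminates the job.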
There is no option to set a timeout on a Batch job, but you can set up a Lambda function that triggers every hour or so and terminates jobs created more than, say, 24 hours ago.
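A minimal sketch of what such a cleanup function might look like with boto3. It relies on `list_jobs` returning `createdAt` as a Unix timestamp in milliseconds and on `terminate_job` taking a `jobId` and `reason`; the queue name, the 24-hour threshold, the choice to look only at RUNNING jobs, and the function names are assumptions of this example.

```python
import time


def stale_job_ids(job_summaries, now_ms, max_age_seconds):
    """Return the IDs of jobs created more than max_age_seconds ago."""
    cutoff_ms = now_ms - max_age_seconds * 1000
    return [job["jobId"] for job in job_summaries
            if job["createdAt"] < cutoff_ms]


def terminate_stale_jobs(batch_client, job_queue,
                         max_age_seconds=24 * 3600, now_ms=None):
    """Terminate RUNNING jobs in job_queue that are older than the limit."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    terminated = []
    kwargs = {"jobQueue": job_queue, "jobStatus": "RUNNING"}
    while True:
        page = batch_client.list_jobs(**kwargs)
        for job_id in stale_job_ids(page["jobSummaryList"], now_ms,
                                    max_age_seconds):
            batch_client.terminate_job(jobId=job_id,
                                       reason="Exceeded maximum age")
            terminated.append(job_id)
        token = page.get("nextToken")
        if not token:
            return terminated
        kwargs["nextToken"] = token  # fetch the next page of results


# Usage in a scheduled Lambda (assumes boto3 and a queue named "my-queue"):
#   import boto3
#   def handler(event, context):
#       return terminate_stale_jobs(boto3.client("batch"), "my-queue")
```

Note that this measures age from creation, not from when the job started running, so time spent waiting in the queue counts against the limit.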
I have been working with AWS for some time now and could not find a way to set a maximum running time for Batch jobs. However, there are some alternative approaches you could use; see the AWS Forum.
Sadly, there is no way to set an execution time limit on AWS Batch. One solution may be to edit the Docker entry point so that it enforces the execution time limit itself.
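One way the entry-point idea could be sketched, assuming the image has Python available: wrap the real command and kill it if it runs past the limit. The exit code 124 simply mirrors the convention of the GNU `timeout` utility, and `run_with_limit` is a name invented for this example.

```python
import subprocess

# Exit code reported when the limit is hit (same convention as GNU timeout).
TIMEOUT_EXIT_CODE = 124


def run_with_limit(command, limit_seconds):
    """Run command, killing it if it exceeds limit_seconds.

    Returns the command's exit code, or TIMEOUT_EXIT_CODE on timeout.
    """
    try:
        # subprocess.run kills the child and waits for it when the
        # timeout expires, then raises TimeoutExpired.
        return subprocess.run(command, timeout=limit_seconds).returncode
    except subprocess.TimeoutExpired:
        return TIMEOUT_EXIT_CODE
```

The entry point script would then end with something like `sys.exit(run_with_limit(sys.argv[1:], 3600))`, so that the container exits with a non-zero code and Batch marks the job as failed.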