How do I view AWS Batch Compute Environment Errors?

We set up a Batch compute environment, job queue, and job definition. The minimum CPUs for the compute environment is set to 16, so it should always have at least one EC2 instance running. It's a MANAGED environment. It is not starting any instances, yet everything still reports as healthy. I've looked at the troubleshooting page, and nothing useful has come of it yet.

Where can I go to see what is going wrong? Is this a complete black box, so that if I make a mistake somewhere in my config (probably some kind of ARN permissions problem), I have to scan every line until I happen to spot the mistake?

Astyanax answered 12/9, 2018 at 18:48 Comment(0)

The answer: look at the EC2 Auto Scaling groups. There should be an Auto Scaling group named after the compute environment. That group is created and managed by the Batch compute environment, and any errors from launching EC2 instances should appear in its activity history.
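If you would rather check this from a script than from the console, here is a minimal sketch using boto3. It assumes the Auto Scaling group name contains the compute environment name (the answer only says the group is "named after" it, so the substring match is a guess), and the function names `failed_activities` and `print_batch_asg_errors` are hypothetical helpers, not part of any AWS API.

```python
def failed_activities(activities):
    """Keep only scaling activities that did not succeed."""
    return [a for a in activities if a.get("StatusCode") in ("Failed", "Cancelled")]


def print_batch_asg_errors(compute_env_name, region_name=None):
    """Print recent failed scaling activities for matching Auto Scaling groups.

    Assumption: the group name contains the compute environment name.
    """
    import boto3  # imported here so failed_activities works without boto3

    client = boto3.client("autoscaling", region_name=region_name)
    paginator = client.get_paginator("describe_auto_scaling_groups")
    for page in paginator.paginate():
        for group in page["AutoScalingGroups"]:
            name = group["AutoScalingGroupName"]
            if compute_env_name not in name:
                continue
            resp = client.describe_scaling_activities(
                AutoScalingGroupName=name, MaxRecords=20
            )
            for act in failed_activities(resp["Activities"]):
                # StatusMessage usually carries the launch error text
                print(name, act["StatusCode"], act.get("StatusMessage", ""))
```

Calling `print_batch_asg_errors("my-compute-env")` (a placeholder name) with credentials that allow `autoscaling:Describe*` should list the same errors you would see in the group's activity history.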

Astyanax answered 12/9, 2018 at 18:59 Comment(2)
So, we found this by looking for potential errors in the ARN (we were using the default recommended one) and noticing that it granted permissions for autoscaling. It seems like the troubleshooting page ought to mention looking in the Auto Scaling group for errors; it could have saved us (and maybe others) a lot of time. - Astyanax
I discovered this recently when an admin deleted the AMI we were using for AWS Batch. No new instances were being created. Looking at the errors for the Auto Scaling group, the source of the problem was obvious. - Littlejohn
