GitLab Runner suddenly fails to run jobs using Docker Machine and AWS Autoscaling

I use GitLab Runner for running CI jobs on AWS EC2 spot instances, using its autoscaling feature with Docker Machine.

All of a sudden, today GitLab CI failed to run jobs and shows me the following job output for all jobs that I want to start:

Running with gitlab-runner 14.9.1 (f188edd7)
  on AWS EC2 runner ...
Preparing the "docker+machine" executor
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Job failed (system failure): exit status 1

I see in the AWS console that the EC2 instances do get created, but the instances always get stopped immediately by GitLab Runner again.

The GitLab Runner system logs show me the following errors:

ERROR: Machine creation failed                      error=exit status 1 name=runner-eauzytys-gitlab-ci-1651050768-f84b471e time=1m2.409578844s
ERROR: Error creating machine: Error running provisioning: error installing docker:   driver=amazonec2 name=runner-xxxxxxxx-gitlab-ci-1651050768-f84b471e operation=create

So the error seems to be related to Docker Machine. Upgrading GitLab Runner and GitLab's Docker Machine fork to the newest versions does not fix the error. I'm using GitLab 14.8 and have tried GitLab Runner 14.9 and 14.10.

What can be the reason for this?

Exaggeration answered 27/4, 2022 at 9:44 Comment(0)

Update:

In the meantime, GitLab have released a new version of their Docker Machine fork which upgrades the default AMI to Ubuntu 20.04. That means that upgrading Docker Machine to the latest version released by GitLab will fix the issue without changing your runner configuration. The latest release can be found here.

Original Workaround/fix:

Explicitly specify the AMI in your runner configuration and do not rely on the default one anymore, i.e. add something like "amazonec2-ami=ami-02584c1c9d05efa69" to your MachineOptions:

MachineOptions = [
  "amazonec2-access-key=xxx",
  "amazonec2-secret-key=xxx",
  "amazonec2-region=eu-central-1",
  "amazonec2-vpc-id=vpc-xxx",
  "amazonec2-subnet-id=subnet-xxx",
  "amazonec2-use-private-address=true",
  "amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true",
  "amazonec2-security-group=ci-runners",
  "amazonec2-instance-type=m5.large",
  "amazonec2-ami=ami-02584c1c9d05efa69",  # Ubuntu 20.04 for amd64 in eu-central-1
  "amazonec2-request-spot-instance=true",
  "amazonec2-spot-price=0.045"
]
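To check whether an existing runner configuration already pins an AMI, the MachineOptions list can be inspected with a few lines of Python. This is only a minimal sketch: the helper names parse_machine_options and explicit_ami are made up for illustration, and the option strings are assumed to have been copied out of config.toml by hand.

```python
def parse_machine_options(options):
    """Turn "key=value" strings into a dict.

    Each entry is split on the first "=" only, so values that contain
    commas or further "=" signs (such as amazonec2-tags) stay intact.
    """
    parsed = {}
    for entry in options:
        key, sep, value = entry.partition("=")
        if not sep:
            raise ValueError(f"not a key=value option: {entry!r}")
        parsed[key] = value
    return parsed


def explicit_ami(options):
    """Return the pinned AMI ID, or None if the driver default would apply."""
    return parse_machine_options(options).get("amazonec2-ami") or None


if __name__ == "__main__":
    # Shortened copy of the MachineOptions shown above.
    machine_options = [
        "amazonec2-region=eu-central-1",
        "amazonec2-instance-type=m5.large",
        "amazonec2-ami=ami-02584c1c9d05efa69",
    ]
    ami = explicit_ami(machine_options)
    print(ami if ami else "no explicit AMI: the driver default will be used")
```

If explicit_ami returns None, the runner falls back to whatever default AMI the installed Docker Machine version ships with.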

You can get a list of Ubuntu AMI IDs here. Be sure to select one that fits your AWS region and instance architecture and is supported by Docker.
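If you prefer to look up the AMI programmatically, something along these lines might work. It is a sketch under assumptions, not a tested recipe: it needs the third-party boto3 package and AWS credentials, it assumes Canonical's owner ID is 099720109477, and it assumes the Ubuntu 20.04 image names follow the pattern used in the filter. Verify both against the AMI list before relying on the result. The function names are made up for illustration.

```python
def newest_image_id(images):
    """Pick the most recently created image from describe-images style records."""
    if not images:
        return None
    # CreationDate is an ISO 8601 UTC timestamp in a fixed format,
    # so comparing the strings orders the images by time.
    return max(images, key=lambda image: image["CreationDate"])["ImageId"]


def find_ubuntu_focal_ami(region, arch="amd64"):
    """Query EC2 for Ubuntu 20.04 images in one region (needs boto3 + credentials)."""
    import boto3  # third-party; imported here so the helper above works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_images(
        Owners=["099720109477"],  # assumed to be Canonical's AWS account ID
        Filters=[
            {
                "Name": "name",
                # Assumed naming pattern for Ubuntu 20.04 (focal) server images.
                "Values": [f"ubuntu/images/hvm-ssd/ubuntu-focal-20.04-{arch}-server-*"],
            },
            {"Name": "state", "Values": ["available"]},
        ],
    )
    return newest_image_id(response["Images"])
```

Calling find_ubuntu_focal_ami("eu-central-1") would then return an AMI ID for that region, which can go into the amazonec2-ami option. Pass arch="arm64" if your instance type is ARM-based.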

Explanation:

The default AMI that GitLab Runner / the Docker Machine EC2 driver use is Ubuntu 16.04. The install script for Docker, which is available on https://get.docker.com/ and which Docker Machine relies on, seems to have stopped supporting Ubuntu 16.04 recently. Thus, the installation of Docker fails on the EC2 instance spawned by Docker Machine and the job cannot run.
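To confirm which release a spawned instance is actually running, you can read /etc/os-release on it and compare the result with the releases you know to be affected. The sketch below only encodes what is stated above (Ubuntu 16.04 is affected); the UNSUPPORTED set and the function names are illustrative assumptions, not something taken from the Docker install script.

```python
# Releases assumed to be rejected by the install script. Only Ubuntu 16.04
# comes from the explanation above; extend this set yourself if needed.
UNSUPPORTED = {("ubuntu", "16.04")}


def parse_os_release(text):
    """Parse the KEY=value lines of an /etc/os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")
    return fields


def docker_install_likely_fails(os_release_text, unsupported=UNSUPPORTED):
    """Return True if the distro and version pair is in the unsupported set."""
    fields = parse_os_release(os_release_text)
    return (fields.get("ID"), fields.get("VERSION_ID")) in unsupported


if __name__ == "__main__":
    with open("/etc/os-release") as handle:
        print(docker_install_likely_fails(handle.read()))
```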

See also this GitLab issue.

Azure and GCP suffer from similar problems.

Exaggeration answered 27/4, 2022 at 9:44 Comment(6)
Just to make this easier: if you want to use the same Ubuntu image that your runner manager is running on, you can find its AMI under EC2 -> Instances -> instance ID (of the manager) -> Details -> AMI ID.Besprent
I've updated to the 20.04 AMI however it's still not working. It's creating the machines then shutting them down in a loop. I've also tried the 18 AMI but have the same problem. Has anyone else come across this?Montmartre
@Montmartre are you sure that you selected an AMI with the correct AWS region and architecture?Exaggeration
@Exaggeration - Yes - the runners are starting up with the correct AMI and I can see them in a docker-machine ls. However they never complete the initialisation and gitlab-runner shuts them down to start new ones. I've started them manually but it appears as if docker never gets installed as docker-machine can't connect!Montmartre
@Montmartre GitLab have released an updated version of Docker Machine which fixes the issue. I updated the answer accordingly. You could try to upgrade Docker Machine and use the default AMI again.Exaggeration
@Exaggeration - Just tried the update and still getting the same issue!Montmartre

Make sure to select an AMI for Ubuntu, not Debian, and that your AWS account is subscribed to it.

What I did:

  1. Subscribe in the AWS Marketplace to an Ubuntu Amazon Machine Image (Ubuntu 20.04 LTS - Focal).
  2. Select "Launch instance", choose the region, and copy the AMI ID shown.
Pirtle answered 27/4, 2022 at 18:14 Comment(0)

I had the same issue since yesterday.

It could be related to GitLab releasing 15.0 with breaking changes (going live on GitLab.com sometime between April 23 and May 22).

Adding the AMI field (amazonec2-ami) to MachineOptions solved the issue on my side.

Glucinum answered 27/4, 2022 at 13:16 Comment(1)
I'm still on GitLab 14, so I assume that's not the issue.Exaggeration

As Moritz pointed out:

Adding:

MachineOptions = [
  "amazonec2-ami=ami-02584c1c9d05efa69",
]

solves the issue.

Shiloh answered 28/4, 2022 at 7:38 Comment(0)

Just wanted to add: go here for the Ubuntu AMI that corresponds to your region. AMIs are region-specific.

Estop answered 28/4, 2022 at 10:51 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.