cloud-init: delay disk_setup and fs_setup

I have a cloud-init file that sets up all the requirements for our AWS instances, and part of those requirements is formatting and mounting an EBS volume. The issue is that on some instances the volume is attached only after the instance is up, so when cloud-init runs, the device /dev/xvdf does not yet exist and the setup fails.

I have something like:

#cloud-config

resize_rootfs: false
disk_setup:
    /dev/xvdf:
        table_type: 'gpt'
        layout: true
        overwrite: false

fs_setup:
    - label: DATA
      filesystem: 'ext4'
      device: '/dev/xvdf'
      partition: 'auto'

mounts:
    - [xvdf, /data, auto, "defaults,discard", "0", "0"]

I would like to add something like a sleep 60 before the disk configuration block.

If the whole cloud-init execution can be delayed, that would also work for me.

Also, I'm using Terraform to create the infrastructure.

Thanks!

Oidium answered 25/9, 2020 at 22:15 Comment(1)
Actually: attached devices are, from a system perspective, hotplug devices, so you can utilize udev rules. Most distributions already have rules that create links the way Amazon Linux does, also for other cloud providers. Be warned that disk_setup is not available with all distributions; you may need to add the module manually (it is available in the cloud-init source). A restart is simpler and, for that reason, more practical. If worried about the hackish nature: udev's hotplug handler provides the trigger when a disk is attached (attaching is an additional AWS call at an unknown time...) – Dichotomy
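
A minimal sketch of such a udev rule (the device name comes from the question; the handler script path is hypothetical, and udev kills long-running RUN+= programs, so a real handler would hand off to systemd):

# /etc/udev/rules.d/99-ebs-data.rules (hypothetical path)
SUBSYSTEM=="block", KERNEL=="xvdf", ACTION=="add", RUN+="/usr/local/bin/setup-data-disk.sh"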

I was able to resolve the issue with two changes:

  1. Changed the mount options, adding the nofail option.
  2. Added a line to the runcmd block, deleting the semaphore file for disk_setup.

So my new cloud-init file now looks like this:

#cloud-config

resize_rootfs: false
disk_setup:
    /dev/xvdf:
        table_type: 'gpt'
        layout: true
        overwrite: false

fs_setup:
    - label: DATA
      filesystem: 'ext4'
      device: '/dev/xvdf'
      partition: 'auto'

mounts:
    - [xvdf, /data, auto, "defaults,discard", "0", "0"]
    
runcmd:
    # use the string form so the shell expands the * glob (the list form quotes each element)
    - rm -f /var/lib/cloud/instances/*/sem/config_disk_setup

power_state:
    mode: reboot
    timeout: 30

It will reboot, then it will execute the disk_setup module once more. By this time, the volume will be attached so the operation won't fail.
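
If you want to confirm that the module actually re-ran, the semaphore path from the runcmd above can be inspected after the reboot (a quick check, not part of the original config):

ls -l /var/lib/cloud/instances/*/sem/config_disk_setup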

I guess this is kind of a hacky way to solve this, so if someone has a better answer (like how to delay the whole cloud-init execution) please share it.

Oidium answered 28/9, 2020 at 20:25 Comment(1)
How would this work for non-NVMe disks? – Alible

cloud-init does have an option for running ad-hoc commands; have a look at this link:

https://cloudinit.readthedocs.io/en/latest/topics/modules.html?highlight=runcmd#runcmd

Not sure what your code looks like, but I just passed the below as user_data in AWS and could see that the init script slept for 1000 seconds (I just added a couple of echo statements to check later). You could also add a little more logic to verify the presence of the volume; a rough sketch follows the script below.

#cloud-config

runcmd:
 - [ sh, -c, "echo before sleep:`date` >> /tmp/user_data.log" ]
 - [ sh, -c, "sleep 1000" ]
 - [ sh, -c, "echo after sleep:`date` >> /tmp/user_data.log" ]
 
<Rest of the script> 
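
A rough sketch of that extra logic, polling for the device instead of sleeping a fixed time (device name taken from the question; the 5-minute cap is an assumption):

#cloud-config

runcmd:
 - [ sh, -c, "for i in $(seq 1 60); do [ -b /dev/xvdf ] && break; sleep 5; done" ]
 - [ sh, -c, "[ -b /dev/xvdf ] && echo volume present:`date` >> /tmp/user_data.log" ]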
Kulturkampf answered 28/9, 2020 at 1:40 Comment(2)
Forgot to mention that I'd like to keep the formatting within the cloud-init space, not relying on commands, since I was given the task of translating the old shell scripts into cloud-init. Anyway, if it's not possible to delay this, I'll have to give it a try. ;) – Oidium
Sure, post back in case you happen to find any different options. Also, runcmd is still part of cloud-init; it's just that you will still be executing Linux commands (but no external shell scripts... so tidy to some extent, at least). Cheers, good luck! – Kulturkampf

I know this already has an accepted answer, but I just went through this exercise and solved it a slightly different way: by waiting for the disk instead of rebooting and running again. Here is my solution:

#cloud-config
bootcmd:
  - |
    timeout 30s sh -c 'while [ ! -e /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_${volid} ]; do sleep 1; done'
device_aliases:
  my_data: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_${volid}
disk_setup:
  my_data:
    table_type: gpt
    layout: true
    overwrite: false
fs_setup:
  - label: my_data
    filesystem: xfs
    partition: any
    device: my_data
    overwrite: false
mounts:
  - [my_data, /opt/splunk, xfs]

My provisioner (in this case Terraform) replaces ${volid} with the volume ID that I expect to be attached to the instance (which comes from an expression like replace(aws_ebs_volume.splunk_data[count.index].id, "-", "")). This may be helpful to someone as an alternative way of achieving the goal.
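
For example, vol-0a250869ccd411b30 becomes vol0a250869ccd411b30 once the dashes are removed, so on the instance you can sanity-check the resulting path (the volume ID here is made up for illustration):

ls -l /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30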

Purificator answered 23/1 at 18:40 Comment(1)
I believe the bootcmd that waits for the device can be replaced by a mount option like [my_data, /opt/splunk, xfs, "defaults,x-systemd.device-timeout=30"]; see the systemd.mount documentation. – Prevailing

Building up from this other answer.

There are two solutions, depending on the cloud-init version:

  • cloud-init >= 24.2, using device_aliases, disk_setup, fs_setup and mounts with x-systemd.device-timeout
  • cloud-init < 24.2, using disk_setup and mounts with x-systemd.device-timeout and x-systemd.makefs (these options substitute for fs_setup, which does not work with NVMe partitions on cloud-init < 24.2)

cloud-init >= 24.2

If you can use cloud-init 24.2 (released July 2024) you can partition, format and mount EBS volumes (that are exposed as NVMe in AWS Nitro instances, see Amazon EBS and NVMe) like this (tested on Fedora 41 Rawhide 20240711):

#cloud-config
device_aliases:
  disk1: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30
disk_setup:
  disk1:
    table_type: gpt
    layout: [50,25,25]
    overwrite: false
fs_setup:
  - label: disk1-earth
    filesystem: xfs
    device: disk1
    partition: 1
  - label: disk1-mars
    filesystem: xfs
    device: disk1
    partition: 2
  - label: disk1-venus
    filesystem: xfs
    device: disk1
    partition: 3
mounts:
  - [ LABEL=disk1-earth, /earth, xfs, "defaults,nofail,x-systemd.device-timeout=30"]
  - [ LABEL=disk1-mars,  /mars, xfs, "defaults,nofail,x-systemd.device-timeout=30"]
  - [ LABEL=disk1-venus, /venus, xfs, "defaults,nofail,x-systemd.device-timeout=30"]
mount_default_fields: [ None, None, "auto", "defaults,nofail", "0", "2"]

cloud-init 24.2 is required if you want to partition the disk, since previous versions do not work with NVMe (see #5246, which was fixed by #5263 and released in cloud-init 24.2).

If you don't need several partitions you can use any reasonably recent cloud-init.

The x-systemd.device-timeout=30 in the mount options tells systemd to wait up to 30 seconds for the device to become available, providing the delay requested by the OP.
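
For reference, cloud-init writes these mounts to /etc/fstab; the first entry should look roughly like this (cloud-init appends a comment=cloudconfig marker to entries it manages, and the trailing fields come from mount_default_fields):

LABEL=disk1-earth /earth xfs defaults,nofail,x-systemd.device-timeout=30,comment=cloudconfig 0 2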

You can verify the proper partitioning, formatting and mount afterwards with the following commands

sudo blkid -s LABEL
lsblk -o name,size,mountpoint,label
findmnt --fstab

cloud-init < 24.2

If your distro does not have cloud-init 24.2, you can't use fs_setup for NVMe with partitions (see bug #5246, which was fixed by #5263 and released in cloud-init 24.2).

Since you can't use fs_setup, you need to use x-systemd.makefs in the mount options. fs_setup also served to assign a filesystem label, and that you can't do via mount options, so you "lose" the ability to give the filesystem a label (a possible workaround is sketched after the configuration below).

#cloud-config
device_aliases:
  disk1: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30
disk_setup:
  disk1:
    table_type: gpt
    layout: [50,25,25]
    overwrite: false
mounts:
  - [ /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30-part1, /earth, xfs, "defaults,nofail,x-systemd.device-timeout=30s,x-systemd.makefs"]
  - [ /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30-part2, /mars, xfs, "defaults,nofail,x-systemd.device-timeout=30s,x-systemd.makefs"]
  - [ /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30-part3, /venus, xfs, "defaults,nofail,x-systemd.device-timeout=30s,x-systemd.makefs"]
mount_default_fields: [ None, None, "auto", "defaults,nofail", "0", "2"]
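
If you do need labels, a possible workaround (not part of the original setup) is to set them once the instance is up, since xfs_admin can only label an unmounted filesystem:

sudo umount /earth
sudo xfs_admin -L disk1-earth /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30-part1
sudo mount /earth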

You can verify the proper partitioning, formatting and mount afterwards with the following commands

lsblk -o name,size,mountpoint,label
findmnt --fstab
Prevailing answered 12/7 at 7:35 Comment(0)
