How to give permissions to AKS to access ACR via terraform?

How can I allow a Kubernetes cluster in Azure to talk to an Azure Container Registry via terraform?

I want to load custom images from my Azure Container Registry. Unfortunately, I encounter a permissions error at the point where Kubernetes is supposed to download the image from the ACR.
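For reference, the image is pulled by a workload roughly like the following (a simplified sketch only; my exact deployment is omitted, so the resource and label names here are illustrative):

resource "kubernetes_deployment" "myunittests" {
  metadata {
    name      = "myunittests"
    namespace = kubernetes_namespace.namesp.metadata[0].name
  }
  spec {
    replicas = 1
    selector {
      match_labels = { app = "myunittests" }
    }
    template {
      metadata {
        labels = { app = "myunittests" }
      }
      spec {
        container {
          name  = "myunittests"
          # this image lives in the ACR that AKS needs permission to pull from
          image = "mycontainerregistry.azurecr.io/myunittests:latest"
        }
      }
    }
  }
}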

What I have tried so far

My experiments without terraform (az cli)

It all works perfectly after I attach the ACR to the AKS cluster via the az CLI:

az aks update -n myAKSCluster -g myResourceGroup --attach-acr <acrName>

My experiments with terraform

This is my Terraform configuration; I have stripped out some unrelated parts. It applies successfully on its own.

terraform {
  backend "azurerm" {
    resource_group_name  = "tf-state"
    storage_account_name = "devopstfstate"
    container_name       = "tfstatetest"
    key                  = "prod.terraform.tfstatetest"
  }
}

provider "azurerm" {
}
provider "azuread" {
}
provider "random" {
}

# define the password
resource "random_string" "password" {
  length  = 32
  special = true
}

# define the resource group
resource "azurerm_resource_group" "rg" {
        name = "myrg"
        location = "eastus2"
}

# define the app
resource "azuread_application" "tfapp" {
  name                       = "mytfapp"
}

# define the service principal
resource "azuread_service_principal" "tfapp" {
  application_id = azuread_application.tfapp.application_id
}

# define the service principal password
resource "azuread_service_principal_password" "tfapp" {
  service_principal_id = azuread_service_principal.tfapp.id
  end_date = "2020-12-31T09:00:00Z"
  value = random_string.password.result
}

# define the container registry
resource "azurerm_container_registry" "acr" {
  name                     = "mycontainerregistry2387987222"
  resource_group_name      = azurerm_resource_group.rg.name
  location                 = azurerm_resource_group.rg.location
  sku                      = "Basic"
  admin_enabled            = false
}

# define the kubernetes cluster
resource "azurerm_kubernetes_cluster" "mycluster" {
  name                = "myaks"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "mycluster"
  network_profile {
    network_plugin      = "azure"
  }

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_B2s"
  }
  # Use the service principal created above
  service_principal {
    client_id     = azuread_service_principal.tfapp.application_id
    client_secret = azuread_service_principal_password.tfapp.value
  }
  tags = {
    Environment = "demo"
  }
  windows_profile {
    admin_username = "dingding"
    admin_password = random_string.password.result
  }
}

# define the windows node pool for kubernetes
resource "azurerm_kubernetes_cluster_node_pool" "winpool" {
  name                  = "winp"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.mycluster.id
  vm_size               = "Standard_B2s"
  node_count            = 1
  os_type               = "Windows"
}

# define the kubernetes name space
resource "kubernetes_namespace" "namesp" {
  metadata {
    name = "namesp"
  }
}

# Try to give permissions, so that AKS can access the ACR
resource "azurerm_role_assignment" "acrpull_role" {
  scope                            = azurerm_container_registry.acr.id
  role_definition_name             = "AcrPull"
  principal_id                     = azuread_service_principal.tfapp.object_id
  skip_service_principal_aad_check = true
}

This code is adapted from https://github.com/terraform-providers/terraform-provider-azuread/issues/104.

Unfortunately, when I launch a container inside the kubernetes cluster, I receive an error message:

Failed to pull image "mycontainerregistry.azurecr.io/myunittests": [rpc error: code = Unknown desc = Error response from daemon: manifest for mycontainerregistry.azurecr.io/myunittests:latest not found: manifest unknown: manifest unknown, rpc error: code = Unknown desc = Error response from daemon: Get https://mycontainerregistry.azurecr.io/v2/myunittests/manifests/latest: unauthorized: authentication required]

Update / note:

When I run terraform apply with the above code, the creation of resources is interrupted:

azurerm_container_registry.acr: Creation complete after 18s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222]
azurerm_role_assignment.acrpull_role: Creating...
azuread_service_principal_password.tfapp: Still creating... [10s elapsed]
azuread_service_principal_password.tfapp: Creation complete after 12s [id=000/000]
azurerm_kubernetes_cluster.mycluster: Creating...
azurerm_role_assignment.acrpull_role: Creation complete after 8s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222/providers/Microsoft.Authorization/roleAssignments/000]
azurerm_kubernetes_cluster.mycluster: Still creating... [10s elapsed]

Error: Error creating Managed Kubernetes Cluster "myaks" (Resource Group "myrg"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="ServicePrincipalNotFound" Message="Service principal clientID: 000 not found in Active Directory tenant 000, Please see https://aka.ms/aks-sp-help for more details."

  on test.tf line 56, in resource "azurerm_kubernetes_cluster" "mycluster":
  56: resource "azurerm_kubernetes_cluster" "mycluster" {

I think, however, that this is just because it takes a few minutes for the newly created service principal to propagate in Azure AD. When I run terraform apply again a few minutes later, it gets past that point without issues.
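A possible workaround for this propagation delay (a sketch, not part of my original config; it assumes the hashicorp/time provider) is to insert an explicit wait between creating the service principal password and creating the cluster:

# wait for the newly created service principal to propagate in Azure AD
resource "time_sleep" "wait_for_sp" {
  depends_on      = [azuread_service_principal_password.tfapp]
  create_duration = "60s"
}

# and make the cluster wait for it, e.g.:
# resource "azurerm_kubernetes_cluster" "mycluster" {
#   depends_on = [time_sleep.wait_for_sp]
#   ...
# }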

Adler answered 30/1, 2020 at 1:52 Comment(4)
This looks fine. Are you using a pull secret by any chance? And just to clarify, this runs without any errors? You might want to change the scope to azurerm_container_registry.acr.id, but it should be fine both ways, tbh. (Dispensation)
I had to modify it slightly to run it in isolation; code updated. I have also added a note about an interruption that occurs during the terraform apply run after the service principal is created. I have changed the scope as you suggested, but the image is still not pulled. :( (Adler)
Yay, it actually does work with the modifications. I had to completely terraform destroy the resources and re-create them, and everything was fine then (the same thing did not work before the changes were applied). Thanks! (Adler)
It might have been the object_id that was missing. (Adler)

This code worked for me.


resource "azuread_application" "aks_sp" {
  name = "sp-aks-${local.cluster_name}"
}

resource "azuread_service_principal" "aks_sp" {
  application_id               = azuread_application.aks_sp.application_id
  app_role_assignment_required = false
}

resource "azuread_service_principal_password" "aks_sp" {
  service_principal_id = azuread_service_principal.aks_sp.id
  value                = random_string.aks_sp_password.result
  end_date_relative    = "8760h" # 1 year

  lifecycle {
    ignore_changes = [
      value,
      end_date_relative
    ]
  }
}

resource "azuread_application_password" "aks_sp" {
  application_object_id = azuread_application.aks_sp.id
  value                 = random_string.aks_sp_secret.result
  end_date_relative     = "8760h" # 1 year

  lifecycle {
    ignore_changes = [
      value,
      end_date_relative
    ]
  }
}

data "azurerm_container_registry" "pyp" {
  name                = var.container_registry_name
  resource_group_name = var.container_registry_resource_group_name
}

resource "azurerm_role_assignment" "aks_sp_container_registry" {
  scope                = data.azurerm_container_registry.pyp.id
  role_definition_name = "AcrPull"
  principal_id         = azuread_service_principal.aks_sp.object_id
}

# requires Azure Provider 1.37+
resource "azurerm_kubernetes_cluster" "pyp" {
  name                = local.cluster_name
  location            = azurerm_resource_group.pyp.location
  resource_group_name = azurerm_resource_group.pyp.name
  dns_prefix          = local.env_name_nosymbols
  kubernetes_version  = local.kubernetes_version

  default_node_pool {
    name            = "default"
    node_count      = 1
    vm_size         = "Standard_D2s_v3"
    os_disk_size_gb = 80
  }

  windows_profile {
    admin_username = "winadm"
    admin_password = random_string.windows_profile_password.result
  }

  network_profile {
    network_plugin     = "azure"
    dns_service_ip     = cidrhost(local.service_cidr, 10)
    docker_bridge_cidr = "172.17.0.1/16"
    service_cidr       = local.service_cidr
    load_balancer_sku  = "standard"
  }

  service_principal {
    client_id     = azuread_service_principal.aks_sp.application_id
    client_secret = random_string.aks_sp_password.result
  }

  addon_profile {
    oms_agent {
      enabled                    = true
      log_analytics_workspace_id = azurerm_log_analytics_workspace.pyp.id
    }
  }

  tags = local.tags
}

Source: https://github.com/giuliov/pipeline-your-pipelines/tree/master/src/kubernetes/terraform

Momentum answered 30/1, 2020 at 9:59 Comment(1)
Yes; worked! I had to destroy the entire resource set and re-apply for it to become effective. This took a while; I needed to find a good time window, hence the late reply. Thank you! (Adler)

(I did upvote the answer above.)

Just adding a simpler approach, for anyone else who might need it, where you don't have to create a service principal at all.

resource "azurerm_kubernetes_cluster" "kubweb" {
  name                = local.cluster_web
  location            = local.rgloc
  resource_group_name = local.rgname
  dns_prefix          = "${local.cluster_web}-dns"
  kubernetes_version  = local.kubversion

  # used to group all the internal objects of this cluster
  node_resource_group = "${local.cluster_web}-rg-node"

  # azure will assign the id automatically
  identity {
    type = "SystemAssigned"
  }

  default_node_pool {
    name                 = "nodepool1"
    node_count           = 4
    vm_size              = local.vm_size
    orchestrator_version = local.kubversion
  }

  role_based_access_control {
    enabled = true
  }

  addon_profile {
    kube_dashboard {
      enabled = true
    }
  }

  tags = {
    environment = local.env
  }
}

resource "azurerm_container_registry" "acr" {
  name                = "acr1"
  resource_group_name = local.rgname
  location            = local.rgloc
  sku                 = "Standard"
  admin_enabled       = true

  tags = {
    environment = local.env
  }
}

# add the role to the identity the kubernetes cluster was assigned
resource "azurerm_role_assignment" "kubweb_to_acr" {
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_kubernetes_cluster.kubweb.kubelet_identity[0].object_id
}
Vhf answered 15/11, 2020 at 4:44 Comment(2)
It's so funky that the cluster data resource has like 3-4 principal IDs. This seems to be the right one, though. At least it matches the one from az aks show --resource-group groupName --name aksName --query identityProfile.kubeletidentity.objectId, which other people also state is the right one. (Corrugation)
This should be the correct answer. (Zabaglione)

The Terraform documentation for the Azure Container Registry resource now includes an example for exactly this, which should always be up to date.

https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/container_registry#example-usage-attaching-a-container-registry-to-a-kubernetes-cluster

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_container_registry" "example" {
  name                = "containerRegistry1"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks1"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "exampleaks1"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2_v2"
  }

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = "Production"
  }
}

resource "azurerm_role_assignment" "example" {
  principal_id                     = azurerm_kubernetes_cluster.example.kubelet_identity[0].object_id
  role_definition_name             = "AcrPull"
  scope                            = azurerm_container_registry.example.id
  skip_service_principal_aad_check = true
}
Dysphemism answered 20/4, 2022 at 9:26 Comment(1)
Thanks for this clarity. This worked. I didn't find the kubelet_identity[0].object_id output in the AKS documentation - registry.terraform.io/providers/hashicorp/azurerm/latest/docs/…. Found it here - registry.terraform.io/providers/hashicorp/azurerm/latest/docs/… (Spiky)

Just want to go into more depth, as this was something I struggled with as well.

The recommended approach is to use a Managed Identity instead of a Service Principal, since it involves less overhead.

Create a Container Registry:

 resource "azurerm_container_registry" "acr" {
  name                = "acr"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "Standard"
  admin_enabled       = false
}

Create an AKS cluster. The code below creates the cluster with two identities:

  1. A System Assigned Identity, which is assigned to the Control Plane.
  2. A User Assigned Managed Identity, which is also created automatically and assigned to the Kubelet; notice there is no specific code for that, as it happens automatically.

The Kubelet is the process that pulls the image from the Container Registry, so we need to make sure this User Assigned Managed Identity has the AcrPull role on the Container Registry.

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "aks"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  dns_prefix          = "aks"
  node_resource_group = "aks-node"
 
  default_node_pool {
    name                = "default"
    node_count          = 1
    vm_size             = "Standard_DS2_v2"
    enable_auto_scaling = false
    type                = "VirtualMachineScaleSets"
    vnet_subnet_id      = azurerm_subnet.aks_subnet.id
    max_pods            = 50
  }
 
  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "Standard"
  }
 
  identity {
    type = "SystemAssigned"
  }
}

Create the role assignment mentioned above to allow the User Assigned Managed Identity to pull from the Container Registry.

resource "azurerm_role_assignment" "ra" {
  principal_id                     =  azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
  role_definition_name             = "AcrPull"
  scope                            = azurerm_container_registry.acr.id
  skip_service_principal_aad_check = true
}
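If you want to double-check which identity actually received the role, one option is to output the kubelet identity's object ID and compare it with the principal of the role assignment (a minimal sketch; the output name is illustrative):

# expose the kubelet's user-assigned identity for verification,
# e.g. against `az aks show ... --query identityProfile.kubeletidentity.objectId`
output "kubelet_object_id" {
  value = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
}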

Hope that clears things up, as I have seen some confusion on the internet about the two identities that get created.

Source: https://jimferrari.com/2022/02/09/attach-azure-container-registry-to-azure-kubernetes-service-terraform/

Foregoing answered 9/2, 2022 at 12:31 Comment(1)
In my case (the same setup as described above) it does not work. It only works when I run this one-liner: az aks update -n myAKSCluster -g myResourceGroup --attach-acr <acr-name> (Adversity)
