Question and details
How can I allow a Kubernetes cluster in Azure (AKS) to talk to an Azure Container Registry (ACR) via Terraform?
I want to load custom images from my Azure Container Registry. Unfortunately, I get a permissions error at the point where Kubernetes is supposed to pull the image from the ACR.
What I have tried so far
My experiments without terraform (az cli)
Everything works perfectly once I attach the ACR to the AKS cluster via the Azure CLI:
az aks update -n myAKSCluster -g myResourceGroup --attach-acr <acrName>
My experiments with terraform
This is my Terraform configuration, with unrelated parts stripped out. The configuration itself works.
terraform {
  backend "azurerm" {
    resource_group_name  = "tf-state"
    storage_account_name = "devopstfstate"
    container_name       = "tfstatetest"
    key                  = "prod.terraform.tfstatetest"
  }
}

provider "azurerm" {
}

provider "azuread" {
}

provider "random" {
}
# define the password
resource "random_string" "password" {
  length  = 32
  special = true
}

# define the resource group
resource "azurerm_resource_group" "rg" {
  name     = "myrg"
  location = "eastus2"
}

# define the app
resource "azuread_application" "tfapp" {
  name = "mytfapp"
}

# define the service principal
resource "azuread_service_principal" "tfapp" {
  application_id = azuread_application.tfapp.application_id
}

# define the service principal password
resource "azuread_service_principal_password" "tfapp" {
  service_principal_id = azuread_service_principal.tfapp.id
  end_date             = "2020-12-31T09:00:00Z"
  value                = random_string.password.result
}

# define the container registry
resource "azurerm_container_registry" "acr" {
  name                = "mycontainerregistry2387987222"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "Basic"
  admin_enabled       = false
}
# define the kubernetes cluster
resource "azurerm_kubernetes_cluster" "mycluster" {
  name                = "myaks"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "mycluster"

  network_profile {
    network_plugin = "azure"
  }

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_B2s"
  }

  # Use the service principal created above
  service_principal {
    client_id     = azuread_service_principal.tfapp.application_id
    client_secret = azuread_service_principal_password.tfapp.value
  }

  tags = {
    Environment = "demo"
  }

  windows_profile {
    admin_username = "dingding"
    admin_password = random_string.password.result
  }
}

# define the windows node pool for kubernetes
resource "azurerm_kubernetes_cluster_node_pool" "winpool" {
  name                  = "winp"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.mycluster.id
  vm_size               = "Standard_B2s"
  node_count            = 1
  os_type               = "Windows"
}

# define the kubernetes namespace
resource "kubernetes_namespace" "namesp" {
  metadata {
    name = "namesp"
  }
}
# Try to grant the AKS cluster permission to pull from the ACR
resource "azurerm_role_assignment" "acrpull_role" {
  scope                            = azurerm_container_registry.acr.id
  role_definition_name             = "AcrPull"
  principal_id                     = azuread_service_principal.tfapp.object_id
  skip_service_principal_aad_check = true
}
This code is adapted from https://github.com/terraform-providers/terraform-provider-azuread/issues/104.
Unfortunately, when I launch a container inside the kubernetes cluster, I receive an error message:
Failed to pull image "mycontainerregistry.azurecr.io/myunittests": [rpc error: code = Unknown desc = Error response from daemon: manifest for mycontainerregistry.azurecr.io/myunittests:latest not found: manifest unknown: manifest unknown, rpc error: code = Unknown desc = Error response from daemon: Get https://mycontainerregistry.azurecr.io/v2/myunittests/manifests/latest: unauthorized: authentication required]
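An "authentication required" response like this one usually suggests that the identity the cluster pulls with has no effective AcrPull role on the registry. Role assignments are keyed by the principal's object ID, not by its application (client) ID, so one way to check the wiring is to expose both IDs next to the principal the assignment actually targets. A minimal sketch, assuming the resource names from the configuration above; the output names are made up for illustration:

```hcl
# Hypothetical debugging outputs; the names are illustrative only.

# The principal that actually received the AcrPull role.
output "acrpull_principal_id" {
  value = azurerm_role_assignment.acrpull_role.principal_id
}

# The service principal's object ID (what a role assignment expects).
output "sp_object_id" {
  value = azuread_service_principal.tfapp.object_id
}

# The application (client) ID that the cluster authenticates with.
output "aks_client_id" {
  value = azuread_service_principal.tfapp.application_id
}
```

After `terraform apply`, `acrpull_principal_id` should match `sp_object_id`, while `aks_client_id` is the value passed to the cluster's `service_principal` block.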
Update / note:
When I run terraform apply with the above code, the creation of resources is interrupted:
azurerm_container_registry.acr: Creation complete after 18s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222]
azurerm_role_assignment.acrpull_role: Creating...
azuread_service_principal_password.tfapp: Still creating... [10s elapsed]
azuread_service_principal_password.tfapp: Creation complete after 12s [id=000/000]
azurerm_kubernetes_cluster.mycluster: Creating...
azurerm_role_assignment.acrpull_role: Creation complete after 8s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222/providers/Microsoft.Authorization/roleAssignments/000]
azurerm_kubernetes_cluster.mycluster: Still creating... [10s elapsed]
Error: Error creating Managed Kubernetes Cluster "myaks" (Resource Group "myrg"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="ServicePrincipalNotFound" Message="Service principal clientID: 000 not found in Active Directory tenant 000, Please see https://aka.ms/aks-sp-help for more details."
on test.tf line 56, in resource "azurerm_kubernetes_cluster" "mycluster":
56: resource "azurerm_kubernetes_cluster" "mycluster" {
I think, however, that this is just because it takes a few minutes for the service principal to be created. When I run terraform apply again a few minutes later, it gets past that point without issues.
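The behaviour described here is consistent with Azure AD needing some time to propagate a newly created service principal. One possible workaround, which is not part of the original configuration, is to put an explicit pause between the service principal and the cluster. A minimal sketch, assuming the hashicorp/time provider is available; the resource name `wait_for_sp` and the 60-second duration are illustrative guesses, not tested values:

```hcl
# Assumption: the hashicorp/time provider is installed alongside azurerm/azuread.
# Pauses after the service principal password exists, giving Azure AD
# time to propagate the new principal before anything depends on it.
resource "time_sleep" "wait_for_sp" {
  depends_on = [azuread_service_principal_password.tfapp]

  # Illustrative duration; the delay actually needed may differ.
  create_duration = "60s"
}

# Hypothetical addition to the existing cluster resource, shown as a
# comment so this fragment does not redefine "mycluster":
#
#   resource "azurerm_kubernetes_cluster" "mycluster" {
#     ...
#     depends_on = [time_sleep.wait_for_sp]
#   }
```

The pause only runs when the resource is created, so later `terraform apply` runs are not slowed down by it.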
[...] azurerm_container_registry.acr.id, but should be fine both ways, tbh – Dispensation
[...] terraform apply run after creating the service principal. I have modified the scope as you suggested, but the image is still not pulled. :( – Adler
[...] terraform destroy the resources and re-create them - and everything was great then (the same thing did not work before the changes were applied). Thanks! – Adler
[...] object_id that was missing. – Adler
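For clusters that use a managed identity instead of a service principal, the same AcrPull grant can target the kubelet identity rather than an Azure AD application. This is a different setup from the one in the question. A minimal sketch, assuming the cluster's `service_principal` block is replaced by `identity { type = "SystemAssigned" }` and that the azurerm provider version in use exports `kubelet_identity`; the resource name `acrpull_kubelet` is made up for illustration:

```hcl
# Assumption: azurerm_kubernetes_cluster.mycluster is declared with
#   identity {
#     type = "SystemAssigned"
#   }
# instead of the service_principal block used in the question.
resource "azurerm_role_assignment" "acrpull_kubelet" {
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"

  # The kubelet identity is what the nodes use when pulling images.
  principal_id = azurerm_kubernetes_cluster.mycluster.kubelet_identity[0].object_id

  skip_service_principal_aad_check = true
}
```

Because the identity is created together with the cluster, this variant has no separately managed service principal password to rotate.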