I've been trying to manage an Azure Kubernetes Service (AKS) instance via Terraform. When I create the AKS instance via the Azure CLI per this MS tutorial and then install an ingress controller with a static public IP per this second MS tutorial, everything works fine. This method implicitly creates a service principal (SP).
When I create an otherwise exact duplicate of the AKS cluster via Terraform, I am forced to supply the service principal explicitly. I gave this new SP "Contributor" access to the cluster's entire resource group. Yet when I get to the step that creates the ingress controller, using the same command that the second tutorial provides:
helm install stable/nginx-ingress --set controller.replicaCount=2 --set controller.service.loadBalancerIP="XX.XX.XX.XX"
the ingress service comes up but never acquires its public IP. The IP status remains "<pending>" indefinitely, and I can find nothing in any log explaining why. Are there logs that should tell me why my IP is still pending?
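A minimal sketch of what a "Contributor" role assignment on the resource group could look like in Terraform. The resource group name, the variable name, and the resource labels are hypothetical placeholders, not values from the actual configuration, and the sketch only illustrates the grant described above rather than a known fix.

```hcl
# Hypothetical names throughout; substitute your own resource group and SP.

# Look up the resource group that contains the AKS cluster.
data "azurerm_resource_group" "aks" {
  name = "my-aks-rg" # assumption: placeholder resource group name
}

# Object ID (not the application/client ID) of the service principal
# that is passed explicitly to azurerm_kubernetes_cluster.
variable "aks_sp_object_id" {
  description = "Object ID of the service principal used by the AKS cluster"
}

# Grant the SP the built-in "Contributor" role on the whole resource group.
resource "azurerm_role_assignment" "aks_sp_contributor" {
  scope                = "${data.azurerm_resource_group.aks.id}"
  role_definition_name = "Contributor"
  principal_id         = "${var.aks_sp_object_id}"
}
```

The assignment is scoped to a single resource group, so it grants nothing on resources that live in any other group.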
Again, I am fairly certain that, other than the SP, the Terraform AKS cluster is an exact duplicate of the one created based on the MS tutorial. Running terraform plan finds no differences between the two. Does anyone have any idea what permission my AKS SP might need or what else I might be missing here? Strangely, I can't find ANY permissions assigned to the implicitly created principal via the Azure portal, but I can't think of anything else that might be causing this behavior.
Not sure if it's a red herring or not, but other users have complained about a similar problem in the context of issues opened against the second tutorial. Their fix always appears to be "tear down your cluster and retry", but that isn't an acceptable solution in this context. I need a reproducible working cluster and azurerm_kubernetes_cluster doesn't currently allow for building an AKS instance with an implicitly created SP.