Terraform: Error: Kubernetes cluster unreachable: invalid configuration

After deleting the Kubernetes cluster with "terraform destroy", I can't create it again.

"terraform apply" returns the following error message:

Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

Here is the terraform configuration:

terraform {
  backend "s3" {
    bucket = "skyglass-msur"
    key    = "terraform/backend"
    region = "us-east-1"
  }
}

locals {
  env_name         = "staging"
  aws_region       = "us-east-1"
  k8s_cluster_name = "ms-cluster"
}

variable "mysql_password" {
  type        = string
  description = "Expected to be retrieved from environment variable TF_VAR_mysql_password"
}

provider "aws" {
  region = local.aws_region
}

data "aws_eks_cluster" "msur" {
  name = module.aws-kubernetes-cluster.eks_cluster_id
}

module "aws-network" {
  source = "github.com/skyglass-microservices/module-aws-network"

  env_name              = local.env_name
  vpc_name              = "msur-VPC"
  cluster_name          = local.k8s_cluster_name
  aws_region            = local.aws_region
  main_vpc_cidr         = "10.10.0.0/16"
  public_subnet_a_cidr  = "10.10.0.0/18"
  public_subnet_b_cidr  = "10.10.64.0/18"
  private_subnet_a_cidr = "10.10.128.0/18"
  private_subnet_b_cidr = "10.10.192.0/18"
}

module "aws-kubernetes-cluster" {
  source = "github.com/skyglass-microservices/module-aws-kubernetes"

  ms_namespace       = "microservices"
  env_name           = local.env_name
  aws_region         = local.aws_region
  cluster_name       = local.k8s_cluster_name
  vpc_id             = module.aws-network.vpc_id
  cluster_subnet_ids = module.aws-network.subnet_ids

  nodegroup_subnet_ids     = module.aws-network.private_subnet_ids
  nodegroup_disk_size      = "20"
  nodegroup_instance_types = ["t3.medium"]
  nodegroup_desired_size   = 1
  nodegroup_min_size       = 1
  nodegroup_max_size       = 5
}

# Create namespace
# Use kubernetes provider to work with the kubernetes cluster API
provider "kubernetes" {
  # load_config_file       = false
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.msur.certificate_authority.0.data)
  host                   = data.aws_eks_cluster.msur.endpoint
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws-iam-authenticator"
    args        = ["token", "-i", "${data.aws_eks_cluster.msur.name}"]
  }
}

# Create a namespace for microservice pods
resource "kubernetes_namespace" "ms-namespace" {
  metadata {
    name = "microservices"
  }
}

P.S. There seems to be an issue with the Terraform Kubernetes provider in 0.14.7.

I couldn't use "load_config_file = false" in this version, so I had to comment it out, which seems to be the cause of this issue.

P.P.S. It could also be an issue with an outdated cluster_ca_certificate that Terraform tries to use: deleting this certificate could be enough, although I'm not sure where it is stored.
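
P.P.P.S. For reference, a token-based provider configuration like the sketch below (using the aws_eks_cluster_auth data source; I haven't switched to it, so treat it as an untested sketch) would avoid both load_config_file and the aws-iam-authenticator exec plugin:

data "aws_eks_cluster_auth" "msur" {
  name = module.aws-kubernetes-cluster.eks_cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.msur.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.msur.certificate_authority.0.data)
  # Short-lived token issued by AWS for this cluster; no kubeconfig or exec plugin required
  token                  = data.aws_eks_cluster_auth.msur.token
}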

Clem answered 1/3, 2021 at 17:53 Comment(4)
Can this issue be fixed without adding export KUBE_CONFIG_PATH=/path/to/.kube/config? I mean that in the case of, for example, AWS CodePipeline it cannot work. – Aliaalias
@Aliaalias Try this command: aws eks --region ${var.region} update-kubeconfig --name ${var.cluster_name}. Make sure you have access to your AWS account inside AWS CodePipeline. Make sure your AWS EKS cluster exists at the time of running the script. Replace var.region and var.cluster_name with your own values. – Clem
Thanks, @Mykhailo Skliar, for the help. By the way, I noticed a strange behavior: after upgrading to Terraform 0.15.5, my problem suddenly disappeared. – Aliaalias
Since I encountered this, I can share my experience. It happens when you move your resources (likely while refactoring) to different modules. Terraform cannot reconcile the state with the resources and gives this error. I may be wrong here, but that has been my experience. – Idealize

Before doing something radical like manipulating the state directly, try setting the KUBE_CONFIG_PATH variable:

export KUBE_CONFIG_PATH=/path/to/.kube/config

After this, rerun the plan or apply command. This fixed the issue for me.
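
If you'd rather not export the variable for the whole shell session, setting it for a single run works too (the path below is only an example; point it at your actual kubeconfig):

KUBE_CONFIG_PATH="$HOME/.kube/config" terraform apply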

Hamite answered 3/3, 2021 at 7:39 Comment(1)
For some reason my env had KUBECONFIG set instead, which seemed to work until recently. KUBE_CONFIG_PATH works for me. Cool! – Drucie

I had the same issue. I even manually deleted the EKS cluster, which really messed up the Terraform state.

However, after wasting a few hours, I found out that there is a very simple solution.

You can run

terraform state rm <resource_type>.<resource_name>

I just executed

terraform state rm `terraform state list | grep eks`

to remove all the entries for a particular service from the state file in a safe manner.
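
A slightly more cautious variant, sketched below, first saves a copy of the state and then removes each matching address one at a time (the grep pattern is just an example; adjust it to the entries you actually want to drop):

terraform state pull > state-backup.tfstate   # keep a local backup before touching the state
terraform state list | grep -E 'kubernetes_|helm_|eks' | while read -r addr; do
  terraform state rm "$addr"                  # removes the entry from state only, not the real resource
done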

Orfield answered 15/6, 2021 at 13:7 Comment(2)
Thanks, I solved my problem with terraform state rm `terraform state list | grep eks`, terraform state rm `terraform state list | grep kubectl`, and terraform state rm `terraform state list | grep helm`. – Evetta
After trying all the other answers, this one helped me. I had to manually delete resources, which messed up the state; terraform state rm fixed the issue. – Seppala

This happened to me when I needed to make an update to the cluster that required deleting some resources. You can also try running terraform apply -refresh=false and just let it destroy them.

Hindsight answered 21/9, 2022 at 19:16 Comment(0)

Deleting the Terraform state S3 bucket on AWS solved the issue.

Clem answered 1/3, 2021 at 17:53 Comment(0)

In my case this error occurred when I was trying to destroy resources with 'tf destroy'.

The logical solution for me was the following sequence (sketched in shell below):

  1. Run 'tf apply -refresh=true' on the Terraform state where you bootstrap the K8S cluster. This is the workspace where you output the K8S credentials (k8s_cluster_access_token).
  2. Run 'tf apply -refresh=true' on the Terraform state that uses the above K8S credentials to create K8S resources.
  3. Run 'tf destroy' (finished successfully).
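
A rough shell sketch of that sequence, assuming the two configurations live in sibling directories named cluster-bootstrap and k8s-resources (both names are placeholders):

cd cluster-bootstrap && terraform apply -refresh=true   # 1. refresh the workspace that creates the EKS cluster and outputs its credentials
cd ../k8s-resources && terraform apply -refresh=true    # 2. refresh the workspace that consumes those credentials
terraform destroy                                       # 3. the destroy now completes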
Murray answered 29/8, 2022 at 13:59 Comment(0)

I solved this by using the official helm provider instead of the kubernetes one.

First, we list the required providers:

terraform {
  backend "s3" {
    bucket  = "..."
    key     = "..."
    region  = "..."
    profile = "..."
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.49"
    }

    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.16.1"
    }

    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.8.0"
    }
  }
}

Then, we configure the provider:

data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
    exec {
      api_version = "client.authentication.k8s.io/v1"
      command = "aws"
      args = [
        "eks",
        "get-token",
        "--cluster-name",
        data.aws_eks_cluster.cluster.name,
        "--profile",
        var.profile
      ]
    }
  }
}

Finally, we add charts via the helm_release resources:

resource "helm_release" "foo" {
  name             = "foo"
  chart            = "foo"
  repository       = "https://foo.bar/chart"
  namespace        = "foo"
  create_namespace = true
  values           = [templatefile("${path.module}/chart/values.yaml", {})]
}
Harveyharvie answered 14/1, 2023 at 11:31 Comment(0)

Deleting the .terraform sub-folder in the folder where you run the "terraform" command should also solve the issue.

I didn't try it for this exact situation, but I had a similar issue today, so I decided to share another solution. It seems less radical than deleting the S3 bucket.
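
In practice that is just the following two commands; the subsequent terraform init re-downloads the providers and re-initializes the backend:

rm -rf .terraform
terraform init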

Clem answered 12/3, 2021 at 11:22 Comment(0)

The cause of this problem was that my kubeconfig, which usually has something in it, was empty for some reason. Somewhat of a random fix, but when I reinitialized the config (I'm using Minikube as well, so I started minikube), Terraform was happy again.

I'm curious if using the aws command line to update the kubeconfig would work in your case. https://docs.aws.amazon.com/cli/latest/reference/eks/update-kubeconfig.html

Elephus answered 27/7, 2023 at 14:48 Comment(0)

If you don't know your KUBECONFIG path, use this command:

aws eks --region ${var.region} update-kubeconfig --name ${var.cluster_name}

  • Make sure you have access to your AWS account inside your terraform (or any other script).
  • Make sure your AWS EKS cluster exists at the time of running the script.
  • Replace var.region and var.cluster_name with your own values.

You can even create a Terraform null_resource that will automatically update your kubeconfig:

resource "null_resource" "update_kubeconfig" {
  provisioner "local-exec" {
    command = "aws eks --region ${var.region} update-kubeconfig --name ${var.cluster_name}"
  }
  depends_on = [your_kubernetes_cluster_resource]
}
Clem answered 7/11, 2023 at 18:52 Comment(0)

I also ran into that error when trying to perform a terraform plan. This happened while using Azure, but I imagine this scenario would play out in GCP and AWS as well.

azurerm_kubernetes_cluster.main: Drift detected (update)
|
│ Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
│ 
│ Error: Failed to get RESTMapper client
│ 
│ cannot create discovery client: no client config

The key was in that (update) log above. After running another plan with -target=CLUSTER_NAME to avoid the Kubernetes error (e.g. -target=azurerm_kubernetes_cluster.main), I was able to see that Terraform was planning to recreate the Kubernetes cluster.

My kubernetes and helm providers depend on that cluster, so they became unusable in the plan, hence the errors.
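
The targeted plan that surfaced the pending recreate looked roughly like this (the resource address is the one from the example above):

terraform plan -target=azurerm_kubernetes_cluster.main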

Soldiery answered 18/7 at 17:26 Comment(0)
