How to fix "An Unknown Error Occurred" when creating multiple Google Cloud SQL instances with private IP simultaneously?

Our cloud backend setup contains 5 Cloud SQL for Postgres instances. We manage our infrastructure using Terraform. We connect to the instances from GKE using a public IP and the Cloud SQL proxy container.

To simplify our setup, we want to get rid of the proxy containers by moving to a private IP. I tried following the Terraform guide. Creating a single instance works fine, but trying to create 5 instances simultaneously ends with 4 failed instances and 1 successful one. [Screenshot: failed instance list in the GCP console]

The error that appears in the Google Cloud Console on the failed instances is "An Unknown Error occurred". [Screenshot: failed instance with error message in the GCP console]

Following is the code which reproduces it. Pay attention to the count = 5 line:

resource "google_compute_network" "private_network" {
  provider = "google-beta"

  name = "private-network"
}

resource "google_compute_global_address" "private_ip_address" {
  provider = "google-beta"

  name = "private-ip-address"
  purpose = "VPC_PEERING"
  address_type = "INTERNAL"
  prefix_length = 16
  network = "${google_compute_network.private_network.self_link}"
}

resource "google_service_networking_connection" "private_vpc_connection" {
  provider = "google-beta"

  network = "${google_compute_network.private_network.self_link}"
  service = "servicenetworking.googleapis.com"
  reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}

resource "google_sql_database_instance" "instance" {
  provider = "google-beta"
  count = 5

  name = "private-instance-${count.index}"
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

provider "google-beta" {
  version = "~> 2.5"
  credentials = "credentials.json"
  project = "PROJECT_ID"
  region = "us-central1"
  zone = "us-central1-a"
}

I tried several alternatives:

  • Waiting a minute after creating the google_service_networking_connection and then creating all the instances simultaneously, but I got the same error.
  • Creating an address range and a google_service_networking_connection per instance, but I got an error that google_service_networking_connection cannot be created simultaneously.
  • Creating an address range per instance and a single google_service_networking_connection which links to all of them, but I got the same error.
Droll answered 5/5, 2019 at 9:50 Comment(0)

Found an ugly yet working solution. There seems to be a bug in GCP: it accepts requests to create several instances simultaneously even though it cannot complete them. There is neither documentation about it nor a meaningful error message. It appears in the Terraform Google provider issue tracker as well.

One alternative is adding a dependency between the instances, so that they are created one after another. This allows their creation to complete successfully. However, each instance takes several minutes to create, so the total time adds up quickly. If we instead add an artificial delay of 60 seconds between the start of each instance creation, we manage to avoid the failures while the creations still overlap. Notes:

  • The number of seconds needed for the delay depends on the instance tier. For example, for db-f1-micro, 30 seconds were enough. They were not enough for db-custom-1-3840.
  • I am not sure what the exact minimum is for db-custom-1-3840. 30 seconds were not enough, 60 were.
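
For comparison, the fully serialized alternative mentioned above (each instance depending on the previous one) might look like the sketch below. It uses the same Terraform 0.11 syntax as the rest of this post. The resource names instance_a and instance_b and the instance names are made up for illustration, and the network resources are assumed to be defined exactly as in the question.

```hcl
# Sketch only: serialize creation by chaining depends_on.
# Assumes google_compute_network.private_network and
# google_service_networking_connection.private_vpc_connection
# are defined as in the question.
resource "google_sql_database_instance" "instance_a" {
  provider = "google-beta"

  name = "private-instance-chained-a" # hypothetical name
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

resource "google_sql_database_instance" "instance_b" {
  provider = "google-beta"

  name = "private-instance-chained-b" # hypothetical name
  database_version = "POSTGRES_9_6"

  # Creation starts only after instance_a has been fully created.
  depends_on = [
    "google_service_networking_connection.private_vpc_connection",
    "google_sql_database_instance.instance_a"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}
```

This is not the approach used in the rest of this answer: with chaining, the total time is roughly the sum of all the individual creation times, which is what the staggered delay avoids.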

The following code sample resolves the issue. It shows only 2 instances: because of depends_on limitations I could not use the count feature, and the full code for 5 instances would be very long. It works the same way for 5 instances:

resource "google_compute_network" "private_network" {
  provider = "google-beta"

  name = "private-network"
}

resource "google_compute_global_address" "private_ip_address" {
  provider = "google-beta"

  name = "private-ip-address"
  purpose = "VPC_PEERING"
  address_type = "INTERNAL"
  prefix_length = 16
  network = "${google_compute_network.private_network.self_link}"
}

resource "google_service_networking_connection" "private_vpc_connection" {
  provider = "google-beta"

  network = "${google_compute_network.private_network.self_link}"
  service = "servicenetworking.googleapis.com"
  reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}

locals {
  db_instance_creation_delay_factor_seconds = 60
}

resource "null_resource" "delayer_1" {
  depends_on = ["google_service_networking_connection.private_vpc_connection"]

  provisioner "local-exec" {
    command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 0}"
  }
}

resource "google_sql_database_instance" "instance_1" {
  provider = "google-beta"

  name = "private-instance-delayed-1"
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection",
    "null_resource.delayer_1"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

resource "null_resource" "delayer_2" {
  depends_on = ["google_service_networking_connection.private_vpc_connection"]

  provisioner "local-exec" {
    command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 1}"
  }
}

resource "google_sql_database_instance" "instance_2" {
  provider = "google-beta"

  name = "private-instance-delayed-2"
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection",
    "null_resource.delayer_2"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

provider "google-beta" {
  version = "~> 2.5"
  credentials = "credentials.json"
  project = "PROJECT_ID"
  region = "us-central1"
  zone = "us-central1-a"
}

provider "null" {
  version = "~> 1.0"
}
Droll answered 5/5, 2019 at 12:18 Comment(0)

In case someone lands here with a slightly different case (creating google_sql_database_instance in a private network results in an "Unknown error"):

  1. Launch one Cloud SQL instance manually (this will enable servicenetworking.googleapis.com and some other APIs for the project it seems)
  2. Run your manifest
  3. Terminate the instance created in step 1.

Works for me after that

¯_(ツ)_/¯
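
If the guess in step 1 is right and the manual launch mainly enables servicenetworking.googleapis.com for the project, the same effect could possibly be achieved from Terraform by enabling the API explicitly. The following is an untested sketch under that assumption, in the same Terraform 0.11 syntax as the other answers; the resource name servicenetworking is made up, and other APIs may be needed as well.

```hcl
# Sketch only: enable the Service Networking API for the project
# configured on the provider, instead of launching an instance by hand.
resource "google_project_service" "servicenetworking" {
  provider = "google-beta"

  service = "servicenetworking.googleapis.com"

  # Keep the API enabled even if this resource is later destroyed.
  disable_on_destroy = false
}

# Resources that need the API (for example the
# google_service_networking_connection) would then list
# "google_project_service.servicenetworking" in their depends_on.
```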

Multiply answered 30/9, 2020 at 19:26 Comment(2)
When you said "Launch one Cloud SQL instance manually", what do you mean? A non-private one? - Aloeswood
Using the browser to open the GCP console and going through the "wizards" :D - Multiply

I landed here with a slightly different case, same as @Grigorash Vasilij (creating google_sql_database_instance in a private network results in an "Unknown error").

I was using the UI to deploy an SQL instance on a private VPC, and for some reason that throws an "Unknown error" as well. I finally solved it by using the gcloud command instead (why does that work and not the UI? I don't know, maybe the UI is not doing the same thing as the command):

gcloud --project=[PROJECT_ID] beta sql instances create [INSTANCE_ID] \
       --network=[VPC_NETWORK_NAME] \
       --no-assign-ip

follow this for more details

Aloeswood answered 4/3, 2021 at 16:46 Comment(0)
