RDS Proxy: PENDING_PROXY_CAPACITY and "DBProxy Target unavailable due to an internal error"

When deploying an RDS database via Terraform, my default proxy target is unavailable. Running the following command: aws rds describe-db-proxy-targets --db-proxy-name <my_proxy_name_here>

I get two errors: initially the target is in state PENDING_PROXY_CAPACITY; eventually that times out with the following error: DBProxy Target unavailable due to an internal error

Cirillo answered 8/7, 2021 at 9:27 Comment(1)
Do you have any TF code that could showcase the issue and allow for its reproduction? – Esophagitis

Following extensive research, a two-hour call with AWS support, and very few search results for the error PENDING_PROXY_CAPACITY,

I stumbled across the following discussion: https://github.com/hashicorp/terraform-provider-aws/issues/16379

I had a couple of issues with my config:

  1. The outbound rules for my RDS Proxy security group were limited to internal traffic only. This causes problems, as you need public internet access to reach AWS Secrets Manager! (See the sketch after this list.)

  2. At the time of writing, the Terraform documentation suggests you can pass a "username" option to the auth block of the aws_db_proxy resource (see: https://registry.terraform.io/providers/hashicorp/aws/4.26.0/docs/resources/db_proxy). This does not work and returns an error stating the username option is not expected, because the proxy expects all of the auth information to be contained in one JSON object within the secret ARN provided. For this reason I created a second secret containing all the auth information, like so:

resource "aws_secretsmanager_secret_version" "lambda_rds_test_proxy_creds" {
  secret_id     = aws_secretsmanager_secret.lambda_rds_test_proxy_creds.id
  secret_string = jsonencode({
    "username"             = aws_db_instance.lambda_rds_test.username
    "password"             = module.lambda_rds_secret.secret
    "engine"               = "postgres"
    "host"                 = aws_db_instance.lambda_rds_test.address
    "port"                 = 5432
    "dbInstanceIdentifier" = aws_db_instance.lambda_rds_test.id
  })
}
  3. Fixing both issues still gave me an auth error for the credentials; this required fixing the IAM permissions (discussed in the GitHub issue above). Because the secret containing all the required info was a newly created resource, the proxy's IAM role no longer had access to it, so I updated the role to cover the new secret (see the sketch after this list).
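
To tie the three fixes together, here is a minimal sketch of how the proxy, its IAM role, and its security group can reference the secret created above. Only aws_secretsmanager_secret.lambda_rds_test_proxy_creds comes from the config above; aws_security_group.rds_proxy, aws_iam_role.rds_proxy and the subnet references are hypothetical placeholders, so adjust them to your own setup:

# (1) Allow the proxy's security group to reach Secrets Manager over HTTPS.
resource "aws_security_group_rule" "rds_proxy_https_out" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.rds_proxy.id # hypothetical SG
}

# (2) The proxy's auth block takes only the secret ARN -- no "username" option.
resource "aws_db_proxy" "lambda_rds_test" {
  name                   = "lambda-rds-test-proxy"
  engine_family          = "POSTGRESQL"
  require_tls            = true
  role_arn               = aws_iam_role.rds_proxy.arn # hypothetical role
  vpc_security_group_ids = [aws_security_group.rds_proxy.id]
  vpc_subnet_ids         = aws_subnet.private[*].id # hypothetical subnets

  auth {
    auth_scheme = "SECRETS"
    iam_auth    = "DISABLED"
    secret_arn  = aws_secretsmanager_secret.lambda_rds_test_proxy_creds.arn
  }
}

# (3) Let the proxy's role read the newly created secret.
# (If the secret is encrypted with a customer-managed KMS key, kms:Decrypt is also needed.)
resource "aws_iam_role_policy" "rds_proxy_secret_access" {
  name = "rds-proxy-secret-access"
  role = aws_iam_role.rds_proxy.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [aws_secretsmanager_secret.lambda_rds_test_proxy_creds.arn]
    }]
  })
}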

I am posting this here because the GitHub issue is archived and I cannot add a comment with my search terms to help others hit by the same problem find it quicker; there is very little information out there about the RDS Proxy errors described here.

Cirillo answered 8/7, 2021 at 9:27 Comment(1)
@WillBroadbend: in #1 above you mention the need for internet access for the proxy to reach Secrets Manager. Would a VPC endpoint be a viable alternative? (My proxy is in a private subnet with no NAT gateway.) Even with a VPC endpoint, the proxy SG should still allow outgoing TCP 443 for API calls to Secrets Manager. – Sev
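
For reference, a minimal sketch of such an interface endpoint. The aws_vpc.main, aws_subnet.private and aws_security_group.vpce names are hypothetical placeholders, and the region in service_name needs to match your own:

resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = aws_vpc.main.id # hypothetical VPC
  service_name        = "com.amazonaws.eu-west-1.secretsmanager" # use your region
  vpc_endpoint_type   = "Interface"
  private_dns_enabled = true
  subnet_ids          = aws_subnet.private[*].id # hypothetical private subnets
  security_group_ids  = [aws_security_group.vpce.id] # must allow inbound TCP 443 from the proxy SG
}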

EDIT: It seems that "rds-ca-ecc384-g1" just doesn't work with RDS Proxy (anymore?); the proxy just stopped working again. Switching to "rds-ca-rsa4096-g1" makes it work again.

Original post:

For anyone else who is experiencing this, but none of the answers / debugging steps have helped: switch your underlying RDS cluster's certificate authority to something else and back to what it was again.

This took me days of extremely frustrating trial and error to figure out, and I have absolutely no idea why this worked; it was pure luck that had me try this.

The CA that I use on my Aurora cluster is "rds-ca-ecc384-g1", and RDS proxy was working fine up until a few days ago. It suddenly stopped working without any changes. The behavior was exactly as the original question described.

After exhausting all logical debugging steps, including recreating the proxy, secrets, roles, networks, etc. multiple times, I just started changing random things.

As soon as I switched the CA on the Aurora cluster to "rds-ca-rsa4096-g1" the proxy started working again. I had to do this on both the primary instance and the read replicas. Before I switched the read replica CA, the primary target reported healthy, but the read replica target started reporting:

"TargetHealth": {
   "State": "UNAVAILABLE",
   "Reason": "CONNECTION_FAILED",
   "Description": "Database handshake failed"
}

Previously, just like the primary, it also reported:

"TargetHealth": {
  "State": "UNAVAILABLE",
  "Description": "DBProxy Target unavailable due to an internal error"
}

After switching the read replica CA it also started reporting healthy. To be honest this makes no sense.

I have now switched them back to "rds-ca-ecc384-g1" and the proxy is still working (at least for now).
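
If the database is managed with Terraform, as in the original question, the CA can also be switched declaratively. A minimal sketch, assuming a standalone Postgres instance with hypothetical names and placeholder settings (for Aurora, the equivalent ca_cert_identifier setting goes on each aws_rds_cluster_instance, primary and read replicas alike):

resource "aws_db_instance" "example" {
  identifier          = "example-instance" # hypothetical
  engine              = "postgres"
  engine_version      = "14"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "postgres"
  password            = "change-me-please" # placeholder only
  ca_cert_identifier  = "rds-ca-rsa4096-g1" # switch the CA here, then apply
  apply_immediately   = true
  skip_final_snapshot = true
}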

Insistent answered 7/10, 2023 at 12:4 Comment(1)
Thank you so much. Just switching the CA to rds-ca-rsa4096-g1 made it work. – Minimus

I had this issue as well. My issue was that I had the cluster configured to use TLSv1.3; however, the proxy only supports up to TLSv1.2 as of the time of this writing.
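
For context, on PostgreSQL engines the minimum TLS version is controlled by the ssl_min_protocol_version parameter. A minimal sketch of capping it at TLS 1.2 in a hypothetical parameter group (this assumes a non-Aurora Postgres instance; Aurora clusters use aws_rds_cluster_parameter_group instead):

resource "aws_db_parameter_group" "postgres_tls12" {
  name   = "postgres-tls12" # hypothetical name
  family = "postgres14"     # match your engine version

  parameter {
    name  = "ssl_min_protocol_version"
    value = "TLSv1.2" # the proxy negotiates at most TLS 1.2 at the time of writing
  }
}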

Kyte answered 9/8, 2023 at 20:46 Comment(0)
