EDIT: It seems that "rds-ca-ecc384-g1" just doesn't work with RDS Proxy (anymore?); the proxy just stopped working again. Switching to "rds-ca-rsa4096-g1" makes it work again.
Original post:
For anyone else who is experiencing this, but none of the answers / debugging steps have helped: switch your underlying RDS cluster's certificate authority to something else and back to what it was again.
This took me days of extremely frustrating trial and error to figure out, and I have absolutely no idea why it worked; it was pure luck that had me try this.
The CA that I use on my Aurora cluster is "rds-ca-ecc384-g1", and RDS Proxy was working fine up until a few days ago, when it suddenly stopped working without any changes on my side. The behavior was exactly as the original question described.
After exhausting all logical debugging steps, including recreating the proxy, secrets, roles, networks, etc. multiple times, I just started changing random things.
As soon as I switched the CA on the Aurora cluster to "rds-ca-rsa4096-g1", the proxy started working again. I had to do this on both the primary instance and the read replicas: before I switched the read replica's CA, the primary target reported healthy, but the read replica target started reporting:
"TargetHealth": {
"State": "UNAVAILABLE",
"Reason": "CONNECTION_FAILED",
"Description": "Database handshake failed"
}
Previously, just like the primary, it also reported:
"TargetHealth": {
"State": "UNAVAILABLE",
"Description": "DBProxy Target unavailable due to an internal error"
}
After switching the read replica's CA, it also started reporting healthy. To be honest, this makes no sense to me.
I have now switched them back to "rds-ca-ecc384-g1" and the proxy is still working (at least for now).
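If you want to script the health check instead of clicking through the console, the `TargetHealth` payloads above come from `aws rds describe-db-proxy-targets`. Here is a minimal sketch that flags unhealthy targets in that response. It uses only the standard library; the proxy and endpoint names are placeholders I made up, and only the `TargetHealth` contents are taken from this post:

```python
# Sample shaped like the output of:
#   aws rds describe-db-proxy-targets --db-proxy-name my-proxy
# (endpoint names are placeholders; the TargetHealth payloads are the ones
# described in this post)
response = {
    "Targets": [
        {
            "Endpoint": "primary.cluster-xyz.eu-west-1.rds.amazonaws.com",
            "TargetHealth": {"State": "AVAILABLE"},
        },
        {
            "Endpoint": "replica.cluster-xyz.eu-west-1.rds.amazonaws.com",
            "TargetHealth": {
                "State": "UNAVAILABLE",
                "Reason": "CONNECTION_FAILED",
                "Description": "Database handshake failed",
            },
        },
    ]
}

def unhealthy_targets(targets):
    """Return (endpoint, description) for each target not in the AVAILABLE state."""
    return [
        (t["Endpoint"], t["TargetHealth"].get("Description", ""))
        for t in targets
        if t["TargetHealth"].get("State") != "AVAILABLE"
    ]

for endpoint, why in unhealthy_targets(response["Targets"]):
    print(f"{endpoint}: {why}")
```

The CA switch itself can also be done per instance from the CLI with `aws rds modify-db-instance --db-instance-identifier <id> --ca-certificate-identifier rds-ca-rsa4096-g1 --apply-immediately`; you'd run it once per instance (primary and each read replica), which matches what I had to do in the console.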