Google App Engine and Cloud SQL: Lost connection to MySQL server at 'reading initial communication packet'

I have a Django app on Google App Engine that is connected to a Google Cloud SQL instance, using App Engine authentication.

Most of the time everything works fine, but from time to time the following exception is raised:

OperationalError: (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 38")

According to the docs, this error is returned when:

If Google Cloud SQL rejects the connection, for example, because the IP address your client is connecting from is not authorized.

This doesn't make much sense in my case, because the authentication is done by the App Engine server.

What might cause these sporadic errors?

Erechtheum answered 5/8, 2014 at 12:53 Comment(6)
Just to make sure, your application is deployed to the cloud, right? You're not running on localhost?Oleary
@Oleary yes, it's on the GAE cloud.Erechtheum
I couldn't find much info about error 38. Most reports of losing the connection to the MySQL server at 'reading initial communication packet' had to do with the SQL settings, particularly timeouts and authorization, but these were all localhost issues. Take a look at this doc: developers.google.com/cloud-sql/docs/admin-api/v1beta1/… and see if there is any setting you can modify on your Cloud SQL instance that could resolve this issue.Oleary
Did you set your app to run only on EU servers?Oleary
@Oleary Thanks. I couldn't find any setting that seems related to my issue. Most of the time everything works well so I don't want to change anything in my production environment unless I know it will solve my issue. I did not restrict my app to EU.Erechtheum
I have the same issue from time to time. I run Django 1.5 on AppEngine using CloudSQL and get the same exact error occasionally.Seif
B
16

I had a similar issue and ended up contacting Google for help. They explained that it happens when they need to restart or move an instance. If the client instance was restarted or moved to another host server (for various versions), the IPs won't match and that error is thrown. They mentioned that the servers may restart for patches, errors and slowdowns, causing similar behavior (either the same error or a similar one). The server also moves to try to be closer to the instances to improve response times. If you send a request during the move, it will throw errors.

They told me I need to code in retries in case that happens, similar to how you handle datastore timeouts. Keep in mind to build in backoff mechanics: sending too many requests too quickly after a restart could cause a crash.
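The retry-with-backoff advice above could be sketched roughly like this. It is only an illustration, not code from Google or from my app: the decorator name `retry_with_backoff` and all of its parameters and defaults are made up for the example, and in a Django app you would pass `django.db.OperationalError` as the exception to catch.

```python
import functools
import random
import time


def retry_with_backoff(exceptions, max_attempts=5, base_delay=0.5,
                       factor=2.0, max_delay=30.0, sleep=time.sleep):
    """Retry the wrapped callable on `exceptions`, waiting longer each time.

    Hypothetical helper: the name and defaults are assumptions for this sketch.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the caller decide
                    # Jitter spreads the retries out so that clients do not
                    # all reconnect at the same moment after a restart.
                    sleep(min(delay, max_delay) * random.uniform(0.5, 1.0))
                    delay *= factor
        return wrapper
    return decorator
```

Every code path that touches the database needs to go through the wrapper, including calls made inside library code, otherwise the error can still surface from the unwrapped path.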

How often does this happen?

Bevatron answered 14/8, 2014 at 18:27 Comment(8)
developers.google.com/cloud-sql/faq#maintenancerestart developers.google.com/appengine/articles/… en.wikipedia.org/wiki/Exponential_backoffBevatron
Thanks, it's very interesting to hear Google's response. We actually do have retries in our code, and exponential backoff as well, but maybe too few retries. How many retries does your code do and with what backoff? Did the retries solve the problem completely?Erechtheum
For me, I did 3 retries; if it still failed I sent it to a taskqueue. You can go higher depending on whether you're hitting the global timeout for the instance. It's very rare for mine to hit the taskqueue but I've seen it once or twice. How long are you waiting, and does it get through all the retries more than a couple of times a month?Bevatron
It happens a lot more than twice a month.. 5 retries with 5 sec delay and x2 backoff. It's a basic scaling instance so no global timeout.Erechtheum
Just found out that there was some library code that wasn't wrapped with retries. I'm adding retries, let's wait and see if this solves the problem.Erechtheum
That is a lot more than I was getting. Let me know if the new back offs help.Bevatron
So far it looks like after adding the missing retries it solved the problem. You have earned your bounty with honor :)Erechtheum
It was my pleasure to help ;)Bevatron
H
3

In our case we had renamed the instances incorrectly inside the code. When we changed back to the correct names everything worked fine. Make sure your Cloud SQL instance is named correctly both inside the Google Cloud Console and within the code you use to access it, and make sure that your Cloud SQL instance allows your Google App Engine instance to connect to it in its Access Control settings.
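As a rough sketch of where the instance name ends up in a Django project: the helper `cloud_sql_database` below is hypothetical, and it assumes the usual App Engine setup where the MySQL backend connects through a `/cloudsql/<connection name>` Unix socket, with the connection name written as `project:instance` or `project:region:instance`. Check the exact value against what the Cloud Console shows for your instance.

```python
def cloud_sql_database(connection_name, db_name, user, password=""):
    """Build a Django DATABASES entry for a Cloud SQL instance.

    Hypothetical helper; the accepted connection name shapes are an
    assumption ("project:instance" or "project:region:instance").
    """
    parts = connection_name.split(":")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(
            "unexpected Cloud SQL connection name: %r" % (connection_name,)
        )
    return {
        "ENGINE": "django.db.backends.mysql",
        # On App Engine the instance is reached through a Unix socket.
        "HOST": "/cloudsql/" + connection_name,
        "NAME": db_name,
        "USER": user,
        "PASSWORD": password,
    }


# Placeholder values; in settings.py this would be assigned to DATABASES.
EXAMPLE_DATABASES = {
    "default": cloud_sql_database("my-project:my-instance", "mydb", "root"),
}
```

Failing fast on a malformed name at startup is easier to diagnose than a connection error at request time.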

Hooke answered 13/5, 2015 at 15:53 Comment(1)
This is not related to the original question. 99% of the time it's working just fineHandcraft
O
1

In my case the issue was caused by an expired server SSL certificate on the CloudSQL instance. Strangely, it was not shown in the Google Cloud Console, and I figured it out after downloading the certificate and decoding it with openssl (openssl x509 -in server-ca.pem -text -noout).
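If you want to script that expiry check, one possible sketch is below. It assumes you run `openssl x509 -in server-ca.pem -noout -enddate` yourself and pass the printed `notAfter=...` line to the function; `cert_expired` is a name made up for this example, and the date in the usage line is a placeholder, not the value from my certificate.

```python
import ssl
import time


def cert_expired(enddate_line, now=None):
    """Return True if the certificate's notAfter time is in the past.

    `enddate_line` is the line printed by
    `openssl x509 -in server-ca.pem -noout -enddate`,
    which looks like "notAfter=Jan  1 00:00:00 2000 GMT".
    """
    # Keep only the timestamp after "notAfter=".
    _, _, not_after = enddate_line.strip().partition("=")
    # Standard library helper: parses the certificate time format
    # and returns seconds since the epoch (UTC).
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return expires_at < now


# Placeholder date for illustration only.
print(cert_expired("notAfter=Jan  1 00:00:00 2000 GMT"))
```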

I was able to figure out the cause of the problem after trying to connect with cloud_sql_proxy; luckily it gave a more meaningful error message: couldn't connect to "...": x509: certificate has expired or is not yet valid.

The connection from the AppEngine Standard application started to work immediately after resetting the SSL configuration from the Google Cloud Console. I noticed that after the reset, the validity date appeared on the console.

Olwen answered 30/5, 2018 at 8:16 Comment(1)
Resetting SSL helped me even though the downloaded certificate was still validAltamira
I
-1

I had this problem too using Django 1.10 and GAE. The application worked fine locally (connecting to Cloud SQL via cloud_sql_proxy), but I'd get the 38 error when using the GAE instance of the application.

My problem turned out to be my database user. The user had a hyphen in it. Once I created a new user without a hyphen and changed my application to use the new user, the GAE instance of the application worked if

Illomened answered 23/3, 2017 at 15:22 Comment(1)
This is not related to the original question. 99% of the time it's working just fineHandcraft
