What are the connection limits for Google Cloud SQL from App Engine, and how to best reuse DB connections?

I have a Google App Engine app that uses a Google Cloud SQL instance for storing data. I need my instance to be able to serve hundreds of clients at a time via RESTful calls, each of which results in one or a handful of DB queries. I've wrapped the methods that need DB access in a decorator that stores the handle to the DB connection in os.environ. See this SO question/answer for basically how I'm doing it.

However, as soon as a couple hundred clients connect to my app and trigger database calls, I start getting these errors in the Google App Engine error logs (and my app returns 500, of course):

could not connect: ApplicationError: 1033 Instance has too many concurrent requests: 100
Traceback (most recent call last): File "/base/python27_run

Any tips from experienced users of Google App Engine and Google Cloud SQL? Thanks in advance.

Here's the code for the decorator I use around methods that require DB connection:

import os

def with_db_cursor(do_commit=False):
    """ Decorator for managing DB connection by wrapping around web calls.
    Stores connections and open connection count in the os.environ dictionary
    between calls.  Sets a cursor variable in the wrapped function. Optionally
    does a commit.  Closes the cursor when wrapped method returns, and closes
    the DB connection if there are no outstanding cursors.

    If the wrapped method has a keyword argument 'existing_cursor', whose value
    is non-False, this wrapper is bypassed, as it is assumed another cursor is
    already in force because of an alternate call stack.

    Based mostly on a post by Shay Erlichmen at:
    https://mcmap.net/q/658446/-what-39-s-a-good-approach-to-managing-the-db-connection-in-a-google-cloud-sql-gae-python-app
    """

    def method_wrap(method):
        def wrap(*args, **kwargs):
            if kwargs.get('existing_cursor', False):
                # Bypass everything if method was called with an existing open cursor
                vdbg('Short-circuiting db wrapper due to existing_cursor')
                return method(None, *args, **kwargs)

            conn = os.environ.get("__data_conn")

            # Recycling connection for the current request
            # For some reason threading.local() didn't work
            # and yes os.environ is supposed to be thread safe 
            if not conn:                    
                conn = _db_connect()
                os.environ["__data_conn"] = conn
                os.environ["__data_conn_ref"] = 1
                dbg('Opening first DB connection via wrapper.')
            else:
                os.environ["__data_conn_ref"] = (os.environ["__data_conn_ref"] + 1)
                vdbg('Reusing existing DB connection. Reference count is now: {0}',
                    os.environ["__data_conn_ref"])
            try:
                cursor = conn.cursor()
                try:
                    result = method(cursor, *args, **kwargs)
                    if do_commit or os.environ.get("__data_conn_commit"):
                        os.environ["__data_conn_commit"] = False
                        dbg('Wrapper executing DB commit.')
                        conn.commit()
                    return result                        
                finally:
                    cursor.close()                    
            finally:
                os.environ["__data_conn_ref"] = (os.environ["__data_conn_ref"] -
                        1)  
                vdbg('One less user of DB connection. Count using is now: {0}',
                    os.environ["__data_conn_ref"])
                if os.environ["__data_conn_ref"] == 0:
                    dbg("No more users of this DB connection. Closing.")
                    os.environ["__data_conn"] = None
                    db_close(conn)
        return wrap
    return method_wrap

def db_close(db_conn):
    if db_conn:
        try:
            db_conn.close()
        except Exception:
            err('Unable to close the DB connection.')
            raise
    else:
        err('Tried to close a non-connected DB handle.')
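
For illustration, a wrapped call looks roughly like this (the function and table names below are made up for this post, not from my actual app):

# The decorator injects `cursor` as the first argument of the wrapped function.
@with_db_cursor(do_commit=True)
def add_client(cursor, name):
    cursor.execute("INSERT INTO clients (name) VALUES (%s)", (name,))

@with_db_cursor()
def get_client_count(cursor):
    cursor.execute("SELECT COUNT(*) FROM clients")
    return cursor.fetchone()[0]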
Theall answered 3/5, 2012 at 2:46 Comment(4)
Do you have threadsafe: true in app.yaml? – Accrue
Using 'threadsafe: true' isn't going to work well with using os.environ, since connections cannot be shared across threads. See my answer https://mcmap.net/q/658446/-what-39-s-a-good-approach-to-managing-the-db-connection-in-a-google-cloud-sql-gae-python-app for a threadsafe solution; a minimal thread-local sketch follows these comments. – Demetrademetre
@Theall Cleared up a lot of things. If you can tell, what is the difference between the 100 Pending Connections and the 250 Concurrent Connections per tier mentioned at cloud.google.com/sql/docs/diagnose-issues#limits and cloud.google.com/sql/pricing#v1-pricing? – Mcfarlin
@Kartik Sorry, I don't know. (But you should be a nice user and vote up good questions/answers you read on Stack Overflow. :-)) – Theall
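
For reference, a minimal sketch of the thread-local approach suggested in the comment above, used together with threadsafe: true in app.yaml. The names are illustrative, _db_connect() is assumed to be the question's own helper, and note that the question's code comments say threading.local() didn't work in that setup:

import threading

# One connection per request-handling thread; nothing is shared via os.environ.
_local = threading.local()

def get_conn():
    # Lazily open a connection for this thread on first use.
    conn = getattr(_local, 'conn', None)
    if conn is None:
        conn = _db_connect()  # assumed: the question's connect helper
        _local.conn = conn
    return conn

def close_conn():
    # Call at the end of the request to release this thread's connection.
    conn = getattr(_local, 'conn', None)
    if conn is not None:
        _local.conn = None
        conn.close()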

Short answer: Your queries are probably too slow, and the MySQL server doesn't have enough threads to process all of the requests you are trying to send it.

Long answer:

As background, Cloud SQL has two limits that are relevant here:

  • Connections: These correspond to the 'conn' object in your code. There is a corresponding data structure on the server. Once you have too many of these objects (currently configured to 1000), the least recently used will automatically be closed. When a connection gets closed underneath you, you'll get an unknown connection error (ApplicationError: 1007) the next time you try to use that connection.
  • Concurrent Requests: These are queries that are executing on the server. Each executing query ties up a thread in the server, so there is a limit of 100. When there are too many concurrent requests, subsequent requests will be rejected with the error you are getting (ApplicationError: 1033).

It doesn't sound like the connection limit is affecting you, but I wanted to mention it just in case.
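
If you want to see where you stand against these two limits, you can ask the server for its status counters. A minimal sketch, assuming a MySQLdb-style connection object like the one in the question:

def check_limits(conn):
    # Threads_connected ~ open connections (limit quoted above: 1000).
    # Threads_running   ~ queries executing right now (limit quoted above: 100).
    cursor = conn.cursor()
    try:
        cursor.execute("SHOW STATUS LIKE 'Threads_connected'")
        print cursor.fetchone()
        cursor.execute("SHOW STATUS LIKE 'Threads_running'")
        print cursor.fetchone()
    finally:
        cursor.close()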

When it comes to Concurrent Requests, it might seem that increasing the limit would help, but it usually makes the problem worse. There are two cases we've seen in the past:

  • Deadlock: A long-running query is locking a critical row of the database. All subsequent queries block on that lock. The app times out on those queries, but they keep running on the server, tying up those threads until the deadlock timeout triggers.
  • Slow Queries: Each query is really, really slow. This usually happens when the query requires a temporary file sort. The application times out and retries the query while the first try is still running and counting against the concurrent request limit. If you can find your average query time, you can estimate how many QPS your MySQL instance can support (e.g. 5 ms per query means 200 QPS per thread; since there are 100 threads, you could do 20,000 QPS, while 50 ms per query means 2,000 QPS). See the quick arithmetic sketch below.
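
That last estimate is just the thread count divided by the average query time; a quick sketch with the example numbers from above (illustrative, not measured):

THREADS = 100  # concurrent request limit quoted above

def max_qps(avg_query_seconds):
    # Each thread completes 1/avg_query_seconds queries per second.
    return THREADS / avg_query_seconds

print max_qps(0.005)  # 5 ms/query  -> 20000.0 QPS
print max_qps(0.050)  # 50 ms/query ->  2000.0 QPS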

You should use EXPLAIN and SHOW ENGINE INNODB STATUS to see which of the two problems is going on.
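
Both can be run through an ordinary cursor. A minimal sketch (the SELECT here is a hypothetical example, not a query from the question):

def diagnose(cursor, cutoff):
    # `clients`, `last_seen`, and `cutoff` are illustrative names.
    cursor.execute("EXPLAIN SELECT * FROM clients WHERE last_seen > %s",
                   (cutoff,))
    for row in cursor.fetchall():
        print row  # watch for 'Using filesort' / 'Using temporary' in Extra

    # The result is a single row; its third column holds the full InnoDB
    # report, including lock waits and the latest detected deadlock.
    cursor.execute("SHOW ENGINE INNODB STATUS")
    print cursor.fetchone()[2]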

Of course, it is also possible that you are just driving a ton of traffic at your instance and there just aren't enough threads. In that case, you'll probably be maxing out the CPU for the instance anyway, so adding more threads won't help.

Demetrademetre answered 3/5, 2012 at 17:9 Comment(6)
Thanks Ken. I don't see any error 1007, and I believe I'm closing the connections correctly. I also don't believe that slow queries are the culprit, as all queries are fairly simple (no joins, few group-bys), the tables are small (both horizontally and vertically), and there are indexes (or, equivalently, unique constraints) on all columns being looked up, etc. Maxing out CPU on the instance is not likely an issue either, judging from the dashboard. So I think this narrows it down to deadlocks. I am looking further into it. May ping you for advice later, if that's okay. – Theall
In your snippet above, I don't see where the connection is closed. Perhaps you didn't include all of it? – Demetrademetre
Also, a GROUP BY can still result in a filesort, so even one could be hurting you. – Demetrademetre
Yes, sorry, copy/paste fail. I added the missing lines closing the connection and ending the functions. Thanks! – Theall
Came here from Google Support – cleared up a lot. Thanks. – Roveover
@KenAshcraft Cleared up a lot of things. Thanks. However, I am not able to work out the difference between the 100 Pending Connections and the 250 Concurrent Connections per tier mentioned at cloud.google.com/sql/docs/diagnose-issues#limits and cloud.google.com/sql/pricing#v1-pricing. – Mcfarlin

I read the documentation and noticed there's a limit of 12 connections per App Engine instance:

Look for "Each App Engine instance cannot have more than 12 concurrent connections to a Google Cloud SQL instance." in https://developers.google.com/appengine/docs/python/cloud-sql/

Otherwhere answered 1/9, 2014 at 4:2 Comment(2)
Thanks. I didn't see this in the docs back when I posted the question two-plus years ago, but maybe it was there and I just missed it. :-D In any case, it helps explain why my project wouldn't scale and had to be abandoned eventually. :-( – Theall
So did you solve the problem? Beyond clearing out old connections, were you able to figure out anything else? Kinda ran into that same issue recently. – Lamplighter
