We recently had an outage in which application threads got stuck while retrieving connections from c3p0. Our configuration is as follows:
Version of c3p0 used: 0.9.1.2
- c3p0.acquireRetryDelay = 10000;
- c3p0.acquireRetryAttempts = 0;
- c3p0.breakAfterAcquireFailure = false;
- c3p0.numHelperThreads = 8;
- c3p0.idleConnectionTestPeriod = 3;
- c3p0.preferredTestQuery = "select 1 from dual";
- c3p0.checkoutTimeout = 3000;
- c3p0.user = "XYZ"; // changed to XYZ while posting
- c3p0.password = "XYZ"; // changed to XYZ while posting
Under normal conditions everything works fine, and c3p0 has served us well. However, during a recent network event (a network partition in which application hosts could not talk to the database), we saw application threads stuck indefinitely while trying to get connections from c3p0.
Stacktrace seen in logs:
Caused by: java.sql.SQLException: An attempt by a client to checkout a Connection has timed out.
at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:106)
at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:65)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:527)
at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:128)
at amazon.identity.connection.WrappedDataSource.getConnectionWithOptionalCredentials(WrappedDataSource.java:42)
at amazon.identity.connection.LoggingDataSource.getConnectionWithOptionalCredentials(LoggingDataSource.java:55)
at amazon.identity.connection.WrappedDataSource.getConnection(WrappedDataSource.java:30)
at amazon.identity.connection.WrappedDataSource.getConnectionWithOptionalCredentials(WrappedDataSource.java:42)
at amazon.identity.connection.ConnectionProfilingDataSource.profileGetConnectionWithOptionalCredentials(ConnectionProfilingDataSource.java:118)
at amazon.identity.connection.ConnectionProfilingDataSource.getConnectionWithOptionalCredentials(ConnectionProfilingDataSource.java:99)
at amazon.identity.connection.WrappedDataSource.getConnection(WrappedDataSource.java:30)
at amazon.identity.connection.CallCountTrackingDataSource.getConnectionWithOptionalCredentials(CallCountTrackingDataSource.java:82)
at amazon.identity.connection.WrappedDataSource.getConnection(WrappedDataSource.java:30)
at com.amazon.jdbc.FailoverDataSource.doGetConnection(FailoverDataSource.java:133)
at com.amazon.jdbc.FailoverDataSource.getConnection(FailoverDataSource.java:109)
at com.amazon.identity.accessmanager.WrappedConnection$1.call(WrappedConnection.java:84)
at com.amazon.identity.accessmanager.WrappedConnection$1.call(WrappedConnection.java:82)
at com.amazon.identity.accessmanager.WrappedConnection.getConnection(WrappedConnection.java:110)
... 40 more
Caused by: com.mchange.v2.resourcepool.TimeoutException: A client timed out while waiting to acquire a resource from com.mchange.v2.resourcepool.BasicResourcePool@185e5c6b -- timeout at awaitAvailable()
at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1317)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:557)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
....... (total of 317 such instances of prelimCheckoutResource):
Some excerpts I pulled from the c3p0 documentation:
When a c3p0 DataSource attempts and fails to acquire a Connection, it will retry up to acquireRetryAttempts times, with a delay of acquireRetryDelay between each attempt. If all attempts fail, any clients waiting for Connections from the DataSource will see an Exception, indicating that a Connection could not be acquired. Note that clients do not see any Exception until a full round of attempts fail, which may be some time after the initial Connection attempt. If acquireRetryAttempts is set to 0, c3p0 will attempt to acquire new Connections indefinitely, and calls to getConnection() may block indefinitely waiting for a successful acquisition.
checkoutTimeout limits how long a client will wait for a Connection, if all Connections are checked out and one cannot be supplied immediately
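To make the first excerpt concrete for myself, here is a rough sketch of the arithmetic it implies. The class and method names are hypothetical (they are not part of c3p0), the 30 attempts with a 1000 ms delay are illustrative values only, and the estimate ignores the time each individual connect attempt takes. I am also assuming that any value of zero or below means "retry forever".

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of c3p0: estimates how long waiting clients
// may go without seeing an exception, based only on the documented retry settings.
final class RetryWindow {

    // Returns an empty result when retries are unbounded (attempts <= 0, by assumption),
    // otherwise roughly attempts * delay, ignoring the duration of each connect attempt.
    static OptionalLong approxMillisBeforeFailure(int acquireRetryAttempts,
                                                  long acquireRetryDelayMillis) {
        if (acquireRetryAttempts <= 0) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(
                Math.multiplyExact((long) acquireRetryAttempts, acquireRetryDelayMillis));
    }

    public static void main(String[] args) {
        // Our settings: acquireRetryAttempts = 0, acquireRetryDelay = 10000
        System.out.println(approxMillisBeforeFailure(0, 10_000));
        // Illustrative finite settings for comparison
        System.out.println(approxMillisBeforeFailure(30, 1_000));
    }
}
```

With our settings the result is empty, which matches the "may block indefinitely" wording in the documentation.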
So here's my theory around why this happened:
The network partition lasted several minutes. I assume that by then the idle connection tests had invalidated all active connections in the pool, which means c3p0 would have been busy acquiring new connections. Any application thread that tried to obtain a connection from the pool would then have to wait indefinitely until one was acquired (see the excerpt from the c3p0 docs). The checkoutTimeout parameter would not have helped in this case either, since it enforces a timeout only when all connections are checked out (and this was not the case).
My questions are the following:
- Is my understanding of the system correct?
- If yes, should checkoutTimeout (or some other parameter) time out such connection requests rather than let them hang forever?
- Is there a better way to configure c3p0 to avoid this issue in the future? I could wrap the call that gets a connection from c3p0 in a thread-based timeout, but I want to avoid that if a better c3p0 configuration or a c3p0 patch would solve it.
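For reference, the thread-based timeout I have in mind would look roughly like the sketch below. It uses only the JDK (javax.sql.DataSource and java.util.concurrent); the class name TimedConnectionFetcher is hypothetical, and the DataSource passed in would be the c3p0 one. I have not tested this against c3p0 itself.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import javax.sql.DataSource;

// Hypothetical wrapper: runs getConnection() on a helper thread and gives up
// after a caller-supplied timeout instead of blocking the caller forever.
final class TimedConnectionFetcher {
    private final DataSource dataSource;
    private final ExecutorService executor;

    TimedConnectionFetcher(DataSource dataSource, ExecutorService executor) {
        this.dataSource = dataSource;
        this.executor = executor;
    }

    Connection getConnection(long timeoutMillis) throws SQLException {
        // Explicit Callable avoids overload ambiguity between submit(Runnable) and submit(Callable).
        Callable<Connection> task = dataSource::getConnection;
        Future<Connection> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Ask the helper thread to stop; this only helps if the blocked call honors interrupts.
            future.cancel(true);
            throw new SQLException(
                    "Timed out after " + timeoutMillis + " ms waiting for a connection", e);
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new SQLException("Interrupted while waiting for a connection", e);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof SQLException) {
                throw (SQLException) cause;
            }
            throw new SQLException("Failed to obtain a connection", cause);
        }
    }
}
```

The drawbacks are what make me reluctant: every checkout costs a hop to another thread, the helper threads can themselves pile up if the underlying call never returns, and a connection that arrives after the caller has given up would leak unless something closes it.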
Thanks
"select 1 from dual" will only work for Oracle, unless you create the dual table. – Rustie