In situations like this, it is probably best to test and measure.
I do not have 'hands-on' experience with the PostgreSQL hot standby mode, but I have done database fail-over for a Java application.
First, test the claims on the PostgreSQL JDBC driver documentation page about the ?targetServerType=master parameter (mentioned at the bottom of that page).
Write a small Java "PgHsm" class with a main method that uses the PostgreSQL JDBC driver via DriverManager.getConnection and runs a simple update query.
With server A as master, the update query should run against server A. Stop PostgreSQL on server A and run PgHsm: it should fail to connect, since server B is not a master.
Make server B the master and run PgHsm again: it should run OK.
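A minimal sketch of such a class could look like the following (the hostnames, database, credentials and the hsm_test table are placeholders to adapt to your setup):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PgHsm {

    // targetServerType=master makes the driver connect only to the server
    // that accepts writes; serverA and serverB are placeholder hostnames.
    private static final String URL =
            "jdbc:postgresql://serverA:5432,serverB:5432/testdb?targetServerType=master";

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(URL, "testuser", "testpwd");
             Statement st = con.createStatement()) {
            // A simple update query: fails on a standby, succeeds on the master.
            int updated = st.executeUpdate(
                    "UPDATE hsm_test SET last_run = now() WHERE id = 1");
            System.out.println("Updated " + updated + " row(s).");
        }
    }
}
```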
The datasource is backed by a database connection pool in TomEE; this page lists the implementations that are available.
But not all database connection pools are equal, and I now prefer HikariCP because, in my experience, it handles the "database down" scenario more predictably.
See also the test and its results on HikariCP's "handling database down" page.
Unfortunately, HikariCP relies on JDBC's get/setNetworkTimeout to behave predictably, and the PostgreSQL JDBC driver does not implement this (*).
So, to be sure that (JavaEE) application threads do not hang forever on a database action, you need to set the connectTimeout and socketTimeout JDBC driver options.
Setting a socketTimeout is precarious, as it automatically sets a time limit for ALL queries to the database.
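For example, both timeouts can be set as JDBC URL parameters (the values below are assumptions; note that the PostgreSQL JDBC driver expects them in seconds):

```java
// Sketch of a JDBC URL with driver-level timeouts; hostnames, database and
// the chosen values are assumptions.
String jdbcUrl = "jdbc:postgresql://serverA:5432,serverB:5432/testdb"
        + "?targetServerType=master"
        + "&connectTimeout=10"   // max seconds to establish a connection
        + "&socketTimeout=30";   // max seconds to wait for a response; limits ALL queries
```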
(*) Update: since version 42.2.x, network timeouts are implemented.
The second test is to update the Java "PgHsm" class to use the database connection pool implementation of your choosing and to start (at least) two threads that continually run simple update queries in a loop (in each iteration a database connection is acquired from the pool and returned to the pool after commit/rollback).
While you bring down server A and switch server B to "master" mode, monitor the exceptions logged by "PgHsm" and how long a thread waits/hangs on a database action.
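A sketch of such a test program, assuming HikariCP as the pool and reusing the placeholder hostnames, credentials and table from above:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.Statement;

public class PgHsm {

    public static void main(String[] args) throws Exception {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://serverA:5432,serverB:5432/testdb"
                + "?targetServerType=master&connectTimeout=10&socketTimeout=30");
        config.setUsername("testuser");
        config.setPassword("testpwd");
        config.setMaximumPoolSize(4);
        config.setAutoCommit(false);          // commit explicitly in the loop
        config.setConnectionTimeout(5_000);   // max wait (ms) for a connection from the pool

        try (HikariDataSource ds = new HikariDataSource(config)) {
            Runnable worker = () -> {
                while (!Thread.currentThread().isInterrupted()) {
                    long start = System.currentTimeMillis();
                    // Acquire a connection from the pool, update, commit, return to pool.
                    try (Connection con = ds.getConnection();
                         Statement st = con.createStatement()) {
                        st.executeUpdate("UPDATE hsm_test SET last_run = now() WHERE id = 1");
                        con.commit();
                    } catch (Exception e) {
                        System.out.println(Thread.currentThread().getName() + " failed after "
                                + (System.currentTimeMillis() - start) + " ms: " + e);
                    }
                    try {
                        Thread.sleep(500);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            };
            Thread t1 = new Thread(worker, "worker-1");
            Thread t2 = new Thread(worker, "worker-2");
            t1.start();
            t2.start();
            t1.join(); // run until interrupted / killed
        }
    }
}
```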
The results from these tests can be used to tune the JDBC driver options and pool settings. Aim for results where:
- invalid connections are removed from the pool as soon as possible so that the application gets mostly valid connections from the pool
- as few application threads as possible hang (and for the shortest amount of time) when a database goes down
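As a starting point, these are the HikariCP settings I would look at (the values below are assumptions, to be adjusted based on your measurements):

```java
import com.zaxxer.hikari.HikariConfig;

HikariConfig config = new HikariConfig();
// How long an application thread waits for a connection from the pool (ms).
config.setConnectionTimeout(5_000);
// How long an aliveness check may take before the connection is discarded (ms).
config.setValidationTimeout(2_500);
// Retire connections proactively so stale ones do not linger in the pool (ms).
config.setMaxLifetime(600_000);
// Needed for the suspend/resume fail-over pattern described below.
config.setAllowPoolSuspension(true);
```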
The second test relies on server A not being available, so that the connection test queries (performed by the database connection pool) fail.
In the case where both servers remain available but master and slave switch roles, a connection test query will not help,
and the database connection pool will hand the application the wrong (now read-only) database connections.
In that case, manual intervention is required. A "fail-over pattern" for HikariCP is described here
(only available when the allowPoolSuspension option, described on the configuration page, is enabled):
- suspendPool()
- softEvictConnections()
- Wait until activeConnections goes to 0.
- resumePool()
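A sketch of that pattern using HikariCP's pool MXBean (the wait interval and the surrounding helper class are arbitrary choices; allowPoolSuspension must be enabled):

```java
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;

public final class PoolFailover {

    public static void failover(HikariDataSource ds) throws InterruptedException {
        HikariPoolMXBean pool = ds.getHikariPoolMXBean();
        pool.suspendPool();            // stop handing out connections
        pool.softEvictConnections();   // mark all current connections for eviction
        while (pool.getActiveConnections() > 0) {
            Thread.sleep(100);         // wait until in-use connections are returned and evicted
        }
        pool.resumePool();             // new connections will go to the new master
    }
}
```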
The third test will be with the JavaEE application and, by now, you should have a good idea of what problems to expect.
It is not uncommon for applications to be updated after these kinds of tests to improve the handling of "database down" scenarios
(e.g. by setting (default) query-timeouts).
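For example, a query-timeout can be set per statement via plain JDBC (the 30 seconds and the table are assumptions):

```java
import java.sql.Connection;
import java.sql.Statement;
import javax.sql.DataSource;

class QueryTimeoutExample {
    static void runWithTimeout(DataSource dataSource) throws Exception {
        try (Connection con = dataSource.getConnection();
             Statement st = con.createStatement()) {
            st.setQueryTimeout(30); // seconds; the query is cancelled instead of hanging
            st.executeUpdate("UPDATE hsm_test SET last_run = now() WHERE id = 1");
        }
    }
}
```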
In your case, a "suspend, flush and resume the database connection pool" feature (the pattern described above) for use during the manual fail-over would also be desirable.