Spring transaction routing
First, we will create a DataSourceType
Java Enum that defines our transaction routing options:
public enum DataSourceType {
READ_WRITE,
READ_ONLY
}
To route the read-write transactions to the Primary node and read-only transactions to the Replica node, we can define a ReadWriteDataSource
that connects to the Primary node and a ReadOnlyDataSource
that connect to the Replica node.
The read-write and read-only transaction routing is done by the Spring AbstractRoutingDataSource
abstraction, which is implemented by the TransactionRoutingDatasource
, as illustrated by the following diagram:
The TransactionRoutingDataSource
is very easy to implement and looks as follows:
public class TransactionRoutingDataSource
extends AbstractRoutingDataSource {
@Nullable
@Override
protected Object determineCurrentLookupKey() {
return TransactionSynchronizationManager
.isCurrentTransactionReadOnly() ?
DataSourceType.READ_ONLY :
DataSourceType.READ_WRITE;
}
}
Basically, we inspect the Spring TransactionSynchronizationManager
class that stores the current transactional context to check whether the currently running Spring transaction is read-only or not.
The determineCurrentLookupKey
method returns the discriminator value that will be used to choose either the read-write or the read-only JDBC DataSource
.
Spring read-write and read-only JDBC DataSource configuration
The DataSource
configuration looks as follows:
@Configuration
@ComponentScan(
basePackages = "com.vladmihalcea.book.hpjp.util.spring.routing"
)
@PropertySource(
"/META-INF/jdbc-postgresql-replication.properties"
)
public class TransactionRoutingConfiguration
extends AbstractJPAConfiguration {
@Value("${jdbc.url.primary}")
private String primaryUrl;
@Value("${jdbc.url.replica}")
private String replicaUrl;
@Value("${jdbc.username}")
private String username;
@Value("${jdbc.password}")
private String password;
@Bean
public DataSource readWriteDataSource() {
PGSimpleDataSource dataSource = new PGSimpleDataSource();
dataSource.setURL(primaryUrl);
dataSource.setUser(username);
dataSource.setPassword(password);
return connectionPoolDataSource(dataSource);
}
@Bean
public DataSource readOnlyDataSource() {
PGSimpleDataSource dataSource = new PGSimpleDataSource();
dataSource.setURL(replicaUrl);
dataSource.setUser(username);
dataSource.setPassword(password);
return connectionPoolDataSource(dataSource);
}
@Bean
public TransactionRoutingDataSource actualDataSource() {
TransactionRoutingDataSource routingDataSource =
new TransactionRoutingDataSource();
Map<Object, Object> dataSourceMap = new HashMap<>();
dataSourceMap.put(
DataSourceType.READ_WRITE,
readWriteDataSource()
);
dataSourceMap.put(
DataSourceType.READ_ONLY,
readOnlyDataSource()
);
routingDataSource.setTargetDataSources(dataSourceMap);
return routingDataSource;
}
@Override
protected Properties additionalProperties() {
Properties properties = super.additionalProperties();
properties.setProperty(
"hibernate.connection.provider_disables_autocommit",
Boolean.TRUE.toString()
);
return properties;
}
@Override
protected String[] packagesToScan() {
return new String[]{
"com.vladmihalcea.book.hpjp.hibernate.transaction.forum"
};
}
@Override
protected String databaseType() {
return Database.POSTGRESQL.name().toLowerCase();
}
protected HikariConfig hikariConfig(
DataSource dataSource) {
HikariConfig hikariConfig = new HikariConfig();
int cpuCores = Runtime.getRuntime().availableProcessors();
hikariConfig.setMaximumPoolSize(cpuCores * 4);
hikariConfig.setDataSource(dataSource);
hikariConfig.setAutoCommit(false);
return hikariConfig;
}
protected HikariDataSource connectionPoolDataSource(
DataSource dataSource) {
return new HikariDataSource(hikariConfig(dataSource));
}
}
The /META-INF/jdbc-postgresql-replication.properties
resource file provides the configuration for the read-write and read-only JDBC DataSource
components:
hibernate.dialect=org.hibernate.dialect.PostgreSQL10Dialect
jdbc.url.primary=jdbc:postgresql://localhost:5432/high_performance_java_persistence
jdbc.url.replica=jdbc:postgresql://localhost:5432/high_performance_java_persistence_replica
jdbc.username=postgres
jdbc.password=admin
The jdbc.url.primary
property defines the URL of the Primary node while the jdbc.url.replica
defines the URL of the Replica node.
The readWriteDataSource
Spring component defines the read-write JDBC DataSource
while the readOnlyDataSource
component define the read-only JDBC DataSource
.
Note that both the read-write and read-only data sources use HikariCP for connection pooling.
The actualDataSource
acts as a facade for the read-write and read-only data sources and is implemented using the TransactionRoutingDataSource
utility.
The readWriteDataSource
is registered using the DataSourceType.READ_WRITE
key and the readOnlyDataSource
using the DataSourceType.READ_ONLY
key.
So, when executing a read-write @Transactional
method, the readWriteDataSource
will be used while when executing a @Transactional(readOnly = true)
method, the readOnlyDataSource
will be used instead.
Note that the additionalProperties
method defines the hibernate.connection.provider_disables_autocommit
Hibernate property, which I added to Hibernate to postpone the database acquisition for RESOURCE_LOCAL JPA transactions.
Not only that the hibernate.connection.provider_disables_autocommit
allows you to make better use of database connections, but it's the only way we can make this example work since, without this configuration, the connection is acquired prior to calling the determineCurrentLookupKey
method TransactionRoutingDataSource
.
The remaining Spring components needed for building the JPA EntityManagerFactory
are defined by the AbstractJPAConfiguration
base class.
Basically, the actualDataSource
is further wrapped by DataSource-Proxy and provided to the JPA EntityManagerFactory
. You can check the source code on GitHub for more details.
Testing time
To check if the transaction routing works, we are going to enable the PostgreSQL query log by setting the following properties in the postgresql.conf
configuration file:
log_min_duration_statement = 0
log_line_prefix = '[%d] '
The log_min_duration_statement
property setting is for logging all PostgreSQL statements while the second one adds the database name to the SQL log.
So, when calling the newPost
and findAllPostsByTitle
methods, like this:
Post post = forumService.newPost(
"High-Performance Java Persistence",
"JDBC", "JPA", "Hibernate"
);
List<Post> posts = forumService.findAllPostsByTitle(
"High-Performance Java Persistence"
);
We can see that PostgreSQL logs the following messages:
[high_performance_java_persistence] LOG: execute <unnamed>:
BEGIN
[high_performance_java_persistence] DETAIL:
parameters: $1 = 'JDBC', $2 = 'JPA', $3 = 'Hibernate'
[high_performance_java_persistence] LOG: execute <unnamed>:
select tag0_.id as id1_4_, tag0_.name as name2_4_
from tag tag0_ where tag0_.name in ($1 , $2 , $3)
[high_performance_java_persistence] LOG: execute <unnamed>:
select nextval ('hibernate_sequence')
[high_performance_java_persistence] DETAIL:
parameters: $1 = 'High-Performance Java Persistence', $2 = '4'
[high_performance_java_persistence] LOG: execute <unnamed>:
insert into post (title, id) values ($1, $2)
[high_performance_java_persistence] DETAIL:
parameters: $1 = '4', $2 = '1'
[high_performance_java_persistence] LOG: execute <unnamed>:
insert into post_tag (post_id, tag_id) values ($1, $2)
[high_performance_java_persistence] DETAIL:
parameters: $1 = '4', $2 = '2'
[high_performance_java_persistence] LOG: execute <unnamed>:
insert into post_tag (post_id, tag_id) values ($1, $2)
[high_performance_java_persistence] DETAIL:
parameters: $1 = '4', $2 = '3'
[high_performance_java_persistence] LOG: execute <unnamed>:
insert into post_tag (post_id, tag_id) values ($1, $2)
[high_performance_java_persistence] LOG: execute S_3:
COMMIT
[high_performance_java_persistence_replica] LOG: execute <unnamed>:
BEGIN
[high_performance_java_persistence_replica] DETAIL:
parameters: $1 = 'High-Performance Java Persistence'
[high_performance_java_persistence_replica] LOG: execute <unnamed>:
select post0_.id as id1_0_, post0_.title as title2_0_
from post post0_ where post0_.title=$1
[high_performance_java_persistence_replica] LOG: execute S_1:
COMMIT
The log statements using the high_performance_java_persistence
prefix were executed on the Primary node while the ones using the high_performance_java_persistence_replica
on the Replica node.
GitHub Repository
This is not just theory. It's all on GitHub and works like a charm. Use this test case as a reference.
So you can you use it a starting point for your transaction routing solution, as you have a fully-functional example.
Second-level caching
Once you are using replication, you are operating in a distributed environment, so you need to use a distributed caching solution, like Infinispan.
Since we are using replication to distribute traffic to more database nodes, it's obvious that we also have multiple application nodes which have to connect to those database nodes.
Therefore, using the READ_WRITE
CacheConcurrencyStrategy
in such an environment is a terrible anti-pattern as each distributed node will keep its own copy of the cached entries, leading you to consistency issues even if you didn't use transaction routing.
Not to mention the cold cache issue you'd face if you employed auto-scaling for your application nodes, as they would amplify the database traffic because new nodes would start with a cold cache.
So, if you plan to use transaction routing with the second-level cache mechanism, then you can do better than this.
Use the NONSTRICT_READ_WRITE
cache concurrency strategy with a second-level caching provider that can store the cached data in a distributed system of nodes that are readily available even when you create new application nodes.
Conclusion
You need to make sure you set the right size for your connection pools because that can make a huge difference. For this, I recommend using Flexy Pool.
You need to be very diligent and make sure you mark all read-only transactions accordingly. It's unusual that only 10% of your transactions are read-only. Could it be that you have such a write-most application or you are using write transactions where you only issue query statements?
For batch processing, you definitely need read-write transactions, so make sure you enable JDBC batching, like this:
<property name="hibernate.order_updates" value="true"/>
<property name="hibernate.order_inserts" value="true"/>
<property name="hibernate.jdbc.batch_size" value="25"/>
For batching you can also use a separate DataSource
that uses a different connection pool that connects to the Primary node.
Just make sure your total connection size of all connection pools is less than the number of connections PostgreSQL has been configured with.
Each batch job must use a dedicated transaction, so make sure you use a reasonable batch size.
More, you want to hold locks and to finish transactions as fast as possible. If the batch processor is using concurrent processing workers, make sure the associated connection pool size is equal to the number of workers, so they don't wait for others to release connections.