I have some code that performs an UPSERT, also known as a merge. I want to clean up this code: specifically, I want to move away from exception handling and reduce the verbosity and sheer complexity of the code for such a simple operation. The requirement is to insert each item unless it already exists:
public void batchInsert(IncomingItem[] items) {
    try(Session session = sessionFactory.openSession()) {
        batchInsert(session, items);
    }
    catch(PersistenceException e) {
        if(e.getCause() instanceof ConstraintViolationException) {
            logger.warn("attempting to recover from constraint violation");
            DateTimeFormatter dbFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
            items = Arrays.stream(items).filter(item -> {
                int n = db.queryForObject("select count(*) from rets where source = ? and systemid = ? and updtdate = ?::timestamp",
                        Integer.class,
                        item.getSource().name(), item.getSystemID(),
                        dbFormat.format(item.getUpdtDateObj()));
                if(n != 0) {
                    logger.warn("REMOVED DUPLICATE: " +
                            item.getSource() + " " + item.getSystemID() + " " + item.getUpdtDate());
                    return false;
                }
                else {
                    return true; // keep
                }
            }).toArray(IncomingItem[]::new);
            try(Session session = sessionFactory.openSession()) {
                batchInsert(session, items);
            }
        }
    }
}
An initial search of SO is unsatisfactory:
- Hibernate Idempotent Update: conceptually similar, but a much simpler scenario with no regard for multi-threading or multi-processing.
- Can Hibernate work with MySQL's "ON DUPLICATE KEY UPDATE" syntax? Much better: it removes the race condition by pushing atomicity to the database using the @SQLInsert annotation; unfortunately, this solution is too error-prone to use on wider tables, and maintenance-intensive in evolving applications.
- How to mimic upsert behavior using Hibernate? Very similar to the above question, with a similar answer.
- Hibernate + "ON DUPLICATE KEY" logic: same as above; the answer mentions merge(), which is ok when single-threaded.
- Bulk insert or update with Hibernate? Similar question, but the chosen answer is off-the-rails, using stored procedures.
- Best way to prevent unique constraint violations with JPA: again a very naive, single-thread-oriented question and answers.
In the question How to do ON DUPLICATE KEY UPDATE in Spring Data JPA? which was marked as a duplicate, I noticed this intriguing comment:
That was a dead end: the comment sounds like a clever solution, but I really don't understand it, in particular its mention of the "actual same SQL statement".
Another promising approach is this: Hibernate and Spring modify query Before Submitting to DB
ON CONFLICT DO NOTHING / ON DUPLICATE KEY UPDATE
Both major open-source databases, PostgreSQL and MySQL, support a mechanism to push idempotency down to the database. The examples below use the PostgreSQL syntax, but can be easily adapted for MySQL.
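As a rough illustration of how the two dialects differ, the sketch below appends a skip-duplicates clause to a plain insert statement. The names ConflictClause, Db, and makeIdempotent are hypothetical and not part of Hibernate or any JDBC driver. The MySQL branch assumes the common no-op idiom of assigning a column to itself; INSERT IGNORE would be another option.

```java
import java.util.Locale;

/** Hypothetical helper: appends a "skip duplicates" clause to a plain INSERT statement. */
class ConflictClause {
    enum Db { POSTGRESQL, MYSQL }

    /**
     * Returns the statement unchanged unless it is an INSERT.
     * noOpColumn is only used for MySQL, where "col = col" leaves an existing row untouched.
     */
    static String makeIdempotent(String insertSql, Db db, String noOpColumn) {
        String head = insertSql.trim().toLowerCase(Locale.ROOT);
        if (!head.startsWith("insert into ")) {
            return insertSql;
        }
        switch (db) {
            case POSTGRESQL:
                return insertSql + " ON CONFLICT DO NOTHING";
            case MYSQL:
                return insertSql + " ON DUPLICATE KEY UPDATE " + noOpColumn + " = " + noOpColumn;
            default:
                return insertSql;
        }
    }

    public static void main(String[] args) {
        // Table and column names are taken from the query in the question.
        String sql = "insert into rets (source, systemid, updtdate) values (?, ?, ?)";
        System.out.println(makeIdempotent(sql, Db.POSTGRESQL, "systemid"));
        System.out.println(makeIdempotent(sql, Db.MYSQL, "systemid"));
    }
}
```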
By following the ideas in Hibernate and Spring modify query Before Submitting to DB, Hooking into Hibernate's query generation, and How I can configure StatementInspector in Hibernate?, I implemented:
import org.hibernate.resource.jdbc.spi.StatementInspector;

@SuppressWarnings("serial")
public class IdempotentInspector implements StatementInspector {
    @Override
    public String inspect(String sql) {
        if(sql.startsWith("insert into rets")) {
            sql += " ON CONFLICT DO NOTHING";
        }
        return sql;
    }
}
with property
<prop key="hibernate.session_factory.statement_inspector">com.myapp.IdempotentInspector</prop>
Unfortunately this leads to the following error when a duplicate is encountered:
Caused by: org.springframework.orm.hibernate5.HibernateOptimisticLockingFailureException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1; nested exception is org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
Which makes sense if you think about what's going on under the covers: the ON CONFLICT DO NOTHING clause causes zero rows to be inserted for a duplicate, but Hibernate expects each insert to affect exactly one row.
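The mismatch can be reproduced without a database. The sketch below is a simplified stand-in, not Hibernate's actual code: a Set plays the role of a table with a unique key, and checkRowCount is a hypothetical approximation of the row-count expectation that produces the error above.

```java
import java.util.HashSet;
import java.util.Set;

/** Simplified model of why a skipped insert trips a "one row expected" check. */
class RowCountCheckDemo {
    /** Models INSERT ... ON CONFLICT DO NOTHING and returns the affected row count. */
    static int insertOnConflictDoNothing(Set<String> table, String key) {
        return table.add(key) ? 1 : 0; // a duplicate key is silently skipped
    }

    /** Hypothetical stand-in for the check applied after each entity insert. */
    static void checkRowCount(int actual, int expected) {
        if (actual != expected) {
            throw new IllegalStateException(
                    "unexpected row count; actual row count: " + actual + "; expected: " + expected);
        }
    }

    public static void main(String[] args) {
        Set<String> table = new HashSet<>();
        checkRowCount(insertOnConflictDoNothing(table, "item-1"), 1); // first insert passes
        try {
            checkRowCount(insertOnConflictDoNothing(table, "item-1"), 1); // duplicate: 0 rows
        } catch (IllegalStateException e) {
            System.out.println("duplicate rejected by the check: " + e.getMessage());
        }
    }
}
```

The database does exactly what was asked; it is the caller-side expectation of one affected row per insert that turns the skipped duplicate into an exception.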
Is there a solution that enables thread-safe, exception-free, concurrent idempotent inserts and doesn't require manually defining the entire SQL insert statement to be executed by Hibernate?
For what it's worth, I feel that the approaches that push the dupcheck down to the database are the path to a proper solution.
CLARIFICATION
The IncomingItem objects consumed by the batchInsert method originate from a system where records are immutable. Under this special condition, ON CONFLICT DO NOTHING behaves the same as an UPSERT, notwithstanding possible loss of the Nth update.
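To make the equivalence concrete, here is a small in-memory sketch. It assumes each row is identified by a key built from source, systemid, and updtdate; the class name, method names, and sample values are made up for illustration. With immutable records a repeated key always carries the same payload, so "first write wins" and "last write wins" end in the same state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory comparison of insert-if-absent and upsert semantics. */
class ImmutableRecordsDemo {
    /** Models INSERT ... ON CONFLICT DO NOTHING: the first row stored under a key wins. */
    static Map<String, String> insertDoNothing(String[][] rows) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] row : rows) {
            table.putIfAbsent(row[0], row[1]);
        }
        return table;
    }

    /** Models an UPSERT: the last row seen for a key wins. */
    static Map<String, String> upsert(String[][] rows) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] row : rows) {
            table.put(row[0], row[1]);
        }
        return table;
    }

    public static void main(String[] args) {
        // Made-up sample rows: {key, payload}.
        String[][] immutable = {
            {"SRC|42|2020-01-01", "payload-a"},
            {"SRC|42|2020-01-01", "payload-a"} // duplicate delivery of the same record
        };
        String[][] mutable = {
            {"SRC|42|2020-01-01", "payload-a"},
            {"SRC|42|2020-01-01", "payload-b"} // same key, changed payload
        };
        System.out.println(insertDoNothing(immutable).equals(upsert(immutable))); // true
        System.out.println(insertDoNothing(mutable).equals(upsert(mutable)));     // false
    }
}
```

The second case shows where the equivalence stops: once a record under the same key can change, DO NOTHING keeps the old payload and the later update is lost.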
INSERT? That way I can apply sql += " ON CONFLICT DO NOTHING" to it and pass it on to the native query API. – Neediness