How to ignore duplicate insertion for save operation using jpa repository saveAll?
Asked Answered
O

2

14

I need to be able to use saveAll() method of a spring data CrudRepository and ignore duplicate insertion on a unique constraint.

In my current project I'm currently using spring boot -> jpa stack and I have to process an import of a excel file that ends up into saving multiple entities into a database. I'm using liquibase to track changes on the DB and I'm using spring profiles to manage two different environments (one is using Postgres DB and the other one is using Oracle 19-c). The excel files are uploaded from time to time via an external ftp service. I have a Folder Watch service that waits for a file change in that folder and triggers the import when a new file is created.

The problem is that I have an unique constraint and the file MIGHT have new rows along with the old ones. The processing of the Excel file via POI is pretty heavy and I'm saving the result of the computation in memory via java.util.Set until I reach the point in time where I have to persist the Set with the saveAll() method of the CrudRepository.

I am currently experiencing: ERROR: duplicate key value violates unique constraint "csspd_avoid_duplicates"

org.springframework.transaction.UnexpectedRollbackException: Transaction silently rolled back because it has been marked as rollback-only

and the import stops on the first encountered duplicate.

I had, for the same project, a similar problem with a single insertion in a loop (single save()) and I've managed to workaround the issue with the following snippet:

                try {

                    repository.save(entity);
                } catch (DataIntegrityViolationException e) {
                    this.log.debug("Duplicate found, skipping");
                }

I've tried the same approach with the bulk save without any luck.

This is the interested method:

public boolean triggerExcelImport(Path path) {
    try {

        ExcelDataDTO excelExtractedData;
        if (path == null) {
            excelExtractedData = excelImporterService.importExcelFiles();
        } else {
            excelExtractedData = excelImporterService.importStandardXLSXFiles(path);
        }

        staticProductDataRepository.saveAll(excelExtractedData.getProductData());  // <-- This saveAll might have duplicates

        for (Set<AllocationChartData> allocationList : excelExtractedData.getAllocationMap().values()) {
            allocationChartDataRepository.saveAll(allocationList);   // <-- This saveAll might have duplicates
        }

        performanceRepository.saveAll(excelExtractedData.getPerformanceData());    // <-- This saveAll might have duplicates

        return true;
    } catch (DataIntegrityViolationException e) {
        this.log.debug("Duplicate found, skipping");
    }
    return false;
}

I would like to have the import process to actually finish the import avoiding termination on the first duplicate. The solution has to be database independent (I am using Postgres and Oracle managed with liquibase context)

this is the actual exception I'm having:

        org.springframework.transaction.UnexpectedRollbackException: Transaction silently rolled back because it has been marked as rollback-only
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:755) ~[spring-tx-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:714) ~[spring-tx-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:533) ~[spring-tx-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:304) ~[spring-tx-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98) ~[spring-tx-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:688) ~[spring-aop-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at com.juliusbaer.ims.toolbox.services.CenshareService$$EnhancerBySpringCGLIB$$8e7529c1.triggerCenshareExcelImport(<generated>) ~[classes/:?]
        at com.juliusbaer.ims.toolbox.services.FolderWatchService.readAndCleanUpFile(FolderWatchService.java:168) ~[classes/:?]
        at com.juliusbaer.ims.toolbox.services.FolderWatchService.initializeFileImportAtStartup(FolderWatchService.java:99) ~[classes/:?]
        at com.juliusbaer.ims.toolbox.services.FolderWatchService.setUp(FolderWatchService.java:93) ~[classes/:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:363) ~[spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:307) ~[spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:136) ~[spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:414) ~[spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1770) ~[spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:593) ~[spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:515) ~[spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320) ~[spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) [spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:318) [spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199) [spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:849) [spring-beans-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:877) [spring-context-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:549) [spring-context-5.1.6.RELEASE.jar:5.1.6.RELEASE]
        at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:142) [spring-boot-2.1.4.RELEASE.jar:2.1.4.RELEASE]
        at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:775) [spring-boot-2.1.4.RELEASE.jar:2.1.4.RELEASE]
        at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:397) [spring-boot-2.1.4.RELEASE.jar:2.1.4.RELEASE]
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:316) [spring-boot-2.1.4.RELEASE.jar:2.1.4.RELEASE]
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1260) [spring-boot-2.1.4.RELEASE.jar:2.1.4.RELEASE]
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1248) [spring-boot-2.1.4.RELEASE.jar:2.1.4.RELEASE]
        at com.juliusbaer.ims.toolbox.Application.main(Application.java:12) [classes/:?]

Thank you for the help.

EDIT: Just to give you a brief more explanation, the unique constraint is not based on a single identifier id, it's a composition of multiple columns (8 in total)

Optimism answered 23/8, 2019 at 8:59 Comment(12)
Why don't you remove the duplicates before saving ?Armandinaarmando
That's a viable option but I would like to avoid remove. Is it possible to have a solution that does "on duplicate do nothing" instead of bulk removing very long lists of items?Optimism
No that's not the the duty of the repository layer. If you are using java.util.Set you could implement equals() based on the unique fieldsMalodorous
I wouldn't try to suppress DB errors, I think that the client code should provide the correct values to insert.Armandinaarmando
I agree with Arnaud and with the first sentence of Simon, but building a filtering logic in the client would imply that other I have to source ALL the information first other I have to access many times the DB for reads, both cases are pretty slow in performancesOptimism
How many rows do you have ? You could make a single query to do that like SELECT id FROM t WHERE id IN (ids_list_you_want_to_insert) and then filter your collection using the returned id. I agree that it's not perfect, but at least you won't have to rely on DB errorArmandinaarmando
Your case works only if the number of elements is below 1000 (oracle in clause limit). In my case there are between 500.000 ~ 1.000.000 elementsOptimism
Do you frequently have to insert ? By the way, if you have 1M elements, you have at least 1M insert query. Adding 1000 select queries (1000x1000 ids) that only manipulate ids is not very costlyArmandinaarmando
Not really, this workload is done twice per month more or less.Optimism
Then adding several thousands of select queries on the id per month is nothing.Armandinaarmando
Hey man. Having the same issue. Any ideas about how you resolved without looping?Bateman
Any finding on this? I am having something similar issue, In my case, I have List<Entity> which is having some records as new and some are there to update, when I use saveAll() method, it giving issue when there are many records. Surprising thing is when there some 1k records it works perfect.Depraved
P
1

Why do not you implement equals() and hashCode() functions for your objects on columns based on which you want uniqueness? Store your objects in a Set and it will automatically remove duplicates following which you can use saveAll() without any exception.

Periostitis answered 5/2, 2023 at 21:6 Comment(0)
B
0

You could use an @Query annotation to define your own custom repo method to accomplish this.

@Query("INSERT IGNORE INTO table (COLUMNS) values (...)")
public List<S> saveAllInsertIgnore(List<> list);

Reference:

https://mcmap.net/q/902540/-is-it-possible-to-apply-save-all-method-as-insert-ignore-method-in-jpa-repository

Botulism answered 8/10, 2022 at 9:26 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.