How to do bulk (multi row) inserts with JpaRepository?
Asked Answered
J

6

114

When calling the saveAll method of my JpaRepository with a long List<Entity> from the service layer, trace logging of Hibernate shows single SQL statements being issued per entity.

Can I force it to do a bulk insert (i.e. multi-row) without needing to manually fiddle with EntityManger, transactions etc. or even raw SQL statement strings?

With multi-row insert I mean not just transitioning from:

start transaction
INSERT INTO table VALUES (1, 2)
end transaction
start transaction
INSERT INTO table VALUES (3, 4)
end transaction
start transaction
INSERT INTO table VALUES (5, 6)
end transaction

to:

start transaction
INSERT INTO table VALUES (1, 2)
INSERT INTO table VALUES (3, 4)
INSERT INTO table VALUES (5, 6)
end transaction

but instead to:

start transaction
INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)
end transaction

In PROD I'm using CockroachDB, and the difference in performance is significant.

Below is a minimal example that reproduces the problem (H2 for simplicity).


./src/main/kotlin/ThingService.kt:

package things

import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.web.bind.annotation.RestController
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.data.jpa.repository.JpaRepository
import javax.persistence.Entity
import javax.persistence.Id
import javax.persistence.GeneratedValue

interface ThingRepository : JpaRepository<Thing, Long> {
}

@RestController
class ThingController(private val repository: ThingRepository) {
    @GetMapping("/test_trigger")
    fun trigger() {
        val things: MutableList<Thing> = mutableListOf()
        for (i in 3000..3013) {
            things.add(Thing(i))
        }
        repository.saveAll(things)
    }
}

@Entity
data class Thing (
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
)

@SpringBootApplication
class Application {
}

fun main(args: Array<String>) {
    runApplication<Application>(*args)
}

./src/main/resources/application.properties:

jdbc.driverClassName = org.h2.Driver
jdbc.url = jdbc:h2:mem:db
jdbc.username = sa
jdbc.password = sa

hibernate.dialect=org.hibernate.dialect.H2Dialect
hibernate.hbm2ddl.auto=create

spring.jpa.generate-ddl = true
spring.jpa.show-sql = true

spring.jpa.properties.hibernate.jdbc.batch_size = 10
spring.jpa.properties.hibernate.order_inserts = true
spring.jpa.properties.hibernate.order_updates = true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data = true

./build.gradle.kts:

import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

plugins {
    val kotlinVersion = "1.2.30"
    id("org.springframework.boot") version "2.0.2.RELEASE"
    id("org.jetbrains.kotlin.jvm") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.spring") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.jpa") version kotlinVersion
    id("io.spring.dependency-management") version "1.0.5.RELEASE"
}

version = "1.0.0-SNAPSHOT"

tasks.withType<KotlinCompile> {
    kotlinOptions {
        jvmTarget = "1.8"
        freeCompilerArgs = listOf("-Xjsr305=strict")
    }
}

repositories {
    mavenCentral()
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    compile("org.springframework.boot:spring-boot-starter-data-jpa")
    compile("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    compile("org.jetbrains.kotlin:kotlin-reflect")
    compile("org.hibernate:hibernate-core")
    compile("com.h2database:h2")
}

Run:

./gradlew bootRun

Trigger DB INSERTs:

curl http://localhost:8080/test_trigger

Log output:

Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Jarrod answered 9/6, 2018 at 8:11 Comment(7)
Please check my answer, hope it will be helpful: https://mcmap.net/q/189925/-bulk-insert-with-spring-boot-and-spring-data-jpa-not-workingAdaadabel
@Adaadabel Thanks, but I'm already doing this (accumulating in a list and calling saveAll. I just added a minimal code example to reproduce the problem.Jarrod
Did you set hibernate.jdbc.batch_size property?Adaadabel
@Adaadabel Yes. (see above)Jarrod
It's incorrect, it must be in this form: spring.jpa.properties.hibernate.jdbc.batch_sizeAdaadabel
@Adaadabel Thanks, rieckpil already mentioned that and I adjusted my code accordingly. However it is still not batching.Jarrod
What you are showing is batch insert. Bulk insert is a much faster technique, but it is specific to the database and it is not supported by JPA.Haye
A
165

To get a bulk insert with Spring Boot and Spring Data JPA you need only two things:

  1. set the option spring.jpa.properties.hibernate.jdbc.batch_size to appropriate value you need (for example: 20).

  2. use saveAll() method of your repo with the list of entities prepared for inserting.

Working example is here.

Regarding the transformation of the insert statement into something like this:

INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)

the such is available in PostgreSQL: you can set the option reWriteBatchedInserts to true in jdbc connection string:

jdbc:postgresql://localhost:5432/db?reWriteBatchedInserts=true

then jdbc driver will do this transformation.

Additional info about batching you can find here.

UPDATED

Demo project in Kotlin: sb-kotlin-batch-insert-demo

UPDATED

Hibernate disables insert batching at the JDBC level transparently if you use an IDENTITY identifier generator.

Adaadabel answered 15/6, 2018 at 21:36 Comment(11)
Thanks. I'm trying to get your Kotlin demo running, but did not yet succeed. I do git clone https://github.com/Cepr0/sb-kotlin-batch-insert-demo, cd sb-kotlin-batch-insert-demo and mvn package but then end up with the following error: gist.github.com/Dobiasd/7f1163110b52876f171d43e17af0853cJarrod
@Cepr0, I just tried your program with mySql db but it does not work as expected. Is there anything to do with driver. Here is a property that I am using, ``` spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver spring.jpa.properties.hibernate.dialect = org.hibernate.dialect.MySQL5Dialect ```Mathian
@ShaunakPatel What exactly doesn't work and in which program, java or kotlin?Adaadabel
@Adaadabel in Java. I see only one difference compared to your program. 1) Database (I am using MySQL). Mean, I am running your code against MySQLMathian
Is there one way to intercept or listen to list of saveAll(List..) method? –Ahab
What to use if I have no saveAll()? Spring Boot 1.5.1.RELEASE.Baritone
Batching with strong and JPA medium.com/@clydecroix/…Oday
Is it possible to also use a ON CONFLICT (I'll like to ignore them in my situation). My constraint is set for a unique fields combinationJerriejerrilee
Thanks this worked.. Was trying for few hours..!!!Suprematism
@Adaadabel Regarding the transformation of the insert statement into something like this... If I understand correctly, I can't get the benefit of batching my insert/update statements if I'm not using PostgreSQL with reWriteBatchedInserts=true. Is this statement correct? Is there another form of batching that doesn't include writing the values in a comma-delimited fashion?Manisa
Is reWriteBatchedInserts available for SQL server?Manchukuo
R
20

The underlying issues is the following code in SimpleJpaRepository:

@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

In addition to the batch size property settings, you have to make sure that the class SimpleJpaRepository calls persist and not merge. There are a few approaches to resolve this: use an @Id generator that does not query sequence, like

@Id
@GeneratedValue(generator = "uuid2")
@GenericGenerator(name = "uuid2", strategy = "uuid2")
var id: Long

Or forcing the persistence to treat the records as new by having your entity implement Persistable and overriding the isNew() call

@Entity
class Thing implements Pesistable<Long> {
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
    @Transient
    private boolean isNew = true;
    @PostPersist
    @PostLoad
    void markNotNew() {
        this.isNew = false;
    }
    @Override
    boolean isNew() {
        return isNew;
    }
}

Or override the save(List) and use the entity manager to call persist()

@Repository
public class ThingRepository extends SimpleJpaRepository<Thing, Long> {
    private EntityManager entityManager;
    public ThingRepository(EntityManager entityManager) {
        super(Thing.class, entityManager);
        this.entityManager=entityManager;
    }

    @Transactional
    public List<Thing> save(List<Thing> things) {
        things.forEach(thing -> entityManager.persist(thing));
        return things;
    }
}

The above code is based on the following links:

Requiem answered 16/6, 2018 at 18:6 Comment(4)
Thanks Jean for sharing useful links. But there is still an issue with persisting the @Generated @Id values using the Persistable method. The batch is only executed when I manually set the id field by my own logic. If I rely on @Generated for my Long id property, then the statements do no run in batches. All the links shared by you don't use @Generated type strategy with Persistable method. I even checked the Github code link that is provided in the 2nd link, but it also is assigning the id property manually.Roanne
I think this reply wasn't really understood (and appreciated enough). I found out the same problem with saveAll myself. So to rephrase the issue: if you HAVE working batching, your entity does NOT use generated ID, and you use SimpleJpaRepository with saveAll, then: 1.saveAll will use save in loop 2. save will call entityInformation.isNew(entity) getting response false for every call. 3. will call merge for every entity. 4. IIUC these merges calls select first, and those cannot be batched so this will create N+1 issue, due to incorrect saveAll implementation.Calamitous
Batching with spring and JPA medium.com/@clydecroix/…Oday
hi, could you please check my question relate to many to many problem #77257777Ungovernable
N
9

You can configure Hibernate to do bulk DML. Have a look at Spring Data JPA - concurrent Bulk inserts/updates. I think section 2 of the answer could solve your problem:

Enable the batching of DML statements Enabling the batching support would result in less number of round trips to the database to insert/update the same number of records.

Quoting from batch INSERT and UPDATE statements:

hibernate.jdbc.batch_size = 50

hibernate.order_inserts = true

hibernate.order_updates = true

hibernate.jdbc.batch_versioned_data = true

UPDATE: You have to set the hibernate properties differently in your application.properties file. They are under the namespace: spring.jpa.properties.*. An example could look like the following:

spring.jpa.properties.hibernate.jdbc.batch_size = 50
spring.jpa.properties.hibernate.order_inserts = true
....
Nutrition answered 9/6, 2018 at 8:15 Comment(6)
Thanks for the suggestion. I tried it out, but it did not work. I've added a minimal code example to my question to reproduce the problem, even with your settings.Jarrod
Thanks, I adjusted my configuration (and updated my question accordingly), but still no luck.Jarrod
have you tried it with a different database or is your H2 a requirement? @TobiasHermann I would suggest trying it with a MySQL database next. Not all database driver implement the JDBC batch insert/update properlyNutrition
I tried using CockroachDB 2.0.2. It supports multi-row inserts and is about 10 times faster when I manually create the needed java.sql.PreparedStatement in my application and send it out using the raw java.sql.Connection of the javax.sql.DataSource.Jarrod
What this mean spring.jpa.properties.hibernate.order_inserts?Transfer
hi, could you please check my question relate to many to many problem #77257777Ungovernable
S
4

Hibernate uses transaction-write-behind strategy to automatically perform batch insert, update or delete.

But only setting the property spring.jpa.properties.hibernate.jdbc.batch_size=100 will not work alone. we also have to set the ID generator as @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seq_generator")

If we use GenerationType.AUTO or GenerationType.IDENTITY on entity in this case batch insert, update will not work. Because in this case hibernate don't know about the Id value that to be inserted, because it is generated on DB level, so it disable the batch insertion and individual insert takes place.

So for using batch insert, update our entity should have Id generator as Sequence.

Stewardson answered 29/3, 2023 at 13:13 Comment(0)
C
3

All mentioned methods work but will be slow especially if the source for inserted data lies in some other table. Firstly, even with batch_size>1 the insert operation will be executed in multiple SQL queries. Secondly, if the source data lies in the other table you need to fetch the data with other queries (and in the worst case scenario load all data into memory), and convert it to static bulk inserts. Thirdly, with separate persist() call for each entity (even if batch is enabled) you will bloat entity manager first level cache with all these entity instances.

But there's another option for Hibernate. If you use Hibernate as a JPA provider you can fallback to HQL which supports bulk inserts natively with subselect from another table. The example:

Session session = entityManager.unwrap(Session::class.java)
session.createQuery("insert into Entity (field1, field2) select [...] from [...]")
  .executeUpdate();

Whether this will work depends on your ID generation strategy. If the Entity.id is generated by the database (for example MySQL auto increment), it will be executed successfully. If the Entity.id is generated by your code (especially true for UUID generators), it will fail with "unsupported id generation method" exception.

However, in the latter scenario this problem can be overcome by custom SQL function. For example in PostgreSQL I use uuid-ossp extension which provides uuid_generate_v4() function, which I finally register in my custom dialog:

import org.hibernate.dialect.PostgreSQL10Dialect;
import org.hibernate.dialect.function.StandardSQLFunction;
import org.hibernate.type.PostgresUUIDType;

public class MyPostgresDialect extends PostgreSQL10Dialect {

    public MyPostgresDialect() {
        registerFunction( "uuid_generate_v4", 
            new StandardSQLFunction("uuid_generate_v4", PostgresUUIDType.INSTANCE));
    }
}

And then I register this class as a hibernate dialog:

hibernate.dialect=MyPostgresDialect

Finally I can use this function in bulk insert query:

SessionImpl session = entityManager.unwrap(Session::class.java);
session.createQuery("insert into Entity (id, field1, field2) "+
  "select uuid_generate_v4(), [...] from [...]")
  .executeUpdate();

The most important is the underlying SQL generated by Hibernate to accomplish this operation and it's just a single query:

insert into entity ( id, [...] ) select uuid_generate_v4(), [...] from [...]
Claudine answered 11/8, 2020 at 11:34 Comment(1)
hi, could you please check my question relate to many to many problem #77257777Ungovernable
P
2

I faced the same issue but I couldn't see my hibernate queries in batch, I realized that query doesn't translate to what was really querying. But to be sure that is bulking you can enable to generate statistics spring.jpa.properties.hibernate.generate_statistics=true then you will see:

enter image description here

when you add the spring.jpa.properties.hibernate.jdbc.batch_size=100 you will start to see some differences, like less jdbc statements and more jdbc batches:

enter image description here

Pinkney answered 25/11, 2022 at 14:14 Comment(3)
I've spent quite some time figuring out where my configuration issue is before checking the hibernate statistics when first running into this issue. Thanks for letting me know that I was not the only one not going beyond hibernate initial logs.Counterpunch
I have been there @void, thanks for the feedbackPinkney
hi, could you please check my question relate to many to many problem #77257777Ungovernable

© 2022 - 2024 — McMap. All rights reserved.