How to persist a lot of entities (JPA)
I need to process a CSV file and for each record (line) persist an entity. Right now, I do it this way:

while ((line = reader.readNext()) != null) {
    Entity entity = createEntityObject(line);
    entityManager.save(entity);
    i++;
}

where the save(Entity) method is basically just an EntityManager.merge() call. There are about 20,000 entities (lines) in the CSV file. Is this an effective way to do it? It seems to be quite slow. Would it be better to use EntityManager.persist()? Is this solution flawed in any way?

EDIT

It's a lengthy process (over 400 s), and I tried both solutions, with persist and with merge. Both take approximately the same amount of time to complete (459 s vs. 443 s). The question is whether saving the entities one by one like this is optimal. As far as I know, Hibernate (which is my JPA provider) implements some caching/flushing functionality, so I shouldn't have to worry about this.

Fulvi answered 13/4, 2011 at 12:39 Comment(0)
14

The JPA API alone doesn't give you all the options needed to make this optimal. Depending on how fast you want it to be, you are going to have to look at ORM-specific options, which means Hibernate in your case.

Things to check:

  1. Check you are using a single transaction (Yes, apparently you are sure of this)
  2. Check your JPA provider (Hibernate) is using the JDBC batch API (refer: hibernate.jdbc.batch_size)
  3. Check if you can bypass getting generated keys (depends on db/jdbc driver how much benefit you get from this - refer: hibernate.jdbc.use_getGeneratedKeys)
  4. Check if you can bypass cascade logic (only minimal performance benefit from this)
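With Hibernate, points 2 and 3 map to persistence-unit properties. A starting point might look like the following (the values are illustrative, not tuned; note that Hibernate disables JDBC batching for entities using IDENTITY id generation):

```properties
# Enable the JDBC batch API with batches of 50 statements
hibernate.jdbc.batch_size=50
# Order inserts/updates by entity type so statements can actually be batched
hibernate.order_inserts=true
hibernate.order_updates=true
# Skip fetching generated keys where the driver allows it
hibernate.jdbc.use_getGeneratedKeys=false
```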

So in Ebean ORM this would be:

    EbeanServer server = Ebean.getServer(null);

    Transaction transaction = server.beginTransaction();
    try {
        // Use JDBC batch API with a batch size of 100
        transaction.setBatchSize(100);
        // Don't bother getting generated keys
        transaction.setBatchGetGeneratedKeys(false);
        // Skip cascading persist 
        transaction.setPersistCascade(false);

        // persist your beans ...
        Iterator<YourEntity> it = null; // obviously should not be null 
        while (it.hasNext()) {
            YourEntity yourEntity = it.next();
            server.save(yourEntity);
        }

        transaction.commit();
    } finally {
        transaction.end();
    }

Oh, and if you do this via raw JDBC you skip the ORM overhead entirely (less object creation, less garbage collection, etc.), so I wouldn't ignore that option.
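On the raw JDBC route, the JDBC batch API (addBatch/executeBatch) is the key piece. A minimal sketch, assuming a hypothetical entity table with two columns mapped from the CSV:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class CsvBatchInsert {
    static final int BATCH_SIZE = 100;

    // True when the i-th row (1-based) completes a full batch.
    static boolean isBatchBoundary(int i) {
        return i % BATCH_SIZE == 0;
    }

    // Insert all rows in one transaction using the JDBC batch API.
    static void insertAll(Connection con, List<String[]> rows) throws Exception {
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(
                "insert into entity (col_a, col_b) values (?, ?)")) {
            int i = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (isBatchBoundary(++i)) {
                    ps.executeBatch(); // send this batch to the database
                }
            }
            ps.executeBatch(); // send the trailing partial batch
            con.commit();
        }
    }
}
```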

So yes, this doesn't answer your question but might help your search for more ORM specific batch insert tweaks.

Boadicea answered 14/4, 2011 at 4:20 Comment(1)
You could check hibernate.jdbc.batch_size and hibernate.jdbc.use_getGeneratedKeys (but not settable per transaction).Boadicea
6

I think one common way to do this is with transactions. If you begin a new transaction and then persist a large number of objects, they won't actually be inserted into the DB until you commit the transaction. This can gain you some efficiencies if you have a large number of items to commit.

Check out EntityManager.getTransaction
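For reference, explicit transaction demarcation in JPA follows the shape sketched below. To keep the snippet self-contained, the two JPA interfaces are reduced to minimal stand-in stubs; in a real application you would use javax.persistence.EntityManager and its getTransaction() directly:

```java
import java.util.List;

public class TxDemo {
    // Minimal stand-ins for the JPA interfaces so the sketch compiles
    // on its own; in an application these are javax.persistence types.
    interface EntityTransaction { void begin(); void commit(); void rollback(); }
    interface EntityManager { EntityTransaction getTransaction(); void persist(Object e); }

    // Persist everything inside one transaction: nothing is committed
    // to the database until the single commit() at the end.
    static void persistInOneTx(EntityManager em, List<?> entities) {
        EntityTransaction tx = em.getTransaction();
        tx.begin();
        try {
            for (Object e : entities) {
                em.persist(e);
            }
            tx.commit();
        } catch (RuntimeException ex) {
            tx.rollback();
            throw ex;
        }
    }
}
```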

Brave answered 13/4, 2011 at 13:21 Comment(2)
It does run in a transaction (using Spring's @Transactional).Fulvi
You could try removing the annotation and see if the performance changes. You could also confirm that it is using one fell swoop by setting a break point and after some number of perist calls have run check the database to confirm that the rows aren't inserted yet. It may be that spring commits after 10 or 100 or so calls and there is some tweaking you can do to alter the performance.Brave
5

To make it go faster, at least in Hibernate, do a flush() and a clear() after a certain number of inserts. I have used this approach for millions of records and it works. It's still slow, but it's much faster than not doing it. The basic structure is like this:

int i = 0;
for (MyThingy thingy : lotsOfThingies) {

    dao.save(thingy.toModel());

    // Flush pending inserts and clear the session periodically
    // so the persistence context doesn't grow unbounded.
    if (++i % 20 == 0) {
        dao.flushAndClear();
    }

}
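Written directly against the EntityManager, the same pattern might look like the sketch below. The EntityManager here is a minimal stand-in stub so the snippet compiles on its own; the real javax.persistence.EntityManager provides the persist()/flush()/clear() methods assumed here. It usually pays to align the batch size with hibernate.jdbc.batch_size:

```java
import java.util.List;

public class BatchPersist {
    // Minimal stand-in for javax.persistence.EntityManager so the
    // sketch is self-contained; swap in the real interface in an app.
    interface EntityManager {
        void persist(Object entity);
        void flush();  // push pending inserts to the database
        void clear();  // detach everything, freeing session memory
    }

    static final int BATCH_SIZE = 20; // align with hibernate.jdbc.batch_size

    // Persists all entities, flushing and clearing every BATCH_SIZE
    // inserts; returns the number of in-loop flush/clear cycles.
    static int persistAll(EntityManager em, List<?> entities) {
        int flushes = 0;
        int i = 0;
        for (Object e : entities) {
            em.persist(e);
            if (++i % BATCH_SIZE == 0) {
                em.flush();  // write this batch of inserts
                em.clear();  // keep the persistence context small
                flushes++;
            }
        }
        em.flush(); // write any trailing partial batch
        return flushes;
    }
}
```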
Caution answered 13/4, 2011 at 13:29 Comment(0)
4

You can write them directly to the database with classic SQL INSERT statements.

@see EntityManager.createNativeQuery
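A sketch of what that could look like: the hypothetical helper below just builds a parameterized INSERT string; the resulting SQL would be passed to EntityManager.createNativeQuery(sql), have its parameters bound with setParameter, and be run with executeUpdate() inside a transaction. Table and column names are placeholders:

```java
import java.util.Collections;
import java.util.List;

public class NativeInsertSketch {
    // Build "insert into <table> (c1, c2, ...) values (?, ?, ...)"
    // for use with EntityManager.createNativeQuery.
    static String insertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String params = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "insert into " + table + " (" + cols + ") values (" + params + ")";
    }
}
```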

Ofelia answered 13/4, 2011 at 12:53 Comment(1)
In this particular case native queries won't provide much of a speed-up. All you can do is group them with batching, which you can do at the JPA-provider level or the JDBC-driver level. However, in my particular case I can use an INSERT INTO ... SELECT ... FROM combination, which would be a huge speed-up, so have my +1.Ostracize
