Hibernate: Why should I manually flush() even if I set batch_size in configuration file?

I'm learning to use Hibernate 5.2.10 with Java. I started with a few online tutorials but ran into the following question.

When using batching, all the tutorials I have seen first set hibernate.jdbc.batch_size in the configuration file. After that, the code looks similar to this:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i = 0; i < 1000000; i++ )
{
    Student student = new Student(.....);
    session.save( student );
    if ( i % 50 == 0 ) // same as the JDBC batch size
    {
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
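
For reference, the setting those tutorials mention is a single property; in hibernate.cfg.xml it looks like this (a minimal sketch, with the value matching the flush interval used in the loop above):

<property name="hibernate.jdbc.batch_size">50</property>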

Why should I be doing flush() and clear() manually? Isn't this something that should be done automatically by Hibernate, since I have already set hibernate.jdbc.batch_size in the configuration file?

To me it seems like I'm batching my operations manually, so why do I have to set the value of hibernate.jdbc.batch_size at all?

Enquire answered 10/7, 2017 at 8:58 Comment(0)

Specifying a JDBC batch_size value in the configuration and manually controlling the flush/clear cycle of the persistence context are two independent strategies that serve very different purposes.

The primary goal of pairing flush() with clear() is to minimize the memory consumed on the Java application side by the PersistenceContext as you save your student records. When you use a stateful Session, as your example illustrates, Hibernate keeps an attached/managed copy of every entity in memory, so it's important to flush those changes to the database and clear the context at regular intervals to avoid running out of memory or degrading performance.

The JDBC batch_size setting itself influences how frequently the driver sends statements to the database, in order to improve performance. Let's take a slightly modified example:

Session session = sessionFactory.openSession();
try {
  session.getTransaction().begin();
  for ( int i = 0; i < 10000; ++i ) {
    Student student = new Student();
    // ... populate the student's fields
    session.save( student );
  }
  session.getTransaction().commit();
}
catch( Throwable t ) {
  if ( session.getTransaction().getStatus() == TransactionStatus.ACTIVE ) {
    session.getTransaction().rollback();
  }
  throw t;
}
finally {
  session.close();
}

As you can see, we're not using flush() or clear() here.

What happens here is that when Hibernate performs the flush at commit time, the driver sends batch_size inserts to the database in bulk rather than one at a time. So instead of 10,000 statements travelling individually over the network, a batch_size of 250 means only 40 batched round trips.
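
Under the hood this is plain JDBC statement batching. Here is a rough sketch of what the batched flush amounts to at the driver level (the student table, its columns, and the helper method are hypothetical):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch: what Hibernate's batched flush amounts to at the JDBC level.
static void insertStudents( Connection connection, long[] ids, String[] names )
    throws SQLException {
  try ( PreparedStatement ps = connection.prepareStatement(
      "insert into student (id, name) values (?, ?)" ) ) {
    for ( int i = 0; i < ids.length; ++i ) {
      ps.setLong( 1, ids[i] );
      ps.setString( 2, names[i] );
      ps.addBatch();                // queued client-side; nothing is sent yet
      if ( ( i + 1 ) % 250 == 0 ) {
        ps.executeBatch();          // one round trip carries 250 inserts
      }
    }
    ps.executeBatch();              // send any remainder
  }
}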

Now, what is important to recognize is that some factors can disable batching entirely, such as using identity-based identifiers like IDENTITY or AUTO_INCREMENT. Why?

That is because, in order for Hibernate to store the entity in the PersistenceContext, it must know the entity's ID, and the only way to obtain that value when using IDENTITY-based identifier generation is to ask the database for it after each individual insert. Therefore, the inserts cannot be batched.

This is precisely why people doing bulk insert operations often observe poor performance: they don't realize the impact of the identifier generation strategy they picked.

When you want to optimize batch loading, it's best to use a cached sequence generator or an application-assigned identifier instead, as sketched below.
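
A minimal sketch of such a mapping, assuming a database that supports sequences (the sequence name student_seq and the allocation size are illustrative):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Student {

  @Id
  @GeneratedValue( strategy = GenerationType.SEQUENCE, generator = "student_seq" )
  // allocationSize lets Hibernate hand out a block of ids per sequence call,
  // so it does not need to ask the database for a value after each insert
  @SequenceGenerator( name = "student_seq", sequenceName = "student_seq",
                      allocationSize = 50 )
  private Long id;

  private String name;

  // getters and setters omitted
}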

Now going back to your example using flush() and clear(), the same problems hold true with identifier generation strategy. If you want those operations to be bulk/batch sent to the database, be mindful of the identifier strategy you're using for Student.

Shanel answered 10/7, 2017 at 17:1 Comment(0)
// flush a batch of inserts and release memory:
session.flush();
session.clear();

You should call flush() to force Hibernate to generate the SQL statements and execute them. If you don't call flush() manually, Hibernate calls it for you at transaction commit time.

You should call clear() to remove the entity state from the persistence context and avoid an OutOfMemoryError, since a batch can hold a huge number of entities and they may consume a lot of memory.

You control the batch boundaries manually because not every Hibernate operation needs batch mode.

"Why should I be doing flush() and clear() manually? Isn't this something that should be done automatically by hibernate since " -- mainly, hibernate does it for at commit time. Methods flush() and clear() are independent of using batch_size, you can call them despite do you have batch mode or not.

You might have a case where, inside one DAO method, you call flush() N times because you need to synchronize the entities with the database, and then call clear() once you no longer work with those entities and want to clean up the session.

In your example, you have 1,000,000 elements. Without calling flush and clear, you keep information in the first-level cache for all 1,000,000 of them. You add a new entity to the session context on every iteration of the loop, but you don't need that information once a batch has been prepared and flushed; that's why you should call flush() and clear(): to delete the information you no longer need.
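
If you want to watch the first-level cache grow and shrink, you can inspect the session statistics (a small sketch; SessionStatistics is part of Hibernate's API, and the surrounding loop is the one from the question):

import org.hibernate.stat.SessionStatistics;

// inside the save loop, just before flushing:
SessionStatistics stats = session.getStatistics();
System.out.println( "entities in first-level cache: " + stats.getEntityCount() );
session.flush();
session.clear(); // after this, getEntityCount() drops back to 0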

Eyde answered 10/7, 2017 at 9:16 Comment(3)
I understand all of this. What I can't understand is why I should set the value of hibernate.jdbc.batch_size in my configuration file. What is it used for if I'm doing all the batching manually? I mean, what if I didn't set this value? Would anything change? – Enquire
How exactly am I giving a mark for Hibernate to execute this batch? I'm doing flush() and clear() manually every time I have saved batch_size elements; I don't see anywhere in my code that a batch is created or handled, except that batches come from my if statement and the flush() and clear() calls inside it. It's still pretty strange to me why this should be done. What if I didn't set this value in Hibernate's configuration file and my code stayed the same? Would anything change at all? – Enquire
I saw your update, but it seems like you didn't understand my question. I understand that without flush() and clear() I would get an OutOfMemory error, but that's not my question. My question is: why should I set the value of batch_size in my configuration file? What is it used for? You mentioned it's used at commit time. Does this mean that flush() and clear() are independent of batching? So I use flush() and clear() to prevent the OutOfMemory error, and I use batching with batch_size to commit my changes more efficiently in the form of batches? – Enquire

To answer the question you asked in the description: as I have studied it, flush()-ing the batch/transaction is different from commit()-ing the transaction.

You are flushing the session after every chunk of 50, which means you are synchronizing those changes to the database in chunks of 50. Each chunk of 50 has been synchronized with the database, but it is not yet committed.
But when you define the batch size in the configuration file, you are telling Hibernate to send the statements to the database in JDBC batches of 40 (assuming you have set the batch size to 40 in the configuration file); the commit itself still happens only when the transaction commits.
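
A small sketch of that flush-versus-commit distinction (assuming an open Session and a mapped Student entity):

Transaction tx = session.beginTransaction();
Student student = new Student();
// ... populate fields
session.save( student );
session.flush();  // the INSERT is sent to the database now, but under the usual
                  // isolation levels other transactions cannot see the row yet
// tx.rollback() at this point would still undo the insert
tx.commit();      // only the commit makes the insert durable and visible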

Fleshly answered 8/5, 2019 at 20:59 Comment(0)
