Specifying a JDBC batch_size value in the configuration and manually controlling the flush/clear of the persistence context are two independent strategies that serve very different purposes.
The primary goal of pairing flush() with clear() is to minimize the memory consumed on the Java application side by the PersistenceContext as you save your Student records. When you use a stateful Session, as your example illustrates, Hibernate maintains an attached/managed copy of each entity in memory, so it is important to flush those entities to the database and clear the context at regular intervals to avoid running out of memory or degrading performance.
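For reference, that flush()/clear() pattern typically looks something like the sketch below. The Student entity, the sessionFactory, and the interval of 50 are assumptions for illustration; in practice the interval should match the hibernate.jdbc.batch_size setting so a flush lines up with a full JDBC batch.

```java
// Sketch of the manual flush()/clear() pattern (assumes a Student entity
// and an existing SessionFactory; the interval of 50 is illustrative).
Session session = sessionFactory.openSession();
try {
    session.getTransaction().begin();
    for ( int i = 0; i < 10000; ++i ) {
        Student student = new Student();
        session.save( student );
        if ( i > 0 && i % 50 == 0 ) {
            // Push the pending inserts to the driver, then detach the
            // managed entities so the persistence context stays small.
            session.flush();
            session.clear();
        }
    }
    session.getTransaction().commit();
}
finally {
    session.close();
}
```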
The JDBC batch_size setting, by contrast, controls how many statements Hibernate groups into a single JDBC batch before the driver sends them to the database, which improves performance. Let's take a slightly modified example:
Session session = sessionFactory.openSession();
try {
    session.getTransaction().begin();
    for ( int i = 0; i < 10000; ++i ) {
        Student student = new Student();
        ...
        session.save( student );
    }
    session.getTransaction().commit();
}
catch ( Throwable t ) {
    if ( session.getTransaction().getStatus() == TransactionStatus.ACTIVE ) {
        session.getTransaction().rollback();
    }
    throw t;
}
finally {
    session.close();
}
As you can see, we're not using flush() or clear() here.
What happens here is that when Hibernate performs the flush at commit time, the driver sends batch_size inserts to the database in bulk rather than one at a time. So rather than 10,000 network round-trips, with batch_size set to 250 it would take only 40.
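The setting itself is a single configuration property. A minimal sketch in a hibernate.properties file might look like this (the value 250 is just the figure used above, and the same key works in hibernate.cfg.xml or persistence.xml):

```properties
# hibernate.properties -- group up to 250 statements per JDBC batch
hibernate.jdbc.batch_size=250
```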
Now, it is important to recognize that certain factors can disable batching, such as identity-based identifier generation (IDENTITY or AUTO_INCREMENT). Why? In order for Hibernate to store the entity in the PersistenceContext, it must know the entity's ID, and the only way to obtain that value with IDENTITY-based generation is to query the database for it after each insert operation. Therefore, inserts cannot be batched.
This is precisely why people doing bulk insert operations often observe poor performance: they don't realize the impact the identifier generation strategy they pick can have. When you want to optimize batch inserts, it's best to use some type of cached sequence generator or an application-assigned identifier instead.
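As a sketch, a cached sequence generator can be declared with the standard JPA annotations; the generator and sequence names here are assumptions, and allocationSize lets Hibernate hand out a block of IDs in memory so inserts remain batchable:

```java
@Entity
public class Student {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "student_seq")
    @SequenceGenerator(name = "student_seq", sequenceName = "student_seq",
                       allocationSize = 50)
    private Long id;

    // ... other fields
}
```

Because the ID is known before the insert is executed, Hibernate can queue the statements and let the driver batch them.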
Going back to your example using flush() and clear(): the same caveats about the identifier generation strategy hold true. If you want those operations to be sent to the database in bulk, be mindful of the identifier strategy you're using for Student.