Hibernate batch size confusion
Asked Answered
N

3

38

This program does tens of thousands of consecutive inserts one after the other. I've never used Hibernate before. I'm getting extremely slow performance (if I just connect and execute the SQL manually I am 10-12x quicker. My batch_size is set to 50 as per many hibernate tutorials.

Here is a log from a single insert - perhaps you could help me understand exactly what is happening:

START INSERT
11:02:56.121 [main] DEBUG org.hibernate.impl.SessionImpl - opened session at timestamp: 13106053761                                                                                            
11:02:56.121 [main] DEBUG o.h.transaction.JDBCTransaction - begin                                                                                                                               
11:02:56.121 [main] DEBUG org.hibernate.jdbc.ConnectionManager - opening JDBC connection                                                                                                             
11:02:56.121 [main] TRACE o.h.c.DriverManagerConnectionProvider - total checked-out connections: 0                                                                                                    
11:02:56.121 [main] TRACE o.h.c.DriverManagerConnectionProvider - using pooled JDBC connection, pool size: 0                                                                                          
11:02:56.121 [main] DEBUG o.h.transaction.JDBCTransaction - current autocommit status: false                                                                                                    
11:02:56.121 [main] TRACE org.hibernate.jdbc.JDBCContext - after transaction begin                                                                                                             
11:02:56.121 [main] TRACE org.hibernate.impl.SessionImpl - setting flush mode to: MANUAL                                                                                                       
11:02:56.121 [main] TRACE o.h.e.def.DefaultLoadEventListener - loading entity: [com.xyzcompany.foo.edoi.ejb.msw000.MSW000Rec#component[keyW000]{keyW000=F000    ADSUFC}]                           
11:02:56.121 [main] TRACE o.h.e.def.DefaultLoadEventListener - creating new proxy for entity                                                                                                       
11:02:56.122 [main] TRACE o.h.e.d.DefaultSaveOrUpdateEventListener - saving transient instance                                                                                                           
11:02:56.122 [main] DEBUG o.h.e.def.AbstractSaveEventListener - generated identifier: component[keyW000]{keyW000=F000    ADSUFC}, using strategy: org.hibernate.id.CompositeNestedGeneratedValueGenerator
11:02:56.122 [main] TRACE o.h.e.def.AbstractSaveEventListener - saving [com.xyzcompany.foo.edoi.ejb.msw000.MSW000Rec#component[keyW000]{keyW000=F000    ADSUFC}]                                    
11:02:56.123 [main] TRACE o.h.e.d.AbstractFlushingEventListener - flushing session                                                                                                                    
11:02:56.123 [main] DEBUG o.h.e.d.AbstractFlushingEventListener - processing flush-time cascades                                                                                                      
11:02:56.123 [main] DEBUG o.h.e.d.AbstractFlushingEventListener - dirty checking collections                                                                                                          
11:02:56.123 [main] TRACE o.h.e.d.AbstractFlushingEventListener - Flushing entities and processing referenced collections                                                                             
11:02:56.125 [main] TRACE o.h.e.d.AbstractFlushingEventListener - Processing unreferenced collections                                                                                                 
11:02:56.125 [main] TRACE o.h.e.d.AbstractFlushingEventListener - Scheduling collection removes/(re)creates/updates                                                                                   
11:02:56.126 [main] DEBUG o.h.e.d.AbstractFlushingEventListener - Flushed: 1 insertions, 0 updates, 0 deletions to 62 objects                                                                         
11:02:56.126 [main] DEBUG o.h.e.d.AbstractFlushingEventListener - Flushed: 0 (re)creations, 0 updates, 0 removals to 0 collections                                                                                                                                                                                       
11:02:56.132 [main] TRACE o.h.e.d.AbstractFlushingEventListener - executing flush                                                                                                                     
11:02:56.132 [main] TRACE org.hibernate.jdbc.ConnectionManager - registering flush begin                                                                                                             
11:02:56.132 [main] TRACE o.h.p.entity.AbstractEntityPersister - Inserting entity: [com.xyzcompany.foo.edoi.ejb.msw000.MSW000Rec#component[keyW000]{keyW000=F000    ADSUFC}]                         
11:02:56.132 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - about to open PreparedStatement (open PreparedStatements: 0, globally: 0)                                                           
11:02:56.132 [main] DEBUG org.hibernate.SQL - insert into MSW000 (W000_DATA_REC, W000_FILE_FLAGS, KEY_W000) values (?, ?, ?)                                                      
11:02:56.132 [main] TRACE org.hibernate.jdbc.AbstractBatcher - preparing statement                                                                                                                 
11:02:56.132 [main] TRACE o.h.p.entity.AbstractEntityPersister - Dehydrating entity: [com.xyzcompany.foo.edoi.ejb.msw000.MSW000Rec#component[keyW000]{keyW000=F000    ADSUFC}]                       
11:02:56.132 [main] TRACE org.hibernate.type.StringType - binding '    ADSUFCA                                                                                                                                                                                                                                                                                  ' to parameter: 1
11:02:56.132 [main] TRACE org.hibernate.type.StringType - binding ' ' to parameter: 2                                                                                                         
11:02:56.132 [main] TRACE org.hibernate.type.StringType - binding 'F000    ADSUFC' to parameter: 3                                                                                            
11:02:56.132 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - Executing batch size: 1                                                                                                             
11:02:56.133 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - about to close PreparedStatement (open PreparedStatements: 1, globally: 1)                                                          
11:02:56.133 [main] TRACE org.hibernate.jdbc.AbstractBatcher - closing statement                                                                                                                   
11:02:56.133 [main] TRACE org.hibernate.jdbc.ConnectionManager - registering flush end                                                                                                               
11:02:56.133 [main] TRACE o.h.e.d.AbstractFlushingEventListener - post flush                                                                                                                          
11:02:56.133 [main] DEBUG o.h.transaction.JDBCTransaction - commit                                                                                                                              
11:02:56.133 [main] TRACE org.hibernate.impl.SessionImpl - automatically flushing session                                                                                                      
11:02:56.133 [main] TRACE org.hibernate.jdbc.JDBCContext - before transaction completion                                                                                                       
11:02:56.133 [main] TRACE org.hibernate.impl.SessionImpl - before transaction completion                                                                                                       
11:02:56.133 [main] DEBUG o.h.transaction.JDBCTransaction - committed JDBC Connection                                                                                                           
11:02:56.133 [main] TRACE org.hibernate.jdbc.JDBCContext - after transaction completion                                                                                                        
11:02:56.133 [main] DEBUG org.hibernate.jdbc.ConnectionManager - transaction completed on session with on_close connection release mode; be sure to close the session to release JDBC resources!     
11:02:56.133 [main] TRACE org.hibernate.impl.SessionImpl - after transaction completion                                                                                                        
11:02:56.133 [main] TRACE org.hibernate.impl.SessionImpl - closing session                                                                                                                     
11:02:56.133 [main] TRACE org.hibernate.jdbc.ConnectionManager - performing cleanup                                                                                                                  
11:02:56.133 [main] DEBUG org.hibernate.jdbc.ConnectionManager - releasing JDBC connection [ (open PreparedStatements: 0, globally: 0) (open ResultSets: 0, globally: 0)]                            
11:02:56.133 [main] TRACE o.h.c.DriverManagerConnectionProvider - returning connection to pool, pool size: 1                                                                                          
11:02:56.133 [main] TRACE org.hibernate.jdbc.JDBCContext - after transaction completion                                                                                                        
11:02:56.133 [main] DEBUG org.hibernate.jdbc.ConnectionManager - transaction completed on session with on_close connection release mode; be sure to close the session to release JDBC resources!     
11:02:56.134 [main] TRACE org.hibernate.impl.SessionImpl - after transaction completion                                                                                                        
FINISH INSERT
Nitid answered 14/7, 2011 at 1:11 Comment(0)
R
115

When you call session.save(), hibernate will generate an INSERT SQL. This INSERT SQL will be appended to be issued to the DB during flushing (i.e session.flush()) .

During flushing, if hibernate.jdbc.batch_size is set to some non-zero value, Hibernate will use the batching feature introduced in the JDBC2 API to issue the batch insert SQL to the DB .

For example , if you save() 100 records and your hibernate.jdbc.batch_size is set to 50. During flushing, instead of issue the following SQL 100 times :

insert into TableA (id , fields) values (1, 'val1');
insert into TableA (id , fields) values (2, 'val2');
insert into TableA (id , fields) values (3, 'val3');
.........................
insert into TableA (id , fields) values (100, 'val100');

Hiberate will group them in batches of 50 , and only issue 2 SQL to the DB, like this:

insert into TableA (id , fields) values (1, 'val1') , (2, 'val2') ,(3, 'val3') ,(4, 'val4') ,......,(50, 'val50')
insert into TableA (id , fields) values (51, 'val51') , (52, 'val52') ,(53, 'val53') ,(54, 'val54'),...... ,(100, 'val100')  

Please note that Hibernate would disable insert batching at the JDBC level transparently if the primary key of the inserting table isGenerationType.Identity.

From your log: you save() only one record and then flush(), so there is only one appending INSERT SQL to be processed for every flush. That's why Hibernate cannot help you to batch inserting as there is only one INSERT SQL to be processed. You should save() up to the certain amount of records before calling flush() instead of calling flush() for every save().

The best practise of batch inserting is something like this:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for  ( int i=0; i<888888; i++ ) {
  TableA record = new TableA();
    record.setXXXX();
    session.save(record)
    if ( i % 50 == 0 ) { //50, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();

You save and flush the records batch by batch. In the end of each batch you should clear the persistence context to release some memory to prevent memory exhaustion as every persistent object is placed into the first level cache (your JVM's memory). You could also disable the second-level cache to reduce the unnecessary overhead.


Reference:

Repetend answered 14/7, 2011 at 6:6 Comment(10)
Fantastic answer. If I could upvote you more I would. I managed to get it solved, but thanks anyway!Nitid
I wonder why you flush and clear the session before incrementing the loop index i. I guess you should drop i++ from for loop and change if conditional to ++i % 50 == 0. (Error in Hibernate documentation?)Spirit
@dylanyi That is just an example. Maybe the index i is related to scanning a source array (see record.setXXX: where does it take data from?). Flushing at 0 on the beginning doesn't make such a difference when you deal with millions of recordsUnderscore
AFAIK by default Hibernates only generates one INSERT statement, then use the JDBC batch abilities to bind array of values for each parameter, instead of one value per parameter. If the provider (e.g. PostgreSQL, MySQL, FireBird or SQLite3) does not support array binding, it is emulated by running the rows as individual INSERT. Note that some providers expect rewriteBatchedStatements=true in the URI to change this default behavior, and enable multi-INSERT.Fontanez
Very informative. BTW if I use Spring @Transactional then do I need to call flush manually as your example? Say I need to save 2,000 entries in a batch.Bedside
@Bedside Yes. You have toRepetend
Thanks. Could you please tell why ? Isn't The @Transactional doing the flush, commit etc for me ? And if so, how would I determine that after how many iteration I need to flush ?Bedside
@Transactional will help you commit (which trigger flush) when the method complete successfully . So if you rely only on @Transactional , it will only flush one time when TX commits. What we want to do is to flush the session batch by batch manually.Repetend
AFAIK the 2nd snippent means bulk, not batch. It's not the same.Atalanta
"Please note that Hibernate would disable insert batching at the JDBC level transparently if the primary key of the inserting table is GenerationType.Identity" I'm wondering if batching still works with GenerationType.AUTO ? At least Hibernate docs are not stating anything about it...Aesthesia
P
7

If you must use hibernate for huge batch jobs StatelessSession is the way to go. It strips things down to the most basic converting-objects-to-SQL-statements mapping and eliminates all of the overhead of the ORM features you're not using when just cramming rows into the DB wholesale.

It would also be much easier to make suggestions on your actual code than the log :)

Poitiers answered 14/7, 2011 at 1:37 Comment(1)
I know it would. The code is actually naca generated java (from COBOL) so it's incredibly hard to debug and get your head around. Still yet to find where the insert is sighNitid
F
3
11:02:56.133 [main] DEBUG o.h.transaction.JDBCTransaction - commit      

This is saying that the database is committing after every insert. Ensure you are not committing your transaction / closing your session inside the insert loop. Do this once at the end instead.

Ful answered 14/7, 2011 at 1:29 Comment(2)
you should, however, flush and clear your session every few 1000 records or so.Bolyard
@Bolyard yes very good point, otherwise Hibernate will run out of memoryFul

© 2022 - 2024 — McMap. All rights reserved.