high performance hibernate insert

I am working on a latency-sensitive part of an application: basically, I receive a network event, transform the data, and then insert all the data into the DB. After profiling, I see that essentially all my time is spent trying to save the data. Here is the code:

private void insertAllData(Collection<Data> dataItems)
{
    long start_time = System.currentTimeMillis();
    long save_time = 0;
    long commit_time = 0;
    Transaction tx = null;
    try
    {
        Session s = HibernateSessionFactory.getSession();
        s.setCacheMode(CacheMode.IGNORE);
        s.setFlushMode(FlushMode.NEVER);
        tx = s.beginTransaction();
        for(Data data : dataItems)
        {
            s.saveOrUpdate(data);
        }
        save_time = System.currentTimeMillis();
        tx.commit();
        s.flush();
        s.clear();
    }
    catch(HibernateException ex)
    {
        if(tx != null)
            tx.rollback();
    }
    commit_time = System.currentTimeMillis();
    System.out.println("Save: " + (save_time - start_time));
    System.out.println("Commit: " + (commit_time - save_time));
    System.out.println();
}

The size of the collection is always less than 20. Here is the timing data that I see:

Save: 27
Commit: 9

Save: 27
Commit: 9

Save: 26
Commit: 9

Save: 36
Commit: 9

Save: 44
Commit: 0

This is confusing to me. I figured that the save should be quick and all the time should be spent on commit, but clearly I'm wrong. I have also tried removing the transaction (it's not really necessary) but I saw worse times... I have set hibernate.jdbc.batch_size=20...

I can expect to get as many as 500 messages/sec, so I need single-message handling to be under 20 milliseconds.

I need this operation to be as fast as possible; ideally there would be only one round trip to the database. How can I do this?

Grieg answered 12/6, 2010 at 0:6 Comment(3)
BTW, aren't you supposed to commit() after the flush() when using FlushMode#NEVER?Anodic
@Pascal Thivent. I don't know :-)Grieg
Well, read the javadoc of Transaction#commit() :)Anodic

Move your primary key generation away from a server-side auto-increment. Your Java code must be responsible for the PK generation to avoid round trips.

For decent bulk insert performance, you need a method that won't need to hit the database on every single call to saveOrUpdate. Using UUIDs as the primary key, or implementing HiLo can help achieve this. Otherwise, there's no bulk insert actually going on.

To have both performance and interoperability with other external systems, the pooled or the pooled-lo optimizers are the best choice.
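
For example, with annotations, something like the following maps an Oracle sequence through the pooled-lo optimizer, so one sequence round trip hands out IDs for a whole batch of inserts. This is only a sketch: the entity, the sequence name, and the increment size of 20 (chosen to match your batch size) are placeholders.

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

import org.hibernate.annotations.GenericGenerator;
import org.hibernate.annotations.Parameter;

@Entity
public class Data {

    // One call to data_seq reserves a block of 20 ids, so most
    // saveOrUpdate calls assign an id without touching the database.
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "data_seq_gen")
    @GenericGenerator(
        name = "data_seq_gen",
        strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
        parameters = {
            @Parameter(name = "sequence_name", value = "data_seq"),
            @Parameter(name = "increment_size", value = "20"),
            @Parameter(name = "optimizer", value = "pooled-lo")
        })
    private Long id;

    // ... other mapped fields ...
}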

Kami answered 12/6, 2010 at 3:21 Comment(5)
I'm currently using an Oracle sequence to generate IDs. Is this not feasible?Grieg
That was exactly it! I removed the sequence and added a query at startup to figure out where to start the sequence, and bam, a 7.5x speedup, putting it well below my threshold.Grieg
Hibernate has the ability to generate ID sequences; you don't have to write that code yourself. If you are using annotations, you can add @Id with @GeneratedValue.Redfin
Justin makes an important point, and I should have mentioned this... Hibernate handles both methods out of the box with only a little configuration/annotation (no code required). If you want HiLo, all you need is to create a table and do a little config; Hibernate handles the rest: docs.jboss.org/hibernate/stable/core/reference/en/html/…Kami
FYI, if you have a serial and want to keep it, that should be fine as long as you have some other way of identifying the record. Just don't map the serial, i.e. in Hibernate ignore that it exists at all; the DB will still generate and store it. Of course, you'll need to map something else as the ID for Hibernate.Organist

Honestly, I don't know what can reasonably be concluded from your test and from the "measures" you're showing (I suspect much overhead from warmup; the collection is very small, and so is the sample).

Anyway, I can tell you that your current code won't scale, and you are very likely going to explode the Session when passing a bigger collection. You need to flush and clear the session at regular intervals (every 20 records if the batch size is 20).

Actually, I recommend reading the whole Chapter 13. Batch processing.
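
The pattern from that chapter looks roughly like this (a sketch; sessionFactory and the hardcoded 20, which should match hibernate.jdbc.batch_size, are assumptions):

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
int count = 0;
for (Data data : dataItems) {
    session.save(data);
    if (++count % 20 == 0) { // 20 = hibernate.jdbc.batch_size
        session.flush();  // execute the batched inserts
        session.clear();  // detach the entities so the session doesn't grow
    }
}
tx.commit();
session.close();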

Anodic answered 12/6, 2010 at 2:13 Comment(1)
I am flushing and clearing the session in the code above. Collections will never be bigger than 20.Grieg

Some basic stuff:

  • Do you have triggers, or foreign key constraints with no index?
  • Do you have batching drivers?
  • Are your drivers in batch mode (see hibernate.jdbc.batch_size from Pascal's reference)?
  • Any indexes on your tables (a lot of indexes can sometimes slow down inserts)?

Batching is part of JDBC 2.0; it allows you to execute several statements in a 'batch'. The idea is to reduce round-trip latency (you can execute multiple batches per transaction).

PreparedStatement stmt = dbCon.prepareStatement("insert into DataTable values (?, ?, ?)");

stmt.setInt(1, x1);
stmt.setInt(2, x2);
stmt.setString(3, "some value");
stmt.addBatch();
...
stmt.setInt(1, x2);
stmt.setInt(2, x3);
stmt.setString(3, "some other value");
stmt.addBatch();

stmt.executeBatch();
dbCon.commit();

You can probably use this as a benchmark test. I would also look at the SQL that Hibernate generates, to see if it is performing a query per insert to get the generated IDs.
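
The quickest way to see that SQL is to turn on statement logging. A sketch using programmatic configuration; the same property names work in hibernate.cfg.xml:

Configuration cfg = new Configuration().configure();
cfg.setProperty("hibernate.show_sql", "true");   // print each statement to stdout
cfg.setProperty("hibernate.format_sql", "true"); // pretty-print for readability

If you see a select of the sequence's nextval before every insert, the ID generator is costing you a round trip per row.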

Redfin answered 12/6, 2010 at 2:52 Comment(1)
How can I tell if I have a batching driver?Grieg
