Oracle CLOB performance
I am running queries against an Oracle 10g database with JDBC (using the latest drivers and UCP as the DataSource) in order to retrieve CLOBs (avg. 20k characters each). However, the performance seems to be pretty bad: a batch retrieval of 100 LOBs takes 4 s on average. Judging from my observations, the operation is neither I/O- nor CPU- nor network-bound.

My test setup looks like this:

PoolDataSource dataSource = PoolDataSourceFactory.getPoolDataSource();
dataSource.setConnectionFactoryClassName("...");
dataSource.setConnectionPoolName("...");
dataSource.setURL("...");
dataSource.setUser("...");
dataSource.setPassword("...");

dataSource.setConnectionProperty("defaultRowPrefetch", "1000");
dataSource.setConnectionProperty("defaultLobPrefetchSize", "500000");

final LobHandler handler = new OracleLobHandler();
JdbcTemplate j = new JdbcTemplate(dataSource);

j.query("SELECT bigClob FROM ...", new RowCallbackHandler() {

    public void processRow(final ResultSet rs) throws SQLException {
        String result = handler.getClobAsString(rs, "bigClob");
    }
});

I experimented with the fetch sizes but to no avail. Am I doing something wrong? Is there a way to speed up CLOB retrieval when using JDBC?
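(For comparison, an alternative to materializing each CLOB through `getClobAsString` is to drain the column's character stream via the standard JDBC `ResultSet.getCharacterStream`. The `readAll` helper below and its 8 KB buffer size are my own sketch, not from the original setup; in the row callback one would call it as `readAll(rs.getCharacterStream("bigClob"))`.)

```java
import java.io.IOException;
import java.io.Reader;

public final class ClobChunkReader {

    // Drains a character stream into a String in 8 KB chunks, the way one
    // would read ResultSet.getCharacterStream("bigClob") for each row.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a CLOB's character stream.
        String result = readAll(new java.io.StringReader("hello CLOB"));
        System.out.println(result); // prints "hello CLOB"
    }
}
```

Whether streaming beats the LobHandler depends on the driver's LOB prefetch settings; it mainly helps separate driver overhead from raw transfer time.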

Hinkley answered 6/10, 2009 at 13:59 Comment(7)
How have you determined it's not network bound? You're talking about setting up a new JDBC connection (expensive), 2Mb worth of data to read from disk, send it over the network and the overhead of the query (which isn't specified). I don't know if 4s is all that bad depending on your network layout and database setup.Scratchboard
Clarification: I measure in batches of 100, so the initial penalty of connecting does not count. The total network throughput stays below 2 Mbit/s, so I suppose it's not network-bound.Hinkley
How long does the actual query take?Scratchboard
About 3 seconds. I do not think the query complexity has anything to do with the throughput, though (I do not use a first_rows hint or anything esoteric). The total size of the result set is in the tens of thousands - measured over the span of the whole retrieval, the initial costs (JDBC connection, query return) are not relevant.Hinkley
Can you try to measure the difference between retrieving the CLOB column and a more "normal" non-indexed column, but using the same tables, joins and where-clauses? This should give you an indication whether the problem is caused by the CLOB or the row access.Electromagnetism
Just fetching the primary key of the CLOB containing table takes about .3s for 10k rows.Hinkley
Did you try selecting them using SQL*Plus or any other application (Oracle SQL Developer?)? Does the querying itself take time? Have you tried `explain plan for select bigClob from ...`?Germain
Thanks for all the helpful suggestions. Despite this being flagged as the answer to the problem, my answer is that there seems to be no good solution. I tried parallel statements, different storage characteristics, presorted temporary tables and other things. The operation does not seem to be bound by any characteristic visible through traces or explain plans. Even query parallelism seems to be sketchy when CLOBs are involved.

Undoubtedly there would be better options for dealing with large CLOBs (especially compression) in an 11g environment, but at the moment I am stuck with 10g.

I have now opted for an additional roundtrip to the database, in which I preprocess the CLOBs into a size-optimized binary RAW format. In previous deployments this has always been a very fast option and will likely be worth the trouble of maintaining an offline computed cache. The cache will be invalidated and updated using a persistent process and AQ until someone comes up with a better idea.
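(The answer does not say what the "size optimized" transformation is; as a minimal sketch, assuming it is simply Deflate compression of the UTF-8 bytes, the preprocessing and its inverse could look like this. `ClobPacker`, `pack` and `unpack` are hypothetical names of mine.)

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public final class ClobPacker {

    // Compress a CLOB's text into a byte[] suitable for a RAW/BLOB cache column.
    public static byte[] pack(String clobText) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos =
                 new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_COMPRESSION))) {
            dos.write(clobText.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Inverse transform: decompress the cached bytes back to the original text.
    public static String unpack(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (InflaterInputStream iis =
                 new InflaterInputStream(new ByteArrayInputStream(raw))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = iis.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        String text = "some repetitive CLOB payload ".repeat(1000);
        byte[] packed = pack(text);
        System.out.println(packed.length < text.length()); // prints "true"
        System.out.println(unpack(packed).equals(text));   // prints "true"
    }
}
```

Note that a plain RAW column caps out at 2000 bytes in 10g, so larger compressed payloads would need a BLOB or an external key-value store as described in the comments below.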

Hinkley answered 12/10, 2009 at 19:58 Comment(2)
Looks like a good workaround: if you retrieve 100 rows, you pay one additional roundtrip but save 100, for a net saving of 100-1 = 99 round trips. But how did you implement it?Forgo
By fetching only the CLOBs plus some key to identify them later, and writing them into a local key-value store. I used Oracle Berkeley DB, but you could easily use SQLite or anything else, I suppose.Hinkley
The total size of the result set is in the tens of thousands - measured over the span of the whole retrieval the initial costs

Is there an Order By in the query? 10K rows is quite a lot if it has to be sorted.

Also, retrieving the PK is not a fair test versus retrieving the entire CLOB. Oracle stores table rows with probably many to a block, but each CLOB (if it is > 4K) will be stored out of line, each in a series of blocks. Scanning the list of PKs is therefore going to be fast. Also, there is probably an index on the PK, so Oracle can quickly scan the index blocks without even accessing the table.

4 seconds does seem a little high, but it is 2 MB that possibly needs to be read from disk and transported over the network to your Java program. Network could be an issue. If you perform a SQL trace of the session, it will point you to exactly where the time is being spent (disk reads or network).

Teller answered 7/10, 2009 at 13:32 Comment(0)
My past experience of using Oracle LOB-type data to store large data has not been good. It is fine when the data is under 4k, since it is then stored inline like a VARCHAR2. Once it is over 4k, you start to see performance degrade. Things may have improved since I last tried it a couple of years ago, but here is what I found back then, for your information:

Since clients need to get LOBs via the Oracle server, consider the following:

  • LOB data competes for the limited SGA cache with other data types if Oracle decides to cache it. As CLOB data is generally big, it may push out other data.
  • LOB data gets poor disk reads if Oracle decides not to cache it and instead streams the data to the client.
  • Fragmentation is probably something you haven't encountered yet. You will see it if your application deletes LOBs and Oracle tries to reuse the space. I don't know whether Oracle supports online defragmentation of LOB storage (it does for indexes, but that took a long time when we tried it previously).

You mentioned 4 s for 100 LOBs of avg. 20k, so that is 40 ms per LOB. Remember that each LOB needs to be retrieved via a separate LOB locator (it is not in the result set by default). That is an additional round trip for each LOB, I assume (I am not 100% sure on this, since it was a while ago). If that is the case, I assume at least 5 ms of extra time per round trip, in serial order. If so, your performance is already limited by sequential LOB fetches. You should be able to verify this by tracking the time spent in SQL execution vs. LOB content fetching. Or you can verify it by excluding the LOB column, as suggested in the previous answer, which should tell you whether the problem is LOB-related.
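(The execution-vs-fetch split suggested above can be measured with a two-bucket stopwatch like the sketch below. `PhaseTimer` is my own hypothetical helper; in the real loop its `execute` and `fetch` wrappers would go around `stmt.executeQuery(...)` and the per-row `getClobAsString` call respectively.)

```java
import java.util.function.Supplier;

public final class PhaseTimer {

    private long executeNanos;
    private long fetchNanos;

    // Wrap statement execution, accumulating into the "execute" bucket.
    public <T> T execute(Supplier<T> work) {
        long t0 = System.nanoTime();
        try { return work.get(); } finally { executeNanos += System.nanoTime() - t0; }
    }

    // Wrap per-row LOB content retrieval, accumulating into the "fetch" bucket.
    public <T> T fetch(Supplier<T> work) {
        long t0 = System.nanoTime();
        try { return work.get(); } finally { fetchNanos += System.nanoTime() - t0; }
    }

    public long executeMillis() { return executeNanos / 1_000_000; }
    public long fetchMillis()   { return fetchNanos / 1_000_000; }

    public static void main(String[] args) {
        PhaseTimer t = new PhaseTimer();
        // Simulated workload: one "execute" of ~5 ms, three "fetches" of ~5 ms each.
        t.execute(() -> { busyWait(5); return null; });
        for (int i = 0; i < 3; i++) {
            t.fetch(() -> { busyWait(5); return null; });
        }
        // If fetch time dominates, sequential LOB round trips are the bottleneck.
        System.out.println(t.fetchMillis() >= t.executeMillis()); // prints "true"
    }

    private static void busyWait(long millis) {
        long end = System.nanoTime() + millis * 1_000_000;
        while (System.nanoTime() < end) { /* spin */ }
    }
}
```

If the fetch bucket accounts for most of the 4 s, that supports the round-trip-per-locator theory; a large execute bucket would instead point at the query itself.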

Good luck

Asphaltite answered 7/10, 2009 at 17:52 Comment(0)
I had a similar issue and found that the JDBC LOB implementation makes a network call when accessing the LOBs.

As of the Oracle 11.2 JDBC driver you can use a prefetch. This sped up access by a factor of 10...

statement1.setFetchSize(1000); // rows fetched per round trip
if (statement1 instanceof OracleStatement) {
    // prefetch up to 250,000 characters/bytes of LOB data along with each row
    ((OracleStatement) statement1).setLobPrefetchSize(250000);
}
Prowl answered 17/4, 2014 at 15:52 Comment(0)
