What does Statement.setFetchSize(nSize) method really do in SQL Server JDBC driver?
I have a really big table that grows by several million records every day, and at the end of each day I extract all of the previous day's records. I am doing this like:

String SQL = "select col1, col2, coln from mytable where timecol = yesterday";
statement.executeQuery(SQL);

The problem is that this program takes about 2 GB of memory, because it loads all of the results into memory before processing them.

I tried setting Statement.setFetchSize(10), but it takes exactly the same amount of memory from the OS; it makes no difference. I am using the Microsoft SQL Server 2005 JDBC driver.

Is there any way to read the results in small chunks, the way the Oracle driver does, where a query initially shows only a few rows and more results are fetched as you scroll down?

Prehistoric answered 23/8, 2009 at 11:34 Comment(0)

In JDBC, the setFetchSize(int) method is very important to performance and memory management within the JVM: it controls the number of network calls from the JVM to the database and, correspondingly, the amount of RAM used for ResultSet processing.

If setFetchSize(10) is being called and the driver is ignoring it, there are probably only two options:

  1. Try a different JDBC driver that will honor the fetch-size hint.
  2. Look at driver-specific properties on the Connection (URL and/or property map when creating the Connection instance).

The RESULT-SET is the number of rows marshalled on the DB in response to the query. The ROW-SET is the chunk of rows that are fetched out of the RESULT-SET per call from the JVM to the DB. The number of these calls and resulting RAM required for processing is dependent on the fetch-size setting.

So if the RESULT-SET has 100 rows and the fetch-size is 10, there will be 10 network calls to retrieve all of the data, using roughly 10*{row-content-size} RAM at any given time.

The default fetch-size is 10, which is rather small. In the case posted, it would appear the driver is ignoring the fetch-size setting and retrieving all data in one call (large RAM requirement, but the minimum possible number of network calls).

What happens underneath ResultSet.next() is that it doesn't actually fetch one row at a time from the RESULT-SET. It returns a row from the (local) ROW-SET, and fetches the next ROW-SET (invisibly) from the server as the local one becomes exhausted.

All of this depends on the driver as the setting is just a 'hint' but in practice I have found this is how it works for many drivers and databases (verified in many versions of Oracle, DB2 and MySQL).
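As a toy sketch of the mechanics described above (all class and method names here are made up, not driver API): a local ROW-SET buffer is refilled from the server-side RESULT-SET in fetch-size chunks, so the number of round trips is the result-set size divided by the fetch size.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the RESULT-SET / ROW-SET mechanics: next() serves rows
// from a local ROW-SET buffer, refilling it from the server-side
// RESULT-SET in fetch-size chunks and counting the simulated network
// round trips. Illustrative only; this is not a real driver.
public class RowSetModel {
    private final List<Integer> serverResultSet; // pretend server-side rows
    private final int fetchSize;
    private final List<Integer> rowSet = new ArrayList<>(); // local buffer
    private int cursor = 0; // position in the server result set
    private int calls = 0;  // simulated network round trips

    RowSetModel(List<Integer> serverResultSet, int fetchSize) {
        this.serverResultSet = serverResultSet;
        this.fetchSize = fetchSize;
    }

    // Mimics ResultSet.next(): returns null when exhausted.
    Integer next() {
        if (rowSet.isEmpty()) {
            if (cursor >= serverResultSet.size()) return null; // fully consumed
            int end = Math.min(cursor + fetchSize, serverResultSet.size());
            rowSet.addAll(serverResultSet.subList(cursor, end)); // one round trip
            cursor = end;
            calls++;
        }
        return rowSet.remove(0);
    }

    int networkCalls() { return calls; }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) rows.add(i);
        RowSetModel rs = new RowSetModel(rows, 10);
        int count = 0;
        while (rs.next() != null) count++;
        // 100-row result set with fetch-size 10 -> 10 round trips
        System.out.println(count + " rows in " + rs.networkCalls() + " calls");
    }
}
```

At any given moment only one chunk (roughly fetch-size * row-size bytes) sits in the local buffer, which is the memory behavior the answer describes.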

Abdulabdulla answered 21/3, 2013 at 22:52 Comment(5)
I know this is an old answer, but I have a question: what is the difference between fetch-size and scrolling? As I understand it, scrolling is used so that we don't fetch all the results at once.Guerdon
Scrolling is the process of moving forward/backward through the result set. To scroll, one inherently fetches, or may have already fetched, into JVM memory. Cursors in various databases may be forward-only despite options in JDBC to scroll backward through a fetched data set. So fetch-size is still the way one sets how much data is pulled from the DB across the network. Scroll settings would not affect this in JDBC.Abdulabdulla
I don't think OP is saying that setFetchSize is being ignored. You can use small fetch sizes but the driver is still going to be pulling in the entire result set which may be big and would use a considerable amount of memory.Eventide
The OP stated fetchSize() method was being called but the row-set returned was still consuming 2GB of RAM (fetch size effectively being ignored). Most drivers will honor the fetchSize() call, thereby not "pulling the entire result set" (into client side / memory in JVM). See entire answer and discussion on result-set, row-set and how JDBC drivers are setup to make this efficient. Suffice to say, drivers that do not honor the call arguably are not spec-compliant or perhaps are being overridden in vendor-proprietary ways.Abdulabdulla
Just for reference, the MSSQL Doc itself references use of setFetchSize() to "reduce the number of rows fetched". learn.microsoft.com/en-us/sql/connect/jdbc/…Abdulabdulla

The fetchSize parameter is a hint to the JDBC driver as to how many rows to fetch in one go from the database. But the driver is free to ignore this and do what it sees fit. Some drivers, like the Oracle one, fetch rows in chunks, so you can read very large result sets without needing lots of memory. Other drivers just read in the whole result set in one go, and I'm guessing that's what your driver is doing.

You can try upgrading your driver to the SQL Server 2008 version (which might be better), or the open-source jTDS driver.

Run answered 30/12, 2009 at 19:56 Comment(3)
Absolutely correct. For MSSQL the jTDS driver is a better choice.Cytologist
How would one set responseBuffering to adaptive on the jTDS driver, and I mean, not on the driver level, but on the query level?Sunbreak
The MSSQL driver is its own beast and has a bunch of discussion about BLOB data types as its adaptive buffering but buried in there is the JDBC setFetchSize() method to "reduce the number of rows fetched". See my answer. learn.microsoft.com/en-us/sql/connect/jdbc/…Abdulabdulla

You need to ensure that auto-commit on the Connection is turned off, or setFetchSize will have no effect.

dbConnection.setAutoCommit(false);

Edit: Remembered that when I used this fix it was Postgres-specific, but hopefully it will still work for SQL Server.
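For Postgres specifically, cursor-based fetching only kicks in under a few documented conditions: auto-commit off, a TYPE_FORWARD_ONLY ResultSet, and a non-zero fetch size. A minimal sketch encoding that check (the class and helper names are made up for illustration):

```java
import java.sql.ResultSet;

// Illustrative only: encodes the conditions under which the PostgreSQL
// JDBC driver will use a server-side cursor instead of buffering the
// whole result set client-side (per the driver documentation).
public class PgCursorCheck {
    static boolean cursorModeEnabled(boolean autoCommit, int resultSetType, int fetchSize) {
        return !autoCommit                                      // must be inside a transaction
                && resultSetType == ResultSet.TYPE_FORWARD_ONLY // backend cursors are forward-only
                && fetchSize > 0;                               // 0 means "fetch everything"
    }

    public static void main(String[] args) {
        System.out.println(cursorModeEnabled(false, ResultSet.TYPE_FORWARD_ONLY, 50)); // true
        System.out.println(cursorModeEnabled(true, ResultSet.TYPE_FORWARD_ONLY, 50));  // false: autocommit on
    }
}
```

If any one of the three conditions fails, the Postgres driver silently falls back to reading the whole result set, which is exactly the "setFetchSize has no effect" symptom described here.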

Weakly answered 26/8, 2009 at 1:35 Comment(6)
I don't know Postgres, but ... setting auto-commit should not have any bearing on a SELECT statement and/or the query fetch-size (they are really unrelated).Abdulabdulla
@Weakly Is this ("If auto-commit on the Connection is turned on, then setFetchSize will have no effect") also true for Oracle 11g?Pseudonym
Commit is a function of data-manipulation-language (DML) statements like INSERT, UPDATE, DELETE in two phase commit (2PC) database. This has nothing to do with queries, which is what "fetch" is related to "SELECT". Thus the value of auto-commit (DML statement is auto-committed on execution versus having to separately execute a commit thereafter) has nothing to do with fetching in Oracle, MySQL or SQLServer and probably any other RDBMS.Abdulabdulla
It's years ago but I can't let this pass - you're really claiming that commit is only relevant to 2PC? And that commit has no relevance to Queries? You need to review ACID principles. Commit commits DML in any transaction. Re queries: set isolation level to "serializable" in any RDBMS that supports it. Do a query, observe the results but do not commit. In another transaction do a similar query. Second transaction will block until the first commits (in case the first query makes updates).Coincide
The postgres docs confirm this, the answer given here is correct: "The Connection must not be in autocommit mode. The backend closes cursors at the end of transactions, so in autocommit mode the backend will have closed the cursor before anything can be fetched from it. *The Statement must be created with a ResultSet type of ResultSet.TYPE_FORWARD_ONLY. This is the default, so no code will need to be rewritten to take advantage of this, but it also means that you cannot scroll backwards or otherwise jump around in the ResultSet."Coincide
ACID guarantees are relevant to query results, but unless standard DB configs are modified, a SELECT statement (not SELECT-FOR-UPDATE) has no blocking behavior. Otherwise DBs around the world would be pausing on queries for every DML operation.Abdulabdulla

Statement interface Doc

SUMMARY: void setFetchSize(int rows) Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed.

Read this ebook J2EE and beyond By Art Taylor

Stanton answered 23/8, 2009 at 11:42 Comment(0)

Sounds like the MSSQL JDBC driver is buffering the entire result set for you. You can add a connection string parameter such as selectMethod=cursor or responseBuffering=adaptive. If you are on version 2.0+ of the 2005 MSSQL JDBC driver, then response buffering should default to adaptive.

http://msdn.microsoft.com/en-us/library/bb879937.aspx
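Such parameters go straight into the JDBC connection URL; a sketch (host, port, and database name are placeholders, and the exact property set depends on your driver version):

```
jdbc:sqlserver://localhost:1433;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor
```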

Schiro answered 30/12, 2009 at 19:52 Comment(2)
Would be useful to know what the "adaptive" property actually does when enabled. At some (socket/network) level, there is some chunk of the result-set being fetched from the server by the client (affecting performance/JVM memory requirements as discussed earlier).Abdulabdulla
The MSSQL Doc itself references using "setFetchSize()" method. learn.microsoft.com/en-us/sql/connect/jdbc/…Abdulabdulla

It sounds to me like you really want to limit the rows being returned by your query and page through the results. If so, you can do something like:

select * from (select rownum myrow, a.* from TEST1 a)
where myrow between 5 and 10;

You just have to determine your boundaries.
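The query above uses Oracle's rownum; on SQL Server (2005 and later, as in the question) the equivalent windowing can be done with ROW_NUMBER(). A sketch using the question's hypothetical table and column names:

```sql
SELECT col1, col2
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY timecol) AS myrow, a.*
    FROM mytable a
) numbered
WHERE myrow BETWEEN 5 AND 10;
```

Note that paging like this issues one query per page, so a stable ORDER BY is needed to keep the boundaries consistent between pages.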

Alameda answered 23/8, 2009 at 11:59 Comment(2)
Related - some databases support query terms like "LIMIT=x" to limit the number of rows in the result-set at a fixed cap. However, depending on row ordering, etc ... this is a bit of an (arbitrary) hammer.Abdulabdulla
Read setFetchSize(size) of MsSQL JDBC Driver: SqlServerJab

Try this:

String SQL = "select col1, col2, coln from mytable where timecol = yesterday";

connection.setAutoCommit(false);
PreparedStatement stmt = connection.prepareStatement(SQL,
        SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY,
        SQLServerResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(2000);

stmt.set....

stmt.execute();
ResultSet rset = stmt.getResultSet();

while (rset.next()) {
    // ......
}
Fabrianna answered 4/7, 2013 at 0:52 Comment(0)

I had the exact same problem in a project. The issue is that even though the fetch size might be small enough, the JdbcTemplate reads all the results of your query and maps them out in a huge list, which might blow your memory. I ended up extending NamedParameterJdbcTemplate to create a function which returns a Stream of Object. That Stream is based on the ResultSet normally returned by JDBC, but will pull data from the ResultSet only as the Stream requires it. This will work if you don't keep a reference to all the Objects this Stream spits out.

I drew a lot of inspiration from the implementation of org.springframework.jdbc.core.JdbcTemplate#execute(org.springframework.jdbc.core.ConnectionCallback). The only real difference has to do with what to do with the ResultSet. I ended up writing this function to wrap up the ResultSet:

private <T> Stream<T> wrapIntoStream(ResultSet rs, RowMapper<T> mapper) {
    CustomSpliterator<T> spliterator = new CustomSpliterator<T>(rs, mapper, Long.MAX_VALUE,
            Spliterator.NONNULL | Spliterator.IMMUTABLE | Spliterator.ORDERED);
    return StreamSupport.stream(spliterator, false);
}

private static class CustomSpliterator<T> extends Spliterators.AbstractSpliterator<T> {
    // constructor and fields (rs, mapper, rowNumber) omitted here
    // the idea is to pull from the ResultSet and push into the Stream
    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        try {
            // you can add some logic to close the Stream/ResultSet automatically
            if (rs.next()) {
                T mapped = mapper.mapRow(rs, rowNumber++);
                action.accept(mapped);
                return true;
            } else {
                return false;
            }
        } catch (SQLException e) {
            // do something with this exception
            throw new RuntimeException(e);
        }
    }
}

You can add some logic to make that Stream auto-closeable; otherwise, don't forget to close it when you are done.

Saccule answered 14/2, 2018 at 20:24 Comment(1)
This is a quintessential Spring template example of what not to do (by making it more complicated, less efficient and missing the point entirely). Apologies... not personal - there is some apparent Spring indoctrination process where Java gets twisted up into knots to make it fit a Template. In short, the JDBC Template (like all templates) is just for Hello World 101 type calls to fetch one row or something. It does not out-of-the-box allow for setting the fetchSize() - same problem with Hibernate. Streams and Splitters here are not solving the problem of managing the result-set size fetched.Abdulabdulla

For Big Data developers:

Take a look at what Exasol's documentation says about setFetchSize():

Reading big data volumes: via the parameter "fetchsize", you can determine the data volume which should be fetched from the database per communication round. If this value is too low, the data transfer can take too long. If this value is too high, the JVM can run out of memory. We recommend a fetch size of 1000-2000.

Badillo answered 26/2 at 16:16 Comment(0)
