Handling large records in a Java EE application
Asked Answered
C

3

3

There is a table phonenumbers with two columns: id, and number. There are about half a million entries in the table. Database is MySQL.

The requirement is to develop a simple Java EE application, connected to that database, that allows a user to download all numbervalues in comma separated style by following a specific URL.

If we get all the values in a huge String array and then concatenate them (with comma in between all the values) in a String and then send it down to the user, does it sound a proper solution?

The application is not public and will be used by a limited no. of people.

Caber answered 1/2, 2010 at 23:27 Comment(3)
Having half a million comma separated list of values does not sound good to me. May be newline separated is okay so that user can easily open this file in text editor if required. But this depends on how users wants to use this list of values. Can you please elaborate on user requirements?Marcelline
CSV is a user requirement because it is supported by MS Excel. I am not sure how they are going to use it, probably they would use it for reporting purpose.Caber
CSV's map to columns in excel. If you have half a million CSV entries, then how is going to open correctly in excel? Excel does not have half a million columns. You need to as users how are they going to use it. Most of the time users are not sure of their requirements.Marcelline
B
9

Your best bet is to not store the data in Java's memory in any way, but just write the obtained data to the response immediately as the data comes in. You also need to configure the MySQL JDBC driver to serve the resultset row-by-row by Statement#setFetchSize() as per the MySQL JDBC driver documentation, otherwise it will cache the whole thing in memory.

Assuming you're familiar with Servlets, here's a kickoff example which takes that all into account:

protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    response.setContentType("text/plain");
    response.setHeader("Content-Disposition", "attachment;filename=numbers.txt"); // Force download popup.

    Connection connection = null;
    Statement statement = null;
    ResultSet resultSet = null;
    Writer writer = response.getWriter();

    try {
        connection = database.getConnection();
        statement = connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        statement.setFetchSize(Integer.MIN_VALUE);
        resultSet = statement.executeQuery("SELECT number FROM phonenumbers");

        while (resultSet.next()) {
            writer.write(resultSet.getString("number"));
            if (!resultSet.isLast()) {
                writer.write(",");
            }
        }
    } catch (SQLException e) {
        throw new ServletException("Query failed!", e);
    } finally { 
        if (resultSet != null) try { resultSet.close; } catch (SQLException logOrIgnore) {}
        if (statement != null) try { statement.close; } catch (SQLException logOrIgnore) {}
        if (connection != null) try { connection.close; } catch (SQLException logOrIgnore) {}
    }
}
Borzoi answered 2/2, 2010 at 0:45 Comment(2)
I agree on "write the obtained data to the response immediately as the data comes in"Marcelline
Thank you BalusC. This is exactly what I wanted!Caber
B
1

There's a bit more to properly formatting CSV output. It would be easiest to use an existing library such as this one to generate the output file.

You can generate output to a file on disk (on the web server) and then redirect the browser to that file (with a cron job or whatever to clean up old data) or just stream the result directly back to the user.

If you are streaming directly be sure and set the MIME type to something that will trigger a download in the user's browser (e.g. text/csv or text/comma-separated-values)

Blind answered 1/2, 2010 at 23:35 Comment(3)
I agree but just want to add supercsv.sourceforge.net as a free alternative CSV reader/writer libraryYon
I never worked with CSV before. Can you please elaborate why do we need a full library for this purpose, as apparently it only seems to be writing values separated by commas with a Writer (as can be seen in BalusC's example)?Caber
In the general case your data can contain commas, which throw off the field order if not properly handled since fields are separated by commas. To handle that, you quote your fields that might contain commas. Then again, your data might contain quote characters, which throws off the quoting if not properly handled. You have to escape the quotes. Rather than do this by hand, I find it easier to use a pre-existing library. If you are 100% confident your data will never contain a quote or comma (including typos) you can just write your fields separated by commas.Blind
C
0

If using Mysql 5.1+, I would simply use the proprietary syntax to dump the file somewhere and stream it in a Servlet response.

SELECT a,b,a+b INTO OUTFILE '/tmp/result.txt'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM test_table;

http://dev.mysql.com/doc/refman/5.1/en/select.html

For so many records, if you still want to use JDBC, you may try the following:

  • fetch the number of records fetch few records( using a query limit ) and write them
  • if you reach the number of records in a chunk, you fetch another one until you reach the maximum number of records
Colcothar answered 2/2, 2010 at 0:48 Comment(3)
Assuming the web server shares a file system with the DB. In many deployment scenarios, it does not.Blind
In my case as well, DB and application are running on different machines.Caber
I believe my JDBC suggestion is among the same lines as the fetch size suggestion for the query to avoid caching the all thing in memory and avoid the filesystem sharing issue.Colcothar

© 2022 - 2024 — McMap. All rights reserved.