Streaming a POI workbook to the servlet output stream
Asked Answered
A

6

22

I build a very large POI workbook, on my web server. Holding the entire workbook in memory , will not scale for multiple concurrent requests. Is there a way i can progressively write the workbook to the servlet output stream. That should reduce the response time , as well as make the process memory efficient.

Ammerman answered 20/4, 2010 at 8:49 Comment(0)
T
10

If you are about to generate Excel 2007 (xslx) then you could adapt the approach of BigGridDemo.java as described here: http://web.archive.org/web/20110821054135/http://www.realdevelopers.com/blog/code/excel

The solution is to let POI generate a container xslx as a template only and stream the actual spreadsheet data as XML into a zip output stream. Streamlining the XML generation is then up to you.

Tajuanatak answered 19/6, 2010 at 16:39 Comment(1)
This works like a charm and is the best answer here so far, because this approach does not require neither excessive memory, nor flushing main contents to disk, nor does it cause TTFB timeouts on the client. No need to know the full document contents in advance.Josuejosy
L
9

The situation has improved considerably since the rest of the answers were written - Streaming is now part of Apache Poi.

See the SXSSFWorkbook class, and the documentation here. It uses a streaming window over the sheet, flushing old rows outside the window to temporary files.

This is based on the BigGridDemo approach used in hlg's answer, but now part of the official distribution.

Here's the example from the documentation:

public static void main(String[] args) throws Throwable {
    // keep 100 rows in memory, exceeding rows will be flushed to disk
    SXSSFWorkbook wb = new SXSSFWorkbook(100); 
    Sheet sh = wb.createSheet();
    for(int rownum = 0; rownum < 1000; rownum++){
        Row row = sh.createRow(rownum);
        for(int cellnum = 0; cellnum < 10; cellnum++){
            Cell cell = row.createCell(cellnum);
            String address = new CellReference(cell).formatAsString();
            cell.setCellValue(address);
        }

    }

    // Rows with rownum < 900 are flushed and not accessible
    for(int rownum = 0; rownum < 900; rownum++){
      Assert.assertNull(sh.getRow(rownum));
    }

    // ther last 100 rows are still in memory
    for(int rownum = 900; rownum < 1000; rownum++){
        Assert.assertNotNull(sh.getRow(rownum));
    }

    FileOutputStream out = new FileOutputStream("/temp/sxssf.xlsx");
    wb.write(out);
    out.close();

    // dispose of temporary files backing this workbook on disk
    wb.dispose();
}
Letterpress answered 21/4, 2017 at 0:42 Comment(7)
This example seems to still keep the whole workbook in memory before streaming it out. Interesting, whether the wb.write(out) call can be placed into the for loop in order to address the initial question.Josuejosy
The problem is also not only memory, but the time needed sometimes to prepare the entire document. It seems reasonable to start streaming the data out as they become available, instead of waiting for all of them first.Josuejosy
@SergeyShcherbakov Thanks for the comment, but I'm afraid that's not correct. SXSSFWorkbook flushes temporary data to disk behind the scenes, so this example does not keep the whole workbook in memory.Letterpress
A temporary format is necessary because the xlsx format can't be written as a stream unless you know the whole document first (it has summary information and forward references) - and this temporary format is handled by the SXSSFWorkbook. For more information, see the class documentation.Letterpress
I have successfully implemented streaming of a very big Excel document to a web client using the approach from the winning answer here. web.archive.org/web/20110821054135/http://… No excessive memory, no flushing to disk, no TTFB timeouts on the client.Josuejosy
so "can't be written as a stream unless you know the whole document first" is not really true. Some preparation is needed, but I was able to stream to an open OutputStream without knowing the document contents in advance. In fact I was dynamically generating the contents.Josuejosy
Glad you got it working! My understanding is that some documents still need some information about the whole document to be known ahead of time (but clearly not all documents or all information, as you pointed out). The example at that link is based on the BigGridDemo example, which became the SXSSFWorkbook class. The workbook class is pretty general (it supports limited random access), and still allows streaming a POI document without keeping the whole thing in memory. If your use case is more specific, a custom class like yours could absolutely guarantee better performance.Letterpress
R
3

Unfortunately, that's impossible when there's no means of sequential data. I'd suggest to look for another format, e.g. CSV or XML. Both can be written out sequentially. If it's coming from a DB, it can even be done more efficient since a decent DB has builtin facilities to efficiently export to those formats. You just have to stream the bytes from one to other side.

Ruminate answered 20/4, 2010 at 12:15 Comment(0)
P
1

Did you tried with the write method direct to the HttpServletResponse.getOutputStream()?

Please take a look at the following example:

 HSSFWorkbook wb = new HSSFWorkbook();
 HSSFSheet sheet = wb.createSheet("new sheet");
 ...
 OutputStream out = response.getOutputStream();
 wb.write(out);
 out.close();
Puritanism answered 20/10, 2010 at 12:49 Comment(3)
-1 This does not answer the question. The OP explicitly states that the issue he is trying to solve is how to avoid creating everything and then writing it out all at once.Marsupium
-1 You cant stream a xlxs. Just like the OP mentionedEpexegesis
This will still keep in memory till all rows are written to wbIphagenia
M
0

If you use JExcel It has sample code to read stream code to and from a Servlet. http://jexcelapi.sourceforge.net/resources/faq/

The only downside to this API looks like it only supports up to Excel 2003 inclusive.

Using POI - Can you not create the file and serve the file's bytes to the servlet output stream ?

Madrigal answered 21/4, 2010 at 1:17 Comment(2)
I am writing the POI workbook to the servlet outputstream. What happens behind the scenes, is it writes thte bytes into the outputstream. My question is, I don't know how to write it progressively. I have to wait until the entire workbook is created, and only then write it to I/O. And it takes around a minute to write to I/O. Don't know if that is justified.Ammerman
How big is your spreadsheet ? Do you create the spreadsheet, or is it given to you ? If it given to you, do you modify it ? What is taking so much time to write to I/O ?Madrigal
S
0

Since version 5.0.0 there is a DeferredSXSSFWorkBook class available that uses a generator function to process rows. This way you can stream directly to your HTTP Response. I took the code below from the DeferredGeneration.java example. The API is considered experimental. It works fine in my use case.

// ExcelWriter.java
public class ExcelWriter {
  public static void write(OutputStream out) throws IOException {
    try (DeferredSXSSFWorkbook wb = new DeferredSXSSFWorkbook()) {
      DeferredSXSSFSheet sheet1 = wb.createSheet("new sheet");

      // cell styles should be created outside the row generator function
      CellStyle cellStyle = wb.createCellStyle();
      cellStyle.setAlignment(HorizontalAlignment.CENTER);

      sheet1.setRowGenerator((ssxSheet) -> {
          for (int i = 0; i < 10; i++) {
              Row row = ssxSheet.createRow(i);
              Cell cell = row.createCell(1);
              cell.setCellStyle(cellStyle);
              cell.setCellValue("value " + i);
          }
      });

      // wb.write(out);
      // writeAvoidingTempFiles was added as an experimental change in POI 5.1.0
      wb.writeAvoidingTempFiles(out);

      // the dispose call is necessary to ensure temp files are removed
      // supposedly writeAvoidingTempFiles doesn't guarantee that temp files are NOT created
      wb.dispose();
    }
  }
}

This is how I would implement it in Quarkus 3.8.3 w/ RestEasy Reactive. Note, it is still a blocking call.

// SomeResource.java
@GET
@Path("my-endpoint/xlsx")
@Produces(MediaType.APPLICATION_OCTET_STREAM)
@ResponseHeader(name="content-disposition", value="attachment; filename=file.xlsx")
public StreamingOutput getFile() {
 
  return new StreamingOutput() {
    @Override
    public void write(OutputStream out) throws IOException, WebApplicationException {
      ExcelWriter.write(out);
    }
  };
}

Stithy answered 27/3, 2024 at 13:12 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.