To answer your question, yes, POI will always use very large amount of memory when working on large XLSX files, which is much larger than the size of the XLSX files.
I don't think this will change anytime soon, and there are pretty obvious reasons for that: XLSX is basically a bunch of zipped XML files, and XML is very well compressed (around 10x). Getting this XML just to sit in memory uncompressed would already increase the memory consumption tenfold, so if you add all the overhead of data structures, there's no way you should expect a 10% increase in memory consumption over the XLSX file size.
Now, the good news is that as mentioned in the comments, Apache POI introduced SXSSF for streaming very large amount of data in a spreadsheet with very good performance and low memory usage. XLSX files generated this way are still streamed on the hard disk where they can end up taking quite a bit of space, but at least you don't risk OOME when writing hundreds of thousands of rows.
The problem for you is that you won't be able to get JETT to directly work with SXSSF, as it needs the whole document loaded in memory for performing template filling. JETT author quickly discussed this topic here.
I had the same problem, and ended up doing a two-step XLSX creation:
A standard JETT XLSX template to generate headers and formatting. The last row of the first sheet contains cells with $$tokens$$, one per cell. I don't use JETT to insert the large amount of rows.
Once JETT did its work, I reopen the workbook, read then delete the $$tokens$$ on the last line of the first spreadsheet, and start streaming data with SXSSF row by row.
Of course, there are limitations to this approach:
- You cannot use JETT on any of the streamed rows during rows insertion (but you can before, to dynamically pick the order of the $$tokens$$ for example)
- Cells format won't be copied unless you take care of it yourself with POI API. I personally prefer to format whole columns in my XLSX file, and it will apply to the streamed data.
This also works if you want to show charts using data inserted with SXSSF: You can define a Named Range with functions OFFSET and COUNTA, then create a Pivot table & Pivot Chart that will be refreshed when the XLSX is opened in Excel.
.xlsx
will need more memory or temp files than.xls
, as the overhead of all the XML and the compression are higher than the binary formats – Libb.xlsx
file a lot more easily than with.xls
. Also, for large files,xlsx
use less memory when done properly. – Declarative