I have users who cannot or do not want to connect to relational databases, but instead prefer to work with data exported to Excel files. The recordsets exported from these databases can become rather large. (I also export to CSV files.)
My question is related to this one: Handling java.lang.OutOfMemoryError when writing to Excel from R.
As recommended in the accepted answer to that question (or rather its first comment), I now use the Rcpp-based openxlsx package to export some views from the database.
It works when the export has ~67,000 rows, but it does not work for larger datasets (~1 million rows, ~20 columns, all numeric except a few datetimes).
openxlsx::write.xlsx(data, file = "data.2008-2016.xlsx") # 800000 rows
Error: zipping up workbook failed. Please make sure Rtools is installed or a zip application is available to R.
Try installr::install.rtools() on Windows
(I'm using a Linux PC, and /usr/bin/zip is available to R)
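For reference, here is a minimal sketch of the checks I run to confirm that a zip binary is visible to R; as far as I can tell openxlsx consults the R_ZIPCMD environment variable when shelling out to zip, but that exact behaviour is an assumption on my part:

Sys.which("zip")                       # should return "/usr/bin/zip" on this machine
Sys.getenv("R_ZIPCMD", unset = NA)     # NA means no explicit zip command is set
Sys.setenv(R_ZIPCMD = "/usr/bin/zip")  # point openxlsx at the system zip explicitly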
Can I give the openxlsx package more memory? Or set some tunable options so it performs better with large datasets?
For openxlsx, is there something like the options(java.parameters = "-Xmx1000m") setting that exists for the Java-based xlsx package?
The openxlsx vignette does not mention any options. But maybe there are some undocumented ways or options? (e.g. showing a progress bar during saving)
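The only lower-level alternative I have found so far is to skip the write.xlsx() wrapper and build the workbook step by step; a minimal sketch (I have not measured whether this actually reduces peak memory):

library(openxlsx)
wb <- createWorkbook()
addWorksheet(wb, "data")
writeData(wb, sheet = "data", x = data)   # data is the queried data.frame
saveWorkbook(wb, file = "data.2008-2016.xlsx", overwrite = TRUE)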
At this point I proceed like this: close all unneeded apps, restart RStudio, keep few or no large objects in the global environment, query the db, then run write.xlsx().
With a "clean slate" like this, it succeeded in exporting the 800,000-row dataset to a 93 MB xlsx file.
Have you tried options(error=recover) or debug(openxlsx::write.xlsx) to track down the precise problem? – Joellejoellen
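Spelled out, that debugging suggestion would look roughly like this (a sketch I have not run yet):

options(error = recover)       # on error, open an interactive frame browser
debug(openxlsx::write.xlsx)    # single-step through write.xlsx()
openxlsx::write.xlsx(data, file = "data.2008-2016.xlsx")
undebug(openxlsx::write.xlsx)  # switch single-stepping off again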