How to write 1-million-row Excel Files with openxlsx Package in R
I have users who cannot or do not want to connect to relational databases and instead prefer to work with data exported to Excel files. The recordsets exported from these databases can become rather large. (I also export to CSV files.)

My question is related to this one: Handling java.lang.OutOfMemoryError when writing to Excel from R.

As recommended in the accepted answer to that question (or rather its first comment), I now use the Rcpp-based openxlsx package to export some views from the database. It works when the export has ~67,000 rows, but it fails for larger datasets (~1 million rows, ~20 columns, all numeric except a few datetimes).

openxlsx::write.xlsx(data, file = "data.2008-2016.xlsx") # 800000 rows

Error: zipping up workbook failed. Please make sure Rtools is installed or a zip application is available to R.
         Try installr::install.rtools() on Windows

(I'm using a Linux PC, and /usr/bin/zip is available to R)

Can I give the openxlsx package more memory? Or set some tuneable options to perform better with large datasets?

For openxlsx, is there something like the options(java.parameters = "-Xmx1000m") for the java-based xlsx package?

The openxlsx vignette does not mention any options. But maybe there are some undocumented ways or options? (e.g. showing a progress bar during saving)

At this point I proceed like this: close all unneeded apps, restart RStudio, keep few or no large objects in the global environment, query the db, then run write.xlsx(). With a "clean slate" like this, it succeeded in exporting the 800,000-row dataset to a 93 MB xlsx file.
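A minimal, self-contained sketch of that "clean slate" workflow (the small data frame here is a hypothetical stand-in for the real database export):

```r
library(openxlsx)

# Hypothetical stand-in for the exported recordset; in practice `data`
# comes from the database query.
data <- data.frame(id = 1:5, value = rnorm(5), when = Sys.time())

# Free as much memory as possible before writing: openxlsx builds the
# worksheet XML in RAM, so a lean session leaves more headroom for the
# final zip step.
rm(list = setdiff(ls(), "data"))
gc()

out <- tempfile(fileext = ".xlsx")
write.xlsx(data, file = out)
file.exists(out)  # TRUE if the workbook was written and zipped
```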

Abdu answered 15/3, 2016 at 12:55 Comment(5)
Excel has row limits... support.office.com/en-gb/article/…Rowen
@cory, thanks for linking to the specifications. I was aware that limitations exist but I thought they were higher; in particular I thought that "Worksheet size" was also "Limited by available memory and system resources" but that's obviously a misconception. (Worksheet size limit is 1,048,576 rows by 16,384 columns...)Abdu
Excel 2007 had a limit of 65,536 rows. Those days were rough...Rowen
can you use options(error=recover) or debug(openxlsx::write.xlsx) to track down the precise problem?Joellejoellen
You can also try Sys.setenv(R_ZIPCMD="/path_to_zip_tool") and see if making that more specific helps. Otherwise take a look at stat.ethz.ch/R-manual/R-devel/library/utils/html/zip.html.Cumber

Your problem isn't memory. openxlsx needs an external zip tool to package the workbook into the .xlsx container; on Windows that usually means installing Rtools.

I had the same problem and the same error just yesterday. Below is a link to the Windows installer:

https://cran.r-project.org/bin/windows/Rtools/index.html

The following site further explains the requirements:

https://www.r-project.org/nosvn/pandoc/openxlsx.html
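To diagnose whether R can actually find a zip tool (the usual cause of this error), and to point it at one explicitly via the R_ZIPCMD environment variable as suggested in the question's comments, something like this (the path is an example taken from the question):

```r
# Where does R look for a zip binary? openxlsx falls back to the zip
# program named by the R_ZIPCMD environment variable, or "zip" on PATH.
Sys.getenv("R_ZIPCMD", unset = "not set")
Sys.which("zip")   # empty string means no zip tool was found on PATH

# Point R at an explicit zip binary before calling write.xlsx():
Sys.setenv(R_ZIPCMD = "/usr/bin/zip")
```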

Cumber answered 22/12, 2016 at 14:56 Comment(3)
This is the correct answer based on your error message. You need a version of zip that is compatible with openxlsx. See also here. r-project.org/nosvn/pandoc/openxlsx.htmlLatinalatinate
I'm on Linux, but Rtools is available for Windows only. I did not say so explicitly, only implicitly when I wrote "(/usr/bin/zip is available to R)", so this answer does not apply to my use case. Upvoted thoughAbdu
Thanks for the upvote nonetheless! Idea 1: is there any chance the zip tool itself is failing on memory? It's odd that you get the "zipping up workbook failed" message. Idea 2: given that it works with a "clean slate", maybe your machine doesn't have the available memory needed. If you watch your memory usage, does it increase and cap out when running your R script?Cumber

© 2022 - 2024 — McMap. All rights reserved.