Read excel file with formulas in cells into R
Asked Answered
A

1

5

I was trying to read an excel spreadsheet into R data frame. However, some of the columns have formulas or are linked to other external spreadsheets. Whenever I read the spreadsheet into R, there are always many cells becomes NA. Is there a good way to fix this problem so that I can get the original value of those cells?

The R script I used to do the import is like the following:

options(java.parameters = "-Xmx8g")
library(XLConnect)
# Step 1 import the "raw" tab
path_cost = "..."
wb = loadWorkbook(...)
raw = readWorksheet(wb, sheet = '...', header = TRUE, useCachedValues = FALSE)
Anglonorman answered 14/8, 2015 at 16:29 Comment(5)
Not necessarily what you want but you could in Excel use copy then paste as values.Audette
Problem you say? it depends on what you are trying to achieve, which is nor clear to my point of view.Tinkling
@Tinkling I want to use the values in those cells whose values are available in the original excel file but now become NA (not available)Anglonorman
@Anglonorman ok, please do update your question with these details in order to allow user to better help you and next time try to provide all details in advance and do no wait that user ask for them.Tinkling
openexls is very fast try this if you are using windows #46914803Thaliathalidomide
K
11

UPDATE: read_excel from the readxl package looks like a better solution. It's very fast (0.14 sec in the 1400 x 6 file I mentioned in the comments) and it evaluates formulas before import. It doesn't use java, so no need to set any java options.

 # sheet can be a string (name of sheet) or integer (position of sheet)
 raw = read_excel(file, sheet=sheet)

For more information and examples, see the short vignette.

ORIGINAL ANSWER: Try read.xlsx from the xlsx package. The help file implies that by default it evaluates formulas before importing (see the keepFormulas parameter). I checked this on a small test file and it worked for me. Formula results were imported correctly, including formulas that depend on other sheets in the same workbook and formulas that depend on other workbooks in the same directory.

One caveat: If an externally linked sheet has changed since the last time you updated the links on the file you're reading into R, then any values read into R that depend on external links will be the old values, not the latest ones.

The code in your case would be:

library(xlsx)

options(java.parameters = "-Xmx8g") # xlsx also uses java

# Replace file and sheetName with appropriate values for your file
# keepFormulas=FALSE and header=TRUE are the defaults. I added them only for illustration.
raw = read.xlsx(file, sheetName=sheetName, header=TRUE, keepFormulas=FALSE)
Karinkarina answered 14/8, 2015 at 16:51 Comment(7)
I tried read.xlsx following your code, it's very slow, after nearly 10mins the code is still running. There are 12378 rows in the spread sheet, should that be a problem?Anglonorman
I'm surprised it's that slow with a sheet of that size (do you have lots of columns). Try read.xlsx2. It's faster than read.xlsx. See the read.xlsx help for differences between read.xlsx and read.xlsx2.Karinkarina
I just timed the two functions on a test file with about 1400 rows and 6 columns. Median read times: read.xlsx=5.9 sec; read.xlsx2=0.13 sec.Karinkarina
Correction: read.xlsx2 results in NA for cells with formluas, so that won't work. You'll need to stick with read.xlsx.Karinkarina
Thanks a lot for your help! Yes, read.xlsx2 do generates NA for cells with formulas. I have 40 columns, which I believe is also not a big number.Anglonorman
I just find that the read.xls() function in gdata package is a good solution!Anglonorman
I tried read_excel(), it is indeed faster than read.xls() from gdata. Thanks a lot for your help!Anglonorman

© 2022 - 2024 — McMap. All rights reserved.