How to open an .xlsb file in R?

Asked 23/2, 2015 at 21:52 Answered 24/4 at 11:17

I'm trying to open an .xlsb file in R and keep getting similar errors.

Any recommendations on how to solve this issue without having to download the data and save it in a different formate?

read.xlsx(paste(OutputLocation,FileName, sep=""), sheetName = "Master Data")

Error messages: Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : org.apache.poi.xssf.XLSBUnsupportedException: .XLSB Binary Workbooks are not supported

rigDataWB<-loadWorkbook(paste(OutputLocation,FileName, sep=""))

Error messages: Error: XLSBUnsupportedException (Java): .XLSB Binary Workbooks are not supported

Please note:

I cannot install Perl libraries.

I'm running 64bit R.

Reference: http://www.milanor.net/blog/?p=779

My data is from: http://phx.corporate-ir.net/phoenix.zhtml?c=79687&p=irol-reportsother

Shemikashemite answered 23/2, 2015 at 21:52 Comment(0)

Use the RODBC package:

library(RODBC)
wb <- "D:\\Data\\Masked Data.xlsb" # Give the file name
con2 <- odbcConnectExcel2007(wb)
data <- sqlFetch(con2, "Sheet1$") # Provide name of sheet
nrow(data)

Theatricals answered 9/8, 2016 at 11:32 Comment(0)

One way could be to use ODBC:

require(RODBC)
if (any(grepl("*.xlsb", odbcDataSources(), fixed = TRUE))) {
  download.file(url = "http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9NTcwMjI1fENoaWxkSUQ9MjcxMjIxfFR5cGU9MQ==&t=1", 
                destfile = file.path(tempdir(), "test.xlsb"), 
                mode = "wb")
  conn <- odbcConnectExcel2007( file.path(tempdir(), "test.xlsb")) 
  df <- sqlFetch(conn, sub("'(.*)\\$'", "\\1", sqlTables(conn)$TABLE_NAME)[4]) # read 4th sheet in the table name list
  head(df, 10)
  #                                             F1          F2         F3       F4        F5 F6
  # 1                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 2                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 3                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 4                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 5  Baker Hughes Gulf of Mexico Oil / Gas Split        <NA>       <NA>     <NA>      <NA> NA
  # 6                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 7                                         <NA> US Offshore Total\nGoM Gas\nGoM Oil \nGoM NA
  # 8                                       1/7/00         127        123      116         7 NA
  # 9                                      1/14/00         125        121      116         5 NA
  # 10                                     1/21/00         125        121      116         5 NA
  close(conn) 
}

Kristankriste answered 23/2, 2015 at 22:54 Comment(3)

Just a quick note, if you're getting an error, update your Windows drivers here and the solution works. – Shorthorn 3/11, 2016 at 16:36

Hi. Is this answer obsolete now? There doesn't seem to be a odbcConnectExcel2007 in RODBC. Or is this Windows only? – Tremble 15/9, 2021 at 11:56

I'm using windows and odbcConnectExcel2007 does exist. – Grozny 8/2, 2023 at 20:21

readxlsb package can read Excel binary (.xlsb) files into R. Here are some info taken from the package vignettes:

read_xlsb(path, sheet, range, col_names, col_types, na, trim_ws, skip, ...)

sheet:

Either a name, or the index of the sheet to read. Index of the first sheet is 1. If the sheet name is embedded in the range argument, or implied if range is a named range, then this argument is ignored

range:

range can be specified as

A named range. Named ranges are not case sensitive

In Sheet!A1 notation

In Sheet!R1C1 notation

As a cellranger::cell_limits object

col_names

TRUE: The first row is used for column names. Empty cells result in a column name of the form ‘column.i’

FALSE: Column names will be ‘column.i’

Character vector: vector containing column names.

col_types

Can be implied from the spreadsheet or specified in advanced. When specifying types, options are

“logical” (or “boolean”), “numeric” (or “double”), “integer”, “date” and “string” (or “character”)

Use “skip” (or “ignore”) to skip a column

na

A character string that is interpret as NA. This does not effect the implied data type for a column.

trim_ws

Should leading and trailing whitespaces be trimmed from character strings?

skip

The number of rows to skip before reading data.

library(readxlsb)

res = read_xlsb(path = system.file("extdata", "TestBook.xlsb", package = "readxlsb"), 
                range = "PORTFOLIO", 
                debug = TRUE)

ls(res$env)
#> [1] "content"      "named_ranges" "sheets"       "stream"

res$env$named_ranges
#>             name                     range sheet_idx first_column first_row
#> 1   INFO_RELEASE          FirstSheet!$A$11         0            1        11
#> 2        OUTLOOK 'My SecondTab'!$A$1:$C$13         1            1         1
#> 3      PORTFOLIO      FirstSheet!$A$3:$C$9         0            1         3
#> 4 SAVED_DATETIME          FirstSheet!$C$13         0            3        13
#> 5          TITLE           FirstSheet!$A$1         0            1         1
#>   last_column last_row
#> 1           1       11
#> 2           3       13
#> 3           3        9
#> 4           3       13
#> 5           1        1

^{Created on 2020-07-07 by the reprex package (v0.3.0)}

Grunion answered 7/7, 2020 at 20:0 Comment(5)

This package is eye-wateringly slow for XLSB files of any meaningful size, and often does not complete successfully. Not that there is a real packaged alternative, but users should be mindful of this when they write code that depends on readxlsb. – Bantu 23/8, 2020 at 17:29

This package doesn't work and freezes the R Studio. – Rosado 25/8, 2020 at 3:16

This is the only answer that worked for me on linux. Very slow for 100k records but got there. – Tremble 15/9, 2021 at 12:6

This package has very limited features, for instance I cant skip NA or Blank cells while reading. – Blasted 6/10, 2021 at 22:42

@Duffer: you can try Python packages (pyxlsb or pyxlsb2 via reticulate) to work around some of the limitations of the readxlsb package. I haven't tested them though. – Grunion 9/10, 2021 at 21:43

If you get the following error in R trying to connect to .xlsb:

[RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified

then, you are probably are missing to install the AccessDatabaseEngine_X64.exe from Microsoft. I had this problem today, and after installing this file I've got no more error messages.

Leakage answered 22/5, 2019 at 11:31 Comment(0)

try

xl.read.file(file_path, h=T, top.left.cell = "A1") from library(excel.link)

Shutter answered 21/12, 2020 at 17:28 Comment(1)

Could you add an explanation on how your code works? – Biquadrate 21/12, 2020 at 17:59

There is an R package to read xlsb

install.packages('readxlsb')
library(readxlsb)
read_xlsb(filename, sheet = "Master Data")

Conlee answered 24/4 at 11:17 Comment(0)

Recommended topics

Hot tags