Read all worksheets in an Excel workbook into an R list with data.frames
Asked Answered
T

12

112

I understand that XLConnect can be used to read an Excel worksheet into R. For example, this would read the first worksheet in a workbook called test.xls into R.

library(XLConnect)
readWorksheetFromFile('test.xls', sheet = 1)

I have an Excel Workbook with multiple worksheets.

How can all worksheets in a workbook be imported into a list in R where each element of the list is a data.frame for a given sheet, and where the name of each element corresponds to the name of the worksheet in Excel?

Thinkable answered 18/10, 2012 at 1:6 Comment(2)
apart from xlconnect and readxl, package xlsx permits to manipulate excel files in R (all sheets or just some)Keynesianism
Very good tutorial: dominicroye.github.io/en/2019/import-excel-sheets-with-rShoffner
T
160

Updated answer using readxl (22nd June 2015)

Since posting this question the readxl package has been released. It supports both xls and xlsx format. Importantly, in contrast to other excel import packages, it works on Windows, Mac, and Linux without requiring installation of additional software.

So a function for importing all sheets in an Excel workbook would be:

library(readxl)    
read_excel_allsheets <- function(filename, tibble = FALSE) {
    # I prefer straight data.frames
    # but if you like tidyverse tibbles (the default with read_excel)
    # then just pass tibble = TRUE
    sheets <- readxl::excel_sheets(filename)
    x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
    if(!tibble) x <- lapply(x, as.data.frame)
    names(x) <- sheets
    x
}

This could be called with:

mysheets <- read_excel_allsheets("foo.xls")

Old Answer

Building on the answer provided by @mnel, here is a simple function that takes an Excel file as an argument and returns each sheet as a data.frame in a named list.

library(XLConnect)

importWorksheets <- function(filename) {
    # filename: name of Excel file
    workbook <- loadWorkbook(filename)
    sheet_names <- getSheets(workbook)
    names(sheet_names) <- sheet_names
    sheet_list <- lapply(sheet_names, function(.sheet){
        readWorksheet(object=workbook, .sheet)})
}

Thus, it could be called with:

importWorksheets('test.xls')
Thinkable answered 18/10, 2012 at 1:29 Comment(11)
excellent, just learning that readxl has an excel_sheets method. wonderful.Tindle
What's the best way to add a filepath into this function?Besant
@user7071759 if I understand you, then you should just be able to include path in filename. E.g., read_excel_allsheets("my/path/to/file/example.xls")Thinkable
Thanks! Your solution worked better than the others listed in this page.Chela
It seems that this new answer gives me a a list within a list, and not a data frame within a list.Rhona
@Erosennin I'm not sure; it doesn't do that for me. I've added some code that removes the tibble class by default. Perhaps the tibble stuff was causing surprising results.Thinkable
I am getting lists and not dataframes as well.Kreager
A simpler version of this function would be lapply(excel_sheets(file.path), function(x) read_excel(file.path, x)). If you prefer the non-tibble simply wrap read_excel in as.data.frame(). Names would have to be assigned after.Midis
The tidyverse solution is more elegant linkPseudoscope
hi i tried to use your code but when using xlconnect i get error :> library("XLConnect") Loading required package: XLConnectJars Error: package or namespace load failed for ‘XLConnectJars’: .onLoad failed in loadNamespace() for 'rJava', details: call: fun(libname, pkgname) error: JAVA_HOME cannot be determined from the Registry Error: package ‘XLConnectJars’ could not be loadedMonocoque
I have made use of the function here on several occasions. It however has caused some issues when rbinding the resulting sheets as the readxl package seems to insert random blank rows which causes a bit of a headache. I altered your function to use the xlsx package which does not have this issue. read_xlsx_allsheets <- function(filename) { sheets <- openxlsx::getSheetNames(filename) x <- lapply(sheets, function(X) openxlsx::read.xlsx(filename, sheet = X)) names(x) <- sheets x } Gallager
T
46

Note that most of XLConnect's functions are already vectorized. This means that you can read in all worksheets with one function call without having to do explicit vectorization:

require(XLConnect)
wb <- loadWorkbook(system.file("demoFiles/mtcars.xlsx", package = "XLConnect"))
lst = readWorksheet(wb, sheet = getSheets(wb))

With XLConnect 0.2-0 lst will already be a named list.

Turd answered 18/10, 2012 at 6:31 Comment(5)
For me, this works require(XLConnect) wb <- loadWorkbook("excel.xlsx") lst = readWorksheet(wb, sheet = getSheets(wb))Among
Me too. The solution answer did not work for me, could not locate the file even though it existsSaba
The call with system.file() did Not work for me too.Redd
instead using loadWorkbook(system.file("demoFiles/mtcars.xlsx", package = "XLConnect")), directly use loadWorkbook("demoFiles/mtcars.xlsx") to load workbook. It works.Caracas
This is the best answer for a password protected xlsxMancunian
I
42

I stumbled across this old question and I think the easiest approach is still missing.

You can use rio to import all excel sheets with just one line of code.

library(rio)
data_list <- import_list("test.xls")

If you're a fan of the tidyverse, you can easily import them as tibbles by adding the setclass argument to the function call.

data_list <- import_list("test.xls", setclass = "tbl")

Suppose they have the same format, you could easily row bind them by setting the rbind argument to TRUE.

data_list <- import_list("test.xls", setclass = "tbl", rbind = TRUE)
Iiette answered 23/3, 2018 at 14:25 Comment(1)
rio::import_list is a great option. It can also import a subset of the sheets e.g. import_list("test.xls", which = c(1, 2)) which can be really useful. I agree it's the easiest option here.Crematorium
S
42

From official readxl (tidyverse) documentation (changing first line):

library(readxl)
library(tidyverse)

path <- "data/datasets.xlsx"

path %>% 
  excel_sheets() %>% 
  set_names() %>% 
  map(read_excel, path = path)

Details at: http://readxl.tidyverse.org/articles/articles/readxl-workflows.html#iterate-over-multiple-worksheets-in-a-workbook

Siblee answered 23/3, 2018 at 19:58 Comment(2)
To keep sheet names, df <- path %>% excel_sheets() %>% set_names() %>% map_dfr(read_excel, path = path, .id = "SheetName")Shoffner
excel_sheets() does not existNoli
P
19

Since this is the number one hit to the question: Read multi sheet excel to list:

here is the openxlsx solution:

filename <-"myFilePath"

sheets <- openxlsx::getSheetNames(filename)
SheetList <- lapply(sheets,openxlsx::read.xlsx,xlsxFile=filename)
names(SheetList) <- sheets
Poisoning answered 18/1, 2018 at 9:50 Comment(0)
F
11

Adding to Paul's answer. The sheets can also be concatenated using something like this:

data = path %>% 
excel_sheets() %>% 
set_names() %>% 
map_df(~ read_excel(path = path, sheet = .x), .id = "Sheet")

Libraries needed:

if(!require(pacman))install.packages("pacman")
pacman::p_load("tidyverse","readxl","purrr")
Frontward answered 30/7, 2019 at 15:28 Comment(2)
Good detail adding a new column using the sheetname!Stanwood
I agree. A very good side effect of having a new column based on the sheetname.Folsom
Q
8

You can load the work book and then use lapply, getSheets and readWorksheet and do something like this.

wb.mtcars <- loadWorkbook(system.file("demoFiles/mtcars.xlsx", 
                          package = "XLConnect"))
sheet_names <- getSheets(wb.mtcars)
names(sheet_names) <- sheet_names

sheet_list <- lapply(sheet_names, function(.sheet){
    readWorksheet(object=wb.mtcars, .sheet)})
Quiberon answered 18/10, 2012 at 1:14 Comment(1)
Nice one Martin!!Galagalactagogue
L
6

To read multiple sheets from a workbook, use readxl package as follows:

library(readxl)
library(dplyr)

final_dataFrame <- bind_rows(path_to_workbook %>%
                              excel_sheets() %>%
                              set_names() %>%
                              map(read_excel, path = path_to_workbook))

Here, bind_rows (dplyr) will put all data rows from all sheets into one data frame, and path_to_workbook is the location of your data: "dir/of/the/data/workbook".

Lobotomy answered 25/5, 2018 at 19:57 Comment(0)
D
4

Just for simplifying the very useful response of @Jeromy Anglim:

allsheets <- sapply(readxl::excel_sheets("your_file.xlsx"), simplify = F, USE.NAMES = T,
            function(X) readxl::read_excel("your_file.xlsx", sheet = X))
Durham answered 29/10, 2021 at 8:2 Comment(0)
C
3

excel.link will do the job.

I actually found it easier to use compared to XLConnect (not that either package is that difficult to use). Learning curve for both was about 5 minutes.

As an aside, you can easily find all R packages that mention the word "Excel" by browsing to http://cran.r-project.org/web/packages/available_packages_by_name.html

Caracal answered 2/4, 2014 at 21:36 Comment(0)
P
1

I tried the above and had issues with the amount of data that my 20MB Excel I needed to convert consisted of; therefore the above did not work for me.

After more research I stumbled upon openxlsx and this one finally did the trick (and fast) Importing a big xlsx file into R?

https://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf

Petaliferous answered 23/1, 2017 at 19:29 Comment(0)
A
0

Here is another approach based on the RDCOMClient :

library(RDCOMClient)

dir_Path <- "D:\\Dropbox\\Reponses_Stackoverflow\\stackoverflow_294\\"
excel_File <- "test_File.xlsx"
path_Excel_File <- paste0(dir_Path, excel_File)

xlApp <- COMCreate("Excel.Application")
xlApp[["Visible"]] <- TRUE
xlWbk <- xlApp$Workbooks()$Open(path_Excel_File)
Sheets <- xlWbk$Sheets() 
nb_Sheets <- Sheets$count() 

list_DF_By_Sheet <- list()

for(l in 1 : nb_Sheets)
{
  obj_Range <- Sheets[[l]]$Range("A:B")
  val <- obj_Range$value()
  nb_Val <- length(val)
  list_Vector <- list()
  
  for(i in 1 : nb_Val)
  {
    list_Vector[[i]] <- unlist(val[[i]])
  }
  
  df_Sheet <- do.call("cbind", list_Vector)
  list_DF_By_Sheet[[l]] <- df_Sheet
}

list_DF_By_Sheet
Arcade answered 7/4, 2023 at 21:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.