Use R and Openxlsx to output a list of dataframes as worksheets in a single Excel file
Asked Answered
A

3

11

I have a set of CSV files. I want to package them up and export the data to a single Excel file that contains multiple worksheets. I read in the CSV files as a set of data frames.

My problem is how to construct the command in openxlsx, I can do it manually, but I am having a list construction issue. Specifically how to add a data frame as a subcomponent of a named list and then pass as a parameter to write.xlsx()

Example

Ok, so I first list the CSV files on disk and generate a set of data frames in memory...

# Generate a list of csv files on disk and shorten names... 
filePath <- "../02benchmark/results/results_20170330/"
filePattern <- "*.csv"
fileListwithPath = list.files(path = filePath, pattern = filePattern, full.names = TRUE)
fileList = list.files(path = filePath, pattern = filePattern, full.names = FALSE)

datasets <- gsub("*.csv$", "", fileList)
datasets <- gsub("sample_", "S", datasets)
datasets

# Now generate the dataframes for each csv file...
list2env(
  lapply(setNames(fileListwithPath, make.names(datasets)),
         read.csv), envir = .GlobalEnv)

Example Output:

dput(datasets)
c("S10000_R3.3.2_201703301839", "S10000_T4.3.0_201703301843", 
"S20000_R3.3.2_201703301826", "S20000_T4.3.0_201703301832", "S280000_R3.3.2_201704020847", 
"S280000_T4.3.0_201704021100", "S290000_R3.3.2_201704020447", 
"S290000_T4.3.0_201704020702", "S30000_R3.3.2_201703301803", 
"S30000_T4.3.0_201703301817", "S310000_R3.3.2_201704012331", 
"S310000_T4.3.0_201704020242", "S320000_R3.3.2_201704011827", 
"S320000_T4.3.0_201704012128", "S330000_R3.3.2_201704011304", 
"S330000_T4.3.0_201704011546", "S340000_R3.3.2_201704010652", 
"S340000_T4.3.0_201704011010", "S350000_R3.3.2_201704010020", 
"S350000_T4.3.0_201704010404", "S360000_R3.3.2_201703311819", 
"S360000_T4.3.0_201703312134", "S370000_R3.3.2_201703310914", 
"S370000_T4.3.0_201703311301", "S380000_R3.3.2_201703310134", 
"S380000_T4.3.0_201703310509", "S390000_R3.3.2_201703301846", 
"S390000_T4.3.0_201703302252", "S40000_R3.3.2_201703301738", 
"S40000_T4.3.0_201703301752", "S50000_R3.3.2_201703301707", "S50000_T4.3.0_201703301724", 
"S60000_R3.3.2_201703301624", "S60000_T4.3.0_201703301647", "S70000_R3.3.2_201703301535", 
"S70000_T4.3.0_201703301602", "S80000_R3.3.2_201703301430", "S80000_T4.3.0_201703301508", 
"S90000_R3.3.2_201703301324", "S90000_T4.3.0_201703301400")

Now we have a set of data frames and we wish to create a single excel file with multiple worksheets...

wb <- createWorkbook()
saveWorkbook(wb, 'output.xlsx')

lapply(names(myList), function(x) write.xlsx(myList[[x]], 'output.xlsx', sheetName=x, append=TRUE))

Problem:

The problem is I can create the list structure manually and can confirm it works BUT I cannot seem to construct the list automatically.

myList <- sapply(datasets,function(x) NULL)
names(myList)
str(myList)
myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)

thus:

> str(myList)
List of 40
 $ S10000_R3.3.2_201703301839 :'data.frame':    43 obs. of  4 variables:
  ..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
  ..$ user     : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
  ..$ system   : num [1:43] 0.63 0.065 0.001 0.004 0 ...
  ..$ elapsed  : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
 $ S10000_T4.3.0_201703301843 : NULL
 $ S20000_R3.3.2_201703301826 : NULL
 ...

Specific Problem: How to append each data frame to the list...

myList <- lapply( myList, function(x) eval(x) )

what am I doing wrong with lapply here? The above lapply() does not iterate through the list and append the data frame to the name list entry.

i.e. myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)
> str(myList)
    List of 40
     $ S10000_R3.3.2_201703301839 :'data.frame':    43 obs. of  4 variables:
      ..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
      ..$ user     : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
      ..$ system   : num [1:43] 0.63 0.065 0.001 0.004 0 ...
      ..$ elapsed  : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
     $ S10000_T4.3.0_201703301843 : NULL
     $ S20000_R3.3.2_201703301826 : NULL
     ...

What am I missing? All help gratefully appreciated. Yes, I am pretty certain I am missing something obvious... but... I am stumped.

Althaalthea answered 3/4, 2017 at 3:54 Comment(0)
P
5

I don't have your data frames, so I can't test this, but the code below is similar to the approach I use when I need to read and write Excel files. The code below uses the xlsx package, as that's what I'm familiar with, but hopefully you can adapt it if you need to use openxlsx.

library(xlsx)

First, read the files into a list. Something like this:

filePath <- "../02benchmark/results/results_20170330/"
filePattern <- "*.csv"
fileListwithPath = list.files(path = filePath, 
                              pattern = filePattern, 
                              full.names = TRUE)
fileList = list.files(path = filePath, pattern = filePattern, full.names = FALSE)
fileListwithPath = setNames( fileListwithPath, 
                             list.files(path = filePath, pattern = filePattern))
df.list = lapply(fileListwithPath, read.csv)

# Now we rename the List Names for use in worksheets...
# Remove .csv and sample_ prefix used in filenames...
# Reult in workbook S<size>_<R version>_<date>
names(df.list) <- gsub("\\.csv$","", names(df.list))
names(df.list) <- gsub("sample_","S", names(df.list))

You now have a list in which each element is a data frame and each element's name is the name of the file. Now, let's write each data frame to a different worksheet in the same Excel workbook and then save the file as an xlsx file:

wb = createWorkbook()

lapply( names(df.list), 
        function(df) {
          sheet = createSheet(wb, df)
          addDataFrame(df.list[[df]], sheet = sheet, row.names = FALSE)
          } )

saveWorkbook(wb, "My_workbook.xlsx")

I've separated reading and writing the csv files for illustration, but you can combine them into a single function that reads each individual csv file and writes it to a new sheet in a single Excel workbook.

Paleoasiatic answered 3/4, 2017 at 4:29 Comment(4)
Thank you. The code you supplied worked - much appreciated. I need to spend a little time familiarising myself with list manipulation. If I fix my original code, I'll post an explanation. You have allowed me to move forward - your help is very much appreciated. Good karma and gratitude flying your way.Althaalthea
I am interested to see how I might change the name of the target worksheet from df.list$sample_10000_R3.3.2_201703301839.csv to df.list$S10000_R3.3.2_201703301839 - simple I know but...Althaalthea
See update. I've added a line to remove the ".csv" at the end of each file name.Paleoasiatic
eipi10 - Thanks, I updated your example to reflect final code. I have to change the df.list names after reading in the files with the lapply(). Your help is very much appreciated :-)Althaalthea
C
3

Here's the solution with openxlsx:

## create data;
dataframes <- split(iris, iris$Species)

# create workbook
wb <- createWorkbook()
 
#Iterate the same way as PavoDive, slightly different (creating an anonymous function inside Map())
Map(function(data, nameofsheet){     

    addWorksheet(wb, nameofsheet)
    writeData(wb, nameofsheet, data)
 
}, dataframes, names(dataframes))
         
## Save workbook to excel file 
saveWorkbook(wb, file = "file.xlsx", overwrite = TRUE)

.. however, openxlsx is also able to use it's function openxlsx::write.xlsx for this, so you can just give the object with your list of dataframes and the filepath, and openxlsx is smart enough to create the list as sheets within the xlsx-file. The code I post here with Map() is if you want to format the sheets in a specific way.

Cellulous answered 4/9, 2018 at 8:57 Comment(0)
B
2

I think it may be worth adding solution using imap function from the purrr package as it offers a convenient mechanism to access name and index of the list element in one call:

imap_xxx(x, ...), an indexed map, is short hand for map2(x, names(x), ...) if x has names, or map2(x, seq_along(x), ...) if it does not. This is useful if you need to compute on both the value and the position of an element.

imap solution

On dummy data for reproducibility.

lst_data <- list(cars = mtcars, air = airmiles)
wb <- openxlsx::createWorkbook()
purrr::imap(
    .x = lst_data,
    .f = function(df, object_name) {
        openxlsx::addWorksheet(wb = wb, sheetName = object_name)
        openxlsx::writeData(wb = wb, sheet = object_name, x = df)
    }
)
t_file <- tempfile(pattern = "test_df_export", fileext = ".xlsx")
saveWorkbook(wb = wb, file = t_file)
Bradbury answered 9/1, 2020 at 14:45 Comment(1)
you can use iwalk to avoid any message on consoleGraduation

© 2022 - 2024 — McMap. All rights reserved.