Using R to read out excel-colorinfo
Asked Answered
I

2

20

Is there any way to read out the color-index of cells from excel files with R?

While I can set the cell color with packages like XLConnect or XLSX, I have found no way to extract the color-information from existing workbooks.

Inanna answered 23/3, 2017 at 16:54 Comment(2)
https://mcmap.net/q/663579/-using-r-to-filter-keep-colored-cells-in-excel-fileSheerlegs
a related readxl version of the question: #53210186Emmerie
S
24

R-Bloggers provided a function that will do the job for you. I am including the answer here for future reference.

Read the excel file using xlsx package:

library(xlsx)
wb     <- loadWorkbook("test.xlsx")
sheet1 <- getSheets(wb)[[1]]

# get all rows
rows  <- getRows(sheet1)
cells <- getCells(rows)

This part extracts the information that later will be used for getting background color (or other style information) of the cells:

styles <- sapply(cells, getCellStyle) #This will get the styles

This is the function that identifies/extracts the cell background color:

cellColor <- function(style) 
   {
    fg  <- style$getFillForegroundXSSFColor()
    rgb <- tryCatch(fg$getRgb(), error = function(e) NULL)
    rgb <- paste(rgb, collapse = "")
    return(rgb)
   }

error will handle the cells with no background color.

Using sapply you can get the background color for all of the cells:

sapply(styles, cellColor)

You can also categorize/identify them by knowing the RGb codes:

mycolor <- list(green = "00ff00", red = "ff0000")
m     <- match(sapply(styles, cellColor), mycolor)
labs  <-names(mycolor)[m]

You can read more and learn how to apply it at R-bloggers

You can get the RGB codes from RapidTables.com

Sheerlegs answered 22/4, 2017 at 19:52 Comment(4)
In case anyone is interested in filtering values out of a dataset based on the vector of colors from @M--'s helpful answer, see: https://mcmap.net/q/663579/-using-r-to-filter-keep-colored-cells-in-excel-filePapism
I can't get this to work for all xlsx files. It works for the test file linked in the R-bloggers article, but if I save that from LibreOffice as .ods, then load that, then save that as .xlsx from LibreOffice, the spreadsheet looks identical but the colours are all empty strings. This also happens with a spreadsheet sourced from someone who is probably only using Excel, so its not just a LibreOffice problem. Can supply my test file if anyone wants to work on this.Acetify
Is there a way to get the font color rather than the fill color?Coauthor
@Coauthor I am sure there's an argument within getCellStyle for that. You need to dig through the documentation.Sheerlegs
R
3

Old question but maybe it can help someone in the future.

There is a strange behavior in the POI (java) library (at least on my computer). It is not getting the colors correctly. The code provided in the @M--'s answer works well when the color is a basic color (indexed color), but does not work when the color is, for example, in grayscale. To get around you can use the following code using the getTint () function. Tint is a number between -1 (dark) and 1 (light), and combining it with the RGB (getRgb ()) function, you can completely recover the color.

cell_color <- function(style){
  fg  <- style$getFillForegroundXSSFColor()

  hex <- tryCatch(fg$getRgb(), error = function(e) NULL)
  hex <- paste0("#", paste(hex, collapse = ""))
  tint <- tryCatch(fg$getTint(), error = function(e) NULL)

  if(!is.null(tint) & !is.null(hex)){   # Tint varies between -1 (dark) and 1 (light)
    rgb_col <- col2rgb(col = hex)

    if(tint < 0) rgb_col <- (1-abs(tint))*rgb_col
    if(tint > 0) rgb_col <- rgb_col + (255-rgb_col)*tint

    hex <- rgb(red = rgb_col[1, 1], 
               green = rgb_col[2, 1], 
               blue = rgb_col[3, 1], 
               maxColorValue = 255)
  }

  return(hex)
}

Some references to help:

https://poi.apache.org/apidocs/dev/org/apache/poi/hssf/usermodel/HSSFExtendedColor.html#getTint--

https://bz.apache.org/bugzilla/show_bug.cgi?id=50787

Getting Excel fill colors using Apache POI

Retaretable answered 1/10, 2019 at 18:45 Comment(1)
According to Excel : R: 83, G: 141, B: 213 converts to: #538DD5... however your function returns #F4F3EC. Please advise. ThanksTriumvir

© 2022 - 2024 — McMap. All rights reserved.