How to detect TIME when reading from an excel sheet using R
Asked Answered
L

5

10

The issue is that when I read from an excel sheet into R using read.xlsx from openxlsx package, the TIME column is converted into a fraction.

Here is an example,

dfin <-
DATE          TIME
15/02/2015    8:00 AM
22/01/2014    10:00 PM

library(openxlsx)
test <-  read.xlsx("dfin.xlsx", sheet = 1,
                 detectDates=TRUE, skipEmptyRows = TRUE,
                 skipEmptyCols = TRUE, rows = NULL, cols = NULL, check.names = FALSE,
                 namedRegion = NULL, na.strings = "NA", fillMergedCells = FALSE) 

Output:

  DATE        TIME
  2015-02-15  0.3333333
  2014-01-22  0.9166667

I am not sure why it does that and whether there is a way to fix that as I need to use both DATE and TIME to do some calculations.

Lauraine answered 1/2, 2019 at 6:56 Comment(0)
G
8

R doesn't have a time format really, so I suggest reading it in using read_excel, which automatically detects the column type. This will turn it into a date-time format with a random date, which you can then remove, before converting it into a proper timestamp.

library(readxl)
library(lubridate)

test <- read_excel('dfin.xlsx',trim_ws = TRUE) %>%
  #return the TIME column to the way it is written in Excel
  mutate(TIME = as.character(gsub(".* ","",TIME)),
  #format the date column
     DATE = dmy(DATE),
  #turn it into a timestamp
     TIMESTAMP = as.POSIXct(paste(DATE,TIME)))
Gild answered 1/2, 2019 at 7:25 Comment(0)
W
6

My first guess was that read.xlsx() is trying to guess the date-looking columns in .xlsx while it reads the file, and weirdly converts the time from %I:%M %p format into fractions of 24 hours (because e.g. 0.3333333 * 24 = 7.999999 which is exactly 8.0). But latter I noticed that if I change the parameter detectDates into FALSE nothing really changes - it outputs the same data frame. So it guess nothing, it just reads the TIME as it is.

If you try to edit 10:00 PM within Excel workbook, you'll see that it is really stored as 22:00:00. So why at the end it is represented as a fraction of 24?! I don't know, and I hope someone can explain that.

@Randall approach is really good alternative comparing to openxlsx::read.xlsx(). Note that read_xlsx() recognizes TIME as %H:%M:%S, and converts it into the dummy POSIXct/POSIXt object, i.e. 1899-12-31 08:00:00 and 1899-12-31 22:00:00.

Surprisingly, read_xlsx() doesn't recognize that DATE has %d-%m-%Y format, and interpret it as a character. Meaning that we need to convert both variables into appropriate format in order to obtain desired output.

I don't think we need to use gsub to get the 12-hour clock time from POSIXct object, it is much easire to use format for this purpose. And conversion of DATE from %d-%m-%Y into %Y-%m-%d format is even an easier task:

library(dplyr)
library(readxl)

read_xlsx("myfile.xlsx") |>
  mutate(
    DATE = as.Date(DATE, "%d/%m/%Y"), 
    TIME = format(TIME, "%I:%M %p")   # “That’s what I do: I drink and I know things.”
 )

Which produces:

# A tibble: 2 x 2
  DATE       TIME    
  <date>     <chr>   
1 2015-02-15 08:00 AM
2 2014-01-22 10:00 PM
Wivina answered 1/2, 2019 at 16:4 Comment(0)
J
2

I faced the same problem and solved it as follows - quick and dirty:

  • Read the data using readxl:read_excel().

  • Without loss of generality, we just look at obtaining 17:20 from "0.72222222222222" instead of a column containing the time data. Be aware of reading values out of an excel file could have unwanted types, but for calculations we need numerics.

x <- as.numeric("0.72222222222222")*24 

minutes <- round((x %% 1)*60, digits = 0) 
hours <- round(x - minutes/60, digits = 0)
if (minutes < 10){ #if minutes is a single digit need to insert a preceding 0
    minutes= paste0("0",minutes)
}
paste0(hours, ":", minutes)
#17:20
Josiejosler answered 13/9, 2021 at 14:36 Comment(0)
S
1

Using the times() function in the chron package will give the solution:

library(chron)

times(0.111)

[1] 02:39:50

Shiverick answered 9/1, 2023 at 22:47 Comment(0)
S
0
  1. Read the time values as "date":

     test <- read_excel('dfin.xlsx', col_types = c("date"), ...)
    
  2. Then trim it with substr() to get the time:

     test <- substr(test,12,16)
    
Saberio answered 1/5, 2022 at 10:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.