My first guess was that read.xlsx()
is trying to guess the date-looking columns in .xlsx while it reads the file, and weirdly converts the time from %I:%M %p
format into fractions of 24 hours (because e.g. 0.3333333 * 24 = 7.999999
which is exactly 8.0). But latter I noticed that if I change the parameter detectDates
into FALSE
nothing really changes - it outputs the same data frame. So it guess nothing, it just reads the TIME
as it is.
If you try to edit 10:00 PM
within Excel workbook, you'll see that it is really stored as 22:00:00
. So why at the end it is represented as a fraction of 24
?! I don't know, and I hope someone can explain that.
@Randall approach is really good alternative comparing to openxlsx::read.xlsx()
. Note that read_xlsx()
recognizes TIME
as %H:%M:%S
, and converts it into the dummy POSIXct/POSIXt
object, i.e. 1899-12-31 08:00:00
and 1899-12-31 22:00:00
.
Surprisingly, read_xlsx()
doesn't recognize that DATE
has %d-%m-%Y
format, and interpret it as a character
. Meaning that we need to convert both variables into appropriate format in order to obtain desired output.
I don't think we need to use gsub
to get the 12-hour clock time from POSIXct
object, it is much easire to use format
for this purpose. And conversion of DATE
from %d-%m-%Y
into %Y-%m-%d
format is even an easier task:
library(dplyr)
library(readxl)
read_xlsx("myfile.xlsx") |>
mutate(
DATE = as.Date(DATE, "%d/%m/%Y"),
TIME = format(TIME, "%I:%M %p") # “That’s what I do: I drink and I know things.”
)
Which produces:
# A tibble: 2 x 2
DATE TIME
<date> <chr>
1 2015-02-15 08:00 AM
2 2014-01-22 10:00 PM