I've imported a CSV file into R using RStudio, and I'm trying to plot points per game against minutes per game. However, minutes per game is in the format mm:ss, and I'm having a hard time finding out how to convert it to decimal form.
Given that you start with a character vector, this is relatively easy:
minPerGame <- c("4:30", "2:20", "34:10")
# split each "mm:ss" string at the colon, then combine minutes + seconds/60
sapply(strsplit(minPerGame, ":"),
       function(x) {
         x <- as.numeric(x)
         x[1] + x[2]/60
       })
gives
[1] 4.500000 2.333333 34.166667
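Base R can also do this without strsplit(): as.difftime() parses each string as an elapsed time and reports it in the units you ask for. This is a sketch of a different technique from the one above, and it assumes the minutes field stays between 0 and 59, because the %M format specifier only covers that range (larger values would likely come back as NA).

```r
minPerGame <- c("4:30", "2:20", "34:10")

# Parse each "mm:ss" string as a time difference from 00:00:00 and express it in minutes.
# Assumes no value has 60 or more minutes, since %M only accepts 00-59.
minPerGame_dt <- as.difftime(minPerGame, format = "%M:%S", units = "mins")

# Drop the difftime class to get a plain numeric vector
minPerGame_dec <- as.numeric(minPerGame_dt)
minPerGame_dec
```

Because both endpoints of each difference fall on the same (current) date, the date part cancels out.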
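Make sure you read the file with read.csv() using the option as.is = TRUE; otherwise the column may come in as a factor, and you'll have to convert it with as.character() first. (Since R 4.0.0, read.csv() no longer turns strings into factors by default, so this mainly matters on older versions.)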
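To make the read.csv() step concrete, here is a small self-contained sketch. The CSV contents and the column names Player, MPG (minutes per game) and PPG (points per game) are hypothetical; with a real file you would pass its path instead of text =.

```r
# Hypothetical CSV contents; replace text = csv_text with your file path
csv_text <- paste("Player,MPG,PPG",
                  "A,4:30,2.1",
                  "B,2:20,0.8",
                  "C,34:10,18.4",
                  sep = "\n")

# as.is = TRUE keeps the "mm:ss" column as character rather than factor
stats <- read.csv(text = csv_text, as.is = TRUE)
str(stats$MPG)

# Convert "mm:ss" to decimal minutes with the same strsplit() logic as above
stats$MPG_dec <- sapply(strsplit(stats$MPG, ":"), function(x) {
  x <- as.numeric(x)
  x[1] + x[2] / 60
})
stats
```

From there, plot(PPG ~ MPG_dec, data = stats) gives the points-versus-minutes plot the question asks for.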
Do you need to decimalise it? If you store the data in an appropriate format, for example as an object of class POSIXlt, one of R's date-time classes, R will handle the times numerically for you. Here is an example of what I mean:
First we create some dummy data for illustration purposes:
set.seed(1)
# 100 times starting at 10:00 (mm:ss), 10 seconds apart, plus a cumulative points count
DF <- data.frame(Times = seq(as.POSIXlt("10:00", format = "%M:%S"),
                             length.out = 100, by = 10),
                 Points = cumsum(rpois(100, lambda = 1)))
head(DF)
Ignore the fact that there are dates here: all observations share the same date part, so it is effectively ignored when we plot. Next we plot this using R's formula interface:
plot(Points ~ Times, data = DF, type = "o")
This produces the following plot:
[Figure: Points plotted against Times]
POSIXt classes count time from 1970-01-01 00:00:00 UTC, and when you convert a time that has no date they add the current date. So a naive mean(as.numeric(Times)) will give a wrong result today, and a different wrong result tomorrow... – Fluoridate

Some tuning of the first solution:
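# simulated data: 100,000 "mm:ss" strings
minPerGame <- paste(sample(1:89, 100000, TRUE), sample(0:59, 100000, TRUE), sep = ":")

# original approach: strsplit() + sapply(), one element at a time
f1 <- function() {
  sapply(strsplit(minPerGame, ":"),
         function(x) {
           x <- as.numeric(x)
           x[1] + x[2]/60
         })
}

# vectorised: unlist into a two-column (minutes, seconds) matrix,
# then multiply by the weights c(1, 1/60)
f2 <- function() {
  w <- matrix(c(1, 1/60), ncol = 1)
  as.vector(matrix(as.numeric(unlist(strsplit(minPerGame, ":"))),
                   ncol = 2, byrow = TRUE) %*% w)
}

system.time(f1())
system.time(f2())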
> system.time(f1())
   user  system elapsed 
   0.88    0.00    0.86 
> system.time(f2())
   user  system elapsed 
   0.25    0.00    0.27 
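One further variant, not timed in the original answer, uses sub() with regular expressions instead of strsplit(), so no intermediate list or matrix is built. It is a sketch that assumes every string contains exactly one colon; f3 is a hypothetical name, and whether it beats f2() on your machine is worth checking with system.time(). The input is regenerated so the block runs on its own.

```r
# Same kind of simulated input as the benchmark above
minPerGame <- paste(sample(1:89, 1e5, replace = TRUE),
                    sample(0:59, 1e5, replace = TRUE), sep = ":")

# Vectorised variant: take the text before and after the colon with sub()
f3 <- function(x) {
  mins <- as.numeric(sub(":.*$", "", x))   # everything before the colon
  secs <- as.numeric(sub("^.*:", "", x))   # everything after the colon
  mins + secs / 60
}

system.time(res3 <- f3(minPerGame))
```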
I had data with times like so:
- 22:49:20+1100
- 19:29:11+1000
- 20:01:26+0930
And this seemed to work for me:
library(dplyr)
library(tidyr)

my_df <- my_df %>%
  # split "22:49:20+1100" into hours, minutes and "seconds+zone"
  separate(col = eventTime, into = c("H", "M", "S"), sep = ":", remove = FALSE) %>%
  separate(col = S, into = c("S", "Z"), sep = "\\+", remove = TRUE) %>%
  # split the zone "1100" after 2 characters into zone hours and zone minutes
  separate(col = Z, into = c("ZH", "ZM"), sep = 2, remove = TRUE) %>%
  # express every part as a fraction of a day
  mutate(H = as.numeric(H)/24,
         M = as.numeric(M)/24/60,
         S = as.numeric(S)/24/60/60,
         ZH = as.numeric(ZH)/24,
         ZM = as.numeric(ZM)/24/60) %>%
  # subtract the UTC offset, then add up the parts
  mutate(H = H - ZH,
         M = M - ZM,
         time_num = H + M + S)
H: hours, M: minutes, S: seconds, Z: zone, ZH: zone hours, ZM: zone minutes.
If you don't care about time zones, use this instead:
library(dplyr)
library(tidyr)

my_df <- my_df %>%
  separate(col = eventTime, into = c("H", "M", "S"), sep = ":", remove = FALSE) %>%
  # move the "+1100" offset into Z and otherwise ignore it
  separate(col = S, into = c("S", "Z"), sep = "\\+", remove = TRUE) %>%
  mutate(H = as.numeric(H)/24,
         M = as.numeric(M)/24/60,
         S = as.numeric(S)/24/60/60,
         time_num = H + M + S)
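With the first method you may end up with negative values (for example, 07:26:10+1100 is earlier than 11:00, so subtracting the offset goes below zero). With the second method you should get values between 0 and 1, where time_num is the fraction of the day that has elapsed.
For example, with the second method: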
22:49:20+1100 = 0.950925926
07:26:10+1100 = 0.309837963
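These numbers can be checked in base R. This sketch mirrors the second method (it simply drops the offset); time_fraction() is a hypothetical helper, and it assumes every string looks like HH:MM:SS followed by a signed offset.

```r
# Hypothetical helper mirroring the second (offset-ignoring) method:
# strip the offset, split "HH:MM:SS", and express the time as a fraction of a day.
time_fraction <- function(x) {
  hms <- sub("[+-].*$", "", x)   # drop the "+1100" part
  parts <- matrix(as.numeric(unlist(strsplit(hms, ":"))), ncol = 3, byrow = TRUE)
  parts[, 1] / 24 + parts[, 2] / (24 * 60) + parts[, 3] / (24 * 60 * 60)
}

fractions <- time_fraction(c("22:49:20+1100", "07:26:10+1100"))
sprintf("%.9f", fractions)
```

Formatted to nine decimal places, these are the two values listed above.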
Note that all my time data had a positive UTC offset (+); the split on "\\+" would need adjusting for negative offsets.
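A base-R alternative, which is a different technique from the tidyr code above rather than the author's, is to let strptime's %z specifier read the signed offset, so negative offsets such as -0500 work too. This is a sketch: utc_fraction() is a hypothetical helper name, and it assumes every string has the form HH:MM:SS followed by a four-digit signed offset.

```r
# Parse "HH:MM:SS+ZZZZ" with %z (signed UTC offset) and return the time of day in UTC
# as a fraction of a day. No date is given, so R fills in the current one; it drops
# out because only the remainder within the day is kept.
utc_fraction <- function(x) {
  t <- as.POSIXct(x, format = "%H:%M:%S%z", tz = "UTC")
  # POSIXct counts seconds since 1970-01-01 00:00:00 UTC; modulo 86400 leaves
  # seconds since midnight UTC, scaled here to [0, 1)
  (as.numeric(t) %% 86400) / 86400
}

utc <- utc_fraction(c("22:49:20+1100", "07:26:10+1100", "10:00:00-0500"))
utc
```

This gives the same UTC-based value as the first method, except that a time which would go negative wraps around into the previous day instead.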
I like lubridate for this. The same logic works for hours and minutes, too: use hm() in place of ms(), and so on.
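minPerGame <- c("4:30", "2:20", "34:10")
library(lubridate)
# parse "mm:ss" into lubridate Period objects
minPerGame_ms <- ms(minPerGame)
# minutes plus seconds/60; the outer parentheses print the result
(minPerGame_dec <- minute(minPerGame_ms) + second(minPerGame_ms)/60)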
[1] 4.500000 2.333333 34.166667
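To make the hm() remark concrete, here is a short sketch. The hoursPlayed values are hypothetical, and period_to_seconds() is a second lubridate route to the same mm:ss result rather than something the answer itself used.

```r
library(lubridate)

# "h:mm" strings: parse with hm() and combine hours and minutes into decimal hours
hoursPlayed <- c("1:15", "0:45")   # hypothetical example data
hp <- hm(hoursPlayed)
hours_dec <- hour(hp) + minute(hp) / 60
hours_dec

# For "mm:ss", period_to_seconds() converts each whole period to seconds at once
minPerGame <- c("4:30", "2:20", "34:10")
mins_dec <- period_to_seconds(ms(minPerGame)) / 60
mins_dec
```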