I am trying to convert an irregular time series stored in a data.table into a regular time series. My data looks like this:
library(data.table)
dtRes <- data.table(time = c(0.1, 0.8, 1, 2.3, 2.4, 4.8, 4.9),
                    abst = c(1, 1, 1, 0, 0, 3, 3),
                    farbe = as.factor(c("keine", "keine", "keine", "keine", "keine", "rot", "blau")),
                    gier = c(2.5, 2.5, 2.5, 0, 0, 3, 3),
                    goff = as.factor(c("haus", "maus", "toll", "maus", NA, "maus", "maus")),
                    huft = as.factor(c(NA, NA, NA, "wolle", "wolle", "holz", "holz")),
                    mode = c(4, 4, 4, 2.5, NA, 3, 3))
How can I aggregate the observations into chunks of, say, 1 second? A chunk can contain a variable number of rows, including none at all. For numeric columns the result should be the mean (NAs omitted). For factor columns, the row should be duplicated once per unique value if a chunk contains more than one unique value. If that is not possible for factors, or does not make sense to you, it is also fine to take just the first value of the factor column within each second; that way the result would be a truly regular time series without any duplicated times. If an interval contains no values (as for the 2nd second in the example), the result should be NA.
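To make the intended binning concrete, here is a minimal sketch of the "first value" variant. It rests on assumptions: ceiling(time) defines the 1-second chunk (so 0.1, 0.8 and 1 all land in chunk 1), the helper aggOne and the column name bin are made up for illustration, and only a subset of the columns is used to keep it short. It may well not be the best way to do this.

```r
library(data.table)

# Subset of the example data (one numeric pair, one factor with an NA).
dtRes <- data.table(time = c(0.1, 0.8, 1, 2.3, 2.4, 4.8, 4.9),
                    abst = c(1, 1, 1, 0, 0, 3, 3),
                    goff = as.factor(c("haus", "maus", "toll", "maus", NA, "maus", "maus")))

# Hypothetical helper: mean (NAs omitted) for numeric columns,
# first non-NA value for everything else.
aggOne <- function(x) {
  if (is.numeric(x)) {
    if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
  } else {
    x[!is.na(x)][1L]  # yields NA of the same type if all values are NA
  }
}

# Assumption: a chunk is (k - 1, k], labelled by its upper bound k.
agg <- dtRes[, lapply(.SD, aggOne),
             by = .(bin = ceiling(time)),
             .SDcols = setdiff(names(dtRes), "time")]

# Join onto the full grid of chunks so that empty chunks show up as NA rows.
full <- data.table(bin = seq(1, max(agg$bin), by = 1))
regular <- agg[full, on = "bin"]
print(regular)
```

The join against the full grid is what turns missing seconds into NA rows; the grouped aggregation alone would simply skip them.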
In the end, the result could look like one of the following, depending on whether duplicated rows are kept.
With duplicates:
wiDups <- data.table(time = c(1, 1, 1, 2, 3, 4, 5, 5),
                     abst = c(1, 1, 1, NA, 0, NA, 3, 3),
                     farbe = as.factor(c("keine", "keine", "keine", NA, "keine", NA, "rot", "blau")),
                     gier = c(2.5, 2.5, 2.5, NA, 0, NA, 3, 3),
                     goff = as.factor(c("haus", "maus", "toll", NA, "maus", NA, "maus", "maus")),
                     huft = as.factor(c(NA, NA, NA, NA, "wolle", NA, "holz", "holz")),
                     mode = c(4, 4, 4, NA, 2.5, NA, 3, 3))
Without duplicates:
noDups <- data.table(time = c(1, 2, 3, 4, 5),
                     abst = c(1, NA, 0, NA, 3),
                     farbe = as.factor(c("keine", NA, "keine", NA, "rot")),
                     gier = c(2.5, NA, 0, NA, 3),
                     goff = as.factor(c("haus", NA, "maus", NA, "maus")),
                     huft = as.factor(c(NA, NA, "wolle", NA, "holz")),
                     mode = c(4, NA, 2.5, NA, 3))
Is it better to convert it into a time series object?
Warning message: In `[.data.table`(dtRes, , lapply(.SD, function(x) if (is.numeric(x)) mean(x, : Item 5 of j's result for group 1 is zero length. This will be filled with 3 NAs to match the longest column in this result. Later groups may have a similar problem but only the first is reported to save filling the warning buffer.
I don't know if this could be problematic because I don't know exactly what it means – Echoism
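One possible reading of the warning, sketched under an assumption: the call in the message is truncated, so the non-numeric branch is not visible, and the code below guesses it was something like unique(na.omit(x)). With the example data, huft is NA for every row of the first second, so such a branch would return a zero-length vector for that group, which data.table then pads with NAs. The helper uniqueOrNA is hypothetical and only illustrates how a single NA could be returned explicitly instead.

```r
# huft within the first 1-second chunk of the example data: all NA.
x <- factor(c(NA, NA, NA), levels = c("holz", "wolle"))

# Assumed non-numeric branch (not shown in the truncated warning).
vals <- unique(na.omit(x))
length(vals)  # 0, i.e. a zero-length result for this group

# Hypothetical guard: return one NA of the same type when nothing is left.
uniqueOrNA <- function(x) {
  u <- unique(x[!is.na(x)])
  if (length(u) == 0L) x[NA_integer_] else u
}

uniqueOrNA(x)
```

If this reading is right, the padding with NAs matches the desired result for that column, and the guard would mainly serve to make the intent explicit.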