Adding missing rows
Asked Answered
M

4

6

The format of my excel data file is:

 day                 value
 01-01-2000 00:00:00    4
 01-01-2000 00:01:00    3
 01-01-2000 00:02:00    1
 01-01-2000 00:04:00    1

I open my file with this:

ts = read.csv(file=pathfile, header=TRUE, sep=",")

How can I add additional rows with zero number in column “value” into the data frame. Output example:

 day                  value
 01-01-2000 00:00:00    4
 01-01-2000 00:01:00    3
 01-01-2000 00:02:00    1
 01-01-2000 00:03:00    0
 01-01-2000 00:04:00    1
Mannerless answered 24/5, 2013 at 19:57 Comment(0)
A
1

Try:

ts = read.csv(file=pathfile, header=TRUE, sep=",", stringsAsFactors=F)
ts.tmp = rbind(ts,list("01-01-2000 00:03:00",0))
ts.out = ts.tmp[order(ts.tmp$day),]

Notice that you need to force load the strings in first column as character and not factors otherwise you will have issue with the rbind. To get the day column to be a factor after than just do:

ts.out$day = as.factor(ts.out$day)
Alternation answered 24/5, 2013 at 20:16 Comment(0)
A
13

This is now completely automated in the padr package. Takes only one line of code.

original <- data.frame(
  day = as.POSIXct(c("01-01-2000 00:00:00",
                     "01-01-2000 00:01:00",
                     "01-01-2000 00:02:00",
                     "01-01-2000 00:04:00"), format="%m-%d-%Y %H:%M:%S"),
  value = c(4, 3, 1, 1))

library(padr)
library(dplyr) # for the pipe operator
original %>% pad %>% fill_by_value(value)

See vignette("padr") or this blog post for its working.

Allopath answered 20/1, 2017 at 15:37 Comment(2)
It only works when there's a variable of class Date, POSIXct, or POSIXlt in the data. If the time dimension is int, can it work too?Taligrade
There is the function padr::pad_int for that.Allopath
S
3

I think this is a more general solution, which relies on creating a sequence of all timestamps, using that as the basis for a new data frame, and then filling in your original values in that df where applicable.

# convert original `day` to POSIX
ts$day <- as.POSIXct(ts$day, format="%m-%d-%Y %H:%M:%S", tz="GMT")

# generate a sequence of all minutes in a day
minAsNumeric <- 946684860 + seq(0,60*60*24,by=60) # all minutes of your first day
minAsPOSIX <- as.POSIXct(minAsNumeric, origin="1970-01-01", tz="GMT") # convert those minutes to POSIX

# build complete dataframe
newdata <- as.data.frame(minAsPOSIX)
newdata$value <- ts$value[pmatch(newdata$minAsPOSIX, ts$day)] # fill in original `value`s where present
newdata$value[is.na(newdata$value)] <- 0 # replace NAs with 0
Somniloquy answered 24/5, 2013 at 22:49 Comment(0)
A
1

Try:

ts = read.csv(file=pathfile, header=TRUE, sep=",", stringsAsFactors=F)
ts.tmp = rbind(ts,list("01-01-2000 00:03:00",0))
ts.out = ts.tmp[order(ts.tmp$day),]

Notice that you need to force load the strings in first column as character and not factors otherwise you will have issue with the rbind. To get the day column to be a factor after than just do:

ts.out$day = as.factor(ts.out$day)
Alternation answered 24/5, 2013 at 20:16 Comment(0)
D
0

Tidyr offers the nice complete function to generate rows for implicitly missing data. You can use the fill parameter to convert NA to 0.

library(tidyr)

ts |>
  complete(day=seq.POSIXt(min(day), max(day), by="min"), fill = list(value = 0))

Notice that I set the granularity of the dates to minutes since your dataset expects a row every minute.

Dissatisfactory answered 3/2, 2022 at 8:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.