Creating time series for data sampled daily in R
K

3

2

I don't understand how time series objects are created in R. I have this data (a smaller dataset for the sake of brevity): data = c(101,99,97,95,93,91,89,87,85,83,81). It was sampled once a day for 11 days, from 2016-07-05 to 2016-07-15. According to the docs, the frequency for data sampled daily should be 7, but I do not understand the values for the start and end parameters. For start, the docs say: "the time of the first observation. Either a single number or a vector of two integers, which specify a natural time unit and a (1-based) number of samples into the time unit." I do not understand what "1-based number of samples" means. I tried to google it, but that didn't help.

If I just use 2016,7 as the start and end date, I just get:

Time Series:
Start = c(2016, 7) 
End = c(2016, 7) 
Frequency = 7 
[1] 101

If I use 2016,7,1 and 2016,7,11 as the start and end date, I still get the same output.

What am I doing wrong?

Kester answered 1/8, 2016 at 16:42 Comment(0)
M
1

I think the best way is to switch to xts or zoo: according to another question here, ts() struggles with daily observations because the number of days varies between years.

Mossberg answered 2/8, 2016 at 8:22 Comment(2)
I am using time series for forecasting. I tried xts, and it keeps the data in the format I expected (a timestamp and the value for that timestamp). But the output I get from calling forecast on the xts object is a ts object that no longer contains those timestamps; I just see the values. - Kester
The only way of fixing this is to add the dates back to the ts object manually, as described at https://mcmap.net/q/515085/-forecasting-time-series-data (I don't know whether it would be easier to just use a data.frame with a Date column). - Mossberg
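One possible way to put the dates back, sketched with base R only. The fc object below is a hand-made stand-in for whatever ts a forecasting function returns (it is not actual forecast output), and h, fc_dates and fc_df are illustrative names.

```r
data <- c(101, 99, 97, 95, 93, 91, 89, 87, 85, 83, 81)
last_date <- as.Date("2016-07-15")  # date of the last observation
h <- 3                              # hypothetical forecast horizon, in days

# Stand-in for a forecast result: a ts holding h predicted values.
# Assumption: a naive "repeat the last value" forecast, for illustration only.
fc <- ts(rep(tail(data, 1), h))

# Rebuild the calendar dates that the ts object does not carry
fc_dates <- seq(last_date + 1, by = "day", length.out = length(fc))

# Pair each predicted value with its date in a plain data.frame
fc_df <- data.frame(date = fc_dates, forecast = as.numeric(fc))
fc_df
```

The same pattern should apply to a real forecast: take the predicted values as a numeric vector and bind them to a Date sequence that starts the day after the last observation.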
P
1

As I understand it, the time unit in the ts() function is the year. Therefore, frequency should be set to 365 here (days per year), and start and end should be expressed in days as well. However, I believe that to get the timing right, start and end should be the offset in days of the desired interval from the beginning of the year (in your specific case, 186 and 196 respectively). These numbers can be checked with:

as.numeric(as.Date("2016-07-05") - as.Date("2016-01-01"))
[1] 186
as.numeric(as.Date("2016-07-15") - as.Date("2016-01-01"))
[1] 196

Embedding this information in your code, the call to ts() should be:

data = c(101,99,97,95,93,91,89,87,85,83,81)
ts(data, start = c(2016, 186), end = c(2016, 196), frequency = 365)
# which yielded
Time Series:
Start = c(2016, 186) 
End = c(2016, 196) 
Frequency = 365 
 [1] 101  99  97  95  93  91  89  87  85  83  81

HTH

Porter answered 25/6, 2017 at 18:36 Comment(0)
H
0

The frequency parameter of a ts object determines how many samples your series has per unit of time. Therefore, when you choose a frequency, you implicitly choose a unit. This unit is the one used to set the start and end parameters.

For instance, if you set frequency = 365, you are assuming that the unit is a year and that there are 365 points sampled per unit. Let us locate the first point of your series, 2016-07-05, in this unit. The year is clearly 2016, and within that year I take as.Date("2016-07-05") - as.Date("2016-01-01") + 1, that is, 187. Note that I'm assuming the first sampled day of the year has index 1 instead of 0, as used in the solution of @Helloworld. Therefore, start = c(2016, 187).

data <- c(101,99,97,95,93,91,89,87,85,83,81) 
start_dt <- "2016-07-05" 
end_dt <- "2016-07-15"

ts_365 <- ts(data, start = c(2016, 187), frequency = 365)
ts_365
#> Time Series:
#> Start = c(2016, 187) 
#> End = c(2016, 197) 
#> Frequency = 365 
#>  [1] 101  99  97  95  93  91  89  87  85  83  81
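The day-of-year arithmetic can also be left to R. The following is a minimal sketch, assuming frequency = 365 and the 1-based convention described above; daily_ts is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: build a daily ts (frequency = 365) from a vector of
# values and the date of the first observation.
daily_ts <- function(values, first_date) {
  first_date <- as.Date(first_date)
  year <- as.integer(format(first_date, "%Y"))
  # "%j" is the day of the year, 1-based: January 1 is day 1
  day_of_year <- as.integer(format(first_date, "%j"))
  ts(values, start = c(year, day_of_year), frequency = 365)
}

x <- daily_ts(c(101, 99, 97, 95, 93, 91, 89, 87, 85, 83, 81), "2016-07-05")
start(x)  # year and 1-based day of year of the first observation
end(x)    # same for the last observation
```

Here start(x) should report day 187 of 2016, matching the manual calculation. Keep in mind that 365 is only an approximation: 2016 is a leap year, so December 31 would be day 366 and would spill into the next "year" of the ts object.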

On the other hand, if we want to use frequency = 7, we need to take the week as the unit and use it to specify start. We can obtain the week number (the ISO week here, but other conventions work too) and the day of the week, assuming Monday is the first day (again, you can change this criterion):

strftime(start_dt, "isoweek: %V weekday: %u (%A)")
#> [1] "isoweek: 27 weekday: 2 (martes)"
# %A is locale-dependent: "martes" is Spanish for Tuesday

With frequency = 7, we will define the time series as

ts_7 <- ts(data, start = c(27, 2), frequency = 7)
ts_7
#> Time Series:
#> Start = c(27, 2) 
#> End = c(28, 5) 
#> Frequency = 7 
#>  [1] 101  99  97  95  93  91  89  87  85  83  81
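A quick way to check that the weekly positions line up with the calendar is to compare cycle(), which returns each observation's position within its unit, against the ISO weekday of the actual dates. This is only a sanity-check sketch; pos_in_week and iso_weekday are illustrative names.

```r
data <- c(101, 99, 97, 95, 93, 91, 89, 87, 85, 83, 81)
x_dates <- seq(as.Date("2016-07-05"), as.Date("2016-07-15"), by = "day")
ts_7 <- ts(data, start = c(27, 2), frequency = 7)

# Position of each observation within its week, as stored in the ts object
pos_in_week <- as.integer(cycle(ts_7))

# ISO weekday of each calendar date: 1 = Monday, ..., 7 = Sunday
iso_weekday <- as.integer(format(x_dates, "%u"))

# Side-by-side view; the two columns should agree row by row
data.frame(date = x_dates, pos_in_week, iso_weekday)
```

If the two columns differ, the second element of start does not match the weekday convention you intended.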

If you plot either of the ts objects above, you will get a numeric axis based on the chosen unit. My recommendation is to make the x axis show the actual dates:

x_dates <- seq.Date(from = as.Date(start_dt), to = as.Date(end_dt), by = "day")
plot(x_dates, data)

Created on 2023-03-05 with reprex v2.0.2

The frequency you choose matters when you apply certain functions to ts objects, because it can be taken as the period in which to look for seasonality (the decompose() function, for instance).
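To illustrate, here is a small sketch on synthetic data (the values below are made up for the example and are not the series from the question). With frequency = 7, decompose() estimates one seasonal effect per position in the week.

```r
# Synthetic data (an assumption for illustration): four weeks of a repeating
# weekly pattern on top of a slow upward trend.
weekly_pattern <- c(0, 1, 2, 3, 2, 1, 0)
values <- 10 + rep(weekly_pattern, times = 4) + 0.1 * seq_len(28)
x <- ts(values, frequency = 7)

# Additive decomposition; the seasonal period is taken from frequency(x)
dec <- decompose(x)

# One estimated seasonal effect per position in the week
length(dec$figure)
```

Note that decompose() needs at least two full periods, so the 11-point series from the question is too short for it with frequency = 7, and far too short with frequency = 365.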

Hyperopia answered 5/3, 2023 at 16:20 Comment(0)
