R - time series hourly

I have the following dataset of incoming calls per day within the hours from 3 p.m. to 10 p.m. which looks like this:

Date        hour  Count  Year  Month  Day
01.01.2001  15    69     2001  1      1
01.01.2001  16    12     2001  1      1
01.01.2001  17    56     2001  1      1
01.01.2001  18    34     2001  1      1
01.01.2001  19    44     2001  1      1
01.01.2001  20    91     2001  1      1
01.01.2001  21    82     2001  1      1
01.01.2001  22    49     2001  1      1
...
17.08.2003  22    103    2003  8      17

What needs to be done is a time series analysis, including forecasts, exponential smoothing, moving averages and so forth.

The problem I'm facing now is how to set up the call to ts(). I only have the peak hours from 3 p.m. to 10 p.m. available, so I can't set the frequency to 24.

Can anybody help me out?

Many thanks, cheers,

Crossgarnet answered 6/1, 2015 at 15:59 Comment(0)

1) Assuming that the series starts at 3pm, that days are consecutive and all hours from 3pm to 10pm are present:

tser <- ts(DF[-1], freq = 8)

giving:

> tser
Time Series:
Start = c(1, 1) 
End = c(1, 8) 
Frequency = 8 
      hour Count Year Month Day
1.000   15    69 2001     1   1
1.125   16    12 2001     1   1
1.250   17    56 2001     1   1
1.375   18    34 2001     1   1
1.500   19    44 2001     1   1
1.625   20    91 2001     1   1
1.750   21    82 2001     1   1
1.875   22    49 2001     1   1

This will represent the index for day 1 3pm as 1.0, day 1 4pm as 1+1/8, day 1 5pm as 1+2/8, ..., day 1 10pm as 1+7/8, day 2 3pm as 2, day 2 4pm as 2+1/8, etc.
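Since the question mentions moving averages and exponential smoothing, here is a minimal sketch of how a frequency-8 series like this could be fed to base R's stats functions. The data are synthetic (the names counts and hourly_shape and the Poisson profile are made up for illustration), and dropping the trend term with beta = FALSE is an assumption, not something the question's data necessarily supports.

```r
# Synthetic stand-in for the Count column: 60 days x 8 peak hours (made-up data)
set.seed(1)
n_days <- 60
hourly_shape <- c(60, 30, 50, 40, 45, 80, 75, 50)  # hypothetical 3pm..10pm profile
counts <- rpois(n_days * 8, lambda = rep(hourly_shape, n_days))

# Same construction as above: one "period" is one day of 8 observed hours
tser <- ts(counts, frequency = 8)

# Centered moving average spanning one full day (2x8 weights for an even period)
ma <- stats::filter(tser, filter = c(0.5, rep(1, 7), 0.5) / 8, sides = 2)

# Seasonal exponential smoothing (Holt-Winters) without a trend term,
# followed by a forecast of the next two days (16 peak hours)
fit <- HoltWinters(tser, beta = FALSE)
fc <- predict(fit, n.ahead = 16)
```

Because the frequency is 8, the seasonal component here describes the within-day pattern across the eight observed hours; the overnight gap is simply ignored.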

2) This is the same but the days start at the number of days since 1970-01-01 instead of starting at 1:

tser <- ts(DF[-1], start = as.Date("2001-01-01"), freq = 8)

giving:

> tser
Time Series:
Start = c(11323, 1) 
End = c(11323, 8) 
Frequency = 8 
         hour Count Year Month Day
11323.00   15    69 2001     1   1
11323.12   16    12 2001     1   1
11323.25   17    56 2001     1   1
11323.38   18    34 2001     1   1
11323.50   19    44 2001     1   1
11323.62   20    91 2001     1   1
11323.75   21    82 2001     1   1
11323.88   22    49 2001     1   1

That is, this would represent each day as the number of days since 1970-01-01 plus, as before, 0, 1/8, ..., 7/8 for the hours.

If you later need to regenerate the date/time then:

library(chron)
tt <- as.numeric(time(tser))
as.chron(tt %/% 1) + (8 * tt%%1 + 15)/24

giving:

[1] (01/01/01 15:00:00) (01/01/01 16:00:00) (01/01/01 17:00:00)
[4] (01/01/01 18:00:00) (01/01/01 19:00:00) (01/01/01 20:00:00)
[7] (01/01/01 21:00:00) (01/01/01 22:00:00)
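If chron is not available, the same arithmetic can be sketched in base R. This is only an illustration of the index mapping described above, using POSIXct in place of chron; the variable names (day0, day, hour, stamp) and the choice of UTC are assumptions.

```r
# Index values as produced in approach 2: days since 1970-01-01 plus k/8
day0 <- as.numeric(as.Date("2001-01-01"))   # 11323
tt <- day0 + (0:7) / 8

# Integer part -> calendar date
day <- as.Date(tt %/% 1, origin = "1970-01-01")

# Fractional part k/8 -> hour of day, offset by the 3pm start
hour <- round(8 * (tt %% 1)) + 15

# Combine into date-times (time zone chosen arbitrarily as UTC)
stamp <- as.POSIXct(paste(day, sprintf("%02d:00:00", hour)), tz = "UTC")
```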

3) zoo. If it's not important to keep the observations equally spaced, then you could try this:

library(zoo)
library(chron)
z <- zoo(DF[-1], as.chron(format(DF$Date), "%d.%m.%Y") + DF$hour/24)

giving:

> z
                    hour Count Year Month Day
(01/01/01 15:00:00)   15    69 2001     1   1
(01/01/01 16:00:00)   16    12 2001     1   1
(01/01/01 17:00:00)   17    56 2001     1   1
(01/01/01 18:00:00)   18    34 2001     1   1
(01/01/01 19:00:00)   19    44 2001     1   1
(01/01/01 20:00:00)   20    91 2001     1   1
(01/01/01 21:00:00)   21    82 2001     1   1
(01/01/01 22:00:00)   22    49 2001     1   1

The zoo approach does not require that all hours be present nor is it required that the days be consecutive.
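To illustrate that point, here is a small sketch with deliberately missing hours and a skipped day. The three rows are a made-up subset, and POSIXct is used for the index instead of chron purely to keep the example short; both choices are assumptions for illustration.

```r
library(zoo)

# Hypothetical rows: 4pm is missing on day 1 and 2 January is absent entirely
DF <- data.frame(
  Date  = c("01.01.2001", "01.01.2001", "03.01.2001"),
  hour  = c(15, 17, 22),
  Count = c(69, 56, 49)
)

# Build the time index from the date and hour columns
stamp <- as.POSIXct(paste(DF$Date, DF$hour), format = "%d.%m.%Y %H", tz = "UTC")

# zoo accepts the irregular index as-is
z <- zoo(DF$Count, order.by = stamp)

# Check whether the observations are equally spaced
regular <- is.regular(z, strict = TRUE)
```

Here regular should come out FALSE, yet z is still a valid series that can be subset, aggregated and plotted by its actual time stamps.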

Note: I am not sure that you really need all the date and hour fields broken out separately, since they can easily be generated on the fly, so this might be enough:

Count <- z$Count

Year can be recovered via as.numeric(format(time(Count), "%Y")) and month, day and hour can be recovered by using %m, %d or %H in place of %Y.

A list of the month, day and year columns can also be generated using month.day.year(time(Count)).

years(time(Count)), months(time(Count)), days(time(Count)) and hours(time(Count)) will produce factors of the indicated quantities.

Windage answered 6/1, 2015 at 16:12 Comment(4)
Crossgarnet: The zoo approach seems to be perfect. The weird thing is that the date is displayed as (NA NA) when the z DF is printed. Is this a known issue?
Windage: As you can see from the output, it's printed properly. You must be doing something differently from what was described in the answer.
Crossgarnet: Got it. I just needed to change the format from %d.%m.%Y to %d/%m/%Y. How am I supposed to plot e.g. the number of calls from 3-10 p.m. for year/month/day? Sorry to bother you with simple questions, but I'm brand new to R. Many thanks.
Windage: I gather you only have calls from 3pm to 10pm so: ag <- aggregate(z$Count, as.Date, sum); plot(ag)
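
For comparison, the daily totals from the last comment can also be sketched in base R on the original data frame, without zoo. This swaps in aggregate() with a formula rather than the zoo method used above; the second day's counts and the YearMonth column are made up for illustration.

```r
# First day taken from the question; second day is made-up data
DF <- data.frame(
  Date  = rep(c("01.01.2001", "02.01.2001"), each = 8),
  hour  = rep(15:22, times = 2),
  Count = c(69, 12, 56, 34, 44, 91, 82, 49,
            50, 20, 40, 30, 35, 70, 65, 45)
)
DF$Date <- as.Date(DF$Date, format = "%d.%m.%Y")

# Total calls per day across the 3pm-10pm window
daily <- aggregate(Count ~ Date, data = DF, FUN = sum)

# Total calls per month, using a helper column (hypothetical name)
DF$YearMonth <- format(DF$Date, "%Y-%m")
monthly <- aggregate(Count ~ YearMonth, data = DF, FUN = sum)

# Line plot of the daily totals
plot(daily$Date, daily$Count, type = "l", xlab = "Date", ylab = "Calls (3pm-10pm)")
```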
