What is the best method to bin intraday volume figures from a stock price timeseries using XTS / ZOO etc in R?
Asked Answered
C

2

6

For instance, let's say you have ~10 years of daily 1 min data for the volume of instrument x as follows (in xts format) from 9:30am to 4:30pm :

    Date.Time               Volume        
    2001-01-01 09:30:00     1200
    2001-01-01 09:31:00     1110
    2001-01-01 09:32:00     1303

All the way through to:

    2010-12-20 16:28:00     3200
    2010-12-20 16:29:00     4210
    2010-12-20 16:30:00     8303

I would like to:

  • Get the average volume at each minute for the entire series (ie average volume over all 10 years at 9:30, 9:31, 9:32...16:28, 16:29, 16:30)

How should I best go about:

  • Aggregating the data into one minute buckets
  • Getting the average of those buckets
  • Reconstituting those "average" buckets back to a single xts/zoo time series?

I've had a good poke around with aggregate, sapply, period.apply functions etc, but just cannot seem to "bin" the data correctly.

It's easy enough to solve this with a loop, but very slow. I'd prefer to avoid a programmatic solution and use a function that takes advantage of C++ architecture (ie xts based solution)

Can anyone offer some advice / a solution?

Thanks so much in advance.

Cynth answered 24/2, 2012 at 6:27 Comment(0)
S
5

First lets create some test data:

library(xts) # also pulls in zoo
library(timeDate)
library(chron) # includes times class

# test data
x <- xts(1:3, timeDate(c("2001-01-01 09:30:00", "2001-01-01 09:31:00", 
    "2001-01-02 09:30:00")))

1) aggregate.zoo. Now try converting it to times class and aggregating using this one-liner:

aggregate(as.zoo(x), times(format(time(x), "%H:%M:%S")), mean)

1a) aggregate.zoo (variation). or this variation which converts the shorter aggregate series to times to avoid having to do it on the longer original series:

ag <- aggregate(as.zoo(x), format(time(x), "%H:%M:%S"), mean)
zoo(coredata(ag), times(time(ag)))

2) tapply. An alternative would be tapply which is likely faster:

ta <- tapply(coredata(x), format(time(x), "%H:%M:%S"), mean)
zoo(unname(ta), times(names(ta)))

EDIT: simplified (1) and added (1a) and (2)

Submaxillary answered 24/2, 2012 at 13:25 Comment(1)
Thank you for posting this very elegant solution.Cynth
M
3

Here is a solution with ddply, but you can probably also use sqldf, tapply, aggregate, by, etc.

# Sample data
minutes <- 10 * 60
days <- 250 * 10
d <- seq.POSIXt( 
  ISOdatetime( 2011,01,01,09,00,00, "UTC" ), 
  by="1 min", length=minutes 
)
d <- outer( d, (1:days) * 24*3600, `+` )
d <- sort(d)
library(xts)
d <- xts( round(100*rlnorm(length(d))), d )

# Aggregate
library(plyr)
d <- data.frame( 
  minute=format(index(d), "%H:%M"), 
  value=coredata(d) 
)
d <- ddply( 
  d, "minute", 
  summarize, 
  value=mean(value, na.rm=TRUE) 
)

# Convert to zoo or xts
zoo(x=d$value, order.by=d$minute) # The index does not have to be a date or time
xts(x=d$value, order.by=as.POSIXct(sprintf("2012-01-01 %s:00",d$minute), "%Y-%m-%d %H:%M:%S") )
Mccarty answered 24/2, 2012 at 7:18 Comment(4)
Thanks for this. I had sqldf in mind, but it seemed like a "cheat" for what I was trying to achieve. Now to your code. This is working well up to the use of ddply (ie, i've built the data frame with minute and value (structured as chr and num respectively). However, it just returns "NA" for the (mean) value column. Any ideas?Cynth
Sorry - I should state that your model code works fine throughout. It is not, however, working on my data. a str() call on the volume data returns: num [1:976638, 1] 46 32 24 7 34 27 9 18 2 24 ... - attr(*, "dimnames")=List of 2 ..$ : NULL ..$ : chr "Volume" /// and the index of my data: Formal class 'timeDate' [package "fCalendar"] with 3 slots ..@ Data : POSIXct[1:976638], format: "2001-07-02 09:51:00" "2001-07-02 09:52:00" "2001-07-02 09:53:00" "2001-07-02 09:54:00" ... ..@ format : chr "%Y-%m-%d %H:%M:%S"Cynth
The error being returned in my adaptation is: In mean.default(value, na.rm = TRUE) : argument is not numeric or logical: returning NACynth
I tried to use timeDate objects for the index (library(timeDate); d <- seq(Sys.timeDate(), by=60, length=10); d <- xts(...)), but it works fine. However, your timeDate objects seem to come from the obsolete fCalendar package: it has been replaced with timeDate.Mccarty

© 2022 - 2024 — McMap. All rights reserved.