While loop in R quantstrat code - how to make it faster?

In the quantstrat package I have located one of the main culprits for the slowness of the applyRules function, and I wonder whether there is a more efficient way to write the while loop. Any feedback would be helpful, especially from anyone with experience wrapping this part in parallel R.

Would apply work as an alternative to while? Or should I rewrite this part as new functions, like ruleProc and nextIndex? I am also delving into Rcpp, but that may be a stretch. Any help and constructive advice is much appreciated.

while (curIndex) {
    timestamp = Dates[curIndex]
    # release a hold once holdtill has passed; && short-circuits, so the
    # comparison is skipped (instead of erroring) while holdtill is NULL
    if (isTRUE(hold) && holdtill < timestamp) {
        hold = FALSE
        holdtill = NULL
    }
    # rule types are evaluated in this fixed order at every timestamp
    # (this does not change between iterations)
    types <- sort(factor(names(strategy$rules), levels = c("pre",
        "risk", "order", "rebalance", "exit", "enter", "entry",
        "post")))
    for (type in types) {
        switch(type, pre = {
            if (length(strategy$rules[[type]]) >= 1) {
              ruleProc(strategy$rules$pre, timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        }, risk = {
            if (length(strategy$rules$risk) >= 1) {
              ruleProc(strategy$rules$risk, timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        }, order = {
            if (length(strategy$rules[[type]]) >= 1) {
              ruleProc(strategy$rules[[type]], timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            } else {
              # no order rules defined: fall back to default order processing
              if (isTRUE(path.dep)) {
                timespan <- paste("::", timestamp, sep = "")
              } else timespan = NULL
              ruleOrderProc(portfolio = portfolio, symbol = symbol,
                mktdata = mktdata, timespan = timespan)
            }
        }, rebalance = , exit = , enter = , entry = {
            # skip these rule types while a hold is in effect
            if (isTRUE(hold)) next
            # exit rules are skipped when there is no open position
            if (type == "exit") {
              if (getPosQty(Portfolio = portfolio, Symbol = symbol,
                Date = timestamp) == 0) next
            }
            if (length(strategy$rules[[type]]) >= 1) {
              ruleProc(strategy$rules[[type]], timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
            if (isTRUE(path.dep) && length(getOrders(portfolio = portfolio,
              symbol = symbol, status = "open", timespan = timestamp,
              which.i = TRUE))) {
              # empty body: this check is a no-op in the version quoted here
            }
        }, post = {
            if (length(strategy$rules$post) >= 1) {
              ruleProc(strategy$rules$post, timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        })
    }
    # path-dependent: jump to the next index point; otherwise a single pass
    if (isTRUE(path.dep))
        curIndex <- nextIndex(curIndex)
    else curIndex = FALSE
}
Asked by Pehlevi on 10/10/2011 at 12:28. Comments (4):
I doubt the problem is inherent in the while loop; more likely it is in whatever happens inside it. Your code doesn't seem to do anything: there is no assignment of values and no return value, so you must be running it for the side effects of ruleProc. If one of those side effects is assigning values elsewhere, I'd start by refactoring that. In my experience, it's generally faster to use lapply and then do.call on the resulting list than to do selective assignment using [<-. – Slosberg
If you posted toy examples of each of the underlying data structures, I think that for loop could be vectorized. The functions nextIndex and ruleProc would need to be shown as well; only then could someone make a good assessment of the while loop. – Vogler
OK, I just looked up this code... try vectorizing ruleProc first. In fact, if ruleProc is representative of how slow this code is, you can get large speedups with almost no thinking. Go through ruleProc and move everything that doesn't depend on the loop variable out of the giant for loop that wraps the whole function; that's a lot. Then post it for vectorization if you can't do that part yourself, but do the move first. – Vogler
I've provided an answer below, which it would be nice if the OP would accept. I'll also add a comment here with a hypothesis I haven't had time to test: I suspect that the state machine code in applyRules could be sped up significantly by using the compiler package introduced in R 2.13.0, which could byte-compile some or all of the applyRules function in quantstrat. – Dedra
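As a rough, untested illustration of that hypothesis, the sketch below byte-compiles a small loop-heavy function with compiler::cmpfun(). The function count_up() is a hypothetical stand-in, not quantstrat code; the same cmpfun() call could in principle be applied to applyRules, but whether that helps is exactly what would need measuring.

```r
library(compiler)  # bundled with base R since 2.13.0

# Hypothetical stand-in for a loop-heavy, state-carrying function:
# walk i = 1..n and count the multiples of 3, one step at a time.
count_up <- function(n) {
  state <- 0
  i <- 1L
  while (i <= n) {
    state <- state + (i %% 3L == 0L)
    i <- i + 1L
  }
  state
}

# Byte-compiled copy of the same function; results must not change.
count_up_c <- cmpfun(count_up)

# Compare timings on a larger input.
system.time(count_up(1e6))
system.time(count_up_c(1e6))
```

Since R 3.4.0 the just-in-time compiler is enabled by default, so closures are usually byte-compiled automatically after their first calls, and an explicit cmpfun() may gain little on a current R.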

Garrett's answer does point to the last major discussion on the R-SIG-Finance list where a related question was discussed.

The applyRules function in quantstrat is absolutely where most of the time is spent.

The while loop code copied in this question is the path-dependent part of the applyRules execution. I believe all of this is covered in the documentation, but I'll briefly review it here for SO posterity.

We construct a dimension reduction index inside applyRules so that we don't have to visit and check every timestamp. We only take note of specific points in time where the strategy may reasonably be expected to act on the order book, or where orders may reasonably be expected to get filled.
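A minimal sketch of the idea, assuming signals are stored one column per signal with nonzero values where they fire: keep only the rows where something can happen, and let the loop jump between them. The data frame signals and the helper next_index() are hypothetical. The index described above also includes points where open orders may fill; this toy ignores that and uses signals only.

```r
# Hypothetical toy signals: one row per timestamp, one column per signal,
# nonzero where the signal fires.
signals <- data.frame(
  long_entry = c(0, 0, 1, 0, 0, 0, 0, 0),
  long_exit  = c(0, 0, 0, 0, 0, 1, 0, 0)
)

# Dimension reduction: keep only the rows where at least one signal fires.
dindex <- unname(which(rowSums(signals != 0) > 0))

# Hypothetical counterpart of nextIndex(): return the next row in the index
# after cur, or FALSE when none is left (FALSE ends `while (cur)`).
next_index <- function(cur, idx) {
  later <- idx[idx > cur]
  if (length(later) > 0) later[1] else FALSE
}

visited <- integer(0)
cur <- if (length(dindex) > 0) dindex[1] else FALSE
while (cur) {
  visited <- c(visited, cur)  # rule evaluation would happen here
  cur <- next_index(cur, dindex)
}
visited  # 2 iterations instead of 8
```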

This is state-dependent and path-dependent code. Idle talk of 'vectorization' doesn't make any sense in this context. If I need to know the current state of the market, the order book, and my position, and if my orders may be modified in a time-dependent manner by other rules, I don't see how this code can be vectorized.
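To make the path dependence concrete, here is a hypothetical toy (not quantstrat's logic): go long on an entry signal only when flat, and go flat on an exit signal only when long. A "vectorized" cumulative sum of the signals gets the position wrong as soon as a signal fires in the wrong state, while the loop, which carries the previous position forward, gets it right.

```r
# Hypothetical toy signals (1 = fires). The second entry fires while already
# long, and the second exit fires while already flat.
enter <- c(1, 0, 1, 0, 0, 0, 1, 0)
exit  <- c(0, 0, 0, 1, 1, 0, 0, 1)

# Naive "vectorized" position: ignores what the position was before.
naive <- cumsum(enter) - cumsum(exit)

# Path-dependent position: each step reads the state left by the previous one.
pos <- numeric(length(enter))
state <- 0
for (i in seq_along(enter)) {
  if (state == 0 && enter[i] == 1) {
    state <- 1
  } else if (state == 1 && exit[i] == 1) {
    state <- 0
  }
  pos[i] <- state
}

rbind(naive, pos)
```

Here naive reports a position of 2 at the third step and is still long at the fourth, whereas pos is 1 and then flat. The real applyRules state also includes the order book and portfolio, which makes the dependence far richer than this toy.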

From a computer science perspective, this is a state machine. State machines in almost every language I can think of are usually written as while loops. This isn't really negotiable or changeable.

The question asks whether using apply would help. The apply family of functions in R is itself implemented as loops, so no, it wouldn't help. Even a parallel apply such as mclapply or foreach can't help, because this is inside a state-dependent part of the code: evaluating different time points without regard to state doesn't make any sense. You'll note that the non-state-dependent parts of quantstrat are vectorized wherever possible, and account for very little of the running time.
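One way to see why apply-style or parallel evaluation doesn't help is to write the same kind of state machine as a fold. In the hypothetical sketch below, step() maps (previous position, time index) to the new position. Reduce() has to feed each result into the next call, so it is just as sequential as a loop, and an lapply() that evaluates each time point independently, as a parallel apply would, starts every step from a flat position and gives the wrong answer.

```r
# Hypothetical toy signals (1 = fires).
enter <- c(1, 0, 1, 0, 0, 0, 1, 0)
exit  <- c(0, 0, 0, 1, 1, 0, 0, 1)

# One transition of a toy state machine:
# (previous position, time index) -> new position.
step <- function(state, i) {
  if (state == 0 && enter[i] == 1) return(1)
  if (state == 1 && exit[i] == 1) return(0)
  state
}

# A fold threads each result into the next call, so it is inherently
# sequential; [-1] drops the initial state 0 from the accumulated results.
path <- Reduce(step, seq_along(enter), 0, accumulate = TRUE)[-1]

# Evaluating every time point independently loses the state:
# each call starts from a flat position.
independent <- unlist(lapply(seq_along(enter), function(i) step(0, i)))

rbind(path, independent)
```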

The comment made by John suggests removing the for loop in ruleProc. All that the for loop does is check each rule associated with the strategy at this point in time. The only compute-intensive part of that loop is the do.call that calls the rule function. The rest of the for loop simply locates and matches arguments for these functions, and code profiling shows that it doesn't take much time at all. It would not make much sense to use a parallel apply here either, since the rule functions are applied in type order, so that cancels or risk directives can be applied before new entry directives. Much as mathematics has an order of operations, or a bank has a deposit/withdrawal processing order, quantstrat has a rule type evaluation order, as laid out in the documentation.
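The type-ordered dispatch can be sketched as follows. The rule list layout (a name plus a list of arguments) is only loosely modelled on quantstrat's rule objects, and rule_stop(), rule_exit(), and rule_enter() are hypothetical functions that just log their calls; the ordering uses the same sort(factor(...)) idiom as the loop quoted in the question. The rules run in risk, exit, enter order even though they were added in a different order.

```r
# Rule type evaluation order, as in the applyRules loop quoted in the question.
type_order <- c("pre", "risk", "order", "rebalance", "exit", "enter", "entry", "post")

# Hypothetical rule functions; each just records what it was asked to do.
calls <- character(0)
rule_stop  <- function(pct, timestamp) calls <<- c(calls, paste(timestamp, "stop", pct))
rule_exit  <- function(qty, timestamp) calls <<- c(calls, paste(timestamp, "exit", qty))
rule_enter <- function(qty, timestamp) calls <<- c(calls, paste(timestamp, "enter", qty))

# Hypothetical rule list, deliberately added out of evaluation order.
rules <- list(
  enter = list(list(name = "rule_enter", arguments = list(qty = 100))),
  exit  = list(list(name = "rule_exit",  arguments = list(qty = "all"))),
  risk  = list(list(name = "rule_stop",  arguments = list(pct = 0.05)))
)

process_rules <- function(rules, timestamp) {
  # Sort the rule types present by their position in type_order.
  types <- as.character(sort(factor(names(rules), levels = type_order)))
  for (type in types) {
    for (rule in rules[[type]]) {
      # Building the argument list is cheap; do.call() runs the rule body.
      do.call(rule$name, c(rule$arguments, list(timestamp = timestamp)))
    }
  }
  invisible(NULL)
}

process_rules(rules, "2011-10-10")
calls
```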

To speed up execution, there are four main things that can be done:

  1. write a non-path-dependent strategy: this is supported by the code, and simple strategies may be modeled this way. In this model you would write a custom rule function that calls addTxn directly when you think you should get your fills. It could be a vectorized function operating on your indicators/signals, and should be very fast.
  2. preprocess your signals: if there are fewer places where the state machine needs to evaluate the state of the order book/rules/portfolio to see if it needs to do something, the speed increase is nearly linear with the reduction in signals. This is the area most users neglect: they write signal functions that don't really evaluate when an action that would modify positions or the order book may be required.
  3. explicitly parallelize parts of your analysis problem: I commonly write explicitly parallel wrappers to separate out different parameter evaluations or symbol evaluations; see applyParameter for an example using foreach.
  4. rewrite the state machine inside applyRules in C/C++: patches welcome, but do see the link Garrett posted for additional details.
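A sketch of points 1 and 2 together, under simple assumptions: a hypothetical level signal (price above 10) is preprocessed into its rising and falling edges, so only the bars where the state can actually change remain. Because the edges of a single level signal strictly alternate, each entry is closed by the next exit, and the transactions can be computed without a state machine. For simplicity the fills use the signal bar's own price; in a real non-path-dependent quantstrat rule, each row would instead be passed to blotter's addTxn(), which this sketch does not call.

```r
# Hypothetical toy prices and a level signal that is TRUE while price > 10.
price <- c(9, 11, 12, 13, 14, 15, 9, 8, 12, 13, 9)
above <- price > 10

# Preprocess: keep only the edges, i.e. the bars where the level changes.
prev    <- c(FALSE, head(above, -1))
entries <- which(above & !prev)   # rising edges
exits   <- which(!above & prev)   # falling edges

# Rows the state machine would have to visit: level signal vs. edges only.
c(level_rows = sum(above), edge_rows = length(entries) + length(exits))

# Edges of one level signal alternate (entry, exit, entry, ...), so the
# transactions can be listed with vectorized code. Fills use the signal
# bar's own price purely for simplicity.
txns <- data.frame(
  row = c(entries, exits),
  qty = c(rep(100, length(entries)), rep(-100, length(exits)))
)
txns <- txns[order(txns$row), ]
txns$price <- price[txns$row]
txns$position <- cumsum(txns$qty)
# In a real non-path-dependent rule, each row would go to blotter's addTxn()
# instead (not called here).
txns
```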

I can assure you that most strategies can run in a fraction of a core-minute per symbol per day per core on tick data, if a little care is taken with the signal generation functions. Running large backtests on a laptop is not recommended.
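For the explicit per-symbol parallelism described in point 3, a wrapper might look like the sketch below. It swaps in base R's parallel::mclapply() instead of the foreach approach used by applyParameter, and run_symbol() plus the universe list are hypothetical stand-ins for a real per-symbol backtest. Symbols carry no state between them, so unlike time points inside applyRules they can safely run on separate cores.

```r
library(parallel)

# Hypothetical per-symbol "backtest": counts the rising edges of a
# price-above-mean signal. A real wrapper would run one symbol's strategy here.
run_symbol <- function(prices) {
  above <- prices > mean(prices)
  sum(above & !c(FALSE, head(above, -1)))
}

# Hypothetical toy data for three independent symbols.
universe <- list(
  AAA = c(1, 3, 1, 3, 1),
  BBB = c(5, 4, 3, 2, 1),
  CCC = c(2, 2, 9, 9, 2)
)

# mclapply() forks worker processes, which Windows does not support,
# so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(universe, run_symbol, mc.cores = cores)
unlist(results)
```

On Windows, parLapply() with a cluster created by makeCluster() is the portable alternative to forking.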

Ref: quantstrat - applyRules

Dedra answered 12/10/2011 at 12:48.
