Performance of recursion vs accumulator style
We have two functions that compute the factorial of a given number. The first one, !, uses an accumulator style. The second, fact, uses natural recursion.

(define (! n0)
  (local (;; accumulator is the product of the natural numbers in (n, n0]
          (define (!-a n accumulator)
            (cond
              [(zero? n) accumulator]
              [else (!-a (sub1 n) (* n accumulator))])))
    (!-a n0 1)))

and

(define (fact n)
  (cond
    [(= 0 n) 1]
    [else (* n (fact (- n 1)))]))
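As a quick sanity check (both definitions repeated here so the snippet is self-contained), the two versions agree:

```racket
;; Accumulator-style factorial
(define (! n0)
  (local ((define (!-a n accumulator)
            (cond
              [(zero? n) accumulator]
              [else (!-a (sub1 n) (* n accumulator))])))
    (!-a n0 1)))

;; Naturally recursive factorial
(define (fact n)
  (cond
    [(= 0 n) 1]
    [else (* n (fact (- n 1)))]))

(equal? (! 20) (fact 20))  ; #t — both compute 2432902008176640000
```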

At the bottom of Section 31, HtDP states that the naturally recursive version is often as fast if not faster than the accumulator version, but does not state the reasons why. I did some reading on this and it seems that the answer is 'tail call optimization/elimination', but the Wikipedia article seems to be at odds with what HtDP says, at least with respect to performance. Why is this so?


At work, the recursive style is faster. At home, the accumulator style is faster. Is there no general heuristic to guide a choice as to which style is generally preferred? I understand that the accumulator-style is more memory-efficient, but if we restrict the discussion to performance alone, it is unclear, at least to me, which is the better choice.


I've thought about this a little further, and would side with the Wikipedia article on the superiority of accumulator-style recursion in the general case. Not only does it reduce use of stack/heap space; memory access will always be slower than register access, a gap that only becomes more pronounced now that multicore is here. Still, as HtDP shows, actual testing is required in every case.

Winchell answered 19/1, 2011 at 9:12 Comment(0)
The answer will depend on the details of the Racket system. Here's my take on it.

There are two major differences between the naturally recursive version and the accumulator version. First, the accumulator version is written in a fashion that allows tail-call optimization. This helps make the accumulator version faster, as fewer stack frames need to be created. But that is the opposite of what is discussed in HtDP and that you've seen on your work computer.

The other difference is the order of multiplication. The naturally recursive version multiplies the numbers from 1 to 20 in ascending order, that is

((((1 * 2) * 3) * … * 19) * 20)

The accumulator version multiplies the same numbers in descending order, that is

((((20 * 19) * 18) * … * 2) * 1)

Mathematically, these are the same, and the two factorial functions will give the same result. Nonetheless, this difference can matter. In particular, at any intermediate multiplication, the intermediate result will be larger for the latter calculation than for the former calculation.

The factorial of 20 is a big number: it won't fit in a 32-bit integer. That means Racket needs an arbitrary-precision integer (a "bignum") to represent the answer, and some of the intermediate results as well. Arbitrary-precision arithmetic, including multiplication involving bignums, is slower than fixed-precision arithmetic.

Since the intermediate results in the accumulator version are always larger than for the naturally recursive version, the accumulator version will require a bignum earlier than the recursive version. In short, while both versions require the same number of multiplications, the accumulator version requires more arbitrary precision multiplications. This makes the accumulator version slower. Apparently, the additional cost of the arithmetic outweighs the saving from reducing the number of stack frames.
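A quick way to see this is to count how many multiplications each order performs before the running product outgrows a 32-bit fixnum (a sketch; 2^30 is an assumed typical bound, the exact cutoff is implementation-dependent):

```racket
;; Count multiplications before the running product exceeds an assumed
;; 32-bit fixnum bound (the exact bound varies by implementation).
(define limit (expt 2 30))

(define (steps-until-bignum nums)
  (let loop ([ns nums] [p 1] [steps 0])
    (cond
      [(or (> p limit) (null? ns)) steps]
      [else (loop (cdr ns) (* p (car ns)) (add1 steps))])))

(steps-until-bignum (range 1 21))            ; ascending, like fact → 13
(steps-until-bignum (reverse (range 1 21)))  ; descending, like !   → 8
```

So the descending order crosses into bignum territory after 8 multiplications, the ascending order only after 13.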

So why wouldn't the same trend show up on your home computer? You said it was an Intel iMac, so it is probably a 64-bit system. While 20! is a big number, it is small compared to what fits in a 64-bit integer, so your home computer isn't doing any arbitrary-precision arithmetic and the order doesn't matter. HtDP is old enough that it would have been tested on a 32-bit system, as Windows XP on your work computer likely is.

More useful to explore the differences would be a function that calculates the product of a list of numbers, either

(define (product numlist)
  (cond
    [(empty? numlist) 1]
    [else (* (car numlist) (product (cdr numlist)))]))

or an accumulator version. You could then feed in the numbers in either ascending or descending order, independent of whether you're using a naturally recursive or accumulator-based approach.
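Such an accumulator version might look like this (a sketch in the same style as ! above):

```racket
;; Accumulator version of product: the running product is threaded
;; through the recursion, so the recursive call is in tail position.
(define (product-a numlist)
  (local ((define (p-a lst acc)
            (cond
              [(empty? lst) acc]
              [else (p-a (rest lst) (* (first lst) acc))])))
    (p-a numlist 1)))

(product-a (range 1 21))            ; feed numbers ascending
(product-a (reverse (range 1 21)))  ; or descending — same result, 20!
```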

Highkeyed answered 21/1, 2011 at 12:37 Comment(1)
Thank you for the insight. That was very well explained. What would be ideal is if someone could verify it on a 64-bit Windows version of Racket.Winchell
I don't know the inners of the Racket compiler, but I'll speculate.

Tail calls are generally more expensive than normal calls (in .NET they can be up to 7 times slower), but in some cases the tail call can be eliminated entirely, so the code ends up as a C-style while(1) { ... } loop. Then there are no extra calls to make at all, just a simple local jump, which effectively eliminates the procedure-application overhead.

Laing answered 19/1, 2011 at 9:24 Comment(5)
Hello leppie... thanks for answering. I ran the same code on my home computer (an Intel Core 2 Duo iMac) and interestingly, the accumulator version turned in better timings consistently, which is opposite from the Windows XP PC at work. DrRacket v5.0 at home, will find out the version at work tomorrow.Winchell
@Greenhorn: Do note that those can't be compared. The latter is a very naive implementation; the former will result in far fewer 'recursive' calls even without tail-call elimination. Try rewriting the first so that the recursive call is not in tail position. This may be hard depending on how good Racket's optimizer is.Laing
@leppie... here at work DrRacket is v5.0.2. The situation is the opposite with the iMac at home. I understand that the CPUs and OS are different, but seeing vastly different results with no code changes tells me I should be more careful about writing code that might behave poorly on a particular platform.Winchell
@Greenhorn: The complexity is different between the two. I do not know how the second one can be faster on any PC. Perhaps add some tracing, or profile both methods (I don't know what Racket provides for that).Laing
But be careful that you don't affect the execution too much by using debugging techniques!Nelle
E
0

A good compiler would turn the recursive fact into a tail-recursive one, so there should be no difference in the compiled code.

Empirin answered 20/1, 2011 at 22:42 Comment(3)
My takeaway from all this is that the same code at the source-level but compiled to different hardware/platforms can yield significantly different outcomes in the choice of the algorithm. In other words, the hardware and platform are important considerations, which makes sense.Winchell
No knivil, the compiler/optimizer cannot change an algorithm for you, that would be insane. Imagine you had to guess what the result was every time!Laing
Yes, we can! The recursive fact is nothing more than a fold-right over the natural numbers, and multiplication is associative and commutative, so you can replace it with a fold-left (which is tail-recursive). That in turn can be turned into an ordinary loop. Try it yourself in C or C++ and look at the asm gcc generates (with optimization).Empirin
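The fold claim above can be checked directly with Racket's built-in folds: foldr mirrors the naturally recursive version, foldl the accumulator version.

```racket
;; fact as a fold over 1..n: foldr recurses like fact,
;; foldl loops like the accumulator version
(define (fact-r n) (foldr * 1 (range 1 (add1 n))))
(define (fact-l n) (foldl * 1 (range 1 (add1 n))))

(fact-r 20)  ; 2432902008176640000
(fact-l 20)  ; 2432902008176640000
```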
Many good points above. I love the analysis of what should happen versus why it didn't; that is the stuff of which Project Euler success is made. Leaving fixnums too early can be costly.

The sequence of numbers can be multiplied from large to small or vice versa. Scheme also has the do form, which expresses the same iteration directly.

(define (fact n)  (if (= n 1) 1 (* n (fact (- n 1)))))

(define (fact1 n) (do ([n n (- n 1)] [p 1 (* p n)]) ((= n 1) p)))

(define (fact2 n) (do ([i 1 (+ i 1)] [p 1 (* p i)]) ((< n i) p)))

(define (fact3 n) (let f ((n n) (p 1)) (if (= n 1) p (f (- n 1) (* p n)))))

(define (fact4 n) (let f ((i 1) (p 1)) (if (< n i) p (f (+ i 1) (* p i)))))
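To compare the variants, a rough timing harness might look like this (a sketch; absolute numbers vary by machine and Racket version, and two of the variants are repeated so the snippet runs on its own):

```racket
;; Rough timing harness: run each factorial variant many times
;; and print elapsed time with Racket's built-in time form.
(define (fact n)  (if (= n 1) 1 (* n (fact (- n 1)))))
(define (fact3 n) (let f ((n n) (p 1)) (if (= n 1) p (f (- n 1) (* p n)))))

(define (bench f)
  (time (for ([i (in-range 1000000)]) (f 20))))

(bench fact)   ; naturally recursive
(bench fact3)  ; accumulator / named let
```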
Disarmament answered 28/2, 2011 at 5:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.