Speeding up Julia's poorly written R examples

The Julia examples to compare performance against R seem particularly convoluted. https://github.com/JuliaLang/julia/blob/master/test/perf/perf.R

What is the fastest performance you can eke out of the two algorithms below (preferably with an explanation of what you changed to make it more R-like)?

## mandel

mandel = function(z) {
    c = z
    maxiter = 80
    for (n in 1:maxiter) {
        if (Mod(z) > 2) return(n-1)
        z = z^2+c
    }
    return(maxiter)
}

mandelperf = function() {
    re = seq(-2,0.5,.1)
    im = seq(-1,1,.1)
    M = matrix(0.0,nrow=length(re),ncol=length(im))
    count = 1
    for (r in re) {
        for (i in im) {
            M[count] = mandel(complex(real=r,imag=i))
            count = count + 1
        }
    }
    return(M)
}

stopifnot(sum(mandelperf()) == 14791)  # base R has no assert(); stopifnot() does the same check

## quicksort ##

qsort_kernel = function(a, lo, hi) {
    i = lo
    j = hi
    while (i < hi) {
        pivot = a[floor((lo+hi)/2)]
        while (i <= j) {
            while (a[i] < pivot) i = i + 1
            while (a[j] > pivot) j = j - 1
            if (i <= j) {
                t = a[i]
                a[i] = a[j]
                a[j] = t
            }
            i = i + 1;
            j = j - 1;
        }
        if (lo < j) qsort_kernel(a, lo, j)
        lo = i
        j = hi
    }
    return(a)
}

qsort = function(a) {
  return(qsort_kernel(a, 1, length(a)))
}

sortperf = function(n) {
    v = runif(n)
    return(qsort(v))
}

sortperf(5000)
Corbicula answered 1/4, 2012 at 21:39 Comment(6)
For a start, rtricks.blogspot.ca/2007/04/…Cornelison
For goodness sakes... get R programmers to program R.Roband
(1) Here is an example of fibonacci in R johnmyleswhite.com/notebook/2012/03/31/julia-i-love-you and it seems they are using that to conclude Julia is faster, but as my comments below that blog post show, I was able to rewrite the R solution (still in pure R) and get it to run 2000x faster. (2) Many of the benchmarks can be made to run 3x-4x faster in R by byte-compiling, and that does not even require changing the code (see the sketch after these comments). (3) Many of the examples are stacked against R from the start because they use recursion, which R is no good at. Including problems that are readily vectorized would be fairer.Redundant
@G.Grothendieck You should post your comment as an Answer Gabor; plenty of pertinent points there. +1Vadnais
Might be interesting to see all this benchmarking extended to Radford Neal's pqR too.Bronchitis
For a very simple Rcpp mandelbrot version (multithreaded using OpenMP) also have a look here: #48070490 (not really answering the question though)Ctn
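
Following up on the byte-compilation point in the comments above, here is a minimal sketch using base R's compiler package (cmpfun), applied to the question's functions as-is. The exact speed-up varies by function and R version, and recent R releases JIT-compile functions automatically, so the explicit step mattered most on R versions current when this was asked.

library(compiler)                      # ships with base R

# Byte-compile the benchmark functions without changing their source.
mandel       <- cmpfun(mandel)
mandelperf   <- cmpfun(mandelperf)
qsort_kernel <- cmpfun(qsort_kernel)
qsort        <- cmpfun(qsort)
sortperf     <- cmpfun(sortperf)

system.time(mandelperf())              # compare against the interpreted timings
system.time(sortperf(5000))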

Hmm, in the Mandelbrot example the matrix M has its dimensions transposed; it should be

M = matrix(0.0,nrow=length(im), ncol=length(re))

because it's filled by incrementing count in the inner loop (successive values of im). My implementation creates a vector of complex numbers in mandelperf.1 and operates on all elements, using an index and subsetting to keep track of which elements of the vector have not yet satisfied the condition Mod(z) <= 2

mandel.1 = function(z, maxiter=80L) {
    c <- z
    result <- integer(length(z))
    i <- seq_along(z)
    n <- 0L
    while (n < maxiter && length(z)) {
        j <- Mod(z) <= 2
        if (!all(j)) {
            result[i[!j]] <- n
            i <- i[j]
            z <- z[j]
            c <- c[j]
        }
        z <- z^2 + c
        n <- n + 1L
    }
    result[i] <- maxiter
    result
}

mandelperf.1 = function() {
    re = seq(-2,0.5,.1)
    im = seq(-1,1,.1)
    mandel.1(complex(real=rep(re, each=length(im)),
                     imaginary=im))
}

for a 13-fold speed-up (the results are equal but not identical because the original returns numeric rather than integer values).

> library(rbenchmark)
> benchmark(mandelperf(), mandelperf.1(),
+           columns=c("test", "elapsed", "relative"),
+           order="relative")
            test elapsed relative
2 mandelperf.1()   0.412  1.00000
1   mandelperf()   5.705 13.84709

> all.equal(sum(mandelperf()), sum(mandelperf.1()))
[1] TRUE
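
To make the numeric-vs-integer point above concrete (just an illustrative check, not part of the original answer):

typeof(sum(mandelperf()))                          # "double": the original fills a numeric matrix
typeof(sum(mandelperf.1()))                        # "integer": mandel.1() returns integers
identical(sum(mandelperf()), sum(mandelperf.1()))  # FALSE, hence the all.equal() above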

The quicksort example doesn't actually sort

> set.seed(123L); qsort(sample(5))
[1] 2 4 1 3 5

but my main speed-up was to vectorize the partition around the pivot

qsort_kernel.1 = function(a) {
    if (length(a) < 2L)
        return(a)
    pivot <- a[floor(length(a) / 2)]
    c(qsort_kernel.1(a[a < pivot]), a[a == pivot], qsort_kernel.1(a[a > pivot]))
}

qsort.1 = function(a) {
    qsort_kernel.1(a)
}

sortperf.1 = function(n) {
    v = runif(n)
    return(qsort.1(v))
}

for a 7-fold speedup (in comparison to the uncorrected original)

> benchmark(sortperf(5000), sortperf.1(5000),
+           columns=c("test", "elapsed", "relative"),
+           order="relative")
              test elapsed relative
2 sortperf.1(5000)    6.60 1.000000
1   sortperf(5000)   47.73 7.231818

Since in the original comparison Julia is about 30 times faster than R for mandel, and 500 times faster for quicksort, the implementations above are still not really competitive.

Puzzlement answered 2/4, 2012 at 0:17 Comment(7)
@gsk3 byte-compilation doesn't really change the execution time of mandel; sortperf.1 becomes about 10x instead of 7x faster than sortperf.Puzzlement
The problem is that qsort_kernel.1 is still doing recursion, and R is not good at that. To get it to run faster, convert it to loops using an explicit stack (see the sketch after these comments).Redundant
@G.Grothendieck the recursion isn't that deep (log N), iterative solutions likely use a stack (also not R-friendly), the speed difference is still large, and the algorithm less intuitive.Puzzlement
Using a good math library in R, such as Intel MKL, also helps to narrow the gap between R and Julia.Montherlant
The point of the performance comparisons isn't to see how fast a solution to a problem can be obtained by different languages, but rather compare how the same implementation (i.e. algorithm) performs in different languages.Hair
@Hair For what it's worth, I believe the algorithms are the same; the implementations differ as appropriate for the language. For instance the quicksort algorithm follows the same logic and has the same complexity across the two implementations. I didn't intend for my post to be any sort of benchmark, and in some ways regret it.Puzzlement
I think it spurred interesting conversation. Don't regret it.Hair
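
For reference, here is a minimal sketch of the loop-plus-explicit-stack rewrite suggested in the comments above. The name qsort.2 is just illustrative; it keeps the same Hoare-style partition as the original kernel but manages the sub-ranges on its own stack instead of recursing. Whether it actually beats qsort.1 would need benchmarking.

qsort.2 = function(a) {
    # Stack of index ranges still to be partitioned, each stored as c(lo, hi).
    stack <- list(c(1L, length(a)))
    while (length(stack)) {
        range <- stack[[length(stack)]]
        stack[[length(stack)]] <- NULL           # pop
        lo <- range[1L]; hi <- range[2L]
        if (lo >= hi) next
        i <- lo; j <- hi
        pivot <- a[(lo + hi) %/% 2L]
        # Same partition as the recursive kernel in the question.
        while (i <= j) {
            while (a[i] < pivot) i <- i + 1L
            while (a[j] > pivot) j <- j - 1L
            if (i <= j) {
                t <- a[i]; a[i] <- a[j]; a[j] <- t
                i <- i + 1L
                j <- j - 1L
            }
        }
        # Push the two sub-ranges instead of recursing.
        if (lo < j) stack[[length(stack) + 1L]] <- c(lo, j)
        if (i < hi) stack[[length(stack) + 1L]] <- c(i, hi)
    }
    a
}

set.seed(123L); stopifnot(!is.unsorted(qsort.2(runif(5000))))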

The key word in this question is "algorithm":

What is the fastest performance you can eke out of the two algorithms below (preferably with an explanation of what you changed to make it more R-like)?

As in "how fast can you make these algorithms in R?" The algorithms in question here are the standard Mandelbrot complex loop iteration algorithm and the standard recursive quicksort kernel.

There are certainly faster ways to compute the answers to the problems posed in these benchmarks – but not using the same algorithms. You can avoid recursion, avoid iteration, and avoid whatever else R isn't good at. But then you're no longer comparing the same algorithms.

If you really wanted to compute Mandelbrot sets in R or sort numbers, yes, this is not how you would write the code. You would either vectorize it as much as possible – thereby pushing all the work into predefined C kernels – or just write a custom C extension and do the computation there. Either way, the conclusion is that R isn't fast enough to get really good performance on its own – you need to have C do most of the work in order to get good performance.

And that's exactly the point of these benchmarks: in Julia you never have to rely on C code to get good performance. You can just write what you want to do in pure Julia and it will have good performance. If an iterative scalar loop algorithm is the most natural way to do what you want to do, then just do that. If recursion is the most natural way to solve the problem, then that's ok too. At no point will you be forced to rely on C for performance – whether via unnatural vectorization or writing custom C extensions. Of course, you can write vectorized code when it's natural, as it often is in linear algebra; and you can call C if you already have some library that does what you want. But you don't have to.

We do want to have the fairest possible comparison of the same algorithms across languages:

  1. If someone does have faster versions in R that use the same algorithm, please submit patches!
  2. I believe that the R benchmarks on the julia site are already byte-compiled, but if I'm doing it wrong and the comparison is unfair to R, please let me know and I will fix it and update the benchmarks.
Mehalek answered 23/5, 2012 at 1:0 Comment(7)
@Stefan Good points. The counter-point, though, is that when you can get several-hundred to several-thousand times speedups simply by writing the code in a way that is natural in a language, the example code is simply poor R code. If someone came on to SO and posted those example codes, they would very quickly be given lessons in how to write R like R is intended to be written. Given that the examples all seem to be picked to involve recursion (which R is admittedly poor at), and then the sample code goes out of its way to avoid vectorization (which R is very good at)....Corbicula
I would still argue that there's nothing wrong with the benchmark code. If there were some R implementation in which iteration and recursion were fast, would the code still be poor? The only conclusion I can draw is that it's the implementation of the language that's faulty, not the code. If, moreover, the design of the R language makes it particularly challenging (or maybe impossible?) to make iteration and recursion fast, then I would say that's neither a fault in this particular code, nor the current R implementation, but rather in the language design itself.Mehalek
@Stefan I agree that using the same algorithm is a fair comparison, but perhaps it would be worthwhile rewriting the Julia versions of these algorithms to match, as nearly as possible, the R-optimized algorithms that are appearing here (i.e. highly vectorised), and then reassessing both relative performance and performance against the original benchmark implementations in Julia? Ultimately, if I can achieve the same end result with different code optimised for another language that is faster, then I will probably use the faster code. I follow with great interest!Indeterminate
From a common-sense and logical point of view, if you need to achieve X, you choose the best thing a language offers as long as the result is the same. So this purist adherence to 'the same algorithms line-by-line' is just a reflection of a conflict of interest on the part of whoever is responding. And it is not an answer to the question, BTW!Subassembly
I think that the "true" R solution would be ... to build or look for a package that handles recursion well? ;-) Making use of the depth and breadth of the community. Then you can stay within your preferred language, i.e. R, have all the number crunching (i.e. the C code) done by the package, and voilà, you are on a level playing field again ...Floruit
If it were possible to write a package that makes recursion or iteration fast in R wouldn't someone have done it by now? Fundamental language improvements like that are not something you can "tack on" – they require deep, difficult changes. That doesn't mean that recursion and iteration couldn't be made faster in R, but it wouldn't be "just a package".Mehalek
@V.B.: That all depends on what you are trying to measure. Personally, I'm not at all interested in how fast one can compute Fibonacci numbers. Yet that is one of our benchmarks. Why? Because I am very interested in how well languages support recursion – and the doubly recursive algorithm happens to be a great test of recursion, precisely because it is such a terrible way to compute Fibonacci numbers. So what would be learned by comparing an intentionally slow, excessively recursive algorithm in C and Julia against a tricky, clever, vectorized algorithm in R? Nothing at all.Mehalek