Clojure number crunching performance

Asked 26/6, 2010 at 15:11 Answered 9/9, 2013 at 3:17

I'm not sure whether this belongs on StackOverflow or in the Clojure Google group. But the group seems to be busy discussing numeric improvements for Clojure 1.2, so I'll try here:

http://shootout.alioth.debian.org/ has a number of performance benchmarks for various languages.

I noticed that Clojure was missing, so I made a Clojure version of the n-body problem.

The fastest code I was able to produce can be found here, and benchmarking it seems to be saying that for number crunching Clojure is

factor ~10 quicker than Python/Ruby/Perl
factor ~4 slower than C/Java/Scala/Ada
approximately on par with OCaml, Erlang and Go

I'm quite happy with that level of performance.

My question to the Clojure gurus is

Are there obvious improvements I have missed, either in terms of speed or in terms of code brevity or readability (without sacrificing speed)?
Do you consider this to be representative of Clojure performance vs Python/Ruby/Perl on one hand and Java/C on the other?

Update

More Clojure 1.1 benchmark programs for the shootout here, including the n-body problem.

Ascomycete answered 26/6, 2010 at 15:11 Comment(6)

I have to imagine the JVM is going to play a big part here. What JVM are you using? Are you using the same one as the shootout? – Impunity 26/6, 2010 at 15:16

Java 1.6.0, the standard java that ships with OS X 10.6. It's approximately the same JVM (me: 1.6.0_20, shootout: 1.6.0_18) but different computer than the shootout. I ran both Clojure and the shootout Java implementation locally. I estimated relative performance by using Java as the baseline and scaling the shootout results accordingly. – Ascomycete 26/6, 2010 at 15:34

Your code looks pretty good. With the primitive improvements in 1.2 I would expect you to be able to get pretty close to the Java time. Have you tried running it through a profiler? My suspicion is that something somewhere is adding a boxing or function call overhead that is hurting you. – Sybaris 26/6, 2010 at 15:41

Ah, this is a cool project and a great contribution to the currently raging debate which you mention. I'll be waiting for more in-depth answers... I also wonder how your code would fare when taken for a ride on the new branches. Might give that a shot -- or would you? BTW, all this is absolutely of interest to the ggroup, especially now; maybe you could post it to the thread I mention below? – Toh 26/6, 2010 at 15:54

@mikera: According to the profiler there is ~4% CPU to intCast(Object) which I haven't been able to track down yet. Another 5% to doubleCast(double), which it might be possible to eliminate. So perhaps up to ~10% improvements. Apart from that the profiler doesn't show any low-hanging fruits. – Ascomycete 26/6, 2010 at 17:8

@Michal: OK, I'll post it to the thread you linked to. I might try it out on one of the new branches if/when I get the time, but those branches seem to be a target that is moving so quickly that I don't think I can keep up. Feel free to give it a go if you're curious :) – Ascomycete 26/6, 2010 at 17:14

Not a flood of responses here :) but apparently some interest, so I'll try to answer my own question with what I've learned over the past few days:

With the 1.1 optimization approach (Java primitives and mutable arrays) ~4x slower than optimized Java is about as fast as it goes.
The 1.2 constructs definterface and deftype are more than twice as fast, coming within ~1.7x (+70%) of Java with shorter, simpler and cleaner code than for 1.1.

Here are the implementations:

More details including "lessons learned", JVM version and profiling screenshots.

Subjectively speaking, optimizing the 1.2 code was a breeze compared to optimizing 1.1, so this is very good news for Clojure number crunching. (Actually close to amazing :)

The 1.2 testing used the current master branch, I did not try any of the new numeric branches. From what I can gather the new ideas currently being discussed

may make non-optimized numerics faster
may speed up the 1.1 version of this benchmark
will probably not speed up the 1.2 version, it is already as "close to the metal" as it is likely to get.

Disclaimers:

Clojure 1.2 is not released yet, so 1.2 benchmark results are preliminary.
This is one particular benchmark on physics calculations. It is relevant to floating point number crunching, but irrelevant to performance in areas like string parsing, concurrency or web request handling.

Ascomycete answered 29/6, 2010 at 16:55 Comment(7)

You should try the numeric branches, looking over your code I see places where you will get unnecessarily get boxing under the 1.2 master branch. – Inellineloquent 29/6, 2010 at 21:33

Thanks for the tip. The equiv branch had some impact on the 1.1 version, no noticable difference for the 1.2 version. There may be alternative implementations where the difference is more significant. The equiv branch still looks good, especially primitive type hints and :static. – Ascomycete 2/7, 2010 at 13:39

Thank your very much for your highly interesting contribution. I especially enjoyed reading the detailed analysis. I compiled the 1.2 version and compared it with the Java example in several runs and it was on average only 1.5x slower. Two questions have been evoked on my part. As you stated in your analysis the code is not idiomatic because it makes use of mutable variables. How much slower would the code perform if immutable variables were used? How much effort would be involved in parallelizing the mutable or the immutable version? – Snippy 5/7, 2010 at 10:18

Thanks, glad you liked it. 1.5x: Sounds good, we're in the ballpark of each other. Mutable vs. immutable: I haven't tried immutable, feel free to give it a go. I would expect somewhere between "mutable 1.2" and "mutable 1.1", but it's hard to guess. Parallelizing: This particular problem is hard to parallelize in any case. In general, multithreading with mutable values is a lot harder than with immutable values, which is part of the reason Clojure defaults to immutable. – Ascomycete 5/7, 2010 at 11:0

BTW: Here is a Clojure implementation of a much larger benchmark involving concurrency and log file parsing: meshy.org/2009/12/13/widefinder-2-with-clojure.html Well worth a read if you are into large-scale performance. – Ascomycete 5/7, 2010 at 17:46

Do you know if there is any special reason why the Clojure result is not listed in the Widefinder 2 benchmark table? wikis.sun.com/display/WideFinder/Results – Snippy 8/7, 2010 at 17:31

I was wondering the same. Tim Brady's comments seem to suggest that he would like it refactored to be more general tbray.org/ongoing/When/200x/2009/12/15/Osborne-WF2-Clojure I can only guess that the author felt he had spent enough time on it and didn't bother with the refactoring. But no, I don't know the reason. – Ascomycete 8/7, 2010 at 21:20

I wonder if Cantor might be of use to you -- it's a high performance math library for Clojure. Also see this thread on the Google group, which is about a similar project in the context of the new primitive arithmetic stuff.

Digitate answered 26/6, 2010 at 15:46 Comment(1)

Cantor looks useful, I'll have a look. Thanks! – Ascomycete 26/6, 2010 at 17:0

This is a slightly old question and the existing answers are somewhat out of date, so I'd like to add an update as of mid-2013 for those interested in "number crunching" in Clojure

There has been a lot happening in the Clojure Numerical computing space:

Clojure 1.5 is now out, which has a lot of improved support for numerical operations. In most cases, it's now possible to get very close to pure Java speed
A dedicated newsgroup - Numerical Clojure
core.matrix now provides an idiomatic API for matrix maths / numerical computing that supports multiple backend implementations (including native BLAS libraries)

Disclaimer: I'm a maintainer / contributor to several of the above.

Sybaris answered 9/9, 2013 at 3:17 Comment(0)

Recommended topics

Hot tags