Examples where compiler-optimized functional code performs better than imperative code

One of the promises of side-effect free, referentially transparent functional programming is that such code can be extensively optimized. To quote Wikipedia:

Immutability of data can, in many cases, lead to execution efficiency, by allowing the compiler to make assumptions that are unsafe in an imperative language, thus increasing opportunities for inline expansion.

I'd like to see examples where a functional-language compiler outperforms an imperative one by producing better-optimized code.

Edit: I tried to give a specific scenario, but apparently it wasn't a good idea. So I'll try to explain it in a different way.

Programmers translate ideas (algorithms) into languages that machines can understand. At the same time, one of the most important aspects of this translation is that humans, too, can understand the resulting code. Unfortunately, in many cases there is a trade-off: concise, readable code suffers from slow performance and needs to be manually optimized. This is error-prone and time-consuming, and it makes the code less readable (sometimes totally unreadable).

The foundations of functional languages, such as immutability and referential transparency, allow compilers to perform extensive optimizations, which could replace manual optimization of code and free programmers from this trade-off. I'm looking for examples of ideas (algorithms) and their implementations, such that:

  1. the (functional) implementation is close to the original idea and is easy to understand,
  2. it is extensively optimized by the compiler of the language, and
  3. it is hard (or impossible) to write similarly efficient code in an imperative language without manual optimizations that reduce its conciseness and readability.

I apologize if it is a bit vague, but I hope the idea is clear. I don't want to give unnecessary restrictions on the answers. I'm open to suggestions if someone knows how to express it better.

My interest isn't just theoretical. I'd like to use such examples (among other things) to motivate students to get interested in functional programming.

At first, I wasn't satisfied by a few examples suggested in the comments. On second thought, I take my objections back: those are good examples. Please feel free to expand them into full answers so that people can comment on and vote for them.


(One class of such examples will most likely be parallelized code, which can take advantage of multiple CPU cores. Often in functional languages this can be done easily without sacrificing code simplicity, as in Haskell by adding par or pseq in appropriate places; see the sketch just below. I'd be interested in such examples too, but also in other, non-parallel ones.)
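To make that concrete, here is a minimal sketch of the par/pseq idiom, assuming GHC and the parallel package; the names fib and parFib and the cut-off value of 25 are mine, chosen only for illustration:

    import Control.Parallel (par, pseq)

    -- Plain Fibonacci, used here only to generate CPU work.
    fib :: Int -> Integer
    fib n | n < 2     = fromIntegral n
          | otherwise = fib (n - 1) + fib (n - 2)

    -- The same recursion with two annotations: spark the first branch
    -- in parallel, evaluate the second, then combine the results.
    parFib :: Int -> Integer
    parFib n
      | n < 25    = fib n                     -- too little work to spark
      | otherwise = x `par` (y `pseq` x + y)
      where
        x = parFib (n - 1)
        y = parFib (n - 2)

    main :: IO ()
    main = print (parFib 35)

Compiled with ghc -O2 -threaded and run with +RTS -N, this uses all cores; the algorithm itself is untouched, only the two annotations were added.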

Exceptional answered 16/1, 2013 at 11:57 Comment(17)
You might be interested in Neil Mitchell's paper on supercompilation.Inbred
Referential transparency and laziness optimize (from O(n log n) to O(n)) the function min = head . sort.Humfried
This is a job for a "sufficiently smart" compiler -- which also results in full employment for compiler writers.Splitlevel
As a general point, if the algorithm is identical, then you are doing a comparison of the runtime and the code generator - in that case, sometimes language semantics improve things. More interesting is when the language permits a different algorithm to be used...Splitlevel
@DonStewart Sure, examples where a functional language permits a different, better algorithm are also welcomed.Exceptional
This is better asked in theoretical CS, rather than SO, as you are wondering about fundamental limits of computational models.Splitlevel
Isn't @josejuan's example the perfect example? Rather than needing a separate algorithm to get the minimal element in O(n), in Haskell head . sort just works. It is a simple example, but impressive nonetheless.Tamberg
@PetrPudlák, your premise is "suppose we have an algorithm A". My example shows how a bad imperative algorithm (sort to take the first element) is sped up by a factor of log n in Haskell. I'm showing how Haskell optimizes a naive solution. You can never compare C++ vs Haskell (e.g.) compilers one to one, precisely because they use different strategies. You can compare the results of optimization, not the optimization itself.Humfried
A great question and people vote to close it. What the hell is wrong with this world?!Dramaturge
@jberryman It's stated pretty clearly that the comparison should be between the imperative and functional families, i.e. not between C++ and Java, but between either of them and Haskell. The open choice of the imperative language is evidently only given for the convenience of the answerer, since in the scope of the question it doesn't matter much - there's nothing unconstructive about that. The algorithm choice is left open because it makes up part of the question, which I see as boiling down to: "What are the principal pure algorithms which make the functional language optimizer shine?"Dramaturge
If the algorithms are different, then you're not testing the same thing...Splitlevel
@NikitaVolkov as Don suggests, you're not going to be comparing the same algorithm. For instance, quicksort is an imperative algorithm for sorting "in-place". You could implement that in Haskell, but you wouldn't be learning anything about referential transparency and efficiency. Or you could implement the usual sort-of-analogous functional flavor of quicksort and compare the two, but then you'd learn nothing because the algorithms really are completely different semantically.Inveigle
I guess my blanket answer to this question is that I strongly suspect almost any algorithm with functional semantics will run faster when implemented in Haskell than in any imperative language capable of accurately expressing the algorithm.Inveigle
@PetrPudlák "which can be concisely described by O(n) imperative algorithm" - if I'm allowed to arbitrarily pick the imperative algorithm I can always argue it's silly to pick an algorithm that performs worse than the compiled functional code. It's easy to argue that a functional solution is an ugly hack that's unidiomatic, whereas you could say that ugly hacks are always idiomatic in C. Answers to your question are necessarily very slippery eels indeed.Huda
@Humfried I reconsidered the idea and changed my mind; your example is a good one. A variant that gets the 10 least elements of a list (take 10 . sort) is even better. Perhaps this could be taken even further, to implement lazy quicksort and use it as an efficient selection algorithm (see the sketch below). Maybe you could expand your comment into an answer so that people can vote for it.Exceptional
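A rough sketch of that lazy-quicksort-as-selection idea; qsort, smallest, and least10 are names I made up, and the complexity remarks are expected-case observations, not guarantees:

    -- A textbook lazy quicksort; not in-place, shown only for illustration.
    qsort :: Ord a => [a] -> [a]
    qsort []     = []
    qsort (p:xs) = qsort [x | x <- xs, x < p]
                ++ [p]
                ++ qsort [x | x <- xs, x >= p]

    -- Laziness forces only as much of the sort as the caller demands,
    -- so these behave like selection algorithms:
    smallest :: Ord a => [a] -> a
    smallest = head . qsort      -- expected O(n), not O(n log n)

    least10 :: Ord a => [a] -> [a]
    least10 = take 10 . qsort    -- expected roughly O(n) for a fixed prefix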
Over on this question seeking a C equivalent of Haskell's lightweight threads, the only answer so far carefully explains how it's not possible to make a thread library any faster than OS threads - and Haskell's threads are definitely lighter and more scalable. If the answerer is right, that would be a good answer to this question too, but I think the jury's still out.Huda
@Huda Thanks for the link. I reconsidered the question and its comments a bit and I updated the question accordingly. I think my view was a bit too restricted and I'm open to a broader spectrum of answers now.Exceptional

There are cases where the same algorithm will optimize better in a pure context. Specifically, stream fusion allows an algorithm consisting of a sequence of loops of widely varying form (maps, filters, folds, unfolds) to be composed into a single loop.

The equivalent optimization in a conventional imperative setting, with mutable data in loops, would require a full effect analysis, which no mainstream compiler performs.

So at least for the class of algorithms that are implemented as pipelines of ana- and catamorphisms on sequences, you can guarantee optimization results that are not possible in an imperative setting.
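A minimal sketch of what such a pipeline looks like, assuming GHC and the vector library; the function pipeline is a made-up example, not from any real benchmark:

    import qualified Data.Vector.Unboxed as U

    -- An unfold (enumFromTo), two maps, a filter, and a fold (sum).
    -- Stream fusion rewrites this composition into a single loop,
    -- so no intermediate vectors are allocated.
    pipeline :: Int -> Int
    pipeline n = U.sum
               . U.map (* 2)
               . U.filter odd
               . U.map (+ 1)
               $ U.enumFromTo 1 n

    main :: IO ()
    main = print (pipeline 1000000)

Each stage is written as if it produced a fresh vector, yet after fusion the compiled code is one loop over the index range.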

Splitlevel answered 16/1, 2013 at 14:30 Comment(6)
What about memoization? I believe there are plenty of opportunities to utilize that in pure code.Dramaturge
@NikitaVolkov: Memoization is easy to add; the difficult part is deciding when it will actually help instead of just wasting lots of memory.Alethaalethea
@C.A.McCann You have to admit that "lots" is quite a questionable measure, especially with the amounts of memory available these days. Anyway, does it imply that there are no memoization-based optimizations, nor plans to implement any, in GHC? Just curious.Dramaturge
@NikitaVolkov, automatic memoization optimizations are hard to get right because you don't know where it is helpful to put them -- but Haskell programmers readily whip out a memoization library when one is needed, and then type memo somewhere in their code. Feels more like a "user-directed optimization" than coding to me (obviously you do have to understand the problem and Haskell's evaluation model to know where to put it though).Destefano
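As a concrete illustration of the previous comment, one common memoization idiom in Haskell is an ordinary lazy value that the runtime shares between calls. The names memoFib and table are mine; real memoization libraries (e.g. MemoTrie) use tries rather than lists for faster lookup:

    -- Memoization via lazy sharing: `table` is a pure value bound in a
    -- top-level constant, so each entry is computed at most once.
    memoFib :: Int -> Integer
    memoFib = (table !!)
      where
        table = map fib [0 ..]
        fib 0 = 0
        fib 1 = 1
        fib n = memoFib (n - 1) + memoFib (n - 2)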
This sorta begs the question of whether the two are the "same algorithm." The expressions look superficially similar, but they are certainly not doing anywhere close to the same thing to the data.Disproportionate
You can do similar stuff in C++ with template metaprogramming (see Eigen, Blitz++, std::valarray, etc), but it ain't pretty.Mind

A very recent paper, "Haskell beats C using generalised stream fusion" by Geoff Mainland, Simon Peyton Jones, Simon Marlow, and Roman Leshchinskiy (submitted to ICFP 2013), describes such an example. Abstract:

Stream fusion [6] is a powerful technique for automatically transforming high-level sequence-processing functions into efficient implementations. It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors. However, some operations, like vector append, still do not perform well within the standard stream fusion framework. Others, like SIMD computation using the SSE and AVX instructions available on modern x86 chips, do not seem to fit in the framework at all.

In this paper we introduce generalized stream fusion, which solves these issues. The key insight is to bundle together multiple stream representations, each tuned for a particular class of stream consumer. We also describe a stream representation suited for efficient computation with SSE instructions. Our ideas are implemented in modified versions of the GHC compiler and vector library. Benchmarks show that high-level Haskell code written using our compiler and libraries can produce code that is faster than both compiler- and hand-vectorized C.
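A dot product is the kind of computation the paper targets. As a sketch against the stock vector library (dotp is my name for it): with the standard library this already fuses into a single scalar loop, and the paper's modified library additionally emits SSE code for it.

    import qualified Data.Vector.Unboxed as U

    -- zipWith then sum: stream fusion removes the intermediate vector
    -- and compiles the pipeline to a single loop.
    dotp :: U.Vector Double -> U.Vector Double -> Double
    dotp v w = U.sum (U.zipWith (*) v w)

    main :: IO ()
    main = print (dotp (U.fromList [1, 2, 3]) (U.fromList [4, 5, 6]))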

Exceptional answered 11/4, 2013 at 9:52 Comment(0)

This is just a note, not an answer: GCC has a pure function attribute, suggesting it can take account of purity; the obvious reasons are remarked on in the GCC manual.

I would think that 'static single assignment' imposes a form of purity -- see the links at http://lambda-the-ultimate.org/node/2860 or the Wikipedia article.

Launch answered 28/1, 2013 at 19:06 Comment(0)

make and various other build systems perform better for large projects by assuming that the individual build steps are referentially transparent; as such, they only need to rerun steps whose inputs have changed.

For small to medium sized changes, this can be a lot faster than building from scratch.

Stinnett answered 30/1, 2013 at 21:15 Comment(1)
While this is a good example in general, I'd say it doesn't answer my question, "Examples where compiler-optimized functional code performs better than imperative code".Exceptional
