Examples where compiler-optimized functional code performs better than imperative code

One of the promises of side-effect free, referentially transparent functional programming is that such code can be extensively optimized. To quote Wikipedia:

Immutability of data can, in many cases, lead to execution efficiency, by allowing the compiler to make assumptions that are unsafe in an imperative language, thus increasing opportunities for inline expansion.

I'd like to see examples where a functional-language compiler outperforms an imperative one by producing better-optimized code.

Edit: I tried to give a specific scenario, but apparently it wasn't a good idea. So I'll try to explain it in a different way.

Programmers translate ideas (algorithms) into languages that machines can understand. At the same time, one of the most important aspects of this translation is that humans, too, can understand the resulting code. Unfortunately, in many cases there is a trade-off: concise, readable code suffers from slow performance and needs to be manually optimized. This is error-prone and time-consuming, and it makes the code less readable (sometimes totally unreadable).

The foundations of functional languages, such as immutability and referential transparency, allow compilers to perform extensive optimizations, which could replace manual optimization of code and free programmers from this trade-off. I'm looking for examples of ideas (algorithms) and their implementations, such that:

  1. the (functional) implementation is close to the original idea and is easy to understand,
  2. it is extensively optimized by the compiler of the language, and
  3. it is hard (or impossible) to write similarly efficient code in an imperative language without manual optimizations that reduce its conciseness and readability.

I apologize if it is a bit vague, but I hope the idea is clear. I don't want to give unnecessary restrictions on the answers. I'm open to suggestions if someone knows how to express it better.

My interest isn't just theoretical. I'd like to use such examples (among other things) to motivate students to get interested in functional programming.

At first, I wasn't satisfied by a few examples suggested in the comments. On second thought, I take my objections back: those are good examples. Please feel free to expand them into full answers so that people can comment on and vote for them.


(One class of such examples will most likely be parallelized code, which can take advantage of multiple CPU cores. Often in functional languages this can be done easily without sacrificing code simplicity, as in Haskell by adding par or pseq in appropriate places; see the sketch just below. I'd be interested in such examples too, but also in other, non-parallel ones.)
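To make that concrete, here is a minimal sketch of the par/pseq idiom, assuming GHC and the parallel package; the names fib and parFib and the cut-off value of 25 are mine, chosen only for illustration:

    import Control.Parallel (par, pseq)

    -- Plain Fibonacci, used here only to generate CPU work.
    fib :: Int -> Integer
    fib n | n < 2     = fromIntegral n
          | otherwise = fib (n - 1) + fib (n - 2)

    -- The same recursion with two annotations: spark the first branch
    -- in parallel, evaluate the second, then combine the results.
    parFib :: Int -> Integer
    parFib n
      | n < 25    = fib n                     -- too little work to spark
      | otherwise = x `par` (y `pseq` x + y)
      where
        x = parFib (n - 1)
        y = parFib (n - 2)

    main :: IO ()
    main = print (parFib 35)

Compiled with ghc -O2 -threaded and run with +RTS -N, this uses all cores; the algorithm itself is untouched, only the two annotations were added.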

Exceptional answered 16/1, 2013 at 11:57 Comment(17)
You might be interested in Neil Mitchell's paper on supercompilation.Inbred
Referential transparency and laziness optimize (from O(n log n) to O(n)) the function min = head . sort.Humfried
This is a job for a "sufficiently smart" compiler -- which also results in full employment for compiler writers.Splitlevel
As a general point, if the algorithm is identical, then you are doing a comparison of the runtime and the code generator - in that case, sometimes language semantics improve things. More interesting is when the language permits a different algorithm to be used...Splitlevel
@DonStewart Sure, examples where a functional language permits a different, better algorithm are also welcomed.Exceptional
This is better asked in theoretical CS, rather than SO, as you are wondering about fundamental limits of computational models.Splitlevel
Isn't @josejuan's example the perfect example? Rather than needing a separate algorithm to get the minimal element in O(n), in Haskell head . sort just works. It is a simple example, but impressive nonetheless.Tamberg
@PetrPudlák, your premise is "suppose we have an algorithm A". My example shows how a bad imperative algorithm (sort to take the first element) is sped up by a factor of log n in Haskell. I'm showing how Haskell optimizes a naive solution. You can never compare C++ vs Haskell (e.g.) compilers one to one, precisely because they use different strategies. You can compare the results of optimization, not the optimization itself.Humfried
A great question and people vote to close it. What the hell is wrong with this world?!Dramaturge
@jberryman It's stated pretty clearly that the comparison should be between the imperative and functional families, i.e. not between C++ and Java, but between either of them and Haskell. The open choice of the imperative language is evidently only given for the convenience of the answerer, since in the scope of the question it doesn't matter much - there's nothing unconstructive about that. The algorithm choice is left open because it makes up part of the question, which I see as boiling down to: "What are the principal pure algorithms which make the functional language optimizer shine?"Dramaturge
If the algorithms are different, then you're not testing the same thing...Splitlevel
@NikitaVolkov as Don suggests, you're not going to be comparing the same algorithm. For instance, quicksort is an imperative algorithm for sorting "in-place". You could implement that in Haskell, but you wouldn't be learning anything about referential transparency and efficiency. Or you could implement the usual sort-of-analogous functional flavor of quicksort and compare the two, but then you'd learn nothing because the algorithms really are completely different semantically.Inveigle
I guess my blanket answer to this question is that I strongly suspect almost any algorithm with functional semantics will run faster when implemented in Haskell than in any imperative language capable of accurately expressing the algorithm.Inveigle
@PetrPudlák "which can be concisely described by O(n) imperative algorithm" - if I'm allowed to arbitrarily pick the imperative algorithm I can always argue it's silly to pick an algorithm that performs worse than the compiled functional code. It's easy to argue that a functional solution is an ugly hack that's unidiomatic, whereas you could say that ugly hacks are always idiomatic in C. Answers to your question are necessarily very slippery eels indeed.Huda
@Humfried I reconsidered the idea and changed my mind; your example is a good one. A variant that gets the 10 least elements of a list (take 10 . sort) is even better. Perhaps this could be taken even further, to implement lazy quicksort and use it as an efficient selection algorithm (see the sketch below). Maybe you could expand your comment into an answer so that people can vote for it.Exceptional
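A rough sketch of that lazy-quicksort-as-selection idea; qsort, smallest, and least10 are names I made up, and the complexity remarks are expected-case observations, not guarantees:

    -- A textbook lazy quicksort; not in-place, shown only for illustration.
    qsort :: Ord a => [a] -> [a]
    qsort []     = []
    qsort (p:xs) = qsort [x | x <- xs, x < p]
                ++ [p]
                ++ qsort [x | x <- xs, x >= p]

    -- Laziness forces only as much of the sort as the caller demands,
    -- so these behave like selection algorithms:
    smallest :: Ord a => [a] -> a
    smallest = head . qsort      -- expected O(n), not O(n log n)

    least10 :: Ord a => [a] -> [a]
    least10 = take 10 . qsort    -- expected roughly O(n) for a fixed prefix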
Over on this question seeking a C equivalent of Haskell's lightweight threads, the only answer so far carefully explains how it's not possible to make a thread library any faster than OS threads - and Haskell's threads are definitely lighter and more scalable. If the answerer is right, that would be a good answer to this question too, but I think the jury's still out.Huda
@Huda Thanks for the link. I reconsidered the question and its comments a bit and I updated the question accordingly. I think my view was a bit too restricted and I'm open to a broader spectrum of answers now.Exceptional

There are cases where the same algorithm will optimize better in a pure context. Specifically, stream fusion allows an algorithm consisting of a sequence of loops of widely varying form (maps, filters, folds, unfolds) to be composed into a single loop.

The equivalent optimization in a conventional imperative setting, with mutable data in loops, would require a full effect analysis, which no mainstream compiler performs.

So at least for the class of algorithms that are implemented as pipelines of ana- and catamorphisms on sequences, you can guarantee optimization results that are not possible in an imperative setting.
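A minimal sketch of what such a pipeline looks like, assuming GHC and the vector library; the function pipeline is a made-up example, not from any real benchmark:

    import qualified Data.Vector.Unboxed as U

    -- An unfold (enumFromTo), two maps, a filter, and a fold (sum).
    -- Stream fusion rewrites this composition into a single loop,
    -- so no intermediate vectors are allocated.
    pipeline :: Int -> Int
    pipeline n = U.sum
               . U.map (* 2)
               . U.filter odd
               . U.map (+ 1)
               $ U.enumFromTo 1 n

    main :: IO ()
    main = print (pipeline 1000000)

Each stage is written as if it produced a fresh vector, yet after fusion the compiled code is one loop over the index range.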

Splitlevel answered 16/1, 2013 at 14:30 Comment(6)
What about memoization? I believe there are plenty of opportunities to utilize that in pure code.Dramaturge
@NikitaVolkov: Memoization is easy to add; the difficult part is deciding when it will actually help instead of just wasting lots of memory.Alethaalethea
@C.A.McCann You have to admit that "lots" is quite a questionable measure, especially with the amounts of memory available these days. Anyway, does it imply that there are no memoization-based optimizations, nor plans to implement any, in GHC? Just curious.Dramaturge
@NikitaVolkov, automatic memoization optimizations are hard to get right because you don't know where it is helpful to put them -- but Haskell programmers readily whip out a memoization library when one is needed, and then type memo somewhere in their code. Feels more like a "user-directed optimization" than coding to me (obviously you do have to understand the problem and Haskell's evaluation model to know where to put it though).Destefano
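As a concrete illustration of the previous comment, one common memoization idiom in Haskell is an ordinary lazy value that the runtime shares between calls. The names memoFib and table are mine; real memoization libraries (e.g. MemoTrie) use tries rather than lists for faster lookup:

    -- Memoization via lazy sharing: `table` is a pure value bound in a
    -- top-level constant, so each entry is computed at most once.
    memoFib :: Int -> Integer
    memoFib = (table !!)
      where
        table = map fib [0 ..]
        fib 0 = 0
        fib 1 = 1
        fib n = memoFib (n - 1) + memoFib (n - 2)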
This sorta begs the question of whether the two are the "same algorithm." The expressions look superficially similar, but they are certainly not doing anywhere close to the same thing to the data.Disproportionate
You can do similar stuff in C++ with template metaprogramming (see Eigen, Blitz++, std::valarray, etc), but it ain't pretty.Mind

A very recent paper, "Haskell beats C using generalised stream fusion" by Geoff Mainland, Simon Peyton Jones, Simon Marlow, and Roman Leshchinskiy (submitted to ICFP 2013), describes such an example. Abstract:

Stream fusion [6] is a powerful technique for automatically transforming high-level sequence-processing functions into efficient implementations. It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors. However, some operations, like vector append, still do not perform well within the standard stream fusion framework. Others, like SIMD computation using the SSE and AVX instructions available on modern x86 chips, do not seem to fit in the framework at all.

In this paper we introduce generalized stream fusion, which solves these issues. The key insight is to bundle together multiple stream representations, each tuned for a particular class of stream consumer. We also describe a stream representation suited for efficient computation with SSE instructions. Our ideas are implemented in modified versions of the GHC compiler and vector library. Benchmarks show that high-level Haskell code written using our compiler and libraries can produce code that is faster than both compiler- and hand-vectorized C.
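A dot product is the kind of computation the paper targets. As a sketch against the stock vector library (dotp is my name for it): with the standard library this already fuses into a single scalar loop, and the paper's modified library additionally emits SSE code for it.

    import qualified Data.Vector.Unboxed as U

    -- zipWith then sum: stream fusion removes the intermediate vector
    -- and compiles the pipeline to a single loop.
    dotp :: U.Vector Double -> U.Vector Double -> Double
    dotp v w = U.sum (U.zipWith (*) v w)

    main :: IO ()
    main = print (dotp (U.fromList [1, 2, 3]) (U.fromList [4, 5, 6]))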

Exceptional answered 11/4, 2013 at 9:52 Comment(0)

This is just a note, not an answer: GCC has a pure function attribute, suggesting it can take account of purity; the obvious reasons are remarked on in the GCC manual.

I would think that 'static single assignment' imposes a form of purity -- see the links at http://lambda-the-ultimate.org/node/2860 or the Wikipedia article.

Launch answered 28/1, 2013 at 19:06 Comment(0)

make and various other build systems perform better for large projects by assuming that the individual build steps are referentially transparent; as such, they only need to rerun steps whose inputs have changed.

For small to medium sized changes, this can be a lot faster than building from scratch.

Stinnett answered 30/1, 2013 at 21:15 Comment(1)
While this is a good example in general, I'd say it doesn't answer my question, "Examples where compiler-optimized functional code performs better than imperative code".Exceptional
