Efficiency of purely functional programming

Does anyone know what is the worst possible asymptotic slowdown that can happen when programming purely functionally as opposed to imperatively (i.e. allowing side-effects)?

Clarification from comment by itowlson: is there any problem for which the best known non-destructive algorithm is asymptotically worse than the best known destructive algorithm, and if so by how much?

Expansion answered 2/1, 2010 at 3:2 Comment(7)
The same as when programming imperatively, whatever that is.Dialectal
@jldupont: To return the result of a computation of course. Many side effect free programs exist. They just can't do much other than compute on their input. But that's still useful.Slowdown
I can make it as bad as you like, by writing my functional code badly! *grin* I think what you're asking is "is there any problem for which the best known non-destructive algorithm is asymptotically worse than the best known destructive algorithm, and if so by how much?"... is that right?Kakapo
could you give an example of the type of slowdown you are interested in. your question is a bit vague.Fulk
@Kakapo - yes, that is what I meant. I agree it was badly phrased.Expansion
A user deleted his answer, but he claimed that the functional version of the 8-queens problem ran in over a minute for n = 13. He admitted it wasn't "written very well", so I decided to write my own version of the 8 queens in F#: pastebin.com/ffa8d4c4 . Needless to say, my purely functional program computes n = 20 in just over a second.Previdi
Some good comments on reddit: reddit.com/r/programming/comments/c76b8/…Grapefruit

According to Pippenger [1996], when comparing a Lisp system that is purely functional (and has strict evaluation semantics, not lazy) to one that can mutate data, an algorithm written for the impure Lisp that runs in O(n) can be translated to an algorithm in the pure Lisp that runs in O(n log n) time (based on work by Ben-Amram and Galil [1992] about simulating random access memory using only pointers). Pippenger also establishes that there are algorithms for which that is the best you can do; there are problems which are O(n) in the impure system which are Ω(n log n) in the pure system.

There are a few caveats to be made about this paper. The most significant is that it does not address lazy functional languages, such as Haskell. Bird, Jones and De Moor [1997] demonstrate that the problem constructed by Pippenger can be solved in a lazy functional language in O(n) time, but they do not establish (and as far as I know, no one has) whether or not a lazy functional language can solve all problems in the same asymptotic running time as a language with mutation.

The problem that Pippenger constructs to require Ω(n log n) is specifically designed to achieve this result, and is not necessarily representative of practical, real-world problems. There are a few restrictions on the problem that are a bit unexpected, but necessary for the proof to work; in particular, the problem requires that results are computed on-line, without being able to access future input, and that the input consists of a sequence of atoms from an unbounded set of possible atoms, rather than a set of fixed size. And the paper only establishes (lower bound) results for an impure algorithm of linear running time; for problems that require a greater running time, it is possible that the extra O(log n) factor seen in the linear case can be "absorbed" by the extra operations needed by algorithms with greater running times. These clarifications and open questions are explored briefly by Ben-Amram [1996].

In practice, many algorithms can be implemented in a pure functional language at the same efficiency as in a language with mutable data structures. For a good reference on techniques to use for implementing purely functional data structures efficiently, see Chris Okasaki's "Purely Functional Data Structures" [Okasaki 1998] (which is an expanded version of his thesis [Okasaki 1996]).

Anyone who needs to implement algorithms on purely-functional data structures should read Okasaki. You can always get at worst an O(log n) slowdown per operation by simulating mutable memory with a balanced binary tree, but in many cases you can do considerably better than that, and Okasaki describes many useful techniques, from amortized techniques to real-time ones that do the amortized work incrementally. Purely functional data structures can be a bit difficult to work with and analyze, but they provide many benefits like referential transparency that are helpful in compiler optimization, in parallel and distributed computing, and in implementation of features like versioning, undo, and rollback.
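
To make the "simulate mutable memory with a balanced binary tree" idea concrete, here is a minimal Haskell sketch (my own illustration, not code from Okasaki or Pippenger): addresses are keys in a Data.Map, which is a balanced tree, so each read and write costs O(log n), and every write produces a new version while the old one stays valid.

    import qualified Data.Map as M

    -- A persistent "memory": addresses mapped to values by a balanced tree.
    -- Both operations are O(log n), versus O(1) for real mutable storage,
    -- which is exactly the logarithmic slowdown discussed above.
    type Memory a = M.Map Int a

    readMem :: Int -> Memory a -> Maybe a
    readMem = M.lookup

    writeMem :: Int -> a -> Memory a -> Memory a
    writeMem = M.insert          -- returns a new version; the old one persists

    -- Example: "destructively" incrementing a cell, functionally.
    increment :: Int -> Memory Int -> Memory Int
    increment addr mem = writeMem addr (maybe 1 (+ 1) (readMem addr mem)) mem

    main :: IO ()
    main = print (readMem 0 (increment 0 (increment 0 M.empty)))   -- Just 2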

Note also that all of this discusses only asymptotic running times. Many techniques for implementing purely functional data structures give you a certain amount of constant factor slowdown, due to extra bookkeeping necessary for them to work, and implementation details of the language in question. The benefits of purely functional data structures may outweigh these constant factor slowdowns, so you will generally need to make trade-offs based on the problem in question.

References

  • Ben-Amram, Amir and Galil, Zvi 1992. "On Pointers versus Addresses", Journal of the ACM 39(3), July 1992.
  • Ben-Amram, Amir 1996. "Notes on Pippenger's Comparison of Pure and Impure Lisp", unpublished manuscript, DIKU, University of Copenhagen.
  • Bird, Richard, Jones, Geraint and De Moor, Oege 1997. "More Haste, Less Speed: Lazy Versus Eager Evaluation", Journal of Functional Programming 7(5), September 1997.
  • Okasaki, Chris 1996. "Purely Functional Data Structures", PhD thesis, Carnegie Mellon University.
  • Okasaki, Chris 1998. "Purely Functional Data Structures", Cambridge University Press.
  • Pippenger, Nicholas 1996. "Pure Versus Impure Lisp", ACM Symposium on Principles of Programming Languages (POPL), January 1996.

Aquifer answered 2/1, 2010 at 4:11 Comment(15)
Pippenger is the undisputed authority on this question. But we should emphasize that his results are theoretical, not practical. When it comes to making functional data structures practical and efficient, you can't do better than Okasaki.Pushcart
Wow, thanks, Brian, this is very interesting to know! Since I struggled with Pippenger's explanation, may I ask you: are these "can't do better" algorithms pathological cases artificially constructed to show where there is an unbridgeable difference between the pure and impure systems, or are they representative of algorithms that are regularly encountered "in the wild"? Thanks!Kakapo
itowlson: I must admit that I did not read enough of Pippenger to answer your question; it was published in a peer reviewed journal, cited by Okasaki, and I read enough of it to determine that his claims are relevant to this question, but not enough to understand the proof. The immediate takeaway that I get for real-world consequences is that it is trivial to convert an O(n) impure algorithm into an O(n log n) pure one, by simply simulating modifiable memory using a balanced binary tree. There are problems that cannot do better than that; I don't know if they're purely theoretical.Aquifer
The Pippenger result makes two important assumptions that limit its scope: it considers "on-line" or "reactive" computations (not the usual model of a computation mapping finite inputs to a single output) and "symbolic" computations where inputs are sequences of atoms, which can be tested only for equality (i.e., the interpretation of the input is extremely primitive).Tungting
Very good answer; I would like to add that for purely functional languages there is no universally agreed upon model for computing complexity, while in the impure world the unit-cost RAM machine is relatively standard (so this makes comparing things more difficult). Also note that the upper bound of a Lg(N) difference in pure/impure can be intuitively explained very easily by looking at an implementation of arrays in a pure language (it costs lg(n) per operation (and you get history)).Gumshoe
The purely functional Clean (clean.cs.ru.nl) language allows destructive updates through uniqueness types. Can anyone comment on whether or not uniqueness types work around assumptions built into the results cited?Subtreasury
Never programmed in Clean, but I guess that uniqueness types are similar to the Haskell IO monad, which allows writing imperative code in a referentially transparent way, so they are not affected by these bounds. For instance, in Haskell you can write: var <- readLn; var2 <- readLn, and the two calls to readLn give different results, because of a different (implicit) state parameter. In Clean you can do something similar, and you could have an imperative array library like Haskell's. About being "referentially transparent", see (for Haskell) "Tackling the awkward squad..." by Simon Peyton Jones.Greenburg
It's important to note that while many things can be implemented efficiently in a purely functional way, some important things cannot. For example purely functional dictionaries are a couple of times slower than mutable ones (esp. hash tables). Purely functional arrays are a lot slower than mutable ones.Unclose
@Jules: what do you mean with "purely functional arrays"? Using purely functional arrays is clearly a bad idea - the question is whether there is a purely functional replacement which has good enough performance. Maybe that's what you were talking about, but I'm not sure. Both hashmaps and arrays (which share the same problem) can be replaced with more tree-like structures (trees of arrays, informally speaking); for instance, Scala's version of purely functional arrays (Vector) is slower than a mutable array (not hard to believe after staring a bit at the code), but I don't know how much.Greenburg
@Unclose Isn't that what I said at the end of my answer? "Many techniques for implementing purely functional data structures give you a certain amount of constant factor slowdown, due to extra bookkeeping necessary for them to work, and implementation details of the language in question." And the whole answer was about the fact that an asymptotic slowdown is unavoidable in certain cases. I'm not sure what you are trying to add.Aquifer
@Brian your answer clearly establishes that the slowdown is at most logarithmic, but it can be difficult to know how this theoretical fact affects practice. Jules adds a pragmatic remark that in a lot of real-world scenarios, simply replacing a mutable array by a persistent structure will kill the performance of your program. The logarithmic cost, and the higher constant factor, really matter in practice. I myself used to think that the logarithmic slowdown would be negligible or affordable even in performance-conscious situations, and have often found otherwise.Rebirth
(I do think the matter is much less clear for hash tables, however, especially since the work on Hash Array Mapped Tries by the Clojure people.)Rebirth
I think the real question is whether it's possible for a compiler to determine that the impure and pure approaches are equivalent and translate the latter to the former. Certainly it would be harder going in the other direction.Ringtail
Important point: Painstakingly translating a purely functional specification into a more complicated efficient purely functional implementation is of little benefit if you are going to eventually - either automatically or by hand - translate it into even more efficient impure code. Impurity is not such a big deal if you can keep it in a cage, e.g. by locking it up in an externally-side-effect-free function.Glorianna
@Robin All compilers of functional languages eventually generate impure code. (One of) the goals of (purely) functional languages is to allow programmers to avoid having to deal with concepts that are difficult to reason about and difficult to parallelize, such as state and mutation. Sure, you can do somewhat-functional code. But that doesn't actually avoid the problems of having to deal with mutation and state, even if you have a "functional interface" that encapsulates the non-functional code. Of course, there are good, somewhat composable ways of writing imperative code, but purity still has benefits.Jalap

There are indeed several algorithms and data structures for which no asymptotically efficient purely functional solution (i.e. one implementable in pure lambda calculus) is known, even with laziness.

  • The aforementioned union-find
  • Hash tables
  • Arrays
  • Some graph algorithms
  • ...

Note, however, that we assume that in "imperative" languages access to memory is O(1), whereas in theory that can't be true asymptotically (i.e. for unbounded problem sizes): access to memory within a huge dataset is always O(log n), and that can be emulated in a functional language.

Also, we must remember that all modern functional languages actually provide mutable data, and Haskell even provides it without sacrificing purity (via the ST monad).
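
As a small illustration (my own sketch using GHC's standard Control.Monad.ST and Data.STRef modules, not code from any of the cited works): the mutation is confined inside runST, so the enclosing function remains pure and referentially transparent.

    import Control.Monad.ST (runST)
    import Data.STRef (newSTRef, readSTRef, modifySTRef')

    -- A pure function that uses genuine in-place mutation internally.
    -- runST guarantees the mutable state cannot escape, so callers see
    -- an ordinary referentially transparent function.
    sumST :: [Int] -> Int
    sumST xs = runST $ do
      acc <- newSTRef 0
      mapM_ (\x -> modifySTRef' acc (+ x)) xs
      readSTRef acc

    main :: IO ()
    main = print (sumST [1 .. 10])   -- 55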

Letterhead answered 23/5, 2010 at 6:17 Comment(4)
If the dataset fits in physical memory, access to it is O(1) in that it is possible to find an absolute upper bound on the amount of time to read any item. If the dataset does not, you're talking about I/O and that will be the dominant factor by far, however the program is written.Nihhi
Well, of course I'm talking about O(log n) operations of access to external memory. However, in any case I was talking bs: external memory can also be O(1)-addressable...Letterhead
I think one of the biggest things which imperative programming gains compared with functional programming is the ability to hold references to many distinct aspects of one state, and generate a new state such that all those references point to the corresponding aspects of the new state. Using functional programming would require direct dereferencing operations to be replaced by lookup operations to find the appropriate aspect of a particular version of the current overall state.Tims
Even the pointer model (O(log n) memory access, loosely speaking) isn't physically realistic at extremely large scales. The speed of light limits how quickly different pieces of computing equipment can communicate with each other, while it is currently believed that the maximum amount of information that can be held in a given region is bounded by its surface area.Osculation

This article claims that the known purely functional implementations of the union-find algorithm all have worse asymptotic complexity than the one they publish, which has a purely functional interface but uses mutable data internally.
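
For contrast, here is a deliberately naive purely functional union-find (my own sketch, not the data structure from the article): the parent relation lives in a persistent Data.Map, so each step up a parent chain costs O(log n), and the near-constant amortized bounds of the classic imperative structure (with path compression and union by rank) are not achieved.

    import qualified Data.Map as M

    -- Naive persistent union-find: the parent relation is a balanced-tree map.
    -- No path compression or union by rank here; doing those persistently
    -- and efficiently is exactly what the article is concerned with.
    type UF = M.Map Int Int

    find :: Int -> UF -> Int
    find x uf = case M.lookup x uf of
      Just p | p /= x -> find p uf
      _               -> x

    union :: Int -> Int -> UF -> UF
    union x y uf
      | rx == ry  = uf
      | otherwise = M.insert rx ry uf
      where
        rx = find x uf
        ry = find y uf

    -- Every element starts as its own root.
    fresh :: Int -> UF
    fresh n = M.fromList [(i, i) | i <- [0 .. n - 1]]

    main :: IO ()
    main = do
      let uf = union 1 2 (union 0 1 (fresh 4))
      print (find 0 uf == find 2 uf)   -- True
      print (find 0 uf == find 3 uf)   -- False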

The fact that other answers claim that there can never be any difference, and that, for instance, the only "drawback" of purely functional code is that it can be parallelized, gives you an idea of the informedness/objectivity of the functional programming community on these matters.

EDIT:

Comments below point out that a biased discussion of the pros and cons of pure functional programming may not come from the “functional programming community”. Good point. Perhaps the advocates I see are just, to cite a comment, “illiterate”.

For instance, I think that this blog post is written by someone who could be said to be representative of the functional programming community, and since it's a list of “points for lazy evaluation”, it would be a good place to mention any drawback that lazy and purely functional programming might have. A good place would have been in place of the following (technically true, but biased to the point of not being funny) dismissal:

If a strict function has O(f(n)) complexity in a strict language then it has complexity O(f(n)) in a lazy language as well. Why worry? :)

Fiddlehead answered 2/1, 2010 at 3:17 Comment(0)

With a fixed upper bound on memory usage, there should be no difference.

Proof sketch: given a fixed upper bound on memory usage, one should be able to write a virtual machine that executes an imperative instruction set with the same asymptotic complexity as if you were actually executing on that machine. This is so because you can manage the mutable memory as a persistent data structure, giving O(log n) reads and writes, but with a fixed upper bound on memory usage you have a fixed amount of memory, causing these to decay to O(1). Thus the functional implementation can be the imperative version running in the functional implementation of the VM, and so they should both have the same asymptotic complexity.
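
A toy version of that construction (my own sketch, assuming a trivial two-instruction machine): the VM's memory is a persistent IntMap, so each load or store costs O(log m) in the memory size m, and once m is fixed in advance that factor is just a constant.

    import qualified Data.IntMap.Strict as IM

    -- A tiny "imperative" instruction set interpreted over persistent memory.
    data Instr
      = Store Int Int   -- Store addr value:  mem[addr] := value
      | AddTo Int Int   -- AddTo dst src:     mem[dst]  := mem[dst] + mem[src]

    type RAM = IM.IntMap Int

    -- Each load/store is O(log m) in the memory size m; with m bounded
    -- in advance, that factor is a constant.
    step :: RAM -> Instr -> RAM
    step ram (Store a v) = IM.insert a v ram
    step ram (AddTo d s) = IM.insert d (load d + load s) ram
      where load a = IM.findWithDefault 0 a ram

    run :: [Instr] -> RAM
    run = foldl step IM.empty

    main :: IO ()
    main = print (run [Store 0 2, Store 1 3, AddTo 0 1])
    -- fromList [(0,5),(1,3)]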

Z answered 2/1, 2010 at 4:13 Comment(2)
A fixed upper bound on memory usage isn't how people analyze these sorts of things; you assume an arbitrarily large, but finite memory. When implementing an algorithm, I'm interested in how it will scale from the simplest input up to any arbitrary input size. If you put a fixed upper bound on memory usage, why don't you also put a fixed upper bound on how long you'll allow the computation to take, and say that everything is O(1)?Aquifer
@Z Campbell: That is true. I'm just suggesting that if you wanted to, you could ignore the difference in the constant factor in many cases in practice. One would still need to be mindful of the difference when compromising between memory and time to make sure that using m times more memory decreases your runtime by at least a factor of log(m).Z
