Are some data structures more suitable for functional programming than others?
Asked Answered
M

9

23

In Real World Haskell, there is a section titled "Life without arrays or hash tables" where the authors suggest that list and trees are preferred in functional programming, whereas an array or a hash table might be used instead in an imperative program.

This makes sense, since it's much easier to reuse part of an (immutable) list or tree when creating a new one than to do so with an array.

So my questions are:

  • Are there really significantly different usage patterns for data structures between functional and imperative programming?
  • If so, is this a problem?
  • What if you really do need a hash table for some application? Do you simply swallow the extra expense incurred for modifications?
Mariellemariellen answered 1/3, 2009 at 2:51 Comment(2)
OCAML uses a mutable hashtable for determining scope of variables in the language itself. So, it depends on the functional language, and how comfortable you are with functional programming in that style.Gillette
FWIW, the Glasgow Haskell Compiler also makes extensive use of hash tables in several different parts of the compiler.Enigmatic
V
15

The book Purely Functional Data Structures covers your questions in depth, and includes a great mix of theory and implementations primarily in ML - the appendix also contains Haskell implementations so you should be able to follow along with a bit of extra page turning. It is a pretty good (though difficult in parts) read if you are really interested in a thorough answer to your questions. Having said that I think ephemient gave a superb short answer.

edit: Steven Huwig provided a link to the thesis that the book started as. While I haven't read through it the only big thing missing (judging from the table of contents) are the Haskell implementations.

Vlissingen answered 1/3, 2009 at 4:44 Comment(1)
This is indeed a great book, covering not only a number of interesting data structures but also a then-new approach to proving amortized bounds for persistent data structures. However, if my memory serves me right, the Haskell code in the appendix doesn't provide worst-case bounds for any of the data structures, as (IIRC) it just gives a rough translation from the ML without the necessary strictness annotations.Solicit
T
13
  • Yes. Typically tuples, lists, and partially-evaluated functions are very common data structures in functional programming languages. Mutable data structures, like arrays and (real) hash tables, are used much less because they don't fit in as well with Haskell. SML (which is also functional, but not lazy) can use arrays more naturally than Haskell, but lists are still more common because they map well to recursive algorithms.
  • I'm not sure how to answer this. A problem for who?
  • There exist implementations of associative arrays ("hash table" equivalent) which can continue to share most of their underlying structure even after different updates. I believe GHC's Data.Map does; also, Edison has quite a few lazy/functional-friendly data structures.
Tomb answered 1/3, 2009 at 4:2 Comment(2)
By "Is this a problem," I meant "Is this a problem for functional programmers?" How does SML manage to use Arrays more naturally? I thought that the immutability was the biggest problem for arrays, rather than the laziness, so how does SML make it work?Mariellemariellen
Mutability is a lot friendlier when your language is strict (SML) instead of lazy (Haskell).Tomb
S
12

Speaking as someone who's been doing OO for many years, and recently has been building a largeish project requiring a good deal of speed (a real-time automated options trading system) in Haskell:

  • Are there really significantly different usage patterns for data structures between functional and imperative programming?

If you're talking about Haskell, yes, very much so. A large part of this is due to purity, however; the differences are somewhat less in other functional languages where mutable data is more often used. That said, as others have pointed out, recursive code and structures are much more heavily used in all or nearly all functional languages.

  • If so, is this a problem?

It hasn't been one for me, aside from having to spend some time to learn the new way of working. In particular, performance has certainly not been an issue: the system I'm working on runs considerably faster than the previous Java implementation, for example.

  • What if you really do need a hash table for some application? Do you simply swallow the extra expense incurred for modifications?

Generally the problem is not that "you really do need a hash table," but that you need access to certain data within some given time constraints (which may well be, "as fast as possible on some given hardware."). For that, you look around and do what you need to do. If that includes introducing mutability, I don't see a big problem with that, and you can do it in Haskell, though it might not be as convenient as it is in other languages. But do keep in mind, if you have a problem of this nature, it's certainly not going to be as simple as, "use a generic hash table and you're done." Extremely high performance for particular functionality on a particular hardware platform invariably takes a lot of work, and usually more than a few tricks. Preferring one language implementation over another just because it has some particular thing that works better than it does in other languages is, in my opinion, a rather unsophisticated approach to software engineering that is not likely to consistently produce good results.

Substantialize answered 16/5, 2009 at 8:24 Comment(0)
P
6

Chris Okasaki's thesis, Purely Functional Data Structures, is available for free online. It covers many different strategies for immutable persistent data representation.

As far as really needing a hash table, consider that an O(lg n) lookup is only twenty times as slow as an O(1) lookup when you are searching a million elements.

Prodigy answered 1/3, 2009 at 4:52 Comment(3)
On the hash table: not necessarily. Yes, a tree is O(n log n), but thats not the same thing as 20 times slower just because (log n) is 20.Comanchean
Balanced tree lookup is O(lg n), not O(n lg n). But yes, there are a lot of other factors that can come into play. Typically O-notation is with reference to some class of operation, in this case comparison operations.Prodigy
It's 20.7 times "slower" for 1 billion elements. For a million, it's 13.8 "slower".Agar
C
4

Yes, the usage patterns are dramatically different, but no it's not a problem. If you want a hash table, you usually mean you want a finite map with string keys and fast access. Bentley and Sedgewick's ternary search trees are pureful functional, and at least in some cases, they outperform hash tables.

As mentioned above, Chris Okasaki's book on purely functional data structures is very good.

Cannell answered 2/3, 2009 at 23:26 Comment(0)
B
1

Functional programs tend to put more emphasis on recursion. This, in turn, suggests the use of recursive algorithms and recursive data structures. Both lists and trees are recursive structures (the "next" link on a list is another list, and the children of a tree node are both trees).

You may want to reconsider if you're looking at extra expense on an algorithm. Why does the hash table (which is O(1) for a non-recursive algorithm) incur an extra expense? What advantage are you gaining by using it, as opposed to a tree or list?

Bestead answered 1/3, 2009 at 3:49 Comment(0)
S
0

Yes, the primary difference is immutability of the data, which can include code (see higher order functions). See the Wikipedia page on Purely Functional for a list of the common data types and usages. Whether its a problem or not depends on how you look at it. There are many advantages to programming in a functional language if it fits the type of task you are working on. A hash table is a type of associative array, but isn't the one you want to use in a functional language because of the rehashing you'd have to do on insert and the poor performance without arrays. Instead, try the Haskell implementation of Data.Map for an associative array.

Scorbutic answered 1/3, 2009 at 3:49 Comment(0)
F
0
  • Are there really significantly different usage patterns for data structures between functional and imperative programming?

Big, gigantic, night and day — largely because side-effects are not tolerated in functional programming.

  • If so, is this a problem?

The problem is for the imperative paradigms that will be unable to retain efficiency as parallelization becomes more necessary — the only way out for these languages will be to get rid of side-effects but then they will become broken functional languages - but then, why should I bother with them when there are some pretty good, working functional languages. Also, the semantics of functional languages is easier to control hence, functional programs can be proved correct whereas their C++ counterparts cannot (yet, anyway). Hence, many formal verification tools are based on functional languages — for example, ACL2 is based on common lisp and Cryptol is based on Haskell. Since Formal Verification is the wave of the future functional languages can better integrate with those tools. In short, say bye-bye to C,C++ and such — good ridence! Someone should have taken a 30 ought 6 to them a long time ago.

  • What if you really do need a hash table for some application? Do you simply swallow the extra expense incurred for modifications?

The wave of the future is this: you write a functional programming with specifying a hash table - the language you use is cryptol. When you are done and have proven that your program works you press a button and out pops an efficient version that uses a hash table if it has been decided that is the best thing to use.

Foreandaft answered 1/3, 2009 at 15:40 Comment(1)
-1: "The problem is for the imperative paradigms that will be unable to retain efficiency as parallelization becomes more necessary — the only way out for these languages will be to get rid of side-effects but then they will become broken functional languages". There is overwhelming evidence to the contrary. Purely functional solutions are slow on one core and scale badly to multiple cores because they saturate memory bandwidth and stress the GC. When the experts optimize their purely functional code for parallelism they do so by introducing mutation.Enigmatic
F
0

As far as hash tables in functional languages go: Since ACL2 was mentioned above, I'll note that there is a "hash cons" library for ACL2 that provides the logical story that's basically association-list-semantics but has the performance of a hashtable (e.g., you can lookup a value in a table using hons-get). If you're interested, check out the topic "hons" in the ACL2 users Manuals.

Flutter answered 19/8, 2014 at 21:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.