Functions that look pure to callers but internally use mutation

I just got my copy of Expert F# 2.0 and came across this statement, which somewhat surprised me:

For example, when necessary, you can use side effects on private data structures allocated at the start of an algorithm and then discard these data structures before returning a result; the overall result is then effectively a side-effect-free function. One example of separation from the F# library is the library's implementation of List.map, which uses mutation internally; the writes occur on an internal, separated data structure that no other code can access.

Now, obviously the advantage of this approach is performance. I'm just curious if there are any disadvantages. Do any of the pitfalls that can come with side effects apply here? Is parallelizability affected?

In other words, if performance were set aside, would it be preferable to implement List.map in a pure way?

(Obviously this deals with F# in particular, but I'm also curious about general philosophy)

Goad answered 13/9, 2010 at 3:53 Comment(1)
Maybe change the title to something more direct, like "Are pure functions that use mutable state internally harmful?"Alvita

I think nearly every disadvantage of side-effects is tied to "interaction with other portions of the program". Side-effects themselves aren't bad (as @Gabe says, even a pure functional program is constantly mutating RAM), it's the common-side-consequences of effects (non-local interactions) which cause problems (with debugging/performance/understandability/etc.). So effects on purely local state (e.g. on a local variable that does not escape) are fine.

(The only detriment I can think of is that, when a human sees such a local mutable, they have to reason about whether it can escape. In F#, local mutables can never escape (closures cannot capture mutables), so the only potential "mental tax" comes from reasoning about mutable reference types.)

Summary: it's fine to use effects, so long as it's simple to convince one's self that the effects only happen on non-escaping locals. (It's also ok to use effects in other cases, but I'm ignoring those other cases, since on this question-thread we are the enlightened functional programmers trying to eschew effects whenever it's reasonable. :) )

(If you want to go very deep, local effects, like those in the implementation of F#'s List.map, are not only a non-hindrance to parallelizability, but actually a benefit, from the point of view that the more-efficient implementation allocates less, and thus is less of a strain on the shared resource of the GC.)
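The pattern described above can be sketched in Python (used here purely for illustration; `pure_map` is a hypothetical stand-in for the F# library's List.map): all the writes land on a list the function allocated itself, and no reference is retained after return, so callers observe a pure function.

```python
def pure_map(f, xs):
    """Behaves like a pure map: same inputs always give the same output."""
    out = []              # mutable, but freshly allocated and private to this call
    for x in xs:
        out.append(f(x))  # the writes happen on the private structure
    return out            # after this point, no one mutates it again

squares = pure_map(lambda x: x * x, [1, 2, 3])
# squares == [1, 4, 9], and repeated calls with the same inputs agree
```

Because `out` never escapes before the function finishes writing to it, no other code can ever witness an intermediate state, which is exactly the "non-escaping locals" condition in the summary.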

Occultism answered 13/9, 2010 at 5:50 Comment(7)
re: local effects can be more efficient. FUZxxl points out below that purely functional code can be more easily optimised by the compiler. That effect may balance, or even outweigh, the gains of the hand-optimised mutable version.Bergamot
Optimizing compilers are a myth (except perhaps in Haskell). Ok, that statement is too strong, but it is exceedingly rare for a compiler of an impure language to do a better job "optimizing" pure code than for a human to hand-optimize it with impure code. Compilers are simply not that good.Occultism
@Brian: Given that the underlying hardware is "impure", I think it's safe to say that a sufficiently smart hand-optimizing human writing impure code will beat any compiler ever. Haskell compilers are amazingly clever but there's a reason why things like the ST monad exist, and all of GHC's deep magic mostly just gets you similar performance to competently-written (not optimized) impure code. Without the extensive static guarantees provided by Haskell, it's crazy to expect other languages to fare any better (though I've heard OCaml can sometimes get impressive results).Oversell
You're a big fan of parenthetical statements, aren't you? :)Soldiery
That's coz english doesn't have #light :)Bergamot
@camccann: OCaml gets great performance through a very different route though: by having a simple and predictable cost model and a language design that tends to make short code fast. That is basically the exact opposite of Haskell.Unction
"Optimizing compilers are a myth"? C'mon. You don't want to search through C++ code for every function that is used only once and mark it inline, especially if it's big. But with link-time optimization, a compiler is smart enough to see that the call-and-return is unnecessary and inline it. It can also discover that a small loop executes only once or twice and unroll it, which you wouldn't want to do explicitly in code. A compiler may also perform optimizations depending on which CPU instructions are available, and even optimize for CPU cache size (the thing Gentoo people love).Redmer

You might be interested in Simon Peyton Jones's "Lazy Functional State Threads". I've only ever made it through the first few pages, which are very clear (I'm sure the rest is very clear as well).

The important point is that when you use Control.Monad.ST to do this kind of thing in Haskell, the type system itself enforces the encapsulation. In Scala (and probably F#) the approach is more "just trust us that we're not doing anything sneaky over here with this ListBuffer in your map".
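To see what the "just trust us" approach is guarding against, here is a Python sketch (hypothetical names; Python is used only for illustration, since it too relies on discipline rather than the type system) contrasting a properly encapsulated buffer with one that leaks:

```python
def safe_map(f, xs):
    buf = []                 # private buffer; the discipline is upheld
    for x in xs:
        buf.append(f(x))
    return buf

_shared_buf = []             # module-level mutable state

def leaky_map(f, xs):
    _shared_buf.clear()      # reuses a shared buffer: looks like an
    for x in xs:             # optimization, but it silently invalidates
        _shared_buf.append(f(x))  # the results of earlier calls
    return _shared_buf

a = leaky_map(str, [1, 2])
b = leaky_map(str, [3])
# a is b: the "result" of the first call has changed under the caller
```

Haskell's ST makes the second version a type error; in Scala, F#, or Python, only code review catches it.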

Furnishings answered 13/9, 2010 at 4:47 Comment(5)
In practice, uncontrolled side effects are rife in real Haskell code though.Unction
@JonHarrop, can you provide an example?Acanthocephalan
@sebleblanc: Sure. I studied several pieces of open source Haskell code such as the game Frag and found they had resorted to unsafePerformIO a lot. I asked the authors why and they all said "performance". I have since heard that the underlying problems have been addressed and this is no longer needed so it may no longer be the case.Unction
For the record, I could not find a single occurrence of unsafePerformIO in all published versions of Frag.Acanthocephalan
Again, can you back up your claims?Acanthocephalan

If a function uses local, private (to the function) mutable data structures, parallelization is not affected. So if the map function internally creates an array of the size of the list and iterates over its elements filling in the array, you could still run map 100 times concurrently on the same list and not worry because each instance of map will have its own private array. Since your code cannot see the contents of the array until it has been populated, it is effectively pure (remember, at some level your computer has to actually mutate the state of RAM).
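That scenario can be sketched in Python (illustrative only; `array_map` is a hypothetical name): several threads call the same map implementation on the same input, and because each call allocates its own array, they cannot interfere.

```python
import threading

def array_map(f, xs):
    out = [None] * len(xs)      # private array, sized up front
    for i, x in enumerate(xs):
        out[i] = f(x)           # in-place writes, invisible to callers
    return out

data = list(range(1000))
results = []
lock = threading.Lock()         # guards only the results list, not array_map

def worker():
    r = array_map(lambda x: x + 1, data)
    with lock:
        results.append(r)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# every concurrent call produced the same, correct result
assert all(r == [x + 1 for x in data] for r in results)
```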

On the other hand, if a function uses global mutable data structures, parallelization could be affected. For example, let's say you have a Memoize function. The whole point of it is to maintain some mutable state (it is "global" in the sense that it is not local to a single call, yet still "private" in the sense that it is not accessible outside the function) so that it doesn't have to run a function multiple times with the same arguments, yet it is still pure because the same inputs always produce the same outputs. If your cache data structure is thread-safe (like ConcurrentDictionary) then you can still run your function in parallel with itself. If not, then you could argue that the function is not pure, because it has side effects that are observable when run concurrently.
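A minimal Python sketch of such a Memoize (hypothetical code; an ordinary dict guarded by a lock stands in for ConcurrentDictionary):

```python
import threading

def memoize(f):
    cache = {}                       # shared across calls: "global" to f,
    lock = threading.Lock()          # but invisible outside it
    def wrapper(x):
        with lock:
            if x in cache:
                return cache[x]
        result = f(x)                # compute outside the lock
        with lock:
            cache[x] = result        # same input -> same output, so a
        return result                # duplicated computation is harmless
    return wrapper

calls = []                           # records real invocations, for demonstration

@memoize
def slow_square(x):
    calls.append(x)
    return x * x

assert slow_square(4) == 16
assert slow_square(4) == 16
assert calls == [4]                  # the second call hit the cache
```

Callers see a pure function; without the lock, concurrent callers could observe the internal effects (e.g. a half-populated cache in a non-thread-safe dict).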

I should add that it is a common technique in F# to start off with a purely functional routine, then optimize it by taking advantage of mutable state (e.g. caching, explicit looping) when profiling shows that it's too slow.

Cretonne answered 13/9, 2010 at 4:48 Comment(1)
Actually, that is a common misconception. Impure guts tend to improve scalability because in-place mutation means better space reuse and, therefore, less allocation and a better memory footprint which mean less contention for the garbage collector and main memory, respectively. Less contention for shared resources means better scalability.Unction

The same approach can be found in Clojure. Clojure's immutable data structures (lists, maps, and vectors) have "transient" counterparts which are mutable. The Clojure reference on transients urges you to use them only in code that cannot be seen by "any other code".

There are guards against leaking transients in client code:

  • The usual functions that work on the immutable data structures don't work on transients; calling them will throw an exception.

  • The transients are bound to the thread they are created in. Modifying them from any other thread will throw an exception.

The clojure.core code itself uses a lot of transients behind the scenes.

The main benefit of using transients is the massive speed-up they provide.

So the tightly controlled use of mutable state seems to be OK in the functional languages.
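Python has no transients, but the shape of the pattern can be sketched (hypothetical `build_vector`, for illustration): batch the updates on a private mutable structure, then freeze the result before any other code can see it.

```python
def build_vector(items):
    transient = []              # the mutable counterpart, private to this call
    for item in items:
        transient.append(item)  # cheap in-place updates
    return tuple(transient)     # freeze into an immutable value; the
                                # mutable list never escapes this function

v = build_vector(range(5))
# v == (0, 1, 2, 3, 4) and cannot be modified by callers
```

Clojure's transient/persistent! pair does the same thing, with the added runtime guards listed above.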

Viosterol answered 13/9, 2010 at 5:24 Comment(0)

It won't affect whether the function can be run in parallel with other functions. It will affect whether the function's internals can be made parallel though - but that's unlikely to be an issue for most small functions (such as map), targeting a PC.

I've noticed that some good F# programmers (on the web and in books) seem to be very relaxed about using imperative techniques for loops. They seem to prefer a simple loop, with mutable loop variables, to a complex recursive function.
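For instance (a Python sketch, since the point is language-independent), both versions below are pure to callers; the loop simply mutates a local accumulator, which many find easier to read than the recursion.

```python
def sum_loop(xs):
    total = 0               # mutable accumulator, local to this call
    for x in xs:
        total += x
    return total

def sum_rec(xs):
    if not xs:
        return 0
    return xs[0] + sum_rec(xs[1:])   # the recursive alternative

assert sum_loop([1, 2, 3, 4]) == sum_rec([1, 2, 3, 4]) == 10
```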

Bergamot answered 13/9, 2010 at 4:27 Comment(0)

One issue can be that a good functional compiler is built to optimize "functional" code well, but if you use mutable state, the compiler may not optimize your code as well as it otherwise would. In the worst case, this leads to a less efficient algorithm than the immutable variant.

The other issue I can think of is laziness: a mutable data structure is usually not lazy, so a mutable function may force unnecessary evaluation of its arguments.

Alvita answered 13/9, 2010 at 5:47 Comment(0)

I would answer this with a question: "are you writing the function, or using the function?"

There are 2 aspects to functions, users and developers.

As a user, one doesn't care about the internal structure of a function at all. It could be coded in byte code and use hard side effects internally from now until judgement day, so long as it matches the contract of data in -> data out that one expects. A function is a black box or an oracle; its internal structure is irrelevant (assuming it doesn't do anything stupid and external).

As a developer of a function, the internal structure matters a lot. Immutability, const correctness and avoiding side effects all help develop and maintain a function, and expand the function into the parallel domain.

Many people develop a function that they then use, so both of these aspects apply.

What the advantages are of immutability vs. mutable structures is a different question.

Window answered 13/9, 2010 at 19:40 Comment(1)
But this was exactly the question.Alvita
