Purely functional data structures with copy-on-write?

Asked 4/9, 2009 at 21:19 Answered 10/5, 2016 at 16:8

python functional-programming immutability copy-on-write

I want to have the advantage of functional data structures (multiple versions of data that can share structure) but be able to modify it in an imperative style.

What I'm thinking about (and a possible use): a RPG game in which whole game history is stored (for example, to allow for travelling back in time). Using copy-on-write, I could simply clone the structure holding game state and modify it by introducing a new turn - but have access to the earlier turns (not necessarily all of them, maybe just selected snapshots of game state), without the penalty of having to copy everything every time.

Let's say foo is a map.

bar = foo.clone()

Nothing of foo's structure (for example, tree) gets copied yet. However, from now on bar is treated as a copy and no changes are allowed to propagate back to `foo'.

baz = bar[someKey]
baz.modifyInSomeWay()

Now

a new object gets created, that is a modified copy of baz.
bar is replaced with a new map, holding new baz (possibly retaining some of foo's structure).
foo is unaffected.

But if we then do...

baz.modifyAgain()

...baz can be just modified, because we have a recent version of it. bar doesn't need to be changed.

All this requires holding some version information about foo and bar, increasing it on foo.clone(), and passing it to baz somehow (by making it a proxy object?).

Also, any part of structure that has been cloned becomes a 'part of history' and can't be changed anymore, which could be enforced at run-time.

This resembles JavaScript's prototypes a bit, but I it's more as it allows for changes to propagate upwards. I think it would be something like a version control system.

Has this been done, and to what extent?
Is this a good idea? If not, is there a way to save it?
How could it be implemented? I was thinking about building it on top of some high-level GC-ed language, like Python.

Biome answered 4/9, 2009 at 21:19 Comment(1)

probably pyrsistent is what are you looking for – Marengo 24/5, 2017 at 12:29

Functional ("persistent") data structures are typically recursively built up out of immutable nodes (e.g. singly linked list where common suffixes are shared, search tree or heap where only the parts of the tree structure that are on the path from the root to the updated item need copying).

Anything where the entire set has to be copied for every modification is bad. In those cases, you'd tend to overlay small "diffs" that are checked (taking extra time) with recursion to previous versions. Every so often, when the diffs become too large, you could do a deep copy/rebuild (so the amortized cost is not as bad).

Persistent data structures can have significant overhead, but if you have very efficient allocation of small objects (JVM GC qualifies), they can be very practical - in the best case, equally as fast as the mutable equivalent, giving persistence only at the cost of the memory used - which can be much less than a full copy on every update.

Depending on your language, you'll probably find the syntax you want difficult to achieve without operator overloading for assignment. Lvalues (mutable references) in C++ definitely require non-persistent semantics.

Zeringue answered 4/9, 2009 at 21:36 Comment(0)

1. Has this been done, and to what extent?

Yes, see for example qt5 implicit sharing.

2. Is this a good idea? If not, is there a way to save it?

Yes, it is a good idea. One of the proposed alternatives is to use a fully immutable data structure (wrapped in a imperative interface), but the problem is that even if a object is the only one to point to a data, a modification operation will create a copy of the data (there is no in place update), this is inefficient. Using the copy on write approach, a copy is made only in the first modification, subsequent modifications alters the copied data in place (if another reference to the same data wasn't created, off course).

3. How could it be implemented? I was thinking about building it on top of some high-level GC-ed language, like Python.

One way is to use reference counting, like in qt (see a description here). To implement it will requires either assignment operator overloading or a explicit method call (like bar = foo.clone(), but it may be fragile, what happens if someone forget to call clone and just do bar = foo?), so the counting can be kept.

Other possibility in tho create a proxy object, like you said. See for example a pycow (a python implementation).

Whiny answered 10/5, 2016 at 16:8 Comment(0)

-1

This sounds awfully convoluted and error prone compared to just having a fully immutable data structure and then using a wrapper which holds a reference to it and exposes an imperative interface which works by updating the wrapped version.

e.g. in Scala

class ImmutableData{
   def doStuff(blah : Blah) : ImmutableData = implementation
}

class MutableData(var wrapped : ImmutableData){
  def doStuff(blah : Blah) : Unit = { wrapped = wrapped.doStuff(blah); } 
}

Sure it means you have to make two versions of the interface, but the semantics are a lot saner.

Conney answered 5/9, 2009 at 10:6 Comment(2)

Right, but this means updating everything else manually - I can't use such MutableData in any other immutable data structure. – Biome 5/9, 2009 at 10:46

I don't follow. If you want to use it immutably, you can get a snapshot version by just unwrapping it. – Conney 6/9, 2009 at 22:6

Recommended topics

Hot tags