Any nice tools for untying knots in Haskell?

Asked 20/1, 2012 at 17:3 Answered 21/1, 2012 at 23:17

I've a data structure with several different types of internal circular linking, making it infinite in the sense of say the cycle command. Are there any interesting modules for collapsing such structures into flat data structures that use indexing instead?

I'm interested in serializing the complete data structure, both via Read and Show as well as via Data.Serialize or similar.

There are obviously nice features of building a sequential index, but an index based upon hash values of memory addresses might work fine too.

Enforcement answered 20/1, 2012 at 17:3 Comment(0)

This isn't really possible; there is no way to detect, from within pure code, that the list repeat 1 is cyclic. Indeed, it could even not be cyclic, on sufficiently esoteric implementations of Haskell.

It is technically possible on common implementations of Haskell, however, using a technique called observable sharing,¹ but it's rather complicated, and unless you're a very experienced Haskell programmer, most of the time you don't really want to solve your problem with observable sharing.

Unless you have very special requirements that would make this a good idea, I would suggest instead representing the cyclic graph that your structure represents directly; for instance, using a graph library such as the standard Data.Graph or FGL.

If you do want to try observable sharing, though, I would suggest using the data-reify package (as described in Type-Safe Observable Sharing in Haskell), which only requires a simple type-class instance and has a safe, IO-based interface; it's essentially based on memory addresses (actually, StableNames), as you suggested.

One of the reasons you shouldn't use observable sharing unless you really have a need to is that it breaks referential transparency; for instance, it allows you to distinguish repeat 1 and 1 : repeat 1, and even let xs = 1 : xs in xs and let f x = x : f x in f 1, of these depending on compiler optimisations!

¹ The basic idea behind observable sharing is to expose the sharing done by standard Haskell implementations, which can be basically summarised as "duplicating values doesn't copy them"; so let xs = 1:xs in xs is a cyclic list, because the same value for the entirety of xs is used for its tail, rather than recomputing it each time, and (\x -> (x,x)) expensive (where expensive is some long-running computation that produces a large result) just results in two pointers to expensive, rather than copying the thunk (which means that, even if you force both fields, the computation will only be done once, and the result will only be stored once in memory).

Saretta answered 20/1, 2012 at 17:16 Comment(3)

Alright, I could simply assign each record an index when I create it, perhaps even using a random number generator if sequentially assignment proves difficult. I could then fold the data structure into only mentioning these indices by writing instance for Data.Traversable or something, well must make sure it stays finite of course, not sure if there should be a separate typeclass for finiteized traversals or soemthing. – Enforcement 20/1, 2012 at 19:3

@JeffBurdges: If you're already creating each structure element in a monad, then that should work fine; observable sharing is used when you want pure construction. I would recommend defining your own traversing function for something like this; it feels weird to overload the meaning of Traversable in that way. – Saretta 20/1, 2012 at 19:11

Or I could simply define a normal infinitary traversal instance but only use it when filling in an output mapping according to the indices, thus making it terminate. Thanks! – Enforcement 20/1, 2012 at 19:20

As mentioned in @Paul Johnson's answer, you can build a labelled version of your structure and use the labels as a means of referencing different parts of your structure.

But when you need it you can eliminate the labels leaving an optimised structure with all of its knots tied. You keep the original version for serialisation, but use the optimised version when your algorithms need it.

Here is some code:

import Debug.Trace
import Data.Map

data Tree a name = Leaf a | Fork (Tree a name) (Tree a name) | Name name deriving Show

instance Monad (Tree a) where
    return x = Name x
    Leaf a >>= f = Leaf a
    Fork l r >>= f = Fork (l >>= f) (r >>= f)
    Name a >>= f = f a

tie :: Map String (Tree Int String) -> Map String (Tree Int ())
tie tree = let result = fmap (>>= find) tree
               find name = trace ("Looking up " ++ name) $ flip (findWithDefault (Leaf 0)) result name
           in  result

treerefs :: Map String (Tree Int String)
treerefs = ("root"  `insert` Fork (Name "left") (Name "right")) $
           ("left"  `insert` Fork (Leaf 1)      (Name "root"))  $
           ("right" `insert` Fork (Name "root") (Leaf 2))       $
           empty

trees = tie treerefs
root = findWithDefault (Leaf 0) "root" trees

p (Leaf a) = "Leaf " ++ show a
p (Fork _ _) = "Fork"
p (Name n) = "Name " ++ show n

l (Fork a _) = a
r (Fork _ b) = b

test = p (r (r (r (l (r (l (l (r (l (r (r (l root))))))))))))

Notice how you can easily serialise treerefs but you can traverse root fast. I put the trace call in so you can see how often name lookup takes place. It really does close the loop (in GHC at least).

A tree with cyclic references

More generically, instead of Tree you can use any free monad generated by a functor and you can use the method here to speed up the knot tying. (My example above doesn't fully make use of the Monad instance but I wanted to make the connection with that paper.) Compare also with loeb.

Donatus answered 21/1, 2012 at 23:17 Comment(2)

Interesting. You've embedded the names into the tree, but you strip them later, cute. I'll think about such options more. – Enforcement 22/1, 2012 at 18:14

Ignore the connection with the paper I link to. The code as written above performs fine and isn't improved (and is in fact broken) by using it. – Donatus 24/1, 2012 at 22:8

If your data elements have unique identifiers (some papers use the term "names") then you can build a table of those identifiers and the ways that they relate to other identifiers. In OO implementations (and in the "observable sharing" described by ehird) this is done using the memory address of the object as an ID. Functional programming only has values, not objects with memory addresses, so you have to add the unique ID yourself when you build the structure. Then you can detect cycles by testing whether an ID belongs to the set of those already seen.

Boney answered 20/1, 2012 at 18:41 Comment(0)

You might be able to get away with using a simple wrapper type to signal a cycle end.

data CyclePart a = CycleMiddle a | CycleEnd a

unCyclePart :: CyclePart a -> a
unCyclePart (CycleMiddle x) = x
unCyclePart (CycleEnd x) = x

listToCycleParts :: [a] -> [CyclePart a]
listToCycleParts [] = error "empty list"
listToCycleParts [x] = [CycleEnd x]
listToCycleParts (x : xs) = CycleMiddle x : listToCylceParts xs

cycle' :: [a] -> [CyclePart a]
cylce' = cycle . listToCycleParts

uncycle :: [CyclePart a] -> [a]
uncycle (CycleMiddle x : xs) = x : uncycle xs
uncycle (CycleEnd x : _) = [x]
uncycle [] = error "purported cycle does not cycle"

Scratches answered 20/1, 2012 at 19:19 Comment(2)

What does this type offer over just keeping a finite list in a newtype wrapper that reminds us we're supposed to treat it as an infinite cycle? – Lindbom 20/1, 2012 at 19:22

I donno if application for this arise much, but my case involves various unpredictable cyclic behavior, making this almost impossible to implement. – Enforcement 20/1, 2012 at 19:33

Recommended topics

Hot tags