Is there some way to reduce the pain of range tracking?
Asked Answered
I

1

19

There's currently a pull request by Jonathan S. to replace the implementation of Data.IntMap with one explained in this README based on ideas from a blog post by Edward Kmett.

The basic concept Jonathan S. developed from is that an IntMap is a binary tree that looks like this (I've made some slight changes to his development for the sake of consistency):

data IntMap0 a = Empty | NonEmpty (IntMapNE0 a) 
data IntMapNE0 a =
    Tip !Int a
  | Bin { lo :: !Int
        , hi :: !Int
        , left :: !(IntMapNE0 a)
        , right :: !(IntMapNE0 a) }

In this representation, each node has a field indicating the least and greatest key contained in the IntMapNE0. Using just a little bit fiddling allows this to be used as a PATRICIA trie. Jonathan noted that this structure has almost twice as much range information as it needs. Following a left or right spine will produce all the same lo or hi bounds. So he cut those out by only including the bound not determined by the ancestors:

data IntMap1 a = Empty | NonEmpty { topLo :: !Int, child :: !(IntMapNE1 a) }
data IntMapNE1 a =
    Tip a
  | IntMapNE1 { bound :: !Int
              , left :: !(IntMapNE1 a)
              , right :: !(IntMapNE1 a)

Now each node has either a left bound or a right bound, but not both. A right child will have only a left bound, while a left child will have only a right bound.

Jonathan makes one further change, moving the values out of the leaves and into the internal nodes, which places them exactly where they are determined. He also uses phantom types to help track left and right. The final type (for now, anyway) is

data L
data R
newtype IntMap a = IntMap (IntMap_ L a) deriving (Eq)
data IntMap_ t a = NonEmpty !Int a !(Node t a) | Empty deriving (Eq)
data Node t a = Bin !Int a !(Node L a) !(Node R a) | Tip deriving (Eq, Show)

Certain aspects of this new implementation are quite attractive. Most importantly, many of the most-used operations are substantially faster. Less importantly, but very nicely, the bit fiddling involved is much easier to understand.

However, there is one serious pain point: passing the missing range information down through the tree. This isn't so bad for lookups, insertions, etc., but gets pretty seriously hairy in the union and intersection code. Is there some abstraction that would allow this to be done automatically?

A couple extremely vague thoughts:

  1. Could the phantom types be used with a custom class to tie treatment directly to handedness?

  2. The "missing piece" nature is somewhat reminiscent of some zippery situations. Might there be a way to use ideas from that realm?


I've started thinking about using an intermediate type of some sort to provide a symmetrical "view" of the structure, but I'm a bit stuck. I can fairly easily convert back and forth between the basic structure and the fancy one, but that conversion is recursive. I need a way to convert only partially, but I don't know nearly enough about fancily built types to get it done.

Indisposed answered 15/9, 2016 at 21:6 Comment(3)
While this is interesting, I think you should expand on the question. Currently it's too reliant on external links, SO questions and answers should be mostly self contained.Coastland
@Cubic, I think I fixed it.Indisposed
Perhaps it would be helpful if you copy into the question the "easy" union/intersection code for the case where you've got repeated bounds info, so we can try to modify it to work on the harder case.Jealous
E
2

Is there some abstraction that would allow this to be done automatically?

You should be able to define a set of pattern synonyms that give you that. I’ll start from the second-to-last variant of your code, i.e.:

data IntMap1 a = Empty | NonEmpty { topLo :: !Int, child :: !(IntMapNE1 a) }
data IntMapNE1 a =
    Tip a
  | IntMapNE1 { bound :: !Int
              , left :: !(IntMapNE1 a)
              , right :: !(IntMapNE1 a)

We tuple such a value with the bound from the parent in an Either (indicating whether it is a low or a high bound).

viewLoHi (Left lo, IntMapNE1 hi left right)
    = Just (lo, hi, (Left lo, left), (Right hi, right)
viewLoHi (Right hi, IntMapNE1 lo left right)
    = Just (lo, hi, (Left lo, left), (Right hi, right)
viewLoHi _
    = Nothing

pattern Bin' lo hi left right <- (viewLoHi -> Just (lo, hi, left, right))

The top-level data type is different, so it needs its own pattern synonym

viewLoHi' (NonEmpty lo child) = viewLoHi (Left lo, child)
viewLoHi' Empty = Nothing

pattern NonEmpty' lo hi left right <- (viewLoHi' -> Just (lo, hi, left, right)

Using only NonEmpty' and Bin' as you traverse the tree, the bookkeeping should now be completely hidden. (Code not tested, so there will be typos here)

Escallop answered 29/3, 2017 at 21:17 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.