There's currently a pull request by Jonathan S. to replace the implementation of Data.IntMap
with one explained in this README based on ideas from a blog post by Edward Kmett.
The basic concept Jonathan S. developed from is that an IntMap
is a binary tree that looks like this (I've made some slight changes to his development for the sake of consistency):
data IntMap0 a = Empty | NonEmpty (IntMapNE0 a)
data IntMapNE0 a =
Tip !Int a
| Bin { lo :: !Int
, hi :: !Int
, left :: !(IntMapNE0 a)
, right :: !(IntMapNE0 a) }
In this representation, each node has a field indicating the least and greatest key contained in the IntMapNE0
. Using just a little bit fiddling allows this to be used as a PATRICIA trie. Jonathan noted that this structure has almost twice as much range information as it needs. Following a left or right spine will produce all the same lo
or hi
bounds. So he cut those out by only including the bound not determined by the ancestors:
data IntMap1 a = Empty | NonEmpty { topLo :: !Int, child :: !(IntMapNE1 a) }
data IntMapNE1 a =
Tip a
| IntMapNE1 { bound :: !Int
, left :: !(IntMapNE1 a)
, right :: !(IntMapNE1 a)
Now each node has either a left bound or a right bound, but not both. A right child will have only a left bound, while a left child will have only a right bound.
Jonathan makes one further change, moving the values out of the leaves and into the internal nodes, which places them exactly where they are determined. He also uses phantom types to help track left and right. The final type (for now, anyway) is
data L
data R
newtype IntMap a = IntMap (IntMap_ L a) deriving (Eq)
data IntMap_ t a = NonEmpty !Int a !(Node t a) | Empty deriving (Eq)
data Node t a = Bin !Int a !(Node L a) !(Node R a) | Tip deriving (Eq, Show)
Certain aspects of this new implementation are quite attractive. Most importantly, many of the most-used operations are substantially faster. Less importantly, but very nicely, the bit fiddling involved is much easier to understand.
However, there is one serious pain point: passing the missing range information down through the tree. This isn't so bad for lookups, insertions, etc., but gets pretty seriously hairy in the union and intersection code. Is there some abstraction that would allow this to be done automatically?
A couple extremely vague thoughts:
Could the phantom types be used with a custom class to tie treatment directly to handedness?
The "missing piece" nature is somewhat reminiscent of some zippery situations. Might there be a way to use ideas from that realm?
I've started thinking about using an intermediate type of some sort to provide a symmetrical "view" of the structure, but I'm a bit stuck. I can fairly easily convert back and forth between the basic structure and the fancy one, but that conversion is recursive. I need a way to convert only partially, but I don't know nearly enough about fancily built types to get it done.