I have a collection of sets that I'd like to place in a trie.
Normal tries are made of strings of elements - that is, the order of the elements is important. Sets lack a defined order, so there's the possibility of greater compression.
For example, given the strings "abc"
, "bc"
, and "c"
, I'd create the trie:
(*,3) -> ('a',1) -> ('b',1) -> ('c',1)
-> ('b',1) -> ('c',1)
-> ('c',1)
But given the sets { 'a', 'b', 'c' }
, { 'b', 'c' }
, { 'c' }
, I could create the above trie, or any of these eleven:
(*,3) -> ('a',1) -> ('b',1) -> ('c',1)
-> ('c',2) -> ('a',1)
(*,3) -> ('a',1) -> ('c',1) -> ('b',1)
-> ('b',1) -> ('c',1)
-> ('c',1)
(*,3) -> ('a',1) -> ('c',1) -> ('b',1)
-> ('c',2) -> ('a',1)
(*,3) -> ('b',2) -> ('a',1) -> ('c',1)
-> ('c',1)
-> ('c',1)
(*,3) -> ('b',1) -> ('a',1) -> ('c',1)
-> ('c',2) -> ('b',1)
(*,3) -> ('b',2) -> ('c',2) -> ('a',1)
-> ('c',1)
(*,3) -> ('b',1) -> ('c',1) -> ('a',1)
-> ('c',2) -> ('b',1)
(*,3) -> ('c',2) -> ('a',1) -> ('b',1)
-> ('b',1) -> ('c',1)
(*,3) -> ('c',2) -> ('a',1) -> ('b',1)
-> ('b',1)
(*,3) -> ('c',2) -> ('b',1) -> ('a',1)
-> ('b',1) -> ('c',1)
(*,3) -> ('c',3) -> ('b',2) -> ('a',1)
So there's obviously room for compression (7 nodes to 4).
I suspect defining a local order at each node dependent on the relative frequency of its children would do it, but I'm not certain, and it might be overly expensive.
So before I hit the whiteboard, and start cracking away at my own compression algorithm, is there an existing one? How expensive is it? Is it a bulk process, or can it be done per-insert/delete?