Need a good overview for Succinct Data Structures [closed]
Asked Answered
S

0

6

Cross posted: Need a good overview for Succinct Data Structure algorithms

Since I knew about Succinct Data Structures I'm in a desperate need of a good overview of most recent developments in that area.

I have googled and read a lot of articles I could see in top of google results on requests from top of my head. I still suspect I have missed something important here.

Here are topics of particular interest for me:

  1. Succinct encoding of binary trees with efficient operations of getting parent, left/right child, number of elements in a subtree.

    The main question here is as following: all approaches I know of assume the tree nodes enumerated in breath-first order (like in the pioneer work in this area Jacobson, G. J (1988). Succinct static data structures), which does not seem appropriate for my task. I deal with huge binary trees given in depth-first layout and the depth-first node indices are keys to other node properties, so changing the tree layout has some cost for me which I'd like to minimize. Hence the interest in getting references to works considering other then BF tree layouts.

  2. Large variable-length items arrays in external memory. The arrays are immutable: I don't need to add/delete/edit the items. The only requirement is O(1) element access time and as low overhead as possible, better then straightforward offset and size approach. Here is some statistics I gathered about typical data for my task:

    typical number of items - hundreds of millions, up to tens of milliards;

    about 30% of items have length not more then 1 bit;

    40%-60% items have length less then 8 bits;

    only few percents of items have length between 32 and 255 bits (255 bits is the limit)

    average item length ~4 bit +/- 1 bit.

    any other distribution of item lengths is theoretically possible but all practically interesting cases have statistics close to the described above.

Links to articles of any complexity, tutorials of any obscurity, more or less documented C/C++ libraries, - anything what was useful for you in similar tasks or what looks like that by your educated guess - all such things are gratefully appreciated.

Update: I forgot to add to the question 1: binary trees I'm dealing with are immutable. I have no requirements for altering them, all i need is only traversing them in various ways always moving from node to children or to parent, so that the average cost of such operations was O(1).

Also, typical tree has milliards of nodes and should not be fully stored in RAM.

Update 2 Just if someone interested. I got a couple of good links in https://cstheory.stackexchange.com/a/11265/9276.

Sunward answered 30/4, 2012 at 17:33 Comment(2)
I get that you have a humongous amount of data, but there is a myriad of ways of dealing with that problem. So I'm just curious : why specifically succint data structures ?Maillot
First of all, because it's marvelously compact. 2bits per binary tree node is by far more superior then any other representation I knew or could think of. Additional O(N*loglog(N)/log(N)) bits of auxiliary space to deal with that representation is easily affordable. Because of compactness it looks very promising that for processing trees stored in external memory all performance drawbacks in using succinct representation are comparable with performance gain due to significantly more compact representation (unless there are serious locality problems like in my case with breath-first layout).Sunward

© 2022 - 2024 — McMap. All rights reserved.