The parent index can be found in O(log* n) time with O(1) space.
Here log* n denotes the iterated logarithm: the number of times the logarithm function must be applied iteratively before the result becomes less than or equal to 1.
Actually this may be done even faster, in O(1) time, if we can afford O(n) space for a large lookup table storing the parent index of every node in the tree.
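For instance, here is a minimal sketch of this lookup-table variant (my code, with invented names fillTable and buildParentTable): it labels a perfect tree of the given height in post-order once and records every node's parent, so that each query afterwards is a single array access.

#include <cstdint>
#include <vector>

// Post-order-labels a perfect subtree of the given height, recording
// each node's parent in the table; returns the index of the subtree root.
uint32_t fillTable(std::vector<uint32_t>& parent, uint32_t height, uint32_t& next)
{
    if (height == 0)
        return next++;                    // a leaf takes the next index
    const uint32_t left = fillTable(parent, height - 1, next);
    const uint32_t right = fillTable(parent, height - 1, next);
    const uint32_t root = next++;         // the root is visited last
    parent[left] = parent[right] = root;
    return root;
}

std::vector<uint32_t> buildParentTable(uint32_t height)
{
    std::vector<uint32_t> parent(2u << height, 0); // 1-based indexes
    uint32_t next = 1;
    fillTable(parent, height, next);
    return parent;                        // parent[root] stays 0
}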
Below I'll sketch several algorithms that need no extra space and give the result in O(log n) worst-case time, O(log log n) expected time, O(log log n) worst-case time, and O(log* n) worst-case time. They are based on the following properties of post-order indexes in a perfect binary tree:
1. All indexes on the leftmost path of the tree are equal to 2^i - 1.
2. The index of the right child of any node on the leftmost path is equal to 2^i - 2. (For example, in a 15-node tree the leftmost path is labeled 1, 3, 7, 15, and the right children of those nodes are 2, 6, 14.)
3. Any node on the leftmost path and all nodes of its right sub-tree are labeled with indexes having the most significant non-zero bit in the same position i.
4. The left sub-tree of any node on the leftmost path contains 2^i - 1 nodes. (Which means that after subtracting 2^i - 1 we get a node that is positioned in a similar way relative to its parent and has the same depth, but is closer to the "special" nodes satisfying properties #1 and #2.)
Properties #1 and #2 give simple algorithms for getting the parent of some nodes of the tree: for indexes of the form 2^i - 1, multiply by 2 and add 1; for indexes of the form 2^i - 2, just add 1. For other nodes we can repeatedly apply property #4 to reach a node satisfying property #1 or #2 (by subtracting the sizes of several left sub-trees), find the parent of that node on the leftmost path, then add back all previously subtracted values. And property #3 allows us to quickly find the sizes of the sub-trees to subtract. So we have the following algorithm:
1. While ((x+1) & x) != 0 and ((x+2) & (x+1)) != 0, repeat step #2.
2. Clear the most significant non-zero bit and add 1. Accumulate the difference.
3. If ((x+1) & x) == 0, multiply by 2 and add 1; otherwise, if ((x+2) & (x+1)) == 0, add 1.
4. Add back all the differences accumulated on step #2.
For example, 12 (in binary 0b1100) is transformed on step #2 to 0b0101, then to 0b0010 (or 2 in decimal). The accumulated difference is 10. Step #3 adds 1, and step #4 adds back 10, so the result is 13.
Another example: 10 (in binary 0b1010) is transformed on step #2 to 0b0011 (or 3 in decimal). Step #3 doubles it (6), then adds 1 (7). Step #4 adds back the accumulated difference (7), so the result is 14.
The time complexity is O(log n), but only if all elementary operations are executed in O(1) time.
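Here is a minimal C++ sketch of this bit-by-bit algorithm (my transcription of steps #1-#4, not code from the original answer; parentBitByBit is an invented name, and a GCC/Clang builtin stands in for the assumed O(1) most-significant-bit operation):

#include <cstdint>

// Assumes a 1-based post-order index of a non-root node.
uint32_t parentBitByBit(const uint32_t node)
{
    uint32_t x = node;
    uint32_t diff = 0;                            // accumulated on step #2
    while (((x + 1) & x) && ((x + 2) & (x + 1)))  // step #1
    {
        // Step #2: clear the most significant set bit and add 1,
        // i.e. subtract 2^i - 1, the size of the skipped left sub-tree.
        const uint32_t msb = 1u << (31 - __builtin_clz(x));
        const uint32_t y = (x ^ msb) + 1;
        diff += x - y;
        x = y;
    }
    if (((x + 1) & x) == 0)   // step #3: x = 2^i - 1, on the leftmost path
        x = 2 * x + 1;
    else                      // step #3: x = 2^i - 2, a right child
        x = x + 1;
    return x + diff;          // step #4
}

For node 12 this reproduces the trace above: 12 is reduced to 5 and then to 2 with diff = 10; step #3 yields 3, and the result is 13.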
To improve the time complexity we can perform several iterations of step #2 in parallel. Take the high-order half of the index's bits and compute the population count on them. If adding the result to the remaining low-order bits does not overflow into the high-order half, we can apply this approach recursively to the low-order half, with O(log log n) complexity. On overflow, we roll back to the original bit-by-bit algorithm. Note that the case where all low-order bits become set should also be treated as overflow. So the resulting complexity is O(log log n) expected time.
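For 32-bit indexes, one such batched iteration might look as follows (my sketch; batchedStep is an invented name, and __builtin_popcount is the GCC/Clang builtin, assumed to run in O(1)):

#include <cstdint>

// Consumes all high-order set bits at once by adding their population
// count to the low-order half. Returns false when the caller must fall
// back to the bit-by-bit loop: either the sum overflows into the
// high-order half, or all low-order bits become set.
bool batchedStep(uint32_t& x)
{
    const uint32_t mask = 0xFFFFu;  // low-order half of a 32-bit index
    const uint32_t z = (x & mask) + __builtin_popcount(x & ~mask);
    if (z >= mask)                  // overflow, or all low-order bits set
        return false;
    x = z;                          // all high-order bits consumed
    return true;
}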
Instead of rolling back to the bit-by-bit algorithm we can handle overflow using binary search: determine how many high-order bits (fewer than half of all bits) should be selected so that we either get no overflow or (as for index 0b00101111111111) the number of selected non-zero high-order bits is exactly 1. Note that we never need to apply this binary search procedure more than once, because a second overflow cannot happen while the number of bits in the number is greater than O(log log n). So the resulting complexity is O(log log n) worst-case time. All elementary operations are again assumed to execute in O(1) time; if some operations (population count, leading zero count) are implemented in O(log log n) time, our time complexity increases to O(log^2 log n).
Instead of dividing the bits of the index into two equal halves, we can use a different strategy:
1. Compute the population count of the index and add it to the index value. The most significant bit that changed from 0 to 1 determines the separation point between high-order and low-order bits.
2. Compute the population count of the high-order bits, then add the result to the low-order bits.
3. If the "separation" bit is non-zero and ((x+1) & x) != 0 and ((x+2) & (x+1)) != 0, continue from step #1.
4. If all low-order bits are set and the least significant high-order bit is set, process this corner case as a right child.
5. If ((x+1) & x) != 0 and ((x+2) & (x+1)) != 0, also process this as a right child.
6. If ((x+1) & x) == 0, multiply by 2 and add 1; otherwise, if ((x+2) & (x+1)) == 0, add 1.
If the condition of step #3 is satisfied, the addition on step #2 resulted in a carry into the "separation" bit. In that case the remaining low-order bits represent some number that cannot be greater than the original population count, and the number of set bits in this number cannot be greater than the logarithm of the population count of the original value. This means the number of set bits after each iteration is at most the logarithm of the number of set bits on the previous iteration, so the worst-case time complexity is O(log* n). This is very close to O(1): for example, for 32-bit numbers we need approximately two iterations or fewer.
Every step of this algorithm should be obvious, except perhaps step #5, whose correctness needs to be proven. Note that this step is executed only when adding the population count produces a carry into the "separation" bit but adding the population count of only the high-order bits does not. The "separation" bit corresponds to the value 2^i. The difference between the population count of all bits and the population count of only the high-order bits is at most i. So step #5 deals with a value of at least 2^i - i. Let's apply the bit-by-bit algorithm to this value. 2^i - i is large enough that bit i-1 is set. Clear this bit and add 1 to the value in bits 0..i-2. This value is at least 2^(i-1) - (i-1), because we just subtracted 2^(i-1) and added 1. Or, if we move the index one position to the right, we again have at least 2^i - i. If we repeat this procedure, we will always find a non-zero bit at position i-1. Each step gradually decreases the difference between the value in bits 0..i-1 and the nearest power of 2. When this difference comes to 2, we can stop the bit-by-bit algorithm, because the current node is clearly a right child. Since this bit-by-bit algorithm always comes to the same result, we can skip it and always process the current node as a right child.
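As a concrete instance of this argument (my numbers, not from the original): take i = 5, so the value in bits 0..4 is at least 2^5 - 5 = 27 = 0b11011. Bit 4 is set; clearing it and adding 1 gives 0b01100 = 12 = 2^4 - 4, so the invariant holds one position to the right. Repeating: 12 becomes 0b0101 = 5 = 2^3 - 3, then 5 becomes 0b10 = 2 = 2^2 - 2, at which point the difference from the nearest power of 2 is 2 and the node is clearly a right child.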
Here is a C++ implementation of this algorithm. More details and some other algorithms can be found on ideone.
#include <cstdint>

uint32_t popcnt(const uint32_t x)
{ return __builtin_popcount(x); }  // assumed O(1); GCC/Clang builtin

uint32_t getMSBindex(const uint32_t x)
{ return 31 - __builtin_clz(x); }  // position of the most significant set bit

uint32_t getMSBmask(const uint32_t x)
{ return 1u << getMSBindex(x); }

// True if x is neither of the form 2^i - 1 nor 2^i - 2.
uint32_t notSimpleCase(const uint32_t x)
{ return ((x+1) & x) && ((x+2) & (x+1)); }

uint32_t parent(const uint32_t node)
{
    uint32_t x = node;
    uint32_t bit = x;
    while ((x & bit) && notSimpleCase(x))                  // step #3 condition
    {
        const uint32_t y = x + popcnt(x);                  // step #1
        bit = getMSBmask(y & ~x);                          // "separation" bit
        const uint32_t mask = (bit << 1) - 1;
        const uint32_t z = (x & mask) + popcnt(x & ~mask); // step #2
        if (z == mask && (x & (bit << 1)))                 // step #4
            return node + 1;
        x = z;
    }
    if (notSimpleCase(x))                                  // step #5
        return node + 1;
    else                                                   // step #6
        // No explicit difference accumulation is needed: node - x is the
        // total subtracted so far, so the result is written relative to node.
        return node + 1 + (((x+1) & x)? 0: x);
}
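To sanity-check this implementation we can compare it against the brute-force table from the lookup-table sketch at the beginning of this answer (my test code, not part of the original):

#include <cassert>
#include <cstdint>
#include <vector>

// Checks parent() on a perfect tree of height 4 (31 nodes, 1-based
// post-order indexes); the root is skipped since it has no parent.
int main()
{
    const uint32_t height = 4;
    const std::vector<uint32_t> expected = buildParentTable(height);
    const uint32_t root = (2u << height) - 1;
    for (uint32_t node = 1; node < root; ++node)
        assert(parent(node) == expected[node]);
    return 0;
}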
If we only need to find the parent of a leaf node, this algorithm and code may be simplified (ideone); the version below reuses the popcnt and getMSBmask helpers defined above.
uint32_t parent(const uint32_t node)  // simplified variant for leaves only
{
    uint32_t x = node;
    while (x > 2)
    {
        const uint32_t y = x + popcnt(x);
        const uint32_t bit = getMSBmask(y & ~x);  // "separation" bit
        const uint32_t mask = (bit << 1) - 1;
        x = (x & mask) + popcnt(x & ~mask);       // batched step #2
    }
    // x == 1 is a leftmost leaf (parent = node + 2);
    // x == 2 is a right leaf (parent = node + 1).
    return node + 1 + (x == 1);
}
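A quick check against the 15-node tree used in the examples above (my test code, assuming the simplified leaf-only parent() just shown): the leaves are 1, 2, 4, 5, 8, 9, 11, 12 and their parents are 3, 3, 6, 6, 10, 10, 13, 13.

#include <cassert>
#include <cstdint>

int main()
{
    const uint32_t leaves[]  = {1, 2, 4, 5, 8, 9, 11, 12};
    const uint32_t parents[] = {3, 3, 6, 6, 10, 10, 13, 13};
    for (int i = 0; i < 8; ++i)
        assert(parent(leaves[i]) == parents[i]);
    return 0;
}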