O(1) algorithm to determine if node is descendant of another node in a multiway tree?
Asked Answered
G

8

12

Imagine the following tree:

    A
   / \
  B   C
 / \   \
D   E   F

I'm looking for a way to query if for example F is a descendant of A (note: F doesn't need to be a direct descendant of A), which, in this particular case would be true. Only a limited amount of potential parent nodes need to be tested against a larger potential descendants node pool.

When testing whether a node is a descendant of a node in the potential parent pool, it needs to be tested against ALL potential parent nodes.

This is what a came up with:

  • Convert multiway tree to a trie, i.e. assign the following prefixes to every node in the above tree:

     A = 1
     B = 11
     C = 12
     D = 111
     E = 112
     F = 121
    
  • Then, reserve a bit array for every possible prefix size and add the parent nodes to be tested against, i.e. if C is added to the potential parent node pool, do:

      1    2    3  <- Prefix length
    
    *[1]  [1]  ...
     [2] *[2]  ...
     [3]  [3]  ...
     [4]  [4]  ...
     ...  ...
    
  • When testing if a node is a descendant of a potential parent node, take its trie prefix, lookup the first character in the first "prefix array" (see above) and if it is present, lookup the second prefix character in the second "prefix array" and so on, i.e. testing F leads to:

     F = 1    2    1
    
       *[1]  [1]  ...
        [2] *[2]  ...
        [3]  [3]  ...
        [4]  [4]  ...
        ...  ...
    

    so yes F, is a descendant of C.

This test seems to be worst case O(n), where n = maximum prefix length = maximum tree depth, so its worst case is exactly equal to the obvious way of just going up the tree and comparing nodes. However, this performs much better if the tested node is near the bottom of the tree and the potential parent node is somewhere at the top. Combining both algorithms would mitigate both worst case scenarios. However, memory overhead is a concern.

Is there another way for doing that? Any pointers greatly appreciated!

Gatewood answered 16/5, 2011 at 16:27 Comment(1)
Is this always a binary tree?Lipoma
W
8

Are your input trees always static? If so, then you can use a Lowest Common Ancestor algorithm to answer the is descendant question in O(1) time with an O(n) time/space construction. An LCA query is given two nodes and asked which is the lowest node in the tree whose subtree contains both nodes. Then you can answer the IsDescendent query with a single LCA query, if LCA(A, B) == A or LCA(A, B) == B, then one is the descendent of the other.

This Topcoder algorithm tuorial gives a thorough discussion of the problem and a few solutions at various levels of code complexity/efficiency.

Walters answered 16/5, 2011 at 17:51 Comment(1)
Excellent tutorial, thanks! Learned a few things :) Unfortunately, my tree isn't static, but I think the sqrt-parts LCA idea can be easily adopted for dynamic use when the tree depth is always approximately the same, as in my case.Gatewood
S
4

I don't know if this would fit your problem, but one way to store hierarchies in databases, with quick "give me everything from this node and downwards" features is to store a "path".

For instance, for a tree that looks like this:

    +-- b
    |
a --+       +-- d
    |       |
    +-- c --+
            |
            +-- e

you would store the rows as follows, assuming the letter in the above tree is the "id" of each row:

id    path
a     a
b     a*b
c     a*c
d     a*c*d
e     a*c*e

To find all descendants of a particular node, you would do a "STARTSWITH" query on the path column, ie. all nodes with a path that starts with a*c*

To find out if a particular node is a descendant of another node, you would see if the longest path started with the shortest path.

So for instance:

  • e is a descendant of a since a*c*e starts with a
  • d is a descendant of c since a*c*d starts with a*c

Would that be useful in your instance?

Socle answered 16/5, 2011 at 16:34 Comment(3)
This is exactly what I'm proposing, only for databases, as far as I can tell :) In a native implementation, STARTSWITH would be the slow part - for m potential parent nodes with a max prefix length of n, this would be O(nm) I think, whereas the bit arrays make this O(n), at the expense of exponential memory growth :(Gatewood
path length may be is O(log(n) ) so starts with takes O(log(n))Tinstone
Would a static mapping table work? ie. you explode all combinations, so you would store for every node a list of descendants, where that list is populated with all descendants, regardless of depth? It would have a fair amount of startup cost to build the lookup tables, but after that would be able to answer "what nodes are descendants of X" in constant time (amortized), at least if you used a hashtable/dictionary.Socle
B
3

Traversing any tree will require "depth-of-tree" steps. Therefore if you maintain balanced tree structure it is provable that you will need O(log n) operations for your lookup operation. From what I understand your tree looks special and you can not maintain it in a balanced way, right? So O(n) will be possible. But this is bad during creation of the tree anyways, so you will probably die before you use the lookup anyway...

Depending on how often you will need that lookup operation compared to insert, you could decide to pay during insert to maintain an extra data structure. I would suggest a hashing if you really need amortized O(1). On every insert operation you put all parents of a node into a hashtable. By your description this could be O(n) items on a given insert. If you do n inserts this sounds bad (towards O(n^2)), but actually your tree can not degrade that bad, so you probably get an amortized overall hastable size of O(n log n). (actually, the log n part depends on the degration-degree of your tree. If you expect it to be maximal degraed, don't do it.)

So, you would pay about O(log n) on every insert, and get hashtable efficiency O(1) for a lookup.

Bigham answered 16/5, 2011 at 16:48 Comment(2)
I just saw that I didn't clarify my question enough, sorry about that :( The problem is that there is a potential parent node pool, say, A, B, C - then I'd like to see if F is a descendant of ANY node in the parent pool. With your hashmap idea this would still be O(n), but the memory overhead should be much less :)Gatewood
Actually, I would not follow my own advice. I don't like hashmaps much, I like guarantees better. Secondly, I think I overlooked something in my description... but if not: Whatever "node pool" means, I think the hash-entry should work there as well. Speaking in C++-parlance it is actually a set<> what I am thinking of. The entries would be pair<Node,Node>, and you would use insert( make_pair(parent,child) ). Then if you want to check any child.isThatAParent(potentialParent) is just a return set.find( make_pair(node,potentialParent)!=end). Check done for any given pair of in O(1).Bigham
M
2

For a M-way tree, instead of your bit array, why not just store the binary "trie id" (using M bits per level) with each node? For your example (assuming M==2) : A=0b01, B=0b0101, C=0b1001, ...

Then you can do the test in O(1):

bool IsParent(node* child, node* parent)
{ 
   return ((child->id & parent->id) == parent->id)
}

You could compress the storage to ceil(lg2(M)) bits per level if you have a fast FindMSB() function which returns the position of the most significant bit set:

mask = (1<<( FindMSB(parent->id)+1) ) -1;
retunr (child->id&mask == parent->id);
Mayers answered 16/5, 2011 at 16:38 Comment(7)
Can you elaborate on how to compute that trie id? Keep in mind that its a multiway tree with potentially many nodes per level.Gatewood
I'm assuming that A) you can already compute the ID in the form you described above, and B) M is fixed. For the first function, just do foreach digit: id= (id<<M) | (1<<(digit-1)). For the 2nd method, it's: id = (id<<B) | digit, where B is the number of bits needed to hold M (i.e: findMSB(M)+1).Mayers
And if M exceeds 32/64 bits just use a long array and then loop over all id parts when ANDing? I like that :)Gatewood
-OR- if you have storage and don't want to go mucking around with bitfields, store the ids as byte strings in exactly the form you showed: for example A is "\x01", F is "\x01\x02\x01". That can handle up to 255 children per node, and you can do the test with strstr(child,parent)==parentMayers
@Mayers Shouldn't C=0b1001 be C=0b0110 in your example with M==2?Parkland
No, the representation is right, it's the construction algorithm in my first comment that was wrong. We want level 0's id to be in the least significant bits, so that we don't have to shift it left as we add more levels. To assign trie ids to a node's children numbered from 1..M, the construction should be trie_id = (1 << (childNum-1)) | parent_idMayers
@Mayers OK, I got it now, thank you for your comment! Just to be sure if I got everything right, in your example A=0b01, B=0b0101, C=0b1001, ... you reserve 2 bits per tree depth + 1, so e.g. if a tree has a depth of 11 like the following: 1 (root, depth 0) -> 2 (child of 1, depth 1) -> 3 (child of 2, depth 2) -> ... -> 12 (child of 11, depth 11); you will use 2 bits * (11 depth + 1) = 24 bits bits for the trie ID of the nodes which have depth 11, right?Parkland
M
1

In a pre-order traversal, every set of descendants is contiguous. For your example,

A B D E C F
+---------+ A
  +---+ B
    + D
      + E
        +-+ C
          + F

If you can preprocess, then all you need to do is number each node and compute the descendant interval.

If you can't preprocess, then a link/cut tree offers O(log n) performance for both updates and queries.

Meridithmeriel answered 16/5, 2011 at 17:11 Comment(1)
Note: log n performance is not contingent on a particular tree topology.Meridithmeriel
Z
0

You can answer query of the form "Is node A a descendant of node B?" in constant time, by just using two auxiliary arrays.

Preprocess the tree, by visiting in Depth-First order, and for each node A store its starting and ending time in the visit in the two arrays Start[] and End[].

So, let us say that End[u] and Start[u] are respectively the ending and starting time of the visit of node u.

Then node u is a descendant of node v if and only if:

Start[v] <= Start[u] and End[u] <= End[v].

and you are done, checking this condition requires just two lookup in the arrays Start and End

Zamarripa answered 24/11, 2012 at 15:49 Comment(1)
Unfortunately the tree isn't static, so I can't really preprocess it.Gatewood
A
0

Take a look at Nested set model It's very effective to select but too slow to update

Acceptance answered 11/4, 2018 at 9:24 Comment(0)
S
0

For what it's worth, what you're asking for here is equivalent to testing if a class is a subtype of another class in a class hierarchy, and in implementations like CPython this is just done the good old fashioned "iterate the parents looking for the parent" way.

Silica answered 10/8, 2021 at 17:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.