Node search in Binary Tree overflows stack
Asked Answered
F

6

18

I use the following method to traverse* a binary tree of 300 000 levels:

Node* find(int v){
   if(value==v)
      return this;
   else if(right && value<v)
      return right->find(v); 
   else if(left && value>v)
      return left->find(v);
}

However I get a segmentation fault due to stack overflow. Any ideas on how to traverse the deep tree without the overhead of recursive function calls?

* By "traverse" I mean "search for a node with given value", not full tree traversal.

Fiedler answered 28/3, 2017 at 9:6 Comment(14)
300.000 levels? You know that a full binary tree with N levels has 2^N-1 nodes? This comment is too small to write out how many nodes your tree can have !Edgell
Perhaps the tree is stored externally and is not loaded to memory as a whole.Tutor
@Edgell That is for a balanced binary tree. Perhaps this tree is not balanced.Adamsite
What do you return if nothing is found?Ionize
@ArthurTacca: Actually, the figure I quoted is for a fully populated tree. A tree can be balanced even if not fully populated, but every fully populated tree is balanced. And every binary tree with 2^N-1 nodes on N levels is fully populated.Edgell
Oh and how do you build a tree of 300,000 levels?Ionize
@n.m I return nullptr if nothing is found. Actually I build it using a loopFiedler
"I return nullptr if nothing is found" can't see it in the posted code. "Actually I build it using a loop" Then why not search it using a loop?Ionize
Wait, that's not a full traversal, that's a node search. A node search is easy to do iteratively; a full traversal requires either recursion (using the system stack or a custom stack) or a doubly-linked tree.Prune
You said somewhere else that you tried Tail Recursion, I'm curious as to why that didn't work out/what you tried, that was my first instinctTrautman
Have you considered an AVL tree? I highly suspect your tree is not balanced.Sarnoff
The stack overflow issue could be due to the fact that your function falls off the end without returning a value in some cases. The garbage pointer value that is apparently returned could be a pointer to some garbage which happens to accidentally implement a circular list. The recursive call which receives this garbage will then traverse it until a stack overflow. Apply more compiler warnings to your code, and use tools like Valgrind.Rockling
I think you can use tails recursion if you compile with O2. https://mcmap.net/q/669063/-tail-recursion-in-gcc-gHemihedral
The fact that your tree has 300,000 levels indicates that your tree is quite unbalanced, e.g. generated by inserting from an ordered list without making sure it doesn't degenerate. A severely unbalanced binary search tree defeats the purpose of binary search trees, which is efficient search (O(log n) instead of O(n)). While you still may want to use an iterative search algorithm to be safe from stack overflow, you probably also should take steps to ensure that your tree is more balanced.Whelk
B
27

Yes! For a 300 000 level tree avoid recursion. Traverse your tree and find the value iteratively using a loop.

Binary Search Tree representation

           25             // Level 1
        20    36          // Level 2
      10 22  30 40        // Level 3
  .. .. .. .. .. .. .. 
.. .. .. .. .. .. .. ..   // Level n

Just to clarify the problem further. Your tree has a depth of n = 300.000 levels. Thus, in the worst case scenario a Binary Search Tree (BST) will have to visit ALL of the tree's nodes. This is bad news because that worst case has an algorithmic O(n) time complexity. Such a tree can have:

2ˆ300.000 nodes = 9.9701e+90308 nodes (approximately).


9.9701e+90308 nodes is an exponentially massive number of nodes to visit. With these numbers it becomes so clear why the call stack overflows.


Solution (iterative way):

I'm assuming your Node class/struct declaration is a classic standard integer BST one. Then you could adapt it and it will work:

struct Node {
    int data;
    Node* right;
    Node* left;
};

Node* find(int v) {
    Node* temp = root;  // temp Node* value copy to not mess up tree structure by changing the root
    while (temp != nullptr) {
        if (temp->data == v) {
            return temp;
        }
        if (v > temp->data) {
            temp = temp->right;
        }
        else {
            temp = temp->left;
        }
    }
    return nullptr;
}

Taking this iterative approach avoids recursion, hence saving you the hassle of having to recursively find the value in a tree so large with your program call stack.

Blindheim answered 28/3, 2017 at 9:38 Comment(7)
Wait, that's not a full traversal, that's a node search. That said, the same applies to the code in the question. A node search is easy to do iteratively; a full traversal requires either recursion (using the system stack or a custom stack) or a doubly-linked tree.Prune
Possibly you meant Node* temp = this; not Node* temp = root; and temp = left; instead of temp = temp->left; etc. – see the original code, it looks like an inline method in the Node class.Trader
@Prune who said anything about traversal. This is a find, which the OP asked about to avoid the 300 000 line constraint.Blindheim
@user54264611634646244 "who said anything about traversal" ---> "Binary Tree Traversal overflows stack" ಠ_ಠPrune
@Medinoc. Lol yeah you got a point, it's in the title, but not in the answer.Blindheim
Up for actually returning nullptr if not found, down again for if - return - else.Thumbnail
@greybeard. Yeah.Blindheim
P
9

A simple loop where you have a variable of type Node* which you set to the next node, then loop again ...
Don't forget the case that the value you are searching for does not exist!

Peril answered 28/3, 2017 at 9:12 Comment(3)
This approach does not permit to return from the recursion and process the left subtree if the desired value is not found in the right subtree.Tutor
@Tutor Right, but the example code does not too (... else if ...). Which is the normal procedure when searching in a sorted binary tree. If the tree is not sorted and all nodes must be visited, then, yes, it does not work like this.Peril
Thanks for the clarification, I was overlooking that.Tutor
T
7

You could implement the recursion by not using the call stack but a user-defined stack or something similar; this could be done via the existing stack template. The approach would be to have a while loop which iterates until the stack is empty; as the existing implementaion uses depth-first search, elimination of the recursive calls can be found here.

Tutor answered 28/3, 2017 at 9:9 Comment(1)
When the stack overflows, use a different stack =PGastrostomy
F
6

When the tree that you have is a Binary Search Tree, and all you want to do is search for a node in it that has a specific value, then things are simple: no recursion is necessary, you can do it using a simple loop as others have pointed out.

In the more general case of having a tree which is not necessarily a Binary Search Tree, and wanting to perform a full traversal of it, the simplest way is using recursion, but as you already understand, if the tree is very deep, then recursion will not work.

So, in order to avoid recursion, you have to implement a stack on the C++ heap. You need to declare a new StackElement class that will contain one member for each local variable that your original recursive function had, and one member for each parameter that your original recursive function accepted. (You might be able to get away with fewer member variables, you can worry about that after you have gotten your code to work.)

You can store instances of StackElement in a stack collection, or you can simply have each one of them contain a pointer to its parent, thus fully implementing the stack by yourself.

So, instead of your function recursively calling itself, it will simply consist of a loop. Your function enters the loop with the current StackElement being initialized with information about the root node of your tree. Its parent pointer will be null, which is another way of saying that the stack will be empty.

In every place where the recursive version of your function was calling itself, your new function will be allocating a new instance of StackElement, initializing it, and repeating the loop using this new instance as the current element.

In every place where the recursive version of your function was returning, your new function will be releasing the current StackElement, popping the one that was sitting on the top of the stack, making it the new current element, and repeating the loop.

When you find the node you were looking for, you simply break from the loop.

Alternatively, if the node of your existing tree supports a) a link to its "parent" node and b) user data (where you can store a "visited" flag) then you don't need to implement your own stack, you can just traverse the tree in-place: in each iteration of your loop you first check if the current node is the node you were looking for; if not, then you enumerate through children until you find one which has not been visited yet, and then you visit it; when you reach a leaf, or a node whose children have all been visited, then you back-track by following the link to the parent. Also, if you have the freedom to destroy the tree as you are traversing it, then you do not even need the concept of "user data": once you are done with a child node, you free it and make it null.

Fireboard answered 28/3, 2017 at 9:13 Comment(1)
If there is a stack, that's still recursion, you just moved the stack. And for traversal of a doubly-linked tree, you don't actually mean a "visited" flag: You just need to keep a pointer to the previous visited node in addition to the current one.Prune
A
3

Well, it can be made tail recursive at the cost of a single additional local variable and a few comparisons:

Node* find(int v){
  if(value==v)
    return this;
  else if(!right && value<v)
    return NULL;
  else if(!left && value>v)
    return NULL;
  else {
    Node *tmp = NULL;
    if(value<v)
      tmp = right;
    else if(value>v)
      tmp = left;
    return tmp->find(v);
  }
}
Airiness answered 28/3, 2017 at 23:36 Comment(1)
+1 for the idea but please don’t write code like this. Your else branch can be made much more readable and less error-prone (and, lastly, much shorter) by directly initialising the variable tmp: Node* tmp = value < v ? right : left;.Rackrent
V
0

Walking through a binary tree is a recursive process, where you'll keep walking until you find that the node you're at currently points nowhere.

It is that you need an appropriate base condition. Something which looks like:

if (treeNode == NULL)
   return NULL;

In general, traversing a tree is accomplished this way (in C):

void traverse(treeNode *pTree){
  if (pTree==0)
    return;
  printf("%d\n",pTree->nodeData);
  traverse(pTree->leftChild);
  traverse(pTree->rightChild);
}
Vacuole answered 28/3, 2017 at 9:27 Comment(4)
This is exactly what I want to avoid: recursion. I even tried a tail call recursion but it doesn't work out of the box and looks like it need compiler optimization fine tuningFiedler
@MarwanB What do you mean by “doesn’t work out of the box”? It works with optimisations enabled on every modern C++ compiler. And you’re compiling with optimisations, right?Rackrent
@KonradRudolph I'm curious about one thing though - if a recursive function has a well-thought-out base case, shouldn't it be able to handle deep recursion? Like somewhere in the ballpark of 300K levels (like in the question).Vacuole
@PrashantWarrier If it’s tail recursive, absolutely: it will handle an arbitrary depth. However, this function is actually not completely tail recursive (and can’t be, I think): the second call to traverse (or find) can be made into a tail call, but the first one cannot. This has nothing to do with optimisation, in fact: it’s simply not possible. I should have made that clear in my previous comment.Rackrent

© 2022 - 2024 — McMap. All rights reserved.