SIGSEGV when trying to solve large mazes with A*
Asked Answered
R

3

5

I'm implementing an A* pathfinding algorithm in C++ to solve a maze, but I'm encountering a segmentation fault (SIGSEGV) when the maze size is high (~10,000x10,000 or larger).

The error occurs in the ~__shared_count() function in shared_ptr_base.h after node = queue.top() is called in line 58 and dozens of ~Node() deconstructors are called after that.

Edit: The error actually occurs in the _Sp_counted_base<_S_atomic>::_M_release() function now, though I don't know what I changed.

Here are the parts of my code that should be relevant. I'm new to using shared pointers so I'm not really sure what the problem is or how to fix it.

main.cpp

#include <iostream>
#include <queue>
#include <random>
#include <vector>
#include <algorithm>
#include <stack>
#include <fstream>

#include "Node.hpp"

constexpr uint16_t maze_size = 10000;

void solve(const uint16_t start_x, const uint16_t start_y, const uint16_t goal_x,
           const uint16_t goal_y,
           const std::vector<std::vector<std::pair<bool, bool> > > &walls) {
    std::vector<std::vector<bool> > visited(maze_size, std::vector<bool>(maze_size, false));

    // Min heap for our queue
    std::priority_queue<Node, std::vector<Node>, std::greater<> > queue;

    Node node(start_x, start_y, goal_x, goal_y);

    visited[node.x][node.y] = true;

    while (!node.solved(goal_x, goal_y)) {
        // Add all valid neighbors to the queue
        {
            const auto x = node.x;
            const auto y = node.y;
            const auto ptr = std::make_shared<Node>(node);
            // Right
            if (x < maze_size - 1 && !walls[x][y].first && !visited[x + 1][y]) {
                queue.emplace(x + 1, y, goal_x, goal_y, ptr);
                visited[x + 1][y] = true;
            }
            // Left
            if (x > 0 && !walls[x - 1][y].first && !visited[x - 1][y]) {
                queue.emplace(x - 1, y, goal_x, goal_y, ptr);
                visited[x - 1][y] = true;
            }
            // Down
            if (y < maze_size - 1 && !walls[x][y].second && !visited[x][y + 1]) {
                queue.emplace(x, y + 1, goal_x, goal_y, ptr);
                visited[x][y + 1] = true;
            }
            // Up
            if (y > 0 && !walls[x][y - 1].second && !visited[x][y - 1]) {
                queue.emplace(x, y - 1, goal_x, goal_y, ptr);
                visited[x][y - 1] = true;
            }
        }
        node = queue.top();
        queue.pop();
    }
}

Node.hpp

#ifndef NODE_HPP
#define NODE_HPP

#include <cstdint>
#include <memory>

class Node {
public:
    uint16_t x, y;
    uint32_t steps;
    uint32_t value;

    std::shared_ptr<Node> parent;

    Node(const uint16_t x, const uint16_t y, const uint16_t goal_x, const uint16_t goal_y,
         const std::shared_ptr<Node> &parent = nullptr) : x(x), y(y), parent(parent) {
        steps = parent ? parent->steps + 1 : 0;
        value = abs(static_cast<int32_t>(goal_x) - static_cast<int32_t>(x)) +
                abs(static_cast<int32_t>(goal_y) - static_cast<int32_t>(y)) +
                steps;
    }

    [[nodiscard]] bool solved(const uint16_t goal_x, const uint16_t goal_y) const {
        return x == goal_x && y == goal_y;
    }

    // Overload the > operator for the priority queue
    bool operator>(const Node &rhs) const {
        if (value == rhs.value) {
            return steps < rhs.steps;
        }
        return value > rhs.value;
    }
};

#endif

Valgrind output

valgrind --leak-check=full ./Tests 
==5385== Memcheck, a memory error detector
==5385== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==5385== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==5385== Command: ./Tests
==5385== 
Maze loaded10000
==5385== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==5385== 
==5385== Process terminating with default action of signal 11 (SIGSEGV)
==5385==  Access not within mapped region at address 0x1FFE801FF8
==5385== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==5385==    at 0x116230: std::_Sp_counted_ptr_inplace<Node, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:613)
==5385==  If you believe this happened as a result of a stack
==5385==  overflow in your program's main thread (unlikely but
==5385==  possible), you can try to increase the size of the
==5385==  main thread stack using the --main-stacksize= flag.
==5385==  The main thread stack size used in this run was 8388608.
==5385== 
==5385== HEAP SUMMARY:
==5385==     in use at exit: 239,026,864 bytes in 556,422 blocks
==5385==   total heap usage: 3,380,473 allocs, 2,824,051 frees, 374,594,816 bytes allocated
==5385== 
==5385== LEAK SUMMARY:
==5385==    definitely lost: 0 bytes in 0 blocks
==5385==    indirectly lost: 0 bytes in 0 blocks
==5385==      possibly lost: 0 bytes in 0 blocks
==5385==    still reachable: 239,026,864 bytes in 556,422 blocks
==5385==         suppressed: 0 bytes in 0 blocks
==5385== Reachable blocks (those to which a pointer was found) are not shown.
==5385== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5385== 
==5385== For lists of detected and suppressed errors, rerun with: -s
==5385== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

You could try to reproduce the error using the following function to load this maze file. Sorry about the size. You can load it using this function:

std::vector<std::vector<std::pair<bool, bool> > > readVectorFromFile(const std::string &filename) {
    std::vector<std::vector<std::pair<bool, bool> > > vec;
    std::ifstream inFile(filename, std::ios::binary);
    if (!inFile) {
        std::cerr << "Error opening file for reading: " << filename << std::endl;
        return vec;
    }

    // Read the size of the outer vector
    size_t outerSize;
    inFile.read(reinterpret_cast<char *>(&outerSize), sizeof(outerSize));

    vec.resize(outerSize);

    for (auto &innerVec: vec) {
        // Read the size of each inner vector
        size_t innerSize;
        inFile.read(reinterpret_cast<char *>(&innerSize), sizeof(innerSize));

        innerVec.resize(innerSize);

        // Read each pair in the inner vector
        for (auto &pair: innerVec) {
            inFile.read(reinterpret_cast<char *>(&pair.first), sizeof(bool));
            inFile.read(reinterpret_cast<char *>(&pair.second), sizeof(bool));
        }
    }

    inFile.close();
    return vec;
}

The main would look something like:

int main() {
    auto walls = readVectorFromFile("maze.dat");
    std::cout << "Maze loaded" << std::endl;

    solve(0, 0, maze_size - 1, maze_size - 1, walls);

    return 0;
}
Ringed answered 8/8 at 14:16 Comment(8)
Stack overflow in thread #1: can't grow stack to 0x1ffe801000 seems fairly clear to me. Did you not notice this or not understand?Improvised
Build with debug information (add the -g option when building) and Valgrind should hopefully be able to tell you more information.Shute
Although the stack overflow theory requires recursive code, and that's not obviously the case from what's been posted.Improvised
Also, never call top or pop on a std::queue (or std::stack) without making sure it's not empty.Shute
@Improvised Yeah I don't understand that. I am still a beginner so I was hoping for someone to help me here.Ringed
@Ringed Well I'm not sure if you have a stack overflow or not. Are any of your functions recursive?Improvised
@Improvised My functions aren't but the deconstructor of Node might be called recursively when one ~Node() get's called and the smart_ptr releases the parent and so on...Ringed
@Ringed Didn't spot that, I would call that a smoking gun.Improvised
H
6

You didn't explicitly define Node::~Node() but the compiler did that for you. In there, it calls std::shared_ptr<Node>::~shared_ptr. In turn, that calls Node::~Node.

This is a recursive call. Depending on your maze and query, this can overflow the call stack.

The easier solution is to let solve() manage the ownership of all the Nodes; each Node can just have a non-owning Node* parent. This ownership can be as easy as a std::deque<Node> (but not a std::vector<Node>, because that reallocates and breaks the Node* parent)

[edit]

Using smart pointers in node objects isn't necessarily a bad idea. In wide trees, where each parent had 10 children, you're going to run into memory problems around 9 levels deep (one billion nodes), but the stack can easily handle that level of nesting. Even a balanced binary tree with a billion nodes is not too bad (depth 32). In these tree cases, each node is the sole owner of a few children. That can be implemented with a std::vector<std::unique_ptr>> children.

Hamm answered 8/8 at 14:29 Comment(6)
Thanks. So does this mean that the code generally works but the stack is too small for it? Sorry, I don't really know what this stuff meansRinged
@Ringed no. It means that there is a problem in your code that just waits to blow it up. Even when your code does not blow up, the problem is there. Rather than hoping it will not show up you should fix it.Eyot
@hubenhau: Strictly speaking, yes - it works but the stack size limits the size of the maze your implementation can solve. There may be a lot of memory still available, but that is not stack memory.Hamm
@Hamm Thank you. I just tried with valgrind and --main-stacksize increased and it does work. I guess I'll have to find a different method for this, but I'm happy that at least the logic checked out in the end.Ringed
@Hamm Not really what you suggested but I wanted to ask if the following fix is okay or not. I added a deconstructor for the nodes: ~Node() { while (parent.use_count() == 1) { parent = parent->parent; } } If parent.use_count() == 1, that means that the parent will die since the current node has the last shared_ptr that points to parent. By setting parent to parent->parent, essentially the parent node dies without causing the chain reaction since our node now points to it's grand father, and so on... Is this approach okay or still problematic?Ringed
@hubenhau: I think it's cleaner to write while (parent) parent = parent->parent. The idea is the same, you use explicit iteration (while loop) instead of recursion. That should avoid the stack overflow.Hamm
J
3

In your Node the std::shared_ptr<Node> parent; is not really needed, and is prone to cyclic dependencies, which apparently happened in your maze (although I am too lazy to confirm that). Remove the shared_ptr and just provide the information about the parent Node steps:

class Node {
public:
    uint16_t x, y;
    uint32_t steps;
    uint32_t value;

    Node(uint16_t x, uint16_t y, uint16_t goal_x, uint16_t goal_y, int parentSteps = -1) : x(x), y(y) {
        steps = parentSteps + 1;
        value = abs(static_cast<int32_t>(goal_x) - static_cast<int32_t>(x)) +
                abs(static_cast<int32_t>(goal_y) - static_cast<int32_t>(y)) + steps;
    }

    [[nodiscard]] bool solved(const uint16_t goal_x, const uint16_t goal_y) const { return x == goal_x && y == goal_y; }

    // Overload the < operator for the priority queue
    bool operator>(const Node &rhs) const {
        if (value == rhs.value) {
            return steps < rhs.steps;
        }
        return value > rhs.value;
    }
};

Then you can create new nodes like this:

queue.emplace(x + 1, y, goal_x, goal_y, node.steps);
Joinville answered 8/8 at 14:29 Comment(5)
I need the parent to track back the optimal path through the maze after it's solved. Go from last now through all the parents back to the start to find the pathRinged
Mhmm, that was not mentioned. But in any case you don't need a pointer to Node, just keep the information about parents coordinates, parent id, or create a collection (e.g. std::vector>) where you will store your path on the fly.Joinville
Yeah I have already implemented a different solution that stores all nodes in a vector and keeps track of the parent_id for the nodes, but I thought that smart_pointers would require less memory since they don't need to store all nodes and could automatically release the nodes that will definitely not be part of the final pathRinged
It's more of a design issue rather then pointers requiring more or less memory. The approach with entities linked to each other is fine by itself, but having a Node that also behaves like a List is problematic, e.g. because it triggers a chain reaction when one Node is destroyed. Typically you would have a Tree or List class that manages the Nodes, links them together and decides when to destroy them.Joinville
Okay so it would be a good idea to write my own class that does this instead of relying on shared pointers? Also, I'm still not sure if all of this means that my code would work correct, let's say without hardware constraints?Ringed
N
2

This is the error that you get:

Stack overflow in thread #1: can't grow stack to 0x1ffe801000

Stack overflow means that the stack has been overflown, so let's understand what the stack is.

Stack is a data structure of the LIFO kind, that is, you pile elements on top of each other and you are always able to pop the element on the top first.

In Computer Science, there is a section of memory called the call stack.

This section is where subroutine calls are being called. So, if you call a function which calls a function which calls a function, then these functions are being stacked into the call stack. This works up to a certain limit, but if you have too many function calls without the functions executed (and being popped from the stack), then your unfinished function calls pile up and fill the size of the call stack. At this point if there is yet another function call, it is attempted to be pushed into the call stack, but fails doing so for not having sufficient storage space. This is what happened.

As you probably already realized, you can increase the size of the call stack, which will remove the symptoms, but the underlying problem would still be there, just the limit of the stack would be harder to reach.

So, the solution is to make sure that your algorithm will be iterative rather than recursive. One way to try and solve it is to try and create an iterative stack (you actually explicitly use a stack rather than relying on the call stack) and if you want to free a Node, then push all its references into your stack, set them to null, so nothing fishy happens, free the Node that now no longer has any reference to trigger a destructor for, pop your node from the top of the stack, behave similarly, that is, push its references to the stack, remove the references of this object, then free it, then proceed with the top of the stack and so on until there is no more pointer to release.

Neglect answered 8/8 at 15:10 Comment(3)
Thanks for the answer. Not sure if I really understand how to write the solution you suggested, but I'll try it.Ringed
I managed to get rid of the error now by calling this function on every node before it dies: void free(Node &node) { while(node.parent.use_count() == 1) { auto parent = node.parent; node.parent = nullptr; node = *parent; } } Is it okay to do it like that?Ringed
@Ringed I was suggesting the use of a stack. See programiz.com/cpp-programming/stack . You would actively push all the references your object has into the stack, set them all to null inside your object, delete your object with the now safe destructor, then proceed popping the next element from the stack, do the same thing, until all references were successfully freed.Neglect

© 2022 - 2024 — McMap. All rights reserved.