Why did boost regex run out of stack space?
Asked Answered
B

2

7
#include <boost/regex.hpp>
#include <string>
#include <iostream>

using namespace boost;
static const regex regexp(
        "std::vector<"
            "(std::map<"
                   "(std::pair<((\\w+)(::)?)+, (\\w+)>,?)+"
             ">,?)+"
        ">");

std::string errorMsg =
"std::vector<"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">,"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">"
">";
int main()
{
    smatch result;
    if(regex_match(errorMsg, result, regexp))
    {  
        for (unsigned i = 0; i < result.size(); ++i)
        {  
            std::cout << result[i] << std::endl;
        }
    }

//    std::cout << errorMsg << std::endl;

    return 0;
}

this produces:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error>
>'   what():  Ran out of stack space trying to match the regular expression.

compiled with

g++ regex.cc -lboost_regex

EDIT

my platform:

g++ (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
libboost-regex1.42
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
So the latest Ubuntu 64 bit
Bride answered 21/12, 2010 at 2:8 Comment(0)
R
13

((\\w+)(::)?)+ is one of the so called "pathological" regular expressions -- it's going to take exponential time, because you have two expressions which are dependent upon each other one right after the other. That is, it fails due to catastrophic backtracking.

Consider if we follow the example of the link, and reduce "something more complicated" to "x". Let's do that with \\w:

  • ((x+)(::)?)+

Let's also assume that our input is never going to have ::. Having this actually makes the regex more complex, so if we throw out complexity then we really should be making things simpler if nothing else:

  • (x+)+

Now you've got a textbook nested quantifier problem like that detailed in the link above.

There are a few ways to fix this but the simplest way is probably to just disallow backtracking on the inner match using the atomic group modifier "(?>":

  • ((?>\\w+)(::)?)+
Rivulet answered 21/12, 2010 at 2:19 Comment(13)
A pathological sub-regex could be the problem, but it's not the one you mentioned -- \w is just [A-Za-z0-9_] or thereabouts, so it already excludes , and : just like your proposed fix and so will take the same time. Also I think you need an opening paren on both regexes.Sorrells
@j_random_hacker: Hmm.. good point. My proposed fix doesn't fix the problem, but catastrophic backtracking is the issue.. give me a sec to fix it.Rivulet
Can you reproduce the problem?Bride
@nice blonde stupid girl: Sorry -- I had the wrong syntax before. It should be (?> ) not +? in order to disallow backtracking. No, I can't reproduce the problem, because I don't have a working installation of boost::regex to play with at the moment.Rivulet
@Billy: Yep, and I was in the process of writing more or less exactly that :) I would suggest ([^>]+) as a simpler alternative though (for the whole ((\\w+)(::)?)+, (\\w+)>,?)+ mess).Sorrells
@Billy: That version is not matchBride
@j_random_hacker: But I want to use the matched values from std::pair<> that is why I use this form.Bride
@nice: std::vector<(std::map<(std::pair<((?>\w+)(::)?)+, (\w+)>,?)+>,?)+> worked for me. (I used RegexBuddy though for testing rather than Boost)Rivulet
@nice: If Boost's regex library works the way most do, then each () group only captures the last instance matched, not all instances. Is that really what you want? But if somehow it is, just use ([^,]+), (\\w+) instead of ((\\w+)(::)?)+, (\\w+).Sorrells
@j_random_hacker: but when I define BOOST_REGEX_MATCH_EXTRA, than I can every submatch through captures()Bride
@nice: Well that feature is news to me! And it's almost certainly the reason why you're blowing your stack while others can't reproduce the problem. Don't you think you should have mentioned the fact that you enabled this experimental feature? -1 to your question.Sorrells
@j_random_hacker: I did not used that feature now. I am just planning to use it, because I used it somewhere else and I know it is working. So the error message is present without BOOST_REGEX_MATCH_EXTRABride
@nice: OK fair enough, but turning that feature on will almost certainly blow the stack if any backtracking is done, so I'd advise against doing that.Sorrells
B
1

Tested it locally and it worked fine, my guess is your compiler is doing something weird.

What version of gcc? what platform? which version of boost?

 -> ./regex
std::vector<std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>,std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>>
std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>
std::pair<Test::Test, int>
Test
Test
::
int
Bedabble answered 21/12, 2010 at 2:18 Comment(3)
My test was with boost-1.42 / gcc version 4.5.1 (Gentoo 4.5.1-r1 p1.4, pie-0.4.5), is it possible to try a different version of gcc and/or boost?Bedabble
Strange. I can try it somewhere else, but later. I will see my aptitude for an other versionBride
FYI, I'm hitting this webpage because code that worked under an old version of boost (1.43.0) hit this stack space error on a new version of boost (1.53.0). Thus, the version of boost does seem pertinent to whether the issue is manifested.Cirrostratus

© 2022 - 2024 — McMap. All rights reserved.