PHP - Is this a safe way to allow user-supplied regular expressions
Asked Answered
A

2

13

I would like to allow small user-defined regular expressions to be submitted for testing. However, there are many problems to consider from run-away server usage to more evil eval() usage.

To my knowledge I have handled all the problems I could think of in the following code. Are their any attack vectors I haven't thought of? (A rather naive question I know)

function testRegex($regex)
{
    // null character allows a premature regex end and "/../e" injection
    if (strpos($regex, 0) !== false || ! trim($regex)) {
        return false;
    }

    $backtrack_limit = ini_set('pcre.backtrack_limit', 200);
    $recursion_limit = ini_set('pcre.recursion_limit', 20);

    $valid = @preg_match("~$regex~u", null) !== false;

    ini_set('pcre.backtrack_limit', $backtrack_limit);
    ini_set('pcre.recursion_limit', $recursion_limit);

    return $valid;
}


$regexes = array(
    "InvalidRegular)Expression",
    '',
    '\w+',
    '\/\w+/',
    'foo[bar]*',
    '\/\x00known/e' . chr(0x00) . chr(0),
    'known~e' . chr(0),
    'known~e' . chr(0x00),
    '[a-z]+',
    '\p{Lu}+',
);


foreach($regexes as $regex) {
    var_dump($regex, testRegex($regex));
}

If you want to see an example of a null-byte injection:

$user_regex = '.+~e' . chr(0);
$user_match = 'system("whoami")';

var_dump(preg_replace("~$user_regex~u", $user_match, 'foo'));
Agueweed answered 18/2, 2014 at 1:41 Comment(8)
if (strpos($regex, 0) !== false && trim($regex)) { return false; } at the top would be easier to comprehend :)Monsignor
Thanks, I updated it. Also, note that chr(0) !== 0Agueweed
Ah, check the documentation and you will see that the second argument to strpos() can be an integer value :)Monsignor
Um, it's not about integers - it's about the controller character that is ASCII null. Try running $ php -r "var_dump(0, chr(0)); to see. chr() isn't typecasting, it's getting an ASCII value for the given integer. It might help if I wrote it chr(0x00) so people would know I didn't mean "0".Agueweed
Seriously though, check the documentation :)Monsignor
Ah, my bad. Good call out.Agueweed
Just a side note: You might want to store the delimiter and regex modifiers in separate variables. So that the "user" could define their own modifiers and take into account for that while providing a regex. Of course, you could limit the choice. The same goes for modifiers. I would imagine something like $delimiter . $regex . $delimiter . $modifier.Pentaprism
Note, that if you use brackets for delimiters (f.ex. "($regex)u"), your users don't have to guess, what delimiter should be escaped additionally.Saturn
F
9

Obviously, the only way to test whether a string is a valid regular expression is by compiling it (which is done when you call any of the matching functions), so what you're doing makes a lot of sense.

The null-byte protection you have added is actually not necessary since 5.4, because there are already checks made in the leader, the middle and the ending. The latter in particular is a relatively recent commit (2011) to fix this bug.

Setting a lower backtrack and recursion limit is a good enough sandbox, perhaps you could check for a maximum length as well.

That said, this particular solution doesn't provide the ability to use modifiers such as /s, /i and /m; perhaps that's not your main concern at the moment, but rather food for thought :)

Filiation answered 18/2, 2014 at 2:20 Comment(2)
The null-byte protection actually is necessary. PHP only checks for null bytes before the ending delimiter.Agueweed
@Agueweed Thanks, I've updated my answer accordingly; we were both right :)Monsignor
H
1

You could tell them to input something like {expression} and then use preg_replace(). That way, they only use what you let them.

Heteroousian answered 9/3, 2014 at 14:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.