I would like to allow small user-defined regular expressions to be submitted for testing. However, there are many problems to consider, from runaway server usage to more malicious eval() usage.
To my knowledge, I have handled all the problems I could think of in the following code. Are there any attack vectors I haven't thought of? (A rather naive question, I know.)
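function testRegex($regex)
{
    // A null character allows a premature regex end and "/../e" injection.
    // Before PHP 8, an integer needle is treated as a character ordinal, so 0 means chr(0) here.
    if (strpos($regex, 0) !== false || ! trim($regex)) {
        return false;
    }

    // Tighten the PCRE limits while the pattern is tested, keeping the old values.
    $backtrack_limit = ini_set('pcre.backtrack_limit', 200);
    $recursion_limit = ini_set('pcre.recursion_limit', 20);

    // preg_match() returns false on failure, e.g. when the pattern does not compile.
    $valid = @preg_match("~$regex~u", null) !== false;

    // Restore the previous limits.
    ini_set('pcre.backtrack_limit', $backtrack_limit);
    ini_set('pcre.recursion_limit', $recursion_limit);

    return $valid;
}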
$regexes = array(
    "InvalidRegular)Expression",
    '',
    '\w+',
    '\/\w+/',
    'foo[bar]*',
    '\/\x00known/e' . chr(0x00) . chr(0),
    'known~e' . chr(0),
    'known~e' . chr(0x00),
    '[a-z]+',
    '\p{Lu}+',
);

foreach ($regexes as $regex) {
    var_dump($regex, testRegex($regex));
}
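As a hedged sketch, the same check can be written so that it does not depend on how strpos() treats an integer needle. The function name isValidUserRegex is hypothetical. The stricter blank-input test (trim($regex) === '') is an assumption that differs from the original: it accepts the pattern "0", which ! trim($regex) rejects. The limits are restored in a finally block so they are reset on every exit path.

```php
<?php
// Hypothetical variant of testRegex(); the name and structure are assumptions.
function isValidUserRegex(string $regex): bool
{
    // Reject NUL bytes with an explicit string needle, and reject blank input.
    if (strpos($regex, "\0") !== false || trim($regex) === '') {
        return false;
    }

    // ini_set() returns the previous value as a string, or false on failure.
    $backtrackLimit = ini_set('pcre.backtrack_limit', '200');
    $recursionLimit = ini_set('pcre.recursion_limit', '20');

    try {
        // An empty subject is enough to make PCRE compile the pattern.
        return @preg_match("~$regex~u", '') !== false;
    } finally {
        // Restore the previous limits, but only if they were actually changed.
        if ($backtrackLimit !== false) {
            ini_set('pcre.backtrack_limit', $backtrackLimit);
        }
        if ($recursionLimit !== false) {
            ini_set('pcre.recursion_limit', $recursionLimit);
        }
    }
}
```

It can be dropped into the loop above in place of testRegex($regex) to compare the two results side by side.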
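If you want to see an example of a null-byte injection:
// The null byte ends the pattern early, so "~e" becomes the closing delimiter plus the e (eval) modifier.
$user_regex = '.+~e' . chr(0);
// With the e modifier (removed in PHP 7.0), the replacement string is evaluated as PHP code.
$user_match = 'system("whoami")';
var_dump(preg_replace("~$user_regex~u", $user_match, 'foo'));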
if (strpos($regex, 0) !== false && trim($regex)) { return false; } at the top would be easier to comprehend :) – Monsignor
chr(0) !== 0 – Agueweed
strpos() can be an integer value :) – Monsignor
null. Try running $ php -r "var_dump(0, chr(0));" to see. chr() isn't typecasting, it's getting an ASCII value for the given integer. It might help if I wrote it chr(0x00) so people would know I didn't mean "0". – Agueweed
$delimiter . $regex . $delimiter . $modifier. – Pentaprism
"($regex)u"), your users don't have to guess which delimiter should be escaped additionally. – Saturn
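The delimiter suggestion in the last comments could be sketched roughly as follows. buildUserPattern is a hypothetical helper name, not something from the original code. It assumes bracket-style delimiters, which PHP allows to appear unescaped inside the pattern when they are used as metacharacters, so characters such as ~ and / need no extra escaping by the user. This is only an illustration and does not replace the null-byte check or the PCRE limits.

```php
<?php
// Hypothetical helper: wraps a user-supplied regex in bracket-style delimiters.
function buildUserPattern(string $regex, string $modifiers = 'u'): string
{
    // "(" and ")" act as delimiters here; they do not add a capture group.
    return '(' . $regex . ')' . $modifiers;
}

// A pattern containing "~" and "/" needs no additional escaping.
var_dump(preg_match(buildUserPattern('a~b/c'), 'xa~b/cx'));

// Balanced groups inside the user's pattern still work as metacharacters.
var_dump(preg_match(buildUserPattern('foo(bar)?'), 'foobar'));
```

A literal parenthesis in the user's pattern still has to be escaped, as it would with any other delimiter.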