RegExp in preg_match function returning browser error

4

15

The following function breaks with the regexp I've provided in the $pattern variable. If I change the regexp I'm fine, so I think that's the problem. I'm not seeing the problem, though, and I'm not receiving a standard PHP error even though they're turned on.

function parseAPIResults($results){
//Takes results from getAPIResults, returns array.

    $pattern = '/\[(.|\n)+\]/';
    $resultsArray = preg_match($pattern, $results, $matches);

}

Firefox 6: The connection was reset

Chrome 14: Error 101 (net::ERR_CONNECTION_RESET): The connection was reset.

IE 8: Internet Explorer cannot display the webpage

UPDATE:
Apache/PHP may be crashing. Here's the Apache error log from when I run the script:

[Sat Oct 01 11:41:40 2011] [notice] Parent: child process exited with status 255 -- Restarting.
[Sat Oct 01 11:41:40 2011] [notice] Apache/2.2.11 (Win32) PHP/5.3.0 configured -- resuming normal operations

Running WAMP 2.0 on Windows 7.

Weisman asked 1/10, 2011 at 14:42 Comment(5)
So you're looking for a . or a new line? - Benefice
I'm looking for a . or a new line within brackets. The regexp checks out on regexpal.com - Weisman
@stereofrog You may be right. Here's the Apache error log from the moment I run the script: [Sat Oct 01 11:41:40 2011] [notice] Parent: child process exited with status 255 -- Restarting. - Weisman
The crash you are seeing is not new and is due to an unhandled stack overflow in the PCRE library, caused by a certain class of regex being applied to a largish subject string. Upgrading PHP to the latest version (5.3.8) will not help. I am currently working on a detailed answer to this question (it is not trivial). Stand by... In the meantime, you can take a look at how this same problem affected the Drupal project a while back: Optimize CSS option causes php cgi to segfault in pcre function "match" - Longicorn
@stereofrog - Yes. Short story: PHP's pcre.recursion_limit defaults to 100,000, which is way too high. According to the PCRE documentation, this value needs to be set to the stacksize divided by 500. For a Win32 build of httpd.exe (with its 256KB stack), pcre.recursion_limit needs to be set to 524. On *nix systems (with executables typically having an 8MB stack), it needs to be reduced to 16777. - Longicorn
54

Simple question. Complex answer!

Yes, this class of regex will repeatably (and silently) crash Apache/PHP with an unhandled segmentation fault due to a stack overflow!

Background:

The PHP preg_* family of regex functions uses the powerful PCRE library by Philip Hazel. With this library, a certain class of regex requires many recursive calls to its internal match() function, which uses a lot of stack space (the stack space used is directly proportional to the size of the subject string being matched). Thus, if the subject string is too long, a stack overflow and corresponding segmentation fault will occur. This behavior is described in the pcrestack section at the end of the PCRE documentation.

PHP Bug 1: PHP sets pcre.recursion_limit too large.

The PCRE documentation describes how to avoid a stack overflow segmentation fault by limiting the recursion depth to a safe value roughly equal to the stack size of the linked application divided by 500. When the recursion depth is properly limited as recommended, the library does not generate a stack overflow and instead gracefully exits with an error code. Under PHP, this maximum recursion depth is specified with the pcre.recursion_limit configuration variable and (unfortunately) the default value is set to 100,000. This value is TOO BIG! Here is a table of safe values of pcre.recursion_limit for a variety of executable stack sizes:

Stacksize   pcre.recursion_limit
 64 MB      134217
 32 MB      67108
 16 MB      33554
  8 MB      16777
  4 MB      8388
  2 MB      4194
  1 MB      2097
512 KB      1048
256 KB      524
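
Each row in the table is the stack size in bytes divided by 500, rounded down. A minimal sketch of that arithmetic follows; the function name safeRecursionLimit is hypothetical and is not part of PHP or PCRE:

```php
<?php
// Hypothetical helper (not part of PHP or PCRE): apply the
// "stack size / 500" rule of thumb, rounding down.
function safeRecursionLimit($stackBytes)
{
    return (int) floor($stackBytes / 500);
}

$kb = 1024;
$mb = 1024 * $kb;
$stackSizes = array('256 KB' => 256 * $kb, '8 MB' => 8 * $mb, '64 MB' => 64 * $mb);

// Print the suggested limit for a few of the stack sizes in the table.
foreach ($stackSizes as $label => $bytes) {
    printf("%-7s %d\n", $label, safeRecursionLimit($bytes));
}
```

For these three stack sizes the sketch yields 524, 16777 and 134217, matching the table.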

Thus, for the Win32 build of the Apache webserver (httpd.exe), which has a (relatively small) stack size of 256KB, the correct value of pcre.recursion_limit should be set to 524. This can be accomplished with the following line of PHP code:

ini_set("pcre.recursion_limit", "524"); // PHP default is 100,000.

When this code is added to the PHP script, the stack overflow does NOT occur; instead, the PCRE library exits gracefully with a meaningful error code. That is, preg_match() SHOULD report an error! (But unfortunately, due to another PHP bug, it does not.)

PHP Bug 2: preg_match() does not return FALSE on error.

The PHP documentation for preg_match() says that it returns FALSE on error. Unfortunately, PHP versions 5.3.3 and below have a bug (#52732) where preg_match() does NOT return FALSE on error (it instead returns int(0), which is the same value returned in the case of a non-match). This bug was fixed in PHP version 5.3.4.

Solution:

Assuming you will continue using WAMP 2.0 (with PHP 5.3.0) the solution needs to take both of the above bugs into consideration. Here is what I would recommend:

  • Reduce pcre.recursion_limit to a safe value: 524.
  • Explicitly check for a PCRE error whenever preg_match() returns anything other than int(1).
  • If preg_match() returns int(1), then the match was successful.
  • If preg_match() returns int(0), then either the match was not successful or there was an error.
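
The checks in the list above can be folded into a small reusable wrapper. This is only a sketch: the name safePregMatch and the choice to throw a RuntimeException are assumptions made for illustration, not something PHP provides.

```php
<?php
// Hypothetical wrapper: returns true on a match, false on a clean
// non-match, and throws when the PCRE library reports an error.
function safePregMatch($pattern, $subject, &$matches = null)
{
    $result = preg_match($pattern, $subject, $matches);
    if ($result === 1) {
        return true;
    }
    // int(0) or FALSE: ask PCRE whether this was a genuine non-match.
    $err = preg_last_error();
    if ($err !== PREG_NO_ERROR) {
        throw new RuntimeException('preg_match() failed, preg_last_error() = ' . $err);
    }
    return false;
}

// Usage with the pattern from the question and a short subject string.
var_dump(safePregMatch('/\[(.|\n)+\]/', "[a\nb]", $m)); // bool(true)
```

Because the wrapper consults preg_last_error() for every result other than int(1), it behaves the same whether or not the installed PHP version returns FALSE on error.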

Here is a modified version of your script (designed to be run from the command line) that determines the subject string length that results in the recursion limit error:

<?php
// This test script is designed to be run from the command line.
// It measures the subject string length that results in a
// PREG_RECURSION_LIMIT_ERROR error in the preg_match() function.

echo("Entering TEST.PHP...\n");

// Set and display pcre.recursion_limit. (set to stacksize / 500).
// Under Win32 httpd.exe has a stack = 256KB and 8MB for php.exe.
//ini_set("pcre.recursion_limit", "524");       // Stacksize = 256KB.
ini_set("pcre.recursion_limit", "16777");   // Stacksize = 8MB.
echo(sprintf("PCRE pcre.recursion_limit is set to %s\n",
    ini_get("pcre.recursion_limit")));

function parseAPIResults($results){
    $pattern = "/\[(.|\n)+\]/";
    $resultsArray = preg_match($pattern, $results, $matches);
    if ($resultsArray === 1) {
        $msg = 'Successful match.';
    } else {
        // Either an unsuccessful match, or a PCRE error occurred.
        $pcre_err = preg_last_error();  // PHP 5.2 and above.
        if ($pcre_err === PREG_NO_ERROR) {
            $msg = 'Successful non-match.';
        } else {
            // preg_match error!
            switch ($pcre_err) {
                case PREG_INTERNAL_ERROR:
                    $msg = 'PREG_INTERNAL_ERROR';
                    break;
                case PREG_BACKTRACK_LIMIT_ERROR:
                    $msg = 'PREG_BACKTRACK_LIMIT_ERROR';
                    break;
                case PREG_RECURSION_LIMIT_ERROR:
                    $msg = 'PREG_RECURSION_LIMIT_ERROR';
                    break;
                case PREG_BAD_UTF8_ERROR:
                    $msg = 'PREG_BAD_UTF8_ERROR';
                    break;
                case PREG_BAD_UTF8_OFFSET_ERROR:
                    $msg = 'PREG_BAD_UTF8_OFFSET_ERROR';
                    break;
                default:
                    $msg = 'Unrecognized PREG error';
                    break;
            }
        }
    }
    return($msg);
}

// Build a matching test string of increasing size.
function buildTestString() {
    static $content = "";
    $content .= "A";
    return '['. $content .']';
}

// Find subject string length that results in error.
for (;;) { // Infinite loop. Break out.
    $str = buildTestString();
    $msg = parseAPIResults($str);
    printf("Length =%10d\r", strlen($str));
    if ($msg !== 'Successful match.') break;
}

echo(sprintf("\nPCRE_ERROR = \"%s\" at subject string length = %d\n",
    $msg, strlen($str)));

echo("Exiting TEST.PHP...");

?>

When you run this script, it provides a continuous readout of the current length of the subject string. If pcre.recursion_limit is left at its too-high default value, the last length displayed before the executable crashes tells you how long a string it takes to overflow the stack.

Comments:

  • Before investigating the answer to this question, I didn't know about the PHP bug where preg_match() fails to return FALSE when an error occurs in the PCRE library. This bug certainly calls into question a LOT of code that uses preg_match! (I'm certainly going to do an inventory of my own PHP code.)
  • Under Windows, the Apache webserver executable (httpd.exe) is built with a stacksize of 256KB. The PHP command line executable (php.exe) is built with a stacksize of 8MB. The safe value for pcre.recursion_limit should be set in accordance with the executable that the script is being run under (524 and 16777 respectively).
  • Under *nix systems, the Apache webserver and command line executables are both typically built with a stacksize of 8MB, so this problem is not encountered as often.
  • The PHP developers should set the default value of pcre.recursion_limit to a safe value.
  • The PHP developers should apply the preg_match() bugfix to PHP version 5.2.
  • The stacksize of a Windows executable can be manually modified using the CFF Explorer freeware program. You can use this program to increase the stacksize of the Apache httpd.exe executable. (This works under XP but Vista and Win7 might complain.)
Longicorn answered 2/10, 2011 at 17:23 Comment(5)
Setting a low recursion limit does not work for me: <?php ini_set("pcre.recursion_limit", "524"); $contents = 'd' . str_repeat('a', 1900) . 'b'; $contents = preg_replace('/d(a)+b/', '\1', $contents); crashes on Win7, PHP v5.3.9 - Olindaolinde
Hi Cris. Thanks for the feedback. Yes, my testing shows that your even simpler expression: /d(a)+b/ results in the same behavior as described in my answer. It appears that (x)+ results in one recursion per rep. Good to know. - Longicorn
Setting Stacksize=256 KB with ini_set("pcre.recursion_limit", "524"); seems to work for me. - Coadjutor
It helped me: #5059345 - Febri
Thanks, this saved me from changing the stack size on Linux (systemd): freedesktop.org/software/systemd/man/systemd.exec.html - Harold
3

I ran into the same problem. Thanks a lot for the answer posted by ridgerunner.

Although it is helpful to know why PHP crashes, for me this does not really solve the problem. To solve the problem, I need to adjust my regex to use less stack so that PHP won't crash any longer.

So the question is how to change the regex. The link to the PCRE manual posted above already describes a solution for an example regex that is quite similar to yours.

So how to fix your regex? First, you say you want to match "a . or a newline". Note that "." is a special character in a regex that does not only match a dot but any character, so you need to escape that. (I hope I did not get you wrong here and this was intended.)

$pattern = '/\[(\.|\n)+\]/';

Next, we can copy the quantifier inside the brackets:

$pattern = '/\[(\.+|\n+)+\]/';

This does not change the meaning of the expression. Now we use possessive quantifiers instead of normal ones:

$pattern = '/\[(\.++|\n++)++\]/';

So this should have the same meaning as the corrected regex above, but work in PHP without crashing it. Why? Possessive quantifiers "eat up" the characters and do not allow backtracking. Therefore, PCRE does not have to use recursion and the stack will not overflow. Using quantifiers inside the brackets also seems to be a good idea, because the outer alternation then has to repeat far less often.
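
As a quick sanity check, the possessive pattern can be tried on a short subject. This sketch only confirms what the pattern matches; it does not measure stack usage, and the test subject is made up for illustration:

```php
<?php
// The possessive pattern from this answer. It matches literal dots and
// newlines between brackets (the dot is escaped, unlike in the question).
$pattern = '/\[(\.++|\n++)++\]/';

// Made-up subject: three lines of dots inside brackets.
$subject = '[' . str_repeat("...\n", 3) . ']';

$result = preg_match($pattern, $subject, $matches);
if ($result === 1) {
    echo "Matched ", strlen($matches[0]), " bytes\n";
} elseif (preg_last_error() !== PREG_NO_ERROR) {
    echo "PCRE error code: ", preg_last_error(), "\n";
} else {
    echo "No match\n";
}
```

Note that a subject such as '[abc]' does not match this pattern, because the escaped dot only matches a literal ".".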

To sum up, best practice seems to be:

  • Use possessive quantifiers where possible. This means: ++, *+, ?+, {}+ instead of +, *, ?, {}.
  • Move quantifiers inside of alternative-brackets where possible.

Following these rules I was able to fix my own problem, and I hope this will help somebody else.

Infanticide answered 17/11, 2012 at 0:27 Comment(0)
1

I had the same problem. You need to change the pattern to something like

$pattern = '|/your pattern/|s';

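The 's' on the end (the PCRE_DOTALL modifier) basically means treat the string as a single line: the dot then matches newlines as well, so the (.|\n) alternation is no longer needed.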
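
A minimal sketch of the difference the modifier makes, using a made-up two-line subject. With /s, a plain .+ can replace the (.|\n)+ group from the question, which should avoid repeating a capture group once per character:

```php
<?php
// Made-up subject: bracketed text spanning two lines.
$subject = "[line1\nline2]";

// Without /s, "." stops at the newline, so the closing bracket is never reached.
$plain = preg_match('/\[.+\]/', $subject);

// With /s (PCRE_DOTALL), "." also matches "\n"; no (.|\n) group is needed.
$dotall = preg_match('/\[.+\]/s', $subject, $matches);

var_dump($plain, $dotall); // int(0) and int(1)
```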

Checkrow answered 7/12, 2012 at 13:46 Comment(1)
Even though this is the shortest answer, it actually solves the problem. - Williams
0

preg_match returns the number of matches found for the pattern. When you have a match, it is causing a fatal error in php (print_r(1), for instance, causes the error). print_r(0) (for when you change the pattern and have no matches) doesn't and just prints out 0.

You want print_r($matches)

As an aside, your pattern is not escaped properly. Using double quotes means you need to escape the backslashes in front of your brackets.

Conics answered 1/10, 2011 at 14:48 Comment(1)
You're right, though I don't think the print_r function is what's killing this. Script fails with the same browser error when I remove that line. - Weisman
