Replace preg_replace() e modifier with preg_replace_callback
Asked Answered
T

3

88

I'm terrible with regular expressions. I'm trying to replace this:

public static function camelize($word) {
   return preg_replace('/(^|_)([a-z])/e', 'strtoupper("\\2")', $word);
}

with preg_replace_callback with an anonymous function. I don't understand what the \\2 is doing. Or for that matter exactly how preg_replace_callback works.

What would be the correct code for achieving this?

Tennilletennis answered 16/3, 2013 at 20:13 Comment(5)
The e modifier is deprecated as of PHP 5.5.0Inhibitory
@HamZaDzCyberDeV I know. That's one of the reasons I want to replace it with preg_replace_callbackTennilletennis
There's a manual page for preg_replace_callback. And \\2 will become $matches[2] in said callback. Or which part are you confused about specifically?Uhf
@Uhf ahh The $matches[2] was all I needed. I still don't understand how it works, but it does. If you put that in an answer I'll mark it is as solving the problem.Tennilletennis
Please don't use create_function, it's just yet another wrapper around eval. You should use a proper anonymous function, unless you're stuck in PHP 5.2 for some reason.Burstone
B
82

The Basics

In a regular expression, you can "capture" parts of the matched string with (brackets); in this case, you are capturing the (^|_) and ([a-z]) parts of the match. These are numbered starting at 1, so you have back-references 1 and 2. Match 0 is the whole matched string.

The /e modifier takes a replacement string, and substitutes backslash followed by a number (e.g. \1) with the appropriate back-reference - but because you're inside a string, you need to escape the backslash, so you get '\\1'. It then (effectively) runs eval to run the resulting string as though it was PHP code (which is why it's being deprecated, because it's easy to use eval in an insecure way).

The preg_replace_callback function instead takes a callback function and passes it an array containing the matched back-references. So where you would have written '\\1', you instead access element 1 of that parameter - e.g. if you have an anonymous function of the form function($matches) { ... }, the first back-reference is $matches[1] inside that function.

So a /e argument of

'do_stuff(\\1) . "and" . do_stuff(\\2)'

could become a callback of

function($m) { return do_stuff($m[1]) . "and" . do_stuff($m[2]); }

Or in your case

'strtoupper("\\2")'

could become

function($m) { return strtoupper($m[2]); }

Parameter and Function Names

Note that $m and $matches are not magic names, they're just the parameter name I gave when declaring my callback functions.

Also, you don't have to pass an anonymous function, it could be a function name as a string, or something of the form array($object, $method), as with any callback in PHP. For example, using a string to reference a global function:

function stuffy_callback($things) {
    return do_stuff($things[1]) . "and" . do_stuff($things[2]);
}
$foo = preg_replace_callback('/([a-z]+) and ([a-z]+)/', 'stuffy_callback', 'fish and chips');

Or using First-Class Callable Syntax added in PHP 8.1:

function stuffy_callback($things) {
    return do_stuff($things[1]) . "and" . do_stuff($things[2]);
}
$foo = preg_replace_callback('/([a-z]+) and ([a-z]+)/', stuffy_callback(...), 'fish and chips');

Variable Scope

As with any function, you can't access variables outside your callback (from the surrounding scope) by default. When using an anonymous function, you can use the use keyword to import the variables you need to access, as discussed in the PHP manual. e.g. if the old argument was

'do_stuff(\\1, $foo)'

then the new callback might look like

function($m) use ($foo) { return do_stuff($m[1], $foo); }

Gotchas

  • Use of preg_replace_callback is instead of the /e modifier on the regex, so you need to remove that flag from your "pattern" argument. So a pattern like /blah(.*)blah/mei would become /blah(.*)blah/mi.
  • The /e modifier used a variant of addslashes() internally on the arguments, so some replacements used stripslashes() to remove it; in most cases, you probably want to remove the call to stripslashes from your new callback.
Burstone answered 16/3, 2013 at 20:37 Comment(0)
U
2

preg_replace shim with eval support

This is very inadvisable. But if you're not a programmer, or really prefer terrible code, you could use a substitute preg_replace function to keep your /e flag working temporarily.

/**
 * Can be used as a stopgap shim for preg_replace() calls with /e flag.
 * Is likely to fail for more complex string munging expressions. And
 * very obviously won't help with local-scope variable expressions.
 *
 * @license: CC-BY-*.*-comment-must-be-retained
 * @security: Provides `eval` support for replacement patterns. Which
 *   poses troubles for user-supplied input when paired with overly
 *   generic placeholders. This variant is only slightly stricter than
 *   the C implementation, but still susceptible to varexpression, quote
 *   breakouts and mundane exploits from unquoted capture placeholders.
 * @url: https://mcmap.net/q/237285/-replace-preg_replace-e-modifier-with-preg_replace_callback
 */
function preg_replace_eval($pattern, $replacement, $subject, $limit=-1) {
    # strip /e flag
    $pattern = preg_replace('/(\W[a-df-z]*)e([a-df-z]*)$/i', '$1$2', $pattern);
    # warn about most blatant misuses at least
    if (preg_match('/\(\.[+*]/', $pattern)) {
        trigger_error("preg_replace_eval(): regex contains (.*) or (.+) placeholders, which easily causes security issues for unconstrained/user input in the replacement expression. Transform your code to use preg_replace_callback() with a sane replacement callback!");
    }
    # run preg_replace with eval-callback
    return preg_replace_callback(
        $pattern,
        function ($matches) use ($replacement) {
            # substitute $1/$2/… with literals from $matches[]
            $repl = preg_replace_callback(
                '/(?<!\\\\)(?:[$]|\\\\)(\d+)/',
                function ($m) use ($matches) {
                    if (!isset($matches[$m[1]])) { trigger_error("No capture group for '$m[0]' eval placeholder"); }
                    return addcslashes($matches[$m[1]], '\"\'\`\$\\\0'); # additionally escapes '$' and backticks
                },
                $replacement
            );
            # run the replacement expression
            return eval("return $repl;");
        },
        $subject,
        $limit
    );
}

In essence, you just include that function in your codebase, and edit preg_replace to preg_replace_eval wherever the /e flag was used.

Pros and cons:

  • Really just tested with a few samples from Stack Overflow.
  • Does only support the easy cases (function calls, not variable lookups).
  • Contains a few more restrictions and advisory notices.
  • Will yield dislocated and less comprehensible errors for expression failures.
  • However is still a usable temporary solution and doesn't complicate a proper transition to preg_replace_callback.
  • And the license comment is just meant to deter people from overusing or spreading this too far.

Replacement code generator

Now this is somewhat redundant. But might help those users who are still overwhelmed with manually restructuring their code to preg_replace_callback. While this is effectively more time consuming, a code generator has less trouble to expand the /e replacement string into an expression. It's a very unremarkable conversion, but likely suffices for the most prevalent examples.

To use this function, edit any broken preg_replace call into preg_replace_eval_replacement and run it once. This will print out the according preg_replace_callback block to be used in its place.

/**
 * Use once to generate a crude preg_replace_callback() substitution. Might often
 * require additional changes in the `return …;` expression. You'll also have to
 * refit the variable names for input/output obviously.
 *
 * >>>  preg_replace_eval_replacement("/\w+/", 'strtopupper("$1")', $ignored);
 */
function preg_replace_eval_replacement($pattern, $replacement, $subjectvar="IGNORED") {
    $pattern = preg_replace('/(\W[a-df-z]*)e([a-df-z]*)$/i', '$1$2', $pattern);
    $replacement = preg_replace_callback('/[\'\"]?(?<!\\\\)(?:[$]|\\\\)(\d+)[\'\"]?/', function ($m) { return "\$m[{$m[1]}]"; }, $replacement);
    $ve = "var_export";
    $bt = debug_backtrace(0, 1)[0];
    print "<pre><code>
    #----------------------------------------------------
    # replace preg_*() call in '$bt[file]' line $bt[line] with:
    #----------------------------------------------------
    \$OUTPUT_VAR = preg_replace_callback(
        {$ve($pattern, TRUE)},
        function (\$m) {
            return {$replacement};
        },
        \$YOUR_INPUT_VARIABLE_GOES_HERE
    )
    #----------------------------------------------------
    </code></pre>\n";
}

Take in mind that mere copy&pasting is not programming. You'll have to adapt the generated code back to your actual input/output variable names, or usage context.

  • Specificially the $OUTPUT = assignment would have to go if the previous preg_replace call was used in an if.
  • It's best to keep temporary variables or the multiline code block structure though.

And the replacement expression may demand more readability improvements or rework.

  • For instance stripslashes() often becomes redundant in literal expressions.
  • Variable-scope lookups require a use or global reference for/within the callback.
  • Unevenly quote-enclosed "-$1-$2" capture references will end up syntactically broken by the plain transformation into "-$m[1]-$m[2].

The code output is merely a starting point. And yes, this would have been more useful as an online tool. This code rewriting approach (edit, run, edit, edit) is somewhat impractical. Yet could be more approachable to those who are accustomed to task-centric coding (more steps, more uncoveries). So this alternative might curb a few more duplicate questions.

Uhf answered 14/10, 2019 at 10:56 Comment(0)
R
0

You shouldn't use flag e (or eval in general).

You can also use T-Regx library

pattern('(^|_)([a-z])')->replace($word)->by()->group(2)->callback('strtoupper');
Ralph answered 31/5, 2019 at 10:1 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.