Expected lifespan of ereg, migrating to preg [duplicate]

I work on a large PHP application (>1 million lines, 10 yrs old) which makes extensive use of ereg and ereg_replace - currently 1,768 unique regular expressions in 516 classes.

I'm very aware why ereg is being deprecated but clearly migrating to preg could be highly involved.

Does anyone know how long ereg support is likely to be maintained in PHP, and/or have any advice for migrating to preg on this scale? I suspect automated translation from ereg to preg is impossible or impractical.

Receptor answered 3/5, 2011 at 14:47 Comment(2)
ereg has been discouraged since around PHP 4.1 - but solely because it is not as optimized as the PCRE functions. It's not overly likely to be removed anytime soon (not with the mythical PHP6 anyway), and even then it would be simple to write a runtime compatibility wrapper (which you should do for testing). You should list some examples of why you think your POSIX extended regular expressions would be incompatible. The differences are seldom significant. – Decompose
There are tools out there that can translate from ereg to preg; RegexBuddy for example does this and supports COM automation. Depending on the complexity of your regexes, and whether you actually need to migrate, that might be a relevant option. – Centre

I'm not sure when ereg will be removed but my bet is as of PHP 6.0.

Regarding your second issue, translating ereg to preg doesn't seem that hard. If your application has more than 1 million lines, surely you have the resources to put someone on this job for a week at most. I would grep all the ereg_ instances in your code and set up some macros in your favorite IDE (simple stuff like adding delimiters, modifiers and so on).

I bet most of the 1768 regexes can be ported using a macro, and the others, well, a good pair of eyes.
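
The "grep all the ereg_ instances" step could also be scripted in PHP itself. The following is only a rough sketch, assuming PHP 7.1 or later for the tooling: find_ereg_calls() is a hypothetical helper (not part of PHP) that lists calls to the POSIX regex functions with their line numbers. It is a plain text scan, so it will also flag matches inside comments, strings, or a user-defined function named split().

```php
<?php
/**
 * List calls to the POSIX regex functions found in a chunk of PHP source.
 * Hypothetical helper for building a migration inventory; not part of PHP.
 */
function find_ereg_calls(string $source): array
{
    // The lookbehind skips names that are part of a longer identifier
    // (preg_split, mb_ereg), method calls (->split, ::split) and variables.
    $pattern = '/(?<![\w$>:])(eregi?(?:_replace)?|spliti?|sql_regcase)\s*\(/i';
    preg_match_all($pattern, $source, $matches, PREG_OFFSET_CAPTURE);

    $calls = [];
    foreach ($matches[1] as [$name, $offset]) {
        // Line number = 1 + number of newlines before the match.
        $line = 1 + substr_count(substr($source, 0, $offset), "\n");
        $calls[] = ['line' => $line, 'function' => $name];
    }
    return $calls;
}

// Made-up sample input for illustration.
$sample = <<<'PHP'
if (ereg('^a', $s)) {}
$x = preg_split('/,/', $s);
$y = eregi_replace('a', 'b', $s);
PHP;

foreach (find_ereg_calls($sample) as $call) {
    echo $call['line'], ': ', $call['function'], "\n";
}
```

Run over every file in the code base, a list like this gives the "filenames and line numbers" to work through, and the preg_split() call on the second line of the sample is left alone.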

Another option might be to write wrappers around the ereg functions if they are not available, implementing the changes as needed:

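// Define ereg() only where the native function is missing.
if (function_exists('ereg') !== true)
{
    // $regs is optional in the native ereg(), so it defaults to null here.
    function ereg($pattern, $string, &$regs = null)
    {
        // Wrap the POSIX pattern in ~ delimiters, escaping any ~ it contains.
        // Note: this returns 1 or 0, where the native ereg() returns the
        // match length or false; the truthiness is the same.
        return preg_match('~' . addcslashes($pattern, '~') . '~', $string, $regs);
    }
}

// Same idea for eregi(), with the i modifier for case-insensitive matching.
if (function_exists('eregi') !== true)
{
    function eregi($pattern, $string, &$regs = null)
    {
        return preg_match('~' . addcslashes($pattern, '~') . '~i', $string, $regs);
    }
}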

You get the idea. The PEAR package PHP_Compat might also be a viable solution.
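
One way to sidestep the question of escaping the delimiter is to pick a delimiter that does not occur in the pattern at all. This is only a sketch under that assumption: posix_to_pcre_pattern() is a hypothetical helper, the candidate delimiter list is arbitrary, and it handles delimiters and the i modifier only, not the semantic differences between the two engines.

```php
<?php
/**
 * Wrap a POSIX pattern in a PCRE delimiter that the pattern does not contain.
 * Hypothetical helper; the candidate list is an arbitrary choice.
 */
function posix_to_pcre_pattern(string $posixPattern, bool $caseInsensitive = false): string
{
    foreach (['~', '#', '%', '!', '@', ';'] as $delimiter) {
        if (strpos($posixPattern, $delimiter) === false) {
            // No escaping needed: the delimiter is absent from the pattern.
            return $delimiter . $posixPattern . $delimiter . ($caseInsensitive ? 'i' : '');
        }
    }
    // Leave rare patterns that use every candidate for manual review.
    throw new InvalidArgumentException('No unused delimiter for: ' . $posixPattern);
}

// The slash in the pattern needs no escaping because ~ is the delimiter.
var_dump(preg_match(posix_to_pcre_pattern('^[0-9]+/[0-9]+$'), '3/5'));
```

Patterns that trigger the exception are probably the ones worth a human look anyway.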


Differences from POSIX regex

As of PHP 5.3.0, the POSIX Regex extension is deprecated. There are a number of differences between POSIX regex and PCRE regex. This page lists the most notable ones that are necessary to know when converting to PCRE.

  1. The PCRE functions require that the pattern is enclosed by delimiters.
  2. Unlike POSIX, the PCRE extension does not have dedicated functions for case-insensitive matching. Instead, this is supported using the /i pattern modifier. Other pattern modifiers are also available for changing the matching strategy.
  3. The POSIX functions find the longest of the leftmost match, but PCRE stops on the first valid match. If the string doesn't match at all it makes no difference, but if it matches it may have dramatic effects on both the resulting match and the matching speed. To illustrate this difference, consider the following example from "Mastering Regular Expressions" by Jeffrey Friedl. Using the pattern one(self)?(selfsufficient)? on the string oneselfsufficient with PCRE will result in matching oneself, but using POSIX the result will be the full string oneselfsufficient. Both (sub)strings match the original string, but POSIX requires that the longest be the result.
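
The three points above can be tried out with preg_match() directly. This is a small sketch using the pattern and subject from the quoted example; the reordered alternation is just one possible way to get the longer match back under PCRE, not something the manual prescribes.

```php
<?php
// Subject and first pattern are taken from the example quoted above.
$subject = 'oneselfsufficient';

// Points 1 and 3: the pattern is wrapped in / delimiters, and PCRE stops at
// the first valid match, so (self)? wins and the match ends at "oneself".
preg_match('/one(self)?(selfsufficient)?/', $subject, $first);
echo $first[0], "\n";

// Assumption: listing the longer alternative first recovers the match
// that a leftmost-longest POSIX engine would report.
preg_match('/one(selfsufficient|self)?/', $subject, $longest);
echo $longest[0], "\n";

// Point 2: the i modifier takes the place of eregi().
$caseInsensitive = preg_match('/ONESELF/i', $subject);
var_dump($caseInsensitive);
```

Patterns with optional or alternative groups like this one are the cases to look at by hand during a migration; patterns without them are unlikely to be affected by the longest-match difference.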
Etherify answered 3/5, 2011 at 14:56 Comment(4)
More evasive would be to use "\1" as the delimiter. It could still fail, but the other typical delimiters might already be used and needlessly prefixed with backslashes, as in [\-\~]+, so addcslashes could potentially undo the escaping there (yes, too many ifs). -- The wrapper function is the only sane approach. I would add more testing/warnings, and dry-run it with runkit/override_function in the OP's case. – Decompose
+1 - Great answer, thanks (this was my first StackOverflow question, and I'm already impressed!) – Receptor
@mario: Is \1 a valid PCRE delimiter? Didn't know that! =) If they are already escaped in the context of ereg, escaping the tildes again should have no bad consequences... Or should it be addcslashes($pattern, '\\~')? I'm really bad when it comes to escaping (un)escaped data, always have to test it countless times! – Etherify
@Oliver: Thank you, I hope you stick around! ;) – Etherify

My intuition says that they are never going to remove ereg on purpose. PHP still supports really old and deprecated stuff like register_globals. There are simply too many outdated apps out there. There is, however, a small chance that the extension has to be removed because someone finds a serious vulnerability and there's just nobody to fix it.

In any case, it's worth noting that:

  1. You are not forced to upgrade your PHP installation. It's pretty common to keep outdated servers to run legacy apps.

  2. The PHP_Compat PEAR package offers plain PHP versions of some native functions. If ereg disappears, it's possible that it gets added.


BTW... In fact, PHP 6 is dead. They realised that making PHP fully Unicode compliant was harder than they thought and they are rethinking it all. The conclusion is: you can never make perfect predictions.

Assignat answered 3/5, 2011 at 15:12 Comment(1)
Thanks - your last link was particularly interesting. We make extensive use of multibyte/Unicode parsing and have a tonne of homegrown utility classes to support it. The promise of native support in PHP6 was enticing but terrifying to us! – Receptor

I had this problem on a much smaller scale - an application more like 10,000 lines. In every case, all I needed to do was switch to preg_replace() and put delimiters around the regex pattern.

Anyone should be able to do that - even a non-programmer can be given a list of filenames and line numbers.

Then just run your tests to watch for any failures that can be fixed.
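
As a rough illustration of that kind of change (the inputs here are made up, and the old calls are shown only in comments since they no longer run where ereg is unavailable):

```php
<?php
$text = "a \t\n b";

// Before: ereg_replace('[[:space:]]+', ' ', $text);
// After: same pattern with / delimiters; POSIX classes such as
// [[:space:]] are also understood by PCRE inside brackets.
$collapsed = preg_replace('/[[:space:]]+/', ' ', $text);
echo $collapsed, "\n";

// Before: eregi_replace('hello', 'bye', 'Hello world');
// After: the i modifier replaces the separate case-insensitive function.
$greeting = preg_replace('/hello/i', 'bye', 'Hello world');
echo $greeting, "\n";
```

One thing to watch for in the tests: preg_replace() treats $1 in the replacement string as a backreference, while ereg_replace() only documented the \\1 form, so replacement strings containing a literal dollar sign followed by a digit deserve a second look.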

ereg functions will be removed from PHP6, by the way - http://jero.net/articles/php6.

Beatification answered 3/5, 2011 at 14:59 Comment(0)

All ereg functions will be removed as of PHP 6, I believe.

Kathyrnkati answered 3/5, 2011 at 14:55 Comment(0)
