There is much discussion about whether POSIX (ereg) or PCRE (preg) regular expressions are faster and better in PHP.
If you plan on someday moving to PHP6, your decision is made, because the POSIX regex functions are being deprecated in favor of PCRE. Otherwise:
The general consensus is that PCRE is the better all-around solution, but if you have a specific page with a lot of traffic and you don't need PHP6, it may be worth some testing.
For example, from the PHP manual comments:
Deprecating POSIX regex in PHP for
Perl searching is like substituting
wooden boards and brick for a house
with pre-fabricated rooms and walls.
Sure, you may be able to mix and match
some of the parts but it's a lot
easier to modify with all the pieces
laid out in front of you.

PCRE faster than POSIX RE? Not always.
In a recent search-engine project here
at Cynergi, I had a simple loop with a
few cute ereg_replace() functions that
took 3min to process data. I changed
that 10-line loop into a 100-line
hand-written code for replacement and
the loop now took 10s to process the
same data! This opened my eyes to how
slow regular expressions can be IN SOME
CASES. Lately I decided
to look into Perl-compatible regular
expressions (PCRE). Most pages claim
PCRE are faster than POSIX, but a few
claim otherwise. I decided on
benchmarks of my own. My first few
tests confirmed PCRE to be faster,
but... the results were slightly
different than others were getting, so
I decided to benchmark every case of
RE usage I had on an 8000-line secure
(and fast) Webmail project here at
Cynergi to check it out. The results?
Inconclusive! Sometimes PCRE are
faster (sometimes by a factor greater
than 100x faster!), but some other
times POSIX RE are faster (by a factor
of 2x). I have yet to find a rule for
when one or the other is faster. It's
not just about search data size,
amount of data matched, or "RE
compilation time" (which would show
when you repeat the function often):
if it were, one would always be faster
than the other. But I didn't find a
pattern here. Truth be told, I also didn't
take the time to look into the source
code and analyse the problem. I can
give you some examples, though. The
POSIX RE
([0-9]{4})/([0-9]{2})/([0-9]{2})[^0-9]+
([0-9]{2}):([0-9]{2}):([0-9]{2}) is
30% faster in POSIX than when
converted to PCRE (even if you use \d
and \D and non-greedy matching). On
the other hand, a similarly complex
PCRE pattern
/[0-9]{1,2}[ \t]+[a-zA-Z]{3}[ \t]+[0-9]{4}[ \t]+[0-9]{1,2}:[0-9]{1,2}(:[0-9]{1,2})?[ \t]+[+-][0-9]{4}/
is 2.5x faster in
PCRE than in POSIX RE. Simple
replacement patterns like
ereg_replace( "[^a-zA-Z0-9-]+", "", $m );
are 2x faster in POSIX RE than
PCRE. And then we get confused again
because a POSIX RE pattern like
(^|\n|\r)begin-base64[ \t]+[0-7]{3,4}[ \t]+......
is 2x faster as POSIX RE,
but the case-insensitive PCRE
/^Received[ \t]*:[ \t]by[ \t]+([^ \t]+)[ \t]/i
is 30x faster than its
POSIX RE version! When it comes to
case sensitivity, PCRE has so far
seemed to be the best option. But I
found some really strange behaviour
from ereg/eregi. On a very simple
POSIX RE (^|\r|\n)mime-version[ \t]:
I found eregi() taking 3.60s (just a
number in a test benchmark), while the
corresponding PCRE took 0.16s! But if
I used ereg() (case-sensitive) the
POSIX RE time went down to 0.08s! So I
investigated further. I tried to make
the POSIX RE case-insensitive itself.
I got as far as this:
(^|\r|\n)[mM][iI][mM][eE]-vers[iI][oO][nN][ \t]*:
This version also took 0.08s.
But if I try to apply the same rule to
any of the 'v', 'e', 'r' or 's'
letters that are not changed, the time
is back to the 3.60s mark, and not
gradually, but immediately so! The
test data didn't have any "vers" in
it, other "mime" words in it or any
"ion" that might be confusing the
POSIX parser, so I'm at a loss. Bottom
line: always benchmark your PCRE /
POSIX RE to find the fastest! Tests
were performed with PHP 5.1.2 under
Windows, from the command line.

Pedro Freire, cynergi.com
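
If you want to follow that "always benchmark" advice, here is a minimal sketch of a timing loop. It is not the harness used in the quoted comment. The helper name bench_regex, the sample $subject line, and the iteration count are all made up for illustration, and the date/time pattern is assumed to be one unbroken expression. The ereg() case is only added when that function exists, since it is not available on newer PHP versions.

```php
<?php
// Hypothetical helper (not from the quoted comment): run $fn $iterations
// times and return the elapsed wall-clock time in seconds.
function bench_regex($fn, $iterations = 100000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Made-up sample line; benchmark with your own real data instead.
$subject = "2006/03/14 -- 09:26:53";

// The date/time pattern from the comment, written as PCRE in two spellings.
// '#' is used as the delimiter because the pattern itself contains '/'.
$candidates = array(
    'pcre [0-9]' => function () use ($subject) {
        return preg_match('#([0-9]{4})/([0-9]{2})/([0-9]{2})[^0-9]+([0-9]{2}):([0-9]{2}):([0-9]{2})#', $subject);
    },
    'pcre \d' => function () use ($subject) {
        return preg_match('#(\d{4})/(\d{2})/(\d{2})\D+(\d{2}):(\d{2}):(\d{2})#', $subject);
    },
);

// Only compare against POSIX when ereg() is actually available.
if (function_exists('ereg')) {
    $candidates['posix'] = function () use ($subject) {
        return ereg('([0-9]{4})/([0-9]{2})/([0-9]{2})[^0-9]+([0-9]{2}):([0-9]{2}):([0-9]{2})', $subject);
    };
}

// Time every candidate and print one line per candidate.
$timings = array();
foreach ($candidates as $label => $fn) {
    $timings[$label] = bench_regex($fn);
    printf("%-12s %.4fs\n", $label, $timings[$label]);
}
```

Run it from the command line, as in the quoted tests, and swap in your own patterns and input. Timings on one short string say little about a real workload, so treat the numbers as a starting point only.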