How to make a . (any character) in a PHP regex pattern also match newline characters?

The . character in a PHP regex matches any character except a newline. What can I use to match ALL characters, including newlines?

For example:

$text = <<<TEXT
foo
bar
TEXT;
preg_match('/.+/', $text, $match);
echo $match[0];

This returns foo, but I need the full string match to be returned as:

foo
bar
Feola answered 26/10, 2010 at 17:26 Comment(0)

This is commonly used to match any character:

[\s\S]

You could use any other combination of "Type-X + Non-Type-X" in the same way:

[\d\D]
[\w\W]

but [\s\S] is recognized by convention as a shorthand for "really anything".

You can also use . if you switch the regex into "dotall" (a.k.a. "single-line") mode with the "s" modifier. Sometimes that is not an option: for example, when the regex is generated dynamically inside a black box, or when you don't want to change how every . in the regex behaves. In such cases, the character-class alternatives above match any character regardless of how the regex is configured.
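A minimal sketch of the character-class approach, run against the question's sample string. The variable names are illustrative only. Each class pairs a shorthand with its negation, so no modifier is needed, while a plain dot still stops at the newline.

```php
<?php
// The question's sample input: "foo", a newline, then "bar".
$text = "foo\nbar";

// Each class pairs a shorthand with its negation, so together they cover
// every character, "\n" included, without any modifier.
$patterns = ['/[\s\S]+/', '/[\d\D]+/', '/[\w\W]+/'];

$results = [];
foreach ($patterns as $pattern) {
    preg_match($pattern, $text, $match);
    $results[$pattern] = $match[0];
}

// A plain dot, by contrast, stops at the newline.
preg_match('/.+/', $text, $dotMatch);

foreach ($results as $pattern => $matched) {
    // addcslashes() prints the newline as \n so each result fits on one line.
    echo $pattern, ' => ', addcslashes($matched, "\n"), "\n";
}
echo '/.+/ => ', $dotMatch[0], "\n";
```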

Quarterphase answered 26/10, 2010 at 17:29 Comment(3)
And why would you think that? - Quarterphase
Testing on regex101.com: it says the regex matches any single whitespace or non-whitespace character. It looks like (.*?) will allow any combination of characters (but still not newlines). - Cerebritis
Of course you'd have to add a quantifier of your choice. "Any character" was the question, after all. - Quarterphase

It's the . character that means "every character" (edit: the OP has since edited the question). You also need to add the s modifier to your regex, for example:

preg_match("`(.+)`s", "\n");
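Here is a small sketch of the s modifier applied to the question's heredoc (variable names are illustrative). It also shows that the backtick delimiter used in this answer behaves the same as the usual slashes.

```php
<?php
$text = <<<TEXT
foo
bar
TEXT;

// Without the s modifier, . stops at the newline: only "foo" matches.
preg_match('/.+/', $text, $withoutS);

// With the s modifier (PCRE_DOTALL), . also matches "\n".
preg_match('/.+/s', $text, $withS);

// Any non-alphanumeric, non-backslash, non-whitespace character can be the
// delimiter; this mirrors the backtick form above, with a capture group.
preg_match('`(.+)`s', $text, $backtick);

echo $withoutS[0], "\n";   // foo
echo $withS[0], "\n";      // foo and bar, on two lines
echo $backtick[1], "\n";   // same text as $withS[0]
```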
Bracteole answered 26/10, 2010 at 17:27 Comment(7)
Aren't there supposed to be forward slashes at the beginning and end of a regexp? - Feola
They can be, but any pair of delimiters will do. - Clava
Not in PHP. The pattern has to start and end with a delimiter (you can choose which), and every character after the closing delimiter is a modifier (e.g. U for ungreedy, i for case-insensitive). - Bracteole
+1 Depending on your needs, m is an option as well. But based on the OP, s is the way to go. - Muriel
Someone should explain s (and perhaps m) to make this really complete. - Dithyramb
/m makes ^ and $ match at the start and end of each line (instead of only the whole string), /s makes . also match \n, and /ms applies both. See the PHP manual on pattern modifiers and the Perl regex manual (PHP's preg_match is based on PCRE, which follows Perl's syntax). - Equalize
It is all too common to see Stack Overflow content adding unnecessary pattern modifiers. m changes anchors such as ^ and $ so that they match the start and end of lines instead of the default behaviour of matching the start and end of the string. s is vital here because it affects the .; there is absolutely no advantage in adding m. m does NOT mean "the input string might contain newline sequences". - Wolfram
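The difference between m and s discussed in these comments can be checked directly. A minimal sketch on the question's two-line string; the variable names are illustrative:

```php
<?php
$text = "foo\nbar";

// m only changes the anchors: ^ and $ now match at each line's start/end,
// so every line is a separate match. The dot still does not cross "\n".
$lineCount = preg_match_all('/^.+$/m', $text, $lines);

// s only changes the dot: a single match can span the newline, while
// ^ and $ keep their default meaning (start and end of the subject).
preg_match('/^.+$/s', $text, $whole);

// With neither modifier, ^.+$ fails: . stops at "\n" before $ can match.
$plain = preg_match('/^.+$/', $text);

// Both together: anchors work per line, but the greedy .+ can still run
// past newlines, so here it grabs the whole string.
preg_match('/^.+$/ms', $text, $both);

echo $lineCount, "\n"; // 2
```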

Would

[.\n]+

not work?

How about (.|\n)+? I tested it and it seems to work.

I am quite sure this is the literal interpretation of exactly what you were asking for.

Alexandrite answered 26/10, 2010 at 17:30 Comment(6)
The . in a character class does not mean "any character". It means "a dot". Character classes have their own syntax. ;-) - Quarterphase
@Tomalak: Thanks for the explanation; I just realized that. I guess I should test my answers before I post them. I've edited my answer now. - Alexandrite
Common error. I see people write [this|that|\d] a lot when they really mean (this|that|\d). P.S.: (.|\n) works, but it may be slightly less efficient than a character class. - Quarterphase
Glad all I had was some metacharacter confusion. I don't think I would ever try to put a "|" inside "[]". I just like to avoid "()" whenever possible because in Perl (and other languages) they also set special variables ($1, $2, ...) when something inside them matches. - Alexandrite
I think a problem with this approach is that you "hardwire" the set of characters. If one day someone invents a character that is not matched by . (already the case: \t), every library built on that assumption would need to be rewritten... - Monopolist
@WillemVanOnsem \t is matched by . (dot). - Classicize
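The points raised in this answer and its comments can be checked with a short sketch (variable names are illustrative). It uses a non-capturing group (?:...) instead of plain parentheses, which avoids the capture concern mentioned above while matching exactly the same text.

```php
<?php
$text = "foo\nbar";

// Inside [...], . is a literal dot, so [.\n]+ finds only runs of dots and
// newlines. The first such run in "foo\nbar" is the lone newline.
preg_match('/[.\n]+/', $text, $classMatch);

// Alternation really means "any non-newline character, or a newline".
// (?:...) groups without storing a capture on each repetition.
preg_match('/(?:.|\n)+/', $text, $altMatch);

// The dot already matches a tab, with no modifier needed.
$tabMatches = preg_match('/^.$/', "\t");

echo $tabMatches, "\n"; // 1
```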

The PHP Manual page for Dot states that:

If the PCRE_DOTALL option is set, then dots match newlines as well.
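In PHP, PCRE_DOTALL corresponds to the s modifier. PCRE also accepts it as an inline option inside the pattern, which helps when you can edit the pattern text but not the modifiers after the closing delimiter, or when only part of the pattern should be affected. A sketch, with illustrative variable names and sample text:

```php
<?php
$text = "foo\nbar\nbaz";

// (?s) at the start turns on DOTALL for the rest of the pattern,
// equivalent to putting s after the closing delimiter.
preg_match('/(?s).+/', $text, $inline);

// (?s:...) enables DOTALL only inside that group. Here the grouped dot
// consumes the first "\n", but the trailing .* (outside the group) keeps
// the default behaviour and stops before the second "\n".
preg_match('/foo(?s:.)bar.*/', $text, $scoped);
```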

Burnsed answered 14/5, 2017 at 7:43 Comment(0)

An important thing is missing here: [\s\S] matches one character, whereas a newline can be a sequence of characters (Windows uses two: \r\n). A single . (with the DOTALL modifier) or [\s\S] cannot match that sequence as one unit. The best way to match any character or any newline is (.|\R), "everything except a newline, or a newline". \R matches \n, \r and \r\n.
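A sketch of the difference described here. It uses \A and \z to force each pattern to match the whole subject; those anchors and the variable names are choices made for this illustration, not part of the answer.

```php
<?php
$crlf = "\r\n";

// A single . (even with s) or [\s\S] consumes exactly one character,
// so neither can match the two-character sequence "\r\n" by itself.
$dot = preg_match('/\A.\z/s', $crlf);
$class = preg_match('/\A[\s\S]\z/', $crlf);

// \R matches a newline sequence and treats "\r\n" as one unit.
$newline = preg_match('/\A\R\z/', $crlf);

// Repeated with a quantifier, [\s\S] still covers Windows-style text.
preg_match('/[\s\S]+/', "foo\r\nbar", $all);

echo $dot, ' ', $class, ' ', $newline, "\n"; // 0 0 1
```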

Classicize answered 14/5, 2017 at 8:20 Comment(0)
