Regex to match any character including new lines
A

8

348

Is there a regex to match "all characters including newlines"?

For example, the regex below produces no output from $2, because the . in (.+?) does not match newlines by default.

$string = "START Curabitur mollis, dolor ut rutrum consequat, arcu nisl ultrices diam, adipiscing aliquam ipsum metus id velit. Aenean vestibulum gravida felis, quis bibendum nisl euismod ut. 

Nunc at orci sed quam pharetra congue. Nulla a justo vitae diam eleifend dictum. Maecenas egestas ipsum elementum dui sollicitudin tempus. Donec bibendum cursus nisi, vitae convallis ante ornare a. Curabitur libero lorem, semper sit amet cursus at, cursus id purus. Cras varius metus eu diam vulputate vel elementum mauris tempor. 

Morbi tristique interdum libero, eu pulvinar elit fringilla vel. Curabitur fringilla bibendum urna, ullamcorper placerat quam fermentum id. Nunc aliquam, nunc sit amet bibendum lacinia, magna massa auctor enim, nec dictum sapien eros in arcu. 

Pellentesque viverra ullamcorper lectus, a facilisis ipsum tempus et. Nulla mi enim, interdum at imperdiet eget, bibendum nec END";

$string =~ /(START)(.+?)(END)/;

print $2;
Alizaalizarin answered 28/11, 2011 at 22:47 Comment(1)
You may want to read about regex modifiers/flags such as m and s (m/regex/ims...). - Eyeglass
P
260

Add the s modifier to your regex to cause . to match newlines:

$string =~ /(START)(.+?)(END)/s;
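
For comparison, here is a minimal JavaScript sketch of the same idea. It assumes an ES2018+ engine where the s (dotAll) flag is available, and the sample string is made up for illustration:

```javascript
// Hypothetical sample text: START and END are separated by a newline.
const sampleText = "START first line\nsecond line END";

// Without the s flag, `.` stops at the newline, so there is no match.
const withoutDotAll = sampleText.match(/(START)(.+?)(END)/);

// With the s (dotAll) flag, `.` also matches line terminators.
const withDotAll = sampleText.match(/(START)(.+?)(END)/s);

console.log(withoutDotAll); // null
console.log(withDotAll[2]); // " first line\nsecond line "
```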
Pintsize answered 28/11, 2011 at 22:49 Comment(6)
In JavaScript: (START)[\s\S]*(END). See www.regexpal.com to test. - Danielledaniels
For more info regarding @Zymotik's comment, see: #1068780 - Ephrem
In Java you can use the inline modifier (?s) at the beginning of the regex. For example, to replace any character including newlines after 'yourPattern', use "(?s)yourPattern.*". Also see: rexegg.com/regex-modifiers.html#dotall - Sigmon
In Ruby, the modifier is m, not s. See: rubular.com - Easing
JavaScript now supports this too: ES2018 added the s (dotAll) flag. - Redfish
@Danielledaniels thanks, it works in JS, here is the demo. - Leper
P
470

If you don't want to add the /s regex modifier (perhaps you still want . to retain its original meaning elsewhere in the regex), you can also use a character class. One possibility:

[\S\s]

That is, a character that is either non-whitespace or whitespace. In other words, any character.
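
As a rough illustration in JavaScript (the input string and variable names are invented for this sketch), the class lets one part of the pattern cross newlines while a plain . elsewhere keeps its default meaning:

```javascript
// Hypothetical input: a one-line title followed by a multi-line body.
const note = "title: demo\nbody line 1\nbody line 2 END";

// (.+) uses the default `.` and therefore stays on the first line;
// [\s\S]*? spans the following lines up to END.
const classMatch = note.match(/^title: (.+)\n([\s\S]*?)END/);

console.log(classMatch[1]); // "demo"
console.log(classMatch[2]); // "body line 1\nbody line 2 "
```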

You can also change modifiers locally in a small part of the regex, like so:

(?s:.)
Panchito answered 28/11, 2011 at 22:53 Comment(3)
Is (?:.|\n) inferior in any way, except being less elegant? - Easterner
@VlastimilOvčáčík That one can be really bad for runtime if you use it with * or + since there are 2^n different ways it can match any given string of length n. - Stoa
(?s:.) was exactly what I needed. Thanks! - Deannadeanne
D
31

This is very readable to me and matches "any character or newline"

(.|\n)*

It behaves the same as

[\S\s]*

and the same as

(?s:.)*

You can also append ? to the quantifier to make it lazy (non-greedy), so the match stops at the first end_string: (.|\n)*?

// Lazy (stops at the first end_string)
start_string(.|\n)*?end_string

Otherwise, with only (.|\n)*, the quantifier is greedy and the match runs to the last end_string, so it can contain several end_string occurrences:

start_string some text
and newlines end_string
some more text end_string
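
A small JavaScript sketch of the difference, using the sample text above (the variable names are only for illustration):

```javascript
// The sample text from above, with two occurrences of end_string.
const haystack =
  "start_string some text\nand newlines end_string\nsome more text end_string";

// Greedy: (.|\n)* consumes as much as possible, then backs off to the last end_string.
const greedyMatch = haystack.match(/start_string(.|\n)*end_string/)[0];

// Lazy: (.|\n)*? consumes as little as possible, so it stops at the first end_string.
const lazyMatch = haystack.match(/start_string(.|\n)*?end_string/)[0];

console.log(greedyMatch === haystack); // true
console.log(lazyMatch); // "start_string some text\nand newlines end_string"
```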
Dudeen answered 4/3, 2023 at 11:33 Comment(0)
B
11

Yep, you just need to make . match newlines:

$string =~ /(START)(.+?)(END)/s;
Bilbe answered 28/11, 2011 at 22:49 Comment(0)
G
0

I like to use an empty negated set. A negated set matches any character not listed in it; since nothing is listed, it matches anything, including newlines.

[^]

If you want zero or more characters:

[^]*

Or one or more:

[^]+

Tested in JavaScript.
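
A quick sketch of that behavior in JavaScript. Other engines may reject [^] or read the ] as a literal, so treat this as JavaScript-specific; the test string is made up:

```javascript
const multiLine = "line one\nline two";

// [^] is an empty negated class: "any character not in the empty set".
const emptyNegatedSpansLines = /^[^]+$/.test(multiLine);
// A plain `.` (no s flag) cannot cross the newline.
const dotSpansLines = /^.+$/.test(multiLine);
// * also accepts the empty string, + does not.
const starAcceptsEmpty = /^[^]*$/.test("");
const plusAcceptsEmpty = /^[^]+$/.test("");

console.log(emptyNegatedSpansLines); // true
console.log(dotSpansLines);          // false
console.log(starAcceptsEmpty);       // true
console.log(plusAcceptsEmpty);       // false
```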

Granados answered 31/7, 2023 at 14:23 Comment(2)
Not sure about this. What specific regex engine implementation are you using? I don't think this notation has a conventional or widely-adopted meaning. Notepad++, for example, rejects this expression as malformed. One problem is that, if the engine can't assume there is at least one character in the (negated) set, then you'd have to establish another escape sequence in order to negate the set of a single ']' character. - Polley
I'm using Chrome (V8); if I paste /[^]*/.test('whatever') into the console, it returns true. - Granados
L
0

If you are using JavaScript this regex works great:

/(START)[\s\S]*(END)/g

DEMO.

Leper answered 26/1 at 16:56 Comment(0)
E
-1

Go with the other answers that use the /s flag to let . match every character.

Perl v5.12 added \N as a character class shortcut that always matches any character except a newline, regardless of the /s setting. This gives \n a counterpart, just as \s has \S.

With this, you can do what similar answers do and use both sides of a complement: [\n\N], [\s\S], and so on.

However, you've also tagged this with javascript, which thinks \N is just capital N.
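
A short JavaScript sketch of that caveat, assuming a regex without the u or v flag (in that mode \N is read as an identity escape for the letter N):

```javascript
// Without the u or v flag, JavaScript treats \N as a plain "N".
const perlStyleClass = /[\n\N]/;

const matchesCapitalN = perlStyleClass.test("N");
const matchesNewline = perlStyleClass.test("\n");
const matchesLowerA = perlStyleClass.test("a");

console.log(matchesCapitalN); // true
console.log(matchesNewline);  // true
console.log(matchesLowerA);   // false: this is not "any character" in JavaScript

// [\s\S] is the spelling that works in JavaScript.
const portableMatchesLowerA = /[\s\S]/.test("a");
console.log(portableMatchesLowerA); // true
```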

Endocrine answered 8/3, 2023 at 8:24 Comment(0)
B
-6

You want to use "multiline".

$string =~ /(START)(.+?)(END)/m;
Bolte answered 28/11, 2011 at 22:49 Comment(2)
No, m affects the ^ and $ anchors but not the . metacharacter. - Pintsize
Interesting, thanks. Guess I've never tried to do exactly what the OP is asking. - Bolte
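
To make the distinction in the comments concrete, here is a small JavaScript sketch (JavaScript's m and s flags behave like Perl's in this respect; the test string is invented):

```javascript
const twoLines = "START one\ntwo END";

// m only changes ^ and $ so that they also match at line boundaries.
const dotCrossesWithM = /(START)(.+?)(END)/m.test(twoLines);
const caretWithoutM = /^two/.test(twoLines);
const caretWithM = /^two/m.test(twoLines);

// s is the flag that changes `.`.
const dotCrossesWithS = /(START)(.+?)(END)/s.test(twoLines);

console.log(dotCrossesWithM); // false: `.` still stops at the newline
console.log(caretWithoutM);   // false: ^ matches only at the start of input
console.log(caretWithM);      // true: ^ also matches after a newline
console.log(dotCrossesWithS); // true
```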
