Regular Expression to match cross platform newline characters

2

64

My program can accept data that uses \n, \r\n, or \r as its newline (Unix, PC, and Mac styles, respectively).

What is the best way to construct a regular expression that will match a newline whichever style is used?

Alternatively, I could use universal_newline support on input, but now I'm interested to see what the regex would be.

Vasodilator answered 26/8, 2009 at 0:54 Comment(5)
Just note, \r is the old Mac style (and by "old" I mean "OS 9 and before"). Any Mac running OS X (a.k.a. made after 1999) is going to use \n like any other Unix. – Marvelmarvella
When is it useful to match newlines vs. using '$' to match the end of the line? – Triceratops
@tonfa: When splitting a file into lines via regex. – Meister
@too much php But wouldn't str.splitlines() work just as well? – Triceratops
@tonfa: OK so it's not needed often, but it's good to know for other languages that don't have convenient functions like splitlines(). – Meister
99

The regex I use when I want to be precise is "\r\n?|\n".

When I'm not concerned about consistency or empty lines, I use "[\r\n]+". I imagine it makes my programs somewhere on the order of 0.2% faster.
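To make the difference between the two patterns concrete, here is a minimal Python sketch. The sample string and the variable names are made up for illustration; only the two patterns come from the answer above.

```python
import re

# Precise: \r\n, a lone \r, or a lone \n is exactly one line break.
PRECISE_NEWLINE = re.compile(r"\r\n?|\n")
# Loose: any run of \r and \n characters is one separator, so blank lines vanish.
LOOSE_NEWLINE = re.compile(r"[\r\n]+")

# Made-up sample mixing Unix (\n), PC (\r\n), and two old-Mac (\r) breaks in a row.
sample = "unix\npc\r\nold mac\r\rlast"

precise_lines = PRECISE_NEWLINE.split(sample)
loose_lines = LOOSE_NEWLINE.split(sample)

print(precise_lines)  # ['unix', 'pc', 'old mac', '', 'last']
print(loose_lines)    # ['unix', 'pc', 'old mac', 'last']
```

The precise pattern keeps the empty line between the two \r characters, while the loose one swallows it, which is what "not concerned about empty lines" means in practice.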

Meister answered 26/8, 2009 at 1:2 Comment(2)
Usually when I'm not concerned about newlines, I'm also not concerned about spaces either.Scattering
What makes your programs faster?Seedbed
10

The pattern can be simplified to \r?\n for a small performance gain, as you probably don't have to deal with the old Mac style (OS 9 has been unsupported since February 2002).
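A quick Python sketch of the trade-off, assuming a made-up input that still contains one old-Mac \r (the names here are illustrative only):

```python
import re

# Simplified pattern: matches \n and \r\n, but not a lone \r.
SIMPLE_NEWLINE = re.compile(r"\r?\n")

# Made-up sample: Unix break, PC break, then a lone old-Mac \r.
sample = "a\nb\r\nc\rd"

simple_lines = SIMPLE_NEWLINE.split(sample)

print(simple_lines)  # ['a', 'b', 'c\rd']
```

The lone \r is not treated as a line break, so "c" and "d" stay on the same line. That is fine as long as old Mac files really never show up in the input.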

Inordinate answered 18/8, 2016 at 15:40 Comment(0)
