Carets in Regular Expressions

Specifically, when does ^ mean "match start" and when does it mean "not the following" in regular expressions?

From the Wikipedia article and other references, I've concluded that it means the former at the start of a pattern and the latter when used inside brackets. But how does the program handle the case where the caret is at the start and is also next to a bracket? What does, say, ^[b-d]t$ match?

Fillip answered 5/6, 2013 at 15:50 Comment(0)

^ means "not the following" only when it is the first character inside [], as in [^...].

When it's inside [] but not at the start, it means the actual ^ character.

When it's escaped (\^), it also means the actual ^ character.

In all other cases it means start of the string or line (which one is language or setting dependent).

So in short:

  • [^abc] -> not a, b or c
  • [ab^cd] -> a, b, ^ (character), c or d
  • \^ -> a ^ character
  • Anywhere else -> start of string or line.
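
The four cases above can be checked with a short sketch. It uses Python's re module purely for illustration (the thread does not name an engine), and the subject strings are made-up examples. The last pattern, [^^], is the "not caret" form suggested in the comments.

```python
import re

# Each entry: (pattern, subject, description). The subjects are made-up examples.
cases = [
    (r"[^abc]", "abcd^", "negated class: anything except a, b, c"),
    (r"[ab^cd]", "a^xd", "caret not first in the class: literal ^"),
    (r"\^", "2^10", "escaped caret: literal ^"),
    (r"^a", "aaa", "anchor: only the a at the start"),
    (r"[^^]", "a^b", "negated class containing a literal ^"),
]

# Collect every match of each pattern in its subject string.
results = {pattern: re.findall(pattern, subject) for pattern, subject, _ in cases}

for pattern, subject, description in cases:
    print(f"{pattern!r:12} on {subject!r:8} -> {results[pattern]}  ({description})")
```

Other engines may differ in details, so treat this as a quick check for one engine rather than a statement about all of them.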

So ^[b-d]t$ means:

  • Start of line
  • b/c/d character
  • t character
  • End of line
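
As a quick check of that reading, here is a minimal Python sketch. The candidate strings are invented for illustration. Here the leading ^ is an anchor because it sits outside the brackets, and [b-d] is an ordinary (non-negated) class.

```python
import re

# ^ is outside the brackets, so it anchors; [b-d] is a plain character range.
pattern = re.compile(r"^[b-d]t$")

# Made-up candidates: three that should match and several near misses.
candidates = ["bt", "ct", "dt", "at", "et", "bat", "btt", "^t", "Bt"]
matches = [s for s in candidates if pattern.search(s)]

print(matches)  # ['bt', 'ct', 'dt']
```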
Checkroom answered 5/6, 2013 at 15:56 Comment(5)
"When it's inside [] but not at the start, it means the actual ^ character." There is a different possibility in Java. - Playmate
"In all other cases it means start of the string / line (which one is language / setting dependent)." It's not really dependent; the meaning is specific to a regex engine, and they're mostly all the same on this. - Playmate
[^\^] means "not caret"! - Laufer
What about using a caret in PHP regular expressions to indicate that the expression reaches the end? - Ax
@Laufer: if you want "not caret", just use [^^] and skip the backslash altogether. - Elanorelapid

Going to ignore block comments? OK. This ^\s* might be bad because \s can span lines. Check whether .NET supports horizontal whitespace \h; if not, [^\S\r\n] also works. You can use the multi-line inline modifier (?m) (or RegexOptions.Multiline). That changes the meaning of ^ from beginning of string (the default) to beginning of line. So it ends up being (?m)^\h*(#). The capture group should tell you the position. If not, (?m)(?<=^\h*)# works just as well, and the position of the match is the offset.

For complete regex information, see https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference

Note that ^\s* will work of course, but it matches a lot of unnecessary cruft that can span lines.
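
The difference can be sketched in Python rather than .NET, so some details are assumptions. Python's re module has no \h and does not allow a variable-length lookbehind, so only the [^\S\r\n] form with a capture group is shown. The sample text is hypothetical.

```python
import re

# Hypothetical sample input; "#" starts a comment.
text = "x = 1\n  # indented comment\n\n# top-level comment\ny = 2  # trailing"

# (?m) makes ^ match at the start of every line, not just the start of the string.
# [^\S\r\n] means "whitespace, but not CR or LF", i.e. horizontal whitespace.
horizontal = re.compile(r"(?m)^[^\S\r\n]*(#)")
greedy = re.compile(r"(?m)^\s*(#)")

# Offsets of the "#" itself, taken from the capture group.
hash_offsets = [m.start(1) for m in horizontal.finditer(text)]

# Where each whole match begins: \s* can start on the previous (empty) line.
horizontal_starts = [m.start() for m in horizontal.finditer(text)]
greedy_starts = [m.start() for m in greedy.finditer(text)]

print(hash_offsets)       # [8, 28]
print(horizontal_starts)  # [6, 28]
print(greedy_starts)      # [6, 27]
```

The second \s* match begins at offset 27, on the empty line before the comment, which is the kind of line-spanning cruft described above. The trailing comment after y = 2 is skipped by both patterns because the # is not preceded only by whitespace on its line.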

Playmate answered 17/11, 2019 at 16:36 Comment(0)
