How can I match a pattern as long as it's not at the beginning with regex?
Assume the following strings:

aaa bbb ccc
bbb aaa ccc

I want to match aaa as long as it is not at the start of the string. I'm trying to negate it by doing something like this:

[^^]aaa

But I don't think this is right. I'm using preg_replace.

Frontal answered 27/3, 2013 at 21:9 Comment(1)
Are you only matching aaa? Replacing it with what? - Doorpost
E
120

You can use a negative lookbehind to make sure the match is not at the beginning of the string: (?<!^)aaa

Epigrammatist answered 27/3, 2013 at 21:11 Comment(5)
Any way to negate the position assertion? - Frontal
I still need to capture aaa. Just wrap it in parentheses? - Frontal
For JavaScript only, use (?!^)aaa - Smuggle
[^\n]aaa should also do the job (it did for me) and is much simpler, so it can be used when lookbehinds are not available. - Aleuromancy
If you're doing this in R using gsub(), it works if you set the perl = T option. - Unreserved
L
48

Since I came here via Google search, and was interested in a solution that is not using a lookbehind, here are my 2 cents.

The [^^]aaa pattern matches a character other than ^ followed by three a's, anywhere inside a string. [^...] is a negated character class, inside which ^ is not treated as an anchor. Note that the first ^, right after [, is special because it denotes negation, while the second one is just a literal caret symbol.

Thus, a ^ cannot be inside [...] to denote the start of string.
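To make that concrete, here is a minimal sketch of what [^^]aaa actually matches. It is written in Python purely for illustration (the question itself uses PHP's preg_replace), and the string containing a literal caret is a made-up sample, not one from the question.

```python
import re

pattern = re.compile(r"[^^]aaa")

# The character class consumes one real character, so the match
# includes the space in front of "aaa".
m = pattern.search("bbb aaa ccc")
print(repr(m.group()))

# At offset 0 there is no preceding character for the class to consume,
# so the leading "aaa" is not matched.
print(pattern.search("aaa bbb ccc"))

# A literal caret in front of "aaa" is rejected: the second ^ is just a character.
print(pattern.search("x^aaa"))
```

So the pattern appears to skip the start of the string only as a side effect of needing one extra character, and that character becomes part of the match.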

A solution is to use a negative lookaround. These two work equally well, a lookbehind:

(?<!^)aaa

and a lookahead:

(?!^)aaa

Why does the lookahead work, too? Lookarounds are zero-width assertions, and anchors are zero-width, too: they consume no text. Literally speaking, (?<!^) checks that there is no start-of-string position immediately to the left of the current location, and (?!^) checks that there is no start-of-string position immediately to the right of the current location. Both check the same location, which is why both work.
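A quick way to check that both lookarounds behave the same is to run them side by side. The sketch below uses Python's re module for illustration; the helper name replace_not_at_start and the replacement text zzz are assumptions, and the third sample string is invented to show a second occurrence being replaced.

```python
import re

def replace_not_at_start(pattern: str, text: str, replacement: str = "zzz") -> str:
    """Replace every match of `pattern` in `text` (hypothetical helper)."""
    return re.sub(pattern, replacement, text)

LOOKBEHIND = r"(?<!^)aaa"
LOOKAHEAD = r"(?!^)aaa"

for line in ["aaa bbb ccc", "bbb aaa ccc", "aaa aaa"]:
    behind = replace_not_at_start(LOOKBEHIND, line)
    ahead = replace_not_at_start(LOOKAHEAD, line)
    # Both columns should be identical for every line.
    print(f"{line!r}: {behind!r} / {ahead!r}")
```

Note that this assumes ^ means start of string. With a multiline flag (re.MULTILINE in Python, the m modifier in PCRE), ^ also matches after each newline, so aaa at the start of any line would be skipped as well.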

Livengood answered 5/4, 2017 at 13:28 Comment(3)
Thanks for the lookahead alternative, as JavaScript does not yet support lookbehind. - Gnarl
Lookbehind support was introduced in ECMAScript 2018, so you may use (?<!^) in Node.js and Chrome, but (?!^) is still good in VBA, which follows the ES5 standard, and in older JS environments including IE. - Dentilingual
Lookbehinds are supported only in Chrome 62+ and Opera 49+ (caniuse.com/#feat=js-regexp-lookbehind) - Arrhenius
I
16

If you don't want to use lookbehind then use this regex:

/.(aaa)/

and use capture group 1.

Illuminant answered 27/3, 2013 at 21:20 Comment(2)
This is the best answer because it works with other dialects of regexp as well. - Juju
But unless your regex engine supports overlapping matches, it won't match multiple occurrences of aaa directly after each other: xaaaaaa will only have one match. - Trucking
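The limitation raised in the last comment can be reproduced with a short sketch. Python is used here only as an illustration; the variable names are assumptions. The consuming pattern .(aaa) uses up the character in front of each match, while the zero-width lookahead does not.

```python
import re

text = "xaaaaaa"

# ".(aaa)" consumes the character before each match, so after matching
# "xaaa" only "aaa" is left and there is no spare character for the dot.
consuming = [m.group(1) for m in re.finditer(r".(aaa)", text)]

# "(?!^)aaa" consumes nothing extra, so back-to-back occurrences are found.
zero_width = re.findall(r"(?!^)aaa", text)

print(len(consuming), len(zero_width))
```

If adjacent occurrences matter and lookarounds are unavailable, the capture-group approach would need a different strategy, such as looping with an explicit start offset.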
D
5

This situation is the first time that I've seen lookarounds outperform \K. Interesting.

Typically, capture groups and lookarounds cost additional steps. But due to the nature of this task, the regex engine can scan the string faster in search of aaa and then look back for a start-of-string anchor.

I'll add a couple of \K patterns for comparison.

I am using the s pattern modifier in case the leading character is a newline (which . would not normally match). I added this to preemptively address a fringe case that might be raised.

Again, this is an enlightening scenario because in all other regex cases that I've dealt with \K beats out the other techniques.

Step Count Comparison Matrix:

| Input         | `~.\Kaaa~s` | `~.+?\Kaaa~s` | `(?<!^)aaa` | `(?!^)aaa` | `.(aaa)` |
|---------------|-------------|---------------|-------------|------------|----------|
| `aaa bbb ccc` | 12 steps    | 67 steps      | 8 steps     | 8 steps    | 16 steps |
| `bbb aaa ccc` | 15 steps    | 12 steps      | 6 steps     | 6 steps    | 12 steps |

The takeaway is: to learn about the efficiency of your patterns, paste them into regex101.com and compare the step counts.

Also, if you know exactly what substring you are looking for and you don't need a regex pattern, then you should be using strpos() as a matter of best practice (and just check that the returned value is > 0).

...in other words:

if (strpos($haystack, 'aaa')) {
    // strpos() returned a "truthy" offset (1 or greater), not 0 or false:
    // the first 'aaa' is found and not positioned at offset zero
}
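One caveat: the check above only looks at the first occurrence, so a string such as aaa bbb aaa returns offset 0 and is treated as "not found" even though a later aaa exists. A minimal sketch of a search that starts at offset 1 instead, shown in Python for illustration (the function name found_after_start is an assumption); PHP's strpos() accepts a start offset as its third argument, which would allow the same idea there.

```python
def found_after_start(haystack: str, needle: str) -> bool:
    """Return True if `needle` occurs anywhere other than offset 0."""
    # Searching from offset 1 skips an occurrence that begins at offset 0.
    return haystack.find(needle, 1) != -1

for sample in ["aaa bbb ccc", "bbb aaa ccc", "aaa bbb aaa"]:
    print(sample, found_after_start(sample, "aaa"))
```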
Delanty answered 23/4, 2018 at 8:36 Comment(1)
Useful advice about checking the efficiency of regex patterns in regex101.com. - Gear
L
3

This will work to find what you are looking for:

(?<!^)aaa

Example in use: http://regexr.com?34ab2

Lazare answered 27/3, 2013 at 21:20 Comment(0)
V
1

I came here looking for a solution for the RE2 engine, used by Google Spreadsheets, which doesn't support lookarounds. The answers here gave me the idea of using the following. I don't understand why I have to replace by the captured group, but anyhow, it works.

aaa bbb ccc
bbb aaa ccc

([^^])aaa

replace by:

$1zzz

results in:

aaa bbb ccc
bbb zzz ccc
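The likely reason the captured group is needed: without lookarounds, [^^] has to consume the character in front of aaa, so that character is part of the match and must be written back by the replacement. The sketch below illustrates this with Python's re module, which is not RE2 and writes the group reference as \1 rather than $1; the function names are assumptions.

```python
import re

def replace_keep_prefix(text: str) -> str:
    # Group 1 holds the consumed character; \1 puts it back.
    return re.sub(r"([^^])aaa", r"\1zzz", text)

def replace_drop_prefix(text: str) -> str:
    # No group: the consumed character is replaced along with "aaa".
    return re.sub(r"[^^]aaa", "zzz", text)

for line in ["aaa bbb ccc", "bbb aaa ccc"]:
    print(repr(replace_keep_prefix(line)), repr(replace_drop_prefix(line)))
```

Without the group, the space before aaa is swallowed, which is why replacing with the captured character is necessary here.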

Virgilio answered 23/1, 2019 at 16:14 Comment(0)