Python regex to avoid a character earlier in the string
I'd like to use a regex to find an exact string, but not if it's part of a comment, as designated by //.

So for example, in the string:

hello apple apples // eat an apple

It should match the first apple but not the second or third.

So, I think the regex would be something like this. It would find the string with word boundaries around it, but not if the // is behind it:

(?<!\/\/)\bapple\b

The problem with negative look-behind in this case is that it only checks the characters immediately before the word. I'd need it to look farther back, to make sure the comment symbol does not exist anywhere earlier in the string.
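To make the failure concrete, here is a minimal sketch using the standard-library re module. The variable names are only illustrative, and the slashes are left unescaped because / has no special meaning in a Python regex. The look-behind only inspects the two characters directly before each apple, so the one inside the comment still matches:

```python
import re

line = "hello apple apples // eat an apple"

# Fixed-width negative look-behind: only rejects "//apple" with the
# slashes directly adjacent to the word.
pattern = re.compile(r"(?<!//)\bapple\b")

# Start offsets of every match in the line.
matches = [m.start() for m in pattern.finditer(line)]
print(matches)  # [6, 29] -> the apple at offset 29 is inside the comment
```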

Offish answered 19/1, 2016 at 0:35 Comment(1)
(?<!//.*)\bapple\b with the Python regex package could also work. - Hyphenated
4

This pattern captures what you want in the first capture group; the comment alternative matches everything from // onward without capturing it:

\/\/.*|\b(apple)\b

Demo
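In Python this could be used roughly as follows. This is a sketch, not the only way to apply the pattern: find_outside_comment is a hypothetical helper name, and the slashes are written unescaped since Python does not need them escaped. The key point is to keep only the matches where group 1 actually participated:

```python
import re

# First alternative swallows the comment; second captures the word.
COMMENT_OR_WORD = re.compile(r"//.*|\b(apple)\b")

def find_outside_comment(line):
    """Return start offsets of 'apple' that occur before any // comment."""
    return [
        m.start(1)
        for m in COMMENT_OR_WORD.finditer(line)
        if m.group(1) is not None  # None means the comment branch matched
    ]

print(find_outside_comment("hello apple apples // eat an apple"))  # [6]
```

Note that re.findall with this pattern returns an empty string for the comment match, so the results would need filtering there as well.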

Heliopolis answered 19/1, 2016 at 0:43 Comment(3)
By using \/\/.*, are you forcing Python to find any mention of apple in a comment first so that it won't be matched again when looking for \b(apple)\b? Because that's a brilliant approach that I would have never thought of. - Catamaran
Not necessarily find the comment first, but find and CAPTURE what you want; find but don't capture what you don't want. - Heliopolis
I agree. This is a very clever answer, thank you! It even works in the other direction too. .*\/\/|\b(apple)\b would get you the strings that ARE present in the commented section. - Offish
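A quick sketch of that reversed pattern in Python, with one caveat worth checking before relying on it: when a line has no // at all, the first alternative never matches, so every apple is captured even though none of them is in a comment. The helper name find_inside_comment is hypothetical.

```python
import re

# First alternative swallows everything up to and including "//";
# second captures the word in whatever text remains.
PREFIX_OR_WORD = re.compile(r".*//|\b(apple)\b")

def find_inside_comment(line):
    """Return start offsets of 'apple' captured by the reversed pattern."""
    return [
        m.start(1)
        for m in PREFIX_OR_WORD.finditer(line)
        if m.group(1) is not None
    ]

print(find_inside_comment("hello apple apples // eat an apple"))  # [29]
print(find_inside_comment("apple pie"))  # [0] -> no comment, still captured
```

If lines without comments can occur, testing for "//" in the line first would avoid that false positive.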
0

I think you just need to escape the comment slashes in the lookbehind assertion:

    (?<!\/\/)\b(apple)\b ## doesn't work, don't use this.

Try it -- regex101.com

Gdynia answered 19/1, 2016 at 0:57 Comment(4)
Thanks, but I don't think this works. Try putting a g in the modifier field, and you'll see that it detects the last apple in the string. regex101.com/r/rG7aH9/1 ...but, I did update the question to escape the slashes. - Offish
Well, you didn't say you wanted it to find more than one. @alpha bravo has the right solution either way. - Gdynia
No, that's my point, it's not supposed to match the last apple in the string, but does match it. "It should match the first apple but not the second or third." - Offish
You're right, just \b(apple)\b finds only the first when not using the g modifier, and matches the comment when the first isn't present. Ignore my answer entirely, it's wrong. - Gdynia
