re.sub('a(b)','d','abc') yields dc, not adc.
Why does re.sub replace the entire match, instead of just the capturing group '(b)'?
Because it's supposed to replace the whole occurrence of the pattern:
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.
If it were to replace only some subgroup, then complex regexes with several groups wouldn't work. There are several possible solutions:
1. Use no groups at all: re.sub('ab', 'ad', 'abc') - my favorite, as it's very readable and explicit.
2. Capture the part you want to keep and refer to it in the replacement: re.sub('(a)b', r'\1d', 'abc')
3. Pass a function as the repl argument and make it process the Match object and return the required result.
4. Use a lookbehind, which affects matching but is not part of the match: re.sub('(?<=a)b', r'd', 'abxb') yields adxb. The ?<= at the beginning of the group says "it's a lookbehind".
\1 can also be used inside your regex: re.match(r'([la]{2})-\1', 'la-la'). It will match what the referenced group (1 in this case) matched (not its pattern), so this regex wouldn't match la-al, for example. –
Quoit
I'm aware that this does not strictly answer the OP's question, but this question can be hard to google (the results are flooded by \1 explanations).
For those who, like me, came here because they'd like to replace a capture group that is not the first one with a string, without special knowledge of the string or of the regex:
# find the offsets (start, end) of a captured group within the string;
# regex is a compiled pattern, and search() is assumed to find a match
start, end = regex.search(oldText).span(groupNb)
# slice the old string and insert replacementText in place of the group
newText = oldText[:start] + replacementText + oldText[end:]
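The slicing approach above can be wrapped into a small helper. This is only a sketch: the name replace_group and its parameters are made up for illustration, it handles the first match only, and it returns the text unchanged when there is no match or the group did not take part in the match.

```python
import re

def replace_group(pattern, text, group, replacement):
    """Replace only the span of `group` in the first match of `pattern`."""
    match = re.search(pattern, text)
    # Leave the text alone if nothing matched or the group did not participate
    # (Match.start() returns -1 for a group that exists but matched nothing).
    if match is None or match.start(group) == -1:
        return text
    start, end = match.span(group)
    return text[:start] + replacement + text[end:]

print(replace_group(r"a(b)", "abc", 1, "d"))                    # adc
print(replace_group(r"(\w+)@(\w+)", "joe@host", 2, "example"))  # joe@example
```

Because the replacement is spliced in with plain string slicing, it is inserted literally; sequences such as \1 in it are not expanded.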
I know it's the wanted behavior, but I still do not understand why re.sub can't specify the actual capture group that it should substitute on...
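re.sub has no parameter for choosing a group, but a function passed as repl can rebuild each match around the group you want to change. The sketch below assumes that is the goal; sub_group is a hypothetical helper name, not part of the re module.

```python
import re

def sub_group(pattern, text, group, replacement):
    """Replace `group` inside every match, keeping the rest of each match."""
    def rebuild(match):
        # A group that did not participate has start == -1: keep the match as is.
        if match.start(group) == -1:
            return match.group(0)
        whole = match.group(0)
        offset = match.start()          # where the whole match begins in text
        start, end = match.span(group)  # group offsets, relative to text
        return whole[:start - offset] + replacement + whole[end - offset:]
    return re.sub(pattern, rebuild, text)

print(sub_group(r"a(b)", "abc ab", 1, "d"))  # adc ad
```

The string returned by the function is used literally, so the replacement does not need any escaping.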
Because that's exactly what the re.sub() doc tells you it's supposed to do: the pattern 'a(b)' says "match 'a' followed by 'b', and capture the 'b'". The parentheses only capture; they do not make the 'b' optional and they do not narrow what gets replaced, so the whole match 'ab' is substituted. There is no way the pattern could ever match 'b' on its own, as you seem to expect.
Making the 'a' optional with a non-greedy '(a)??' does not give your desired output either, because the leftmost match still starts at the 'a':
>>> re.sub('(a)??b','d','abc')
'dc'
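A quick way to see what 'a(b)' actually matches is to inspect the Match object directly. The snippet below is just an illustrative check using re.search; the variable name m is arbitrary.

```python
import re

m = re.search('a(b)', 'abc')
print(m.group(0))  # ab  (the whole match, which is what re.sub replaces)
print(m.group(1))  # b   (the captured group)

# The parentheses do not make 'b' optional: without a 'b' there is no match.
print(re.search('a(b)', 'ac'))            # None
# Only an explicit '?' makes the group optional.
print(re.search('a(b)?', 'ac').group(0))  # a
```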
import re
pattern = re.compile(r"I am (\d{1,2}) .*", re.IGNORECASE)
text = "i am 32 years old"
if re.match(pattern, text):
    print(
        re.sub(pattern, r"You are \1 years old.", text, count=1)
    )
As above, first we compile a regex pattern with the case-insensitive flag.
Then we check whether the text matches the pattern; if it does, we reference the only group in the pattern (the age) as \1 in the replacement string.
There is no need for the if re.match(...) check. If there is no match, the re.sub call is essentially a no-op. – Kuhl
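The no-op behavior is easy to confirm with subn, which returns the new string together with the number of substitutions made. This is a small sketch reusing the pattern from the answer; rewrite_age is a hypothetical helper name.

```python
import re

pattern = re.compile(r"I am (\d{1,2}) .*", re.IGNORECASE)

def rewrite_age(text):
    # subn returns a (new_string, number_of_substitutions) tuple
    return pattern.subn(r"You are \1 years old.", text, count=1)

print(rewrite_age("i am 32 years old"))  # ('You are 32 years old.', 1)
print(rewrite_age("no age here"))        # ('no age here', 0)
```

One difference is worth keeping in mind: re.match only matches at the start of the string, while re.sub searches the whole string, so dropping the guard changes the result for text where the pattern occurs later on.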
Use re.sub('ab','ad','abc') or re.sub('(a)b',r'\1d','abc'), where "\1" refers to the capturing group. – Fotheringhay
The re.sub doc says it does exactly that, with no mention of capturing groups: "replacing the leftmost non-overlapping occurrences of the pattern in string" – Despatch
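For reference, the alternatives mentioned in this thread can be compared side by side. The snippet is only a sketch; the results list is there purely for illustration.

```python
import re

results = [
    re.sub('ab', 'ad', 'abc'),         # no groups: spell out the replacement
    re.sub('(a)b', r'\1d', 'abc'),     # keep the captured 'a' via \1
    re.sub('(a)b', r'\g<1>d', 'abc'),  # same, with the \g<1> spelling
    re.sub('(?<=a)b', 'd', 'abc'),     # lookbehind: 'a' is not part of the match
]
print(results)  # ['adc', 'adc', 'adc', 'adc']
```

The \g<1> form is useful when the group reference is followed by a digit: r'\g<1>0' means group 1 followed by a literal 0, whereas r'\10' would be read as a reference to group 10.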