Replace named captured groups with arbitrary values in Python
S

5

7

I need to replace the value inside a capture group of a regular expression with some arbitrary value; I've had a look at re.sub, but it seems to work in a different way.

I have a string like this one :

s = 'monthday=1, month=5, year=2018'

and I have a regex matching it with captured groups like the following :

regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')

now I want to replace the group named d with aaa, the group named m with bbb and group named Y with ccc, like in the following example :

'monthday=aaa, month=bbb, year=ccc'

basically I want to keep all the non matching string and substitute the matching group with some arbitrary value.

Is there a way to achieve the desired result ?

Note

This is just an example; I could have other input regexes with a different structure, but the same named capturing groups ...

Update

Since it seems like most of the people are focusing on the sample data, I add another sample, let's say that I have this other input data and regex :

input = '2018-12-12'
regex = r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))'

as you can see I still have the same number of capturing groups(3) and they are named the same way, but the structure is totally different... What I need though is as before replacing the capturing group with some arbitrary text :

'ccc-bbb-aaa'

replace capture group named Y with ccc, the capture group named m with bbb and the capture group named d with aaa.

In case regexes are not the best tool for the job, I'm open to other proposals that achieve my goal.

Saleh answered 20/11, 2017 at 16:38 Comment(12)
regex.sub('monthday=aaa, month=bbb, year=ccc', s)Hierarchy
@Rawing with your solution I need to hardcode the new result, but that is not what I'm asking for ... I want to replace the matching group with some arbitrary value. This is just an example, I could have other input regexes with a different structure, but the same named capturing groups ...Saleh
You didn't ask anything other than how "to achieve the desired result". Are there any other restrictions we should be aware of? Can you change the regex or is it fixed?Hierarchy
@Rawing read the first line of the question : "I need to replace the value inside a capture group of a regular expression with some arbitrary value", this is not what your solution is actually doing ...Saleh
Are you allowed to change the regex or not?Hierarchy
@aleroot, your current regex requires a strict keywords order. If you need matching keywords in dynamic order - another way should be consideredMat
@Rawing the input regex and the input text could change, what is fixed is the name of the capturing groups that I need to replace with some other data, if you want I could add another dozen sample data with different structure but same number and naming of the capturing groups ...Saleh
@Mat I have updated the question to make it clearer.Saleh
You use capture groups for the parts of the string that you want to copy into the replacement, not for the parts you want to replace. Regular expressions are not a template mechanism.Catboat
@Catboat so, what do you propose to achieve what I need ?Saleh
Use F-strings, perhaps?Catboat
@Catboat At the moment I'm on python 2.7, it is for this reason I was looking to solve the issue with something else ...Saleh
H
9

This is a completely backwards use of regex. The point of capture groups is to hold text you want to keep, not text you want to replace.

Since you've written your regex the wrong way, you have to do most of the substitution operation manually:

import re

def replace_groups(pattern, string, replacements):
    """
    Replaces the text captured by named groups.
    """
    pattern = re.compile(pattern)
    # create a dict of {group_index: group_name} for use later
    groupnames = {index: name for name, index in pattern.groupindex.items()}

    def repl(match):
        # we have to split the matched text into chunks we want to keep and
        # chunks we want to replace
        # captured text will be replaced. uncaptured text will be kept.
        text = match.group()
        # match.start(i) and match.end(i) are positions in the whole string,
        # so shift them to positions inside the matched text
        offset = match.start()
        chunks = []
        lastindex = 0
        for i in range(1, pattern.groups+1):
            groupname = groupnames.get(i)
            # skip unnamed groups, groups without a replacement and
            # groups that did not take part in the match
            if groupname not in replacements or match.start(i) == -1:
                continue

            # keep the text between this match and the last
            chunks.append(text[lastindex:match.start(i) - offset])
            # then instead of the captured text, insert the replacement text for this group
            chunks.append(replacements[groupname])
            lastindex = match.end(i) - offset
        chunks.append(text[lastindex:])
        # join all the chunks to obtain the final string with replacements
        return ''.join(chunks)

    # for each occurrence call our custom replacement function
    return pattern.sub(repl, string)
>>> replace_groups(regex, s, {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'})
'monthday=aaa, month=bbb, year=ccc'
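For contrast, here is a minimal sketch of the conventional direction described at the top of this answer: capture the text you want to keep and refer back to it in the replacement string. The pattern and the group names a, b and c are made up for illustration; they are not from the question.

```python
import re

s = 'monthday=1, month=5, year=2018'

# Capture the literal parts to keep; the digits to be replaced stay uncaptured.
keep = re.compile(r'(?P<a>monthday=)\d{1,2}(?P<b>, month=)\d{1,2}(?P<c>, year=)20\d{2}')

# \g<name> in the replacement string inserts the text captured by that group.
result = keep.sub(r'\g<a>aaa\g<b>bbb\g<c>ccc', s)
print(result)
```

This only works if you are free to rewrite the regex, which is why the function above is needed when the pattern is given as-is.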
Hierarchy answered 20/11, 2017 at 17:37 Comment(1)
+1 for This is a completely backwards use of regex. The point of capture groups is to hold text you want to keep, not text you want to replace. This fixed my mental model and my problem, too.Fabulous
D
2

You can use string formatting with a regex substitution:

import re
s = 'monthday=1, month=5, year=2018'
s = re.sub(r'(?<=\=)\d+', '{}', s).format(*['aaa', 'bbb', 'ccc'])

Output:

'monthday=aaa, month=bbb, year=ccc'

Edit: given an arbitrary input string and regex, you can use formatting like so:

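input = '2018-12-12'
regex = r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))'
# note: re.sub replaces each whole match with a single '{}', so only the
# first value is used and new_s is 'aaa' for this input
new_s = re.sub(regex, '{}', input).format(*["aaa", "bbb", "ccc"])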
Definite answered 20/11, 2017 at 16:50 Comment(3)
This seems to be position-dependent... what if the input and the corresponding regex change to this format: year=2018, monthday=1, month=5? As already written, don't focus too much on the sample data; the requirement of the question is: "I need to replace the value inside a capture group of a regular expression with some arbitrary value". The proposed workaround doesn't seem to be doing that ...Saleh
see the updated answer that should clarify what I need, and what the question is really asking. thanks .Saleh
@Saleh it appears that the input you are posting and the groups that you are matching are quite arbitrary. I suggest you approach this problem from a templating point of view.Definite
M
2

Extended Python 3.x solution on extended example (re.sub() with replacement function):

import re

d = {'d':'aaa', 'm':'bbb', 'Y':'ccc'}  # predefined dict of replace words
pat = re.compile(r'(monthday=)(?P<d>\d{1,2})|(month=)(?P<m>\d{1,2})|(year=)(?P<Y>20\d{2})')

def repl(m):
    pair = next(t for t in m.groupdict().items() if t[1])
    k = next(filter(None, m.groups()))  # preceding `key` for currently replaced sequence (i.e. 'monthday=' or 'month=' or 'year=')
    return k + d.get(pair[0], '')

s = 'Data: year=2018, monthday=1, month=5, some other text'
result = pat.sub(repl, s)

print(result)

The output:

Data: year=ccc, monthday=aaa, month=bbb, some other text

For Python 2.7 : change the line k = next(filter(None, m.groups())) to:

k = filter(None, m.groups())[0]
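A possible variant that avoids the next()/filter() difference between Python versions is to rely on match.lastgroup, which holds the name of the last capturing group that matched. This is only a sketch under the assumption that the pattern keeps the same shape as above (one unnamed prefix group followed by one named group per alternative, no nesting); the helper variable names are mine.

```python
import re

d = {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'}  # predefined dict of replace words
pat = re.compile(r'(monthday=)(?P<d>\d{1,2})|(month=)(?P<m>\d{1,2})|(year=)(?P<Y>20\d{2})')

def repl(m):
    name = m.lastgroup  # 'd', 'm' or 'Y', depending on which alternative matched
    # Everything in the match before the named group is the key to keep.
    prefix = m.group()[:m.start(name) - m.start()]
    return prefix + d.get(name, '')

s = 'Data: year=2018, monthday=1, month=5, some other text'
result = pat.sub(repl, s)
print(result)
```

Nothing here depends on filter() returning a list or an iterator, so the same repl should behave identically on 2.7 and 3.x.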
Mat answered 20/11, 2017 at 17:27 Comment(2)
k = next(filter(None, m.groups())) # preceding key for currently replaced sequence (i.e. 'monthday=' or 'month=' or 'year=') TypeError: tuple object is not an iterator I'm on python 2.7Saleh
I'm on python 2.7, is there a way to make it work on 2.7 ? I can't upgrade at the time.Saleh
T
0

I suggest you use a loop

import re
regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')
s = 'monthday=1, month=1, year=2017   \n'
s+= 'monthday=2, month=2, year=2019'


regex_as_str =  'monthday={d}, month={m}, year={Y}'
matches = [match.groupdict() for match in regex.finditer(s)]
for match in matches:
    s = s.replace(
        regex_as_str.format(**match),
        regex_as_str.format(**{'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'})
    )    

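You can do this multiple times with your different regex patterns.

Or you can join both patterns together with "or" (|), but note that Python's re module does not allow the same group name twice in one pattern, so the groups would need distinct names in that case.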
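If writing a separate format template for every pattern is too much work, the same loop can be sketched with the group positions reported by the match object instead. The function name replace_named_groups and the samples list are hypothetical, and the second input reuses the date example from the question.

```python
import re

def replace_named_groups(pattern, text, values):
    """Replace the span of each named group in every match of pattern."""
    compiled = re.compile(pattern)

    def repl(match):
        out = match.group()
        base = match.start()  # group positions refer to the whole input string
        spans = [(match.start(n) - base, match.end(n) - base, values[n])
                 for n in compiled.groupindex
                 if n in values and match.start(n) != -1]
        # Work from right to left so earlier positions stay valid.
        for start, end, new in sorted(spans, reverse=True):
            out = out[:start] + new + out[end:]
        return out

    return compiled.sub(repl, text)

values = {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'}
samples = [
    (r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})',
     'monthday=1, month=1, year=2017   \nmonthday=2, month=2, year=2019'),
    (r'(?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2})', '2018-12-12'),
]
results = [replace_named_groups(p, t, values) for p, t in samples]
for r in results:
    print(r)
```

Each pattern is applied on its own, so reusing the group names d, m and Y across patterns is not a problem.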

Thoma answered 20/11, 2017 at 16:53 Comment(0)
C
0
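import re

def replace_named_group_with_dict_values(pattern: str, text: str, mapping: dict):
    # search once; after the first replacement the pattern may no longer match the text
    match = re.search(pattern, text)
    if match is None:
        return text
    # collect (start, end, value) for every named group that took part in the match
    spans = [(match.start(k), match.end(k), str(v))
             for k, v in mapping.items()
             if k in match.groupdict() and match.start(k) != -1]
    # replace from right to left so the remaining positions stay valid
    for start, end, value in sorted(spans, reverse=True):
        text = text[:start] + value + text[end:]
    return text

values = {
    'd': 'aaa',
    'm': 'bbb',
    'Y': 'ccc',
}
s = 'monthday=1, month=5, year=2018'
p = r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})'
print(replace_named_group_with_dict_values(p, s, values))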
Cespitose answered 5/7 at 8:48 Comment(1)
Thank you for contributing to the Stack Overflow community. This may be a correct answer, but it’d be really useful to explain your code further so developers can understand your reasoning. This is especially useful for new developers who aren’t as familiar with the syntax or are struggling to understand the concepts. Would you kindly edit your answer to include additional details to benefit the community?Baseline
