I want to make multiple substitutions to a string using multiple regular expressions. I also want to make the substitutions in a single pass to avoid creating multiple instances of the string.
Let's say, for the sake of argument, that I want to make the substitutions below while avoiding multiple calls to re.sub(), whether written out explicitly or made in a loop:
import re
text = "local foals drink cola"
text = re.sub("(?<=o)a", "w", text)
text = re.sub("l(?=a)", "co", text)
print(text) # "local fowls drink cocoa"
The closest solution I have found is to compile a single regular expression from a dictionary of substitution targets and then use a lambda function to replace each matched target with its value in the dictionary. However, this approach breaks as soon as the targets contain metacharacters, which removes exactly the regular-expression functionality this example needs.
Let me demonstrate first with an example that works without metacharacters:
import re
text = "local foals drink cola"
subs_dict = {"a":"w", "l":"co"}
subs_regex = re.compile("|".join(subs_dict.keys()))
text = re.sub(subs_regex, lambda match: subs_dict[match.group(0)], text)
print(text) # "coocwco fowcos drink cocow"
Now observe that adding the desired metacharacters to the dictionary keys results in a KeyError:
import re
text = "local foals drink cola"
subs_dict = {"(?<=o)a":"w", "l(?=a)":"co"}
subs_regex = re.compile("|".join(subs_dict.keys()))
text = re.sub(subs_regex, lambda match: subs_dict[match.group(0)], text)
# raises KeyError: 'a'
The reason is that sub() correctly finds a match for the expression "(?<=o)a", so that expression is what would have to be looked up in the dictionary to get its substitution. But the value that match.group(0) submits for the lookup is the matched string "a", not the expression that matched it. Looking up match.re (i.e. the expression that produced the match) does not work either, because that is the whole alternation compiled from the dictionary keys (i.e. "(?<=o)a|l(?=a)"), not the individual key.
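To make the mismatch concrete, here is a small sketch that lists what each match object exposes for the alternation above. It only uses standard `re` features (`finditer`, `match.group(0)`, `match.re.pattern`); the name `observed` is just illustrative.

```python
import re

text = "local foals drink cola"
subs_dict = {"(?<=o)a": "w", "l(?=a)": "co"}
subs_regex = re.compile("|".join(subs_dict.keys()))

# For each match, record the matched text and the pattern that produced it.
observed = [(m.group(0), m.re.pattern) for m in subs_regex.finditer(text)]

for matched_text, pattern in observed:
    # matched_text is the literal substring, never one of the dictionary keys;
    # pattern is always the full alternation, not the branch that matched.
    print(repr(matched_text), "is a key:", matched_text in subs_dict, "| pattern:", pattern)
```

Neither value is a key of `subs_dict`, so the lookup needs some other way of identifying which alternative matched.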
EDIT: In case anyone would benefit from seeing thejonny's solution implemented with a lambda function as close to my originals as possible, it would work like this:
import re
text = "local foals drink cola"
subs_dict = {"(?<=o)a":"w", "l(?=a)":"co"}
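# Wrap each target in its own capturing group so the match records which one matched.
subs_regex = re.compile("|".join("("+key+")" for key in subs_dict))
# Map the group number of each wrapper group to its substitution.
group_index = 1
indexed_subs = {}
for target, sub in subs_dict.items():
    indexed_subs[group_index] = sub
    # Skip past any capturing groups inside the target itself.
    group_index += re.compile(target).groups + 1
text = re.sub(subs_regex, lambda match: indexed_subs[match.lastindex], text)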
print(text) # "local fowls drink cocoa"
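For reuse, the same idea can be wrapped in a function. This is only a sketch of the approach above, not a general-purpose tool: the name `multi_sub` and its argument order are my own, and replacements are treated as literal strings (backreferences such as `\1` are not expanded).

```python
import re

def multi_sub(subs, text):
    """Apply every pattern -> replacement pair in `subs` in one pass over `text`."""
    # Wrap each pattern in its own capturing group so match.lastindex
    # identifies which alternative matched.
    combined = re.compile("|".join("(" + pattern + ")" for pattern in subs))

    # Map the group number of each wrapper group to its replacement,
    # skipping over any capturing groups inside the pattern itself.
    replacements = {}
    group_index = 1
    for pattern, replacement in subs.items():
        replacements[group_index] = replacement
        group_index += re.compile(pattern).groups + 1

    return combined.sub(lambda match: replacements[match.lastindex], text)

print(multi_sub({"(?<=o)a": "w", "l(?=a)": "co"}, "local foals drink cola"))
```

Note that every alternative is tested against the original text, so the result can differ from chained re.sub() calls whenever one substitution would create or remove the context that another pattern depends on. In this example the two give the same output.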