I need to replace multiple words in an HTML document. At the moment I am doing this by calling replace_with once for each replacement. Calling replace_with twice on the same NavigableString raises a ValueError (see example below) because the replaced element is no longer in the tree.
Minimal example
#!/usr/bin/env python3
from bs4 import BeautifulSoup
import re

def test1():
    html = \
    '''
    Identify
    '''
    soup = BeautifulSoup(html, features="html.parser")
    for txt in soup.findAll(text=True):
        if re.search('identify', txt, re.I) and txt.parent.name != 'a':
            newtext = re.sub('identify', '<a href="test.html"> test </a>', txt.lower())
            txt.replace_with(BeautifulSoup(newtext, features="html.parser"))
            # The second call on the same txt raises the ValueError.
            txt.replace_with(BeautifulSoup(newtext, features="html.parser"))
            # I called it twice here to make the code as small as possible.
            # Usually it would be a different newtext, created from the
            # replaced txt by looking for a different word to replace.
    return soup

print(test1())
Expected Result:
txt ends up replaced by newtext.
Result:
ValueError: Cannot replace one element with another when the element to be replaced is not part of the tree.
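The state the error message describes can be inspected directly. This is only a small sketch, assuming bs4 is installed; the `<p>Identify</p>` input and the replacement string are made up for illustration and are not the original document. It shows that the variable still points at the old string object after the first replace_with, and that this object no longer has a parent.

```python
from bs4 import BeautifulSoup

# Illustrative input (an assumption, not the original HTML)
soup = BeautifulSoup("<p>Identify</p>", features="html.parser")

txt = soup.p.string              # NavigableString that sits inside <p>
parent_before = txt.parent.name  # the string is attached to the <p> tag

# replace_with puts the new content into the tree and detaches txt from it
txt.replace_with("something else")
parent_after = txt.parent        # txt is now outside the tree

print(parent_before)  # p
print(parent_after)   # None
print(soup)           # <p>something else</p>
```

A second txt.replace_with(...) at this point has no position in the tree to work with, which seems to be what the ValueError reports.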
An easy solution would be to apply all substitutions to the new string first and replace the node only once at the end, but I would like to understand the current behavior.
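For reference, the "replace only once" workaround could look roughly like the sketch below. The REPLACEMENTS table, the second word "verify", the file other.html, and the function name link_words are all hypothetical and only there for illustration. A single combined regex is used so that text inserted by one substitution is not matched again by another.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical table: word to look for -> markup to insert instead
REPLACEMENTS = {
    "identify": '<a href="test.html"> test </a>',
    "verify": '<a href="other.html"> other </a>',
}

# One case-insensitive pattern that matches any of the words
PATTERN = re.compile("|".join(re.escape(w) for w in REPLACEMENTS), re.I)

def link_words(html):
    soup = BeautifulSoup(html, features="html.parser")
    # find_all returns a list, so modifying the tree inside the loop is safe
    for txt in soup.find_all(string=True):
        if txt.parent.name == "a":
            continue  # skip text that is already inside a link
        # Do every substitution on the plain string first
        newtext = PATTERN.sub(lambda m: REPLACEMENTS[m.group(0).lower()], str(txt))
        if newtext != str(txt):
            # Touch the tree exactly once per text node
            txt.replace_with(BeautifulSoup(newtext, features="html.parser"))
    return soup

result = link_words("<p>Identify and verify</p>")
print(result)
```

Unlike the minimal example, this keeps the original case of the surrounding text, because the substitution runs on str(txt) instead of txt.lower().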