EDIT : This question doesn't really make sense once you have picked up what the "r" flag means. More details here. For people looking for a quick anwser, I added on below.
If I enter a regexp manually in a Python script, I can use 4 combinations of flags for my pattern strings :
- p1 = "pattern"
- p2 = u"pattern"
- p3 = r"pattern"
- p4 = ru"pattern"
I have a bunch a unicode strings coming from a Web form input and want to use them as regexp patterns.
I want to know what process I should apply to the strings so I can expect similar result from the usage of the manual form above. Something like :
import re
assert re.match(p1, some_text) == re.match(someProcess1(web_input), some_text)
assert re.match(p2, some_text) == re.match(someProcess2(web_input), some_text)
assert re.match(p3, some_text) == re.match(someProcess3(web_input), some_text)
assert re.match(p4, some_text) == re.match(someProcess4(web_input), some_text)
What would be someProcess1 to someProcessN and why ?
I suppose that someProcess2 doesn't need to do anything while someProcess1 should do some unicode conversion to the local encoding. For the raw string literals, I am clueless.
re.match
creates a regexp match object and you cannot compare those with==
if they contain the same data (==
checks on identity, which means that the references need to be the same). – Equinox