Use Python's string.replace vs re.sub

4

82

For Python 2.5, 2.6, should I be using string.replace or re.sub for basic text replacements?

In PHP, this was explicitly stated but I can't find a similar note for Python.

Trust answered 14/4, 2011 at 19:58 Comment(6)
Avoid regex at all costs! ...Until absolutely necessary...Hero
@jathanism: I respectfully disagree. I avoided regex for decades until I finally took the time to sit down and actually learn them. Now I can't live without them. Regular expressions are extremely useful for many day-to-day tasks and should be a familiar tool in every programmer's toolbox.Vtehsta
@ridgerunner: Agreed, but it is also important to know when to use them. For simple string manipulations such as this, regular expressions are over the top. My rule of thumb is that if you can do it with the built-in string functions (split(), replace(), find() et al.) without needing multiple status variables, complicated slicing, etc., you should. If it starts getting complex, then you move to alternative tools such as regular expressions.Lavation
Oh, and a general comment on the speed of regular expressions: it depends on the context. In a script you run occasionally with a few regular expressions, you won't notice the overhead. On the other hand, in a script that does intensive, high-volume processing, you might find the overhead unacceptable if you use regular expressions heavily. This is where profiling is important to determine where the bottleneck is (and I suppose I should trot out the "premature optimisation is the root of all evil" line at this point too).Lavation
@Blair: I wholeheartedly agree. But many seem to be averse to regex because they find them "difficult", and this is simply because they have not taken the time to learn them beyond a superficial level. Yes, if a simple string replace solves the problem, then by all means use that (it is also very likely the fastest solution). But I see way too many convoluted, complex string manipulation solutions to problems which are easily solved with a single, well-crafted regex.Vtehsta
@ridgerunner: I didn't say not to use regex. It really depends on your use case. I think anyone who has to do parsing--and we all end up doing parsing at some point--will agree that you simply can't live without regex, but you can (and should) avoid it whenever possible.Hero
C
84

As long as you can make do with str.replace(), you should use it. It avoids all the pitfalls of regular expressions (like escaping), and is generally faster.
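To illustrate the escaping pitfall mentioned above, here is a minimal sketch (written for Python 3; the sample string and variable names are made up for illustration). It shows the same literal text being replaced with str.replace(), with an unescaped re.sub() pattern, and with re.escape().

```python
import re

price = "Total: $5.00 (was $6.00)"

# str.replace treats both arguments as literal text.
literal = price.replace("$5.00", "$4.50")

# Passed unescaped to re.sub, the same text is read as a regex:
# "$" anchors at the end of the string and "." matches any character,
# so the pattern never matches and the string comes back unchanged.
unescaped = re.sub("$5.00", "$4.50", price)

# re.escape turns the pattern back into a literal match.
escaped = re.sub(re.escape("$5.00"), "$4.50", price)

print(literal)
print(unescaped)
print(escaped)
```

The unescaped call fails silently rather than raising an error, which is the kind of surprise str.replace() avoids.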

Celik answered 14/4, 2011 at 19:59 Comment(3)
If you are going to substitute many times, replace is faster than subFaker
@SvenMarnach Does this still apply to Python 2.7?Brufsky
@jsc123: This advice is about avoiding pitfalls and unnecessary complexity; so yes, it applies to any Python version. :)Celik
T
67

str.replace() should be used whenever possible. It's more explicit, simpler, and faster.

In [1]: import re

In [2]: text = """For python 2.5, 2.6, should I be using string.replace or re.sub for basic text replacements.
In PHP, this was explicitly stated but I can't find a similar note for python.
"""

In [3]: timeit text.replace('e', 'X')
1000000 loops, best of 3: 735 ns per loop

In [4]: timeit re.sub('e', 'X', text)
100000 loops, best of 3: 5.52 us per loop
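
The timeit syntax above is an IPython magic. Outside IPython, roughly the same comparison can be sketched with the standard library's timeit module. This is a Python 3 sketch, the loop count of 10000 is an arbitrary choice, and the absolute numbers will vary by machine and Python version.

```python
import re
import timeit

text = (
    "For python 2.5, 2.6, should I be using string.replace or re.sub "
    "for basic text replacements.\n"
    "In PHP, this was explicitly stated but I can't find a similar note "
    "for python.\n"
)

LOOPS = 10000  # arbitrary; raise it for more stable numbers

# timeit.timeit returns the total time in seconds for LOOPS calls.
replace_time = timeit.timeit(lambda: text.replace("e", "X"), number=LOOPS)
sub_time = timeit.timeit(lambda: re.sub("e", "X", text), number=LOOPS)

# Report per-call time in the same unit so the two are directly comparable.
print("str.replace: %.3f us per call" % (replace_time / LOOPS * 1e6))
print("re.sub:      %.3f us per call" % (sub_time / LOOPS * 1e6))
```

Both calls produce the same output string here, so only the time differs.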
Tatia answered 14/4, 2011 at 20:13 Comment(6)
Out of curiosity, how were you executing timeit in your example output? Is that something special to iPython allowing you to use that syntax? (Oh, and +1!)Hero
Yup, ipython includes it magically. scienceoss.com/…Tatia
Unsure if this is a typo or I'm missing something, but your str.replace() run has 10x the number of loops as the regex run.Anthropopathy
@alavin89 IPython chooses a "fitting value" for the iteration count if one is not specified (ipython.org/ipython-doc/3/interactive/magics.html#magic-timeit). It's possible that the value it chooses scales based on the time it takes to execute the snippet some small number of times. Since the timing numbers it reports are per loop, the difference in loop counts does not matter significantly.Beberg
What if you had multiple chained replace calls vs. a single regex? At some point a single regex replace should be faster than having N chained replace calls on a string, no?Conversational
Very interesting but confusingly presented. Besides the loop counts differing by a factor of 10, the time units also differ (us vs ns): text.replace took 735 nanoseconds and re.sub took 5,520 nanoseconds, so re.sub is about 7.5 times slower.Barayon
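
On the question raised in the comments about N chained replace calls versus a single regex, here is a rough Python 3 sketch of both approaches. The replacements mapping and function names are hypothetical. Which one is faster depends on the input and the number of replacements, so it is worth profiling; note also that the two are not always equivalent, because chained calls can re-replace text produced by an earlier call.

```python
import re

# Hypothetical mapping of old -> new substrings.
replacements = {"cat": "dog", "dog": "bird", "red": "blue"}

def replace_chained(text):
    # One pass over the string per replacement, applied in dict order.
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

# Longest keys first so a longer key wins over a shorter prefix of itself.
pattern = re.compile(
    "|".join(re.escape(key) for key in sorted(replacements, key=len, reverse=True))
)

def replace_single_pass(text):
    # One pass over the string; each match is looked up in the mapping.
    return pattern.sub(lambda match: replacements[match.group(0)], text)

sample = "red cat, red dog"
print(replace_chained(sample))
print(replace_single_pass(sample))
```

With this sample the chained version turns "cat" into "dog" and then that "dog" into "bird", while the single-pass version replaces each original word exactly once.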
T
39

String manipulation is usually preferable to regex when you can figure out how to adapt it. Regex is incredibly powerful, but it's usually slower, and usually harder to write, debug, and maintain.

That being said, notice the amount of "usually" in the above paragraph! It's possible (and I've seen it done) to write a zillion lines of string manipulation for something you could've done with a 20-character regex. It's also possible to waste valuable time using "efficient" string functions on tasks a good regex engine could do almost as fast. Then there's maintainability: Regex can be horribly complex, but sometimes a regex will be simpler and easier to read than a giant block of procedural code.

Regex is fantastic for its intended purpose: searching for highly-variable needles in highly-variable haystacks. Think of it as a precision torque wrench: It's the perfect tool for a specific set of jobs, but it makes a lousy hammer.
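
As a small illustration of a "variable needle", here is a sketch in Python 3 where the text to replace is a pattern rather than a fixed string. The log line and the date format are made up for the example; str.replace() would need one call per distinct date.

```python
import re

log = "started 2011-04-14, retried 2011-04-15, done 2011-04-16"

# Match any YYYY-MM-DD shaped token, whatever the actual digits are.
masked = re.sub(r"\d{4}-\d{2}-\d{2}", "<date>", log)

print(masked)
```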

Some guidelines you should follow when you aren't sure what to use:

If the answer to any of these questions is "yes", you probably want string manipulation. Otherwise, consider regex.

Tartaric answered 14/4, 2011 at 22:19 Comment(0)
H
11

Another thing to consider is that if you're doing rather complex replacements, str.translate() might be what you're looking for.
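
A minimal sketch of str.translate(), assuming Python 3, where the table is built with str.maketrans(). In the Python 2 versions the question asks about, the table would be built with string.maketrans() instead. The sample strings are made up.

```python
# Map single characters to single characters: e -> 3, o -> 0.
swap_table = str.maketrans("eo", "30")
swapped = "hello world".translate(swap_table)

# The third argument lists characters to delete outright.
delete_table = str.maketrans("", "", ",.")
stripped = "a,b.c".translate(delete_table)

print(swapped)
print(stripped)
```

translate() only works on a per-character basis, so it fits cases like character swaps or stripping punctuation rather than replacing whole substrings.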

Hero answered 14/4, 2011 at 20:7 Comment(0)
