Why is startswith slower than slicing

Asked 7/11, 2012 at 13:35 Answered 7/11, 2012 at 14:44

Why is the implementation of startwith slower than slicing?

In [1]: x = 'foobar'

In [2]: y = 'foo'

In [3]: %timeit x.startswith(y)
1000000 loops, best of 3: 321 ns per loop

In [4]: %timeit x[:3] == y
10000000 loops, best of 3: 164 ns per loop

Surprisingly, even including calculation for the length, slicing still appears significantly faster:

In [5]: %timeit x[:len(y)] == y
1000000 loops, best of 3: 251 ns per loop

Note: the first part of this behaviour is noted in Python for Data Analysis (Chapter 3), but no explanation for it is offered.

If helpful: here is the C code for startswith; and here is the output of dis.dis:

In [6]: import dis

In [7]: dis_it = lambda x: dis.dis(compile(x, '<none>', 'eval'))

In [8]: dis_it('x[:3]==y')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_CONST               0 (3)
              6 SLICE+2             
              7 LOAD_NAME                1 (y)
             10 COMPARE_OP               2 (==)
             13 RETURN_VALUE        

In [9]: dis_it('x.startswith(y)')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_ATTR                1 (startswith)
              6 LOAD_NAME                2 (y)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE

Unijugate answered 7/11, 2012 at 13:35 Comment(6)

Slicing is probably optimized more heavily at the C level – Elle 7/11, 2012 at 13:36

this is not equivalent code. you can pass a tuple to startswith, you can pass start and end parameters, taking care of this takes time. – Codify 7/11, 2012 at 13:40

When you talk about optimization the underlying OS and architecture of the CPU are very important. Is this an x86 (Intel or AMD) or x86-64bit? or an ARM core, on linux or windows...? – Myna 7/11, 2012 at 13:42

@LastCoder This is an Intel x86-64 running osx. I had thought that (broadly) C (and hence CPython) implementations across platforms would be comparable (i.e. if one architecture runs the code slower then others will too)...? – Unijugate 7/11, 2012 at 15:47

@hayden - version differences of OS standard libraries, the version of the compiler used, the amount of available memory when you run the test, the background tasks/processes running, whether the processor is Intel or AMD, etc etc can ALL cause variations in benchmark results. Especially when you're dealing with a repeated test (where the values are the same and nanosecond measurements are being compared. You could be running into caching and branch prediction optimizations on the architecture level that could skew your results. – Myna 7/11, 2012 at 18:32

@LastCoder I agree the numbers themselves will be different (if I run them twice they are different!), but I would be surprised if considerable differences (i.e. one being faster than the other) were not reflected across all platforms given the maturity of the C compiler. I hadn't considered "caching and branch predictions" at an architecture level... interesting. – Unijugate 7/11, 2012 at 19:46

Some of the performance difference can be explained by taking into account the time it takes the . operator to do its thing:

>>> x = 'foobar'
>>> y = 'foo'
>>> sw = x.startswith
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 316 ns per loop
>>> %timeit sw(y)
1000000 loops, best of 3: 267 ns per loop
>>> %timeit x[:3] == y
10000000 loops, best of 3: 151 ns per loop

Another portion of the difference can be explained by the fact that startswith is a function, and even no-op function calls take a bit of time:

>>> def f():
...     pass
... 
>>> %timeit f()
10000000 loops, best of 3: 105 ns per loop

This does not totally explain the difference, since the version using slicing and len calls a function and is still faster (compare to sw(y) above -- 267 ns):

>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 213 ns per loop

My only guess here is that maybe Python optimizes lookup time for built-in functions, or that len calls are heavily optimized (which is probably true). It might be possible to test that with a custom len func. Or possibly this is where the differences identified by LastCoder kick in. Note also larsmans' results, which indicate that startswith is actually faster for longer strings. The whole line of reasoning above applies only to those cases where the overhead I'm talking about actually matters.

Livelong answered 7/11, 2012 at 13:48 Comment(0)

The comparison isn't fair since you're only measuring the case where startswith returns True.

>>> x = 'foobar'
>>> y = 'fool'
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 221 ns per loop
>>> %timeit x[:3] == y  # note: length mismatch
10000000 loops, best of 3: 122 ns per loop
>>> %timeit x[:4] == y
10000000 loops, best of 3: 158 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 210 ns per loop
>>> sw = x.startswith
>>> %timeit sw(y)
10000000 loops, best of 3: 176 ns per loop

Also, for much longer strings, startswith is a lot faster:

>>> import random
>>> import string
>>> x = '%030x' % random.randrange(256**10000)
>>> len(x)
20000
>>> y = r[:4000]
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 211 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 469 ns per loop
>>> sw = x.startswith
>>> %timeit sw(y)
10000000 loops, best of 3: 168 ns per loop

This is still true when there's no match.

# change last character of y
>>> y = y[:-1] + chr((ord(y[-1]) + 1) % 256)
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 210 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 470 ns per loop
>>> %timeit sw(y)
10000000 loops, best of 3: 168 ns per loop
# change first character of y
>>> y = chr((ord(y[0]) + 1) % 256) + y[1:]
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 210 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 442 ns per loop
>>> %timeit sw(y)
10000000 loops, best of 3: 168 ns per loop

So, startswith is probably slower for short strings because it's optimized for long ones.

(Trick to get random strings taken from this answer.)

Turf answered 7/11, 2012 at 14:3 Comment(0)

startswith is more complex than slicing...

2924 result = _string_tailmatch(self,
2925 PyTuple_GET_ITEM(subobj, i),
2926 start, end, -1);

This isn't a simple character compare loop for needle in beginning of haystack that's happening. We're looking at a for loop that is iterating through a vector/tuple (subobj) and calling another function (_string_tailmatch) on it. Multiple function calls have overhead with regards to the stack, argument sanity checks etc...

startswith is a library function while the slicing appears to be built into the language.

2919 if (!stringlib_parse_args_finds("startswith", args, &subobj, &start, &end))
2920 return NULL;

Myna answered 7/11, 2012 at 13:43 Comment(6)

"StartsWith can process vectors." - no idea what you mean by that. – Bane 7/11, 2012 at 13:46

@Livelong - no c#... lol... anyways your technically correct so I cased it properly. – Myna 7/11, 2012 at 13:59

@Bane - look at the source it can process multiple arguments, while the slicing is is intrinsic. – Myna 7/11, 2012 at 14:1

Ok, +1, this makes more sense now. – Livelong 7/11, 2012 at 14:5

@LastCoder: What does the multiple argument usage look like? – Bane 7/11, 2012 at 14:6

@Bane - My wording was poor, I only meant there was more added complexity because of the extra processing involved in having optional arguments str.startswith(prefix[, start[, end]]). The source code makes it appear as though startsWith can process multiple "prefixes" but the documentation doesn't outline this case. – Myna 7/11, 2012 at 14:24

To quote the docs, startswith does more you might think:

str.startswith(prefix[, start[, end]])

Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.

Bane answered 7/11, 2012 at 14:44 Comment(0)

Calling a function is quite expensive. I don't know, however, if this is the case as well for built-in functions written in C.

Be aware, though, that slicing may involve a function call as well depending on the object which is used.

Jussive answered 7/11, 2012 at 13:46 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags