Why is sorting a python list of tuples faster when I explicitly provide the key as the first element?

Sorting a list of tuples (dictionary key/value pairs, where the key is a random string) is faster when I do not explicitly specify that the key should be used (edit: added operator.itemgetter(0) per the comment by @Chepner, and the key version is now faster!):

import timeit

setup ="""
import random
import string
import operator

random.seed('slartibartfast')
d={}
for i in range(1000):
    d[''.join(random.choice(string.ascii_uppercase) for _ in range(16))] = 0
"""
print min(timeit.Timer('for k,v in sorted(d.iteritems()): pass',
        setup=setup).repeat(7, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=lambda x: x[0]): pass',
        setup=setup).repeat(7, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=operator.itemgetter(0)): pass',
        setup=setup).repeat(7, 1000))

Gives:

0.575334150664
0.579534521128
0.523808984422 (the itemgetter version!)
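
As a side note (not part of the original timings): operator.itemgetter(0) extracts the same first element as lambda x: x[0], but the extraction happens in C rather than through a Python-level function call, which is presumably why the itemgetter variant edges out the lambda above. A minimal illustration:

import operator

get_first = operator.itemgetter(0)    # C-level extraction of element 0
first_by_lambda = lambda x: x[0]      # Python-level equivalent
pair = ('ABCDEFGHIJKLMNOP', 0)
assert get_first(pair) == first_by_lambda(pair) == 'ABCDEFGHIJKLMNOP'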

If, however, I create a custom object, passing key=lambda x: x[0] explicitly to sorted makes it faster:

setup ="""
import random
import string
import operator

random.seed('slartibartfast')
d={}

class A(object):
    def __init__(self):
        self.s = ''.join(random.choice(string.ascii_uppercase) for _ in
              range(16))
    def __hash__(self): return hash(self.s)
    def __eq__(self, other):
        return self.s == other.s
    def __ne__(self, other): return self.s != other.s
    # def __cmp__(self, other): return cmp(self.s ,other.s)

for i in range(1000):
    d[A()] = 0
"""
print min(timeit.Timer('for k,v in sorted(d.iteritems()): pass',
        setup=setup).repeat(3, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=lambda x: x[0]): pass',
        setup=setup).repeat(3, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=operator.itemgetter(0)): pass',
        setup=setup).repeat(3, 1000))

Gives:

4.65625458083
1.87191002252
1.78853626684

Is this expected? It seems the second element of the tuple is used in the second case, but shouldn't the keys compare unequal?

Note: uncommenting the comparison method makes everything slower, but the key versions still take roughly half the time of the no-key version:

8.11941771831
5.29207000173
5.25420037046

As expected, the built-in comparison (by address) is faster.

EDIT: here are the profiling results from my original code that triggered the question - without the key argument:

         12739 function calls in 0.007 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.007    0.007 <string>:1(<module>)
        1    0.000    0.000    0.007    0.007 __init__.py:6527(_refreshOrder)
        1    0.002    0.002    0.006    0.006 {sorted}
     4050    0.003    0.000    0.004    0.000 bolt.py:1040(__cmp__) # here is the custom object
     4050    0.001    0.000    0.001    0.000 {cmp}
     4050    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
      291    0.000    0.000    0.000    0.000 __init__.py:6537(<lambda>)
      291    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 bolt.py:1240(iteritems)
        1    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

and here are the results when I specify the key:

         7027 function calls in 0.004 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.004    0.004 <string>:1(<module>)
        1    0.000    0.000    0.004    0.004 __init__.py:6527(_refreshOrder)
        1    0.001    0.001    0.003    0.003 {sorted}
     2049    0.001    0.000    0.002    0.000 bolt.py:1040(__cmp__)
     2049    0.000    0.000    0.000    0.000 {cmp}
     2049    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
      291    0.000    0.000    0.000    0.000 __init__.py:6538(<lambda>)
      291    0.000    0.000    0.000    0.000 __init__.py:6533(<lambda>)
      291    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 bolt.py:1240(iteritems)
        1    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
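
The profiles above come from my application code (bolt.py / _refreshOrder, not shown here). For reference, output like this can be produced with something along these lines (a sketch only; the statement being profiled is a stand-in for the real call):

import cProfile

# d stands in for the dictionary that is actually sorted inside _refreshOrder
cProfile.run('sorted(d.iteritems(), key=lambda x: x[0])', sort='cumulative')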

Apparently it is the __cmp__ and not the __eq__ method that is called (edit: because that class defines __cmp__ but not __eq__; see here for the order of resolution of equality and comparison).

In the code here the __eq__ method is indeed called (8605 times), as seen by adding debug prints (see the comments).

So the difference is as stated in the answer by @chepner. The last thing I am not quite clear on is why those tuple equality calls are needed (IOW, why __eq__ is called rather than __cmp__ directly).

FINAL EDIT: I asked this last point here: Why in comparing python tuples of objects is __eq__ and then __cmp__ called? It turns out to be an optimization: tuple comparison calls __eq__ on the tuple elements, and only calls __cmp__ for elements that are not equal. So this is now perfectly clear. I thought tuple comparison called __cmp__ directly, so initially it seemed to me that specifying the key was simply unneeded, and even after Chepner's answer I was still not seeing where the equality calls came from.
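
A minimal sketch (not from the linked question) that makes this call order visible; the Tracer class and its prints are illustrative only:

class Tracer(object):
    def __init__(self, s):
        self.s = s
    def __eq__(self, other):
        print '__eq__ called'    # tuple comparison tries equality on elements first
        return self.s == other.s
    def __cmp__(self, other):
        print '__cmp__ called'   # only reached for the first pair of unequal elements
        return cmp(self.s, other.s)

# __eq__ runs first on the first elements; since they are unequal, __cmp__
# is then used to order them, and the second elements are never looked at.
print (Tracer('a'), 0) < (Tracer('b'), 0)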

Gist: https://gist.github.com/Utumno/f3d25e0fe4bd0f43ceb9178a60181a53

Fourflush answered 24/12, 2015 at 16:35 Comment(8)
Using lambda x: x[0] only considers the first elementCarlson
@That1Guy, sorry, I just deleted the comment by mistake; you are talking about C speed vs Python, so you will get hammered using the methods above in pure PythonCarlson
@That1Guy, the main difference here is that if you add a print(self, other) in __eq__ you won't see it being called at all for the lambda solution; for the non-lambda it is called 88 times, so you have 88 slow Python method callsCarlson
Comparing random objects of the same type should result in comparing their addresses (in CPython), so the first elements should compare unequal in both setups - will edit making the keys 16 chars so they always compare unequal in both setups - same results. So it seems the second element would not be needed in the comparison - so why does explicitly omitting it make things faster in the second case? @PadraicCunningham: is it the __eq__ call, you believe? Why is it called in the non-lambda case?Fourflush
@Mr_and_Mrs_D, in the non-lambda case eq is called, that is what is killing you.Carlson
@Padraic Cunningham: but why is it called? I would expect the first elements of the tuples to be compared in both cases (lambda and non-lambda), and those first elements compare unequal in both cases - so I would expect the same performance. Is tuple comparison the difference, and if so why? I would imagine the non-lambda case to be roughly equivalent to cmp = lambda x, y: x[0] < y[0] or x[1] < y[1] - where is eq here?Fourflush
Your first example seems to show the opposite of what you claim it does.Popper
@AdamSmith: actually yes - changed when I switched from random strings to random ints. Reverting to the random string version. Random ints version: stackoverflow.com/revisions/…Fourflush

There are two issues at play.

  1. Comparing two values of builtin types (such as int) happens in C. Comparing two values of a class with an __eq__ method happens in Python; repeatedly calling __eq__ imposes a significant performance penalty.

  2. The function passed with key is called once per element, rather than once per comparison. This means that lambda x: x[0] is called once per element, up front, to build the list of A instances that will then be compared. Without key, you need to make O(n lg n) tuple comparisons, each of which requires a call to A.__eq__ to compare the first element of each tuple.

The first point explains why your first set of results stays under a second while the second set takes several seconds. The second point explains why using key is faster regardless of the values being compared, as the sketch below illustrates.
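
A minimal sketch (not from the original answer) that counts the calls; the Noisy class and the first key function exist only for this illustration:

import random

class Noisy(object):
    lt_calls = 0
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):          # Python-level comparison, one call per comparison
        Noisy.lt_calls += 1
        return self.value < other.value

key_calls = [0]
def first(pair):                      # stand-in for lambda x: x[0]
    key_calls[0] += 1
    return pair[0]

pairs = [(Noisy(random.random()), 0) for _ in range(1000)]
sorted(pairs, key=first)
print key_calls[0]    # 1000: the key function runs exactly once per element
print Noisy.lt_calls  # on the order of n lg n: one call per comparison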

Cerated answered 24/12, 2015 at 18:3 Comment(6)
Thanks - why is the key version faster in my edited int version (stackoverflow.com/revisions/…) while here it is actually slower? Also, the x[0] are instances of A, and I still need O(n log n) comparisons to sort the list returned by using the lambda key - is __eq__ not called in those comparisons?Fourflush
It's still O(n lg n), but with a smaller constant. (Instead of making O(n lg n) tuple comparisons and O(n lg n) A comparisons, you are making O(n) key lookups and O(n lg n) A comparisons).Cerated
Aha - thanks - so __eq__ is called in the tuple comparison? Why then is the key example in the question slower in the first case?Fourflush
Tuples and ints are both compared in C; there are no additional Python functions being called. Calling the lambda to extract the first integer from the tuple is slower than just comparing the tuples directly. Try it with key=operator.itemgetter(0) instead (after importing operator, of course).Cerated
It is faster in the integers case (stackoverflow.com/revisions/…), but I guess that's because there the tuple __eq__ is completely bypassed (some optimization for integer tuples?). The last bit that is still not clear to me is why we need the extra O(n lg n) __eq__ calls in the non-key case?Fourflush
Ok finally completely got that - the ingredient I was missing is that tuple comparison does not call __cmp__ directly on the tuple elements - it first calls __eq__: https://mcmap.net/q/1166291/-why-in-comparing-python-tuples-of-objects-is-__eq__-and-then-__cmp__-called/281545 - now the comments above make senseFourflush
