Why is sorting a python list of tuples faster when I explicitly provide the key as the first element?

Sorting a list of tuples (dictionary key/value pairs, where the key is a random string) is faster when I do not explicitly specify that the key should be used (edit: added operator.itemgetter(0) per the comment by @Chepner, and the key version is now faster!):

import timeit

setup ="""
import random
import string
import operator

random.seed('slartibartfast')
d={}
for i in range(1000):
    d[''.join(random.choice(string.ascii_uppercase) for _ in range(16))] = 0
"""
print min(timeit.Timer('for k,v in sorted(d.iteritems()): pass',
        setup=setup).repeat(7, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=lambda x: x[0]): pass',
        setup=setup).repeat(7, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=operator.itemgetter(0)): pass',
        setup=setup).repeat(7, 1000))

Gives:

0.575334150664
0.579534521128
0.523808984422 (the itemgetter version!)
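
As a side note (not part of the original timings): operator.itemgetter(0) extracts the same first element as lambda x: x[0], but the extraction happens in C rather than through a Python-level function call, which is presumably why the itemgetter variant edges out the lambda above. A minimal illustration:

import operator

get_first = operator.itemgetter(0)    # C-level extraction of element 0
first_by_lambda = lambda x: x[0]      # Python-level equivalent
pair = ('ABCDEFGHIJKLMNOP', 0)
assert get_first(pair) == first_by_lambda(pair) == 'ABCDEFGHIJKLMNOP'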

If, however, I create a custom object, passing key=lambda x: x[0] explicitly to sorted makes it faster:

setup ="""
import random
import string
import operator

random.seed('slartibartfast')
d={}

class A(object):
    def __init__(self):
        self.s = ''.join(random.choice(string.ascii_uppercase) for _ in
              range(16))
    def __hash__(self): return hash(self.s)
    def __eq__(self, other):
        return self.s == other.s
    def __ne__(self, other): return self.s != other.s
    # def __cmp__(self, other): return cmp(self.s ,other.s)

for i in range(1000):
    d[A()] = 0
"""
print min(timeit.Timer('for k,v in sorted(d.iteritems()): pass',
        setup=setup).repeat(3, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=lambda x: x[0]): pass',
        setup=setup).repeat(3, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=operator.itemgetter(0)): pass',
        setup=setup).repeat(3, 1000))

Gives:

4.65625458083
1.87191002252
1.78853626684

Is this expected? It seems the second element of the tuple is used in the second case, but shouldn't the keys compare unequal?

Note: uncommenting the comparison method makes everything slower, but the key versions still take roughly half the time of the no-key version:

8.11941771831
5.29207000173
5.25420037046

As expected, the built-in comparison (by address) is faster.

EDIT: here are the profiling results from my original code that triggered the question - without the key argument:

         12739 function calls in 0.007 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.007    0.007 <string>:1(<module>)
        1    0.000    0.000    0.007    0.007 __init__.py:6527(_refreshOrder)
        1    0.002    0.002    0.006    0.006 {sorted}
     4050    0.003    0.000    0.004    0.000 bolt.py:1040(__cmp__) # here is the custom object
     4050    0.001    0.000    0.001    0.000 {cmp}
     4050    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
      291    0.000    0.000    0.000    0.000 __init__.py:6537(<lambda>)
      291    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 bolt.py:1240(iteritems)
        1    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

and here are the results when I specify the key:

         7027 function calls in 0.004 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.004    0.004 <string>:1(<module>)
        1    0.000    0.000    0.004    0.004 __init__.py:6527(_refreshOrder)
        1    0.001    0.001    0.003    0.003 {sorted}
     2049    0.001    0.000    0.002    0.000 bolt.py:1040(__cmp__)
     2049    0.000    0.000    0.000    0.000 {cmp}
     2049    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
      291    0.000    0.000    0.000    0.000 __init__.py:6538(<lambda>)
      291    0.000    0.000    0.000    0.000 __init__.py:6533(<lambda>)
      291    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 bolt.py:1240(iteritems)
        1    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
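
The profiles above come from my application code (bolt.py / _refreshOrder, not shown here). For reference, output like this can be produced with something along these lines (a sketch only; the statement being profiled is a stand-in for the real call):

import cProfile

# d stands in for the dictionary that is actually sorted inside _refreshOrder
cProfile.run('sorted(d.iteritems(), key=lambda x: x[0])', sort='cumulative')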

Apparently it is the __cmp__ and not the __eq__ method that is called (edit: because that class defines __cmp__ but not __eq__; see here for the order of resolution of equality and comparison).

In the code here the __eq__ method is indeed called (8605 times), as seen by adding debug prints (see the comments).

So the difference is as stated in the answer by @chepner. The last thing I am not quite clear on is why those tuple equality calls are needed (IOW, why __eq__ is called rather than __cmp__ directly).

FINAL EDIT: I asked this last point here: Why in comparing python tuples of objects is __eq__ and then __cmp__ called? It turns out to be an optimization: tuple comparison calls __eq__ on the tuple elements, and only calls __cmp__ for elements that are not equal. So this is now perfectly clear. I thought tuple comparison called __cmp__ directly, so initially it seemed to me that specifying the key was simply unneeded, and even after Chepner's answer I was still not seeing where the equality calls came from.
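
A minimal sketch (not from the linked question) that makes this call order visible; the Tracer class and its prints are illustrative only:

class Tracer(object):
    def __init__(self, s):
        self.s = s
    def __eq__(self, other):
        print '__eq__ called'    # tuple comparison tries equality on elements first
        return self.s == other.s
    def __cmp__(self, other):
        print '__cmp__ called'   # only reached for the first pair of unequal elements
        return cmp(self.s, other.s)

# __eq__ runs first on the first elements; since they are unequal, __cmp__
# is then used to order them, and the second elements are never looked at.
print (Tracer('a'), 0) < (Tracer('b'), 0)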

Gist: https://gist.github.com/Utumno/f3d25e0fe4bd0f43ceb9178a60181a53

Fourflush answered 24/12, 2015 at 16:35 Comment(8)
Using lambda x: x[0] only considers the first elementCarlson
@That1Guy, sorry, I just deleted the comment by mistake; you are talking about C speed vs Python, so you will get hammered using the methods above in pure PythonCarlson
@That1Guy, the main difference here is that if you add a print(self, other) in __eq__ you won't see it being called at all for the lambda solution; for the non-lambda it is called 88 times, so you have 88 slow Python method callsCarlson
Comparing random objects of the same type should result in comparing their addresses (in CPython), so the first elements should compare unequal in both setups - will edit making the keys 16 chars so they always compare unequal in both setups - same results. So it seems the second element would not be needed in the comparison - so why does explicitly omitting it make things faster in the second case? @PadraicCunningham: is it the __eq__ call, you believe? Why is it called in the non-lambda case?Fourflush
@Mr_and_Mrs_D, in the non-lambda case eq is called, that is what is killing you.Carlson
@Padraic Cunningham: but why is it called? I would expect the first elements of the tuples to be compared in both cases (lambda and non-lambda), and those first elements compare unequal in both cases - so I would expect the same performance. Is tuple comparison the difference, and if so why? I would imagine the non-lambda case to be roughly equivalent to cmp = lambda x, y: x[0] < y[0] or x[1] < y[1] - where is eq here?Fourflush
Your first example seems to show the opposite of what you claim it does.Popper
@AdamSmith: actually yes - changed when I switched from random strings to random ints. Reverting to the random string version. Random ints version: stackoverflow.com/revisions/…Fourflush

There are two issues at play.

  1. Comparing two values of builtin types (such as int) happens in C. Comparing two values of a class with an __eq__ method happens in Python; repeatedly calling __eq__ imposes a significant performance penalty.

  2. The function passed with key is called once per element, rather than once per comparison. This means that lambda x: x[0] is called once per element, up front, to build the list of A instances that will then be compared. Without key, you need to make O(n lg n) tuple comparisons, each of which requires a call to A.__eq__ to compare the first element of each tuple.

The first point explains why your first set of results stays under a second while the second set takes several seconds. The second point explains why using key is faster regardless of the values being compared, as the sketch below illustrates.
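
A minimal sketch (not from the original answer) that counts the calls; the Noisy class and the first key function exist only for this illustration:

import random

class Noisy(object):
    lt_calls = 0
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):          # Python-level comparison, one call per comparison
        Noisy.lt_calls += 1
        return self.value < other.value

key_calls = [0]
def first(pair):                      # stand-in for lambda x: x[0]
    key_calls[0] += 1
    return pair[0]

pairs = [(Noisy(random.random()), 0) for _ in range(1000)]
sorted(pairs, key=first)
print key_calls[0]    # 1000: the key function runs exactly once per element
print Noisy.lt_calls  # on the order of n lg n: one call per comparison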

Cerated answered 24/12, 2015 at 18:3 Comment(6)
Thanks - why is the key version faster in my edited int version (stackoverflow.com/revisions/…) while here it is actually slower? Also, the x[0] are instances of A, and I still need O(n log n) comparisons to sort the list returned by using the lambda key - is __eq__ not called in those comparisons?Fourflush
It's still O(n lg n), but with a smaller constant. (Instead of making O(n lg n) tuple comparisons and O(n lg n) A comparisons, you are making O(n) key lookups and O(n lg n) A comparisons).Cerated
Aha - thanks - so __eq__ is called in the tuple comparison? Why then is the key example in the question slower in the first case?Fourflush
Tuples and ints are both compared in C; there are no additional Python functions being called. Calling the lambda to extract the first integer from the tuple is slower than just comparing the tuples directly. Try it with key=operator.itemgetter(0) instead (after importing operator, of course).Cerated
It is faster in the integers case (stackoverflow.com/revisions/…), but I guess that's because there the tuple __eq__ is completely bypassed (some optimization for integer tuples?). The last bit that is still not clear to me is why we need the extra O(n lg n) __eq__ calls in the non-key case?Fourflush
Ok finally completely got that - the ingredient I was missing is that tuple comparison does not call __cmp__ directly on the tuple elements - it first calls __eq__: https://mcmap.net/q/1166291/-why-in-comparing-python-tuples-of-objects-is-__eq__-and-then-__cmp__-called/281545 - now the comments above make senseFourflush
