How does Python's cmp_to_key function work?
Asked Answered
J

1

66

I came across this function here.

I am baffled as to how this would be implemented -- how does the key function generated by cmp_to_key know what "position" a given element should be without checking how the given element compares with every other element of interest?

Jessikajessup answered 3/5, 2013 at 15:41 Comment(0)
F
87

The cmp_to_key method returns a special object that acts as a surrogate key:

class K(object):
    __slots__ = ['obj']
    def __init__(self, obj, *args):
        self.obj = obj
    def __lt__(self, other):
        return mycmp(self.obj, other.obj) < 0
    def __gt__(self, other):
        return mycmp(self.obj, other.obj) > 0
    def __eq__(self, other):
        return mycmp(self.obj, other.obj) == 0
    def __le__(self, other):
        return mycmp(self.obj, other.obj) <= 0
    def __ge__(self, other):
        return mycmp(self.obj, other.obj) >= 0
    def __ne__(self, other):
        return mycmp(self.obj, other.obj) != 0
    def __hash__(self):
        raise TypeError('hash not implemented')

When sorting, each key will get compared to most other keys in the sequence. Is this element at position 0 lower than or greater than that other object?

Whenever that happens, the special method hooks are invoked, so __lt__ or __gt__ is called, which the surrogate key turns into a call to the cmp method instead.

So the list [1, 2, 3] is sorted as [K(1), K(2), K(3)], and if, say, K(1) is compared with K(2) to see if K(1) is lower, then K(1).__lt__(K(2)) is called, which is translated to mycmp(1, 2) < 0.

This is how the old cmp method was working anyway; return -1, 0 or 1 depending on wether the first argument is lower than, equal to or greater than the second argument. The surrogate key translates those numbers back to boolean values for the comparison operators.

At no point does the surrogate key need to know anything about absolute positions. It only needs to know about one other object it is being compared with, and the special method hooks provide that other object.

Flinger answered 3/5, 2013 at 15:44 Comment(10)
Wow, an actual instance were Java has a much simpler solution to the same problem than Python. There aren't many.Grissom
@jeremyjjbrown: the cmp_to_key function only exists to support legacy Python 2 sorting codebases. The simpler pythonic method is to use a key function for sorted() instead, and if the sorting algorithm call for a cmp()-like approach, implement custom rich comparison hooks directly.Flinger
Sometimes it's very difficult to come up with a key but easy to come up with a compare function... e.g., I want to sort a list of numbers so that all odd numbers come first, in ascending order, and then all even numbers come next, in descending order.Mucilaginous
@Mucilaginous return a tuple with (n % 2 == 0, (n % 2 and 1 or -1) * n) for the key then.Flinger
I am aware that you can make a key for that problem. But I could easily come up with something more difficult.Mucilaginous
@rlbond: then create objects that define their own ordering; you can use those as the key. That includes defining your own rich comparison hooks, as the K class used by cmp_to_key() does.Flinger
Come to this post after I noticed the cmp_to_key function in python doc. Just want to point out that the key function generated like this doesn't have the key benefit of key parameter: to avoid computing the "key" multiple times for an element.Breda
@user49117: The key is computed just once for each element, after which the key's rich comparison hooks will be used exactly as often as a cmp function in Python 2 would have been used. Yes, if you can it is better to use a Swartzian transform by specifying a straight-up sort key function instead, but it is no more inefficient than using the cmp callback in a Python 2 sort would have been.Flinger
Thanks for the great explanation. I have a question that does Python create a new list [K(1), K(2), K(3)] indeed?Vehicle
@roachsinai: When using the key argument to sorted() and list.sort(), an array (a fixed-size C-level data structure) of key(value) entries is created, yes. For cmp_to_key(value) returns K(value) so that's one K() object for every value in the input sequence being sorted.Flinger

© 2022 - 2024 — McMap. All rights reserved.