Using queue.PriorityQueue, not caring about comparisons
Asked Answered
W

5

4

I'm trying to use queue.PriorityQueue in Python 3(.6).

I would like to store objects with a given priority. But if two objects have the same priority, I don't mind PriorityQueue.get to return either. In other words, my objects can't be compared at integers, it won't make sense to allow them to be, I just care about the priority.

In Python 3.7's documentation, there's a solution involving dataclasses. And I quote:

If the data elements are not comparable, the data can be wrapped in a class that ignores the data item and only compares the priority number:

from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any=field(compare=False)

Alas, I'm using Python 3.6. In the documentation of this version of Python, there's no comment on using PriorityQueue for the priorities, not bothering about the "object value" which wouldn't be logical in my case.

Is there a better way than to define __le__ and other comparison methods on my custom class? I find this solution particularly ugly and counter-intuitive, but that might be me.

Weeds answered 3/1, 2019 at 18:28 Comment(2)
Do your objects actually throw an exception if you compare them to each other? If not, then there's nothing to worry about.Chladek
@Chladek just because two objects compare without an error being thrown does not mean that the objects are sorting in a reasonable order. Look at this question: https://mcmap.net/q/1067388/-python-default-comparisonBrno
C
8

dataclasses is just a convenience method to avoid having to create a lot of boilerplate code.

You don't actually have to create a class. A tuple with a unique counter value too:

from itertools import count

unique = count()

q.put((priority, next(unique), item))

so that ties between equal priority are broken by the integer that follows; because it is always unique the item value is never consulted.

You can also create a class using straight-up rich comparison methods, made simpler with @functools.total_ordering:

from functools import total_ordering

@total_ordering
class PrioritizedItem:
    def __init__(self, priority, item):
        self.priority = priority
        self.item = item

    def __eq__(self, other):
        if not isinstance(other, __class__):
            return NotImplemented
        return self.priority == other.priority

    def __lt__(self, other):
        if not isinstance(other, __class__):
            return NotImplemented
        return self.priority < other.priority
Callum answered 3/1, 2019 at 19:9 Comment(4)
very elegant way to get the distinguishing elementHeffernan
I really like the counter() solution. It has the advantage of avoiding to create a class (some lines of code, but not much) and the added benefit that it doesn't require to create instances just for the priority counter. This is a performance issue and probably a very minor issue except when dealing with millions of objects, but I still find this solution more elegant. Thanks!Weeds
Note that your handling of the counter is not thread-safe; the queue module is explicitly thread-safe (I personally use the heapq module directly when thread-safety is not required), so this not being thread-safe could be a big issue.Callum
@AndyTheEntity: please don't add self. to the __class__ references. It's deliberate, the (unqualified) __class__ reference is a closure that gives you the class where the method is defined, and not that of a current subclass.Callum
H
2

See priority queue implementation notes - just before the section you quoted (regarding using dataclasses) it tells you how to do it whitout them:

... is to store entries as 3-element list including the priority, an entry count, and the task. The entry count serves as a tie-breaker so that two tasks with the same priority are returned in the order they were added. And since no two entry counts are the same, the tuple comparison will never attempt to directly compare two tasks.

So simply add your items as 3rd element in a tuple (Prio, Count, YourElem) when adding to your queue.

Contreived example:

from queue import PriorityQueue

class CompareError(ValueError): pass

class O:
    def __init__(self,n):
        self.n = n

    def __lq__(self):
        raise CompareError

    def __repr__(self): return str(self)
    def __str__(self): return self.n

def add(prioqueue,prio,item):
    """Adds the 'item' with 'prio' to the 'priorqueue' adding a unique value that
    is stored as member of this method 'add.n' which is incremented on each usage."""
    prioqueue.put( (prio, add.n, item))
    add.n += 1

# no len() on PrioQueue - we ensure our unique integer via method-param
# if you forget to declare this, you get an AttributeError
add.n = 0

h = PriorityQueue()

add(h, 7, O('release product'))
add(h, 1, O('write spec 3'))
add(h, 1, O('write spec 2'))
add(h, 1, O('write spec 1'))
add(h, 3, O('create tests'))

for _ in range(4):
    item = h.get()
    print(item)

Using h.put( (1, O('write spec 1')) ) leads to

TypeError: '<' not supported between instances of 'O' and 'int'`

Using def add(prioqueue,prio,item): pushes triplets as items wich have guaranteed distinct 2nd values so our O()-instances are never used as tie-breaker.

Output:

(1, 2, write spec 3)
(1, 3, write spec 2)
(1, 4, write spec 1)
(3, 5, create tests)

see MartijnPieters answer @here for a nicer unique 2nd element.

Heffernan answered 3/1, 2019 at 18:45 Comment(6)
Why are you delving into heappush when this question is specifically about using queue.PriorityQueue?Brno
@NickChapman I what? heappush? where? :o) thanks for hinting my blunderHeffernan
Thanks a lot, I find this solution much more elegant and optimized. A Counter object actually allows to reduce the boiler-plate code even more. That proves: I don't read the documentation right!Weeds
@Weeds well, that documentation is part of the heapq module. It so happens that queue.PriorityQueue is implemented on top op heapq, but it takes some thorough reading to get from the queue documentation to that advice in the heapq module docs.Callum
@MartijnPieters heapq is referenced on top of the site that he quoted from: With a priority queue, the entries are kept sorted (using the heapq module) and the lowest valued entry is retrieved first thats why I landed thereHeffernan
@PatrickArtner yes, I didn’t say otherwise! But to expect someone looking at the queue module docs to read the heapq module documentation in enough detail to find that section is quite a big ask.Callum
A
0

Let's assume that we don't want to write a decorator with equivalent functionality to dataclass. The problem is that we don't want to have to define all of the comparison operators in order to make our custom class comparable based on priority. The @functools.total_ordering decorator can help. Excerpt:

Given a class defining one or more rich comparison ordering methods, this class decorator supplies the rest. This simplifies the effort involved in specifying all of the possible rich comparison operations:

The class must define one of __lt__(), __le__(), __gt__(), or __ge__(). In addition, the class should supply an __eq__() method.

Using the provided example:

from functools import total_ordering

@total_ordering
class PrioritizedItem:
    # ...

    def __eq__(self, other):
        return self.priority == other.priority

    def __lt__(self, other):
        return self.priority < other.priority
Abrahan answered 3/1, 2019 at 18:49 Comment(7)
you only need __lt__ in order to sort so you don't even need to worry about generating the rest of the rich comparison operators.Brno
@NickChapman: list.sort promises to only use <, but the fact that heapq or queue.PriorityQueue only use < is an implementation detail. Also, relying on <-only comparisons is a recipe for weird surprises anyway. It's best to provide all comparison operations.Exorcist
Also, you should really return NotImplemented when other is of an unrecognized type instead of just unconditionally accessing other.priority.Exorcist
@Exorcist can you explain why it should be that way? Also wouldn't a TypeError be more appropriate?Abrahan
@ChrisHunt: The TypeError isn't __lt__'s responsibility. Comparison methods are supposed to return NotImplemented if they don't know how to perform a comparison, to give the other side a chance to implement it. If both sides return NotImplemented, the comparison machinery raises a TypeError (or returns False/True for ==/!=).Exorcist
@user2357112, thanks. The text here coupled with the description of NotImplemented helped me understand. Is the convention to use duck-typing and catch AttributeError on accessing the relevant properties on other or explicitly check the type as in the arithmetic operations and @MartijnPieters answer?Abrahan
@ChrisHunt: Explicit type checking. In this kind of situation, you usually care about the actual type, not what attributes the argument has.Exorcist
B
0

All you need is a wrapper class that implements __lt__ in order for PriorityQueue to work correctly. This is noted here:

The sort routines are guaranteed to use __lt__() when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an __lt__() method

It's as simple as something like this

class PriorityElem:
    def __init__(self, elem_to_wrap):
        self.wrapped_elem = elem_to_wrap

    def __lt__(self, other):
        return self.wrapped_elem.priority < other.wrapped_elem.priority

If your elements do not have priorities then it's as simple as:

class PriorityElem:
    def __init__(self, elem_to_wrap, priority):
        self.wrapped_elem = elem_to_wrap
        self.priority = other.priority

    def __lt__(self, other):
        return self.priority <  other.priority

Now you can use PriorityQueue like so

queue = PriorityQueue()
queue.put(PriorityElem(my_custom_class1, 10))
queue.put(PriorityElem(my_custom_class2, 10))
queue.put(PriorityElem(my_custom_class3, 30))

first_returned_elem = queue.get()
# first_returned_elem is PriorityElem(my_custom_class1, 10)
second_returned_elem = queue.get()
# second_returned_elem is PriorityElem(my_custom_class2, 10)
third_returned_elem = queue.get()
# third_returned_elem is PriorityElem(my_custom_class3, 30)

Getting at your original elements in that case would be as simple as

elem = queue.get().wrapped_elem

Since you don't care about sort stability that's all you need.

Edit: As noted in the comments and confirmed here, heappush is not stable:

unlike sorted(), this implementation is not stable.

Brno answered 3/1, 2019 at 18:55 Comment(8)
list.sort and sorted may be stable, but queue.PriorityQueue is based on a heap, and the heap implementation isn't stable.Exorcist
The current heapq implementation indeed only uses < to make comparisons, but this is an implementation detail. I'd not rely on such details and instead provide more comparison methods.Callum
@MartijnPieters if it is an implementation detail then we should create an issue against the documentation excerpted in the post which guarantees that it will only use __lt__().Abrahan
@ChrisHunt: a priority queue doesn't sort. It is built using the heapq module, it's a binary tree stored in a list.Callum
Also see Is it safe to just implement __lt__ for a class that will be sorted?, where Raymond Hettinger points out you should not really only on __lt__. I'll look at why the HOWTO still claims __lt__ is enough, because no other Python documentation page makes the same claim.Callum
@MartijnPieters my mistake, I thought the linked documentation was for PriorityQueue. The answer you linked makes a pretty strong argument for total_ordering over just defining __lt__.Abrahan
Thanks, this does work and is easy to read. A three-tuple with a counter in the middle might actually be more optimized, it would avoid having to create instances just to feed the priority queue, but that's more of a detail.Weeds
I’ve filed a documentation issue to see if that claim can be removed. It has no provenance that I can find. See bugs.python.org/issue35654Callum
H
0

another interesting solution in building off Martin's answer is to create a counter and use walrus operator to increment unique tie breakers within the argument.

counter = 0
q.put((priority, counter:= counter+1, item))
Hookah answered 12/4, 2023 at 5:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.