Python: Priority queue does'nt keep order on same priority elements
Asked Answered
C

4

9

I'm using python's Queue.PriorityQueue, and ran into the following problem: when inserting several elements to the queue which have the same priority, I would expect the queue to serve them in the order of insertion (FIFO). For some reason this is not the case:

>>> from Queue import PriorityQueue
>>>
>>> j1 = (1, 'job1')
>>> j2 = (1, 'job2')
>>> j3 = (1, 'job3')
>>> j4 = (1, 'job4')
>>> 
>>> q = PriorityQueue()
>>> q.put(j1)
>>> q.put(j2)
>>> q.put(j3)
>>> q.put(j4)
>>> q.queue
[(1, 'job1'), (1, 'job2'), (1, 'job3'), (1, 'job4')]
>>> q.get()
(1, 'job1')
>>> q.queue
[(1, 'job2'), (1, 'job4'), (1, 'job3')]

As can be seen from the example, the order has been mixed after one get(). What's the reason? how to overcome (keep the order of same prio elements)?

EDIT:

I was asked to add an example that shows that q.get() actually mess things up with the FIFO ordering, so here's an elaborate example:

class Job(object):
    def __init__(self, type_, **data):
        self.type_ = type_
        self.priority = 0 if self.type_ == 'QUIT' else 1
        self.data = data

    def __cmp__(self, other):
        return cmp(self.priority, other.priority)

    def __repr__(self):
        return 'Job("' + self.type_ + '", data=' + repr(self.data) + ')' 

q = PriorityQueue()
q.put(Job('Build'))
q.put(Job('Clean'))
q.put(Job('QUIT'))
q.put(Job('Create'))
q.put(Job('Build'))
q.put(Job('Clean'))

Now I'll dequeue the elements one by one. The expected result: QUIT goes out first, and then the rest, FIFO ordered: Build, Clean, Create, Build, Clean:

>>> q.get()
Job("QUIT", data={})
>>> q.get()
Job("Build", data={})
>>> q.get()
Job("Clean", data={})
>>> q.get()
Job("Build", data={}) # <<---
>>> q.get()
Job("Clean", data={})
>>> q.get()
Job("Create", data={})
Cretaceous answered 25/12, 2017 at 14:45 Comment(5)
Why would you expect insertion order?Grounds
While not exactly the same data structure, an OrderedDict does remember insertion order.Eugeneeugenia
@BradSolomon Why the python-2.7 tag? It happens in Python 3 as well.Grounds
@StefanPochmann because PriorityQueue is subclass of Queue, so I assume FIFO, unless priority winsCretaceous
@OmerDagan Ok, that's reasonable I guess. (Btw note the second paragraph I added to my answer after you accepted it.)Grounds
G
8

Priority queues "are often implemented with heaps" and Python is no exception. As the documentation says, it's "using the heapq module". And heaps don't naturally offer stability. That's also why heapsort "is not a stable sort". If you want stability, you'll need to enforce it yourself. Fortunately it's as simple as storing entries "as 3-element list including the priority, an entry count, and the task".

Note that you give Python's priority queue pairs of priority and task, but the queue doesn't care. It doesn't think of the two values as priority and task. It just thinks of the pair as one "item" and it never even looks into it. Only we the users think of the pair as priority and task. So you could also give it task strings alone, without extra priorities. The queue wouldn't even notice. It doesn't try to extract some priority. For its prioritization it just asks the whole item whether it's smaller than another. That's why, when you want to prioritize tasks not just by their natural order (e.g., the string 'job1' being smaller than the string 'job2'), you use a tuple of priority and task. Tuples are ordered lexicographically, so (a, b) is smaller than (c, d) if a is smaller than c or if they're equal and b is smaller than d. So when the queue asks such a tuple whether it's smaller than another, it's the tuple that looks into itself and considers the priority first and then potentially the task second.

Also, with q.queue you're inspecting the queue's underlying data structure. You shouldn't care about that. Not sure why it's even accessible. But if you do inspect it, you need to look at it as the heap it is, not think of it as a sorted list. It's not that "the order has been mixed" as you put it, it's that you misinterpreted that list. Anyway... the order you should instead care about is the order you actually get. With q.get(). If you just get all four items of that example with q.get(), you'll see that it does give them to you in your insertion order. Although that's because you're inserting them in sorted order and they only have one possible order, as there are no equal items. You'll get (1, 'job1') first not because it was inserted first but because it's the smallest of the four tuples (because the priorities are the same and 'job1' is the smallest of the four strings). And you'll get (1, 'job2') second not because it was inserted second but because it's the second-smallest item. And so on. If you inserted them in any other order, you'd still get them in order (1, 'job1'), (1, 'job2'), (1, 'job3'), (1, 'job4').

About your added example: Your Job objects only compare themselves by their priority. And those Build, Clean, Create, Build and Clean objects all have the same priority. So as far as the queue can tell, they're all equal! That's not like your first example, where your four tuples only allow one possible order. So we're back at what I said at the start, heaps don't naturally offer stability and if you want stability, you should add an entry count. Check out the explanation and recipe I linked there. It uses a list as heap and uses heapq functions, but you can easily adapt it to use a PriorityQueue instead. Though instead of those separate top-level helper functions, maybe better define your own StablePriorityQueue class, as subclass or wrapper of PriorityQueue.

Grounds answered 25/12, 2017 at 15:10 Comment(4)
regarding your second paragraph- it doesn't matter whether I use q.queue or q.get(), it does not preserve ordering within the same prioCretaceous
@OmerDagan You're wrong. Not sure how you're wrong, though. I'd need to see what you're doing now. Can you show it? Here's demonstration that it does what I say: ideone.com/jXj737 Check the code and output there.Grounds
Added an elaborate example showing the problem (using q.get() rather than q.queue)Cretaceous
@OmerDagan Well you didn't just use q.get(). You also completely changed the example. I was talking about your original example. In your new example, your order is vulnerable. Because unlike in your original example, there are "equal" items. So you're back to my first paragraph, heaps don't naturally offer stability and you should add an entry count. Have a look again, I just rewrote pretty much everything except the first paragraph.Grounds
A
3

As explained here, the Python PriorityQueue is implemented with a binary heap.

A binary heap is a binary tree where each node's value is equal or greater the values of both its children. Hence in a binary heap the root always contains the minimum value. Once you remove the minimum node, the heap is reorganized so that the basic heap property is still in effect.

A heap is usually implemented using an array, where a[k] is the parent of a[2*k] and a[2*k+1]. In Python, q.queue is this array. After you remove an element from the heap, the array is reordered in a way that doesn't preserve the original order.

Apollus answered 25/12, 2017 at 15:11 Comment(0)
L
2

The other 2 answers explained what happens.

Although I want to offer you another representation that will help you have a better understanding.

I took a snapshot from this documentation page about heapq. First of all, you can see that PriorityQueue uses a heappop here

Now, to the image.

heappop

In this image, when you pop the first item 0 (job1), 1 ('job2') will take its place, and then, 3 (job4) will take 1 (job2) place. We should conclude by saying this is a normal behaviour.

Latreshia answered 25/12, 2017 at 15:21 Comment(0)
K
0

Here's actual implementable code to make PriorityQueue FIFO. I adapted from momo's original answer to a different question here:

from dataclasses import dataclass, field
from typing import Any, ClassVar

@dataclass(order=True)
class FifoPriorityQueueItem:
    data: Any=field(default=None, compare=False)
    priority: int=10
    sequence: int=field(default_factory=lambda: {0})
    counter: ClassVar[int] = 0

    def get_data(self):
        return self.data

    def __post_init__(self):
        self.sequence = FifoPriorityQueueItem.next_seq()

    @staticmethod
    def next_seq():
        FifoPriorityQueueItem.counter += 1
        return FifoPriorityQueueItem.counter

def main():
    import asyncio
    print('with FifoPriorityQueueItem is FIFO')
    q = asyncio.PriorityQueue()
    q.put_nowait(FifoPriorityQueueItem('z'))
    q.put_nowait(FifoPriorityQueueItem('y'))
    q.put_nowait(FifoPriorityQueueItem('b', priority=1))
    q.put_nowait(FifoPriorityQueueItem('x'))
    q.put_nowait(FifoPriorityQueueItem('a', priority=1))
    while not q.empty():
        print(q.get_nowait().get_data())

    print('without FifoPriorityQueueItem is no longer FIFO')
    q.put_nowait((10, 'z'))
    q.put_nowait((10, 'y'))
    q.put_nowait((1, 'b'))
    q.put_nowait((10, 'x'))
    q.put_nowait((1, 'a'))
    while not q.empty():
        print(q.get_nowait()[1])

if __name__ == '__main__':
    main()
Keitloa answered 5/10, 2022 at 1:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.