How to rebuild the indices of a FIFO PriorityQueue memory-efficiently
Working with the PriorityQueue<TElement, TPriority> collection, I frequently need to preserve the insertion order of elements that have the same priority (FIFO order), and AFAIK the only way to do it is by coupling the TPriority with an extra incremented int index: PriorityQueue<TElement, (TPriority, int)>. This works well, but it makes me uncomfortable knowing that if the queue is used for an extensive amount of time, the int indices will eventually wrap around the Int32.MaxValue limit, breaking the FIFO functionality. I could patch this problem by switching from int to long, making it practically impossible for the index to wrap, but that still feels dirty. I wonder if it's possible to do better, by rebuilding the indices of the TElement+TPriority pairs when the index is about to wrap on the next Enqueue operation. So I wrote this extension method for indexed FIFO priority queues:

public static void Reindex<TElement, TPriority>(
    this PriorityQueue<TElement, (TPriority, int)> source)
{
    ArgumentNullException.ThrowIfNull(source);
    (TElement, (TPriority, int))[] entries = source.UnorderedItems
        .OrderBy(e => e.Priority, source.Comparer)
        .ToArray();
    source.Clear();
    for (int i = 0; i < entries.Length; i++)
        source.Enqueue(entries[i].Item1, (entries[i].Item2.Item1, i));
}
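For context, the FIFO tie-breaking scheme described above can be boiled down to a minimal self-contained demo (the element values and the nextIndex/Enqueue names are illustrative, not part of any API):

```csharp
using System;
using System.Collections.Generic;

// Tie-break equal priorities with an ever-incrementing index: the comparer
// orders first by priority, then by insertion index.
var queue = new PriorityQueue<string, (int, int)>(Comparer<(int, int)>.Create((x, y) =>
{
    int r = x.Item1.CompareTo(y.Item1);
    return r == 0 ? x.Item2.CompareTo(y.Item2) : r;
}));

int nextIndex = 0;
void Enqueue(string element, int priority) => queue.Enqueue(element, (priority, nextIndex++));

Enqueue("first", 1);
Enqueue("second", 1); // same priority, enqueued later
Enqueue("urgent", 0);

Console.WriteLine(queue.Dequeue()); // urgent — lowest priority value wins
Console.WriteLine(queue.Dequeue()); // first  — FIFO among equal priorities
Console.WriteLine(queue.Dequeue()); // second
```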

My problem with this implementation is that it allocates a disproportionate amount of memory during the re-indexing. For example, 61,456 bytes are allocated for rebuilding the indices of a PriorityQueue<object, (char, int)> with 1,000 elements:

PriorityQueue<object, (char, int)> queue = new(Comparer<(char, int)>.Create((x, y) =>
{
    int result = x.Item1.CompareTo(y.Item1);
    if (result == 0) result = x.Item2.CompareTo(y.Item2);
    return result;
}));

Random random = new(0);
for (int i = 0; i < 100_000; i++)
{
    char key = (char)random.Next(65, 91);
    queue.Enqueue(new object(), (key, i));
}
while (queue.Count > 1_000) queue.Dequeue();

long mem0 = GC.GetTotalAllocatedBytes(true);
queue.Reindex();
long mem1 = GC.GetTotalAllocatedBytes(true);
Console.WriteLine($"Count: {queue.Count:#,0}, Allocated: {mem1 - mem0:#,0} bytes");

Output:

Count: 1,000, Allocated: 61,456 bytes


I would like to ask if it's possible to rebuild the indices with zero allocations (in-place), or at least with no more than System.Runtime.CompilerServices.Unsafe.SizeOf<(TElement, TPriority)>() * queue.Count allocated bytes (16,000 bytes in the above example).

Carousel answered 5/4, 2023 at 14:38 Comment(5)
Most implementations do just use the ever-incrementing "timestamp" as part of the priority calculation, as per your code. You might find this discussion interesting. – Prue
@MatthewWatson thanks for the interesting link. Regarding the statement that most implementations just use an ever-incrementing timestamp, I would say that, assuming it's true, most implementations are broken! – Carousel
You mean broken because the timestamp will wrap around eventually? Yeah, I guess they use long like you did. – Prue
That sounds like a pretty strict definition of "broken". There are ~3x as many positive long values as it would take to represent every "Tick" from now until the year 9999, and on my machine it consistently takes at least a tick to put an item into a PriorityQueue. I'd say you're fine just using a long, and if that "feels dirty" then you can detect an overflow and only re-index when that (never) happens. – Heman
@MatthewWatson assuming an extreme frequency of 1,000,000,000 Enqueue operations per second, it would take almost 300 years to wrap a long value, which makes it practically safe. But in order to get this statistical safety you have to increase the per-element state by 4 bytes, which is not ideal. And this question transcends practicality. It's about ideals! :-) – Carousel
Based on my tests, the biggest memory consumer was the LINQ ordering (with or without materialization), so just copying the values into an array and sorting it in place with Array.Sort reduced the memory consumption drastically:

public static void ReindexNew<TElement, TPriority>(this PriorityQueue<TElement, (TPriority, int)> source)
{
    ArgumentNullException.ThrowIfNull(source);
    (TElement, (TPriority, int))[] entries = new (TElement, (TPriority, int))[source.UnorderedItems.Count];
    int counter = 0;
    foreach (var item in source.UnorderedItems)
    {
        entries[counter++] = item;
    }
    // sort by the stored (priority, index) pair; ties on priority are
    // broken by the old index, so FIFO order is preserved
    Array.Sort(entries, (left, right) => source.Comparer.Compare(left.Item2, right.Item2));
    source.Clear();
    for (int i = 0; i < entries.Length; i++)
    {
        entries[i].Item2.Item2 = i;
    }
    source.EnqueueRange(entries);

    // or something like this, requires time benchmarking 
    // for (int i = 0; i < entries.Length; i++)
    // {
    //     source.Enqueue(entries[i].Item1, (entries[i].Item2.Item1, i));
    // }
}

And "tests":

void Test()
{
    long mem0 = GC.GetTotalAllocatedBytes(true);
    queue.ReindexNew();
    long mem1 = GC.GetTotalAllocatedBytes(true);
    Console.WriteLine($"Count: {queue.Count:#,0}, Allocated: {mem1 - mem0:#,0} bytes");
}

Test(); // Count: 1,000, Allocated: 16,136 bytes
Test(); // Count: 1,000, Allocated: 16,112 bytes
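As a further (hypothetical, unbenchmarked) refinement of the same idea, the temporary buffer could be rented from System.Buffers.ArrayPool<T>, so that steady-state allocations approach zero once the pool retains the array between calls; ReindexPooled is an illustrative name, not part of any API:

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

// Sketch: same copy-sort-rebuild approach as above, but with a pooled buffer.
static void ReindexPooled<TElement, TPriority>(
    PriorityQueue<TElement, (TPriority, int)> source)
{
    ArgumentNullException.ThrowIfNull(source);
    int count = source.Count;
    var pool = ArrayPool<(TElement, (TPriority, int))>.Shared;
    var entries = pool.Rent(count); // may return a larger array than requested
    try
    {
        int i = 0;
        foreach (var item in source.UnorderedItems) entries[i++] = item;
        // Sort only the filled prefix; ties on priority are broken by the
        // old index, so FIFO order survives the rebuild.
        Array.Sort(entries, 0, count,
            Comparer<(TElement, (TPriority, int))>.Create(
                (a, b) => source.Comparer.Compare(a.Item2, b.Item2)));
        source.Clear();
        for (i = 0; i < count; i++)
            source.Enqueue(entries[i].Item1, (entries[i].Item2.Item1, i));
    }
    finally
    {
        pool.Return(entries, clearArray: true); // avoid rooting TElement references
    }
}

var comparer = Comparer<(char, int)>.Create((x, y) =>
{
    int r = x.Item1.CompareTo(y.Item1);
    return r == 0 ? x.Item2.CompareTo(y.Item2) : r;
});
var queue = new PriorityQueue<string, (char, int)>(comparer);
queue.Enqueue("a", ('A', 100));
queue.Enqueue("b", ('A', 200)); // same priority, later index
ReindexPooled(queue);
Console.WriteLine(queue.Dequeue()); // "a" — FIFO preserved, indices rebuilt from 0
```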
Subtotal answered 5/4, 2023 at 17:31 Comment(3)
This is also a great answer! I think you could optimize the line entries[i] = (entries[i].Item1, (entries[i].Item2.Item1, i)); with entries[i].Item2.Item2 = i;. Value-tuples are mutable. – Carousel
@TheodorZoulias Updated! Was thinking about that but originally went with the "safe" approach and then forgot to check it out. – Subtotal
I am accepting this answer, although Jeb's answer also meets the requirements, because this one is more performant. – Carousel
I was just playing around with this puzzle. I'm not an expert; actually, I've never used a PriorityQueue before. But using TryDequeue would reduce the allocations to somewhere around 16K bytes

Count: 1.000, Allocated: 16.152 bytes

because it allocates only one item at a time. I can't say anything about the performance, though.

public static void Reindex2<TElement, TPriority>(
    this PriorityQueue<TElement, (TPriority, int)> source)
{
    ArgumentNullException.ThrowIfNull(source);

    int count = source.Count;
    var orderedItems = new (TElement, (TPriority, int))[count];

    int i = 0;
    while (source.TryDequeue(out var item, out var priority))
    {
        orderedItems[i] = (item, (priority.Item1, priority.Item2));
        i++;
    }

    Array.Sort(orderedItems, (a, b)
        => source.Comparer.Compare(a.Item2, b.Item2));

    for (i = 0; i < count; i++)
    {
        source.Enqueue(orderedItems[i].Item1, (orderedItems[i].Item2.Item1, i));
    }
}
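One possible simplification of the approach above, assuming PriorityQueue's documented dequeue behavior: TryDequeue yields entries in full comparer order (priority first, with the old index as tie-breaker), so the drained array is already sorted and the Array.Sort call could in principle be dropped. A hedged sketch (Reindex3 is an illustrative name, and this is not benchmarked):

```csharp
using System;
using System.Collections.Generic;

// Sketch: drain the queue (which yields fully ordered entries), then
// re-enqueue with compacted indices 0..Count-1. No sort needed.
static void Reindex3<TElement, TPriority>(
    PriorityQueue<TElement, (TPriority, int)> source)
{
    ArgumentNullException.ThrowIfNull(source);
    var ordered = new (TElement Element, TPriority Priority)[source.Count];
    int i = 0;
    while (source.TryDequeue(out var element, out var priority))
        ordered[i++] = (element, priority.Item1); // old index no longer needed
    for (i = 0; i < ordered.Length; i++)
        source.Enqueue(ordered[i].Element, (ordered[i].Priority, i));
}

var comparer = Comparer<(char, int)>.Create((x, y) =>
{
    int r = x.Item1.CompareTo(y.Item1);
    return r == 0 ? x.Item2.CompareTo(y.Item2) : r;
});
var queue = new PriorityQueue<string, (char, int)>(comparer);
queue.Enqueue("slow", ('B', 300));
queue.Enqueue("fast1", ('A', 400));
queue.Enqueue("fast2", ('A', 600));
Reindex3(queue);
queue.TryPeek(out var head, out var pri);
Console.WriteLine($"{head} {pri}"); // fast1 (A, 0) — indices compacted, order kept
```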
Trustful answered 5/4, 2023 at 15:42 Comment(1)
Thanks! This looks like a pretty good solution to this problem! – Carousel
