Collection that maintains sort order C#
Asked Answered
C

3

13

I have a class Foo which contains a list of objects: List<Bar>. Each Bar has a property which they can be ordered on (of type TimeSpan, representing a duration), and Bar is an immutable object - that is, the duration does not change over the running of the algorithm. At the moment, for each Foo I also maintain the Bar that would be first in the list if it were to be ordered (i.e. the Bar of shortest duration). Something like this:

public class Foo
{
    public List<Bar> AllBars { get; set; }

    public Bar FirstBar { get; set; }

    public Foo (Bar bar)
    {
        FirstBar = bar;

        AllBars = new List<Bar>() { bar };
    }

    public AddBar(Bar bar)
    {
        if(bar.Duration < FirstBar.Duration)
        {
            FirstBar = bar;
        }

        AllBars.Add(bar);
    }
}

This class Foo is used in an algorithm where processing performance (speed) is critical. Memory is important but not as much as speed. There is a list of n Foos, each of which has up to m Bars. This class has served me well up until this point. I now wish to offer the user several choices, meaning I will need to provide random access to the first few Bars in the list.

I would thus like to store my Bars in order so that I can access them by index in order. In my Bar class I implemented IComparable to allow Bars to be compared on duration but I am stuck at choosing an appropriate data type. I looked at System.Collections.SortedList but (unless I am wrong) this appears to reference elements by key as it implements IDictionary. What collection could I use that would maintain my objects such that they stay sorted, and such that they are traversable in order of index?

Calvaria answered 23/7, 2015 at 13:37 Comment(33)
Can't you just sort a regular List<T> using its Sort method? This will need calling after every insert, but allows you to also suppress sorting if you know you're about to add a batch of items. You could decorate List<T> with your own implementation that does the sorting call for you so from the outside you're still using an IList<T>.Basidiospore
Try SortedSet but note that it does not allow duplicates.Calabash
@AdamHouldsworth dont think it goes along with 'performance'Anastomose
@AlexanderKozlov So far as I can tell from the provided code, that was the current solution.Basidiospore
I would go with SortedList. SortedSet doesnt allow you to get element by indexAnastomose
Why not use SortedList and use Duration as the key rather then implementing a comparator?Calabash
@DStanley That would assume knowing the durations by which to index up front, whereas it seems like a regular array index is desired, just in duration order. Seems neither SortedList nor SortedSet achieve that. I don't yet see why sorting a regular list isn't working, or where the performance degradation comes in.Basidiospore
Do you need them to be accessed randomly by index or just in order? In other words - would you need to get AllBars[2] or do you just need to maintain order when iterating with foreach?Calabash
Using a SortedList would perform extremely poorly. It would make every addition to the list quite expensive.Mulholland
@AdamHouldsworth Plain List wasnt intended to store elements in a sorted order. So each time you call .Sort() it sorts whole array. But Sorted* classes maintain elements in a sorted way resulting in faster insertionsAnastomose
I would personally just order it at the point of use in the algorithm. Maintain a flag to state if it needs ordering based on whether it had additions since the last sort, and then make it the caller's responsibility to sort prior to running the algorithm. At the end of the day, the implementation of the sort is specific to this algorithm so I see no issue moving responsibility across to it.Basidiospore
@AlexanderKozlov I know, but I want to understand why something as simple as var sorted = list.OrderBy(_ => _).ToArray() can't be used prior to this algorithm. We can't suggest an implementation without knowing how many insertions occur, how big the list gets, how often this algorithm is called between sorts or w/e.Basidiospore
Can Duration change over the life of an object? If so then re-sorting periodically is the only option (either explicitly or implicitly by the collection).Calabash
@AdamHouldsworth Got your point. Of course if we need to sort all elements once prior to an algorithm, List would be just fineAnastomose
If you want the best X items and you never delete Items, you could maintain a separate sorted collection BestTimes of (TimeSpan, Item). When you insert an item, update the simple List of Item, and then update BestTimes as follows: check if the toBeInserted Item's TimeSpan is better than the worst (TimeSpan, Item) and if so, replace the worst and bubble the the new (TimeSpan, Item) up to its appropriate place.Cattegat
@AdamHouldsworth I could but that would devastate my currently O(n) algorithm, which only ever inserts one Bar at a time. The provided code only compares the inserted item once, to see if FirstBar needs updating. Re: ordering it at point of use - I should have pointed out there are m objects of type Foo.Calvaria
@DStanley I need to access randomly by index. Each Foo (of about 100 000) ends up having 50 to 100 Bars. I will use the first 10 or so that need to be accessed randomly (they are options for the user). And no, Bars are immutable - duration never changes. I will update the question with this information.Calvaria
This is one of those things where you have to decided what the best compromise is. If you want a sorted list, with indexed access and the ability to insert (and maybe remove) items while maintaining the sorted order, then at least one of those operations is going to end up being slow in order to have the others be fast. You need to figure out what the usage is going to look like. Are you going to be inserting a lot? Reading a lot? Reading at random or in order? How big is the list? Etc.Libertinism
@Ivan Have you ever encountered the Wintellect Power Collections?Basidiospore
Is Durantion (timspan) value unique in the collection?Pushing
@Blam No, it is not, however if durations are equal it doesn't matter which is first.Calvaria
@AdamHouldsworth No, I noticed you mentioned it in your answer, thanks I will take a look!Calvaria
If you can live with having "values" that mean nothing, just use a SortedList<Bar, object> where you do not use the value part. Add with yourSortedList.Add(yourBar, null) in O(n) time (the list will have to move "up" all entries after the point where you insert). Retrieve the ith entry in O(1) time with yourSortedList.Keys[i]. EDIT: You will not be able to have duplicates in your SortedList<,>, so two members can not CompareTo each other with return value zero.Thirty
What is "traversable in order of index". Is that index the Duration order or the order of the insert?Pushing
Do Bars get removed?Eustis
What's your ratio of write operations that change the collection to read operations that just examine it? If that ratio is low then just doing a BinarySearch on inserts (and if adding a large number in one go, a plain AddRange followed by a Sort) would be the way to go as while it's expensive, the read-only accesses are as fast as with a list.Chucklehead
@JeppeStigNielsen That sounds like a good approach and should be an answer, not a comment. There are possibly some questions and comments I'd like to make on this approach without getting lost in the above extended discussion.Calvaria
@Blam Sorry, I should be more clear. I want the Bar items in order of duration and accessed in order of duration. Insertion order is unimportant.Calvaria
@Eustis No, Bars never get removed, and they are also immutable.Calvaria
Then my answer worksPushing
@Blam I think the problem with your answer is it didn't maintain the insertion order it's my understanding the OP wants insertion order AND sort order. Your answer ensures they are the same thing, but in doing so, losses the original insertion orderEustis
@Eustis OP clearly states differently?Pushing
@Blam Oh, I read "by index in order" as "by index and by order" the first time. Guess it's time to delete my answer :) . I agree yours worksEustis
T
3

(promoted from a comment, as requested by the asker)

If you can live with having "values" that mean nothing, just use a SortedList<Bar, object> where you do not use the value part.

Add with yourSortedList.Add(yourBar, null) in O(n) time (the list will have to move "up" all entries after the point where you insert). Retrieve the ith entry in O(1) time with yourSortedList.Keys[i].

See the SortedList<,>.Keys property documentation for some "proof" that the above description is correct. Note that a SortedList<,> actually consists of a "list" (i.e. an array of length Capacity, subject to substitution by a larger array when necessary). This is different from SortedDictionary<,> which I believe is a binary search tree.

Note however: You will not be able to have duplicates in your SortedList<,>, so two members in the list are not allowed to CompareTo each other with return value zero.

Thirty answered 24/7, 2015 at 7:34 Comment(0)
L
6

I prefer to use SortedSet<T>, which is a binary tree where the key and value are the same object. This once again means that adding/removing/lookups are logarithmic - O(log n) - but you gain the ability to iterate over the items in order. For this collection to be effective, type T must implement IComparable<T> or you need to supply an external IComparer<T>.

Libeler answered 23/7, 2015 at 14:7 Comment(8)
How do you plan on getting the nth item out of a SortedSet?Sepulcher
You can iterate over in its order. This is probably what the requester need. Now to make it more harder, nth element can't be picked up due to its backing store. However, you can use ( i haven't done any performance testing on this) Enumerable.ElementAt<TSource>().Libeler
I cannot see how GetHashCode() is relevant for a SortedSet<>. It uses only the int values returned by CompareTo (or more precisely the Compare method of the comparer).Thirty
@JeppeStigNielsen, The backing store for SortedSet<T> SortedDictionary<TKey,TValue> is a binary tree where the key and value are the same object. Correct me if I am wrong.Libeler
The binary tree there will not use GetHashCode()?Thirty
@Nair So thus retrieval complexity is O(n)?Anastomose
@AlexanderKozlov, If you want to look at nth element then it is O(n). Otherwise it is O(lon n), due to unchanged binary tree. Did I answer your question?Libeler
@AlexanderKozlov SortedDictionary retrieval is O(log n), #936121Eustis
T
3

(promoted from a comment, as requested by the asker)

If you can live with having "values" that mean nothing, just use a SortedList<Bar, object> where you do not use the value part.

Add with yourSortedList.Add(yourBar, null) in O(n) time (the list will have to move "up" all entries after the point where you insert). Retrieve the ith entry in O(1) time with yourSortedList.Keys[i].

See the SortedList<,>.Keys property documentation for some "proof" that the above description is correct. Note that a SortedList<,> actually consists of a "list" (i.e. an array of length Capacity, subject to substitution by a larger array when necessary). This is different from SortedDictionary<,> which I believe is a binary search tree.

Note however: You will not be able to have duplicates in your SortedList<,>, so two members in the list are not allowed to CompareTo each other with return value zero.

Thirty answered 24/7, 2015 at 7:34 Comment(0)
P
-1

Why not just use List.Insert ?
Insert is O(n) and lets you insert at a specific index.
n + n is still O(n)

public AddBar(Bar bar)
{
    int index = 0;    
    foreach (bar b in AllBar)
    {
       if(bar.Duration < b.Duration)
         break;
       index++;
    }
    AllBars.Insert(index, bar);
}

So you have sort and O(1) index
At a cost of O(n) Add
The current Add in also O(n)

A SortedList in NlogN and then you don't have index as the key is Duration and the key in not unique

A SortedSet insert is LogN but a ToList is O(n) so you are still O(n)

Calling the the Sort method on the list is NlogN

This answers the stated question of: What collection could I use that would maintain my objects such that they stay sorted, and such that they are traversable in order of index?

I don't think you are going to do that with better than O(n) Add.

Pushing answered 23/7, 2015 at 16:48 Comment(2)
Every insert will be O(N)Sorrell
@LasseV.Karlsen Yes I very clearly stated insert is O(n). List.Add is O(n). Did you vote me down? It is sorted with an O(1) index access. Do you have a better solution to the stated problem?Pushing

© 2022 - 2024 — McMap. All rights reserved.