Why are there no sorted containers in Python's standard libraries?
Asked Answered
K

7

114

Is there a Python design decision (PEP) that precludes a sorted container from being added to Python?

(OrderedDict is not a sorted container since it is ordered by insertion order.)

Knorr answered 10/5, 2011 at 16:24 Comment(8)
like collections.OrderedDict?Fascinating
It's just faster. O(1) for hashmap vs O(log n) for ordered set.Amour
@utdmr: OrderedDict is sorted by insertion order — not by an arbitrary key, like a sorted container.Knorr
@NeilG so you mean, it's actually unsorted, because "insertion order", seems to me, is the only order by which elements may end up in a container.Backspin
@Backspin No, that's not what sorted container means. E.g.Knorr
@NeilG I'm unsure what you wanted to say. The post you refer to tangentially explains that a sorted container is one that sorts elements upon insertion. My point is, if you insert an element, and then don't do the sorting, you end up with "insertion order" in the container. And since I don't know other way for elements to end up in a container, it's sort of "natural order". You don't need to sort the container for elements to be that way.Backspin
"sorted container is one that sorts elements upon insertion". Not exactly: I would say that a sorted container is a container whose interface has efficient sorted (according to an arbitrary key) iteration and search. Your misunderstanding stems from your unusual definition.Knorr
It seems now sortedcontainers is installed by default from Python 3.7.5 You can used it directly in coderpad.io and leetcode, without installation. from sortedcontainers import SortedList sl = SortedList(['e', 'a', 'c', 'd', 'b']) print(sl)Epithelioma
B
96

It's a conscious design decision on Guido's part (he was even somewhat reluctant regarding the addition of the collections module). His goal is to preserve "one obvious way to do it" when it comes to the selection of data types for applications.

The basic concept is that if a user is sophisticated enough to realise that the builtin types aren't the right solution for their problem, then they're also up to the task of finding an appropriate third party library.

Given that list+sorting, list+heapq and list+bisect cover many of the use cases that would otherwise rely on inherently sorted data structures, and packages like blist exist, there isn't a huge drive to add more complexity in this space to the standard library.

In some ways, it is similar to the fact that there's no multi-dimensional array in the standard library, instead ceding that task to the NumPy folks.

Backsword answered 11/5, 2011 at 3:39 Comment(6)
Thanks, I was looking for the motivations behind this design decision. This is exactly the kind of answer I was looking for. My initial instinct wouldn't have been to do things this way, but the argument is very convincing.Knorr
collections.Counter can be used as sorted set. Though it may not be efficient.Camber
@coderek: collections.Counter is unsorted and not appropriate for representing a sorted set.Fantoccini
But shouldn't be at least built-in dictionary sorted? The dictionary have to be stored sorted in order to provide fast access to elements, this seems odd to me that when you iterate over it, you still somehow end up with unsorted items.Backspin
@Backspin dict is a hash table.Knorr
Note that as of Python 3.7.0, builtin dictionaries are guaranteed to be insertion order preserving (i.e. iteration and string conversion uses insertion ordering, rather than an arbitrary implementation dependent order).Backsword
A
117

There's also a python sortedcontainers module that implements sorted list, dict, and set types. It's very similar to blist but implemented in pure-Python and in most cases faster.

>>> from sortedcontainers import SortedSet
>>> ss = SortedSet([3, 7, 2, 2])
>>> ss
SortedSet([2, 3, 7])

It also has functionality uncommon to other packages:

>>> from sortedcontainers import SortedDict
>>> sd = SortedDict((num, num) for num in range(100000))
>>> sd.iloc[-5] # Lookup the fifth-to-last key.
99995

Disclosure: I am the author of the sortedcontainers module.

Aludel answered 21/3, 2014 at 19:4 Comment(19)
Nice! You might want to consider updating your documentation to specify that the underlying storage is a rope.Knorr
Also, I believe that blist is written in pure Python as well.Knorr
@NeilG Thanks! Couple notes: blist is not written in pure Python. The sorted set, list, and dict types are based on the blist type which is a B+-tree implemented in C. Also, the underlying structure isn't really a rope; it's more similar to a B+-tree but only has one level of nodes.Aludel
Did you click on the wikipedia article I linked? I think you're using a rope in the same way that a lot of string libraries in C++ do.Knorr
@NeilG Yes, I read that article but it doesn't work like that. It's simpler. I'm not sure how C++ libraries implement a rope but sortedcontainers can only have one level of nodes.Aludel
I see. You could have used multiple levels if you wanted to. It might actually be more efficient as the number of contiguous blocks becomes large.Knorr
It's actually a great example of how big-O can be misleading. It would probably slow around a trillion elements but most people haven't got a terabyte of memory to worry about that. I tested it into the billions of elements and it was as fast as C-implementations. It also uses much less memory by maintaining such a simple, list-based structure.Aludel
Yeah, absolutely. It's the same argument they use to justify using this kind of data structure for strings especially long strings that are used in an editor.Knorr
Anyway, thanks for writing this. I'll keep it in mind if I need this data structure.Knorr
Hi @GrantJ. How can I prevent creation of instances of each character in a string, when I add the string to a SortedSet, via update()? Have just encountered that unintuitive side effect! - Answer seems use add() instead!Marche
Also for @Aludel reddit.com/r/python3/comments/8n7vz2/…Marche
Worked it out in the end.. add takes value, update takes iterator, so when I add a set, I iterate over the set adding one at a time. Would be nice to have overloaded method, or massAdd(), something to that effect..Marche
And.. When you do the add(), be sure to create instance with SortedSet() first then do add(ValueToAdd), rather than SortedSet(ValueToAdd)...Marche
Also, do you still use Set from typings if you pass out a SortedSet - a la def func() -> Set[int]? Or are there specific typings for SortedSet too?Marche
@Marche You can initialize a SortedSet using an iterable: SortedSet(set('abracadabra')). I'm not sure the right answer about typing things.Aludel
While useful, this doesn't answer the question as it's not in the standard library.Merino
@Aludel is your library available as a one file - to include by simple copy-paste, not by a pip? - I would like to use it in an environment, where I can't use pip. I need it for programming contest where I'm allowed to only post one python file, containing both my code, and all available libraries included.Carnes
@Aludel It seems to me that graphs at grantjenks.com/docs/sortedcontainers/performance.html#id2 are wrong. It seems to me that all of them show linear complexity which is wrong in most cases. Eg SortedDict.__contains__() shows linear time complexity which is true for skip-list, but wrong for AVL-tree, B-tree, RB-tree - contains has O(log(N)) time.Carnes
@Carnes I replied to your questions in the duplicate issues opened on the tracker at github.com/grantjenks/python-sortedcontainers/issues . Please use the issue tracker for further conversation.Aludel
B
96

It's a conscious design decision on Guido's part (he was even somewhat reluctant regarding the addition of the collections module). His goal is to preserve "one obvious way to do it" when it comes to the selection of data types for applications.

The basic concept is that if a user is sophisticated enough to realise that the builtin types aren't the right solution for their problem, then they're also up to the task of finding an appropriate third party library.

Given that list+sorting, list+heapq and list+bisect cover many of the use cases that would otherwise rely on inherently sorted data structures, and packages like blist exist, there isn't a huge drive to add more complexity in this space to the standard library.

In some ways, it is similar to the fact that there's no multi-dimensional array in the standard library, instead ceding that task to the NumPy folks.

Backsword answered 11/5, 2011 at 3:39 Comment(6)
Thanks, I was looking for the motivations behind this design decision. This is exactly the kind of answer I was looking for. My initial instinct wouldn't have been to do things this way, but the argument is very convincing.Knorr
collections.Counter can be used as sorted set. Though it may not be efficient.Camber
@coderek: collections.Counter is unsorted and not appropriate for representing a sorted set.Fantoccini
But shouldn't be at least built-in dictionary sorted? The dictionary have to be stored sorted in order to provide fast access to elements, this seems odd to me that when you iterate over it, you still somehow end up with unsorted items.Backspin
@Backspin dict is a hash table.Knorr
Note that as of Python 3.7.0, builtin dictionaries are guaranteed to be insertion order preserving (i.e. iteration and string conversion uses insertion ordering, rather than an arbitrary implementation dependent order).Backsword
B
12

There is also the blist module that contains a sortedset data type:

sortedset(iterable=(), key=None)

>>> from blist import sortedset
>>> my_set = sortedset([3,7,2,2])
sortedset([2, 3, 7]
Bargainbasement answered 30/5, 2012 at 10:10 Comment(0)
A
7

Not exactly a "sorted container", but you might be interested in the standard library's bisect module, which "provides support for maintaining a list in sorted order without having to sort the list after each insertion".

Andesine answered 30/5, 2012 at 13:48 Comment(0)
P
3

There is a heapq in the standard library, it is not exactly sorted, but kind of. There is also a blist package, but it is not in the standard library.

Ponton answered 10/5, 2011 at 16:47 Comment(0)
D
1

For the specific case of a sorted set I find Flag useful, e.g.:

from enum import Flag
Color = Flag('Color', 'RED GREEN BLUE')

This can be used like a set, | is union, & is intersection, and ~ is inversion, e.g.:

set1 = Color.RED | Color.GREEN
set2 = Color.BLUE
union = set1 | set2
intersection = set1 & set2
inversion = ~set1
empty = Color(0)
universal = ~empty
print(universal)

Which prints:

Color.RED|GREEN|BLUE

The sets are automatically sorted in declaration order (point of discussion w.r.t. sets) and the universal set is closed (which I like).

Dammar answered 27/3, 2024 at 23:28 Comment(0)
H
-7

Python lists are ordered. If you sort them, they stay that way. In Python 2.7 an OrderedDict type was added to maintain an explicitly ordereded dictionary.

Python also has sets (a collection in which the members must be unique), but by definition they are unordered. Sorting a set just returns a list.

Hyetography answered 10/5, 2011 at 16:56 Comment(3)
Thanks for taking the time to answer. OrderedDict is sorted by insertion order rather than by an arbitrary key like a sorted container. set is also not a sorted container.Knorr
Is btree perhaps what you're looking for? stackoverflow.com/questions/628192#628432Hyetography
thanks, btree is exactly the kind of thing I was looking for. I'm going to go with blist since it's in MacPorts and has a bunch of handy data structures.Knorr

© 2022 - 2025 — McMap. All rights reserved.