Python has an ordered dictionary. What about an ordered set?
There is an ordered set (possible new link) recipe for this which is referred to from the Python 2 Documentation. This runs on Py2.6 or later and 3.0 or later without any modifications. The interface is almost exactly the same as a normal set, except that initialisation should be done with a list.
OrderedSet([1, 2, 3])
This is a MutableSet, so the signature for .union
doesn't match that of set, but since it includes __or__
something similar can easily be added:
@staticmethod
def union(*sets):
union = OrderedSet()
union.union(*sets)
return union
def union(self, *sets):
for set in sets:
self |= set
update
, union
, intersection
. –
Fossorial union
in the same class. The last one will "win" and the first one will fail to exist at runtime. This is because OrderedSet.union
(no parens) has to refer to a single object. –
Adversaria The answer is no, but as of Python 3.7 you can use the simple dict
from the Python standard library with just keys (and values as None
) for the same purpose.
Here's an example of how to use dict
as an ordered set to filter out duplicate items while preserving order, thereby emulating an ordered set. Use the dict
class method fromkeys()
to create a dict, then simply ask for the keys()
back.
>>> keywords = ['foo', 'bar', 'bar', 'foo', 'baz', 'foo']
>>> list(dict.fromkeys(keywords))
['foo', 'bar', 'baz']
For older versions of Python, use the collections.OrderedDict
dict.fromkeys()
. But in that case, key order is only preserved in CPython 3.6+ implementations, so OrderedDict
is a more portable solution when order matters. –
Lingo keys = (1,2,3,1,2,1)
list(OrderedDict.fromkeys(keys).keys())
-> [1, 2, 3]
, python-3.7. It works. –
Charlacharlady dict
, set
in Python 3.7+ unfortunately does not preserve order. –
Chaffer dict
is not guaranteed to preserve order. From the link, "The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon." –
Proteiform dict
retains insertion order, but it doesn't necessarily retain order after operations on its contents. It also points out several features that OrderedDict
has, but dict
does not. dict
may be sufficient for the specific problem posed in this question, but one shouldn't take dict
's support for retaining insertion order to mean that it is a complete replacement for OrderedDict
. –
Moua There is an ordered set (possible new link) recipe for this which is referred to from the Python 2 Documentation. This runs on Py2.6 or later and 3.0 or later without any modifications. The interface is almost exactly the same as a normal set, except that initialisation should be done with a list.
OrderedSet([1, 2, 3])
This is a MutableSet, so the signature for .union
doesn't match that of set, but since it includes __or__
something similar can easily be added:
@staticmethod
def union(*sets):
union = OrderedSet()
union.union(*sets)
return union
def union(self, *sets):
for set in sets:
self |= set
update
, union
, intersection
. –
Fossorial union
in the same class. The last one will "win" and the first one will fail to exist at runtime. This is because OrderedSet.union
(no parens) has to refer to a single object. –
Adversaria Update: This answer is obsolete as of Python 3.7. See jrc's answer above for a better solution. Will keep this answer here only for historical reasons.
An ordered set is functionally a special case of an ordered dictionary.
The keys of a dictionary are unique. Thus, if one disregards the values in an ordered dictionary (e.g. by assigning them None
), then one has essentially an ordered set.
As of Python 3.1 and 2.7 there is collections.OrderedDict
. The following is an example implementation of an OrderedSet. (Note that only few methods need to be defined or overridden: collections.OrderedDict
and collections.MutableSet
do the heavy lifting.)
import collections
class OrderedSet(collections.OrderedDict, collections.MutableSet):
def update(self, *args, **kwargs):
if kwargs:
raise TypeError("update() takes no keyword arguments")
for s in args:
for e in s:
self.add(e)
def add(self, elem):
self[elem] = None
def discard(self, elem):
self.pop(elem, None)
def __le__(self, other):
return all(e in other for e in self)
def __lt__(self, other):
return self <= other and self != other
def __ge__(self, other):
return all(e in self for e in other)
def __gt__(self, other):
return self >= other and self != other
def __repr__(self):
return 'OrderedSet([%s])' % (', '.join(map(repr, self.keys())))
def __str__(self):
return '{%s}' % (', '.join(map(repr, self.keys())))
difference = property(lambda self: self.__sub__)
difference_update = property(lambda self: self.__isub__)
intersection = property(lambda self: self.__and__)
intersection_update = property(lambda self: self.__iand__)
issubset = property(lambda self: self.__le__)
issuperset = property(lambda self: self.__ge__)
symmetric_difference = property(lambda self: self.__xor__)
symmetric_difference_update = property(lambda self: self.__ixor__)
union = property(lambda self: self.__or__)
OrderedSet
which subclasses OrderedDict
and abc.Set
and then define __len__
, __iter__
and __contains__
. –
Quietus collections
, but otherwise a good suggestion –
Singhalese OrderedSet([1,2,3])
raises a TypeError. How does the constructor even work? Missing usage example. –
Fossorial dict
(since it is now ordered) via composition rather than inheritance, and (3) use collections.abc.MutableSet
. –
Substance __sub__
etc are not normally defined. Needs to be imported or referenced differently. –
Jilly NameError: name '__sub__' is not defined
(Python 3.7 and 3.8). –
Murderous TypeError: 'int' object is not iterable
when trying the most basic use-case of OrderedSet([1, 3, 2])
–
Procurator Implementations on PyPI
While others have pointed out that there is no built-in implementation of an insertion-order preserving set in Python (yet), I am feeling that this question is missing an answer which states what there is to be found on PyPI.
There are the packages:
- ordered-set (Python based)
- collections-extended
- boltons (under setutils.IndexedSet, Python-based)
- oset (last updated in 2012)
Some of these implementations are based on the recipe posted by Raymond Hettinger to ActiveState which is also mentioned in other answers here.
Some differences
- ordered-set (version 1.1)
- advantage: O(1) for lookups by index (e.g.
my_set[5]
) - oset (version 0.1.3)
- advantage: O(1) for
remove(item)
- disadvantage: apparently O(n) for lookups by index
Both implementations have O(1) for add(item)
and __contains__(item)
(item in my_set
).
set.union
don't work on it though, even though it inherits collections.abc.Set
. –
Crying OrderedSet
now supports remove
–
Otiliaotina SortedSet
from sortedcontainers 2.3.0 with a bunch of other sorted stuff. –
Eichelberger I can do you one better than an OrderedSet: boltons has a pure-Python, 2/3-compatible IndexedSet
type that is not only an ordered set, but also supports indexing (as with lists).
Simply pip install boltons
(or copy setutils.py
into your codebase), import the IndexedSet
and:
>>> from boltons.setutils import IndexedSet
>>> x = IndexedSet(list(range(4)) + list(range(8)))
>>> x
IndexedSet([0, 1, 2, 3, 4, 5, 6, 7])
>>> x - set(range(2))
IndexedSet([2, 3, 4, 5, 6, 7])
>>> x[-1]
7
>>> fcr = IndexedSet('freecreditreport.com')
>>> ''.join(fcr[:fcr.index('.')])
'frecditpo'
Everything is unique and retained in order. Full disclosure: I wrote the IndexedSet
, but that also means you can bug me if there are any issues.
If you're using the ordered set to maintain a sorted order, consider using a sorted set implementation from PyPI. The sortedcontainers module provides a SortedSet for just this purpose. Some benefits: pure-Python, fast-as-C implementations, 100% unit test coverage, hours of stress testing.
Installing from PyPI is easy with pip:
pip install sortedcontainers
Note that if you can't pip install
, simply pull down the sortedlist.py and sortedset.py files from the open-source repository.
Once installed you can simply:
from sortedcontainers import SortedSet
help(SortedSet)
The sortedcontainers module also maintains a performance comparison with several alternative implementations.
For the comment that asked about Python's bag data type, there's alternatively a SortedList data type which can be used to efficiently implement a bag.
SortedSet
class there requires members to be comparable and hashable. –
Wakashan set
and frozenset
also require elements to be hashable. The comparable constraint is the addition for SortedSet
, but it's also an obvious constraint. –
Emersed As other answers mention, as for python 3.7+, the dict
is ordered by definition. Instead of subclassing OrderedDict
we can subclass abc.collections.MutableSet
or typing.MutableSet
using the dict
's keys to store our values.
import typing
T = typing.TypeVar("T")
class OrderedSet(typing.MutableSet[T]):
"""A set that preserves insertion order by internally using a dict."""
def __init__(self, iterable: typing.Iterable[T]):
self._d = dict.fromkeys(iterable)
def add(self, x: T) -> None:
self._d[x] = None
def discard(self, x: T) -> None:
self._d.pop(x, None)
def __contains__(self, x: object) -> bool:
return self._d.__contains__(x)
def __len__(self) -> int:
return self._d.__len__()
def __iter__(self) -> typing.Iterator[T]:
return self._d.__iter__()
def __str__(self):
return f"{{{', '.join(str(i) for i in self)}}}"
def __repr__(self):
return f"<OrderedSet {self}>"
Then just:
x = OrderedSet([1, 2, -1, "bar"])
x.add(0)
assert list(x) == [1, 2, -1, "bar", 0]
I added this code, with some tests, in a small library, so anyone can just pip install
it.
discard
should never ever raise a KeyError
. Also note that this doesn't provide a sensible __repr__
–
Mystagogue In case you're already using pandas in your code, its Index
object behaves pretty like an ordered set, as shown in this article.
Examples from the article:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
indA & indB # intersection
indA | indB # union
indA - indB # difference
indA ^ indB # symmetric difference
indA.difference(indB)
, the minus sign performs standard subtraction –
Hornsby pd.Index
allows for duplicate elements, which one would not expect from an actual Python set
. –
Busch A little late to the game, but I've written a class setlist
as part of collections-extended
that fully implements both Sequence
and Set
>>> from collections_extended import setlist
>>> sl = setlist('abracadabra')
>>> sl
setlist(('a', 'b', 'r', 'c', 'd'))
>>> sl[3]
'c'
>>> sl[-1]
'd'
>>> 'r' in sl # testing for inclusion is fast
True
>>> sl.index('d') # so is finding the index of an element
4
>>> sl.insert(1, 'd') # inserting an element already in raises a ValueError
ValueError
>>> sl.index('d')
4
GitHub: https://github.com/mlenzen/collections-extended
Documentation: http://collections-extended.lenzm.net/en/latest/
There's no OrderedSet
in official library.
I make an exhaustive cheatsheet of all the data structure for your reference.
DataStructure = {
'Collections': {
'Map': [
('dict', 'OrderDict', 'defaultdict'),
('chainmap', 'types.MappingProxyType')
],
'Set': [('set', 'frozenset'), {'multiset': 'collection.Counter'}]
},
'Sequence': {
'Basic': ['list', 'tuple', 'iterator']
},
'Algorithm': {
'Priority': ['heapq', 'queue.PriorityQueue'],
'Queue': ['queue.Queue', 'multiprocessing.Queue'],
'Stack': ['collection.deque', 'queue.LifeQueue']
},
'text_sequence': ['str', 'byte', 'bytearray']
}
As others have said, OrderedDict
is a superset of an ordered set in terms of functionality, but if you need a set for interacting with an API and don't need it to be mutable, OrderedDict.keys()
is actually an implementation abc.collections.Set
:
import random
from collections import OrderedDict, abc
a = list(range(0, 100))
random.shuffle(a)
# True
a == list(OrderedDict((i, 0) for i in a).keys())
# True
isinstance(OrderedDict().keys(), abc.Set)
The caveats are immutability and having to build up the set like a dict, but it's simple and only uses built-ins.
The ParallelRegression package provides a setList( ) ordered set class that is more method-complete than the options based on the ActiveState recipe. It supports all methods available for lists and most if not all methods available for sets.
There is a pip library that does this:
pip install ordered-set
Then you can use it:
from ordered_set import OrderedSet
Just use pd.unique
from pandas
- does exactly what you need!
>>> import pandas as pd
>>> pd.unique([3, 1, 4, 5, 2, 2])
array([3, 1, 4, 5, 2])
This answer is for completeness sake. If the length of your set
is small, and your code is single-threaded, a list
could work just fine as it's implicitly ordered.
if not new_item in my_list:
my_list.append(new_item)
If using this approach:
- To append or remove an item, first check for presence as in the code above.
- To compare equality, use
set(my_list)
.
Checking for presence in a list of course has an O(n) complexity, but this could be acceptable for a small list, especially if high performance isn't required.
For many purposes simply calling sorted will suffice. For example
>>> s = set([0, 1, 2, 99, 4, 40, 3, 20, 24, 100, 60])
>>> sorted(s)
[0, 1, 2, 3, 4, 20, 24, 40, 60, 99, 100]
If you are going to use this repeatedly, there will be overhead incurred by calling the sorted function so you might want to save the resulting list, as long as you're done changing the set. If you need to maintain unique elements and sorted, I agree with the suggestion of using OrderedDict from collections with an arbitrary value such as None.
© 2022 - 2024 — McMap. All rights reserved.
collections.Counter
is Python's bag. – Jamikajamildict
is now insertion-ordered (guaranteed since Python 3.7) – AnnatesOrderedSet
to the standardcollections
library: discuss.python.org/t/add-orderedset-to-stdlib/12730. – Kovno