In Python, which data structure is more efficient/speedy? Assuming that order is not important to me and I would be checking for duplicates anyway, is a Python set slower than a Python list?
It depends on what you are intending to do with it.
Sets are significantly faster when it comes to determining if an object is present in the set (as in x in s
), but its elements are not ordered so you cannot access items by index as you would in a list. Sets are also somewhat slower to iterate over in practice.
You can use the timeit module to see which is faster for your situation.
Lists are slightly faster than sets when you just want to iterate over the values.
Sets, however, are significantly faster than lists if you want to check if an item is contained within it. They can only contain unique items though.
It turns out tuples perform in almost exactly the same way as lists, except for their immutability.
Iterating
>>> def iter_test(iterable):
... for i in iterable:
... pass
...
>>> from timeit import timeit
>>> timeit(
... "iter_test(iterable)",
... setup="from __main__ import iter_test; iterable = set(range(10000))",
... number=100000)
12.666952133178711
>>> timeit(
... "iter_test(iterable)",
... setup="from __main__ import iter_test; iterable = list(range(10000))",
... number=100000)
9.917098999023438
>>> timeit(
... "iter_test(iterable)",
... setup="from __main__ import iter_test; iterable = tuple(range(10000))",
... number=100000)
9.865639209747314
Determine if an object is present
>>> def in_test(iterable):
... for i in range(1000):
... if i in iterable:
... pass
...
>>> from timeit import timeit
>>> timeit(
... "in_test(iterable)",
... setup="from __main__ import in_test; iterable = set(range(1000))",
... number=10000)
0.5591847896575928
>>> timeit(
... "in_test(iterable)",
... setup="from __main__ import in_test; iterable = list(range(1000))",
... number=10000)
50.18339991569519
>>> timeit(
... "in_test(iterable)",
... setup="from __main__ import in_test; iterable = tuple(range(1000))",
... number=10000)
51.597304821014404
Set
wins due to near instant 'contains' checks: https://en.wikipedia.org/wiki/Hash_table
List implementation: usually an array, low level close to the metal good for iteration and random access by element index.
Set implementation: https://en.wikipedia.org/wiki/Hash_table, it does not iterate on a list, but finds the element by computing a hash from the key, so it depends on the nature of the key elements and the hash function. Similar to what is used for dict. I suspect list
could be faster if you have very few elements (< 5), the larger element count the better the set
will perform for a contains check. It is also fast for element addition and removal. Also always keep in mind that building a set has a cost !
NOTE: If the list
is already sorted, searching the list
could be quite fast on small lists, but with more data a set
is faster for contains checks.
list
is already sorted, searching the list
could be quite fast on small lists, but with more data a set
is faster for contains checks." To avoid confusion, you should probably make it clear that sorting only helps if you take advantage of the sorted order with something like the bisect
module; a plain in
check on a list
is O(n)
regardless of whether or not it's sorted, while in
checks on set
are O(1)
. The bisect
module can get the test down to O(log n)
on a pre-sorted list
, but it's more complicated to use than a simple in
check. –
Kinney List performance:
>>> import timeit
>>> timeit.timeit(stmt='10**6 in a', setup='a = list(range(10**6))', number=1000)
15.08
Set performance:
>>> timeit.timeit(stmt='10**6 in a', setup='a = set(range(10**6))', number=1000)
3.90e-05
You may want to consider Tuples as they're similar to lists but can’t be modified. They take up slightly less memory and are faster to access. They aren’t as flexible but are more efficient than lists. Their normal use is to serve as dictionary keys.
Sets are also sequence structures but with two differences from lists and tuples. Although sets do have an order, that order is arbitrary and not under the programmer’s control. The second difference is that the elements in a set must be unique.
set
by definition. [python | wiki].
>>> x = set([1, 1, 2, 2, 3, 3])
>>> x
{1, 2, 3}
set
built-in type link (docs.python.org/2/library/stdtypes.html#set) not the deprecated sets
library. Second, "Sets are also sequence structures", read the following from the built-in type link: "Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior." –
Megathere range
is not list
. range
is a special class with custom __contains__
magic method. –
Surfacetoair xrange
) –
Arrogate timeit
is clearly done using python 3 with the range
object __contains__
method as mentioned by @RyneWang. Updated with apt comparison. –
Acantho tl;dr
Data structures (DS) are important because they are used to perform operations on data which basically implies: take some input, process it, and give back the output.
Some data structures are more useful than others in some particular cases. Therefore, it is quite unfair to ask which (DS) is more efficient/speedy. It is like asking which tool is more efficient between a knife and fork. I mean all depends on the situation.
Lists
A list is mutable sequence, typically used to store collections of homogeneous items.
Sets
A set object is an unordered collection of distinct hashable objects. It is commonly used to test membership, remove duplicates from a sequence, and compute mathematical operations such as intersection, union, difference, and symmetric difference.
Usage
From some of the answers, it is clear that a list is quite faster than a set when iterating over the values. On the other hand, a set is faster than a list when checking if an item is contained within it. Therefore, the only thing you can say is that a list is better than a set for some particular operations and vice-versa.
I was interested in the results when checking, with CPython, if a value is one of a small number of literals. set
wins in Python 3 vs tuple
, list
and or
:
from timeit import timeit
def in_test1():
for i in range(1000):
if i in (314, 628):
pass
def in_test2():
for i in range(1000):
if i in [314, 628]:
pass
def in_test3():
for i in range(1000):
if i in {314, 628}:
pass
def in_test4():
for i in range(1000):
if i == 314 or i == 628:
pass
print("tuple")
print(timeit("in_test1()", setup="from __main__ import in_test1", number=100000))
print("list")
print(timeit("in_test2()", setup="from __main__ import in_test2", number=100000))
print("set")
print(timeit("in_test3()", setup="from __main__ import in_test3", number=100000))
print("or")
print(timeit("in_test4()", setup="from __main__ import in_test4", number=100000))
Output:
tuple
4.735646052286029
list
4.7308746771886945
set
3.5755991376936436
or
4.687681658193469
For 3 to 5 literals, set
still wins by a wide margin, and or
becomes the slowest.
In Python 2, set
is always the slowest. or
is the fastest for 2 to 3 literals, and tuple
and list
are faster with 4 or more literals. I couldn't distinguish the speed of tuple
vs list
.
When the values to test were cached in a global variable out of the function, rather than creating the literal within the loop, set
won every time, even in Python 2.
These results apply to 64-bit CPython on a Core i7.
list
and set
cases would need to be rebuilt on every test (which would destroy their performance), and on older Python (definitely 2.x, not sure if older 3.x omitted the optimization) it does in fact rebuild the set
literal on every pass, making it slower (Python 3 caches it as a constant frozenset
to avoid the work). On both versions, your list
test is actually being optimized to a tuple
constant, so it's identical to the tuple
case. –
Kinney Sets are faster, morover you get more functions with sets, such as lets say you have two sets :
set1 = {"Harry Potter", "James Bond", "Iron Man"}
set2 = {"Captain America", "Black Widow", "Hulk", "Harry Potter", "James Bond"}
We can easily join two sets:
set3 = set1.union(set2)
Find out what is common in both:
set3 = set1.intersection(set2)
Find out what is different in both:
set3 = set1.difference(set2)
And much more! Just try them out, they are fun! Moreover if you have to work on the different values within 2 list or common values within 2 lists, I prefer to convert your lists to sets, and many programmers do in that way. Hope it helps you :-)
I would recommend a Set implementation where the use case is limit to referencing or search for existence and Tuple implementation where the use case requires you to perform iteration. A list is a low-level implementation and requires significant memory overhead.
In the same vein as @Ellis Percival's tests, I'd like to add that lists perform in a similar way to sets when it comes to adding an element.
Adding an element
>>> def add_test_set(iterable):
... for i in range(10000):
... iterable.add(i)
...
>>> def add_test_list(iterable):
... for i in range(10000):
... iterable.append(i)
...
>>> timeit("add_test_set(iterable)",
... setup="from __main__ import add_test_set; iterable = set()",
... number=10000)
7.073143866999999
>>> timeit("add_test_list(iterable)",
... setup="from __main__ import add_test_list; iterable = list()",
... number=10000)
6.80650725000001
(I would have edited his post to include this but the edit queue was full)
from datetime import datetime
listA = range(10000000)
setA = set(listA)
tupA = tuple(listA)
#Source Code
def calc(data, type):
start = datetime.now()
if data in type:
print ""
end = datetime.now()
print end-start
calc(9999, listA)
calc(9999, tupA)
calc(9999, setA)
Output after comparing 10 iterations for all 3 : Comparison
Sets seem very fast to find if a value is present in it or not when compared to Lists.
A = list(range(1, 1000000))
A.pop(84559)
def find_in_set(A):
maxA = max(A)
for i in range(1,maxA+1):
if i not in set(A):
return i
This below find+in+list function takes almost ages to find missing 84559 from list 'A'. However the above function find_in_set takes just couple of seconds.
def find_in_list(A):
maxA = max(A)
for i in range(1,maxA+1):
if i not in A:
return i
Here you can find clear difference to the Sets vs. Lists
© 2022 - 2024 — McMap. All rights reserved.
x in s
), what about construction + search ? It's relevant for instance in this case: which is faster:x in [a, b, c]
orx in {a, b, c}
? I.e. which should be our default go-to when constructing a set just to do a one-time search operation on it (as in an if clause)? – Cyclopedia