How to sort a list/tuple of lists/tuples by the element at a given index
I have some data either in a list of lists or a list of tuples, like this:

data = [[1,2,3], [4,5,6], [7,8,9]]
data = [(1,2,3), (4,5,6), (7,8,9)]

I want to sort by the second element of each sublist or tuple, that is, by 2, 5, 8, where 2 comes from (1,2,3) and 5 from (4,5,6). What is the common way to do this? Should I store tuples or lists in my list?

Commentate answered 25/6, 2010 at 23:1 Comment(1)
With regard to "Should I store tuples or lists in my list?", a rule of thumb is to make things as immutable as possible. If you don't need to modify the sublists in place, make them tuples.Godhead
T
1500
sorted_by_second = sorted(data, key=lambda tup: tup[1])

or:

data.sort(key=lambda tup: tup[1])  # sorts in place

The default sort mode is ascending. To sort in descending order use the option reverse=True:

sorted_by_second = sorted(data, key=lambda tup: tup[1], reverse=True)

or:

data.sort(key=lambda tup: tup[1], reverse=True)  # sorts in place
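
A minimal sketch of how the two calls differ, using the question's data in a shuffled order so the effect is visible (the variable names ascending, descending and result are only illustrative):

```python
data = [(7, 8, 9), (1, 2, 3), (4, 5, 6)]

# sorted() returns a new list and leaves the original untouched.
ascending = sorted(data, key=lambda tup: tup[1])
descending = sorted(data, key=lambda tup: tup[1], reverse=True)

print(ascending)   # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
print(descending)  # [(7, 8, 9), (4, 5, 6), (1, 2, 3)]
print(data)        # [(7, 8, 9), (1, 2, 3), (4, 5, 6)]

# list.sort() reorders the list itself and returns None.
result = data.sort(key=lambda tup: tup[1])
print(result)      # None
print(data)        # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
```

Because list.sort() returns None, writing data = data.sort(...) would discard the list; assign only the result of sorted().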
Tenerife answered 25/6, 2010 at 23:4 Comment(8)
Any idea how to sort it bigger to smaller?Luxuriate
@Luxuriate : help(sorted). reverse=True.Tenerife
@Tenerife using itemgetter is faster and simpler: key=itemgetter(1) and at the beginning of the file: from operator import itemgetterClipboard
@Cemre as for the second example, sort here is a method of Python's list object, which receives a lambda function as its key parameter. You may name the lambda's parameter tup, or t, or whatever you like and it'll still work. tup stands for each tuple in the list, and tup[1] is its second element, so sorting will be performed by the second values of the tuples from the original list (2, 5, 8).Klos
I was mildly sceptical of the unsubstantiated claim that "using itemgetter is faster and simpler." While I subjectively regard the intuitive lambda approach to be simpler than the unintuitive itemgetter class, itemgetter does indeed appear to be faster. I'm curious as to why this is. My crude suspicion is that a lambda incurs the hidden cost of capturing all local variables into a closure context, whereas an itemgetter instance does not. tl;dr: Always use itemgetter, because speed wins.Packhorse
I have done a more thorough benchmark between lambda and itemgetter used in sort here. itemgetter is always faster than lambda.Scilla
What if have tuple [(2,'John'), (1, 'Simon'), (3, 'Rober')] and need to sort on - key1 ascending and key2 descending. Thanks.Acrophobia
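First sort on key2 (descending), then on key1 (ascending). This works because Python's sort is stable, so ties on key1 keep the order from the first pass.Arly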
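
The two-pass approach from the last two comments could look roughly like this. It is a sketch that assumes key1 is the number and key2 is the name; the entries (1, 'Alice') and (2, 'Zed') are added here only so that there are ties to break:

```python
people = [(2, 'John'), (1, 'Simon'), (3, 'Rober'), (1, 'Alice'), (2, 'Zed')]

# Pass 1: secondary key (the name), descending.
people.sort(key=lambda tup: tup[1], reverse=True)

# Pass 2: primary key (the number), ascending.
# The sort is stable, so equal numbers keep the order from pass 1.
people.sort(key=lambda tup: tup[0])

print(people)
# [(1, 'Simon'), (1, 'Alice'), (2, 'Zed'), (2, 'John'), (3, 'Rober')]
```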
R
310
from operator import itemgetter
data.sort(key=itemgetter(1))
Relevance answered 11/11, 2013 at 8:18 Comment(2)
This should be the accepted answer. See also Charlie's posted timings, demonstrating the itemgetter class to sort 126% faster on average than the equivalent lambda function.Packhorse
You can also sort by multiple indices hierarchically, e.g. data.sort(key=itemgetter(3,1))Lan
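
A short sketch of itemgetter with more than one index. The data is made up for illustration and has only three elements per tuple, so it uses itemgetter(1, 2) rather than the (3, 1) from the comment above, which would need at least four elements per tuple:

```python
from operator import itemgetter

data = [(1, 2, 3), (1, 2, 1), (1, 1, 4)]

# itemgetter(1, 2) builds a callable that returns (item[1], item[2]),
# so the list is ordered by index 1 first and by index 2 on ties.
get_key = itemgetter(1, 2)

print(get_key(data[0]))           # (2, 3)
print(sorted(data, key=get_key))  # [(1, 1, 4), (1, 2, 1), (1, 2, 3)]
```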
D
92

To sort by multiple criteria, for instance by the second and then the third element of each tuple, let

data = [(1,2,3),(1,2,1),(1,1,4)]

and define a lambda that returns a tuple describing the priority, for instance

sorted(data, key=lambda tup: (tup[1],tup[2]) )
[(1, 1, 4), (1, 2, 1), (1, 2, 3)]
Demonstration answered 19/12, 2015 at 21:27 Comment(0)
T
67

I just want to add to Stephen's answer: if you want to sort the list from high to low, another way besides the one in the comments above is to add this argument to the call:

reverse = True

so the full call becomes:

data.sort(key=lambda tup: tup[1], reverse=True)
Tenacious answered 18/11, 2014 at 18:53 Comment(0)
K
30

Stephen's answer is the one I'd use. For completeness, here's the DSU (decorate-sort-undecorate) pattern with list comprehensions:

decorated = [(tup[1], tup) for tup in data]
decorated.sort()
undecorated = [tup for second, tup in decorated]

Or, more tersely:

[b for a,b in sorted((tup[1], tup) for tup in data)]

As noted in the Python Sorting HowTo, this has been unnecessary since Python 2.4, when key functions became available.
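
One difference worth knowing, shown here as a small sketch with made-up data: plain DSU falls back to comparing the whole tuples when the decorated keys tie, whereas key= keeps tied items in their original order. Decorating with the original position as well (the tie-breaker used in dsu_stable below) gives the same order as key=.

```python
data = [(1, 2, 'b'), (0, 2, 'a'), (5, 1, 'c')]

# key= only looks at tup[1]; ties keep their original order.
with_key = sorted(data, key=lambda tup: tup[1])

# Plain DSU: ties on tup[1] are broken by comparing the tuples themselves.
dsu = [tup for _, tup in sorted((tup[1], tup) for tup in data)]

print(with_key)  # [(5, 1, 'c'), (1, 2, 'b'), (0, 2, 'a')]
print(dsu)       # [(5, 1, 'c'), (0, 2, 'a'), (1, 2, 'b')]

# Adding the original index as a second field restores the stable order
# and means the tuples themselves are never compared.
dsu_stable = [tup for _, _, tup in
              sorted((tup[1], i, tup) for i, tup in enumerate(data))]
print(dsu_stable == with_key)  # True
```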

Killer answered 25/6, 2010 at 23:44 Comment(1)
So this answer is useful for Python 2.3-? Are there any valid uses in more-current Python versions around which you might elaborate a bit? If not, no bother...was just passing by, saw this and the old noggin got to churning just a wee bit. Anyway, cheers and thanks for this walk back into the earlier days of Python.Elevation
S
29

To sort a list of tuples (<word>, <count>) by count in descending order and then by word in alphabetical order:

data = [
    ('betty', 1),
    ('bought', 1),
    ('a', 1),
    ('bit', 1),
    ('of', 1),
    ('butter', 2),
    ('but', 1),
    ('the', 1),
    ('was', 1),
    ('bitter', 1)]

I use this method:

sorted(data, key=lambda tup:(-tup[1], tup[0]))

and it gives me the result:

[('butter', 2),
('a', 1),
('betty', 1),
('bit', 1),
('bitter', 1),
('bought', 1),
('but', 1),
('of', 1),
('the', 1),
('was', 1)]
Sb answered 15/2, 2017 at 5:17 Comment(1)
what if tup[1] is a string?Thermistor
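
Regarding the comment above: a string cannot be negated, so the -tup[1] trick does not carry over. One possible approach, sketched here with made-up data, is to sort twice and rely on the sort being stable: first by the secondary key, then by the primary key with reverse=True.

```python
data = [('pear', 'fruit'), ('kale', 'vegetable'),
        ('apple', 'fruit'), ('leek', 'vegetable')]

# Pass 1: secondary key (tup[0]), ascending.
by_word = sorted(data, key=lambda tup: tup[0])

# Pass 2: primary key (tup[1]), descending.
# reverse=True still keeps tied items in the order they arrived in.
result = sorted(by_word, key=lambda tup: tup[1], reverse=True)

print(result)
# [('kale', 'vegetable'), ('leek', 'vegetable'), ('apple', 'fruit'), ('pear', 'fruit')]
```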
S
16

Without lambda:

def sec_elem(s):
    return s[1]

sorted(data, key=sec_elem)
Suilmann answered 11/7, 2016 at 10:42 Comment(0)
D
11

itemgetter() is somewhat faster than lambda tup: tup[1], but the increase is relatively modest (around 10 to 25 percent).

(IPython session)

>>> from operator import itemgetter
>>> from numpy.random import randint
>>> values = randint(0, 9, 30000).reshape((10000,3))
>>> tpls = [tuple(values[i,:]) for i in range(len(values))]

>>> tpls[:5]    # display sample from list
[(1, 0, 0), 
 (8, 5, 5), 
 (5, 4, 0), 
 (5, 7, 7), 
 (4, 2, 1)]

>>> sorted(tpls[:5], key=itemgetter(1))    # example sort
[(1, 0, 0), 
 (4, 2, 1), 
 (5, 4, 0), 
 (8, 5, 5), 
 (5, 7, 7)]

>>> %timeit sorted(tpls, key=itemgetter(1))
100 loops, best of 3: 4.89 ms per loop

>>> %timeit sorted(tpls, key=lambda tup: tup[1])
100 loops, best of 3: 6.39 ms per loop

>>> %timeit sorted(tpls, key=(itemgetter(1,0)))
100 loops, best of 3: 16.1 ms per loop

>>> %timeit sorted(tpls, key=lambda tup: (tup[1], tup[0]))
100 loops, best of 3: 17.1 ms per loop
Dato answered 27/9, 2016 at 5:11 Comment(1)
Please see the itemgetter sorting solution for varying reverse arguments for multiple columns here, you then need to arrange your sorting in multiple steps in a row: #14466568Chaechaeronea
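
For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using only the standard library (random instead of numpy, timeit instead of %timeit). The list size matches the session above; the seed and the repeat count are arbitrary choices, and the timings will vary by machine and Python version.

```python
import random
import timeit
from operator import itemgetter

random.seed(0)  # fixed seed so every run sorts the same data
tpls = [tuple(random.randrange(9) for _ in range(3)) for _ in range(10000)]

# Sanity check: both key functions must produce the same ordering.
by_itemgetter = sorted(tpls, key=itemgetter(1))
by_lambda = sorted(tpls, key=lambda tup: tup[1])
assert by_itemgetter == by_lambda

repeats = 20
for label, key in [("itemgetter", itemgetter(1)),
                   ("lambda", lambda tup: tup[1])]:
    seconds = timeit.timeit(lambda: sorted(tpls, key=key), number=repeats)
    print(f"{label}: {seconds / repeats * 1000:.2f} ms per sort")
```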
C
7

@Stephen 's answer is to the point! Here is an example for better visualization,

Shout out for the Ready Player One fans! =)

>>> gunters = [('2044-04-05', 'parzival'), ('2044-04-07', 'aech'), ('2044-04-06', 'art3mis')]
>>> gunters.sort(key=lambda tup: tup[0])
>>> print(gunters)
[('2044-04-05', 'parzival'), ('2044-04-06', 'art3mis'), ('2044-04-07', 'aech')]

key is a function that is called on each item of the collection to produce the value used for comparison, loosely similar to the key extractor passed to Comparator.comparing in Java.

The argument passed to key must be callable. Here, lambda creates an anonymous function (which is a callable).
The syntax of lambda is the word lambda, followed by the parameter name(s), a colon, and a single expression.

In the example above, we are sorting a list of tuples that hold the time of an event and the name of an actor.

We are sorting this list by time of event occurrence, which is the 0th element of each tuple.

Note - in Python 3, s.sort(key=None, reverse=False) sorts the items of s in place; the cmp argument of Python 2's s.sort([cmp[, key[, reverse]]]) no longer exists.
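
To make the role of key more concrete, here is a small sketch in which the lambda is replaced by a named function that records its calls. The names event_time and calls are made up for this illustration.

```python
gunters = [('2044-04-05', 'parzival'), ('2044-04-07', 'aech'),
           ('2044-04-06', 'art3mis')]

calls = []

def event_time(tup):
    """Key function: return the value the sort should compare."""
    calls.append(tup)  # recorded only to show how often key is called
    return tup[0]

gunters.sort(key=event_time)

print(len(calls))  # 3
print(gunters)
# [('2044-04-05', 'parzival'), ('2044-04-06', 'art3mis'), ('2044-04-07', 'aech')]
```

The key is computed once per item, and the sort then compares those computed values. The dates sort chronologically as plain strings only because they are in year-month-day format.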

Caesar answered 25/5, 2017 at 18:25 Comment(0)
E
1

I use this in my code:

# Sort the list by each element's second item (elem[1])
sorted(d2, key=lambda elem: elem[1])

To sort by a different element, change the index inside the brackets:

(elem[*insert the index of the element you are sorting by*])
Effulgent answered 9/3, 2021 at 0:7 Comment(1)
sorted creates a new list. To sort in place, use .sort(key=...)Mairamaire
P
-4

Sorting the elements of a single tuple t is quite simple:

tuple(sorted(t))
Propolis answered 4/3, 2014 at 3:21 Comment(0)
