Sorting list according to corresponding values from a parallel list [duplicate]
Asked Answered
K

20

640

I have a list of strings like this:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,   0,   1,   2,   2,   0,   1 ]

What is the shortest way of sorting X using values from Y to get the following output?

["a", "d", "h", "b", "c", "e", "i", "f", "g"]

The order of the elements having the same "key" does not matter. I can resort to the use of for constructs but I am curious if there is a shorter way. Any suggestions?

Kalimantan answered 7/7, 2011 at 23:56 Comment(3)
The answer of riza might be useful when plotting data, since zip(*sorted(zip(X, Y), key=lambda pair: pair[0])) returns both the sorted X and Y sorted with values of X.Materialize
More general case (sort list Y by any key instead of the default order)Jagged
Though it might not be obvious, this is exactly equivalent to sorting Y, and rearranging X in the same way that Y was sorted. I had both of these questions in my saves for quite some time - and anguished over them, because something seemed not quite right - until today when I realized the duplication (after working the other one to make it clearer and improve the title).Bulbul
F
846

Shortest Code

[x for _, x in sorted(zip(Y, X))]

Example:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

Z = [x for _,x in sorted(zip(Y,X))]
print(Z)  # ["a", "d", "h", "b", "c", "e", "i", "f", "g"]

Generally Speaking

[x for _, x in sorted(zip(Y, X), key=lambda pair: pair[0])]

Explained:

  1. zip the two lists.
  2. create a new, sorted list based on the zip using sorted().
  3. using a list comprehension extract the first elements of each pair from the sorted, zipped list.

For more information on how to set\use the key parameter as well as the sorted function in general, take a look at this.


Filiform answered 8/7, 2011 at 0:2 Comment(8)
This is correct, but I'll add the note that if you're trying to sort multiple arrays by the same array, this won't neccessarily work as expected, since the key that is being used to sort is (y,x), not just y. You should instead use [x for (y,x) in sorted(zip(Y,X), key=lambda pair: pair[0])]Rossanarosse
good solution! But it should be: The list is ordered regarding the first element of the pairs, and the comprehension extracts the 'second' element of the pairs.Maryannmaryanna
This solution is poor when it comes to storage. An in-place sort is preferred whenever possible.Psoriasis
@Psoriasis interesting, could you point to a reference on how to achieve that?Wendiwendie
@Wendiwendie I recommend using Quicksort or an in-place merge sort implementation. Once you have that, define your own comparison function which compares values based on the indexes of list Y. The end result should be list Y being untouched and list X being changed into the expected solution without ever having to create a temp list.Psoriasis
Z = [e[1] for e in sorted(zip(Y,X))] is just as good, and, at least for me, it is easier to understand.Overtone
The "Shortest Code" method does not work if list X contains objects. The "general speaking" method should be provided first. For reference, a shorter general method; ` Z = sorted(X, key=lambda x: Y[X.index(x)])`Cerebritis
Small rant: how is it possible that there is not some built in function that just takes two lists, one to be sorted and the other as the key. If you can call sorted() on a generic list and it is able to figure out how to sort it regardless the data type of the elements, then why isn't there some syntax that is my_list = sorted(my_list, some_other_list)? IMHO all these solutions, although elegant and pythonic, feel like a workaround around a problem that should not be there in the first place.Patter
R
147

Zip the two lists together, sort it, then take the parts you want:

>>> yx = zip(Y, X)
>>> yx
[(0, 'a'), (1, 'b'), (1, 'c'), (0, 'd'), (1, 'e'), (2, 'f'), (2, 'g'), (0, 'h'), (1, 'i')]
>>> yx.sort()
>>> yx
[(0, 'a'), (0, 'd'), (0, 'h'), (1, 'b'), (1, 'c'), (1, 'e'), (1, 'i'), (2, 'f'), (2, 'g')]
>>> x_sorted = [x for y, x in yx]
>>> x_sorted
['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']

Combine these together to get:

[x for y, x in sorted(zip(Y, X))]
Reichard answered 8/7, 2011 at 0:3 Comment(3)
This is fine if X is a list of str, but be careful if there is a possibility that < is not defined for some pairs of items in X, eg - if some of them were NoneFidele
When we try to use sort over a zip object, AttributeError: 'zip' object has no attribute 'sort' is what I am getting as of now.Enchantment
You are using Python 3. In Python 2, zip produced a list. Now it produces an iterable object. sorted(zip(...)) should still work, or: them = list(zip(...)); them.sort()Reichard
G
136

Also, if you don't mind using numpy arrays (or in fact already are dealing with numpy arrays...), here is another nice solution:

people = ['Jim', 'Pam', 'Micheal', 'Dwight']
ages = [27, 25, 4, 9]

import numpy
people = numpy.array(people)
ages = numpy.array(ages)
inds = ages.argsort()
sortedPeople = people[inds]

I found it here: http://scienceoss.com/sort-one-list-by-another-list/

Geophyte answered 12/1, 2014 at 16:18 Comment(3)
For bigger arrays / vectors, this solution with numpy is beneficial!Maryannmaryanna
If they are already numpy arrays, then it's simply sortedArray1= array1[array2.argsort()]. And this also makes it easy to sort multiple lists by a particular column of a 2D array: e.g. sortedArray1= array1[array2[:,2].argsort()] to sort array1 (which may have multiple columns) by the values in the third column of array2.Pantaloon
@Geophyte Jim + Pam = Jam!Lambard
T
49

The most obvious solution to me is to use the key keyword arg.

>>> X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
>>> Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]
>>> keydict = dict(zip(X, Y))
>>> X.sort(key=keydict.get)
>>> X
['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']

Note that you can shorten this to a one-liner if you care to:

>>> X.sort(key=dict(zip(X, Y)).get)

As Wenmin Mu and Jack Peng have pointed out, this assumes that the values in X are all distinct. That's easily managed with an index list:

>>> Z = ["A", "A", "C", "C", "C", "F", "G", "H", "I"]
>>> Z_index = list(range(len(Z)))
>>> Z_index.sort(key=keydict.get)
>>> Z = [Z[i] for i in Z_index]
>>> Z
['A', 'C', 'H', 'A', 'C', 'C', 'I', 'F', 'G']

Since the decorate-sort-undecorate approach described by Whatang is a little simpler and works in all cases, it's probably better most of the time. (This is a very old answer!)

Trotyl answered 8/7, 2011 at 0:2 Comment(1)
Does this require that the values in X are unqiue?Picoline
W
40

more_itertools has a tool for sorting iterables in parallel:

Given

from more_itertools import sort_together


X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

Demo

sort_together([Y, X])[1]
# ('a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g')
Wobbling answered 4/8, 2017 at 19:53 Comment(2)
I like this because I can do multiple lists with one index sort_together([Index,X,Y,Z])Rubefaction
Oh, ignore, I can do sorted(zip(Index,X,Y,Z)) too.Rubefaction
G
27

I actually came here looking to sort a list by a list where the values matched.

list_a = ['foo', 'bar', 'baz']
list_b = ['baz', 'bar', 'foo']
sorted(list_b, key=lambda x: list_a.index(x))
# ['foo', 'bar', 'baz']
Goblin answered 17/8, 2018 at 23:7 Comment(3)
This is a bad idea. index will perform an O(N) search on list_a resulting in an O(N² log N) sort.Community
@Richard: the keys are computed once before sorting; so the complexity is actually O(N^2).Councilwoman
@Councilwoman true, but still a bad idea.Surculose
S
18

Another alternative, combining several of the answers.

zip(*sorted(zip(Y,X)))[1]

In order to work for python3:

list(zip(*sorted(zip(B,A))))[1]
Suasion answered 15/10, 2013 at 13:21 Comment(1)
original: stackoverflow.com/a/6620187Cindacindee
C
16

I like having a list of sorted indices. That way, I can sort any list in the same order as the source list. Once you have a list of sorted indices, a simple list comprehension will do the trick:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

sorted_y_idx_list = sorted(range(len(Y)),key=lambda x:Y[x])
Xs = [X[i] for i in sorted_y_idx_list ]

print( "Xs:", Xs )
# prints: Xs: ["a", "d", "h", "b", "c", "e", "i", "f", "g"]

Note that the sorted index list can also be gotten using numpy.argsort().

Churchy answered 15/3, 2015 at 21:47 Comment(1)
Do you know if there is a way to sort multiple lists at once by one sorted index list? Something like this? X1= ["a", "b", "c", "d", "e", "f", "g", "h", "i"] X2 = ["a", "b", "c", "d", "e", "f", "g", "h", "i"] X1s, X2s = [X1[i], X2[i] for i in sorted_y_idx_list ]Czarism
C
8

zip, sort by the second column, return the first column.

zip(*sorted(zip(X,Y), key=operator.itemgetter(1)))[0]
Cheriecherilyn answered 8/7, 2011 at 5:7 Comment(5)
Note: the key=operator.itemgetter(1) solves the duplicate issueGrammalogue
zip is not subscriptable... you must actually use list(zip(*sorted(zip(X,Y), key=operator.itemgetter(1))))[0]Bespread
@Grammalogue what duplicate issue?Bechtel
If there is more than one matching it gets the firstGrammalogue
also see https://mcmap.net/q/64163/-transpose-unzip-function-inverse-of-zipCindacindee
S
5

This is an old question but some of the answers I see posted don't actually work because zip is not scriptable. Other answers didn't bother to import operator and provide more info about this module and its benefits here.

There are at least two good idioms for this problem. Starting with the example input you provided:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,   0,   1,   2,   2,   0,   1 ]

Using the "Decorate-Sort-Undecorate" idiom

This is also known as the Schwartzian_transform after R. Schwartz who popularized this pattern in Perl in the 90s:

# Zip (decorate), sort and unzip (undecorate).
# Converting to list to script the output and extract X
list(zip(*(sorted(zip(Y,X)))))[1]                                                                                                                       
# Results in: ('a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g')

Note that in this case Y and X are sorted and compared lexicographically. That is, the first items (from Y) are compared; and if they are the same then the second items (from X) are compared, and so on. This can create unstable outputs unless you include the original list indices for the lexicographic ordering to keep duplicates in their original order.

Using the operator module

This gives you more direct control over how to sort the input, so you can get sorting stability by simply stating the specific key to sort by. See more examples here.

import operator    

# Sort by Y (1) and extract X [0]
list(zip(*sorted(zip(X,Y), key=operator.itemgetter(1))))[0]                                                                                                 
# Results in: ('a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g')
Simons answered 15/5, 2020 at 16:30 Comment(1)
I think in most cases I would just use lambda x: x[1] instead of operator.itemgetter(1), since it easier to understand and doesn't require an additional package. Is there an advantage to using operator.itemgetter?Hennie
S
2

You can create a pandas Series, using the primary list as data and the other list as index, and then just sort by the index:

import pandas as pd
pd.Series(data=X,index=Y).sort_index().tolist()

output:

['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']
Swallow answered 7/1, 2018 at 23:53 Comment(0)
V
2

Here is Whatangs answer if you want to get both sorted lists (python3).

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

Zx, Zy = zip(*[(x, y) for x, y in sorted(zip(Y, X))])

print(list(Zx))  # [0, 0, 0, 1, 1, 1, 1, 2, 2]
print(list(Zy))  # ['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']

Just remember Zx and Zy are tuples. I am also wandering if there is a better way to do that.

Warning: If you run it with empty lists it crashes.

Villatoro answered 2/3, 2018 at 15:57 Comment(0)
L
2

I have created a more general function, that sorts more than two lists based on another one, inspired by @Whatang's answer.

def parallel_sort(*lists):
    """
    Sorts the given lists, based on the first one.
    :param lists: lists to be sorted

    :return: a tuple containing the sorted lists
    """

    # Create the initially empty lists to later store the sorted items
    sorted_lists = tuple([] for _ in range(len(lists)))

    # Unpack the lists, sort them, zip them and iterate over them
    for t in sorted(zip(*lists)):
        # list items are now sorted based on the first list
        for i, item in enumerate(t):    # for each item...
            sorted_lists[i].append(item)  # ...store it in the appropriate list

    return sorted_lists
Lamella answered 26/3, 2018 at 15:12 Comment(0)
O
2
X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,   0,   1,   2,   2,   0,   1 ]

You can do so in one line:

X, Y = zip(*sorted(zip(Y, X)))
Oblong answered 9/4, 2021 at 17:48 Comment(1)
The previous answer is sorting B using values from A. It's correct but misleading. I fixed it, thank you for reminding.Oblong
S
2

Most of the solutions above are complicated and I think they will not work if the lists are of different lengths or do not contain the exact same items. The solution below is simple and does not require any imports.

list1 = ['B', 'A', 'C']  # Required sort order
list2 = ['C', 'B']       # Items to be sorted according to list1

result = list1
for item in list1:
    if item not in list2: result.remove(item)

print(result)

Output:

['B', 'C']
  • Note: Any item not in list1 will be ignored since the algorithm will not know what's the sort order to use.
Sheilahshekel answered 24/11, 2022 at 7:53 Comment(2)
You posted your solution two times. Maybe you can delete one of them. In addition, the proposed solution won't work for the initial question as the lists X and Y contain different entries.Clearcole
That's right but the solutions use completely different methods which could be used for different applications. If you already have a df...why converting it to a list, process it, then convert to df again? you can leverage that solution directly in your existing df. The second one is easier and faster if you're not using Pandas in your program. As for won't work..that's right because he posted the wrong question in the title when he talked about lists. His title should have been 'How to sort a dictionary?'. People will search this post looking to sort lists not dictionaries. Thanks.Sheilahshekel
L
1

A quick one-liner.

list_a = [5,4,3,2,1]
list_b = [1,1.5,1.75,2,3,3.5,3.75,4,5]

Say you want list a to match list b.

orderedList =  sorted(list_a, key=lambda x: list_b.index(x))

This is helpful when needing to order a smaller list to values in larger. Assuming that the larger list contains all values in the smaller list, it can be done.

Langan answered 9/1, 2018 at 20:49 Comment(2)
This does not solve the OP’s question. Did you try it with the sample lists X and Y?Akkadian
This is a bad idea. index will perform an O(N) search on list_b resulting in an O(N² log N) sort.Community
D
1

This function should work for arrays.

def sortBoth(x,y,reverse=False):
    '''
    Sort both x and y, according to x. 
    '''
    xy_sorted=array(sorted(zip(x,y),reverse=reverse)).T
    return xy_sorted[0],xy_sorted[1]
Dianadiandra answered 25/4, 2022 at 9:14 Comment(0)
S
1

I think most of the solutions above will not work if the 2 lists are of different sizes or contain different items. The solution below is simple and should fix those issues:

import pandas as pd

list1 = ['B', 'A', 'C']  # Required sort order
list2 = ['C', 'A']       # Items to be sorted according to list1

result = pd.merge(pd.DataFrame(list1), pd.DataFrame(list2))
print(list(result[0]))

output:

['A', 'C']
  • Note: Any item not in list1 will be ignored since the algorithm will not know what's the sort order to use.
Sheilahshekel answered 24/11, 2022 at 6:10 Comment(0)
J
0
list1 = ['a','b','c','d','e','f','g','h','i']
list2 = [0,1,1,0,1,2,2,0,1]

output=[]
cur_loclist = []

To get unique values present in list2

list_set = set(list2)

To find the loc of the index in list2

list_str = ''.join(str(s) for s in list2)

Location of index in list2 is tracked using cur_loclist

[0, 3, 7, 1, 2, 4, 8, 5, 6]

for i in list_set:
cur_loc = list_str.find(str(i))

while cur_loc >= 0:
    cur_loclist.append(cur_loc)
    cur_loc = list_str.find(str(i),cur_loc+1)

print(cur_loclist)

for i in range(0,len(cur_loclist)):
output.append(list1[cur_loclist[i]])
print(output)
Jena answered 16/2, 2018 at 5:3 Comment(0)
S
-2

I think that the title of the original question is not accurate. If you have 2 lists of identical number of items and where every item in list 1 is related to list 2 in the same order (e.g a = 0 , b = 1, etc.) then the question should be 'How to sort a dictionary?', not 'How to sorting list based on values from another list?'. The solution below is the most efficient in this case:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,   0,   1,   2,   2,   0,   1 ]

dict1 = dict(zip(X,Y))
result = sorted(dict1, key=dict1.get)
print(result)

Result:

['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']
Sheilahshekel answered 30/11, 2022 at 16:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.