In Python, how do I take the highest occurrence of something in a list, and sort it that way?
D

5

4
[3, 3, 3, 4, 4, 2]

Would be:

[ (3, 3), (4, 2), (2, 1) ]

The output should be sorted by highest count first to lowest count. In this case, 3 to 2 to 1.

Dominate answered 25/1, 2011 at 9:6 Comment(6)
Do the groups all occur together or could you have [3,4,2,3,4,3] and expect the same output? Does the order of the output list matter in any way?Verisimilitude
In your example, the output would still be the same.Dominate
Appears to be a duplicate. The accepted answer there directly gives you working code for this, in any case: freq_sorted([3, 3, 3, 4, 4, 2], include_freq=True) gives the exact result you ask for.Haemocyte
At 855 Questions, I'm beginning to wonder if you ever try anything yourself first or are you trying to crowd-source a software project...Kamila
@John actually I like this kind of questions, I hope to see thousands more from TIMEX.Rurik
Not only is this a duplicate. It's also homework.Tomboy
R
3
data = [3, 3, 3, 4, 4, 2]
result = []
for entry in set(data):
    result.append((entry, data.count(entry)))
result.sort(key = lambda x: -x[1])
print result

>>[(3, 3), (4, 2), (2, 1)]
Rurik answered 25/1, 2011 at 9:15 Comment(6)
The output should be sorted by highest count first to lowest count.Dominate
so print sorted(result, key=lambda x: -x[1]) instead of print resultStoneham
and result = sorted(result, key = lambda x: -x[1]) can be written as result.sort(key = lambda x: -x[1])Stoneham
See O(n**2) comments at #1829970Haemocyte
@TIMEX: Yes, it's a loop bounded by O(n) (data.count) inside a loop bounded by O(n) (the outer for loop). In other words, each item has to do something to every other item, or n*n, which is n**2.Haemocyte
The solution can be made more succinct. Rather than initiating result as an empty list and using a for loop with append, one can use list comprehension: result = [(entry, data.count(entry)) for entry in set(data)] (cf. docs.python.org/2/tutorial/datastructures.html, section 5.1.4).Luannaluanne
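A rough sketch of the list-comprehension form suggested in the comment above, next to a single-pass alternative that avoids the quadratic cost discussed earlier in the comments. It assumes Python 3 (hence `print()` as a function), and the function names `count_quadratic` and `count_linear` are made up for illustration:

```python
def count_quadratic(data):
    # One data.count() call per distinct value: an O(n) scan inside an O(n) loop.
    pairs = [(entry, data.count(entry)) for entry in set(data)]
    # Sort by count, highest first.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def count_linear(data):
    # Single pass: each element updates its tally exactly once.
    tally = {}
    for entry in data:
        tally[entry] = tally.get(entry, 0) + 1
    return sorted(tally.items(), key=lambda pair: pair[1], reverse=True)


data = [3, 3, 3, 4, 4, 2]
print(count_quadratic(data))
print(count_linear(data))
```

Both calls should print the same list for this input; the difference only shows up in running time on long lists with many distinct values.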
G
13

You can use a Counter in Python 2.7+ (this recipe works on 2.5+):

from collections import Counter
print Counter([3, 3, 3, 4, 4, 2]).most_common()
# [(3, 3), (4, 2), (2, 1)]
Grandam answered 25/1, 2011 at 9:18 Comment(2)
TIMEX, haven't you heard that Python comes with 'batteries included'?Verisimilitude
But Counter is a new collection class. :) This is the correct answer if you are on 2.7 or 3.x.Granule
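As a small addition to the answer above, `most_common` also accepts an optional count, which is handy when only the top few values matter. A minimal sketch, assuming Python 3 syntax; the variable names are illustrative only:

```python
from collections import Counter

data = [3, 3, 3, 4, 4, 2]
counts = Counter(data)

# Every (value, count) pair, highest count first.
all_pairs = counts.most_common()
# Only the two most frequent values.
top_two = counts.most_common(2)

print(all_pairs)
print(top_two)
```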
V
2

Try using a collections.Counter:

from collections import Counter
data = [3,4,2,3,4,3]
Counter(data).most_common()
Verisimilitude answered 25/1, 2011 at 9:19 Comment(1)
Which version of Python are you using? I think Counter was introduced in 2.7.Verisimilitude
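The answer above uses an interleaved list rather than the grouped one from the question. A quick check, written for Python 3, suggests the grouping of the input makes no difference to the result for this data (the names `grouped` and `interleaved` are just for illustration):

```python
from collections import Counter

grouped = [3, 3, 3, 4, 4, 2]
interleaved = [3, 4, 2, 3, 4, 3]

# Counter tallies each value regardless of where it appears in the list.
same = Counter(grouped).most_common() == Counter(interleaved).most_common()
print(same)
```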
J
2

Why would you choose an O(n**2) algorithm to do this? The alternative to Counter (if you have <2.7) is not too difficult:

>>> from operator import itemgetter
>>> from collections import defaultdict
>>> L=[3, 3, 3, 4, 4, 2]
>>> D=defaultdict(int)
>>> for i in L:
...     D[i]+=1
... 
>>> sorted(D.items(), key=itemgetter(1), reverse=True)
[(3, 3), (4, 2), (2, 1)]
Jobie answered 25/1, 2011 at 10:16 Comment(0)
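The question does not say how values with equal counts should be ordered. If a predictable order is wanted, one possible variation on the defaultdict approach above is to sort on a tuple. This is only a sketch under the assumption that ties should go to the smaller value first; it uses Python 3 syntax and made-up sample data:

```python
from collections import defaultdict

data = [5, 1, 5, 1, 9]  # 5 and 1 both occur twice

tally = defaultdict(int)
for value in data:
    tally[value] += 1

# Negate the count for descending order; the value itself breaks ties ascending.
ranked = sorted(tally.items(), key=lambda pair: (-pair[1], pair[0]))
print(ranked)
```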
G
0
def myfun(x, y):
    # Compare by count, highest first (y minus x gives descending order)
    return y[1] - x[1]

list1 = [3, 3, 3, 4, 4, 2]
s1 = set(list1)
newlist = []
for e in s1:
    newlist.append((e,list1.count(e)))
print sorted(newlist,cmp=myfun)

I think this is what you asked for. Sorry for rushing the first answer. Note that the cmp argument to sorted is not available in Python 3.

Granule answered 25/1, 2011 at 9:13 Comment(1)
See O(n**2) comments at #1829970.Haemocyte
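Since the answer above notes that `cmp` is gone in Python 3, here is one way a comparison function might still be used there, via `functools.cmp_to_key`. This is a sketch rather than a recommendation (a plain `key=` function is usually simpler), and the name `by_count_desc` is invented for the example:

```python
from functools import cmp_to_key


def by_count_desc(x, y):
    # Negative when x has the higher count, so x sorts first.
    return y[1] - x[1]


list1 = [3, 3, 3, 4, 4, 2]
pairs = [(e, list1.count(e)) for e in set(list1)]
result = sorted(pairs, key=cmp_to_key(by_count_desc))
print(result)
```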
