In Python, how do I count how often each item occurs in a list, and sort by that count?

D

5

4

[3, 3, 3, 4, 4, 2]

Would be:

[ (3, 3), (4, 2), (2, 1) ]

The output should be sorted from highest count to lowest count. In this case, 3, then 2, then 1.

Dominate answered 25/1, 2011 at 9:6 Comment(6)

Do the groups all occur together or could you have [3,4,2,3,4,3] and expect the same output? Does the order of the output list matter in any way? – Verisimilitude 25/1, 2011 at 9:12

In your example, the output would still be the same. – Dominate 25/1, 2011 at 9:13

Appears to be a duplicate. The accepted answer there directly gives you working code for this, in any case: freq_sorted([3, 3, 3, 4, 4, 2], include_freq=True) gives the exact result you ask for. – Haemocyte 25/1, 2011 at 9:18

At 855 Questions, I'm beginning to wonder if you ever try anything yourself first or are you trying to crowd-source a software project... – Kamila 25/1, 2011 at 9:27

@John actually I like this kind of question, I hope to see thousands more from TIMEX. – Rurik 25/1, 2011 at 9:32

Not only is this a duplicate. It's also homework. – Tomboy 25/1, 2011 at 11:0

R

3

data = [3, 3, 3, 4, 4, 2]
result = []
for entry in set(data):
    result.append((entry, data.count(entry)))
result.sort(key = lambda x: -x[1])
print result

>>[(3, 3), (4, 2), (2, 1)]

Rurik answered 25/1, 2011 at 9:15 Comment(6)

The output should be sorted by highest count first to lowest count. – Dominate 25/1, 2011 at 9:19

so print sorted(result, key=lambda x: -x[1]) instead of print result – Stoneham 25/1, 2011 at 9:22

and result = sorted(result, key = lambda x: -x[1]) can be written as result.sort(key = lambda x: -x[1]) – Stoneham 25/1, 2011 at 9:38

See O(n**2) comments at #1829970 – Haemocyte 25/1, 2011 at 9:41

@TIMEX: Yes, it's a loop bounded by O(n) (data.count) inside a loop bounded by O(n) (the outer for loop). In other words, each item has to do something to every other item, or n*n, which is n**2. – Haemocyte 26/1, 2011 at 1:34

The solution can be made more succinct. Rather than initializing result as an empty list and using a for loop with append, one can use a list comprehension: result = [(entry, data.count(entry)) for entry in set(data)] (cf. docs.python.org/2/tutorial/datastructures.html, section 5.1.4). – Luannaluanne 28/8, 2016 at 20:48
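
The quadratic cost described in the comments comes from calling data.count once per distinct value. A minimal sketch of a single-pass alternative, written in Python 3 syntax; the name count_sorted is made up for illustration and does not appear in the thread:

```python
def count_sorted(data):
    """Return (value, count) pairs, highest count first."""
    counts = {}
    for entry in data:
        # One dictionary update per element, so counting is O(n) overall
        counts[entry] = counts.get(entry, 0) + 1
    # Sorting touches only the distinct values, not every element
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


print(count_sorted([3, 3, 3, 4, 4, 2]))  # [(3, 3), (4, 2), (2, 1)]
```

For a six-element list the difference is negligible; it starts to matter once the list has many distinct values.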

G

13

You can use a Counter in Python 2.7+ (this recipe works on 2.5+):

from collections import Counter
print Counter([3, 3, 3, 4, 4, 2]).most_common()
# [(3, 3), (4, 2), (2, 1)]

Grandam answered 25/1, 2011 at 9:18 Comment(2)

TIMEX, haven't you heard that Python comes with 'batteries included'? – Verisimilitude 25/1, 2011 at 9:23

But Counter is a new collection class. :) This is the correct answer if you are on 2.7 or 3.x – Granule 25/1, 2011 at 9:27
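
For readers on Python 3, the same Counter approach might look like the sketch below. It assumes only the standard library; the variable names are illustrative. most_common also accepts an optional number of entries to return, which helps when only the top few values are needed.

```python
from collections import Counter

data = [3, 3, 3, 4, 4, 2]
counts = Counter(data)

# All (value, count) pairs, highest count first
print(counts.most_common())   # [(3, 3), (4, 2), (2, 1)]

# Only the single most frequent value
print(counts.most_common(1))  # [(3, 3)]
```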

V

2

Try using a collections.Counter:

from collections import Counter
data = [3,4,2,3,4,3]
Counter(data).most_common()

Verisimilitude answered 25/1, 2011 at 9:19 Comment(1)

Which version of Python are you using? I think Counter was introduced in 2.6. – Verisimilitude 25/1, 2011 at 9:27
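
On the question raised in the comments about whether equal values must sit next to each other: a quick check, in Python 3 syntax, suggests they need not. The two sample lists are the ones quoted in this thread; the variable names are made up for the example.

```python
from collections import Counter

grouped = [3, 3, 3, 4, 4, 2]
interleaved = [3, 4, 2, 3, 4, 3]

# Counter tallies each value wherever it appears, so grouping is irrelevant
same = Counter(grouped).most_common() == Counter(interleaved).most_common()
print(same)  # True
```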

J

2

Why would you choose an O(n**2) algorithm to do this? The alternative to Counter (if you have Python < 2.7) is not too difficult:

>>> from operator import itemgetter
>>> from collections import defaultdict
>>> L=[3, 3, 3, 4, 4, 2]
>>> D=defaultdict(int)
>>> for i in L:
...     D[i]+=1
... 
>>> sorted(D.items(), key=itemgetter(1), reverse=True)
[(3, 3), (4, 2), (2, 1)]

Jobie answered 25/1, 2011 at 10:16 Comment(0)
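
The thread does not say how values with equal counts should be ordered. If a predictable order is wanted, one option is a compound sort key, sketched here in Python 3 syntax. The function name count_sorted_with_ties is hypothetical, and the tie-break (smaller value first) is an assumption that requires the values to be mutually comparable.

```python
from collections import defaultdict


def count_sorted_with_ties(items):
    """Return (value, count) pairs: highest count first, ties by value."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    # Negating the count sorts it descending; the value breaks ties ascending
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


print(count_sorted_with_ties([4, 4, 3, 3, 2]))  # [(3, 2), (4, 2), (2, 1)]
```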

G

0

def myfun(x,y):
    # Compare by count, descending, so the highest count comes first
    return y[1]-x[1]

list1 = [3, 3, 3, 4, 4, 2]
s1 = set(list1)
newlist = []
for e in s1:
    newlist.append((e,list1.count(e)))
print sorted(newlist,cmp=myfun)

I think this is what you asked for. Sorry for rushing the first answer. Just note that the cmp argument to sorted is not available in Python 3.

Granule answered 25/1, 2011 at 9:13 Comment(1)

See O(n**2) comments at #1829970. – Haemocyte 25/1, 2011 at 9:20
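
Since sorted no longer accepts a cmp argument in Python 3, a comparison function like the one in this answer can be adapted with functools.cmp_to_key. A minimal sketch, assuming the (value, count) pairs have already been built; the names by_count_desc and pairs are illustrative:

```python
from functools import cmp_to_key


def by_count_desc(x, y):
    # Negative when x should come first, i.e. when x has the larger count
    return y[1] - x[1]


pairs = [(2, 1), (3, 3), (4, 2)]
print(sorted(pairs, key=cmp_to_key(by_count_desc)))  # [(3, 3), (4, 2), (2, 1)]
```

For a comparison this simple, a plain key function such as key=lambda pair: -pair[1] gives the same order with less machinery.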
