Grouping the same recurring items that occur in a row in a list

Asked 26/11, 2012 at 12:52 Answered 26/11, 2012 at 13:44

For instance, we have a list like this:

L = ["item1", "item2", "item3", "item3", "item3", "item1", "item2", "item4", "item4", "item4"]

I want to pack them into a list of tuples of the form:

[("item1", 1), ("item2", 1), ("item3", 3),... ("item1", 1)]

I've already developed an algorithm which does something similar, to get:

{item1: 2, item2: 2, ...}

(it finds all the occurrences and counts them, even if they aren't neighbours...)
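For comparison, a position-independent tally like the one described above can be sketched with the standard library's collections.Counter. This assumes that is roughly what the asker's existing algorithm computes (its code isn't shown), and it illustrates why that approach merges the two separate runs of "item1":

```python
from collections import Counter

L = ["item1", "item2", "item3", "item3", "item3",
     "item1", "item2", "item4", "item4", "item4"]

# Counter tallies every occurrence regardless of position, so the two
# separate runs of "item1" (and of "item2") are merged into one count.
totals = Counter(L)
print(dict(totals))
# {'item1': 2, 'item2': 2, 'item3': 3, 'item4': 3}  (key order as on Python 3.7+)
```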

However, I want to group only those items which are the same and are neighbours (i.e. occur in a row together). How could I accomplish this?

It's not that I don't know how to do it, but I tend to write long code, and in this case I'd like an elegant, uncomplicated solution.
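For reference, a plain-loop version without itertools might look like the following minimal sketch; the function name group_runs is hypothetical. It walks the list once and extends the last tuple whenever the current item matches it:

```python
def group_runs(items):
    """Return (item, run_length) tuples for each run of consecutive equal items."""
    runs = []
    for item in items:
        if runs and runs[-1][0] == item:
            # Same as the previous item: extend the current run.
            runs[-1] = (item, runs[-1][1] + 1)
        else:
            # First item, or a different item: start a new run.
            runs.append((item, 1))
    return runs


L = ["item1", "item2", "item3", "item3", "item3",
     "item1", "item2", "item4", "item4", "item4"]
print(group_runs(L))
# [('item1', 1), ('item2', 1), ('item3', 3), ('item1', 1), ('item2', 1), ('item4', 3)]
```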

Endometriosis asked 26/11, 2012 at 12:52 Comment(2)

{item1: 1, item2: 1, item3: 3, ..., item1: 1} would not be a dictionary (it has more than one of the same key). – Woo 26/11, 2012 at 12:59

Sorry about that, that's true. Ignore the part where I wrote that it must be a dictionary; that isn't a requirement. It can be a list of tuples, since the order matters, of course. – Endometriosis 26/11, 2012 at 13:3

Use itertools.groupby(). Since items repeat (item1 and item2 each appear twice), you won't be able to store all the values in a dictionary:

In [20]: from itertools import groupby

In [21]: l = ["item1", "item2", "item3", "item3", "item3", "item1", "item2", "item4", "item4", "item4"]

In [22]: for k, g in groupby(l):
   ....:     print("{0}:{1}".format(k, len(list(g))))
   ....:
item1:1
item2:1
item3:3
item1:1
item2:1
item4:3
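Since the question asks for a list of tuples rather than printed output, the same groupby() loop can be wrapped in a list comprehension. A minimal Python 3 sketch; run_lengths is a hypothetical helper name:

```python
from itertools import groupby


def run_lengths(items):
    """Collapse each run of consecutive equal items into an (item, count) tuple."""
    # len(list(group)) consumes each group before groupby() moves on,
    # which matters because a group is no longer visible once groupby() advances.
    return [(key, len(list(group))) for key, group in groupby(items)]


l = ["item1", "item2", "item3", "item3", "item3",
     "item1", "item2", "item4", "item4", "item4"]
print(run_lengths(l))
# [('item1', 1), ('item2', 1), ('item3', 3), ('item1', 1), ('item2', 1), ('item4', 3)]
```

Counting inside the comprehension is the important design point: if the group iterators were collected first and counted later, the earlier groups would no longer be usable, because they share the underlying iterator with groupby().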

Noguchi answered 26/11, 2012 at 13:0 Comment(3)

No need for a key function in this case. – Colleague 26/11, 2012 at 13:5

Great, thanks. I was indeed looking for the groupby function. And yes, there is no need for the key lambda function. It works flawlessly :) – Endometriosis 26/11, 2012 at 13:11

You can also use ilen from the funcy library instead of len(list(...)) for speed. – Johanson 4/6, 2014 at 19:46
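If a third-party dependency isn't wanted, a similar counting helper can be written with the standard library alone. The sketch below uses a hypothetical name, iter_len; it is not funcy's actual source, and whether it beats len(list(...)) is something to measure rather than assume:

```python
from collections import deque
from itertools import count, groupby


def iter_len(iterable):
    """Count the items of an iterable by consuming it, without building a list."""
    counter = count()
    # zip() pulls one value from `counter` per item; a zero-length deque
    # discards every pair, so memory use does not grow with the input.
    deque(zip(iterable, counter), maxlen=0)
    # `counter` was advanced once per item, so its next value is the total.
    return next(counter)


l = ["item1", "item2", "item3", "item3", "item3",
     "item1", "item2", "item4", "item4", "item4"]
print([(k, iter_len(g)) for k, g in groupby(l)])
# [('item1', 1), ('item2', 1), ('item3', 3), ('item1', 1), ('item2', 1), ('item4', 3)]
```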

This also uses itertools.groupby (a generator version):

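>>> from itertools import groupby
>>> l = ["item1", "item2", "item3", "item3", "item3", "item1", "item2", "item4", "item4", "item4"]
>>> counts = ((k, sum(1 for _ in g)) for k, g in groupby(l))
>>> list(counts)
[('item1', 1),
 ('item2', 1),
 ('item3', 3),
 ('item1', 1),
 ('item2', 1),
 ('item4', 3)]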
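Because counts is a generator, it can be consumed only once (a second list(counts) returns an empty list), and it never needs the whole input in memory. The same idea packaged as a generator function (run_lengths_lazy is a hypothetical name) also works on an endless input when combined with itertools.islice:

```python
from itertools import cycle, groupby, islice


def run_lengths_lazy(iterable):
    """Yield (item, count) pairs one run at a time from any iterable."""
    for key, group in groupby(iterable):
        yield key, sum(1 for _ in group)


# A finite string works like a list of characters.
print(list(run_lengths_lazy("aaabccdddd")))
# [('a', 3), ('b', 1), ('c', 2), ('d', 4)]

# cycle() repeats "aab" forever; islice() stops after the first three runs,
# so only a handful of items are ever read from the infinite input.
print(list(islice(run_lengths_lazy(cycle("aab")), 3)))
# [('a', 2), ('b', 1), ('a', 2)]
```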

Formative answered 26/11, 2012 at 13:2 Comment(3)

len(list(g)) is shorter than sum(1 for _ in g), +1 anyway. – Noguchi 26/11, 2012 at 13:13

@AshwiniChaudhary it's shorter, but I figured it could be faster; I tend to think it's a good idea to avoid creating a list just to count its elements. Thanks for the upvote :) – Formative 26/11, 2012 at 13:15

Good point, I just timed them: sum(1 for _ in g) < len(tuple(g)) < len(list(g)). Learned something new today. :) – Noguchi 26/11, 2012 at 13:22
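To reproduce a comparison like this, a rough timeit sketch follows. The helper names and the input size are assumptions, and the ranking can differ across Python versions and run lengths, so no particular result is claimed here:

```python
import timeit
from itertools import groupby

# Hypothetical benchmark input: the question's list repeated 1000 times.
data = ["item1", "item2", "item3", "item3", "item3",
        "item1", "item2", "item4", "item4", "item4"] * 1000


def with_sum(items):
    return [(k, sum(1 for _ in g)) for k, g in groupby(items)]


def with_tuple(items):
    return [(k, len(tuple(g))) for k, g in groupby(items)]


def with_list(items):
    return [(k, len(list(g))) for k, g in groupby(items)]


for fn in (with_sum, with_tuple, with_list):
    # timeit calls the lambda right away, so it always sees the current fn.
    seconds = timeit.timeit(lambda: fn(data), number=50)
    print(f"{fn.__name__}: {seconds:.3f} s")
```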

Python 3.2:

>>> from itertools import groupby
>>> [(i, list(v).count(i)) for i, v in groupby(L)]
[('item1', 1), ('item2', 1), ('item3', 3), ('item1', 1), ('item2', 1), ('item4', 3)]
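Here list(v).count(i) gives the run length only because, without a key function, every item in a group equals the group's key. With a key function that no longer holds, as this small sketch (using str.lower as the key) shows, so len(list(v)) is the safer general choice:

```python
from itertools import groupby

words = ["apple", "Apple", "banana"]

for key, group in groupby(words, key=str.lower):
    members = list(group)
    # count(key) counts only members equal to the key itself;
    # len() counts every member of the run.
    print(key, members.count(key), len(members))
# apple 1 2
# banana 1 1
```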

Adlei answered 26/11, 2012 at 13:44 Comment(0)
