Grouping identical consecutive items in a list
E

3

2

For instance, we have a list like this:

L = ["item1", "item2", "item3", "item3", "item3", "item1", "item2", "item4", "item4", "item4"]

I want to pack them into a list of tuples of the form:

[("item1", 1), ("item2", 1), ("item3", 3),... ("item1", 1)]

I've already developed an algorithm which does something similar, to get:

{item1: 2, item2: 2, ...}

(it finds all the occurrences and counts them, even if they aren't neighbours...)

However, I want it to group only those items that are identical and adjacent (i.e. occur in a row). How could I accomplish this?

It's not that I don't know how to do it, but I tend to write long code, and I want an elegant, uncomplicated solution in this case.

Endometriosis answered 26/11, 2012 at 12:52 Comment(2)
item1: 1, item2: 1, item3: 3, ..., item1: 1 could not be a dictionary (it has the same key more than once).Woo
Sorry about that, that is true. Ignore the fact that I wrote that it must be a dictionary; that is not a requirement. It can be in the form of tuples, so the order is important, of course.Endometriosis
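To make the point from the comments concrete, here is a small sketch of why a dictionary cannot hold the desired result. The names totals, runs and as_dict are only illustrative, and the runs list is written out by hand from the example above rather than computed.

```python
from collections import Counter

L = ["item1", "item2", "item3", "item3", "item3",
     "item1", "item2", "item4", "item4", "item4"]

# Counting every occurrence, regardless of position, merges separate runs.
totals = Counter(L)

# The desired result, written out by hand: one pair per run, in order.
runs = [("item1", 1), ("item2", 1), ("item3", 3),
        ("item1", 1), ("item2", 1), ("item4", 3)]

# Converting the pairs to a dict keeps only one entry per key,
# so the second "item1" and "item2" runs are lost.
as_dict = dict(runs)

print(totals["item1"])           # 2
print(len(runs), len(as_dict))   # 6 4
```

So a list of tuples (or any other ordered sequence of pairs) is the natural container here.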
N
4

Use itertools.groupby(). Because item1 and item2 each occur in more than one run, the results cannot be stored in a dictionary keyed by item:

In [20]: from itertools import groupby

In [21]: l = ["item1", "item2", "item3", "item3", "item3", "item1", "item2", "item4", "item4", "item4"]

In [22]: for k, g in groupby(l):
   ....:     print("{0}:{1}".format(k, len(list(g))))
   ....:
item1:1
item2:1
item3:3
item1:1
item2:1
item4:3
Noguchi answered 26/11, 2012 at 13:0 Comment(3)
No need for a key function in this case.Colleague
Great, thanks. I was indeed looking for the groupby function. And yes, a key lambda function is not needed here. It works flawlessly :)Endometriosis
You can also use ilen from the funcy library instead of len(list(...)) for speed.Johanson
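On the key function mentioned in the comments: it is not needed when items should match exactly, but it may help when "the same" means something looser. A minimal sketch, assuming a hypothetical list words where items should be grouped ignoring case:

```python
from itertools import groupby

# Hypothetical data: same items as before, but with inconsistent casing.
words = ["Item1", "item1", "ITEM2", "item2", "item2", "item1"]

# key=str.lower makes groupby compare the lowercased form of each item,
# so consecutive items that differ only in case fall into one run.
runs = [(key, sum(1 for _ in group))
        for key, group in groupby(words, key=str.lower)]

print(runs)  # [('item1', 2), ('item2', 3), ('item1', 1)]
```

Note that the key returned for each run is the lowercased value, not the original item.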
F
5

This also uses itertools.groupby (a generator version):

>>> from itertools import groupby
>>> counts = ((k, sum(1 for _ in g)) for k, g in groupby(l))
>>> list(counts)
[('item1', 1),
 ('item2', 1),
 ('item3', 3),
 ('item1', 1),
 ('item2', 1),
 ('item4', 3)]
Formative answered 26/11, 2012 at 13:2 Comment(3)
len(list(g)) is shorter than sum(1 for _ in g), +1 anyway.Noguchi
@AshwiniChaudhary it's shorter, but I figured it could be faster; I tend to think it's a good idea to avoid creating a list just to count its elements. Thanks for the upvote :)Formative
Good point, just timed them: sum(1 for _ in g) < len(tuple(g)) < len(list(g)). Learned something new today. :)Noguchi
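For anyone who wants to repeat the timing from the comments, a rough sketch with timeit follows. The function names, the data size and the repeat count are arbitrary choices for illustration; the ordering reported above may not hold on other Python versions or for other run lengths, so treat any result as specific to your own setup.

```python
from itertools import groupby
from timeit import timeit

# Repeat the sample list to get a larger input (size chosen arbitrarily).
data = ["item1", "item2", "item3", "item3", "item3",
        "item1", "item2", "item4", "item4", "item4"] * 1000

def count_with_sum(items):
    # Counts each run without building an intermediate container.
    return [(k, sum(1 for _ in g)) for k, g in groupby(items)]

def count_with_len_tuple(items):
    # Materializes each run as a tuple, then takes its length.
    return [(k, len(tuple(g))) for k, g in groupby(items)]

def count_with_len_list(items):
    # Materializes each run as a list, then takes its length.
    return [(k, len(list(g))) for k, g in groupby(items)]

for fn in (count_with_sum, count_with_len_tuple, count_with_len_list):
    seconds = timeit(lambda: fn(data), number=5)
    print("{0}: {1:.4f} s".format(fn.__name__, seconds))
```

All three variants return the same pairs; only the way each run is counted differs.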
A
0
Python 3.2:

>>> from itertools import groupby
>>> [(i, list(v).count(i)) for i, v in groupby(L)]
Adlei answered 26/11, 2012 at 13:44 Comment(0)
