Grouping Python tuple list

8

31

I have a list of (label, count) tuples like this:

[('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10), ('apple', 4), ('banana', 3)]

From that I want to sum all values with the same label (same labels always adjacent) and return a list in the same label order:

[('grape', 103), ('apple', 29), ('banana', 3)]

I know I could solve it with something like:

def group(l):
    result = []
    if l:
        this_label = l[0][0]
        this_count = 0
        for label, count in l:
            if label != this_label:
                result.append((this_label, this_count))
                this_label = label
                this_count = 0
            this_count += count
        result.append((this_label, this_count))
    return result

But is there a more Pythonic / elegant / efficient way to do this?

Incommunicative answered 12/2, 2010 at 1:19 Comment(0)
T
44

itertools.groupby can do what you want:

import itertools
import operator

L = [('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10),
     ('apple', 4), ('banana', 3)]

def accumulate(l):
    it = itertools.groupby(l, operator.itemgetter(0))
    for key, subiter in it:
        yield key, sum(item[1] for item in subiter)

print(list(accumulate(L)))
# [('grape', 103), ('apple', 29), ('banana', 3)]
Tacmahack answered 12/2, 2010 at 1:26 Comment(3)
I like the use of operator.itemgetter in place of lambda. - Godart
This requires the list to be sorted on the first key. If it isn't already sorted, then the defaultdict approach from ghostdog74 is a much better solution. - Clouded
Why would you use operator instead of lambda? - Gerstner
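On the itemgetter-versus-lambda question, a minimal sketch may help: both key functions give groupby the same key, so the grouped sums come out identical. The helper name sum_adjacent and the variable names are hypothetical and chosen only for illustration.

```python
import itertools
import operator

pairs = [('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10),
         ('apple', 4), ('banana', 3)]

def sum_adjacent(items, key):
    # Hypothetical helper: sum the counts of each run of adjacent items
    # that share the same key.
    return [(label, sum(count for _, count in group))
            for label, group in itertools.groupby(items, key)]

with_itemgetter = sum_adjacent(pairs, operator.itemgetter(0))
with_lambda = sum_adjacent(pairs, lambda item: item[0])

print(with_itemgetter == with_lambda)  # True
```

Which one to use is largely a matter of taste. itemgetter(0) names the intent and avoids defining an anonymous function, while the lambda needs no extra import.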
X
8

Using itertools and a list comprehension:

import itertools

[(key, sum(num for _, num in value))
    for key, value in itertools.groupby(l, lambda x: x[0])]

Edit: as gnibbler pointed out, if l isn't already sorted, replace it with sorted(l).

Xylon answered 12/2, 2010 at 1:25 Comment(2)
To use groupby you must first ensure that the sequence is pregrouped (all the 'grape' entries adjacent, etc.). One way to do that is to sort the sequence first. - Scaleboard
@Thomas Wouters, yes, you are correct ("same labels are always adjacent"). - Scaleboard
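To illustrate the pregrouping requirement, here is a small sketch of what groupby does when equal labels are not adjacent, and how sorting first changes the result. The input list is the unsorted one from another answer in this thread; the helper name sum_runs is hypothetical.

```python
import itertools
import operator

unsorted_pairs = [('grape', 100), ('banana', 3), ('apple', 10),
                  ('apple', 4), ('grape', 3), ('apple', 15)]

def sum_runs(items):
    # groupby only merges *adjacent* items with equal keys, so a label
    # that appears in several separate runs yields several entries.
    get_label = operator.itemgetter(0)
    return [(label, sum(count for _, count in group))
            for label, group in itertools.groupby(items, get_label)]

runs = sum_runs(unsorted_pairs)
print(runs)
# [('grape', 100), ('banana', 3), ('apple', 14), ('grape', 3), ('apple', 15)]

# Sorting first makes equal labels adjacent, at the cost of O(n log n)
# time and of losing the original label order.
sorted_result = sum_runs(sorted(unsorted_pairs))
print(sorted_result)
# [('apple', 29), ('banana', 3), ('grape', 103)]
```

For the question as asked, where equal labels are always adjacent, no sorting is needed and the original label order is preserved.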
S
6
import collections

d = collections.defaultdict(int)  # running total per fruit
a = []                            # fruits in first-seen order
alist = [('grape', 100), ('banana', 3), ('apple', 10), ('apple', 4), ('grape', 3), ('apple', 15)]
for fruit, number in alist:
    if fruit not in a:
        a.append(fruit)
    d[fruit] += number
for f in a:
    print((f, d[f]))  # print the pair as a tuple, matching the output below

output

$ ./python.py
('grape', 103)
('banana', 3)
('apple', 29)
Socialization answered 12/2, 2010 at 1:45 Comment(1)
This searches the list a for each item, which makes your algorithm O(n^2). Not a good thing. - Shul
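One way to avoid the quadratic membership test is to let a plain dict track both the totals and the first-seen order. This is a sketch assuming Python 3.7 or later, where dicts are guaranteed to preserve insertion order; the function name sum_by_label is hypothetical.

```python
def sum_by_label(pairs):
    # A dict lookup is O(1) on average, so the whole pass is O(n).
    # Assumes Python 3.7+, where dicts keep insertion order.
    totals = {}
    for label, count in pairs:
        totals[label] = totals.get(label, 0) + count
    return list(totals.items())

alist = [('grape', 100), ('banana', 3), ('apple', 10),
         ('apple', 4), ('grape', 3), ('apple', 15)]
result = sum_by_label(alist)
print(result)
# [('grape', 103), ('banana', 3), ('apple', 29)]
```

Unlike groupby, this does not require equal labels to be adjacent, and each label appears in the position where it was first seen.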
S
5
>>> from itertools import groupby
>>> from operator import itemgetter
>>> L=[('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10), ('apple', 4), ('banana', 3)]
>>> [(x,sum(map(itemgetter(1),y))) for x,y in groupby(L, itemgetter(0))]
[('grape', 103), ('apple', 29), ('banana', 3)]
Scaleboard answered 12/2, 2010 at 1:49 Comment(0)
F
4

My version without itertools:
[(k, sum([y for (x,y) in l if x == k])) for k in dict(l).keys()]

Flied answered 19/4, 2017 at 12:51 Comment(0)
J
1

Method

def group_by(my_list):
    result = {}
    for k, v in my_list:
        result[k] = v if k not in result else result[k] + v
    return result 

Usage

my_list = [
    ('grape', 100), ('grape', 3), ('apple', 15),
    ('apple', 10), ('apple', 4), ('banana', 3)
]

group_by(my_list) 

# Output: {'grape': 103, 'apple': 29, 'banana': 3}

You can convert the result to a list of tuples with list(group_by(my_list).items()).

Jahdal answered 16/5, 2018 at 7:12 Comment(0)
K
0

Or a simpler, more readable answer (without itertools):

pairs = [('foo',1),('bar',2),('foo',2),('bar',3)]

def sum_pairs(pairs):
  sums = {}
  for pair in pairs:
    sums.setdefault(pair[0], 0)
    sums[pair[0]] += pair[1]
  return sums.items()

print(list(sum_pairs(pairs)))
Koeppel answered 10/7, 2016 at 18:29 Comment(0)
T
0

Simpler answer without any third-party libraries:

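# Example input, taken from the question
alist = [('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10),
         ('apple', 4), ('banana', 3)]
dct = {}

for key, value in alist:
    if key not in dct:
        dct[key] = value
    else:
        dct[key] += value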
Tartuffe answered 8/7, 2022 at 9:47 Comment(1)
I don't see any third-party libraries here. itertools, operator, and collections are all part of the standard library. They come with Python. - Cannell
