python: sum values in a list if they share the first word

Asked 16/3, 2022 at 13:26 Answered 28/3, 2022 at 20:54

I have a list as follows:

flat_list = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2', 'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

I would like to sum the values that share the same word, so the output should be:

desired output:

l = [('hello',19), ('yellow', 9), ('mellow',13)]

So far, I have tried the following:

new_list = [v.split(',') for v in flat_list]

d = {}
for key, value in new_list:
   if key not in d.keys():
      d[key] = [key]
   d[key].append(value)

# getting rid of the first key in value lists
val = [val.pop(0) for k,val in d.items()]
# summing up the values
va = [sum([int(x) for x in va]) for ka,va in d.items()]

However, for some reason the last sum up does not work and I do not get my desired output.

Hifi asked 16/3, 2022 at 13:26 Comment(0)

Here is a variant for accomplishing your goal using defaultdict:

from collections import defaultdict

t = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2',
     'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

count = defaultdict(int)

for name_number in t:
    name, number = name_number.split(",")
    count[name] += int(number)

You could also use Counter:

from collections import Counter

count = Counter()

for name_number in t:
    name, number = name_number.split(",")
    count[name] += int(number)

In both cases you can convert the output to a list of tuples using:

list(count.items())
# -> [('hello', 19), ('mellow', 13), ('yellow', 9)]

I ran your code and I do get the correct results (although not in your desired format).
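If the totals should also come out ordered by size, the Counter variant has a built-in for that. A minimal sketch, reusing the list `t` from above; `most_common()` is a standard `Counter` method that returns the (key, count) pairs from the highest count to the lowest, and the name `by_total` is only chosen here for illustration:

```python
from collections import Counter

t = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2',
     'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

count = Counter()
for name_number in t:
    name, number = name_number.split(",")
    count[name] += int(number)

# Pairs ordered from the largest total to the smallest
by_total = count.most_common()
print(by_total)
# -> [('hello', 19), ('mellow', 13), ('yellow', 9)]
```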

Ingratiate answered 16/3, 2022 at 13:29 Comment(0)

One possible approach would be:

import pandas as pd

flat_list = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2', 'yellow,7', 'hello,7', 'mellow,7', 'hello,7']
new_list = [v.split(',') for v in flat_list]

df = pd.DataFrame(new_list)
df[1] = df[1].astype(int)
df2 = df.groupby(0).sum()
print(df2)

Output:

             1
    0
    hello   19
    mellow  13
    yellow   9

Ensemble answered 16/3, 2022 at 13:35 Comment(0)

You can do this very simply without importing additional modules like so:

t = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2', 'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

d = {}
for s in t: #for each string
    w, n = s.split(',') #get the string and the number
    d[w] = d[w] + int(n) if w in d.keys() else int(n) #add the number (sum)

l = list(d.items()) #make the result a list of tuples
print(l)

Output:

[('hello', 19), ('mellow', 13), ('yellow', 9)]

Gamber answered 16/3, 2022 at 13:41 Comment(2)

And I'd find d[w] = d.get(w, 0) + int(n) simpler. – Ondometer 16/3, 2022 at 18:42

@KellyBundy sorry I edited to include the OP's variable name and forgot the print statement. Thanks, updated. – Gamber 16/3, 2022 at 19:4

the last sum up does not work and i do not get my desired output

Actually it works fine, you just forgot to combine the two lists. Add

print(list(zip(val, va)))

and you'll see:

[('hello', 19), ('mellow', 13), ('yellow', 9)]

That's equivalent to your desired output:

[('hello',19), ('yellow', 9), ('mellow',13)]

Only the entries for yellow and mellow are in different order, since mellow appears first in the input.
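Put together, the code from the question plus the zip line might look like the sketch below. The loop variable `values` and the names `result` and `desired` are introduced here for readability and are not from the question; the last line compares both lists with the order ignored.

```python
flat_list = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2',
             'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

new_list = [v.split(',') for v in flat_list]

d = {}
for key, value in new_list:
    if key not in d:
        d[key] = [key]
    d[key].append(value)

# Pop the leading key off each value list; val becomes the list of keys
val = [values.pop(0) for values in d.values()]
# Sum the remaining number strings; va becomes the list of totals
va = [sum(int(x) for x in values) for values in d.values()]

# Combine keys and totals into a list of tuples
result = list(zip(val, va))
print(result)
# -> [('hello', 19), ('mellow', 13), ('yellow', 9)]

# Same entries as the desired output once the order is ignored
desired = [('hello', 19), ('yellow', 9), ('mellow', 13)]
print(sorted(result) == sorted(desired))
# -> True
```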

Ondometer answered 16/3, 2022 at 18:38 Comment(0)

Summarising the replies above I'd say that the cleanest way, without the need of external imports, seems to be:

flat_list = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2', 
             'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

d = {}
for ele in flat_list:
    key, value = ele.split(',')
    d[key] = d.get(key, 0) + int(value)

list(d.items())

And the output is:

[('hello', 19), ('mellow', 13), ('yellow', 9)]

That can be sorted by increasing value like this (or alphabetically using x[0]; set reverse to True for descending order):

sorted(list(d.items()), key=lambda x: x[1], reverse=False)
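A short sketch of the sorting variants just mentioned, starting from the summed dict. The variable names are illustrative only. Since `sorted` accepts any iterable, the `list(...)` wrapper is optional and is left out here.

```python
# The summed dict from the snippet above
d = {'hello': 19, 'mellow': 13, 'yellow': 9}

# Increasing value
by_value_asc = sorted(d.items(), key=lambda x: x[1])
print(by_value_asc)
# -> [('yellow', 9), ('mellow', 13), ('hello', 19)]

# Decreasing value
by_value_desc = sorted(d.items(), key=lambda x: x[1], reverse=True)
print(by_value_desc)
# -> [('hello', 19), ('mellow', 13), ('yellow', 9)]

# Alphabetical by word
by_word = sorted(d.items(), key=lambda x: x[0])
print(by_word)
# -> [('hello', 19), ('mellow', 13), ('yellow', 9)]
```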

Sundstrom answered 28/3, 2022 at 20:54 Comment(0)

for some reason the last sum up does not work

To fix your original solution, replace its last statement with:

d = {ka:sum([int(x) for x in va]) for ka,va in d.items()}

Energetics answered 16/3, 2022 at 14:11 Comment(7)

How does this fix anything? That's equivalent to theirs, no? – Ondometer 16/3, 2022 at 18:39

It is not equivalent to theirs. Their original mistakenly added a key to the values AND added the first actual value twice. I think that's what i recall, anyway making those changes produces correct sums. – Energetics 16/3, 2022 at 18:41

Their original works fine, produces the correct sums already. As far as I can tell, you do things differently but achieve the exact same thing. – Ondometer 16/3, 2022 at 18:44

How about now?? – Energetics 16/3, 2022 at 18:54

Now it actually crashes. – Ondometer 16/3, 2022 at 18:58

Funny, I just ran it and it worked for me. All of OPs original with the last line replaced. Feel free to downvote. – Energetics 16/3, 2022 at 19:1

Ah, ok. You used d as input and say you fixed something, so I did this just after their preparation of d, instead of their last two statements. The danger of posting code and not saying what to do with it. If you instead do it instead of their last statement, then it doesn't crash but we're back to not fixing anything. You just replace the dict with another for no apparent reason. – Ondometer 16/3, 2022 at 19:12
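To make the two readings in the comments above concrete, here is a hedged sketch of both placements of the dict comprehension. It assumes `d` is built as in the question (each value list starts with the key itself); `setdefault` is used as a shorthand for the question's if-block, and the names `crashed`, `keys` and `totals` are introduced here for illustration.

```python
flat_list = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2',
             'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

# Build d as in the question: every value list starts with the key itself
d = {}
for item in flat_list:
    key, value = item.split(',')
    d.setdefault(key, [key]).append(value)

# Placement 1: straight after building d. The key is still in each list,
# so int('hello') raises ValueError.
crashed = False
try:
    {ka: sum([int(x) for x in va]) for ka, va in d.items()}
except ValueError as exc:
    crashed = True
    print("crashes:", exc)

# Placement 2: after the question's pop step. Only number strings remain,
# so the sums are computed, the same sums the question's own last line gives.
keys = [values.pop(0) for values in d.values()]
totals = {ka: sum([int(x) for x in va]) for ka, va in d.items()}
print(totals)
# -> {'hello': 19, 'mellow': 13, 'yellow': 9}
```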
