Python: group the tuples of a list that have the same first element

Asked 29/9, 2017 at 16:19 Answered 4/9, 2020 at 18:10

I have a list of tuples like this:

[
(379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1), 
(4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), 
(4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)
]

I would like to get this instead:

[
(379146591, (('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),)), 
(4746004, (('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)))
]

So for each tuple, everything except the first element goes into a sub-tuple, and if another tuple has the same first element, its remaining elements are added as another sub-tuple under that same key.

That way I can do:

for i in data:
    # i[0] is the shared first element (the key)
    for sub_i in i[1]:
        # sub_i is each grouped tuple stored under that key
        print(i[0], sub_i)

Is there a built-in function to do this?

Sites asked 29/9, 2017 at 16:19

I think this would be a good use for a dictionary. You can use the first element as the key and the value can be a list of tuples. – Bipinnate 29/9, 2017 at 16:25

@Bipinnate OK, thank you. I was wondering whether some library already has a function like this or whether I have to write my own. – Sites 29/9, 2017 at 16:26

You shouldn't need a library for this. I can write a sample here with dictionaries. Check out the answer by @Psidom – Bipinnate 29/9, 2017 at 16:27

It's pretty simple with defaultdict: make the default value a list, then append each item to the list stored under its key:

from collections import defaultdict

lst = [
    (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),
    (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
    (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)
]

d = defaultdict(list)

for k, *v in lst:
    # star-unpacking makes v a list; convert it back to a tuple
    d[k].append(tuple(v))

list(d.items())
# (key order shown is for Python 3.7+; older versions may order keys differently)
#[(379146591, [('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1)]),
# (4746004,
#  [('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
#   ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)])]

If order is important, use an OrderedDict, which remembers insertion order:

from collections import OrderedDict
d = OrderedDict()

for k, *v in lst:
    d.setdefault(k, []).append(v)

list(d.items())
#[(379146591, [['it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1]]),
# (4746004,
#  [['it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2],
#   ['it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3]])]

Lowborn answered 29/9, 2017 at 16:28

You don't need OrderedDict starting with CPython 3.6 and all other Python implementations starting with Python 3.7 because regular dicts maintain insertion order – Deliverance 9/9, 2020 at 17:52
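As a sketch of that comment's point, assuming Python 3.7 or later: a plain dict with setdefault keeps keys in first-seen order and can produce exactly the shape asked for in the question (a list of (key, tuple of tuples) pairs). The names rows, groups and grouped are illustrative only.

```python
# Assumes Python 3.7+, where plain dicts preserve insertion order.
rows = [
    (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),
    (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
    (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3),
]

groups = {}
for key, *rest in rows:
    # setdefault creates the empty list the first time a key is seen
    groups.setdefault(key, []).append(tuple(rest))

# Freeze each group into a tuple of tuples, keeping first-seen key order.
grouped = [(key, tuple(items)) for key, items in groups.items()]

for key, items in grouped:
    print(key, len(items))
# 379146591 1
# 4746004 2
```

No import is needed; the only difference from the OrderedDict version is relying on the ordering guarantee of the built-in dict.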

Use itertools.groupby (with operator.itemgetter to extract the first item). The catch is that groupby only merges consecutive items, so your data must already be sorted (or at least grouped) by the key so that equal keys sit next to each other; it's the same idea as piping sort into uniq in a shell. You can use sorted() for this:

import operator
from itertools import groupby

data = [
    (379146591, "it", 55, 1, 1, "NON ENTRARE", "NonEntrate", 55, 1),
    (4746004, "it", 28, 2, 2, "NON ENTRARE", "NonEntrate", 26, 2),
    (4746004, "it", 28, 2, 2, "TheBestTroll Group", "TheBestTrollGroup", 2, 3),
]

data = sorted(data, key=operator.itemgetter(0))  # unnecessary if equal keys are already adjacent
for k, g in groupby(data, operator.itemgetter(0)):
    print(k, list(g))

This outputs:

4746004 [(4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)]
379146591 [(379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1)]

In your case, you also need to remove the first element from your lists of values. Change the last two lines of the above to:

for k, g in groupby(data, operator.itemgetter(0)):
    print(k, [item[1:] for item in g])

Output:

4746004 [('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)]
379146591 [('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1)]
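To see why the sorting step matters: groupby starts a new group every time the key changes, so when equal keys are not adjacent, the same key shows up in more than one group. A small sketch with made-up rows (the values are illustrative only, not from the question):

```python
import operator
from itertools import groupby

# Hypothetical rows: key 1 appears, then key 2, then key 1 again.
rows = [(1, 'a'), (2, 'b'), (1, 'c')]
first = operator.itemgetter(0)

# Without sorting, key 1 is split into two separate groups.
unsorted_groups = [(k, [r[1:] for r in g]) for k, g in groupby(rows, first)]
print(unsorted_groups)
# [(1, [('a',)]), (2, [('b',)]), (1, [('c',)])]

# Sorting by the same key first makes equal keys adjacent.
sorted_rows = sorted(rows, key=first)
sorted_groups = [(k, [r[1:] for r in g]) for k, g in groupby(sorted_rows, first)]
print(sorted_groups)
# [(1, [('a',), ('c',)]), (2, [('b',)])]
```

Because sorted() is stable, rows with the same key keep their original relative order inside each group.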

Deliverance answered 4/9, 2020 at 18:10

You can use Python 3 extended iterable unpacking and an OrderedDict to retain order:

from collections import OrderedDict

d = OrderedDict()
l = [
    (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),
    (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
    (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)
]

for a, *b in l:
    if a in d:
        d[a].append(b)
    else:
        d[a] = [b]

# convert each group from a list of lists into a tuple of tuples
final_data = [(a, tuple(map(tuple, b))) for a, b in d.items()]

Output:

[(379146591, (('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),)), (4746004, (('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)))]

Mattingly answered 29/9, 2017 at 16:31

You don't need OrderedDict starting with CPython 3.6 and all other Python implementations starting with Python 3.7 because regular dicts maintain insertion order – Deliverance 9/9, 2020 at 17:50

You can use collections.defaultdict:

from collections import defaultdict

data = [
    (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),
    (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
    (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)
]

a = defaultdict(list)

for d in data:
    # key on the first element, keep the remaining elements as a tuple
    a[d[0]].append(d[1:])

# turn each list of tuples into a tuple of tuples
for k, v in a.items():
    a[k] = tuple(v)

print(dict(a))

Tanh answered 29/9, 2017 at 16:38
