Inverting a dictionary with list values
Asked Answered
C

6

26

I have this index as a dict.

index = {
    'Testfil2.txt': ['nisse', 'hue', 'abe', 'pind'],
    'Testfil1.txt': ['hue', 'abe', 'tosse', 'svend']}

I need to invert the index so it will be a dict with duplicates of values merged into one key with the 2 original keys as values, like this:

inverse = {
    'nisse': ['Testfil2.txt'],
    'hue': ['Testfil2.txt', 'Testfil1.txt'],
    'abe': ['Testfil2.txt', 'Testfil1.txt'],
    'pind': ['Testfil2.txt'], 
    'tosse': ['Testfil1.txt'],
    'svend': ['Testfil1.txt']}

My textbook has this function for inverting dictionaries:

def invert_dict(d): 
    inverse = dict() 
    for key in d: 
        val = d[key] 
        if val not in inverse: 
            inverse[val] = [key] 
        else: 
            inverse[val].append(key) 
    return inverse

It works fine for simple key:value pairs, BUT, when I try that function with a dict that has lists as values such as my index, I get this error message:

Traceback (most recent call last):
  File "<pyshell#153>", line 1, in <module>
    invert_dict(index)
  File "<pyshell#150>", line 5, in invert_dict
    if val not in inverse:
TypeError: unhashable type: 'list'

The book is no help, and I suspect that I can use tuples in some way, but I am not sure how.

Catt answered 18/2, 2016 at 19:57 Comment(0)
M
19

I've tried around and you want to use val not in inverse but it can't be checked if a "list is in a dict". (val is a list)

For your code a simple change will do what you want:

def invert_dict(d): 
    inverse = dict() 
    for key in d: 
        # Go through the list that is saved in the dict:
        for item in d[key]:
            # Check if in the inverted dict the key exists
            if item not in inverse: 
                # If not create a new list
                inverse[item] = [key] 
            else: 
                inverse[item].append(key) 
    return inverse
Mariano answered 18/2, 2016 at 20:3 Comment(0)
M
29

My solution for reversing a dictionary.

inverse = {}
for k,v in index.items():
    for x in v:
        inverse.setdefault(x, []).append(k)

Output:

{'nisse': ['Testfil2.txt'],
 'hue': ['Testfil2.txt', 'Testfil1.txt'],
 'abe': ['Testfil2.txt', 'Testfil1.txt'],
 'pind': ['Testfil2.txt'],
 'tosse': ['Testfil1.txt'],
 'svend': ['Testfil1.txt']}
Monaco answered 18/2, 2016 at 20:3 Comment(1)
FYI, the whole try/except nonsense could be shortened significantly by either making new_dic a collections.defaultdict(list) or with a plain dict, replacing the entire try/except with just new_dic.setdefault(x, []).append(k), avoiding the need to handle "key exists" and "key missing" separately.Clapp
M
19

I've tried around and you want to use val not in inverse but it can't be checked if a "list is in a dict". (val is a list)

For your code a simple change will do what you want:

def invert_dict(d): 
    inverse = dict() 
    for key in d: 
        # Go through the list that is saved in the dict:
        for item in d[key]:
            # Check if in the inverted dict the key exists
            if item not in inverse: 
                # If not create a new list
                inverse[item] = [key] 
            else: 
                inverse[item].append(key) 
    return inverse
Mariano answered 18/2, 2016 at 20:3 Comment(0)
H
6

As a nested comprehension:

inverse = { v: k for k, l in index.items() for v in l }

or, perhaps more clearly:

inverse = { 
            new_key: index_key                              #body
            for index_key, index_value in index.items()     #outer loop
                for new_key in index_value                  #inner loop
            }

which is roughly equivalent to:

new_keys    =   []
new_values  =   []

for index_key, index_value in index.items():
    for new_key in index_value:
        new_keys.append(new_key)
        new_values.append(index_key)
        
inverse     =   dict(zip(new_keys,new_values))
Hysterical answered 25/3, 2022 at 22:40 Comment(2)
This is the most pythonic version. 1 line.Milliliter
This does not work for OP's specified input. The values in the inverted dict need to be lists as well, in case there are elements common to multiple value-lists in the original.Tourneur
W
2

You can not use list objects as dictionary keys, since they should be hashable objects. You can loop over your items and use dict.setdefault method to create the expected result:

new = {}
for k,value in index.items():
    for v in value:
        new.setdefault(v, []).append(k)

Result:

{'nisse': ['Testfil2.txt'],
 'hue': ['Testfil2.txt', 'Testfil1.txt'],
 'abe': ['Testfil2.txt', 'Testfil1.txt'],
 'pind': ['Testfil2.txt'],
 'tosse': ['Testfil1.txt'],
 'svend': ['Testfil1.txt']}

and if you are dealing with larger datasets for refusing of calling creating an empty list at each calling the setdefault() method you can use collections.defaultdict() which will calls the missing function just when it encounter a new key.

from collections import defaultdict

new = defaultdict(list)
for k,value in index.items():
    for v in value:
        new[v].append(k)

Result:

defaultdict(<type 'list'>,
    {'nisse': ['Testfil2.txt'],
     'hue': ['Testfil2.txt', 'Testfil1.txt'],
     'abe': ['Testfil2.txt', 'Testfil1.txt'],
     'pind': ['Testfil2.txt'],
     'tosse': ['Testfil1.txt'],
     'svend': ['Testfil1.txt']})
Whiffet answered 18/2, 2016 at 20:1 Comment(1)
Do you actually get a performance improvement with defaultdict? From what I remember from a related discussion, it's implemented in Python which makes it pretty slow while dict.setdefault is implemented in C and the time to create a new empty list is negligible.Peekaboo
T
-1

Here is a variation that uses a comprehension plus set to remove duplicates.

def invert_setdict(setdict):
    inverse = {}
    vk = [(v, k) for k, vs in index.items() for v in vs]
    for k, v in vk:
       inverse.setdefault(k, set()).add(v)

    return inverse

Example

>>> index = {
... 'Testfil2.txt': ['nisse', 'hue', 'abe', 'pind'],
... 'Testfil1.txt': ['hue', 'abe', 'tosse', 'svend']}

>>> inverse = invert_setdict(index)
>>> inverse
{'nisse': {'Testfil2.txt'},
 'hue': {'Testfil1.txt', 'Testfil2.txt'},
 'abe': {'Testfil1.txt', 'Testfil2.txt'},
 'pind': {'Testfil2.txt'},
 'tosse': {'Testfil1.txt'},
 'svend': {'Testfil1.txt'}}

If you want to convert the set values to lists:

>>> inverse = {k:list(v) for k, v in inverse.items()}
Tundra answered 20/1, 2023 at 15:1 Comment(3)
Remove duplicates? But OP's data doesn't have any duplicates... Neither does the data you're showing here... If this is filling a different need, you could post a separate question.Peekaboo
Note that sets are unordered. (And going back to lists doesn't restore the order.) If you want to preserve order, you can use a dict with None values as an ordered set.Peekaboo
There's no need to create vk; all it does is take up space. You can put its two loops directly in your code, just like in Arman's answerPeekaboo
E
-2

Two liner solution using unpacking operator * and nested compression.

for k,v in old_dict.items():
    new_dict = {**new_dict,**{vi:k for vi in v}}
Earthshaker answered 11/5, 2022 at 21:16 Comment(1)
This has the same issue as Jeremy's answer, and also requires new_dict = {} to be initialized first.Tourneur

© 2022 - 2024 — McMap. All rights reserved.