Many-to-one mapping (creating equivalence classes)

I have a project of converting one database to another. One of the original database columns defines the row's category. This column should be mapped to a new category in the new database.

For example, let's assume the original categories are: parrot, spam, cheese_shop, Cleese, Gilliam, Palin

Now that's a little verbose for me, and I want to have these rows categorized as either sketch or actor. That is, I want to define all the sketches and all the actors as two equivalence classes.

>>> monty={'parrot':'sketch', 'spam':'sketch', 'cheese_shop':'sketch', 
'Cleese':'actor', 'Gilliam':'actor', 'Palin':'actor'}
>>> monty
{'Gilliam': 'actor', 'Cleese': 'actor', 'parrot': 'sketch', 'spam': 'sketch', 
'Palin': 'actor', 'cheese_shop': 'sketch'}

That's quite awkward. I would prefer having something like:

monty={ ('parrot','spam','cheese_shop'): 'sketch', 
        ('Cleese', 'Gilliam', 'Palin') : 'actors'}

But this, of course, sets the entire tuple as a key:

>>> monty['parrot']

Traceback (most recent call last):
  File "<pyshell#29>", line 1, in <module>
    monty['parrot']
KeyError: 'parrot'

Any ideas how to create an elegant many-to-one dictionary in Python?

Scyphus answered 17/12, 2009 at 11:8 Comment(1)
Check out this elegant answer to a similar question. - Mousy

It seems to me that you have two concerns. First, how do you express the mapping originally, that is, how do you type it into your new_mapping.py file? Second, how does the mapping work during the re-mapping process? There's no reason for these two representations to be the same.

Start with the mapping you like:

monty = { 
    ('parrot','spam','cheese_shop'): 'sketch', 
    ('Cleese', 'Gilliam', 'Palin') : 'actors',
}

then convert it into the mapping you need:

working_monty = {}
for k, v in monty.items():
    for key in k:
        working_monty[key] = v

producing:

{'Gilliam': 'actors', 'Cleese': 'actors', 'parrot': 'sketch', 'spam': 'sketch', 'Palin': 'actors', 'cheese_shop': 'sketch'}

then use working_monty to do the work.
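One thing the conversion loop above does silently is let a later group overwrite an earlier one if the same key is typed into two tuples. A minimal sketch of a stricter version follows; the helper name `expand_groups` and the choice to raise `ValueError` are assumptions made for illustration, not part of the original answer.

```python
def expand_groups(grouped):
    """Flatten {(key, key, ...): value} into {key: value}."""
    flat = {}
    for keys, value in grouped.items():
        for key in keys:
            # Assumption: a key listed in two groups is a typo, so fail loudly
            if key in flat:
                raise ValueError(f"{key!r} appears in more than one group")
            flat[key] = value
    return flat


monty = {
    ('parrot', 'spam', 'cheese_shop'): 'sketch',
    ('Cleese', 'Gilliam', 'Palin'): 'actors',
}
working_monty = expand_groups(monty)
print(working_monty['parrot'])  # sketch
```

The readable grouped form stays in the source file, and the flat dictionary is built once at start-up, so lookups during the conversion remain ordinary dict lookups.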

Accrete answered 17/12, 2009 at 11:32 Comment(3)
+1 Thanks a lot. I assume there's no native Python type for this job; do you think there should be one? - Scyphus
Can't we store some reference as the value in the (key, value) pair rather than the actual string? Since the number of keys is significantly larger than the number of values, this would save a lot of space. Is there a way to do this? - Bookcase
Old question, but regarding @ishan3243's observation: I am pretty sure Python will intern these strings, since they are defined explicitly as constants. Furthermore, even if the values are read in at run time, the loop assigns the same object to every key in a group, so those keys all reference one string rather than each holding its own copy. - Cy
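The point about shared references can be checked directly with the `is` operator. This is a small sketch, not part of the original thread: the variable names are made up, and the value is deliberately built at run time so that the result does not depend on interning of literals.

```python
# Build the value at run time so literal interning is not what we observe
runtime_value = ''.join(['ske', 'tch'])

groups = {('parrot', 'spam', 'cheese_shop'): runtime_value}

flat = {}
for keys, value in groups.items():
    for key in keys:
        flat[key] = value  # stores a reference, not a copy of the string

# All three keys point at the very same string object
shares_object = flat['parrot'] is flat['spam'] is flat['cheese_shop']
print(shares_object)  # True
```

Python dictionaries always store references to objects, so the extra cost per key is one pointer, not another copy of the string.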

You could override dict's indexer, but perhaps the following simpler solution would be better:

>>> assoc_list = ( (('parrot','spam','cheese_shop'), 'sketch'), (('Cleese', 'Gilliam', 'Palin'), 'actors') )
>>> equiv_dict = dict()
>>> for keys, value in assoc_list:
    for key in keys:
        equiv_dict[key] = value


>>> equiv_dict['parrot']
'sketch'
>>> equiv_dict['spam']
'sketch'

(Perhaps the nested for loop can be compressed into an impressive one-liner, but this works and is readable.)
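For what it's worth, on Python versions that support dict comprehensions (2.7 and 3.x), the nested loop can be written as a single expression that arguably stays readable. A sketch using the same `assoc_list` as above:

```python
assoc_list = (
    (('parrot', 'spam', 'cheese_shop'), 'sketch'),
    (('Cleese', 'Gilliam', 'Palin'), 'actors'),
)

# The two `for` clauses read in the same order as the nested loops
equiv_dict = {key: value for keys, value in assoc_list for key in keys}

print(equiv_dict['parrot'])  # sketch
```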

Usance answered 17/12, 2009 at 11:28 Comment(1)
Not for the faint of heart: equiv_dict = dict( sum([[(k, v) for k in ks] for (ks, v) in assoc_list], []) ) - Usance
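As for the other option mentioned at the top of this answer, overriding dict's indexer, here is one possible sketch. The class name `GroupedDict` is hypothetical, and it relies on the `__missing__` hook that `dict` subclasses may define, rather than replacing `__getitem__` outright.

```python
class GroupedDict(dict):
    """dict keyed by tuples of aliases; indexing by any one alias works."""

    def __missing__(self, key):
        # dict.__getitem__ calls this only when `key` is not a stored key
        for group, value in self.items():
            if isinstance(group, tuple) and key in group:
                return value
        raise KeyError(key)


monty = GroupedDict({
    ('parrot', 'spam', 'cheese_shop'): 'sketch',
    ('Cleese', 'Gilliam', 'Palin'): 'actors',
})
print(monty['parrot'])  # sketch
```

Note that only `monty[key]` goes through `__missing__`; `monty.get('parrot')` and `'parrot' in monty` still see just the tuple keys, and each alias lookup scans the groups linearly. That is part of why the plain flattened dictionary is the simpler choice.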
>>> monty={ ('parrot','spam','cheese_shop'): 'sketch', 
        ('Cleese', 'Gilliam', 'Palin') : 'actors'}

>>> item=lambda x:[z for y,z in monty.items() if x in y][0]
>>>
>>> item("parrot")
'sketch'
>>> item("Cleese")
'actors'

But be warned: it will be slower than a normal one-to-one dictionary, because every lookup scans all the groups.

Truong answered 17/12, 2009 at 11:20 Comment(1)
Slow-ish, but on the plus side it doesn't require a persistent secondary data structure. Could be sped up to a certain degree by not being written as a lambda and using a list comprehension. - Mousy
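One way to act on that suggestion is sketched below. It swaps the list comprehension for a generator expression passed to `next()`, so the scan stops at the first matching group instead of building a full list. Raising `KeyError` for an unknown name is an assumption made here; the lambda above raises `IndexError` in that case.

```python
monty = {
    ('parrot', 'spam', 'cheese_shop'): 'sketch',
    ('Cleese', 'Gilliam', 'Palin'): 'actors',
}


def item(name, mapping=monty):
    """Return the value of the first group that contains `name`."""
    try:
        # The generator is consumed lazily, so we stop at the first hit
        return next(value for keys, value in mapping.items() if name in keys)
    except StopIteration:
        raise KeyError(name) from None


print(item('parrot'))  # sketch
print(item('Cleese'))  # actors
```

The lookup is still linear in the number of groups, so this only trims the constant factor.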

If you want to have multiple keys pointing to the same value, i.e.

m_dictionary = {('k1', 'k2', 'k3', 'k4'): 1, ('k5', 'k6'): 2}

and access them as

`print(m_dictionary['k1'])` ==> `1`,

check out the multi_key_dict module, a multi-key dictionary for Python. Install and import it: https://pypi.python.org/pypi/multi_key_dict

Ostracoderm answered 24/7, 2016 at 2:27 Comment(0)
