How to implement an efficient bidirectional hash table?
Asked Answered
R

8

135

Python dict is a very useful data-structure:

d = {'a': 1, 'b': 2}

d['a'] # get 1

Sometimes you'd also like to index by values.

d[1] # get 'a'

Which is the most efficient way to implement this data-structure? Any official recommend way to do it?

Reciprocate answered 23/7, 2010 at 13:38 Comment(6)
If you prefer, we can assume that values are immutable as well as keys are.Reciprocate
What would you return for this dict: {'a' : 1, 'b': 2, 'A' : 1 }Talus
@PaulMcGuire : I would return {1: ['a', 'A'], 2: 'b'}. See my answer for such a way to do it.Elena
Note to moderator: this is not a duplicate of https://mcmap.net/q/168772/-two-way-reverse-map-duplicate. The latter has 1) very vague wording 2) no MCVE 3) only deals with the case of the bijective map (see first comment in this question), which is a lot more restrictive than this actual question, which is more general. So I think marking it as duplicate is here, in this particular case, misleading. If really one should be a duplicate of another, it should be the contrary as this one here covers the general case whereas the other (see answers) doesn't cover the non-bijective case.Elena
@Elena It should return {1: {'a', 'A'}, 2: 'b'}Spiral
This question is more than ten years old, but I am reading it for the first time now. You might find inspiration in the Java library Google Guava. They have a class HashBiMap that is worth reading. (I assume a similar structure could be implemented in Python.) The documentation clearly outlines edge cases and how they are handled. Ref: github.com/google/guava/blob/master/guava/src/com/google/common/…Bathos
E
106

Here is a class for a bidirectional dict, inspired by Finding key from value in Python dictionary and modified to allow the following 2) and 3).

Note that :

    1. The inverse directory bd.inverse auto-updates itself when the standard dict bd is modified.
    1. The inverse directory bd.inverse[value] is always a list of key such that bd[key] == value.
    1. Unlike the bidict module from https://pypi.python.org/pypi/bidict, here we can have 2 keys having same value, this is very important.

Code:

class bidict(dict):
    def __init__(self, *args, **kwargs):
        super(bidict, self).__init__(*args, **kwargs)
        self.inverse = {}
        for key, value in self.items():
            self.inverse.setdefault(value, []).append(key) 

    def __setitem__(self, key, value):
        if key in self:
            self.inverse[self[key]].remove(key) 
        super(bidict, self).__setitem__(key, value)
        self.inverse.setdefault(value, []).append(key)        

    def __delitem__(self, key):
        self.inverse.setdefault(self[key], []).remove(key)
        if self[key] in self.inverse and not self.inverse[self[key]]: 
            del self.inverse[self[key]]
        super(bidict, self).__delitem__(key)

Usage example:

bd = bidict({'a': 1, 'b': 2})  
print(bd)                     # {'a': 1, 'b': 2}                 
print(bd.inverse)             # {1: ['a'], 2: ['b']}
bd['c'] = 1                   # Now two keys have the same value (= 1)
print(bd)                     # {'a': 1, 'c': 1, 'b': 2}
print(bd.inverse)             # {1: ['a', 'c'], 2: ['b']}
del bd['c']
print(bd)                     # {'a': 1, 'b': 2}
print(bd.inverse)             # {1: ['a'], 2: ['b']}
del bd['a']
print(bd)                     # {'b': 2}
print(bd.inverse)             # {2: ['b']}
bd['b'] = 3
print(bd)                     # {'b': 3}
print(bd.inverse)             # {2: [], 3: ['b']}
Elena answered 19/2, 2014 at 22:34 Comment(16)
Very neat solution of the ambiguous case!Antineutrino
I think this data structure is very useful in many practical problems.Disposed
This is phenomenal. It's succinct; it's self-documenting; it's reasonably efficient; it just works. My only quibble would be to optimize the repeated lookups of self[key] in __delitem__() with a single value = self[key] assignment reused for such lookups. But... yeah. That's negligible. Thanks for the pure awesome, Basj!Thomasthomasa
How about a Python 3 version?Telium
Have you tried this code in Py3 yet, @Telium ? I didn't spot any issues on a quick scan through it.Lamplighter
I did @TheNate. I get error AttributeError: 'bidict' object has no attribute 'iteritems' when running bd = bidict({'a': 1, 'b': 2})Telium
Ah. Right. Try it without "iter" and it should work.Lamplighter
I like this answer for the example. The accepted answer is correct still and I think the accepted answer should stay as the accepted answer, but this is a little bit more explicit for defining it yourself, merely because it clearly lays out that in order to reverse the dictionary you must place the reverse's values into a list since there cannot be a one-to-one mapping because a dictionary has a one-to-many relationship with key-to-values.Cupo
Can this be packaged into bidict?Oversubscribe
this doesn't support many to many in both directions. bd = bidict({'a': [1,2,3], 'b': [2,3,4]}) errors about unhashable type.Tern
This breaks when using bd.pop(some_key, None)Rawboned
I wrote my own BiDict based on what i've seen here and without {key -> key} issue. Also my implementation has same methods and behavior, that you expect from a dict. Please ping me here if you find some uncovered cases: gist.github.com/titovanton/2225805da5ccdd788349bbf7f1eef1a8Ledford
Will this get inefficient if we have a large number of keys with the same value then set some of these to different values? Would replacing the lists in inverse with sets solve this?Yonkers
Nice solution! I was just wondering if it is really desiderable to possibly have an empty list as value in the inverse dict after a key in the original dict changed its value ( I am referring to the last three lines in the usage example here). Would'nt it be better to check for that in setitem? if self.inverse[self[key]] == []: del self.inverse[self[key]]Margarito
I would use sets instead of lists in the inverse dictionary.Rawinsonde
Also breaks with bd.clear(); that doesn't go through __delitem__Atop
H
56

You can use the same dict itself by adding key,value pair in reverse order.

d={'a':1,'b':2}
revd=dict([reversed(i) for i in d.items()])
d.update(revd)
Headpiece answered 23/7, 2010 at 13:59 Comment(12)
+1 A nice, practical solution. Another way to write it: d.update( dict((d[k], k) for k in d) ).Mule
+1 For neat use of reversed(). I'm undecided if it's more readable than the explicit dict((v, k) for (k, v) in d.items()). In any case, you can pass pairs directly to .update: d.update(reversed(i) for i in d.items()).Plait
Note this fails e.g. for d={'a':1, 'b':2, 1: 'b'}Antineutrino
No, it doesn't for me.Caruso
Slight modification: dict(map(reversed, a_dict.items())).Disposed
This fails for d={'a':1,'b':2, 'c': 1}.Elena
@Disposed Nope, afterwards d[1] = 'a' overwrites the original d[1] = 'b'Antineutrino
Adding reverse mappings to the original dictionary is an awful idea. As the above comments demonstrate, doing so is not safe in the general case. Just maintain two separate dictionaries. Since the first two lines of this answer ignoring the trailing d.update(revd) are great, however, I'm still contemplating an upvote. Let's give this some thought.Thomasthomasa
the fact that it fails in the above cases is not related to the implementation. It is implicit that in a bidirectional structure as described in question, you cannot have duplicate values/keys (as they are accessed in same way). Keeping two dictionaries (in any form) corresponds to the limitation of not having duplicate values or duplicate keys (but you can have same values in both keys and values). In this case ={'a':1, 'b':2, 1: 'b'} succeeds, while d={'a':1,'b':2, 'c': 1} still fails, because not a matter of implementation.Bakery
In many cases, keys and values reside in different namespaces. Then, this is a good solution. One just has to be aware of that.Clemen
Idea is awful ONLY IF your dictionary has such cases. If one is looking for such data structure s/he already knows the limitations and the data probably doesn't have such cases. Searching for perfection will only waste your time if such simple solutions suffice.Colville
A clearer way of writing this would use a dictionary comprehension: revd = {j: i for i, j in d.items()}Cystocarp
D
45

A poor man's bidirectional hash table would be to use just two dictionaries (these are highly tuned datastructures already).

There is also a bidict package on the index:

The source for bidict can be found on github:

Devora answered 23/7, 2010 at 13:41 Comment(12)
2 dicts requires double inserts and deletes.Reciprocate
@Robuts, didn't understand you.Reciprocate
@Juanjo: nearly any bidirectional/reversible hash table will involve "double inserts and deletes", either as part of implementing the structure, or as part of using it. Keeping two indexes is really the only fast way to do it, AFAIK.Agglutination
Of course; I meant that take care of the 2 index by hand is the problem.Reciprocate
No, bidict is not really usable : Try : from bidict import bidict mybidict = bidict({'a': 'value1', 'b': 'value1'}) print mybidict['a'] => this leads to a KeyError because duplicate values are not accepted :(Elena
@Elena I think it's correct that it's not accepted since having more than one value means it's not a bijection anymore and is ambiguous for the reverse lookup.Singlecross
I don't really agree @Singlecross : I think limiting the usage of bidict to injections and bijections is a bit too restrictive... Here in my previous example, using math notations, we could see f^{-1}('value1') as the set {a, b}.Elena
@Elena Well, I can understand that there would be use cases which would be useful to have more than one value per key, so maybe this type of data structure should exist as a subclass of bidict. However, since a normal dict maps to a single object, I think it makes much more sense for the reverse to be the same as well. (Just to clarify, although the value can be a collection too, I meant that the key of the first dict should be of the same type as the value of the reverse dict)Singlecross
the link is outdated :(Jesseniajessey
@Jesseniajessey pypi.python.org/pypi/bidict should work as should pip install bidict, github.com/jab/bidict, and bidict.readthedocs.org.Maricamarice
A one to one datastructure or (bimap) would be as described here: en.wikipedia.org/wiki/Bidirectional_map Hence a 1 value maps to a key. And 1 key maps to a value. Making this less strict, one can extend this to allow multiple keys pointing to 1 value, but 1 value can only point to 1 key. And then loosening up even more would allow multikeys to multivalues. An example of the latter is to use 2 dicts mapping keys/values to sets of values/keys.Oversubscribe
This is what I did. I have no inserts or deletes, so it's easy enough to make two.Arcane
K
8

The below snippet of code implements an invertible (bijective) map:

class BijectionError(Exception):
    """Must set a unique value in a BijectiveMap."""

    def __init__(self, value):
        self.value = value
        msg = 'The value "{}" is already in the mapping.'
        super().__init__(msg.format(value))


class BijectiveMap(dict):
    """Invertible map."""

    def __init__(self, inverse=None):
        if inverse is None:
            inverse = self.__class__(inverse=self)
        self.inverse = inverse

    def __setitem__(self, key, value):
        if value in self.inverse:
            raise BijectionError(value)

        self.inverse._set_item(value, key)
        self._set_item(key, value)

    def __delitem__(self, key):
        self.inverse._del_item(self[key])
        self._del_item(key)

    def _del_item(self, key):
        super().__delitem__(key)

    def _set_item(self, key, value):
        super().__setitem__(key, value)

The advantage of this implementation is that the inverse attribute of a BijectiveMap is again a BijectiveMap. Therefore you can do things like:

>>> foo = BijectiveMap()
>>> foo['steve'] = 42
>>> foo.inverse
{42: 'steve'}
>>> foo.inverse.inverse
{'steve': 42}
>>> foo.inverse.inverse is foo
True
Ketch answered 25/12, 2015 at 5:12 Comment(0)
A
1

Something like this, maybe:

import itertools

class BidirDict(dict):
    def __init__(self, iterable=(), **kwargs):
        self.update(iterable, **kwargs)
    def update(self, iterable=(), **kwargs):
        if hasattr(iterable, 'iteritems'):
            iterable = iterable.iteritems()
        for (key, value) in itertools.chain(iterable, kwargs.iteritems()):
            self[key] = value
    def __setitem__(self, key, value):
        if key in self:
            del self[key]
        if value in self:
            del self[value]
        dict.__setitem__(self, key, value)
        dict.__setitem__(self, value, key)
    def __delitem__(self, key):
        value = self[key]
        dict.__delitem__(self, key)
        dict.__delitem__(self, value)
    def __repr__(self):
        return '%s(%s)' % (type(self).__name__, dict.__repr__(self))

You have to decide what you want to happen if more than one key has a given value; the bidirectionality of a given pair could easily be clobbered by some later pair you inserted. I implemented one possible choice.


Example :

bd = BidirDict({'a': 'myvalue1', 'b': 'myvalue2', 'c': 'myvalue2'})
print bd['myvalue1']   # a
print bd['myvalue2']   # b        
Argentina answered 23/7, 2010 at 13:58 Comment(3)
I'm not sure if this is a problem, but using the above implementation, wouldn't there be issues if the keys and values overlapped? So dict([('a', 'b'), ('b', 'c')]); dict['b'] -> 'c' instead of the key 'a'.Sapanwood
It's not an issue for the OP's example, but might be a good disclaimer to include.Sapanwood
How can we do that print bd['myvalue2'] answers b, c (or [b, c], or (b, c), or anything else) ?Elena
A
1

First, you have to make sure the key to value mapping is one to one, otherwise, it is not possible to build a bidirectional map.

Second, how large is the dataset? If there is not much data, just use 2 separate maps, and update both of them when updating. Or better, use an existing solution like Bidict, which is just a wrapper of 2 dicts, with updating/deletion built in.

But if the dataset is large, and maintaining 2 dicts is not desirable:

  • If both key and value are numeric, consider the possibility of using Interpolation to approximate the mapping. If the vast majority of the key-value pairs can be covered by the mapping function (and its
    reverse function), then you only need to record the outliers in maps.

  • If most of access is uni-directional (key->value), then it is totally ok to build the reverse map incrementally, to trade time for
    space.

Code:

d = {1: "one", 2: "two" }
reverse = {}

def get_key_by_value(v):
    if v not in reverse:
        for _k, _v in d.items():
           if _v == v:
               reverse[_v] = _k
               break
    return reverse[v]
Adamandeve answered 25/12, 2015 at 7:42 Comment(0)
P
1

a better way is convert the dictionary to a list of tuples then sort on a specific tuple field

def convert_to_list(dictionary):
    list_of_tuples = []
    for key, value in dictionary.items():
        list_of_tuples.append((key, value))
    return list_of_tuples

def sort_list(list_of_tuples, field):
     return sorted(list_of_tuples, key=lambda x: x[field])

dictionary = {'a': 9, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
list_of_tuples = convert_to_list(dictionary)
print(sort_list(list_of_tuples, 1))

output

[('b', 2), ('c', 3), ('d', 4), ('e', 5), ('a', 9)]
Pinball answered 23/10, 2021 at 20:33 Comment(0)
C
-1

Unfortunately, the highest rated answer, bidict does not work.

There are three options:

  1. Subclass dict: You can create a subclass of dict, but beware. You need to write custom implementations ofupdate, pop, initializer, setdefault. The dict implementations do not call __setitem__. This is why the highest rated answer has issues.

  2. Inherit from UserDict: This is just like a dict, except all the routines are made to call correctly. It uses a dict under the hood, in an item called data. You can read the Python Documentation, or use a simple implementation of a by directional list that works in Python 3. Sorry for not including it verbatim: I'm unsure of its copyright.

  3. Inherit from Abstract Base Classes: Inheriting from collections.abc will help you get all the correct protocols and implementations for a new class. This is overkill for a bidirectional dictionary, unless it can also encrypt and cache to a database.

TL;DR -- Use this for your code. Read Trey Hunner's article for details.

Casein answered 10/7, 2020 at 2:50 Comment(3)
Nice article. Still, what exactly does not work with bidict?Churinga
What did not work two years ago may or may not work now.Casein
It did work 2 years ago, it worked with Python 2.x, and it works with Python 3.10 now. Rather than distract with a mistake from whatever testing you did in 2020, it'd be better to focus on what your advice offers which is different since the mention of UserDict, etc. is useful and might point someone with more advanced needs in the right direction.Mistassini

© 2022 - 2024 — McMap. All rights reserved.