A data-structure for 1:1 mappings in python?
Asked Answered
B

9

45

I have a problem which requires a reversable 1:1 mapping of keys to values.

That means sometimes I want to find the value given a key, but at other times I want to find the key given the value. Both keys and values are guaranteed unique.

x = D[y]
y == D.inverse[x]

The obvious solution is to simply invert the dictionary every time I want a reverse-lookup: Inverting a dictionary is very easy, there's a recipe here but for a large dictionary it can be very slow.

The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely be fast but would use up twice as much memory as a single dict.

So is there a better structure I can use?

  • My application requires that this should be very fast and use as little as possible memory.
  • The structure must be mutable, and it's strongly desirable that mutating the object should not cause it to be slower (e.g. to force a complete re-index)
  • We can guarantee that either the key or the value (or both) will be an integer
  • It's likely that the structure will be needed to store thousands or possibly millions of items.
  • Keys & Valus are guaranteed to be unique, i.e. len(set(x)) == len(x) for for x in [D.keys(), D.valuies()]
Brushwork answered 14/5, 2009 at 15:14 Comment(1)
How big is this dictionary? Are you sure two copies do not fit in memory?Phyletic
F
11
class TwoWay:
    def __init__(self):
       self.d = {}
    def add(self, k, v):
       self.d[k] = v
       self.d[v] = k
    def remove(self, k):
       self.d.pop(self.d.pop(k))
    def get(self, k):
       return self.d[k]
Frame answered 3/9, 2009 at 16:46 Comment(5)
This class fails in this example: {1 : 2, 2 : 4}An inverse method has to be implemented, IMHO.Blakeblakelee
@PaulPichaureau That's not a 1:1 mapping, you're thinking of a more general reversible dictionary. In the bijective (1:1) case, if 1 -> 2 is entered then by assumption 2 -> 1 must also be. You can't add then add 2 -> 4 without violating the preconditions. One could add a simple check in add(), such as if (k in self.d or v in self.d): # drop or throwMacklin
Or if (k in self.d and self.d[k] != v): self.d.pop(self.d[k])Cargile
This does not work if domain and codomain are different or have different types.Scalar
To be fair, this is storing the exact amount of space as a two dictionary implementation :pBaseboard
C
28

The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely be fast but would use up twice as much memory as a single dict.

Not really. Have you measured that? Since both dictionaries would use references to the same objects as keys and values, then the memory spent would be just the dictionary structure. That's a lot less than twice and is a fixed ammount regardless of your data size.

What I mean is that the actual data wouldn't be copied. So you'd spend little extra memory.

Example:

a = "some really really big text spending a lot of memory"

number_to_text = {1: a}
text_to_number = {a: 1}

Only a single copy of the "really big" string exists, so you end up spending just a little more memory. That's generally affordable.

I can't imagine a solution where you'd have the key lookup speed when looking by value, if you don't spend at least enough memory to store a reverse lookup hash table (which is exactly what's being done in your "unite two dicts" solution).

Cent answered 14/5, 2009 at 15:34 Comment(5)
I think this is a good solution. However, you would be doubling the overhead of maintaining a dict (memory and computation) because now there are two. I suspect this overhead would be small compared to the rest of the problem.Substitute
@Doug: you are trading the overhead of maintaning a second dict, by the speed of near O(1) lookups on it. I can't see another approach that wouldn't duplicate the effort.Cent
@Substitute & nosklo: I just want to emphasize nosklo's point. This problem is a classic example of the tradeoff between time and space. If you want to ensure fast lookups on both ends, you need to maintain both dictionaries. The second dictionary is the price you pay for reverse lookups. If the space overhead is too much, a slower solution will be necessary. The only way you can do a fast reverse lookup is if some kind of information is kept around for doing it...Kroeger
By the way, let me add that this is not an order of magnitude of difference in space overhead, and it is an order of magnitude of difference in time saved. The space of the dict keeps things at O(n)... the reverse lookup without a second dict is also O(n), but is improved to O(1)... Now this is all theoretical and we probably care about the constant factors - but I'm just saying this seems like a fair and good tradeoff to me.Kroeger
text_to_number = {v: k for k, v in number_to_text.items()}Nigelniger
F
11
class TwoWay:
    def __init__(self):
       self.d = {}
    def add(self, k, v):
       self.d[k] = v
       self.d[v] = k
    def remove(self, k):
       self.d.pop(self.d.pop(k))
    def get(self, k):
       return self.d[k]
Frame answered 3/9, 2009 at 16:46 Comment(5)
This class fails in this example: {1 : 2, 2 : 4}An inverse method has to be implemented, IMHO.Blakeblakelee
@PaulPichaureau That's not a 1:1 mapping, you're thinking of a more general reversible dictionary. In the bijective (1:1) case, if 1 -> 2 is entered then by assumption 2 -> 1 must also be. You can't add then add 2 -> 4 without violating the preconditions. One could add a simple check in add(), such as if (k in self.d or v in self.d): # drop or throwMacklin
Or if (k in self.d and self.d[k] != v): self.d.pop(self.d[k])Cargile
This does not work if domain and codomain are different or have different types.Scalar
To be fair, this is storing the exact amount of space as a two dictionary implementation :pBaseboard
B
5

The other alternative is to make a new class which unites two dictionaries, one for each > kind of lookup. That would most likely use up twice as much memory as a single dict.

Not really, since they would just be holding two references to the same data. In my mind, this is not a bad solution.

Have you considered an in-memory database lookup? I am not sure how it will compare in speed, but lookups in relational databases can be very fast.

Buttonhook answered 14/5, 2009 at 15:19 Comment(0)
P
2

Here is my own solution to this problem: http://github.com/spenthil/pymathmap/blob/master/pymathmap.py

The goal is to make it as transparent to the user as possible. The only introduced significant attribute is partner.

OneToOneDict subclasses from dict - I know that isn't generally recommended, but I think I have the common use cases covered. The backend is pretty simple, it (dict1) keeps a weakref to a 'partner' OneToOneDict (dict2) which is its inverse. When dict1 is modified dict2 is updated accordingly as well and vice versa.

From the docstring:

>>> dict1 = OneToOneDict()
>>> dict2 = OneToOneDict()
>>> dict1.partner = dict2
>>> assert(dict1 is dict2.partner)
>>> assert(dict2 is dict1.partner)
>>> dict1['one'] = '1'
>>> dict2['2'] = '1'
>>> dict1['one'] = 'wow'
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict1['one'] = '1'
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict1.update({'three': '3', 'four': '4'})
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict3 = OneToOneDict({'4':'four'})
>>> assert(dict3.partner is None)
>>> assert(dict3 == {'4':'four'})
>>> dict1.partner = dict3
>>> assert(dict1.partner is not dict2)
>>> assert(dict2.partner is None)
>>> assert(dict1.partner is dict3)
>>> assert(dict3.partner is dict1)
>>> dict1.setdefault('five', '5')
>>> dict1['five']
'5'
>>> dict1.setdefault('five', '0')
>>> dict1['five']
'5'

When I get some free time, I intend to make a version that doesn't store things twice. No clue when that'll be though :)

Pumping answered 5/10, 2010 at 7:30 Comment(0)
B
1

Assuming that you have a key with which you look up a more complex mutable object, just make the key a property of that object. It does seem you might be better off thinking about the data model a bit.

Biz answered 14/5, 2009 at 15:20 Comment(2)
In this case I cannot - the objects on one side are numpy.int64s - the purpose of the application is to adapt a very austere, numeric graph-theory class to something that looks more naturally pythonic.Brushwork
In that case, a flyweight would do.Biz
P
1

"We can guarantee that either the key or the value (or both) will be an integer"

That's weirdly written -- "key or the value (or both)" doesn't feel right. Either they're all integers, or they're not all integers.

It sounds like they're all integers.

Or, it sounds like you're thinking of replacing the target object with an integer value so you only have one copy referenced by an integer. This is a false economy. Just keep the target object. All Python objects are -- in effect -- references. Very little actual copying gets done.

Let's pretend that you simply have two integers and can do a lookup on either one of the pair. One way to do this is to use heap queues or the bisect module to maintain ordered lists of integer key-value tuples.

See http://docs.python.org/library/heapq.html#module-heapq

See http://docs.python.org/library/bisect.html#module-bisect

You have one heapq (key,value) tuples. Or, if your underlying object is more complex, the (key,object) tuples.

You have another heapq (value,key) tuples. Or, if your underlying object is more complex, (otherkey,object) tuples.

An "insert" becomes two inserts, one to each heapq-structured list.

A key lookup is in one queue; a value lookup is in the other queue. Do the lookups using bisect(list,item).

Phyletic answered 14/5, 2009 at 15:50 Comment(2)
It was a fairly clear statement: at least one of items in each key/value pairs will be an integer and sometimes both will be integers.Buttonhook
Why the round-about statement? Why not a positive list of what data types are involved? The logic may be clear, but it's useless for algorithm design. The "either-or" is usually an exclusive or. But the "or both" means it's an inclusive or. Which means ANY combination of types (except 2 non-integers) would be valid. Making it a hard thing to optimize.Phyletic
G
0

It so happens that I find myself asking this question all the time (yesterday in particular). I agree with the approach of making two dictionaries. Do some benchmarking to see how much memory it's taking. I've never needed to make it mutable, but here's how I abstract it, if it's of any use:

class BiDict(list):
    def __init__(self,*pairs):
        super(list,self).__init__(pairs)
        self._first_access = {}
        self._second_access = {}
        for pair in pairs:
            self._first_access[pair[0]] = pair[1]
            self._second_access[pair[1]] = pair[0]
            self.append(pair)

    def _get_by_first(self,key):
        return self._first_access[key]

    def _get_by_second(self,key):
        return self._second_access[key]

    # You'll have to do some overrides to make it mutable
    # Methods such as append, __add__, __del__, __iadd__
    # to name a few will have to maintain ._*_access

class Constants(BiDict):
    # An implementation expecting an integer and a string
    get_by_name = BiDict._get_by_second
    get_by_number = BiDict._get_by_first

t = Constants(
        ( 1, 'foo'),
        ( 5, 'bar'),
        ( 8, 'baz'),
    )

>>> print t.get_by_number(5)
bar
>>> print t.get_by_name('baz')
8
>>> print t
[(1, 'foo'), (5, 'bar'), (8, 'baz')]
Guillaume answered 14/5, 2009 at 16:12 Comment(0)
A
0

How about using sqlite? Just create a :memory: database with a two-column table. You can even add indexes, then query by either one. Wrap it in a class if it's something you're going to use a lot.

Assignation answered 14/5, 2009 at 16:29 Comment(2)
depending on the requirements, using a DB to do this lookup may cost more in terms of memory and cpu cycles than a double dict.Supernal
In my case that will be way too slow!Brushwork
E
0

This is my take on this problem.

from collections import OrderedDict


class _OneToOne(OrderedDict):
    def __setitem__(self, key, value):
        if key in self:
            raise KeyError(f'Key {key} already exists')
        if value in self.r:
            raise KeyError(f'KeyRev {value} already exists')
        OrderedDict.__setitem__(self.r, value, key)
        OrderedDict.__setitem__(self, key, value)

    def __delitem__(self, key):
        OrderedDict.__delitem__(self.r, self[key])
        OrderedDict.__delitem__(self, key)


class OneToOne(_OneToOne):
    def __init__(self, *args, **kwargs):
        """One-to-one mapping"""
        self.r = _OneToOne()
        self.r.r = self
        super(OneToOne, self).__init__(*args, **kwargs)
Ethaethan answered 4/6, 2023 at 1:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.