In python, how can you retrieve a key from a dictionary?

Asked 19/11, 2010 at 12:21 Answered 20/11, 2010 at 17:3

Solved python performance dictionary networkx reverse-lookup

I have a hashable identifier for putting things in a dictionary:

class identifier():
    def __init__(self, d):
        self.my_dict = d
        self.my_frozenset = frozenset(d.items())
    def __getitem__(self, item):
        return self.my_dict[item]
    def __hash__(self):
        return hash(self.my_frozenset)
    def __eq__(self, rhs):
        return self.my_frozenset == rhs.my_frozenset
    def __ne__(self, rhs):
       return not self == rhs

I have a node type that encapsulates identifer for purposes of hashing and equality:

class node:
    def __init__(self, id, value):
        # id is of type identifier
        self.id = id
        self.value = value
        # define other data here...
    def __hash__(self):
        return hash(self.id)
    def __eq__(self, rhs):
        if isinstance(rhs, node):
            return self.id == rhs.id
        ### for the case when rhs is an identifier; this allows dictionary
        ### node lookup of a key without wrapping it in a node
        return self.id == rhs
    def __ne__(self, rhs):
        return not self == rhs

I put some nodes into a dictionary:

d = {}
n1 = node(identifier({'name':'Bob'}), value=1)
n2 = node(identifier({'name':'Alex'}), value=2)
n3 = node(identifier({'name':'Alex', 'nationality':'Japanese'}), value=3)
d[n1] = 'Node 1'
d[n2] = 'Node 2'
d[n3] = 'Node 3'

Some time later, I have only an identifier:

my_id = identifier({'name':'Alex'})

Is there any way to efficiently lookup the node that has been stored with this identifier in this dictionary?

Please note that this is a little trickier than it sounds; I know that I can trivially use d[my_id] to retrieve the associated item 'Node 2', but I want to efficiently return a reference to n2.

I know that I could do it by looking at every element in d, but I've tried that and it's much too slow (the dictionary has thousands of items in it and I do this a fair number of times).

I know that internally dict is using the hash and eq operators for that identifier to store node n2 and its associated item, 'Node 2'. In fact, using my_id to lookup 'Node 2' actually needs to lookup n2 as an intermediate step, so this should definitely be possible.

I am using this to store data in a graph. The nodes have a lot of additional data (where I put value) that is not used in the hash. I didn't create the graph package I'm using (networkX), but I can see the dictionary that stores my nodes. I could also keep an extra dictionary around of identifiers to nodes, but this would be a pain (I'd need to wrap the graph class and rewrite all add node, remove node, add nodes from list, remove nodes from list, add edge, etc. type functions to keep that dictionary up to date).

This is quite the puzzle. Any help would be really appreciated!

Anorak answered 19/11, 2010 at 12:21 Comment(2)

The later versions of NetworkX actually do keep an internal dictionary of "node attributes" which you might be able to use. Try G.add_node(id,name='Bob',value=2) and then inspect G.node[id]. – Psychotic 19/11, 2010 at 14:48

+1 Good comment. I was using this to store the extra things I was using, but changed to a more object-oriented design because there were methods and members that all node types should have; it is not just value; there are a lot of things I store inside node. – Anorak 19/11, 2010 at 19:51

Instead of

d[n1] = 'Node 1'

use:

d[n1] = ('Node 1', n1)

Then you have access to n1 no matter how you found the value.

I don't believe there is a way with dictionaries to retrieve the original key k1 if all you have is a k2 equal to k1.

Bung answered 19/11, 2010 at 12:30 Comment(2)

Thanks for the idea. Unfortunately, I don't have access to how the dictionary is setup-- in networkX this dictionary actually stores the list of nodes that are adjacent to this node in the graph. But I know that the dictionary must be looking up n2 this internally to be able to lookup 'Node 2'! It would be so disappointing if I couldn't lookup n2 in this way. – Anorak 19/11, 2010 at 12:40

I am marking this as the answer, because it is a valid workaround. But essentially, what I was trying to do CANNOT BE DONE. As far as the dictionary is concerned, if the hash is the same == is True, the objects are equal, even if some underlying data is different. Essentially, retrieving the exact object in this case (where the dictionary finds them equal but it contains some different unhashed data) was poor design on my part. : ) – Anorak 27/3, 2012 at 21:42

Have two dictionaries. - Whenever you add a key/value to the primary dictionary, also add them to the reverse dictionary, but with the key/value swapped.

For example:

# When adding a value:
d[n2] = value;
# Must also add to the reverse dictionary:
rev[value] = d

# This means that:
value = d[n2]
# Will be able to efficiently find out the key used with:
key = rev[value]

Devito answered 19/11, 2010 at 12:37 Comment(6)

That goes without saying, aaronasterling. :) – Devito 19/11, 2010 at 12:41

That's a good idea, and if the graph library were my code, I'd build in something that would do basically that. But since it's not, I'd have to wrap it in a class that has to mimic every add_node, remove_node, add_list_of_nodes, remove_list_of_nodes, add_egde... That's doable, but it seems so unfortunate given that n2 must be looked up internally to retrieve 'Node 2'. – Anorak 19/11, 2010 at 12:47

You could always generate the reverse dictionary just when you need it - catch is that you'll need to generate it before you know that you'll need it a lot so that the optimisation will work in your favour. Don't forget to shoot the original developer for not letting you manipulate the dictionary access - perhaps send him some not-so-subtle notes about dependency injection? – Devito 19/11, 2010 at 12:51

@Arafangion, you'd be surprised about what go without saying :| – Frederickfredericka 19/11, 2010 at 13:0

You might be nicer to the original developer. You can actually manipulate directly the dict-of-dicts data structure that NetworkX uses and in whatever way you like. See G=networkx.Graph([(1,2)]) and then inspect G.adj. – Psychotic 19/11, 2010 at 15:1

@Arafangion: Did you mean rev[value] = n2 in the fourth line? – Raama 19/11, 2010 at 15:16

Here is a way to use a custom node object with NetworkX. If you store the object in the "node attribute" dictionary you can use it as a reverse dictionary to get the object back by referencing the id. It's a little awkward but it works.

import networkx as nx

class Node(object):

    def __init__(self,id,**attr):
        self.id=id
        self.properties={}
        self.properties.update(attr)

    def __hash__(self):
        return self.id

    def __eq__(self,other):
        return self.id==other.id

    def __repr__(self):
        return str(self.id)

    def __str__(self):
        return str(self.id)


G=nx.Graph()
# add two nodes
n1=Node(1,color='red') # the node id must be hashable
n2=Node(2,color='green')
G.add_node(n1,obj=n1)
G.add_node(n2,obj=n2)

# check what we have
print G.nodes() # 1,2
print n1,n1.properties['color'] # 1,red
print n1==n2   # False 
for n in G:
    print n.properties['color']
print Node(1) in G # True
# change color of node 1
n1.properties['color']='blue'
for n in G:
    print n.properties

# use "node attribute" data in NetworkX to retrieve object
n=G.node[Node(1)]['obj']
print type(n) # <class '__main__.Node'>
print n # 1
print n.id # 1
print n.properties # {'color': 'blue'}

You can of course define a function that makes this simpler:

   def get_node(G,n):
        return G.node[Node(1)]['obj']

    n=get_node(G,1)
    print n.properties

Psychotic answered 20/11, 2010 at 17:3 Comment(0)

The thing is, there is no guaranty that the key is effectively a Node. What if you do

d[my_id]=d[my_id]

Everything would still work perfectly except now, your key is an Identifier and not a Node. Allowing two classes to "equal" like this is really dangerous. If you really need to find a Node by it's name that should be done in the Node class or externaly, but shouldn't depend of the presence of not of the node in a hash.

If you can't modify that (because you can't modify the code), then I guess you are stuck to do the ineffecient way

Reeder answered 19/11, 2010 at 13:48 Comment(0)

using my_id to lookup 'Node 2' actually needs to lookup n2 as an intermediate step

This is not true. A dictionary is a hashtable: it maps the hash of an item to (a bucket of) entries. When you ask for d[my_id], Python first gets hash(my_id) and then looks that up in d. You are getting confused because you have that hash(n1) == hash(id1), which is a Very Bad Thing.

You are asking for a mapping between identifiers and nodes. If you want one of these, you will have to create one yourself.

Are the identifiers all matched with nodes upon creation, or do you construct them later? That is, are you really asking to be able to find the node with identifier identifier({'name':'Alex'}), or has that identifier already been created and added to a node? If the latter, you could do the following:

class Node:
    def __init__(self, id, value):
        id.parent = self
        ...

Aida answered 19/11, 2010 at 14:17 Comment(2)

1 You are incorrect. The hash must necessarily be equal, but an equal hash is not sufficient to lookup an item in a hash table. I've just remade the identifier class so that its hash function always returns 1, and the code in the question. This is called a collision. Collisions may lower performance, but hash tables can work when there are collisions. This is why you need hash as well as eq for dict to work properly. See en.wikipedia.org/wiki/Hash_table#Collision_resolution. – Anorak 19/11, 2010 at 19:38

Typo-- "and the code in the question" --> "and the code in the question can associate two keys with the same hash to different items." – Anorak 19/11, 2010 at 19:47

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags