Towards understanding dictionaries
Asked Answered
S

2

3

I am required to use multiple hashtables, so in , I would normally use an std::unordered_map. So far I can understand that I can use a dictionary in Python, so let's assume the following code:

my_dict_1 = {}
my_dict_1['foo'] = 1
my_dict_2 = {}
my_dict_2['foo'] = 2

Will the two dictionaries be using different hash functions (notice that the key is the same), thus they can be considered two different hash tables (I mean that they will actually store the data differently)?


EDIT:

Yes the dictionaries are two different objects of course, but the question is about the technique that they will use to store the data!

Sande answered 8/5, 2016 at 13:31 Comment(6)
They clearly are not the same hash table.Willner
@JohnColeman thanks for the upvote. Can you please point to some documentation saying so? I am very new in python and it's not that clear for me. :/ What you are saying is that every dictionary has its own hashfunction, right? I wonder if they are independent too, but this is for later.. :)Sande
What do you mean by "different hash functions"? A hash function is a piece of code (and it is implementation-defined). They are definitely different dictionaries (hash tables), though (since they are by construction two different objects). It has not really anything to do with dictionaries specifically, btw.Overbearing
I am mean the hash function every dictionary will use is different from the other. If use a dictionary as a hashtable and then another one, I would like to know that they will store the data differently, because of a different hush function. Hope it is clearer now @importthis.Sande
I'm really curious why you ask about this - why would the hash function be different, and how would it be different? The point of the hash is to quickly test if two keys are the same, what would be a benefit of having a different hash for each dictionary?Dismissive
@Dismissive because I was hoping to use them for LSH: #37090471Sande
W
3

A simple Python shell experiment to show that different dictionaries can use the same key:

>>> my_dict_1 = {'foo':1}
>>> my_dict_2 = {'foo':2}
>>> my_dict_1,my_dict_2
({'foo': 1}, {'foo': 2})

This is a good discussion of how it is implemented. The key point is that each dictionary is allocated its own portion of memory (which can of course grow as needed). The exact same hash function is used for both dictionaries, but is being used to probe different areas in memory.

Willner answered 8/5, 2016 at 13:44 Comment(2)
The exact same hash function? Damn, then that means that if we feed both dictionaries with very same data, they will store them in the very same way (in different memory locations, but I do not care about that), thank you John!!Sande
@gsamaras: I just browsed through the source code, and it certainly seems the hash is only calculated through the key, and not, for example, salted with the dict name or something like that. Also, "Hashes for integers are the same as the integer (hash(100) == 100), but that is an implementation detail and could change without warning." (mail.python.org/pipermail/tutor/2003-August/024555.html)Yoder
C
1

id(...)

id(object) -> integer

Return the identity of an object. This is guaranteed to be unique among simultaneously existing objects. (Hint: it's the object's memory address.)

Above is the id doc string, it says that object's identity is object's memory address, so we can use the id function to see the variable's memory address:

In your program, I can see the address like this:

def main():
    my_dict_1 = {}
    my_dict_1['foo'] = 1
    my_dict_2 = {}
    my_dict_2['foo'] = 2
    print(hex(id(my_dict_1['foo'])))
    print(hex(id(my_dict_2['foo'])))

if __name__ == '__main__':
    main()

This program output this:

0x9a33e0
0x9a3400

We can see that my_dict_1['foo'] and my_dict_2['foo'] have different memory address.

So I think the two dicts should use the same hash function, but the variable's memory address should be the sum of the hash value and a base value. In this way, the two variable will be stored in the different memory areas.

Compass answered 8/5, 2016 at 14:18 Comment(2)
There is an answer mentioning pretty much the same thing. It is deleted now, thus you can't see it. Thanks though, +1 for an analytic answer, but my edit is really clear, I think.Sande
These are just the addresses of the integers 1 and 2, independent of whether they're referred to from one or several dicts or not. (Try it: put print(hex(id(1))) into the same program.)Irra

© 2022 - 2024 — McMap. All rights reserved.