Following this question, we know that two different dictionaries, dict_1
and dict_2
for example, use the exact same hash function.
Is there any way to alter the hash function used by the dictionary?Negative answers also accepted!
Following this question, we know that two different dictionaries, dict_1
and dict_2
for example, use the exact same hash function.
Is there any way to alter the hash function used by the dictionary?Negative answers also accepted!
You can't change the hash-function - the dict will call hash
on the keys it's supposed to insert, and that's that.
However, you can wrap the keys to provide different __hash__
and __eq__
-Methods.
class MyHash(object):
def __init__(self, v):
self._v = v
def __hash__(self):
return hash(self._v) * -1
def __eq__(self, other):
return self._v == other._v
If this actually helps anything with your original problem/question I doubt though, it seems rather a custom array/list-based data-structure might be the answer. Or not.
__eq__
."? –
Trivet my_dict = {} ; my_dict[MyHash("foo")] = 4 ; print(my_dict[MyHash("foo")])
I get a KeyError, I think you still need to implement __eq__
for this to work correctly. –
Trivet __hash__
, but that doesn't change that I have to forward to the original __eq__
. Thanks for pointing that out. –
Okechuku Here is a "hash table" on top of a list of lists, where each hash table object is associated with a particular hashing function.
class HashTable(object):
def __init__(self, hash_function, size=256):
self.hash_function = hash_function
self.buckets = [list() for i in range(size)]
self.size = size
def __getitem__(self, key):
hash_value = self.hash_function(key) % self.size
bucket = self.buckets[hash_value]
for stored_key, stored_value in bucket:
if stored_key == key:
return stored_value
raise KeyError(key)
def __setitem__(self, key, value):
hash_value = self.hash_function(key) % self.size
bucket = self.buckets[hash_value]
# if the key is in the bucket, replace the value and return
for i, (stored_key, stored_value) in enumerate(bucket):
if stored_key == key:
bucket[i] = (key, value)
return
# otherwise append the key value pair to the bucket
bucket.append((key, value))
The rest of your application can still see the underlying list of buckets. Your application might require additional metadata to be associated with each bucket, but that would be as simple as defining a new class for the elements of the bucket list instead of a plain list.
i
in the __setitem__
for loop, which it took me 40 seconds to notice, or that I didn't comment a single line of code. Is the class definition self evident, or should I document more? –
Gilford I think what you want is a way to create buckets. Based on this I recommend collections.defaultdict
with a set
initializer as the "bucket" (depends on what you're using it for though).
Here is a sample:
#!/usr/bin/env python
from collections import defaultdict
from itertools import combinations
d = defaultdict(set)
strs = ["str", "abc", "rts"]
for s in strs:
d[hash(s)].add(s)
d[hash(''.join(reversed(s)))].add(s)
for combination in combinations(d.values(), r=2):
matches = combination[0] & combination[1]
if len(matches) > 1:
print matches
# output: set(['str', 'rts'])
Two strings ending up in the same buckets here are very likely the same. I've created a hash collision by using the reverse function and using a string and it's reverse as values.
Note that the set will use full comparison but should do it very fast.
Don't hash too many values without draining the sets.
© 2022 - 2024 — McMap. All rights reserved.
id
wouldn't be sufficient? – Trivet