Alter the hash function of a dictionary

Asked 8/5, 2016 at 19:44 Answered 8/5, 2016 at 20:12

Solved python python-2.7 dictionary hash locality-sensitive-hash

Following this question, we know that two different dictionaries, dict_1 and dict_2 for example, use the exact same hash function.

Is there any way to alter the hash function used by the dictionary?_{Negative answers also accepted!}

Esque answered 8/5, 2016 at 19:44 Comment(6)

Hey, nice formatting! I didn't knew that you can use a tag or sub-placed text in an post... – Omit 8/5, 2016 at 19:47

Same before someone taught me @linusg! Use [...tag: ...Python] without the dots and enclose text in sub and sup tags to get the result. Click edit on my question to see exactly how is this done! – Esque 8/5, 2016 at 19:48

I narrowed down the question @ReutSharabani. – Esque 8/5, 2016 at 20:1

May I ask: Why do you want to alter the hash function? – Trivet 8/5, 2016 at 20:4

@TadhgMcDonald-Jensen sure. There was an attempt to explain that before my edit. Now I would like not to edit again and go back. I want that for this reason: #37090471 – Esque 8/5, 2016 at 20:6

I'm guessing that storing the data based on it's id wouldn't be sufficient? – Trivet 8/5, 2016 at 20:35

You can't change the hash-function - the dict will call hash on the keys it's supposed to insert, and that's that.

However, you can wrap the keys to provide different __hash__ and __eq__-Methods.

class MyHash(object):
     def __init__(self, v):
         self._v = v

     def __hash__(self):
         return hash(self._v) * -1

     def __eq__(self, other):
         return self._v == other._v

If this actually helps anything with your original problem/question I doubt though, it seems rather a custom array/list-based data-structure might be the answer. Or not.

Okechuku answered 8/5, 2016 at 20:5 Comment(3)

what do you mean by "doesn't require me to implement __eq__."? – Trivet 8/5, 2016 at 20:25

when I do my_dict = {} ; my_dict[MyHash("foo")] = 4 ; print(my_dict[MyHash("foo")]) I get a KeyError, I think you still need to implement __eq__ for this to work correctly. – Trivet 8/5, 2016 at 20:27

@TadhgMcDonald-Jensen my comment was actually referring to the equality of the _v being still a match for the simple modification of __hash__, but that doesn't change that I have to forward to the original __eq__. Thanks for pointing that out. – Okechuku 8/5, 2016 at 20:31

Here is a "hash table" on top of a list of lists, where each hash table object is associated with a particular hashing function.

class HashTable(object):
    def __init__(self, hash_function, size=256):
        self.hash_function = hash_function
        self.buckets = [list() for i in range(size)]
        self.size = size

    def __getitem__(self, key):
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        for stored_key, stored_value in bucket:
            if stored_key == key:
                return stored_value
        raise KeyError(key)


    def __setitem__(self, key, value):
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        # if the key is in the bucket, replace the value and return
        for i, (stored_key, stored_value) in enumerate(bucket):
            if stored_key == key:
                 bucket[i] = (key, value)
                 return
        # otherwise append the key value pair to the bucket
        bucket.append((key, value))

The rest of your application can still see the underlying list of buckets. Your application might require additional metadata to be associated with each bucket, but that would be as simple as defining a new class for the elements of the bucket list instead of a plain list.

Gilford answered 8/5, 2016 at 20:10 Comment(3)

Why this got a downvote? I am going to upvote since there is no justification here. :) – Esque 8/5, 2016 at 20:11

It might have been because I had initially forgotten a to increment i in the __setitem__ for loop, which it took me 40 seconds to notice, or that I didn't comment a single line of code. Is the class definition self evident, or should I document more? – Gilford 8/5, 2016 at 20:24

Your answer, your choice. :) The length is not a must for a good post. For example my best answer was quite small at first, but then I felt like expanding it, worth it! My best question is short (relatively). – Esque 9/5, 2016 at 7:3

I think what you want is a way to create buckets. Based on this I recommend collections.defaultdict with a set initializer as the "bucket" (depends on what you're using it for though).

Here is a sample:

#!/usr/bin/env python

from collections import defaultdict
from itertools import combinations

d = defaultdict(set)

strs = ["str", "abc", "rts"]
for s in strs:
    d[hash(s)].add(s)
    d[hash(''.join(reversed(s)))].add(s)

for combination in combinations(d.values(), r=2):
    matches = combination[0] & combination[1]
    if len(matches) > 1:
        print matches

# output: set(['str', 'rts'])

Two strings ending up in the same buckets here are very likely the same. I've created a hash collision by using the reverse function and using a string and it's reverse as values.

Note that the set will use full comparison but should do it very fast.

Don't hash too many values without draining the sets.

Corlisscorly answered 8/5, 2016 at 20:12 Comment(1)

Hey you undeleted your answer and that's not a notification. Thanks for an alternative solution! – Esque 9/5, 2016 at 6:55

Recommended topics

Hot tags