Hash function for collection of items that disregards ordering
Asked Answered
E

3

5

I am using a the hash() function to get the hash value of my object which contains two integers and two Strings. Moreover, I have a dictionary where I store these objects; the process is that I check if the object exists with the hash value, if yes I update if not I insert the new one.

The thing is that when creating the objects, I do not know the order of the object variables and I want to treat the objects as same no matter the order of these variables.

Is there an alternative function to the hash() function that does not consider the order of the variables?

#Consequently what I want is:
hash((int1,str1,int2,str2)) == hash((int2,str2,int1,str1)) 
Eighteenmo answered 27/2, 2017 at 17:13 Comment(5)
Would you post a small section of your code to make clearer what you are doing? My first thought is to sort the two integers, but I cannot tell whether this would work in your implementation.Mckinnon
You can always sort the input: hash(tuple(sorted((1, 2)))).Elda
@TomLynch I showed a toy example, in my code I have also strings so it is difficult to sort.Eighteenmo
Maybe you should modify the question because now it strongly suggest you expect to have there only integers.Scirrhus
@Scirrhus just updated.Eighteenmo
S
13

You could use a frozenset instead of a tuple:

>>> hash(frozenset([1, 2, 'a', 'b']))
1190978740469805404
>>>
>>> hash(frozenset([1, 'a', 2, 'b']))
1190978740469805404
>>>
>>> hash(frozenset(['a', 2, 'b', 1]))
1190978740469805404

However, the removal of duplicates from the iterable presents a subtle problem:

>>> hash(frozenset([1,2,1])) == hash(frozenset([1,2,2]))
True

You can fix this by creating a counter from the iterable using collections.Counter, and calling frozenset on the counter's items, thus preserving the count of each item from the original iterable:

>>> from collections import Counter
>>>
>>> hash(frozenset(Counter([1,2,1]).items())) 
-307001354391131208
>>> hash(frozenset(Counter([1,1,2]).items()))
-307001354391131208
>>> 
>>> hash(frozenset(Counter([1,2,1]).items())) == hash(frozenset(Counter([1,2,2]).items()))
False
Stinky answered 27/2, 2017 at 17:17 Comment(5)
it works because there's the same number of elements in each set, so even if there are duplicates it doesn't matter.Moth
@Jean-FrançoisFabre Thanks for the observation. It revealed a bug :)Stinky
Not a bug for the OP problem though.Moth
FrozenMultiset() would also solve this duplicates issue pypi.python.org/pypi/multisetSlowdown
@Slowdown Good one! A collections.Counter object is a multiset. I guess their implementation of the FrozenMultiset has some sort of counter underneath. I think you can make the link and some snippet into an answer.Stinky
N
3

Usually for things like this it helps immeasurably if you post some sample code, but I'll assume you've got something like this:

class Foo():
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __hash__(self):
        return hash((self.x, self.y))

You're taking a hash of a tuple there, which does care about order. If you want your hash to not care about the order of the ints, then just use a frozenset:

    def __hash__(self):
        return hash(frozenset([self.x, self.y]))
Neurotic answered 27/2, 2017 at 17:18 Comment(3)
@Jean-FrançoisFabre Whoops! I meant frozen sets.Neurotic
Considering the comments about also including strings, I think this is the right answer! Define a class, and use that to specify whatever hash method you want.Orchitis
@Neurotic Just upvoted thank you for the effort as well. Apologies for the confusion of not using Strings.Eighteenmo
D
1

If the range of the values is not too great you could add them together, that way the order can be disregarded, however it does increase the possibility for 2 hashes to have the same value:

def hash_list(items):
    value = 0
    for item in items:
        value+= hash(item)
    return value

hash_list(['a', 'b', 'c'])
>>> 8409777985338339540
hash_list(['b', 'a', 'c'])
>>> 8409777985338339540
Deformed answered 1/4, 2021 at 11:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.