Uniform distribution of hashcode()

Asked 15/4, 2016 at 8:54 Answered 17/5, 2016 at 13:17

I define my class as:

final class Key<T extends Comparable<T>> {
    private final T q;
    private final T o;
    public Key(T q1, T o1) {
        q = q1;
        o = o1;
    }

    @Override
    public boolean equals(Object obj) {
        if(obj != null && obj instanceof Key) {
            Key<T> s = (Key<T>)obj;
            return q.equals(s.q) && o.equals(s.o);
        }
        return false;
    }

    @Override
    public int hashCode() {
        return Objects.hash(q,o);
    }
}

I also define an array to contain object key . For example:

Object arr[] = new Object[100];
Key<String> k = new Key<>("a","b");
int h = k.hashcode();
...
arr[h+i % h] = k; //i from 1 to 10 for example

The problem is that hashcode() can return a negative value so

arr[h+i % h] = k;

can return an error out of array index. That's why I changed my code as(based on my searching for avoiding hashcode() return negative value):

@Override
        public int hashCode() {
            return (Objects.hash(q,o)&0x7FFFFFFF);
        }

So if I do this way, does a uniform distribution of the hashcode() be changed or not? I mean the probability to have a same value from two different objects will be increased or not?

Scrutiny answered 15/4, 2016 at 8:54 Comment(2)

Yes, my error. I also edited it. Thanks – Scrutiny 15/4, 2016 at 9:10

you can have a look at murmur hash which has very good distribution. and cant be negetive value as well – Steger 17/5, 2016 at 13:22

The Object.hash() has a very simple hashCode which is not particularly uniform for simple examples. e.g. Objects.hash("B", "B") and Objects.hash("A", "a") have the same hashCode. (And BTW simple enough that I could work that out in my head)

Also every between Objects.hashCode("a", "a") and Objects.hashCode("z", "z") is between 4065 and 4865 which doesn't look particularly uniform, esp for the higher bits.

Within this context, I think you can say you are not making matters any worse.

Jayme answered 15/4, 2016 at 9:15 Comment(2)

If so. which way is better to avoid negative value of hashcode() 1. as above 2. avoid negative value at this step: arr[h+i % h] = k. I mean I use Math.abs(h+i % h) to convert to positive value. – Scrutiny 15/4, 2016 at 12:29

@Scrutiny You want to avoid Math.abs here as this can return a negative number o_O The (hash & 0x7FFF_FFFF) % buckets is better. Note: Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE which you are unlikely to find out for a long time. – Jayme 15/4, 2016 at 12:36

Please have look in to Murmurhash and MurmurHash - what is it? Fortunately Google guava has ready made implementation for this.

Guava way is like below example We have following classes

import com.google.common.hash.HashCode; import com.google.common.hash.HashFunction; import com.google.common.hash.Hashing;

using the above classes I have my method to generate hashcode like below

/**
     * getMurmur128Hash.
     * 
     * @param content
     * @return HashCode
     */
    public static HashCode getMurmur128Hash(String content) {
        final HashFunction hf = Hashing.murmur3_128();
        final HashCode hc = hf.newHasher().putString(content, Charsets.UTF_8).hash();
        return hc;
    }
    /**
     * getAbsMurmur128HashAsLongVal.
     * 
     * @param content
     * @return Long Absolute value of Long for the HashCode.
     */
    public static Long getAbsMurmur128HashAsLongVal(String content) {
        return Math.abs(getMurmur128Hash(content).asLong());
    }

Steger answered 17/5, 2016 at 13:17 Comment(0)

Recommended topics

Hot tags