Order Independent Hash in Java

3

8

I'd like to calculate a hash of a set of strings in Java. Yes, I can sort the strings and calculate the MD5 hash iteratively using digest.update. But I'd prefer to omit the sort and use something like combineUnordered https://github.com/google/guava/wiki/HashingExplained There are a lot of similar questions asking the same thing, such as Order-independent Hash Algorithm, but none of them provides a simple example showing how to iteratively calculate an order-independent hash in Java.

Lusatian answered 12/11, 2017 at 20:54 Comment(3)
Why do you need to override the hash algorithm of the set? Respecting
@SzigyártóMihály no need to override anything, I'm looking for a simple example. I know MD5, which is order-sensitive, and MurmurHash, which shouldn't be, but I could not find an example of using it. Lusatian
The set uses the sum of its items' hashes, which does not depend on the order. Respecting
6

Just XOR the individual hashes and the order won't matter. The result also has a fixed size rather than growing with the size of the collection.

Hash code using Java's built-in String.hashCode():

int hashcode = strings.stream()
        .mapToInt(Object::hashCode)
        .reduce(0, (left, right) -> left ^ right);

Hash using Guava and MD5, as the question asked:

Optional<byte[]> hash = strings.stream()
        // Use a fixed charset so the result does not depend on the platform default.
        .map(s -> Hashing.md5().hashString(s, StandardCharsets.UTF_8))
        .map(HashCode::asBytes)
        .reduce((left, right) -> xor(left, right));


static byte[] xor(byte[] left, byte[] right) {
    if(left.length != right.length) {
        throw new IllegalArgumentException();
    }
    byte[] result = new byte[left.length];
    for(int i=0; i < result.length; i++) {
        result[i] = (byte) (left[i] ^ right[i]);
    }
    return result;
}
Devotional answered 12/11, 2017 at 22:7 Comment(2)
This is the preferred method. XORing the hashes is better than adding them. Chowchow
Yes, this is true for sets, but for bags that can contain duplicates XOR is not applicable, because each pair of duplicates cancels out to zero @LukeJoshuaPark, so some (wrapping) SUM must be used. Lusatian
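To illustrate the point made in the comment above, here is a minimal sketch using only the JDK. The class and method names (BagHashDemo, xorCombine, sumCombine) are made up for this illustration, and String.hashCode() stands in for whatever per-element hash you actually use:

```java
import java.util.List;

final class BagHashDemo {
    // XOR-combine: order-independent, but equal elements cancel in pairs.
    static int xorCombine(List<String> items) {
        int h = 0;
        for (String s : items) {
            h ^= s.hashCode();
        }
        return h;
    }

    // Wrapping sum: int addition overflows modulo 2^32 and is still commutative,
    // so duplicates contribute to the result instead of cancelling out.
    static int sumCombine(List<String> items) {
        int h = 0;
        for (String s : items) {
            h += s.hashCode();
        }
        return h;
    }

    public static void main(String[] args) {
        List<String> bag = List.of("a", "b", "b");
        List<String> single = List.of("a");
        System.out.println(xorCombine(bag) == xorCombine(single)); // true: the two "b"s cancel
        System.out.println(sumCombine(bag) == sumCombine(single)); // false
    }
}
```

With XOR, the bag ["a", "b", "b"] hashes the same as ["a"], while the wrapping sum tells them apart and still ignores the order.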
1

You can calculate the MD5 hash of each string individually and then add them all up to get a single hash. The result is order-independent because addition is commutative.

Here is an example (assuming we have a method md5Hex(String str) that calculates the MD5 hash of a given string and returns the result in hexadecimal format):

String[] strings = {"str1", "str2", "str3", ...};

BigInteger hashSum = BigInteger.ZERO;
for(String s : strings) {
    String hexHash = md5Hex(s);
    hashSum = hashSum.add(new BigInteger(hexHash, 16));
}

String finalHash = hashSum.toString(16);
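For completeness, here is one way the assumed md5Hex helper could look using only the JDK's MessageDigest. This is a sketch, not the only option: the class name Md5SumDemo, the method unorderedMd5, the choice of UTF-8, and the reduction modulo 2^128 (so the result stays at a fixed 32 hex digits instead of growing with the number of strings) are all my own assumptions.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

final class Md5SumDemo {
    // 2^128: keeps the running sum within the width of one MD5 digest.
    private static final BigInteger MODULUS = BigInteger.ONE.shiftLeft(128);

    // Hypothetical stand-in for the md5Hex(String) helper assumed above.
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            // Signum 1 reads the digest as an unsigned number; pad to 32 hex digits.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Adds the per-string digests modulo 2^128; addition is commutative,
    // so the iteration order does not affect the result.
    static String unorderedMd5(Iterable<String> strings) {
        BigInteger sum = BigInteger.ZERO;
        for (String s : strings) {
            sum = sum.add(new BigInteger(md5Hex(s), 16)).mod(MODULUS);
        }
        return String.format("%032x", sum);
    }

    public static void main(String[] args) {
        System.out.println(unorderedMd5(List.of("str1", "str2", "str3")));
        System.out.println(unorderedMd5(List.of("str3", "str1", "str2"))); // same value
    }
}
```

Unlike XOR, the modular sum does not cancel duplicate elements, so it also works for collections that may contain the same string more than once.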
Reprehension answered 12/11, 2017 at 21:14 Comment(2)
Yes, thank you; the background of the question (though downvoted) was whether I should go this way or use some alternative hash algorithm that can combine unordered input and possibly produce fewer collisions. Lusatian
@MarmiteBomber, added an example. Reprehension
0

Here's an example using Guava to calculate an order-independent hash of a set of strings:

import java.util.List;
import java.util.Set;

import com.google.common.base.Charsets;
import com.google.common.hash.HashCode;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;

...

public String hash(final Set<String> strings) {
    final HashFunction function = Hashing.murmur3_128();

    // Hashing.combineUnordered will throw an exception if input is empty.
    if (strings.isEmpty()) {
        return function.newHasher()
            .hash()
            .toString();
    }

    final List<HashCode> stringsHashes = strings.stream()
            .map(string -> function.newHasher()
                    .putString(string, Charsets.UTF_8)
                    .hash())
            .toList();

    return Hashing.combineUnordered(stringsHashes).toString();
}
Shf answered 24/11, 2023 at 5:21 Comment(0)
