MurmurHash - what is it?
Asked Answered
D

2

101

I've been trying to get a high level understanding of what MurmurHash does.

I've read a basic description but have yet to find a good explanation of when to use it and why. I know its very fast but want to know a bit more.

I asked a related question about how I could fit a UUID into a Redis bitset, and someone suggested using MurmurHash. It works but I'd like to understand the risks/benefits.

Dashboard answered 10/8, 2012 at 10:15 Comment(0)
R
138

Murmur is a family of good general purpose hashing functions, suitable for non-cryptographic usage. As stated by Austin Appleby, MurmurHash provides the following benefits:

  • simple (in term of number of generated assembly instructions).
  • good distribution (passing chi-squared tests for practically all keysets & bucket sizes.
  • good avalanche behavior (max bias of 0.5%).
  • good collision resistance (passes Bob Jenkin's frog.c torture-test. No collisions possible for 4-byte keys, no small (1- to 7-bit) differentials).
  • great performance on Intel/AMD hardware, good tradeoff between hash quality and CPU consumption.

You can certainly use it to hash UUIDs (like any other advanced hashing functions: CityHash, Jenkins, Paul Hsieh's, etc ...). Now, a Redis bitset is limited to 4 GB bits (512 MB). So you need to reduce 128 bits of data (UUID) to 32 bits (hashed value). Whatever the quality of the hashing function, there will be collisions.

Using an engineered hash function like Murmur will maximize the quality of the distribution, and minimize the number of collisions, but it offers no other guarantee.

Here are some links comparing the quality of general purpose hash functions:

http://www.azillionmonkeys.com/qed/hash.html

http://www.strchr.com/hash_functions

http://blog.aggregateknowledge.com/2011/12/05/choosing-a-good-hash-function-part-1/

http://blog.aggregateknowledge.com/2011/12/29/choosing-a-good-hash-function-part-2/

http://blog.aggregateknowledge.com/2012/02/02/choosing-a-good-hash-function-part-3/

Raul answered 10/8, 2012 at 12:25 Comment(5)
I've tried using MurmurHash, for hashing my UUIDs, but the hash function is returning negative ids for some UUIDs. Does anyone know how to get around this?Dashboard
The output of the C implementation of MurmurHash is an unsigned integer ... it cannot be negative. Perhaps you are using Java? In Java, to cast a signed int to an unsigned value in the bottom 32 bits of a long, you need to AND with 0xffffffffL (see #9579139)Raul
Do you know of any analysis of this hash? Is it universal? Is it 2-wise-independent etc?Bravin
@DidierSpezia Why Math.abs() is not good enough? The result would also be in good distribution, given that the original ids, whether negative or not, are already evenly distributed.Tocsin
Math.abs() may indeed be good enough ... but you lose 1 bit, so the probably of collision is multiplied by 2 (i.e. your hash is on 31 bits instead of 32).Raul
L
-4

MurmurHash can be return negtive value, original value bit AND against 0x7fffffff。that is value & 0x7fffffff .When the input is positive, the original value is returned. When the input number is negative, the returned positive value is the original value bit AND against 0x7fffffff which is not its absolutely value. note:MurmurHash's return value can not be fix length.

Largess answered 21/5, 2020 at 3:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.