Murmurhash 2 results on Python and Haskell
Asked Answered
D

2

8

Haskell and Python don't seem to agree on Murmurhash2 results. Python, Java, and PHP returned the same results but Haskell don't. Am I doing something wrong regarding Murmurhash2 on Haskell?

Here is my code for Haskell Murmurhash2:

import Data.Digest.Murmur32

    main = do
    print $ asWord32 $ hash32WithSeed 1 "woohoo"

And here is the code written in Python:

import murmur

if __name__ == "__main__":
    print murmur.string_hash("woohoo", 1)

Python returned 3650852671 while Haskell returned 3966683799

Doodle answered 3/5, 2013 at 7:18 Comment(10)
Well, my Haskell gives me 2399372562.Scanderbeg
What seed did you use for that?Doodle
I have used your code without changes, the seed is 1.Scanderbeg
Wtf? Anyways, did you try comparing it to your Python or any other language? I really can't figure out why Haskell is behaving that way.Doodle
My Python is identical to your Python.Scanderbeg
So I guess the Haskell library is the culprit. Thanks for the responses :)Doodle
If this is the murmur-hash package, I suspect the Hashable32 instance used for Strings doesn't actually work the same way other languages do. It looks awfully general.. But there are several other murmurhash libs on hackage. Maybe one of them works the way you'd expect.Rhynchocephalian
That's the exact package I am currently using. Can you suggest a decent murmurhash 2 library for Haskell that you had used before?Doodle
I haven't used any. And a second look at hackage suggests the rest are murmurhash3, which doesn't help you at all.Rhynchocephalian
badwolf: can you try hashing some ints and see if they match?Ovine
J
3

The murmur-hash package (I am its author) does not promise to compute the same hashes as other languages. If you rely on hashes to be compatible with other software that computes hashes I suggest you create newtype wrappers that compute hashes the way you want them. For text, in particular, you need to at least specify the encoding. In your case you could convert the text to an ASCII string using Data.ByteString.Char8.pack, but that still doesn't give you the same hash since the ByteString instance is more of a placeholder.

BTW, I'm not actively improving that package because MurmurHash2 has been superseded by MurmurHash3, but I keep accepting patches.

Jacinthe answered 4/5, 2013 at 17:51 Comment(1)
I never expected you (the author) to respond here. Thanks anyway for the response and willingness to accept improvements. :) Might as well, I'll avoid hashing on Haskell completely when communicating hashes to programs written in other languages BTW, comments above are offtopic and not helpful, will be flagging them after a while.Doodle
J
5

From a quick inspection of the sources, it looks like the algorithm operates on 32 bits at a time. The Python version gets these by simply grabbing 4 bytes at a time from the input string, while the Haskell version converts each character to a single 32-bit Unicode index.

It's therefore not surprising that they yield different results.

Jonniejonny answered 3/5, 2013 at 16:7 Comment(1)
I can't test it at the moment, but unless there is some other difference I missed, hashing "a\0\0\0" in Python (on a little-endian machine) should give the same result as hashing "a" in Haskell, for example.Jonniejonny
J
3

The murmur-hash package (I am its author) does not promise to compute the same hashes as other languages. If you rely on hashes to be compatible with other software that computes hashes I suggest you create newtype wrappers that compute hashes the way you want them. For text, in particular, you need to at least specify the encoding. In your case you could convert the text to an ASCII string using Data.ByteString.Char8.pack, but that still doesn't give you the same hash since the ByteString instance is more of a placeholder.

BTW, I'm not actively improving that package because MurmurHash2 has been superseded by MurmurHash3, but I keep accepting patches.

Jacinthe answered 4/5, 2013 at 17:51 Comment(1)
I never expected you (the author) to respond here. Thanks anyway for the response and willingness to accept improvements. :) Might as well, I'll avoid hashing on Haskell completely when communicating hashes to programs written in other languages BTW, comments above are offtopic and not helpful, will be flagging them after a while.Doodle

© 2022 - 2024 — McMap. All rights reserved.