I finally got around to writing a Murmur3 implementation, and I managed to translate the SMHasher test code. My implementation gives the same result as the SMHasher test, which means I can finally give some useful, and presumably correct, test vectors.
These are for Murmur3_x86_32 only.
| Input | Seed | Expected | Notes |
|-------------|------------|------------|-------|
| (no bytes) | 0 | 0 | With zero data and zero seed, everything becomes zero |
| (no bytes) | 1 | 0x514E28B7 | Ignores nearly all the math |
| (no bytes) | 0xffffffff | 0x81F16F39 | Make sure your seed uses unsigned 32-bit math |
| FF FF FF FF | 0 | 0x76293B50 | Make sure 4-byte chunks use unsigned math |
| 21 43 65 87 | 0 | 0xF55B516B | Endian order. UInt32 should end up as 0x87654321 |
| 21 43 65 87 | 0x5082EDEE | 0x2362F9DE | Special seed value eliminates initial key with xor |
| 21 43 65 | 0 | 0x7E4A8634 | Only three bytes. Should end up as 0x654321 |
| 21 43 | 0 | 0xA0F7B07A | Only two bytes. Should end up as 0x4321 |
| 21 | 0 | 0x72661CF4 | Only one byte. Should end up as 0x21 |
| 00 00 00 00 | 0 | 0x2362F9DE | Make sure compiler doesn't see zero and convert to null |
| 00 00 00 | 0 | 0x85F0B427 | |
| 00 00 | 0 | 0x30F4C306 | |
| 00 | 0 | 0x514E28B7 | |
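For anyone who wants to check the byte vectors without a port at hand, here is a minimal Python sketch of Murmur3_x86_32. It is not the Delphi code these vectors came from; the names `murmur3_x86_32`, `rotl32`, `VECTORS` and `mismatches` are made up for illustration, and it only re-checks a handful of the rows above.

```python
MASK32 = 0xFFFFFFFF  # Python ints are unbounded, so wrap to 32 bits by hand


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Sketch of Murmur3_x86_32; returns an unsigned 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    full = len(data) - len(data) % 4
    # Body: each 4-byte chunk is read as a little-endian unsigned integer.
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: 1 to 3 leftover bytes, also little-endian, without the h rotation.
    tail = data[full:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization: mix in the length, then avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


# A few rows copied from the table: (input bytes, seed, expected hash).
VECTORS = [
    (b"", 0, 0x00000000),
    (b"", 1, 0x514E28B7),
    (b"", 0xFFFFFFFF, 0x81F16F39),
    (bytes.fromhex("FFFFFFFF"), 0, 0x76293B50),
    (bytes.fromhex("21436587"), 0, 0xF55B516B),
    (bytes.fromhex("214365"), 0, 0x7E4A8634),
    (bytes(4), 0, 0x2362F9DE),
]

# Collect any rows the sketch disagrees with; an empty list means all matched.
mismatches = [(d.hex(), s) for d, s, e in VECTORS if murmur3_x86_32(d, s) != e]
print("mismatches:", mismatches)
```

In a language with native 32-bit unsigned integers the explicit `& MASK32` steps disappear; in languages with only signed or floating-point numbers, those masks are exactly where the "unsigned math" rows tend to fail.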
For those of you porting to a language that doesn't have actual arrays, I also have some string-based tests. For these tests:
- all strings are assumed to be UTF-8 encoded
- and do not include any null terminator
I'll leave these in code form:
TestString("", 0, 0); //empty string with zero seed should give zero
TestString("", 1, 0x514E28B7);
TestString("", 0xffffffff, 0x81F16F39); //make sure seed value is handled unsigned
TestString("\0\0\0\0", 0, 0x2362F9DE); //make sure we handle embedded nulls
TestString("aaaa", 0x9747b28c, 0x5A97808A); //one full chunk
TestString("aaa", 0x9747b28c, 0x283E0130); //three characters
TestString("aa", 0x9747b28c, 0x5D211726); //two characters
TestString("a", 0x9747b28c, 0x7FA09EA6); //one character
//Endian order within the chunks
TestString("abcd", 0x9747b28c, 0xF0478627); //one full chunk
TestString("abc", 0x9747b28c, 0xC84A62DD);
TestString("ab", 0x9747b28c, 0x74875592);
TestString("a", 0x9747b28c, 0x7FA09EA6);
TestString("Hello, world!", 0x9747b28c, 0x24884CBA);
//Make sure you handle UTF-8 high characters. A bcrypt implementation messed this up
TestString("ππππππππ", 0x9747b28c, 0xD58063C1); //U+03C0: Greek Small Letter Pi
//String of 256 characters.
//Make sure you don't store string lengths in a char, and overflow at 255 bytes (as OpenBSD's canonical BCrypt implementation did)
TestString(StringOfChar("a", 256), 0x9747b28c, 0x37405BDC);
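The two rules for the string tests (UTF-8 encoding, no null terminator) decide which bytes actually reach the hash. Here is a small Python sketch of that conversion step only; `string_to_key` is a hypothetical helper name, not part of any library, and no hashing happens here.

```python
def string_to_key(text: str) -> bytes:
    """Encode a test string as UTF-8, with no terminator appended."""
    return text.encode("utf-8")


# Show how many bytes each kind of test string turns into.
for text in ["", "\0\0\0\0", "abc", "ππππππππ", "a" * 256]:
    key = string_to_key(text)
    print(len(text), "characters ->", len(key), "bytes, starting", key[:4].hex())
```

Note that the eight-character pi string becomes 16 bytes (each U+03C0 encodes as CF 80), the embedded-null string stays four zero bytes, and the 256-character string needs a length that does not fit in a single byte.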
I'll post just two of the 11 SHA-2 test vectors that I converted to Murmur3.
TestString("abc", 0, 0xB3DD93FA);
TestString("abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq", 0, 0xEE925B90);
And finally, the big one:
- Key: "The quick brown fox jumps over the lazy dog"
- Seed: 0x9747b28c
- Hash: 0x2FA826CD
It would be helpful if anyone else could confirm any or all of these vectors from their own implementations.
And, again, these test vectors come from an implementation that passes the SMHasher 256-iteration loop test from KeySetTest.cpp - VerificationTest(...).
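For reference, the 256-iteration loop can be sketched in Python as below. This is my reading of what VerificationTest does, not a copy of it: hash the keys {}, {0}, {0,1}, ... with seed 256 minus the key length, collect the 256 results as little-endian bytes, then hash that array with seed 0. The function names are hypothetical, the block carries its own compact Murmur3_x86_32 so it runs by itself, and the expected constant 0xB0F57EE3 is the value SMHasher lists for MurmurHash3_x86_32 to the best of my knowledge, so check it against your own SMHasher checkout.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k(k: int) -> int:
    """Per-chunk scramble shared by the body and the tail."""
    k = (k * 0xCC9E2D51) & MASK32
    k = rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Compact sketch of Murmur3_x86_32 (unsigned 32-bit result)."""
    h = seed & MASK32
    full = len(data) - len(data) % 4
    for i in range(0, full, 4):
        h ^= _mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[full:]
    if tail:
        h ^= _mix_k(int.from_bytes(tail, "little"))
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def verification_value(hash32) -> int:
    """Sketch of the SMHasher-style verification loop for a 32-bit hash."""
    hashes = bytearray()
    for i in range(256):
        key = bytes(range(i))  # the first i bytes of 0, 1, 2, ...; empty when i == 0
        hashes += hash32(key, 256 - i).to_bytes(4, "little")
    # Hash the 1024 collected bytes with seed 0. Reading the first four output
    # bytes as little-endian gives back the 32-bit hash value itself.
    return hash32(bytes(hashes), 0)


# Assumed expected value for MurmurHash3_x86_32; confirm against SMHasher.
EXPECTED = 0xB0F57EE3
result = verification_value(murmur3_x86_32)
print(hex(result), "matches" if result == EXPECTED else "differs")
```

Because every key length from 0 to 255 and 256 different seeds go through the hash, a single matching value here covers far more ground than any individual vector above.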
These tests came from my implementation in Delphi. I also created an implementation in Lua (which isn't big on supporting arrays).
Note: Any code is released into the public domain. No attribution required.
x86 32bit Tests
against my own implementation. It would be useful if the JavaScript test also checked that strings are encoded as UTF-8 before hashing (e.g. "ππππππππ"), and that embedded nulls are supported (e.g. "\0\0\0\0") – Tarriance