Migrate from MurmurHash to MurmurHash3
Asked Answered
B

2

7

In Scala 2.10, MurmurHash for some reason is deprecated, saying I should use MurmurHash3 now. But the API is different, and there is no useful scaladocs for MurmurHash3 -> fail.

For instance, current code:

trait Foo {
  type Bar
  def id: Int
  def path: Bar

  override def hashCode = {
    import util.MurmurHash._
    var h = startHash(2)
    val c = startMagicA
    val k = startMagicB
    h = extendHash(h, id, c, k)
    h = extendHash(h, path.##, nextMagicA(c), nextMagicB(k))
    finalizeHash(h)
  }
}

How would I do this using MurmurHash3 instead? This needs to be a fast operation, preferably without allocations, so I do not want to construct a Product, Seq, Array[Byte] or whathever MurmurHash3 seems to be offering me.

Boart answered 10/2, 2013 at 12:1 Comment(0)
J
7

The MurmurHash3 algorithm was changed, confusingly, from an algorithm that mixed in its own salt, essentially (c and k), to one that just does more bit-mixing. The basic operation is now mix, which you should fold over all your values, after which you should finalizeHash (the Int argument for length is for convenience also, to help with distinguishing collections of different length). If you want to replace your last mix by mixLast, it's a little faster and removes redundancy with finalizeHash. If it takes you too long to detect what the last mix is, just mix.

Typically for a collection you'll want to mix in an extra value to indicate what type of collection it is.

So minimally you'd have

override def hashCode = finalizeHash(mixLast(id, path.##), 0)

and "typically" you'd

// Pick any string or number that suits you, put in companion object
val fooSeed = MurmurHash3.stringHash("classOf[Foo]")   

// I guess "id" plus "path" is two things?
override def hashCode = finalizeHash(mixLast( mix(fooSeed,id), path.## ), 2)

Note that the length field is NOT there to give a high-quality hash that mixes in that number. All mixing of important hash values should be done with mix.

Jaggers answered 10/2, 2013 at 12:56 Comment(2)
Thanks, Rex. The seed generation makes sense. For so Product this would be probably stringHash(productPrefix).Boart
@Boart - That would be a reasonable value. It really boils down to what might be around with the same contents but a different identity; if no such thing exists (or you don't mind colliding with it/them) then you can put anything or nothing there. "Only collide with things of the same name and contents (that are also a Product and have chosen a seed using the same obvious method)" is a very reasonable policy.Jaggers
P
4

Looking at the source code of MurmurHash3 suggests something like this:

override def hashCode = {
  import util.hashing.MurmurHash3._

  val h = symmetricSeed // I'm not sure which seed to use here
  val h1 = mix(h, id)
  val h2 = mixLast(h1, path ##)
  finalizeHash(h2, 2)
}

or, in (almost) one line:

import util.hashing.MurmurHash3._
override def hashCode = finalizeHash(mix(mix(symmetricSeed, id), path ##), 2)
Presignify answered 10/2, 2013 at 12:20 Comment(1)
Looks good. I guess I'll use productSeed as my original code was based on the product hash generator for arity 2.Boart

© 2022 - 2024 — McMap. All rights reserved.