How should I use Guava's Hashing#consistentHash?

I'm looking into using a consistent hash algorithm in some java code I'm writing. The guava Hashing library has a consistentHash(HashCode, int) method, but the documentation is rather lacking. My initial hope was that I could just use consistentHash() for simple session affinity to efficiently distribute load across a set of backend servers.

Does anyone have a real-world example of how to use this method? In particular I'm concerned with managing the removal of a bucket from the target range.

For example:

@Test
public void testConsistentHash() {
    List<String> servers = Lists.newArrayList("server1", "server2", "server3", "server4", "server5");

    int bucket = Hashing.consistentHash(Hashing.md5().hashString("someId"), servers.size());
    System.out.println("First time routed to: " + servers.get(bucket));

    // one of the back end servers is removed from the (middle of the) pool
    servers.remove(1);

    bucket = Hashing.consistentHash(Hashing.md5().hashString("someId"), servers.size());
    System.out.println("Second time routed to: " + servers.get(bucket));
}

Leads to the output:

First time routed to: server4
Second time routed to: server5

What I want is for that identifier ("someId") to map to the same server after removal of a server earlier in the list. So in the sample above, after removal I guess I'd want bucket 0 to map to "server1", bucket 1 to map to "server3", bucket 2 to map to "server4" and bucket 3 to map to "server5".

Am I supposed to maintain a separate (more complicated than a list) data structure to manage bucket removal and addition? I guess I had envisioned a more sophisticated Hashing API that would manage the remapping after the addition and removal of particular buckets for me.

Note: I know the sample code uses a small input and bucket set. I tried this with thousands of inputs across 100 buckets and the result is the same: inputs that map to buckets 0-98 stay put when I change the bucket count to 99, and inputs from bucket 99 get distributed across the remaining 99 buckets.
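For illustration, a test along these lines (hypothetical, not part of my original code) exercises that property:

@Test
public void testOnlyLastBucketRedistributes() {
    for (int i = 0; i < 10000; i++) {
        HashCode hash = Hashing.md5().hashString("input" + i);
        int before = Hashing.consistentHash(hash, 100);
        int after = Hashing.consistentHash(hash, 99);
        if (before < 99) {
            assertEquals(before, after);   // buckets 0-98 are stable
        } else {
            assertTrue(after < 99);        // inputs from bucket 99 are redistributed
        }
    }
}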

Brunn answered 7/9, 2012 at 13:55 Comment(3)
Your note is right... but you can see that Guava knows nothing about your list except its size, can't you? So it can't do anything else.Lift
I think this is the doc link you really want: docs.guava-libraries.googlecode.com/git-history/release13/… -- while it's true there's not a lot there, what else do you feel it should say?Helgahelge
@Kevin - The documentation is probably O.K. If anything, a couple more words on the requirement that additions/removals happen at the end would help. I posted my question because I was hoping my interpretation was wrong and there was some obvious way to manage bucket manipulation that I hadn't thought of. I got to the guava method after starting with the wikipedia entry and reading the java implementation referenced there, so I guess I was expecting to see something closer to what those two articles describe (more like Chris's description of what's coming in an answer below).Brunn

I'm afraid that no data structure can do it really right with the current consistentHash. As the method accepts only the list size, nothing but appending and removing from the end can be supported. Currently, the best solution probably consists of replacing

servers.remove(n)

by

servers.set(n, servers.get(servers.size() - 1));
servers.remove(servers.size() - 1);

This way you effectively swap the failed server with the very last one. That looks bad, since it makes the assignments to both swapped servers wrong, but the problem is only half as bad because one of them has already failed. And it works out: after the subsequent removal of the last list element, everything is fine except for the assignments to the failed server and to the previously last server.

So twice as many assignments as necessary change. Not optimal, but hopefully usable?
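Wrapped up as a helper (just a sketch; the method name is made up), the replacement could look like this:

// Swap-and-truncate removal as described above: copy the last server over the
// failed slot, then drop the now-duplicate tail entry.
static void removeFailedServer(List<String> servers, int failedIndex) {
    int last = servers.size() - 1;
    servers.set(failedIndex, servers.get(last));
    servers.remove(last);
}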

Lift answered 7/9, 2012 at 15:11 Comment(2)
Thanks for the quick answer. It seems like a reasonable work-around until a richer API is available. I'll likely either go with this or roll my own based on other language implementations.Brunn
@GamingBuck: I've just created a better (maybe even optimal) solution.Lift

I don't think there's a good way to do this at the moment. consistentHash in its current form is useful only in simple cases -- basically, where you have a knob to increase or decrease the number of servers... but always by adding and removing at the end.

There's some work underway to add a class like this:

public final class WeightedConsistentHash<B, I> {
  /** Initially, all buckets have weight zero. */
  public static <B, I> WeightedConsistentHash<B, I> create(
      Funnel<B> bucketFunnel, Funnel<I> inputFunnel);

  /**
   * Sets the weight of bucket "bucketId" to "weight".
   * Requires "weight" >= 0.0.
   */
  public void setBucketWeight(B bucketId, double weight);

  /**
   * Returns the bucket id that "input" maps to.
   * Requires that at least one bucket has a non-zero weight.
   */
  public B hash(I input);
}

Then you would write:

WeightedConsistentHash<String, String> serverChooser =
    WeightedConsistentHash.create(stringFunnel(), stringFunnel());
serverChooser.setBucketWeight("server1", 1);
serverChooser.setBucketWeight("server2", 1);
// etc.

System.out.println("First time routed to: " + serverChooser.hash("someId"));

// one of the back end servers is removed from the (middle of the) pool
serverChooser.setBucketWeight("server2", 0);

System.out.println("Second time routed to: " + serverChooser.hash("someId"));

And you should get the same server each time. Does that API look suitable?

Entophyte answered 7/9, 2012 at 14:34 Comment(2)
I must admit I haven't looked too closely at the Funnel API yet but at a first glance that seems workable. I'm looking forward to seeing its availability.Brunn
Any news on this? I can see a weightedConsistentHash reference present in v14, but absent from v16. Reference removed by Colin in commit 9acc76ba4.Lift

The guava API does not have any knowledge of your server list. It can only guarantee this:

int bucket1 = Hashing.consistentHash(Hashing.md5().hashString("server1"), N);
int bucket2 = Hashing.consistentHash(Hashing.md5().hashString("server1"), N - 1);

// this holds if and only if bucket1 != N - 1
assertThat(bucket1, is(equalTo(bucket2)));

You need to manage the mapping from buckets to your server list yourself.

Turntable answered 9/9, 2012 at 3:29 Comment(0)

The answer proposed in the question is the correct one:

Am I supposed to maintain a separate (more complicated than a list) data structure to manage bucket removal and addition?

Guava is hashing into a ring with ordinal numbers. The mapping from those ordinal numbers to the server ids has to be maintained externally:

  • Given N servers initially - one can choose some arbitrary mapping from each ordinal number 0..N-1 to server ids A..K (0->A, 1->B, .., N-1->K). A reverse mapping from the server id to its ordinal number is also required (A->0, B->1, ..).

  • On the deletion of a server - the ordinal number space shrinks by one. All the ordinal numbers starting with the one for the deleted server need to be remapped to the next server (shift by one).

    So for example, after the initial mapping, say server C (corresponding to ordinal number 2) got deleted. Now the new mappings become: (0->A, 1->B, 2->D, 3->E, .., N-2->K)

  • On the addition of a server L (say going from N to N+1 servers) - a new mapping can be added from N->L.

What we are doing here is mimicking how nodes would move in a ring as they are added and deleted. While the ordering of the nodes remains the same, their ordinal numbers (on which Guava operates) can change as nodes come and go.
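A minimal sketch of that bookkeeping (class and method names are made up; only Hashing.consistentHash and hashString come from Guava) might look like:

import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ServerRing {
    // the index in this list is the ordinal number that Guava hashes into;
    // indexOf() gives the reverse mapping from server id to ordinal
    private final List<String> servers = new ArrayList<>();

    void addServer(String serverId) {
        servers.add(serverId);        // a new server takes the next ordinal, N
    }

    void removeServer(String serverId) {
        servers.remove(serverId);     // ordinals after it shift down by one
    }

    String serverFor(String key) {
        HashCode hash = Hashing.md5().hashString(key, StandardCharsets.UTF_8);
        return servers.get(Hashing.consistentHash(hash, servers.size()));
    }
}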

Encephalic answered 20/8, 2018 at 6:5 Comment(0)
