I'm looking into using a consistent hash algorithm in some java code I'm writing. The guava Hashing library has a consistentHash(HashCode, int)
method, but the documentation is rather lacking. My initial hope was that I could just use consistentHash()
for simple session affinity to efficiently distribute load across a set of backend servers.
Does anyone have a real-world example of how to use this method? In particular I'm concerned with managing the removal of a bucket from the target range.
For example:
@Test
public void testConsistentHash() {
List<String> servers = Lists.newArrayList("server1", "server2", "server3", "server4", "server5");
int bucket = Hashing.consistentHash(Hashing.md5().hashString("someId"), servers.size());
System.out.println("First time routed to: " + servers.get(bucket));
// one of the back end servers is removed from the (middle of the) pool
servers.remove(1);
bucket = Hashing.consistentHash(Hashing.md5().hashString("blah"), servers.size());
System.out.println("Second time routed to: " + servers.get(bucket));
}
Leads to the output:
First time routed to: server4 Second time routed to: server5
What I want is for that identifier ("someId") to map to the same server after removal of a server earlier in the list. So in the sample above, after removal I guess I'd want bucket 0 to map to "server1", bucket 1 to map to "server3", bucket 2 to map to "server4" and bucket 3 to map to "server5".
Am I supposed to maintain a separate (more complicated than a list) data structure to manage bucket removal and addition? I guess I had envisioned perhaps a more complicated Hashing API that would manage the remapping after adding and removal of particular buckets for me.
Note: I know the sample code is using a small input and bucket set. I tried this with 1000s of input across 100 buckets and the result is the same. Inputs that map to buckets 0-98 stay the same when I change the buckets
to 99 and bucket 99 gets distributed across the remaining 99 buckets.