Twitter - twemproxy - memcached - Retry not working as expected

Simple setup:

  • 1 node running twemproxy (vcache:22122)
  • 2 nodes running memcached (vcache-1, vcache-2) both listening on 11211

I have the following twemproxy config:

default:
  auto_eject_hosts: true
  distribution: ketama
  hash: fnv1a_64
  listen: 0.0.0.0:22122
  server_failure_limit: 1
  server_retry_timeout: 600000 # 600sec, 10m
  timeout: 100
  servers:
    - vcache-1:11211:1
    - vcache-2:11211:1

The twemproxy node can resolve all hostnames. As part of testing, I took down vcache-2. In theory, for every request sent to vcache:22122, twemproxy picks a server from the pool to handle it. If one of the cache nodes is down, twemproxy is supposed to "auto eject" it from the pool so that subsequent requests do not fail.

It is up to the app layer to determine whether a failed request to vcache:22122 was caused by an infrastructure issue and, if so, to try again. However, I am finding that the retry uses the same failed server: instead of being passed to a known good cache node (in this case vcache-1), subsequent attempts are still passed to the ejected cache node (vcache-2).

Here's the PHP code snippet which attempts the retry:

....

// $this is a Memcached object with vcache:22122 in the server list
$retryCount = 0;

do {
    $this->set($key, $value, $expiry);

    // Stop as soon as a write succeeds
    if (Memcached::RES_SUCCESS === $this->getResultCode()) {
        return true;
    }
} while (++$retryCount < 3);

return false;

-- Update --

Link to the issue opened on GitHub for more info: Issue #427

Azrael answered 2/11, 2015 at 21:50 Comment(0)

I can't see anything wrong with your configuration. As you know, the important settings are in place:

default:
  auto_eject_hosts: true
  server_failure_limit: 1

The documentation suggests connection timeouts might be an issue.

Relying only on client-side timeouts has the adverse effect of the original request having timedout on the client to proxy connection, but still pending and outstanding on the proxy to server connection. This further gets exacerbated when client retries the original request.

Is your PHP script closing the connection and retrying before twemproxy has failed its first attempt and removed the server from the pool? Perhaps setting a timeout value in twemproxy that is lower than the connection timeout used in PHP would solve the issue.
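One way to test that hypothesis is to pause between attempts for longer than the proxy's own timeout (timeout: 100 in the config above), so twemproxy has a chance to register the failure before the next try arrives. The following is only a sketch: retryWithPause is a hypothetical helper, not part of twemproxy or php-memcached, and the 150 ms default is an assumed value chosen to sit just above the 100 ms proxy timeout. It is meant as a diagnostic, not a confirmed fix.

```php
<?php
// Hypothetical helper (not part of twemproxy or php-memcached): call $attempt
// up to $maxAttempts times, sleeping $pauseMs between failed tries so that the
// proxy's own timeout can expire before the next attempt is sent.
// The 150 ms default is an assumption, picked to exceed `timeout: 100`.
function retryWithPause(callable $attempt, int $maxAttempts = 3, int $pauseMs = 150): bool
{
    for ($try = 1; $try <= $maxAttempts; $try++) {
        // $attempt receives the 1-based attempt number and must return true on success
        if ($attempt($try) === true) {
            return true;
        }
        // Do not sleep after the final failed attempt
        if ($try < $maxAttempts) {
            usleep($pauseMs * 1000); // usleep() takes microseconds
        }
    }
    return false;
}

// Intended use with a Memcached object pointed at vcache:22122. It needs the
// memcached extension and a reachable proxy, so it is left commented out:
//
// $ok = retryWithPause(function () use ($cache, $key, $value, $expiry) {
//     $cache->set($key, $value, $expiry);
//     return Memcached::RES_SUCCESS === $cache->getResultCode();
// });
```

The client-side timeouts would also have to be longer than the proxy's 100 ms for this to be meaningful; php-memcached exposes them through options such as Memcached::OPT_CONNECT_TIMEOUT (in milliseconds).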

From your discussion on GitHub, though, it sounds like support for health checks, and perhaps auto ejection, isn't stable in twemproxy. If you're building against old packages, you might be better off finding a package which has been stable for some time. Is mcrouter (with interesting article) suitable?

Maskanonge answered 12/11, 2015 at 13:43 Comment(2)
I tried various permutations. I unset the memcached object, added a sleep, then reinstantiated the object, but still no luck. I am quite certain I tweaked the timeout setting, but it's not in the OP; I need to check my notes. I'll update. - Azrael
I updated the config in the OP. I do have timeout set to 100 (ms), which should have elapsed by at least the 3rd try. - Azrael

For this feature to work, merge in this repo/branch:

https://github.com/charsyam/twemproxy/tree/feature/heartbeat

so that you have this specific commit:

https://github.com/charsyam/twemproxy/commit/4d49d2ecd9e1d60f18e665570e4ad1a2ba9b65b1

Here is the PR:

https://github.com/twitter/twemproxy/pull/428

After that, recompile twemproxy.

Dactyl answered 28/8, 2016 at 14:7 Comment(1)
Awesome, I'll check it out. - Azrael
