Eureka's self-preservation mode never recovers

Asked 15/10, 2015 at 14:19 Answered 22/11, 2019 at 1:36

I'm currently facing an issue where Eureka never clears out service instances that have become stale because a VM went down unexpectedly. Understandably, Eureka's self-preservation mode kicked in because there was a large drop (below the threshold) in service renewals/heartbeat requests. However, 15+ hours later the dead instances are still registered in Eureka. This is a major problem as service requests continue to be directed to the dead instances only to return errors.

My hope was that the threshold is continuously adjusted and that, after some period of time, Eureka's threshold would settle at a new normal level and self-preservation mode would be reset. We are using Eureka in a mirrored setup and our configuration is not very complex.

Our setup:

Eureka via spring-boot-starter-parent 1.2.5.RELEASE

eureka:
  dashboard:
    path: services
    enabled: false
  instance:
    hostname: localhost
    leaseRenewalIntervalInSeconds: 3
    metadataMap:
      managementPath: /admin
      instanceId: discoveryPrimary
  client:
    registerWithEureka: false
    fetchRegistry: false
    serviceUrl:
      defaultZone: http://${eureka.instance.hostname}:${server.port}/eureka/
  server:
    waitTimeInMsWhenSyncEmpty: 0

Is it possible to adjust Eureka configurations to reset the self-preservation mode (where it stops clearing instances) and allow it to clear service registries if the services are dead for 5+ minutes?

Arlinda answered 15/10, 2015 at 14:19 Comment(0)

If you have only a few instances of your services, self-preservation will kick in every time any of them fails, because by default renewalPercentThreshold is 0.85.

So if only 84% of your instances renew their lease, Eureka "turns on" self-preservation.

This means that if you have 3 instances and one fails, only 66% of them renewed their leases, so none will get deregistered. You can tune renewalPercentThreshold in the server properties to suit your deployment.

eureka:
  server:
    renewalPercentThreshold: 0.49

With this, if you have 2 instances and 1 fails you are still good.
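
The arithmetic above can be checked with a small sketch. It is a simplification: it compares the fraction of instances still renewing against the threshold, whereas the real server works from renewal counts over time. The function name self_preservation_triggered is made up for illustration and is not part of Eureka.

```python
def self_preservation_triggered(total_instances, failed_instances,
                                renewal_percent_threshold=0.85):
    """Simplified model (an assumption, not Eureka's actual code):
    True when the fraction of instances still renewing their lease
    falls below the configured threshold."""
    renewing = total_instances - failed_instances
    return renewing / total_instances < renewal_percent_threshold


# 3 instances, 1 fails, default threshold: 2/3 is below 0.85
print(self_preservation_triggered(3, 1))
# 2 instances, 1 fails, threshold lowered to 0.49: 1/2 is not below 0.49
print(self_preservation_triggered(2, 1, 0.49))
```

Under this model, the first call reports that self-preservation engages and the second reports that it does not, which matches the two cases described in the answer.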

Cholecalciferol answered 15/10, 2015 at 14:37 Comment(3)

Adjusting the renewalPercentThreshold will get around the issue by delaying self-preservation mode. We can accomplish the same by setting eureka.server.enableSelfPreservation=false, but that still won't solve the issue if self-preservation is triggered and those instances never come back. – Arlinda 15/10, 2015 at 14:58

No, it won't just delay self-preservation mode if your network is healthy and your instances don't come and go every minute. It is simply configured for a smaller infrastructure. You can tune it further with renewalThresholdUpdateIntervalMs so the time window for self-preservation to kick in will be smaller. If you turn it off, you are simply not using one of Eureka's resiliency features. – Helmut 15/10, 2015 at 15:3

Our goal is to try to use all of Eureka's resiliency features, including self preservation mode. However, by setting the threshold lower, it only avoids engaging the self-preservation mode and once it is engaged, it still won't clear out the dead instances after 15+ hours. I do not believe that the threshold is the issue for the described reasons. Are there some other configurations that would allow the self-preservation mode to reset and eventually clear out dead instances? – Arlinda 16/10, 2015 at 16:15

Even though this is an old question, here's my two cents.

My hope was that the threshold is continuously adjusted and after some period of time, Eureka's threshold would be at a new norm level and self-preservation mode would be reset.

That is an incorrect assumption. Eureka self-preservation never expires and the thresholds are not adjusted dynamically. You'll have to bring back the down VMs/clients (so that overall more than 85% of the clients are UP) in order to get out of this state.

I believe it makes sense to turn it off - have a look at conclusions here and the similar question here.

Goodtempered answered 10/12, 2017 at 5:13 Comment(2)

It is not recommended to disable self preservation mode in production. One missed heartbeat and a healthy instance is removed from registry. Not a good idea. – Frug 5/3, 2018 at 11:59

You are not correct, @narendra-choudhary. One missed heartbeat does not evict an instance. If a heartbeat fails, the client backs off exponentially by a factor of 2, up to a maximum delay. It then falls back to the next server in the server list, and the servers replicate the registry info. – Goodtempered 5/3, 2018 at 12:25

After disabling this mode as recommended in @Fahim Farook's answer, you can avoid having an instance removed from the registry because of a single lost heartbeat by tuning how long missed heartbeats are tolerated, using the following properties:

eureka.instance.lease-renewal-interval-in-seconds=30
eureka.instance.lease-expiration-duration-in-seconds=180
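
As a rough illustration of what these two values imply, the sketch below counts how many renewal intervals fit into the expiration window. It assumes a lease expires once no renewal has arrived for the full expiration duration; the helper name renewal_intervals_before_expiry is hypothetical and not part of Eureka.

```python
def renewal_intervals_before_expiry(renewal_interval_s, expiration_duration_s):
    """Number of whole renewal intervals that fit in the expiration window.
    Assumption: the lease is dropped only after this window passes
    without any successful renewal."""
    return expiration_duration_s // renewal_interval_s


# Values from the properties above: 180 s window, one heartbeat every 30 s
print(renewal_intervals_before_expiry(30, 180))
```

With the values above, six renewal intervals pass before the lease can expire, so one lost heartbeat on its own should not be enough to remove an instance.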

Weed answered 22/11, 2019 at 1:36 Comment(0)
