Why is ZUUL forcing a SEMAPHORE isolation to execute its Hystrix commands?

I noticed that Spring Cloud Zuul forces the Hystrix execution isolation strategy to SEMAPHORE instead of the THREAD default (which is what Netflix recommends).

A comment in org.springframework.cloud.netflix.zuul.filters.route.RibbonCommand says:

we want to default to semaphore-isolation since this wraps 2 others commands that are already thread isolated

But still I don't get it :-( What are those two other commands?

Configured this way, Zuul can only shed load; it cannot time out a call and let the client walk away. In short, even if the Hystrix timeout is set to 1000ms, clients are only released once the call forwarded to the downstream service returns (or times out, because of a ReadTimeout for instance).
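The difference described above can be illustrated with a minimal plain-Java sketch. This is not Hystrix's or Zuul's implementation; the class and method names (`IsolationSketch`, `runOnCallerThread`, `runOnWorkerThread`) are hypothetical and only the JDK is used. It assumes that semaphore isolation amounts to running the work on the caller's own thread behind a permit counter, while thread isolation hands the work to a pool so the caller can stop waiting at the timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not taken from Hystrix or Spring Cloud.
final class IsolationSketch {

    // Semaphore-style: the permit only limits concurrency (load shedding).
    // The work runs on the caller's thread, so the caller stays blocked
    // until the work itself returns or fails.
    static <T> T runOnCallerThread(Semaphore permits, Callable<T> work) throws Exception {
        if (!permits.tryAcquire()) {
            throw new RejectedExecutionException("too many concurrent requests");
        }
        try {
            return work.call();
        } finally {
            permits.release();
        }
    }

    // Thread-style: the work runs on a pool thread, so the caller can give up
    // after timeoutMs even if the work is still in progress.
    static <T> T runOnWorkerThread(ExecutorService pool, Callable<T> work, long timeoutMs)
            throws Exception {
        Future<T> future = pool.submit(work);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the caller is already free
            throw e;
        }
    }

    static long elapsedMs(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a slow downstream service.
        Callable<String> slowCall = () -> { Thread.sleep(500); return "done"; };
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            try {
                runOnWorkerThread(pool, slowCall, 100);
            } catch (TimeoutException e) {
                System.out.println("worker thread: caller released after ~" + elapsedMs(start) + " ms");
            }
            start = System.nanoTime();
            runOnCallerThread(new Semaphore(10), slowCall);
            System.out.println("caller thread: caller released after ~" + elapsedMs(start) + " ms");
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch the semaphore variant can reject a request up front, but once the work has started the caller has no way to leave early, which matches the behaviour observed in the question.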

I tried to force THREAD isolation by overriding the configuration (per service, unfortunately, since the default is forced in the code) and everything seems to work as expected. However, I'm not keen on doing this without properly understanding the implications, especially given the comment found in the code and the defaults adopted by the Spring Cloud version of Zuul.

Can someone provide more information? Thx

Waitabit answered 30/4, 2015 at 10:53 Comment(2)
Any updates? The RestClient.LoadBalancerCommand inside the RibbonCommand creates 2 Observables. Is that related to this comment? I am not sure. - Rapport
Hello, sorry for being late to the party :) I'm curious: why did you feel the need to change the isolation strategy? Did you notice performance bottlenecks due to the semaphore? As mentioned in the comment in org.springframework.cloud.netflix.zuul.filters.route.support.AbstractRibbonCommand, the timeout is handled by the 2 commands it wraps. I'm guessing one of those is a Ribbon command, so the timeouts can be set with ribbon.ConnectTimeout and ribbon.ReadTimeout. - Ceylon

The Hystrix documentation has a good example of why semaphore isolation is appropriate when wrapping commands that are thread isolated. Specifically, it says:

The façade HystrixCommand can use semaphore-isolation since all of the work it is doing is going through two other HystrixCommands that are already thread-isolated. It is unnecessary to have yet another layer of threading as long as the run() method of the façade is not doing any other network calls, retry logic, or other “error prone” things.

Update: The question mentions that thread isolation has to be configured for each service, but I found that you can control the isolation of all Hystrix commands (including RibbonCommands) by setting the following property:

hystrix.command.default.execution.isolation.strategy = THREAD
Benoite answered 29/9, 2015 at 9:58 Comment(5)
But in this case, what are those two other HystrixCommands? I can't find them, hence my question. - Waitabit
@BertrandRenuart I haven't looked into the Zuul implementation so I don't have an answer to that question. - Benoite
@BertrandRenuart Perhaps the comment is referring to the use of rx.Observable in LoadBalancerCommand which is invoked via the RestClient in RibbonCommand. Although I have to admit, I'm really just guessing here. - Benoite
There's no way to change the SEMAPHORE configuration to THREAD; it's hard-coded in org/springframework/cloud/netflix/zuul/filters/route/RestClientRibbonCommand.getSetter(String commandKey). - Lindly
@WornOutSoles I have to disagree: setting the property specified in the answer above definitely changed the behavior with respect to timeouts. I believe that the configuration you point to is a so-called "instance default from code", which can be overridden by a "dynamic instance property". See github.com/Netflix/Hystrix/wiki/Configuration for details. - Benoite

This pattern is described in the Hystrix documentation:

The façade HystrixCommand can use semaphore-isolation since all of the work it is doing is going through two other HystrixCommands that are already thread-isolated. It is unnecessary to have yet another layer of threading as long as the run() method of the façade is not doing any other network calls, retry logic, or other “error prone” things.

The reasons for using only a semaphore are:

  1. We want to throttle the number of requests to the primary and secondary (maybe more) commands combined.
  2. Isolation is already achieved by the thread pools of the actual Hystrix commands that this façade proxies, so we don't need to worry about isolation at this level using thread pools. Each thread in a pool has a fixed overhead in terms of resource consumption, unlike a semaphore, which is just a counter. So a lightweight semaphore is good enough.
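
The two points above can be sketched in plain Java. This is an illustration under stated assumptions, not the Hystrix primary/secondary example and not Zuul's code: `FacadeSketch`, its constructor parameters and the `usePrimary` switch are all hypothetical names. A `Semaphore` throttles the combined traffic at the façade, while each wrapped call runs in its own thread pool with a timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of a semaphore-guarded façade over two
// thread-isolated calls; not taken from Hystrix or Spring Cloud.
final class FacadeSketch {
    private final Semaphore permits;            // point 1: one counter for both paths
    private final ExecutorService primaryPool;  // point 2: isolation lives in the
    private final ExecutorService secondaryPool; //         wrapped calls' own pools
    private final long timeoutMs;

    FacadeSketch(int maxConcurrent, int poolSize, long timeoutMs) {
        this.permits = new Semaphore(maxConcurrent);
        this.primaryPool = Executors.newFixedThreadPool(poolSize);
        this.secondaryPool = Executors.newFixedThreadPool(poolSize);
        this.timeoutMs = timeoutMs;
    }

    // The façade itself does no network I/O; it only picks a target and waits
    // on the thread-isolated call, so it needs no thread of its own.
    String call(boolean usePrimary, Callable<String> primary, Callable<String> secondary)
            throws Exception {
        if (!permits.tryAcquire()) {
            throw new RejectedExecutionException("facade limit reached");
        }
        try {
            ExecutorService pool = usePrimary ? primaryPool : secondaryPool;
            Future<String> future = pool.submit(usePrimary ? primary : secondary);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // the timeout is enforced by the wrapped call
                throw e;
            }
        } finally {
            permits.release();
        }
    }

    void shutdown() {
        primaryPool.shutdownNow();
        secondaryPool.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        FacadeSketch facade = new FacadeSketch(10, 2, 200);
        try {
            System.out.println(facade.call(true, () -> "from primary", () -> "from secondary"));
            System.out.println(facade.call(false, () -> "from primary", () -> "from secondary"));
        } finally {
            facade.shutdown();
        }
    }
}
```

The design only holds as long as the façade does nothing slow or error-prone itself; any such work would run on the caller's thread without timeout protection.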

Reference: https://github.com/Netflix/Hystrix/wiki/How-To-Use#primary--secondary-with-fallback

Subdue answered 18/1, 2019 at 10:24 Comment(0)
