We have a REST API. For each endpoint this API offers, we have defined an SLA based on internal testing. New Relic provides an option to define the Apdex T value on a per-application basis. Consider the following scenario:
- Endpoint A: SLA is 200ms
- Endpoint B: SLA is 800ms
Average SLA: 500ms
Case 1: Use the average SLA (500ms) as the Apdex T value. The problem with this approach is that even though endpoint A is expected to complete in 200ms, it wouldn't be flagged even if it took twice the time defined in its SLA, since 400ms is still below the average. The reverse holds for endpoint B: it would be flagged for any response slower than 500ms, even though that response is still within its 800ms SLA.
Case 2: Use the maximum SLA across all endpoints (800ms) as the Apdex T value. Again, the problem here is with endpoint A. A delay in its response wouldn't be flagged even if it took 4 times the expected time, since 800ms is still within the threshold.
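To make the two cases concrete, here is a rough sketch that scores some made-up response times. It assumes the standard Apdex definition (satisfied at or below T, tolerating between T and 4T, frustrated above 4T), which is my understanding of how New Relic computes the score. The `apdex` helper and the sample timings are hypothetical and only illustrate the problem; they are not taken from our real data.

```python
def apdex(samples_ms, t_ms):
    """Standard Apdex score: (satisfied + tolerating / 2) / total samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for s in samples_ms if s <= t_ms)
    tolerating = sum(1 for s in samples_ms if t_ms < s <= 4 * t_ms)
    # Frustrated samples (> 4T) contribute nothing to the numerator.
    return (satisfied + tolerating / 2) / len(samples_ms)


# Hypothetical response times in milliseconds (assumed, not measured).
endpoint_a_2x = [400, 400, 400, 400]  # twice endpoint A's 200ms SLA
endpoint_a_4x = [800, 800, 800, 800]  # four times endpoint A's 200ms SLA
endpoint_b_ok = [700, 700, 700, 700]  # within endpoint B's 800ms SLA

# Case 1: T = average SLA (500ms)
case1_a = apdex(endpoint_a_2x, 500)  # 1.0, the SLA breach on A is invisible
case1_b = apdex(endpoint_b_ok, 500)  # 0.5, B is penalised while within SLA

# Case 2: T = max SLA (800ms)
case2_a = apdex(endpoint_a_4x, 800)  # 1.0, even a 4x slowdown on A is invisible

# For comparison: each endpoint scored against its own SLA
own_a = apdex(endpoint_a_2x, 200)  # 0.5
own_b = apdex(endpoint_b_ok, 800)  # 1.0

print(case1_a, case1_b, case2_a, own_a, own_b)
```

With a single application-wide T, either endpoint A's breaches score a perfect 1.0 or endpoint B gets marked down while meeting its SLA. Only the per-endpoint thresholds reflect the SLAs as defined.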
So, how do we arrive at an Apdex T value in such scenarios? I went through the following article from New Relic: LINK. It makes sense when we look at the service as a whole, but not when we look at each endpoint individually.