HttpClient Polly WaitAndRetry policy
Asked Answered
S

2

5

Anyone have any idea on why the policy below stop retry after 3 times instead of 10 ?

IAsyncPolicy<HttpResponseMessage> httpWaitAndRetryPolicy =
     Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
          .OrHandle<Exception>(r => true)
          .WaitAndRetryAsync(10, retryAttempt => TimeSpan.FromSeconds(2));

I set the retry attempt to 10 and test the http post call with BadRequest failure. But it only retry 3 times and then stopped until timeout and threw exception

----> System.Threading.Tasks.TaskCanceledException : A task was canceled.
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at HttpRetry.Lab.Tests.ServiceTest.WhenPostWrongAlert_ThenRecoversProperly() in C:\ServiceTest.cs:line 56
--TaskCanceledException


15:57:03.6367  INFO HttpClientProvider - Configuring client xxxxxxxx:1234/api/" timeout=00:02:00
15:57:03.6636  INFO Service            - POST xxxx/xxxxxxx
15:57:04.2051  INFO HttpClientProvider - Retrying retryCount=1 sleepDuration=00:00:02 result=Polly.DelegateResult`1[System.Net.Http.HttpResponseMessage]
15:57:06.6880  INFO HttpClientProvider - Retrying retryCount=2 sleepDuration=00:00:02 result=Polly.DelegateResult`1[System.Net.Http.HttpResponseMessage]
15:59:03.6811  INFO HttpClientProvider - Retrying retryCount=3 sleepDuration=00:00:02 result=Polly.DelegateResult`1[System.Net.Http.HttpResponseMessage]
15:59:03.6811 ERROR ServiceTest - System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at HttpRetry.Lab.Service.<PostAsync>d__4.MoveNext() in C:\Service.cs:line 38
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at HttpRetry.Lab.Tests.ServiceTest.<PostAsync>d__4.MoveNext() in C:\ServiceTest.cs:line 27

var serviceProvider = serviceConnection.AddHttpClient(connection.Name, c =>
        {
            c.BaseAddress = new Uri(connection.BaseUrl);
            c.DefaultRequestHeaders.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Basic", Convert.ToBase64String(Encoding.ASCII.GetBytes($"{connection.UserName}:{connection.Password}")));
            c.Timeout = connection.Timeout; // Timeout is TimeSpan.FromSeconds(120)
        })
    .AddPolicyHandler(httpWaitAndRetryPolicy)
    .Services.BuildServiceProvider();

HttpClientFactories.Add(connection.Name, serviceProvider.GetService<IHttpClientFactory>());

Confirmed the root cause of the issue: I don't know what causes the symptom but looks like the request connection will not be released unless explicitly calling Dispose HttpResponseMessage OnRetry. The current solution is setup OnRetry in WaitAndRetryAsync and dispose the resposne. Everything's working fine without the needs of changing ServicePointManager.DefaultConnectionLimit

Stomato answered 9/7, 2020 at 1:58 Comment(5)
How did you set the timeout (via Timeout policy / via cancellationToken / via HttpClient's "global" timeout / other way)? What does it mean CancelRequest status code in this case? Can you please share with us the Execute code as well?Baseborn
I have updated more information here. Will try to capture the exception once the remote endpoint is back on live.Stomato
@PeterCsala I updated everything in my post. Please take a look. The issue is haunting me for quite awhile nowStomato
Your exception message is out-of-sync with the code. Exception states 2 mintutes global timeout, where config comments says 1 min. Exception states 2 seconds sleepDuration, whereas policy setup says 5 seconds.Baseborn
@PeterCsala thanks for pointing that out. Things are inconsistent after multiple try-an-error. Will update the question shortlyStomato
B
7

Your timeout

As I can see you have a global 1 minute timeout on HttpClient level. That will throw a TaskCanceledException even though you might expect TimeoutException.

If you wish to receive TimeoutException then you have to specify a request based timeout via the RequestTimeout property of HttpRequestMessage. For further details please check the following link.

Your retry

Your retry logic defines 3 (or 10) retries with 5 seconds penalty. 3 retries means 4 attempts, because there is an initial (0th) request, which is out-side of the retry scope. If that fails then the 1st retry will become the 2nd attempt.

So the flow will look like this:

  1. Initial request sent out << 1st attempt
  2. Initial request failed
  3. Retry logic is triggered
  4. 5 seconds penalty is exposed
  5. 1st retry logic fires << 2nd attempt
  6. 2nd attempt failed
  7. Retry logic is triggered
  8. 5 seconds penalty is exposed
  9. 2nd retry logic fires << 3rd attempt
  10. ...

If all of this can't finish in under a minute then HttpClient will throw a TaskCanceledExpcetion due to global timeout.

Polly's timeout policy

Polly supports local and global timeout policies as well. There is a single Timeout policy, which can be used in both ways.

It can act as a local timeout if your timeout policy is wrapped inside the retry: retryPolicy.WrapAsync(timeoutPolicy);

It can act as a global timeout if your retry policy is wrapped inside the timeout: timeoutPolicy.WrapAsync(retryPolicy);

Of course, you can have global and local timeouts as well at the same time: Policy.WrapAsync(globalTimeoutPolicy, retryPolicy, localTimeoutPolicy);

I would highly encourage you to consider to use Polly's Timeout instead of HttpClient's Timeout in order to have a single place where you define your resilient strategy.

Please bear in mind that Timeout policy will throw TimeoutRejectedException if the timeout elapsed without a response. Because your retry handles all sort of Exceptions (.OrHandle<Exception>()), that's why you don't need to modify your retry policy.

Polly's transient failure's error handling

There is a nuget package called Microsoft.Extensions.Http.Polly (1), which defines several useful utilities. One of them is the HttpPolicyExtensions.HandleTransientHttpError()

It catches HttpRequestException and checks whether the response's status code is either 5xx or 408 (RequestTimeout).

It might worth considering to use this as well.

Debugging policies

Each of different policies define callback to provide ability to get insight how they work. In case of retry it is called onRetry or onRetryAsync for sync or async retry respectively. By providing the following delegate inside your WaitAndRetryAsync you can get really useful information:

onRetryAsync: (exception, delay, times, context) => {
  //TODO: logging
}
Baseborn answered 10/7, 2020 at 10:28 Comment(8)
Look like my colleague finally get to the bottom of this by examine the code in Polly. The problem is on Microsoft default value on ServicePointManager.DefaultConnectionLimit = 2. For me, I can only see the 2nd retry. So initial call + 1st retry used up the connection limit and the 2nd retry will just being starve until timeout. I will post more update on this topic later on.Stomato
i'm wondering if this is because I set the HttpCompletionOption.ResponseHeadersRead at my HttpClient.Post. So the connection never get released as I never query the ResponseHeader.Stomato
@Stomato Sorry to say, but ServicePointManager.DefaultConnectionLimit has nothing to do with Polly. It controls the outgoing concurrent Http request count. It varies based on the Host machine type (the value is different for Workstation and different for Windows Server). BTW: Which version of .NET Core are you using? And on what machine are you tesing? In case of Linux there is another property which controls this.Baseborn
I'm using .net framework 4.8. Cannot migrate to .net core as some external dependencies don't have net core version yet. Thanks a lot. Looks like the timeoutPolicy did release the http connection for the subsequent retry call on limited concurrent connection. I need to do more testing in order to make sure all retry goes through to rest api but at least the basic logic looks a lot better nowStomato
@Stomato I'm glad to hear that. :DBaseborn
@Stomato Did you manage to test thoroughly your solution with timeoutPolicy?Baseborn
I don't know what causes the symptom but looks like the request connection will not be released unless explicitly calling Dispose HttpResponseMessage OnRetry. The problem is solved completely once I call HttpResponseMessage.Result.Dispose() on retry. Don't even need the timeout policy anymore.Stomato
@Stomato In this case could be please provide an answer for your own question and close this topic?Baseborn
S
4

As a result, our team found out that the Polly Retry policy will not release http request connection until the HttpResponseMessage has been used. To be exact, this has nothing to do with Polly Retry, it's just the connection will not be released until the original HttpClient.SendAsync getting return. And the retry policy kinda delay that happen because of WaitAndRetry. End up the Polly Retry policy (e.g. x times) could end up using x+1 concurrent http connection per BadRequest.

There's 2 ways to "use" the HttpResponseMessage. Either by explicitly call response.Result.Dispose or performing some sort of Read on the response Content. e.g. response.Result.ReadAsAsync. Well, one more way is to wait for the httpClient timeout but I believe that's not really what we want. Here's the code that get things working. The key is the HttpResponse.Dispose in OnRetry

    ServicePointManager.DefaultConnectionLimit = appConfiguration.HttpClients.ConnectionLimit;

    IAsyncPolicy<HttpResponseMessage> httpWaitAndRetryPolicy =
        Policy.Handle<HttpRequestException>()
              .Or<Exception>()
              .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
              .WaitAndRetryAsync(3, 
                                 retryAttempt => TimeSpan.FromSeconds(appConfiguration.HttpClients.RetryFactor * retryAttempt), 
                                 OnRetry);

    IAsyncPolicy<HttpResponseMessage> timeoutPolicy = Policy.TimeoutAsync<HttpResponseMessage>(appConfiguration.HttpClients.RequestTimeout);

    foreach (var endpoint in appConfiguration.HttpClients.HttpEndpoints)
    {
        var serviceProvider = serviceConnection.AddHttpClient(endpoint.Name,
                              c =>
                              {
                                  c.BaseAddress = new Uri(endpoint.BaseUrl);
                                  c.DefaultRequestHeaders.Authorization   = new System.Net.Http.Headers.AuthenticationHeaderValue(endpoint.AuthenticationScheme, Convert.ToBase64String(Encoding.ASCII.GetBytes($"{endpoint.UserName}:{endpoint.Password}")));
                                  c.DefaultRequestHeaders.ConnectionClose = false;
                                  c.Timeout     = endpoint.Timeout;
                              }).AddPolicyHandler(Policy.WrapAsync(httpWaitAndRetryPolicy, timeoutPolicy))
                              .Services.BuildServiceProvider();

        httpClientFactories.Add(endpoint.Name, serviceProvider.GetService<IHttpClientFactory>());
    }

    private Task OnRetry(DelegateResult<HttpResponseMessage> response, TimeSpan span, int retryCount, Context context)
    {
        if (response == null)
            return Task.CompletedTask;

        var responseResult = response.Result;
        logger.Info($"RetryCount={retryCount} IsSuccess={responseResult == null ? "" : responseResult.IsSuccessStatusCode} StatusCode={responseResult == null ? "" : responseResult.StatusCode} Exception={response.Exception?.Message}");

        response.Result?.Dispose();
        return Task.CompletedTask;
    }
Stomato answered 15/7, 2020 at 15:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.