Adding callback function on each retry attempt using requests/urllib3

I've implemented a retry mechanism for a requests session using urllib3.util.retry, as suggested both here and here.

Now I am trying to figure out the best way to add a callback function that will be called on every retry attempt.

To explain further: it would be great if either the Retry object or the requests get method had a way to add a callback function. Maybe something like:

import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

def retry_callback(url):
    print(url)

s = requests.Session()
retries = Retry(total=5, status_forcelist=[ 500, 502, 503, 504 ])
s.mount('http://', HTTPAdapter(max_retries=retries))

url = 'http://httpstat.us/500'
s.get(url, callback=retry_callback, callback_params=[url])

I know I could use logging to print the URL, but this is only a simplified example of a more complex use case.

Combs asked 5/7, 2018 at 10:11

You can subclass the Retry class to add that functionality.

This is the full interaction flow with the Retry instance for a given connection attempt:

  • Retry.increment() is called with the current method, url, response object (if there is one), and exception (if one was raised) whenever an exception is raised, or a 30x redirection response was returned, or the Retry.is_retry() method returns true.
    • .increment() will re-raise the error (if there was one) when the object was configured not to retry that specific class of errors.
    • .increment() calls Retry.new() to create an updated instance, with any relevant counters updated and the history attribute amended with a new RequestHistory() instance (a named tuple).
    • .increment() will raise a MaxRetryError exception if Retry.is_exhausted() called on the return value of Retry.new() is true. is_exhausted() returns true when any of the counters it tracks has dropped below 0 (counters set to None are ignored).
    • .increment() returns the new Retry instance.
  • The return value of Retry.increment() replaces the old Retry instance being tracked. If there was a redirect, Retry.sleep_for_retry() is called (sleeping only if there was a Retry-After header). Otherwise Retry.sleep() is called, which first calls self.sleep_for_retry() to honor a Retry-After header and, if there was none, sleeps according to the back-off policy (if any). Then a recursive connection call is made with the new Retry instance.
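
To make the flow above concrete, here is a minimal sketch that drives a plain Retry object by hand, with no network traffic. Calling increment() directly without a response or an error is purely for illustration; in real use urllib3's connection pool makes these calls for you.

```python
from urllib3.exceptions import MaxRetryError
from urllib3.util.retry import Retry

policy = Retry(total=2)

# Each increment() returns a *new* Retry instance built by Retry.new();
# the instance it was called on is left untouched.
first = policy.increment(method="GET", url="/500")
second = first.increment(method="GET", url="/500")

print(policy.total, first.total, second.total)  # 2 1 0
print(len(second.history))                      # 2

# A counter of 0 is not yet exhausted; the next increment() drops it
# below 0, so is_exhausted() is true and MaxRetryError is raised.
try:
    second.increment(method="GET", url="/500")
    exhausted = False
except MaxRetryError:
    exhausted = True
print(exhausted)  # True
```

Because every attempt gets a fresh instance, any extra state a subclass carries has to be copied across in new(), which is what the subclass further down does.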

This gives you three good callback points: at the start of .increment(), when creating the new Retry instance in .new(), and in a context manager around super().increment(), which lets a callback veto an exception or update the returned retry policy on exit.

This is what putting a hook on the start of .increment() would look like:

import logging

from urllib3.util.retry import Retry

logger = logging.getLogger(__name__)

class CallbackRetry(Retry):
    def __init__(self, *args, **kwargs):
        self._callback = kwargs.pop('callback', None)
        super(CallbackRetry, self).__init__(*args, **kwargs)
    def new(self, **kw):
        # pass along the subclass additional information when creating
        # a new instance.
        kw['callback'] = self._callback
        return super(CallbackRetry, self).new(**kw)
    def increment(self, method, url, *args, **kwargs):
        if self._callback:
            try:
                self._callback(url)
            except Exception:
                logger.exception('Callback raised an exception, ignoring')
        return super(CallbackRetry, self).increment(method, url, *args, **kwargs)

Note that the url argument is really only the URL path; the net location portion of the request is omitted. You'd have to extract that from the _pool argument, which has .scheme, .host and .port attributes.
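
If the callback needs the full URL, one option is to rebuild it from _pool as the note above describes. The following is only a sketch: FullUrlRetry and full_url are hypothetical names, it assumes _pool arrives as a keyword argument, and a SimpleNamespace stands in for the real connection pool so the example runs offline.

```python
from types import SimpleNamespace

from urllib3.util.retry import Retry


def full_url(pool, path):
    """Hypothetical helper: rebuild scheme://host[:port]/path from a pool."""
    if pool is None:
        return path
    netloc = pool.host if pool.port is None else "{}:{}".format(pool.host, pool.port)
    return "{}://{}{}".format(pool.scheme, netloc, path)


class FullUrlRetry(Retry):
    """Hypothetical variant of CallbackRetry that reports the full URL."""

    def __init__(self, *args, **kwargs):
        self._callback = kwargs.pop("callback", None)
        super().__init__(*args, **kwargs)

    def new(self, **kw):
        # Copy the callback across to the instance used for the next attempt.
        kw["callback"] = self._callback
        return super().new(**kw)

    def increment(self, method=None, url=None, *args, **kwargs):
        if self._callback is not None:
            # Assumption: the pool is passed by keyword as _pool.
            self._callback(full_url(kwargs.get("_pool"), url))
        return super().increment(method, url, *args, **kwargs)


# Stand-in for the connection pool; only the three attributes are needed.
fake_pool = SimpleNamespace(scheme="http", host="httpstat.us", port=80)
seen = []
retries = FullUrlRetry(total=3, callback=seen.append)
retries = retries.increment("GET", "/500", _pool=fake_pool)
retries = retries.increment("GET", "/500", _pool=fake_pool)
print(seen)
```

Unlike the class above, this sketch does not guard the callback with try/except, so an exception raised inside the callback would propagate.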

Demo:

>>> def retry_callback(url):
...     print('Callback invoked with', url)
...
>>> s = requests.Session()
>>> retries = CallbackRetry(total=5, status_forcelist=[500, 502, 503, 504], callback=retry_callback)
>>> s.mount('http://', HTTPAdapter(max_retries=retries))
>>> s.get('http://httpstat.us/500')
Callback invoked with /500
Callback invoked with /500
Callback invoked with /500
Callback invoked with /500
Callback invoked with /500
Callback invoked with /500
Traceback (most recent call last):
  File "/.../lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 732, in urlopen
    body_pos=body_pos, **response_kw)
  File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 732, in urlopen
    body_pos=body_pos, **response_kw)
  File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 732, in urlopen
    body_pos=body_pos, **response_kw)
  [Previous line repeated 1 more times]
  File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 712, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
  File "<stdin>", line 8, in increment
  File "/.../lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='httpstat.us', port=80): Max retries exceeded with url: /500 (Caused by ResponseError('too many 500 error responses',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../lib/python3.6/site-packages/requests/sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "/.../lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/.../lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/.../lib/python3.6/site-packages/requests/adapters.py", line 499, in send
    raise RetryError(e, request=request)
requests.exceptions.RetryError: HTTPConnectionPool(host='httpstat.us', port=80): Max retries exceeded with url: /500 (Caused by ResponseError('too many 500 error responses',))

Putting a hook in the .new() method would let you adjust the policy for a next attempt, as well as let you introspect the .history attribute, but would not let you avoid the exception re-raising.
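
A hook in .new() could look roughly like the sketch below. HistoryRetry and its on_new argument are hypothetical names, and the increment() calls are made by hand so the example runs without a network connection.

```python
from urllib3.util.retry import Retry


class HistoryRetry(Retry):
    """Hypothetical subclass that reports the retry history from new()."""

    def __init__(self, *args, **kwargs):
        self._on_new = kwargs.pop("on_new", None)
        super().__init__(*args, **kwargs)

    def new(self, **kw):
        # Copy the hook across to the instance used for the next attempt.
        kw["on_new"] = self._on_new
        if self._on_new is not None:
            # increment() passes the updated history in kw; fall back to
            # the current history if new() is called some other way.
            self._on_new(kw.get("history", self.history))
        return super().new(**kw)


lengths = []
retries = HistoryRetry(total=3, on_new=lambda history: lengths.append(len(history)))
retries = retries.increment("GET", "/500")
retries = retries.increment("GET", "/500")
print(lengths)  # [1, 2]
```

The same method could also edit entries in kw before calling super().new() to change the policy for the next attempt. Note that new() runs before the exhaustion check, so the hook also fires for the final attempt that ends in MaxRetryError.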

Yearning answered 9/7, 2018 at 21:51 Comment(5)
Wow. Thank you for this detailed answer. So if I want to add also a callback_params argument, I can do it the same way you did with callback and just pass them when I call the callback function itself, right? – Combs
@A.Sarid: yes, you can add any number of additional attributes to your subclass, and use those as you see fit. Do update the new() method to copy across any such attributes to the kw dictionary before calling the super().new() method to create a copy. – Yearning
@A.Sarid: Personally, I'd not add such a params feature. It complicates your callback handling in the class, and is not needed. Instead, pass in a callback function that can handle a default set of arguments. – Yearning
Thanks! Yea default set of arguments is fine, but if I want to give those arguments when creating my CallbackRetry class, what will be my best option? – Combs
@A.Sarid: you'd have to build an argument list and pass it to the callback with *args: args = [getattr(self, argname) for argname in self.callback_params] and self.callback(*args). I strongly suggest you don't do this. Use a wrapper callback instead, the wrapper accepts all arguments, then calls your actual callback with just url or similar: callback = lambda method, url, *args, **kwargs: real_callback(url) (in the scenario where url is the second argument that the Retry subclass will pass in). – Yearning
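
The wrapper approach from the last comment can be sketched as follows. The function names and the label parameter are made up for illustration. functools.partial is shown as one possible way to fix extra arguments at the time the retry object is created; the comment itself suggests only the wrapper.

```python
from functools import partial


def real_callback(url, label="retry"):
    """Hypothetical callback that only cares about the URL."""
    return "{}: {}".format(label, url)


def wrapper(method, url, *args, **kwargs):
    # Accept everything a Retry subclass might pass in, forward only the URL.
    return real_callback(url)


# Bind extra arguments up front instead of adding a callback_params feature.
tagged = partial(real_callback, label="billing-api")

print(wrapper("GET", "/500", error=None))  # retry: /500
print(tagged("/500"))                      # billing-api: /500
```

Either wrapper or tagged could then be passed as the callback argument, depending on which arguments the subclass hands to the callback.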
