Cannot read urllib error message once it is read()

Asked 11/11, 2015 at 21:27 Answered 29/1, 2017 at 15:58

My problem is with error handling of the Python urllib error object. I cannot read the error message and still keep it intact in the error object so that it can be consumed later.

response = urllib.request.urlopen(request) # request that will raise an error
response.read()
response.read() # is empty now
# Also tried seek(0), that does not work either.

This is how I intend to use it, but when the exception bubbles up, the second .read() returns nothing.

try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as err:
    self.log.exception(err.read())
    raise err

I tried making a deepcopy of the err object,

import copy
try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as err:
    err_obj_copy = copy.deepcopy(err)
    self.log.exception(
        "Method:{}\n"
        "URL:{}\n"
        "Data:{}\n"
        "Details:{}\n"
        "Headers:{}".format(method, url, data, err_obj_copy.read(), headers))
    raise err

but copy is unable to make a deepcopy and throws an error - TypeError: __init__() missing 5 required positional arguments: 'url', 'code', 'msg', 'hdrs', and 'fp'.

How do I read the error message, while still keeping it intact in the object?

I do know how to do it using requests, but I am stuck with legacy code and need to make it work with urllib.

Janettejaneva answered 11/11, 2015 at 21:27 Comment(0)

This is what I did. Worked for me.

When you read the error body for the first time, save it to a variable, like this: msg = resp.read().decode('utf8'). You can then create a new HTTPError instance whose body is rebuilt from that message, and raise it in place of the original.

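import io
from urllib.error import HTTPError

try:
    response = urllib.request.urlopen(request)
except HTTPError as resp:  # urlopen raises on HTTP errors; the exception is also the response
    msg = resp.read().decode('utf8')  # the body can only be read once, so keep it
    self.log.exception(msg)
    raise HTTPError(resp.url, resp.code, resp.reason, resp.headers, io.BytesIO(bytes(msg, 'utf8')))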

Pigeonhearted answered 29/1, 2017 at 15:58 Comment(2)

You should save the result of resp.read() so that you pass in the raw bytes back to HTTPError instead of re-encoding the text. See @jf's answer above. – Neace 29/1, 2017 at 17:21

Thank you @reubano. It sure is better that way. I don't understand why, at first, when I tried to pass in the raw bytes, the variable msg would remain as an empty bytestring object. I must have been doing something wrong. I think that's why I decoded the bytestring. – Pigeonhearted 29/1, 2017 at 18:40
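
For reference, the raw-bytes approach suggested in the comment above could be sketched roughly as follows. The helper name rebuffer_http_error, the example.invalid URL and the hand-built HTTPError are assumptions made so that the snippet runs without a network; none of them come from the original posts.

```python
import io
import urllib.error


def rebuffer_http_error(err):
    """Read the body of `err` once and return (body, new_error).

    `new_error` is a fresh HTTPError whose body is served from memory,
    so an upstream handler can still call .read() on it.
    (Hypothetical helper, not part of urllib.)
    """
    body = err.read()  # raw bytes; the original stream is drained after this
    new_err = urllib.error.HTTPError(
        err.url, err.code, err.reason, err.headers, io.BytesIO(body)
    )
    return body, new_err


# Offline demonstration: build an HTTPError by hand instead of contacting a server.
original = urllib.error.HTTPError(
    "http://example.invalid/api",
    500,
    "Internal Server Error",
    {},
    io.BytesIO(b'{"detail": "boom"}'),
)

logged_body, replacement = rebuffer_http_error(original)  # what the logger would see
upstream_body = replacement.read()                        # what an upstream handler would see
```

Because the bytes are never decoded and re-encoded, this keeps working even if the error body is not valid UTF-8.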

The error object may read from the network. A network stream is not seekable -- in the general case you can't go back and read it again.

You could replace err with a new HTTPError instance that reads from a buffer (like io.BytesIO()) instead of the network, e.g. (not tested):

content = err.read()
self.log.exception(content)
raise HTTPError(err.url, err.code, err.reason, err.headers, io.BytesIO(content))

Though I'm not sure that you should -- handle the error in a single place instead, e.g. re-raise a more application-specific exception or leave the logging to an upstream handler.
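
A minimal sketch of the "application-specific exception" alternative, assuming the body is read exactly once and then carried on the new exception as plain bytes. ApiRequestError, fetch, the opener parameter and failing_opener are all hypothetical names introduced for illustration; the stand-in opener exists only so the example runs offline.

```python
import io
import urllib.error
import urllib.request


class ApiRequestError(Exception):
    """Hypothetical application-level error that carries the response body."""

    def __init__(self, url, status, body):
        super().__init__(f"HTTP {status} from {url}")
        self.url = url
        self.status = status
        self.body = body  # bytes, already read, safe to inspect repeatedly


def fetch(request, opener=urllib.request.urlopen):
    """Return the response body, translating HTTPError into ApiRequestError."""
    try:
        with opener(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # Read the body once here; callers use exc.body instead of exc.read().
        raise ApiRequestError(err.url, err.code, err.read()) from err


# Offline demonstration with a stand-in opener that always fails.
def failing_opener(request):
    raise urllib.error.HTTPError(
        request, 404, "Not Found", {}, io.BytesIO(b"no such page")
    )


try:
    fetch("http://example.invalid/missing", opener=failing_opener)
except ApiRequestError as exc:
    caught = exc  # keep a reference for inspection after the handler
```

With this shape, logging can happen in whichever handler finally deals with ApiRequestError, and the original HTTPError remains reachable through the exception's __cause__.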

Sarsen answered 11/11, 2015 at 23:30 Comment(0)
