Python requests module logs encoding-detection messages

I'm using Python with requests==2.18.4.

While crawling some data with requests, I used the logging module for debugging.

I want the log to look something like this:

[DEBUG] 2018-01-25 03:15:36,940 http://localhost:8888 "GET /aaa" 200 2290
[DEBUG] 2018-01-25 03:15:36,940 http://localhost:8888 "GET /aaa" 200 2290
[DEBUG] 2018-01-25 03:15:36,940 http://localhost:8888 "GET /aaa" 200 2290

But I get this:

[DEBUG] 2018-01-25 03:15:36,940 http://localhost:8888 "GET /aaa" 200 2290
[DEBUG] 2018-01-25 03:15:36,974 EUC-JP Japanese prober hit error at byte 1765
[DEBUG] 2018-01-25 03:15:36,990 EUC-KR Korean prober hit error at byte 1765
[DEBUG] 2018-01-25 03:15:36,994 CP949 Korean prober hit error at byte 1765
[DEBUG] 2018-01-25 03:15:37,009 EUC-TW Taiwan prober hit error at byte 1765
[DEBUG] 2018-01-25 03:15:37,036 utf-8 not active
[DEBUG] 2018-01-25 03:15:37,036 SHIFT_JIS Japanese confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,036 EUC-JP not active
[DEBUG] 2018-01-25 03:15:37,036 GB2312 Chinese confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,036 EUC-KR not active
[DEBUG] 2018-01-25 03:15:37,036 CP949 not active
[DEBUG] 2018-01-25 03:15:37,036 Big5 Chinese confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,036 EUC-TW not active
[DEBUG] 2018-01-25 03:15:37,036 windows-1251 Russian confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,038 KOI8-R Russian confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,038 ISO-8859-5 Russian confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,038 MacCyrillic Russian confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,038 IBM866 Russian confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,038 IBM855 Russian confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,038 ISO-8859-7 Greek confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,038 windows-1253 Greek confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,038 ISO-8859-5 Bulgairan confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,038 windows-1251 Bulgarian confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,038 TIS-620 Thai confidence = 0.01
[DEBUG] 2018-01-25 03:15:37,038 ISO-8859-9 Turkish confidence = 0.47949350706
[DEBUG] 2018-01-25 03:15:37,038 windows-1255 Hebrew confidence = 0.0
[DEBUG] 2018-01-25 03:15:37,038 windows-1255 Hebrew confidence = 0.0
[DEBUG] 2018-01-25 03:15:37,038 windows-1255 Hebrew confidence = 0.0
...

I don't want these encoding-detection messages in my logs. How can I remove them?

Grainy answered 24/1, 2018 at 18:26 Comment(0)

I had the same issue and found that the extra logs come from the chardet.charsetprober module.

To suppress them, put this after your imports:

logging.getLogger('chardet.charsetprober').setLevel(logging.INFO)

This suppresses DEBUG-level messages from the chardet.charsetprober module, so only the desired log messages remain.
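A minimal sketch of the effect, using only the standard library. The two log calls are stand-ins I made up to simulate what chardet and the HTTP layer would emit (the `urllib3.connectionpool` logger name and both messages are assumptions for illustration), and the output is captured in memory so it can be inspected:

```python
import io
import logging

# Capture log output in memory so the effect is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("[%(levelname)s] %(name)s %(message)s"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.DEBUG)  # the application as a whole still logs at DEBUG

# Raise the threshold for the noisy logger only.
logging.getLogger("chardet.charsetprober").setLevel(logging.INFO)

# Stand-ins for real log calls (hypothetical messages, for illustration only).
logging.getLogger("chardet.charsetprober").debug("utf-8 not active")
logging.getLogger("urllib3.connectionpool").debug('"GET /aaa" 200 2290')

output = stream.getvalue()
print(output)
```

Only the second message should reach the handler, because the per-logger level is checked before the record is passed up to the root logger's handlers.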

Hope it helps!

Hollow answered 7/2, 2018 at 6:7 Comment(0)

Try setting the logging level for the chardet.charsetprober module to something higher than DEBUG (e.g. INFO).

logger = logging.getLogger('chardet.charsetprober')
logger.setLevel(logging.INFO)

Rent answered 2/2, 2018 at 11:4 Comment(0)

Decoding the raw bytes explicitly with response.content.decode('ISO-8859-1') worked for me with mixed-charset content.
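A small sketch of why this decode never fails, and what it can cost. The `raw` bytes below are a hypothetical response body, not taken from the question. ISO-8859-1 assigns a character to every one of the 256 byte values, so decoding cannot raise, but bytes that were really UTF-8 will come out garbled:

```python
# Hypothetical response body: UTF-8 encoded text (an assumption for illustration).
raw = "café".encode("utf-8")

# ISO-8859-1 maps every byte value to a code point, so this never raises,
# but multi-byte UTF-8 sequences are split into separate characters.
as_latin1 = raw.decode("ISO-8859-1")

# Decoding with the encoding the data was actually written in recovers the text.
as_utf8 = raw.decode("utf-8")

# Every possible byte survives a decode/encode round trip through ISO-8859-1.
all_bytes = bytes(range(256))
roundtrip = all_bytes.decode("ISO-8859-1").encode("ISO-8859-1")

print(repr(as_latin1), repr(as_utf8), roundtrip == all_bytes)
```

So this approach sidesteps the guessing step, but it is only correct when the content really is single-byte Latin-1 or when the non-ASCII characters do not matter.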

Preciosa answered 24/1, 2018 at 18:26 Comment(0)

I assume this issue is related to r.text (the text attribute of the returned response). Since requests does not know the encoding, it has to guess, which produces the long list of log lines. To avoid this, you can either set the logging level higher (such as INFO) or specify the encoding (r.encoding = 'utf-8', or whichever applies) before accessing r.text.

Telic answered 19/6, 2018 at 1:18 Comment(0)

I'm not sure I understand your question properly. Why can't you route these messages to a different logging level?

Reinforce answered 2/2, 2018 at 11:7 Comment(1)

These logs are emitted automatically by the requests module... I can't separate them myself. – Grainy 5/2, 2018 at 6:22

As mentioned in other answers, when you access the r.text attribute, the requests library tries to guess the encoding of the text.

In some cases, you can use the r.content attribute (the raw response bytes) instead of r.text to avoid this guessing process.

Irena answered 16/12, 2020 at 8:21 Comment(0)
