How to remove the byte order mark in Python

This question is related to a recent change to the Stack Overflow API that I reported here. In that question, I received a response that seems like it'd work, but in practice I'm unable to make it work.

This is my code:

import requests
import json
url="https://api.stackexchange.com/2.2/sites/?filter=%21%2AL1%2AAY-85YllAr2%29&pagesize=1&page=1"
response = requests.get(url)
response.text

This outputs:

u'\ufeff{"items":[{"site_state":"normal","api_site_parameter":"stackoverflow","name":"Stack Overflow"}],"has_more":true,"quota_max":300,"quota_remaining":294}'

Because of the leading u'\ufeff, calling response.json() raises ValueError: No JSON object could be decoded.

The suggestion I received was to use decode('utf-8-sig'). However, I can't seem to get this to work either:

Try 1:

response.text.decode('utf-8-sig')
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

Try 2:

json.loads(response.text).decode('utf-8-sig')
ValueError: No JSON object could be decoded

What is the appropriate way to remove the leading u'\ufeff?

Bumbling answered 3/7, 2014 at 13:3

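response.text is a Unicode object, i.e. it has already been decoded, so you can't decode it again. (In Python 2, calling decode() on a Unicode object first encodes it implicitly with the ASCII codec, which is where the UnicodeEncodeError in your first try comes from.)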

What you need to do is tell the response object which encoding it should use:

response = requests.get(url)
response.encoding = "utf-8-sig"
response.text
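The effect of that setting can be seen without making a request. Below is a minimal sketch, assuming the body arrives as UTF-8 bytes that start with a BOM; the `raw` value is a made-up stand-in for `response.content`, not real API output:

```python
import json

# Hypothetical stand-in for response.content: UTF-8 BOM followed by JSON.
raw = b'\xef\xbb\xbf{"items": [], "has_more": true}'

# Plain UTF-8 decoding keeps the BOM as U+FEFF at the start of the text.
with_bom = raw.decode("utf-8")

# The utf-8-sig codec drops a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")

# With the BOM gone, the JSON parser accepts the text.
data = json.loads(without_bom)
```

Setting `response.encoding = "utf-8-sig"` should make `response.text` behave like `without_bom` here.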

See the docs for more background info.
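If you only have text that has already been decoded, another option is to drop the leading U+FEFF by hand. This is a rough sketch rather than anything requests provides; `strip_bom` is a hypothetical helper and the sample string is invented:

```python
import json

BOM = u"\ufeff"

def strip_bom(text):
    """Return text without a single leading U+FEFF, if present (hypothetical helper)."""
    return text[len(BOM):] if text.startswith(BOM) else text

# Invented sample resembling the already-decoded response.text from the question.
text = u'\ufeff{"quota_max": 300}'
data = json.loads(strip_bom(text))
```

Text without a BOM passes through unchanged, so the helper is safe to apply unconditionally.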

Decor answered 3/7, 2014 at 13:17 Comment(0)
