Use "byte-like object" from urlopen.read with JSON? [duplicate]
J

5

27

Just trying to test out very simple Python JSON commands, but I'm having some trouble.

urlopen('http://www.similarsitesearch.com/api/similar/ebay.com').read()

should output

'{"num":20,"status":"ok","r0":"http:\\/\\/www.propertyroom.com\\/","r1":"http:\\/\\/www.ubid.com\\/","r2":"http:\\/\\/www.bidcactus.com\\/","r3":"http:\\/\\/www.etsy.com\\/","r4":"http:\\/\\/us.ebid.net\\/","r5":"http:\\/\\/www.bidrivals.com\\/","r6":"http:\\/\\/www.ioffer.com\\/","r7":"http:\\/\\/www.shopgoodwill.com\\/","r8":"http:\\/\\/www.beezid.com\\/","r9":"http:\\/\\/www.webidz.com\\/","r10":"http:\\/\\/www.auctionzip.com\\/","r11":"http:\\/\\/www.overstock.com\\/","r12":"http:\\/\\/www.bidspotter.com\\/","r13":"http:\\/\\/www.paypal.com\\/","r14":"http:\\/\\/www.ha.com\\/","r15":"http:\\/\\/www.onlineauction.com\\/","r16":"http:\\/\\/bidz.com\\/","r17":"http:\\/\\/www.epier.com\\/","r18":"http:\\/\\/www.sell.com\\/","r19":"http:\\/\\/www.rasmus.com\\/"}'

but I get that same string, with a b in front:

b'{"num":20,"status":"ok","r0":"http:\\/\\/www.propertyroom.com\\/","r1":"http:\\/\\/www.ubid.com\\/","r2":"http:\\/\\/www.bidcactus.com\\/","r3":"http:\\/\\/www.etsy.com\\/","r4":"http:\\/\\/us.ebid.net\\/","r5":"http:\\/\\/www.bidrivals.com\\/","r6":"http:\\/\\/www.ioffer.com\\/","r7":"http:\\/\\/www.shopgoodwill.com\\/","r8":"http:\\/\\/www.beezid.com\\/","r9":"http:\\/\\/www.webidz.com\\/","r10":"http:\\/\\/www.auctionzip.com\\/","r11":"http:\\/\\/www.overstock.com\\/","r12":"http:\\/\\/www.bidspotter.com\\/","r13":"http:\\/\\/www.paypal.com\\/","r14":"http:\\/\\/www.ha.com\\/","r15":"http:\\/\\/www.onlineauction.com\\/","r16":"http:\\/\\/bidz.com\\/","r17":"http:\\/\\/www.epier.com\\/","r18":"http:\\/\\/www.sell.com\\/","r19":"http:\\/\\/www.rasmus.com\\/"}'

Subsequently, when I try to run

json.loads(urlopen('http://similarsitesearch.com/api/similar/ebay.com').read())

it gives me the error message:

TypeError: can't use a string pattern on a bytes-like object

which I'm assuming has something to do with the b?

I imported urlopen from urllib.request, and I am running Python 3.

Any ideas?

Jessen answered 1/6, 2012 at 7:24 Comment(0)
M
33

read() returns a bytes object, so you need to decode it to a string before parsing it as JSON.

To convert bytes to a string, change your code to: urlopen('http://similarsitesearch.com/api/similar/ebay.com').read().decode("utf-8")
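As a minimal sketch of that decode-then-parse step, the snippet below uses a hard-coded bytes value in place of the network call, so it runs offline. The shortened payload and the variable names are assumptions for illustration; the real response from the API in the question may differ.

```python
import json

# Stand-in for urlopen(...).read(); assumed to be UTF-8 encoded JSON,
# shortened from the response shown in the question.
raw = b'{"num":2,"status":"ok","r0":"http:\\/\\/www.propertyroom.com\\/","r1":"http:\\/\\/www.ubid.com\\/"}'

text = raw.decode("utf-8")  # bytes -> str
data = json.loads(text)     # str -> dict

print(type(raw).__name__, "->", type(text).__name__)
print(data["status"])
print(data["r0"])
```

Note that from Python 3.6 onward json.loads() also accepts bytes directly (UTF-8, UTF-16 or UTF-32), so the explicit decode mainly matters on older 3.x versions or when the payload uses another encoding.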

Marchesa answered 12/12, 2012 at 17:29 Comment(0)
P
6

You need to examine the charset specified in the Content-Type header and decode by that before passing it to json.load*().
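One way to do that, sketched under some assumptions: the object returned by urlopen() exposes its headers as an email.message.Message subclass, whose get_content_charset() method reads the charset parameter from Content-Type. The helper name decode_json_body and the UTF-8 fallback are hypothetical choices for this example, and a plain Message object stands in for response.headers so the code runs without a network call.

```python
import json
from email.message import Message


def decode_json_body(raw, headers, default="utf-8"):
    """Decode raw bytes with the charset named in Content-Type, then parse JSON.

    `default` is an assumed fallback for responses that declare no charset.
    """
    charset = headers.get_content_charset() or default
    return json.loads(raw.decode(charset))


# Stand-in for response.headers from urlopen(); hypothetical header value.
headers = Message()
headers["Content-Type"] = "application/json; charset=iso-8859-1"

raw = '{"name": "café"}'.encode("iso-8859-1")
result = decode_json_body(raw, headers)
print(result["name"])
```

With a real response this would look roughly like decode_json_body(response.read(), response.headers).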

Person answered 1/6, 2012 at 7:30 Comment(5)
It appears to be UTF-8, there isn't any automatic decoding occurring? (Was there historically?) - Anorthite
There never was; urllib.urlopen().read() returned a bytestring in 2.x as well. It just so happened that json was okay with that. - Person
Sorry, I'm not quite understanding. Further clarification? :) - Jessen
@IgnacioVazquez-Abrams: Pity that there are factual errors in that presentation. It claims Python doesn't support UTF-32, for example. - Clerical
@MartijnPieters: It didn't, back when it was written. - Person
M
6

This worked well for me:

import json

def myView(request):
    # request.read() returns bytes; decode to str before parsing
    text = request.read().decode("utf-8")
    dic = json.loads(text)
    print(dic)
Material answered 5/3, 2014 at 19:29 Comment(0)
U
0

urllib is returning a byte array, which I assume is the default in py3, and json is expecting a string. Try wrapping the return value in a str() call before invoking the json call

j = str(urlopen('http://similarsitesearch.com/api/similar/ebay.com').read())
json.loads(j)
Uncommon answered 1/6, 2012 at 7:29 Comment(2)
Hmmmm, now it's telling me that "No JSON object could be decoded." - Jessen
That's because str() doesn't convert a bytes to a str in 3.x. - Person
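To illustrate why the str() call fails, here is a small sketch with a made-up payload. Calling str() on a bytes object without an encoding returns its printable representation, including the b'...' wrapper, which is not valid JSON. Passing an encoding as the second argument decodes the bytes instead.

```python
import json

raw = b'{"status": "ok"}'  # hypothetical stand-in for urlopen(...).read()

# Without an encoding, str() gives the repr of the bytes object.
wrapped = str(raw)
print(wrapped)

try:
    json.loads(wrapped)
    parsed_wrapped = True
except ValueError:
    # json.loads rejects the leading b' (JSONDecodeError is a ValueError)
    parsed_wrapped = False

# With an encoding, str() decodes; equivalent to raw.decode("utf-8").
decoded = str(raw, "utf-8")
data = json.loads(decoded)
print(data)
```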
A
0

Looks like a byte literal. Investigate how you get the data with http, or how the API returns the data in the headers.

Alan answered 1/6, 2012 at 7:31 Comment(0)
