How to fix a Unicode issue when using a web service with Python Suds

I am trying to work with the HORRIBLE web services at Commission Junction (CJ). I can get the client to connect and receive information from CJ, but their database seems to include a bunch of bad characters that cause a UnicodeDecodeError.

Right now I am doing:

from suds.client import Client

# CJ link-search WSDL; the 'XXX' values are placeholders for your own credentials
wsdlLink = 'https://link-search.api.cj.com/wsdl/version2/linkSearchServiceV2.wsdl'
client = Client(wsdlLink)
result = client.service.searchLinks(developerKey='XXX', websiteId='XXX', promotionType='coupon')

This works fine until I hit a record that has something like 'CorpNet® 10% Off Any Service'. Then the ® causes it to break and I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 758: ordinal not in range(128)

Is there a way to encode the ® on my end so that it does not break when SUDS reads in the result?

UPDATE: To clarify, the ® is coming from the CJ database and is in their response. So somehow I need to decode the non-ASCII characters BEFORE suds deals with the response. I am not sure how (or if) this is done in suds.

Soninlaw asked 16/1, 2011 at 2:57
Make sure that you don't mix str and unicode objects; e.g., u'a' + '®' will cause the error. Decode input to Unicode as early as possible. (Jarboe)
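Here is a minimal Python 3 sketch of the failure described in the comment. In Python 3, `str + bytes` raises TypeError instead of decoding silently, so the implicit ASCII decode that Python 2 performed for `u'a' + '®'` is written out by hand. It assumes the byte string came from a UTF-8 source; both function names are hypothetical and exist only for illustration.

```python
# Assumption: the "®" arrived as UTF-8 bytes (0xC2 0xAE), like a Python 2 str
# read from a UTF-8 file or HTTP body.
registered_bytes = "\u00ae".encode("utf-8")


def py2_implicit_concat(text: str, data: bytes) -> str:
    """Hypothetical helper: mimic Python 2, which decoded the str with ASCII."""
    return text + data.decode("ascii")


def decode_early_concat(text: str, data: bytes) -> str:
    """Hypothetical helper: decode explicitly with the right codec first."""
    return text + data.decode("utf-8")


try:
    py2_implicit_concat("a", registered_bytes)
except UnicodeDecodeError as exc:
    # 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
    print(exc)

print(decode_early_concat("a", registered_bytes))  # a®
```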

Implicit UnicodeDecodeErrors are what you get when you try to add str and unicode objects. Python then tries to decode the str into unicode, but it uses the ASCII encoding. If your str contains anything that is not ASCII, you get this error.

Your solution is to decode it manually, like so:

thestring = thestring.decode('utf8')  # Python 2: UTF-8 encoded str -> unicode object

Try, as much as possible, to decode any string that may contain non-ASCII characters as soon as you receive it from whatever module hands it to you, in this case suds.

Then, if suds can't handle Unicode (which may be the case), make sure you encode it back just before handing the text back to suds (or any other library that breaks if you give it unicode).

That should solve things nicely. It may be a big change, as you need to move all your internal processing from str to unicode, but it's worth it. :)
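The "decode early, work in Unicode, encode late" advice can be sketched in Python 3 roughly as follows. This is an illustration under assumptions, not code from the answer: the input is assumed to be UTF-8, the output encoding is assumed to be whatever the consuming library expects (UTF-8 here), and `read_boundary`, `process`, and `write_boundary` are hypothetical names.

```python
def read_boundary(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode as soon as bytes enter your program."""
    return raw.decode(encoding)


def process(text: str) -> str:
    """All internal work happens on text (unicode, in Python 2 terms)."""
    return text.upper()


def write_boundary(text: str, encoding: str = "utf-8") -> bytes:
    """Encode only when handing data to a consumer that wants bytes."""
    return text.encode(encoding)


incoming = "CorpNet\u00ae 10% Off Any Service".encode("utf-8")
outgoing = write_boundary(process(read_boundary(incoming)))
print(outgoing.decode("utf-8"))  # CORPNET® 10% OFF ANY SERVICE
```

The point of the design is that only the two boundary functions ever know about encodings; everything in between handles a single type, so the str/unicode mixing that triggers the implicit ASCII decode cannot happen.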

Terrie answered 16/1, 2011 at 8:31
Lennart, the issue is that the non-ASCII characters are actually in the CJ database and in the response they send, so I'm not sure how I can decode their response before suds tries to parse it and throws the error. I need some way to send the request, decode the response, and then parse the response, but I do not see a way to do this in suds. (Soninlaw)
@chris: So the error happens in suds, even before your code ever handles the data? In that case it's a bug, either in suds or in the server. Perhaps the server sends data encoded in UTF-8 when it claims it's something else? (Terrie)
Lennart: correct. I am pretty sure it is happening at the server (which I cannot control). Commission Junction does not seem to support its web services, and I was hoping there was some way to correct the data before it gets fed back into suds. I was thinking it was a long shot, but thought I might be missing something. (Soninlaw)
@chris: Come to think of it, since it uses the ASCII decoder when it fails, I think it's more likely to be a suds bug. You'll have to check on a suds mailing list. (Terrie)
@Lennart: you are 100% correct. Last night I started digging around in suds and was able to patch it so that everything works now. Thanks very much for your help. (Soninlaw)
@chris: Please report the bug and submit your patch at fedorahosted.org/suds (Jarboe)
@chris, would you please tell me what patch you used to solve this problem? I'm having a very similar issue with Unicode in #15339641. Thanks in advance! (Autism)
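The asker's actual patch was never posted, so the following is not it. It is a stdlib-only sketch of the "correct the data before it gets fed back into suds" idea from the comments: decode the raw XML reply as UTF-8 (an assumption about what CJ sends), then re-encode it as pure ASCII with every non-ASCII character turned into a numeric character reference, so ® becomes `&#174;`. Any code that later decodes the bytes with the ascii codec no longer fails, and an XML parser still yields the original characters. Suds 0.4 introduced a plugin mechanism (`suds.plugin.MessagePlugin`, whose `received()` hook sees the raw reply as `context.reply`) where a function like this could be applied; verify that hook against your suds version. `asciify_xml_reply` is a hypothetical name. Caveat: this is only safe when non-ASCII characters appear in element text or attribute values, not in tag names, comments, or CDATA sections, where character references are not expanded.

```python
import xml.etree.ElementTree as ET


def asciify_xml_reply(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: re-encode an XML payload as pure ASCII.

    Non-ASCII characters become numeric character references (® -> &#174;),
    which XML parsers expand back to the same characters.
    """
    text = raw.decode(encoding)
    return text.encode("ascii", "xmlcharrefreplace")


reply = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<link><name>CorpNet\u00ae 10% Off Any Service</name></link>"
).encode("utf-8")

fixed = asciify_xml_reply(reply)
fixed.decode("ascii")  # would have raised on the original reply; now succeeds
print(ET.fromstring(fixed).find("name").text)  # CorpNet® 10% Off Any Service
```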

The "registered" character is U+00AE and is encoded as "\xc2\xae" in UTF-8. It looks like you have a str object encoded in UTF-8, but some code is doing your_str_object.decode("ascii") (probably implicitly), which fails with the error message you showed.
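A short Python 3 check of that diagnosis: it confirms the UTF-8 bytes of U+00AE and shows that the "position" in the error message is a byte offset into the whole buffer being decoded. The 758 leading ASCII bytes are made up purely to reproduce the position from the question.

```python
import unicodedata

ch = "\u00ae"
print(unicodedata.name(ch))  # REGISTERED SIGN
utf8 = ch.encode("utf-8")
print(utf8)                  # b'\xc2\xae'

# Illustrative payload: 758 ASCII bytes followed by the registered sign.
payload = b"x" * 758 + utf8
try:
    payload.decode("ascii")
except UnicodeDecodeError as exc:
    # The first byte the ascii codec rejects is the 0xC2 lead byte of "®".
    print(exc.start, hex(payload[exc.start]))  # 758 0xc2

print(payload.decode("utf-8")[-1])  # ® (decoding with the right codec works)
```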

What you need to do is show us a complete example (i.e. ALL the code necessary to get the error), plus the full error message and traceback, so that at least we can guess whether the problem is in your code or in imported code.

Ernestineernesto answered 16/1, 2011 at 4:09
To be clear: the data that causes the error is in the reply from the web service. I send a request that works, but the issue happens when CJ replies with a "registered" character in the response. So what I need to do is somehow clean the character BEFORE suds tries to parse it. As for the code, what you see above is all you need to do with suds to get a web service response and the suds error. (Soninlaw)
@chris: All you need to do is what everybody should do when asking about a problem that is raising an exception: run the minimal code necessary to cause the problem, and copy/paste the full error message and traceback into an edit of your question. By the way, how do you know that it's a "registered" character in the response? (Ernestineernesto)
John: the code I ran is what I put in the question, and the error is "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 759: ordinal not in range(128)". Not sure what you are asking for beyond that. I know it is the "registered" character because I can use another SOAP client that does not break on non-ASCII characters, and I can see that the results that break include the "registered" character. When the results do not include the "registered" character, they come back correctly. The issue ended up being in suds; I did a ghetto patch. Thanks. (Soninlaw)

I am using SUDS to interface with Salesforce via their SOAP API. I ran into the same situation until I followed @J.F.Sabastian's advice and stopped mixing str and unicode string types. For example, passing a SOQL string like this does work with SUDS 0.3.9:

qstr = u"select Id, FirstName, LastName from Contact where FirstName='%s' and LastName='%s'"  % (u'Jorge', u'López')

I did not seem to need to do str.decode("utf-8") either.

If you're running your script from PyDev on Eclipse, you might want to go into Project => Properties and, under Resource, set "Text File Encoding" to UTF-8; on my Mac this defaults to "MacRoman". I suppose on Windows the default is either Cp1252 or ISO-8859-1 (Latin-1). You could also set this in your Workspace so that your Projects inherit the setting from their workspace. This only affects the program source code.
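Why the IDE setting matters can be shown in a few lines of Python 3: the same literal is saved as different bytes depending on the file encoding, and bytes written as MacRoman are not valid UTF-8. In Python 2 the interpreter must also be told the file's encoding with a PEP 263 `# -*- coding: ... -*-` line; a mismatch between that line and the editor setting produces exactly this kind of mis-decoding.

```python
name = "L\u00f3pez"  # "López"

utf8_bytes = name.encode("utf-8")          # what a UTF-8 editor writes to disk
macroman_bytes = name.encode("mac_roman")  # what a MacRoman editor writes
print(utf8_bytes)      # b'L\xc3\xb3pez'
print(macroman_bytes)  # b'L\x97pez'

# Reading MacRoman bytes back as UTF-8 fails: 0x97 is not a valid start byte.
try:
    macroman_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("mismatch:", exc.reason)  # mismatch: invalid start byte

# Decoding with the codec the file was actually saved in round-trips cleanly.
print(macroman_bytes.decode("mac_roman"))  # López
```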

Wrasse answered 20/1, 2011 at 19:08
