Invalid Unicode/XML with Python SimpleXMLRPCServer?

I am getting the following error on the client side when I pass invalid XML characters to a Python SimpleXMLRPCServer:

Fault: <Fault 1: "<class 'xml.parsers.expat.ExpatError'>:not well-formed (invalid token): line 6, column 15">

Why? Do I have to change the SimpleXMLRPCServer library code to fix this?

Here is my XML-RPC server code:

from SimpleXMLRPCServer import SimpleXMLRPCServer

import logging
logging.basicConfig(level=logging.DEBUG)

def tt(text):
    return "cool"

server = SimpleXMLRPCServer(("0.0.0.0", 9000))
server.register_introspection_functions()
server.register_function(tt)

# Run the server's main loop
server.serve_forever()

Here is my XML-RPC client code:

import xmlrpclib

s = xmlrpclib.ServerProxy('http://localhost:9000')
s.tt(unichr(0x8))

On the server side, I don't get ANY error or traceback:

liXXXXXX.members.linode.com - - [06/Dec/2010 23:19:40] "POST /RPC2 HTTP/1.0" 200 -

Why no error on the server side? How do I diagnose what is going on?

And I get the following traceback on the client side:

/usr/lib/python2.6/xmlrpclib.pyc in __call__(self, *args)
   1197         return _Method(self.__send, "%s.%s" % (self.__name, name))
   1198     def __call__(self, *args):
-> 1199         return self.__send(self.__name, args)
   1200 
   1201 ##


/usr/lib/python2.6/xmlrpclib.pyc in __request(self, methodname, params)
   1487             self.__handler,
   1488             request,
-> 1489             verbose=self.__verbose
   1490             )
   1491 

/usr/lib/python2.6/xmlrpclib.pyc in request(self, host, handler, request_body, verbose)
   1251             sock = None
   1252 
-> 1253         return self._parse_response(h.getfile(), sock)
   1254 
   1255     ##


/usr/lib/python2.6/xmlrpclib.pyc in _parse_response(self, file, sock)
   1390         p.close()
   1391 
-> 1392         return u.close()
   1393 
   1394 ##


/usr/lib/python2.6/xmlrpclib.pyc in close(self)
    836             raise ResponseError()
    837         if self._type == "fault":
--> 838             raise Fault(**self._stack[0])
    839         return tuple(self._stack)
    840 

Fault: <Fault 1: "<class 'xml.parsers.expat.ExpatError'>:not well-formed (invalid token): line 6, column 15">

How do I get sane server-side processing if the input contains invalid XML? Can I clean up this data server side? How?

Lamarlamarck asked 7/12, 2010 at 4:24

First, your example fails for me too. I'm not sure what you mean by "sane server-side processing if the input contains invalid XML": you send the server invalid XML, and it gives you back an error. What more do you want?

Second, put a print 'hi there' in tt and you will see that tt is never called when you send unichr(0x8). The exact response the server sends (with status 200) is:

HTTP/1.0 200 OK
Server: BaseHTTP/0.3 Python/2.6.5
Date: Tue, 07 Dec 2010 07:33:09 GMT
Content-type: text/xml
Content-length: 350

<?xml version='1.0'?>
<methodResponse>
<fault>
<value><struct>
<member>
<name>faultCode</name>
<value><int>1</int></value>
</member>
<member>
<name>faultString</name>
<value><string>&lt;class 'xml.parsers.expat.ExpatError'&gt;:not well-formed (invalid token): line 6, column 15</string></value>
</member>
</struct></value>
</fault>
</methodResponse>

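So the error message you see is generated by the server itself: SimpleXMLRPCServer catches the parse error, wraps it in a fault, and returns it with an HTTP 200 status. That is why your server log shows an ordinary successful POST and no traceback.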
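To make that mechanism concrete, here is a simplified Python 3 sketch (xmlrpclib became xmlrpc.client there) that roughly mirrors how the server's dispatcher handles a request body: parsing happens first, and any exception is turned into a fault with code 1 instead of being logged. The function name dispatch and the canned "cool" result are illustrative assumptions, not the library's actual code.

```python
import sys
import xmlrpc.client


def dispatch(request_body):
    """Simplified stand-in for the server's request handling (illustrative only)."""
    try:
        # Parsing happens before any registered function is looked up,
        # so a malformed body never reaches tt().
        _params, _method = xmlrpc.client.loads(request_body)
        result = ("cool",)  # pretend tt() was called and returned "cool"
        return xmlrpc.client.dumps(result, methodresponse=True)
    except Exception:
        # Nothing is logged: the error is packed into a fault with code 1,
        # formatted like the faultString in the response shown above.
        exc_type, exc_value = sys.exc_info()[:2]
        fault = xmlrpc.client.Fault(1, "%s:%s" % (exc_type, exc_value))
        return xmlrpc.client.dumps(fault, methodresponse=True)


# The client library writes chr(0x8) unescaped into the request body.
bad_request = xmlrpc.client.dumps((chr(0x8),), "tt")
print(dispatch(bad_request))
```

Because the fault is an ordinary XML-RPC response, the HTTP layer reports success and only the client, when it unpacks the response, raises Fault.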

Now, according to the XML-RPC spec,

  • What characters are allowed in strings? Non-printable characters? Null characters? Can a "string" be used to hold an arbitrary chunk of binary data?

Any characters are allowed in a string except < and &, which are encoded as &lt; and &amp;. A string can be used to encode binary data.

OK, but this is XML, and according to the XML spec:

Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646.

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

That range does not include 0x08, which seems to flatly contradict the XML-RPC spec! So it would seem that your XML parser (which, judging from the error, is expat) implements the XML spec fairly rigorously. Since XML doesn't allow 0x08, you can't send 0x08, and indeed you get an error back.
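The Char production translates directly into a small check. This Python 3 sketch (the helper name is_xml_char is made up for illustration) makes it easy to confirm that 0x08 falls outside every allowed range while tab, line feed and carriage return fall inside.

```python
def is_xml_char(code_point):
    """Return True if code_point matches the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


for ch in ["\t", "\x08", "A", "\ufffe"]:
    print(repr(ch), is_xml_char(ord(ch)))
```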

If we do:

import xml.parsers.expat

data = "<?xml version='1.0'?>\n<methodCall>\n<methodName>tt</methodName>\n<params>\n<param>\n<value><string>\x08</string></value>\n</param>\n</params>\n</methodCall>"
p = xml.parsers.expat.ParserCreate()
p.Parse(data, True)

...we get your error. Again, you are passing garbage XML to the server, the server is passing you back an error message, and the Python in the middle is presenting that error to you as an exception. What behavior did you expect?
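To diagnose this kind of failure yourself, you can catch the exception and read its position: ExpatError carries lineno (1-based) and offset (the 0-based column), and xml.parsers.expat.ErrorString turns its code into a message. This Python 3 sketch builds the request body the same way the client library does for s.tt(chr(0x8)); the helper name locate_parse_error is hypothetical.

```python
import xml.parsers.expat
import xmlrpc.client


def locate_parse_error(xml_text):
    """Return (line, column, message) for the first expat error, or None if it parses."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        # lineno is 1-based; offset (the column) is 0-based.
        return err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code)
    return None


# Same body the client library builds for s.tt(chr(0x8)).
request_body = xmlrpc.client.dumps((chr(0x8),), "tt")
print(locate_parse_error(request_body))
print(repr(request_body.splitlines()[5]))  # the offending line
```

Line 6 is the <value><string> line, and column 15 is exactly where the raw 0x08 sits after the 15-character "<value><string>" prefix.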

Swanherd answered 7/12, 2010 at 7:58
So yes, thank you for investigating. As I indicated, I understand that this is not valid XML. I would like to be able to trap the error on the server side (instead of silently failing) and then strip any invalid characters from the input. I don't write the clients, and I would like to offer clients the best possible partial results if they pass me XML that has one or two invalid characters. (Lamarlamarck)
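If you still want to do what this comment asks, one possible Python 3 sketch is to strip every character outside the XML Char production from the raw request body before the normal dispatch runs. It overrides _marshaled_dispatch, a private, undocumented method of SimpleXMLRPCDispatcher, so it may break between Python versions, and it assumes request bodies are UTF-8. All class and helper names here are hypothetical.

```python
import re
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher, SimpleXMLRPCServer

# Matches anything outside the XML 1.0 Char production.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Remove characters that an XML 1.0 parser would reject."""
    return _INVALID_XML_CHARS.sub("", text)


class SanitizingMixin:
    """Cleans the raw request before handing it to the normal dispatcher.

    Assumption: the body is UTF-8; undecodable bytes become U+FFFD.
    Relies on the private _marshaled_dispatch hook.
    """

    def _marshaled_dispatch(self, data, dispatch_method=None, path=None):
        if isinstance(data, bytes):
            data = data.decode("utf-8", errors="replace")
        cleaned = strip_invalid_xml_chars(data).encode("utf-8")
        return super()._marshaled_dispatch(cleaned, dispatch_method, path)


class SanitizingDispatcher(SanitizingMixin, SimpleXMLRPCDispatcher):
    """Dispatcher only, handy for trying this out without opening a socket."""


class SanitizingServer(SanitizingMixin, SimpleXMLRPCServer):
    """Would replace SimpleXMLRPCServer(("0.0.0.0", 9000)) in the question."""


def echo(text):  # hypothetical test function
    return text


dispatcher = SanitizingDispatcher()
dispatcher.register_function(echo)
request = xmlrpc.client.dumps(("a\x08b",), "echo").encode("utf-8")
response = dispatcher._marshaled_dispatch(request)
print(xmlrpc.client.loads(response)[0])  # ('ab',)
```

This only removes raw characters: a character reference such as &#8; in the body would still be rejected by expat, and silently dropping characters may change what the client meant.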

You indicated in your comment that you would like to handle as much of the client's XML as possible. While this may sound good at first sight, there are cons to consider:

  • How do you know what you can strip? You might strip something important that the client simply encoded badly.

  • Imagine that you initially accept requests with one particular malformation. Then users start sending a second type of malformation, and you add an exception for that one too (once you have made one exception, why not another?). This leads down a slippery slope...

  • It is better to let things fail as early as possible and have them dealt with where they should be. Here the client implementation is wrong, so let the client fix it. That is better for both of you in the long run.

If you manage the client code too, then as a last resort you could run the XML through a tidying tool (see BeautifulSoup, for example). But it is better to deal with the problem by preventing invalid input in the first place.

Grieco answered 13/1, 2011 at 22:14

Thanatos explained the cause of your problem perfectly in his post.

As a workaround, you can use xmlrpclib.Binary to base64-encode the data before sending it (in Python 3: xmlrpc.client.Binary).
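A minimal Python 3 sketch of that workaround, assuming both sides agree to use UTF-8 for the text: the client wraps the encoded text in Binary, so only base64 characters appear in the XML, and the server decodes it back. The helper names build_request and decode_param are hypothetical; with a live proxy the client call would simply be s.tt(xmlrpc.client.Binary(text.encode("utf-8"))).

```python
import xmlrpc.client


def build_request(text):
    """Client side: wrap the text as base64 so control characters survive."""
    payload = xmlrpc.client.Binary(text.encode("utf-8"))
    return xmlrpc.client.dumps((payload,), "tt")


def decode_param(param):
    """Server side: recover the original text from the received parameter."""
    # By default the server receives a Binary object; with
    # use_builtin_types=True it receives plain bytes instead.
    raw = param.data if isinstance(param, xmlrpc.client.Binary) else param
    return raw.decode("utf-8")


request = build_request(chr(0x8))
params, method = xmlrpc.client.loads(request)  # parses cleanly now
print(method, repr(decode_param(params[0])))
```

The trade-off is that tt has to know the argument arrives as binary and decode it, so the client and server must agree on this convention.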

Gallinaceous answered 25/5, 2012 at 8:19