So I've been using suds with great benefit to consume a web service.
Then I hit a performance issue: for some data the CPU would spike hard, and it would take more than 60s to complete the request (which goes through gunicorn, then suds to the web service, and so on).
Looking into it with line_profiler, objgraph, memory_profiler, etc., I found the culprit: it takes about 13s to parse a 9.2 MB XML file, which is the response from the web service.
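For reference, a minimal sketch of that kind of investigation using the stdlib's cProfile (the actual investigation used line_profiler/objgraph/memory_profiler, and call_service here is a hypothetical stand-in for the real suds call):

    import cProfile
    import pstats

    def call_service():
        # hypothetical stand-in for the suds client call that
        # triggers the slow parse
        pass

    cProfile.run("call_service()", "suds_call.prof")
    # cumulative time makes the parser hotspot show up near the top
    pstats.Stats("suds_call.prof").sort_stats("cumulative").print_stats(20)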
That can't be normal, right? Just 9.2 MB, and I see 99% of the time is spent parsing it, and the parsing is done via "from xml.sax import make_parser", i.e. the standard Python library?
Are there any faster XML parsers out there for big files?
I'll look into exactly what kind of structure is in the XML, but so far I know it's a "UmResponse" containing around 7000 "Document" elements, each of which contains 10-20 lines of elements.
EDIT: Investigating further, I see that half of those 13s are spent in the suds handler in suds/sax/ ... hm, so it could be a suds problem and not the Python library, of course.
EDIT2: The suds unmarshaller used most of the time spent processing this, about 50s; parsing with SAX was also slow. pysimplesoap, which uses xml.dom.minidom, takes about 13s and a lot of memory. However, lxml.etree comes in below 2s, and lxml.objectify is also very fast, fast enough to use instead of ElementTree (which is faster than cElementTree for this specific XML: 0.5s for one, 0.17s for the other).
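A rough sketch of how I compared them, assuming the 9.2 MB response has been dumped to a local file (the filename is a placeholder):

    import time
    from xml.etree import ElementTree
    from lxml import etree, objectify

    def bench(name, parse):
        start = time.time()
        parse("response.xml")  # hypothetical dump of the SOAP response
        print("%s: %.2fs" % (name, time.time() - start))

    bench("xml.etree.ElementTree", ElementTree.parse)
    bench("lxml.etree", etree.parse)
    bench("lxml.objectify", objectify.parse)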
Solution: suds allows the parameter retxml to be set to true, which gives back the raw XML without parsing and unmarshalling; from there I can do it faster with lxml.
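Something along these lines, where the WSDL URL and method name are placeholders for the real service:

    from suds.client import Client
    from lxml import etree

    # retxml=True makes suds return the raw SOAP response
    # instead of parsing and unmarshalling it itself
    client = Client("http://example.com/service?wsdl", retxml=True)
    raw = client.service.GetDocuments()  # hypothetical method name

    tree = etree.fromstring(raw)
    # grab the ~7000 Document elements regardless of namespace
    documents = tree.xpath("//*[local-name() = 'Document']")

With that, the expensive part drops from ~50s of suds unmarshalling to under 2s of lxml parsing, and I can pull what I need out of the Document elements myself.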