Our project receives XML of this form from an upstream process:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<runtime>
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
<dependentAssembly>
<assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
<bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0" />
</dependentAssembly>
</assemblyBinding>
</runtime>
<appSettings>
<add key="foo" value="default" />
...
</appSettings>
</configuration>
It then parses this XML with ElementTree and, for every app setting matching a certain key ("foo"), writes a new value that it knows about and the upstream process doesn't (in this case, key "foo" should have the value "bar").
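For concreteness, the update step looks roughly like the sketch below. The sample document is trimmed down, and the names `UPSTREAM_XML` and `override_app_settings` are made up for illustration; they are not the project's real code.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the upstream document (assumption: the real one
# has more settings and the full assemblyIdentity element).
UPSTREAM_XML = """<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default" />
  </appSettings>
</configuration>"""


def override_app_settings(root, overrides):
    """Set value= on every <add> under <appSettings> whose key is in overrides.

    Hypothetical helper; returns the number of settings changed.
    """
    changed = 0
    for add in root.findall("./appSettings/add"):
        key = add.get("key")
        if key in overrides:
            add.set("value", overrides[key])
            changed += 1
    return changed


root = ET.fromstring(UPSTREAM_XML)
count = override_app_settings(root, {"foo": "bar"})
```

The `<appSettings>` part is not in any namespace, so plain paths like `./appSettings/add` are enough for the edit itself; the trouble only starts when the tree is written back out.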
The downstream process consuming the filtered XML is, aaahhhh... fragile. It expects to receive the XML in exactly the form above.
If I parse this XML without registering a namespace, then ElementTree mangles my tree like this on output:
<configuration xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
<runtime>
<ns0:assemblyBinding>
<ns0:dependentAssembly>
<ns0:assemblyIdentity culture="neutral" name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" />
<ns0:bindingRedirect newVersion="7.0.0.0" oldVersion="0.0.0.0-6.0.0.0" />
</ns0:dependentAssembly>
</ns0:assemblyBinding>
</runtime>
<appSettings>
<add key="foo" value="default" />
...
</appSettings>
</configuration>
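A small reproduction of this, as far as I understand what is happening: ElementTree folds the namespace into each tag name at parse time and forgets which element carried the xmlns declaration, so on output it has to invent a prefix and declare it on the root. The snippet is a cut-down stand-in for the real file.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the upstream document.
SNIPPET = """<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly />
    </assemblyBinding>
  </runtime>
  <appSettings />
</configuration>"""

root = ET.fromstring(SNIPPET)

# After parsing, the namespace lives in the tag name, not in an xmlns attribute.
binding = root.find("./runtime/*")
print(binding.tag)     # {urn:schemas-microsoft-com:asm.v1}assemblyBinding
print(binding.attrib)  # {}

# Serializing declares the namespace on the root element instead of on
# <assemblyBinding>, with a generated prefix when none has been registered.
round_tripped = ET.tostring(root, encoding="unicode")
print(round_tripped)
```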
The downstream process can't handle this, because it's not clever enough to realize that, semantically, this is the same thing. So I decided to register the namespace I know the upstream process will provide as the default namespace, to avoid the prefixes showing up everywhere, and now I get this:
<configuration xmlns="urn:schemas-microsoft-com:asm.v1">
<runtime>
<assemblyBinding>
<dependentAssembly>
<assemblyIdentity culture="neutral" name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" />
<bindingRedirect newVersion="7.0.0.0" oldVersion="0.0.0.0-6.0.0.0" />
</dependentAssembly>
</assemblyBinding>
</runtime>
<appSettings>
<add key="foo" value="default" />
...
</appSettings>
</configuration>
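The registration itself is a one-liner; a minimal sketch of what I am doing is below (the snippet and the name `ASM_NS` are illustrative). Note that `register_namespace` is process-wide rather than per document.

```python
import xml.etree.ElementTree as ET

ASM_NS = "urn:schemas-microsoft-com:asm.v1"

# Cut-down stand-in for the upstream document.
SNIPPET = """<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly />
    </assemblyBinding>
  </runtime>
  <appSettings />
</configuration>"""

# Map the namespace to the empty prefix so no "ns0:" prefixes are generated.
# This registration is global to the process, not tied to one tree.
ET.register_namespace("", ASM_NS)

root = ET.fromstring(SNIPPET)
out = ET.tostring(root, encoding="unicode")
print(out)  # xmlns now appears on <configuration>, not on <assemblyBinding>
```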
I don't know much about XML, but the downstream component complains about this too. And doesn't this mean the default xmlns now applies to every element inside <configuration>, whereas before it applied only to the <assemblyBinding> element and its children?
Is there any way, using ElementTree, to handle this namespace so that I can take in the upstream's XML, set foo's value, and then pass it on downstream without moving the namespace around, leaving it exactly as I found it?
I could use an lxml-based solution, which seems to handle this; however, lxml depends on C libraries, which the downstream component would really like not to have to support. A pure Python solution is preferable.
I could read the document as HTML which would ignore the namespace attribute, let me manipulate the value I want, and then pass on the document; however, I have yet to find a Python parser that doesn't downcase all the element names, and my downstream component requires the casing on all element names to be preserved.
I could resort to string parsing and regular expressions. I would rather not write my own parser.
The only advice I could find so far about namespace handling in ElementTree suggests the "register a default namespace to avoid prefixes" approach, which I assumed would be suitable, but ElementTree then insists on moving the xmlns declaration up to the root node upon dumping.
I could also be clever and build up a string that dumps the tree out in stages, in exactly the right order to put the xmlns declaration back on the "right node", but that also strikes me as pretty darned fragile.
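A possibly less fragile variant of that idea, sketched here and not verified against the downstream component: leave serialization to ElementTree, but just before dumping, strip the namespace out of the tag names and put xmlns back as an ordinary attribute on the element that originally carried it. The helper name `pin_default_namespace` is hypothetical, and the sketch assumes I know in advance which element owns the declaration, that no other namespaces are in play, and that attribute order is preserved (true on Python 3.8+, while older versions sort attributes). It also does not emit the XML declaration.

```python
import xml.etree.ElementTree as ET

ASM_NS = "urn:schemas-microsoft-com:asm.v1"

# Cut-down stand-in for the upstream document.
UPSTREAM_XML = """<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default" />
  </appSettings>
</configuration>"""


def pin_default_namespace(root, uri, owner_local_name):
    """Hypothetical helper: un-qualify tags in `uri` and re-declare xmlns on the owner.

    After this runs the tree is no longer namespace-aware; it is only meant
    to be serialized, not queried further.
    """
    qualified_prefix = "{" + uri + "}"
    for element in root.iter():
        tag = element.tag
        if isinstance(tag, str) and tag.startswith(qualified_prefix):
            local_name = tag[len(qualified_prefix):]
            element.tag = local_name
            if local_name == owner_local_name:
                # Written out as a plain attribute: xmlns="..."
                element.set("xmlns", uri)


root = ET.fromstring(UPSTREAM_XML)

# The actual edit: override the "foo" setting.
for add in root.findall("./appSettings/add"):
    if add.get("key") == "foo":
        add.set("value", "bar")

pin_default_namespace(root, ASM_NS, "assemblyBinding")
result = ET.tostring(root, encoding="unicode")
print(result)
```

If the owning element had other attributes, xmlns would be written after them, which may or may not matter to a picky consumer.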
Has anyone managed to get past a problem like this?