It all comes down to namespaces
XML tags can (but need not) have a namespace. So even if the root node declares a default namespace, child nodes are allowed to have no namespace, which is not the same as being in the default namespace.
This is the difference between your element1 and element2: element1's subelement has no namespace; element2's subelement is in the default namespace, because you specified that namespace when you created it. If you try element2.find("{j:a}b"), it returns the element b, or to be more accurate, the element {j:a}b.
So yes, the namespace matters. And when you create elements with lxml, you can define elements without a namespace: just don't add it.
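A minimal sketch of the difference. The question's own construction of element1 and element2 is not shown here, so the way they are built below is an assumption reconstructed from the description (root tag {j:a}a with nsmap={None: "j:a"}):

```python
from lxml import etree

NS = "j:a"  # namespace URI taken from the discussion above

# element1 (assumed construction): child tag given without a namespace
element1 = etree.Element("{%s}a" % NS, nsmap={None: NS})
etree.SubElement(element1, "b")

# element2 (assumed construction): child tag names the namespace explicitly
element2 = etree.Element("{%s}a" % NS, nsmap={None: NS})
etree.SubElement(element2, "{%s}b" % NS)

print(element1[0].tag)  # b
print(element2[0].tag)  # {j:a}b

# A namespace-qualified search only matches the qualified child
print(element1.find("{j:a}b"))      # None
print(element2.find("{j:a}b").tag)  # {j:a}b
```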
But what about serialization?
Now I am not an lxml expert, so this is just my guess. The thing is that when lxml serializes the element, its output does not discriminate between elements that really have no namespace and elements in the default namespace, so both are written the same way.
Consequently, serializing an element and then parsing it again does not give back the original tree. For example, using your element1:
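from lxml import etree
sel1 = etree.tostring(element1)     # serialize the tree built in the question
element1s = etree.fromstring(sel1)  # parse the serialized bytes back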
It turns out that element1s is not equal to element1, because the subelement b is now the subelement {j:a}b. When the string is parsed, elements written without a prefix are placed in the default namespace.
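The round trip can be checked with a short, self-contained script. This is only a sketch: element1 is rebuilt here under the assumption that it was created with nsmap={None: "j:a"} and an unqualified child, and the exact serialized bytes may depend on the lxml version, so they are printed rather than asserted.

```python
from lxml import etree

# Assumed reconstruction of element1: default namespace on the root,
# child created without any namespace
element1 = etree.Element("{j:a}a", nsmap={None: "j:a"})
etree.SubElement(element1, "b")

serialized = etree.tostring(element1)
reparsed = etree.fromstring(serialized)

print(serialized)        # inspect how the child was written out
print(element1[0].tag)   # tag of the child before serialization
print(reparsed[0].tag)   # tag after the round trip; reported above as {j:a}b

# Compare the two trees child by child
changed = element1[0].tag != reparsed[0].tag
print("child tag changed by round trip:", changed)
```

If the last line prints True, the serialized form did not preserve the "no namespace" state of the child.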
Conclusion
Now, I don't know whether this is intended or a bug. To the best of my knowledge, if an XML document declares a default namespace, every element that is written without a prefix is in that default namespace, which is exactly what happens when you parse an XML document with the fromstring function. An element can be in no namespace only if no default namespace is in scope, either because none was declared or because it was reset with xmlns="".
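The parsing rule can be illustrated on its own. The sketch below deliberately uses the standard library's xml.etree.ElementTree instead of lxml, only to stay dependency-free (that swap is mine, not part of the question). It also shows that a child can opt out of an inherited default namespace by resetting it with xmlns="":

```python
import xml.etree.ElementTree as ET

# Child without a prefix inherits the default namespace of its parent
inherited = ET.fromstring('<a xmlns="j:a"><b/></a>')
print(inherited[0].tag)   # {j:a}b

# xmlns="" resets the default namespace, so the child has no namespace
reset = ET.fromstring('<a xmlns="j:a"><b xmlns=""/></a>')
print(reset[0].tag)       # b

# No default namespace declared at all: nothing to inherit
undeclared = ET.fromstring('<a><b/></a>')
print(undeclared[0].tag)  # b
```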
So in my opinion the b subelement of element1 should "inherit" the namespace of the parent node, since the parent node defines a default namespace with nsmap={None: "j:a"}.
But you could also be told that since you are building the document using lxml elements, it's your responsibility to put each element in the correct namespace, which means you have to add the default namespace explicitly.
Since elements without a namespace are allowed by XML under some circumstances, lxml does not complain when an element does not have a namespace.
I think that automatically adding the default namespace to subelements of elements that declare one would be a cool feature, but it's just not there.
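Until something like that exists, a small wrapper can approximate it. This is only a sketch: sub_element_default_ns is a hypothetical helper name, not part of lxml. It relies on Element.nsmap, which maps the default namespace to the key None.

```python
from lxml import etree

def sub_element_default_ns(parent, tag, **extra):
    """Hypothetical helper (not an lxml API): like etree.SubElement, but an
    unqualified tag is placed in the default namespace in scope for parent."""
    default_ns = parent.nsmap.get(None)
    if default_ns is not None and not tag.startswith("{"):
        tag = "{%s}%s" % (default_ns, tag)
    return etree.SubElement(parent, tag, **extra)

root = etree.Element("{j:a}a", nsmap={None: "j:a"})
child = sub_element_default_ns(root, "b")
print(child.tag)  # {j:a}b

# A tree built this way survives a serialize/parse round trip unchanged
reparsed = etree.fromstring(etree.tostring(root))
print(reparsed[0].tag)  # {j:a}b
```

Tags that already carry a namespace in Clark notation are passed through untouched, so the helper can be used wherever SubElement would be.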
<a xmlns="j:a"><b xmlns="" /></a>. That would be consistent, it would generate a legal XML tree, it would behave the same after parsing, and it would make it immediately obvious that the API wants explicit namespaces, as in SubElement(element2, '{j:a}b'). – Race