I am trying to use lxml to parse the contents of a .docx document. I understand that lxml replaces namespace prefixes with the full namespace URI, but this makes it a real pain to check what kind of element I am working with. I would like to be able to do something like
if (someElement.tag == "w:p"):
but since lxml insists on prepending the full namespace, I'd either have to do something like
if (someElement.tag == "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}p"):
or perform a lookup of the full namespace name from the element's nsmap attribute like this
targetTag = "{%s}p" % someElement.nsmap['w']
if (someElement.tag == targetTag):
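For anyone who wants to reproduce this, here is a minimal, self-contained sketch of the nsmap lookup. The XML string is a made-up stand-in for word/document.xml, and the variable names are only for illustration.

```python
from lxml import etree

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical stand-in for the real word/document.xml content.
xml = (
    '<w:document xmlns:w="%s">'
    '<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>'
    '</w:document>' % W_NS
).encode("utf-8")

root = etree.fromstring(xml)

# Build the fully qualified tag once from the prefix mapping, then compare.
target_tag = "{%s}p" % root.nsmap["w"]
paragraphs = [el for el in root.iter() if el.tag == target_tag]
```

It works, but the target string has to be built for every tag name I care about.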
I'm hoping there is an easier way to convince lxml to either
- Give me the tag string without the namespace prepended to it (I could then use the element's prefix attribute along with it to check which tag I'm working with), OR
- Just give me the tag string using the prefix
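To make the goal concrete, this is roughly the helper I would otherwise write by hand. It is only a sketch: `prefixed_tag` is a name I made up, not something lxml provides, and it assumes it is only ever given real elements (not comments or processing instructions).

```python
from lxml import etree


def prefixed_tag(element):
    """Return the tag as 'prefix:local', or just 'local' if there is no prefix.

    Hypothetical helper, not part of lxml.
    """
    # element.tag looks like '{namespace-uri}local'; keep the part after '}'.
    local = element.tag.rpartition("}")[2]
    if element.prefix:
        return "%s:%s" % (element.prefix, local)
    return local


# Made-up miniature document to exercise the helper.
doc = etree.fromstring(
    b'<w:document xmlns:w="http://schemas.openxmlformats.org/'
    b'wordprocessingml/2006/main"><w:p/></w:document>'
)
para = doc[0]
is_paragraph = prefixed_tag(para) == "w:p"
```

With something like this the check becomes `if prefixed_tag(someElement) == "w:p":`, but it depends on the prefix the document happens to use, so I would prefer a built-in way if one exists.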
This would save a lot of keystrokes when writing this parser. Is this possible? Am I missing something in the documentation?