XSLT - How to remove unused namespaces from source XML?
I have an XML document with a lot of unused namespaces, like this:

<?xml version="1.0" encoding="UTF-8"?>
<ns1:Envelope xmlns:ns1="http://www.a.com" xmlns:ns2="http://www.b.com" xmlns:ns3="http://www.c.com" xmlns:ns4="http://www.d.com">
    <ns1:Body>
        <ns2:a>
            <ns2:b>data1</ns2:b>
            <ns2:c>data2</ns2:c>
        </ns2:a>
    </ns1:Body>
</ns1:Envelope> 

I would like to remove the unused namespaces without having to specify in the XSLT which ones to remove or keep. The resulting XML should be this:

<?xml version="1.0" encoding="UTF-8"?>
<ns1:Envelope xmlns:ns1="http://www.a.com" xmlns:ns2="http://www.b.com">
    <ns1:Body>
        <ns2:a>
            <ns2:b>data1</ns2:b>
            <ns2:c>data2</ns2:c>
        </ns2:a>
    </ns1:Body>
</ns1:Envelope> 

I've googled a lot but haven't found a solution to this particular issue. Is there any?

Thanks.

PS: Not 100% sure, but I think it should be for XSLT 1.0.

Jadejaded answered 4/1, 2011 at 11:59 Comment(1)
An in-scope namespace URI that is not part of any QName is not necessarily unused. Think of schema definitions, for example, where prefixes appear inside attribute values.Loricate
M
23

Unlike the answer of @Martin-Honnen, this solution produces exactly the desired result -- the necessary namespace nodes remain where they are and are not moved down.

Also, this solution correctly deals with attributes that are in a namespace:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Fallback: copy attributes, text, comments and processing instructions unchanged -->
 <xsl:template match="node()|@*" priority="-2">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

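 <!-- Rebuild each element instead of copying it, so its in-scope namespace nodes are not copied along -->
 <xsl:template match="*">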
  <xsl:element name="{name()}" namespace="{namespace-uri()}">
   <xsl:variable name="vtheElem" select="."/>

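   <!-- Re-add a namespace node only if a descendant element, or an attribute of a descendant, uses its prefix -->
   <xsl:for-each select="namespace::*">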
     <xsl:variable name="vPrefix" select="name()"/>

     <xsl:if test=
      "$vtheElem/descendant::*
              [(namespace-uri()=current()
             and 
              substring-before(name(),':') = $vPrefix)
             or
              @*[substring-before(name(),':') = $vPrefix]
              ]
      ">
      <xsl:copy-of select="."/>
     </xsl:if>
   </xsl:for-each>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document (the provided XML document with an added namespaced attribute):

<ns1:Envelope xmlns:ns1="http://www.a.com" xmlns:ns2="http://www.b.com" xmlns:ns3="http://www.c.com" xmlns:ns4="http://www.d.com">
    <ns1:Body ns2:x="1">
        <ns2:a>
            <ns2:b>data1</ns2:b>
            <ns2:c>data2</ns2:c>
        </ns2:a>
    </ns1:Body>
</ns1:Envelope>

the desired, correct result is produced:

<ns1:Envelope xmlns:ns1="http://www.a.com" xmlns:ns2="http://www.b.com">
   <ns1:Body ns2:x="1">
      <ns2:a>
         <ns2:b>data1</ns2:b>
         <ns2:c>data2</ns2:c>
      </ns2:a>
   </ns1:Body>
</ns1:Envelope>
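
One edge case may be worth guarding against. The attribute test above compares prefixes only, and for the default namespace node the prefix is the empty string, which also equals substring-before(name(),':') for every unprefixed attribute. A declared but unused default namespace could therefore survive whenever a descendant carries an ordinary attribute. The following is a sketch of a variant, not part of the original answer, that additionally compares the attribute's namespace URI; the variable name vUri is introduced here for illustration only:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Fallback: copy attributes, text, comments and processing instructions unchanged -->
 <xsl:template match="node()|@*" priority="-2">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*">
  <xsl:element name="{name()}" namespace="{namespace-uri()}">
   <xsl:variable name="vtheElem" select="."/>

   <xsl:for-each select="namespace::*">
    <xsl:variable name="vPrefix" select="name()"/>
    <!-- vUri (illustrative name): the URI this namespace node binds -->
    <xsl:variable name="vUri" select="string(.)"/>

    <!-- Keep the node only if a descendant element, or an attribute of a
         descendant, has both this prefix and this namespace URI -->
    <xsl:if test=
     "$vtheElem/descendant::*
        [(namespace-uri() = $vUri
          and substring-before(name(),':') = $vPrefix)
         or
          @*[namespace-uri() = $vUri
             and substring-before(name(),':') = $vPrefix]
        ]">
     <xsl:copy-of select="."/>
    </xsl:if>
   </xsl:for-each>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>
</xsl:stylesheet>

Unprefixed attributes are in no namespace, so their namespace-uri() is the empty string and can never equal the value of a namespace node. For the sample document above this variant should keep the same two declarations, ns1 and ns2.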
Mum answered 4/1, 2011 at 14:24 Comment(10)
@mdiez: There is a problem with namespaces... Some implementations don't handle the XPath namespace axis.Loricate
To expand on the comment about implementations that do not support the namespace axis: with such a processor, namespace declarations are "pushed down" to each element that uses the namespace. This applies, for example, to Java's default TransformerFactory, whereas the Saxon implementation handles this correctly.Conversable
If you would like to preserve namespaces that occur only in attribute values and are therefore not syntactically necessary, see @Gentil's answer below.Conversable
@Leviathan, I wouldn't recommend trying to guess whether the string value of an attribute is a QName or just happens to be a syntactically-valid QName -- in the general case this is just guessing. One could use schema information, if it is known that the XML document is an instance of a given schema.Mum
I see your point, but in a situation where you just want to strip as many namespaces as possible without side effects, this approach should be a fair compromise, since any error results only in a harmless superfluous namespace. If you do not look for namespaces in attribute values, though, you may end up removing a namespace that was actually necessary. This obviously depends on the kind of XML you are parsing.Conversable
Unfortunately, this solution does not work when a default namespace is in use. The result has a namespace declaration on every element.Margalit
@Xyaren, why do you think this is a problem? The task is to remove the unused namespaces. The default namespace is used everywhere, so it is fine that it is not removed. In fact, used namespaces (that is, namespaces that elements of the document belong to) cannot be removed, because doing so would modify the document.Mum
Never mind, it was a bug in the XML processor of my Java version. Switching to Saxon fixed the issue.Margalit
Update: I just ran into this bug again, and it seems that this transformation does not work with Apache Xalan, the default Java transformation library included in the SDK. Installing the Saxon-HE dependency (mvnrepository.com/artifact/net.sf.saxon/Saxon-HE) produces the desired results.Margalit
@Margalit This means that apache-xalan is buggy. The transformation in this answer is standard (no extensions) XSLT 1.0 and should produce the same results with any compliant XSLT 1.0 processor.Mum
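
When results differ between environments, as in the comments above, it can help to confirm which XSLT processor is actually being picked up. A minimal diagnostic sketch, using only the standard XSLT 1.0 system-property() function and the XPath namespace axis, could look like this (the output labels are arbitrary):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:template match="/">
  <!-- The three system properties every XSLT 1.0 processor must provide -->
  <xsl:text>XSLT version: </xsl:text>
  <xsl:value-of select="system-property('xsl:version')"/>
  <xsl:text>&#10;Vendor: </xsl:text>
  <xsl:value-of select="system-property('xsl:vendor')"/>
  <xsl:text>&#10;Vendor URL: </xsl:text>
  <xsl:value-of select="system-property('xsl:vendor-url')"/>
  <!-- Number of namespace nodes the processor reports on the document element -->
  <xsl:text>&#10;Namespace nodes on the document element: </xsl:text>
  <xsl:value-of select="count(/*/namespace::*)"/>
  <xsl:text>&#10;</xsl:text>
 </xsl:template>
</xsl:stylesheet>

A conforming processor counts the implicit xml namespace node in addition to the declared ones. A count that is lower than expected, or an error on the last expression, would suggest that the processor does not fully support the namespace axis.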
D
3

Well, if you use

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:template match="@* | text() | comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="*">
    <xsl:element name="{name()}" namespace="{namespace-uri()}">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>

then unused namespaces are removed but the result is more likely to look like

<ns1:Envelope xmlns:ns1="http://www.a.com">
    <ns1:Body>
        <ns2:a xmlns:ns2="http://www.b.com">
            <ns2:b>data1</ns2:b>
            <ns2:c>data2</ns2:c>
        </ns2:a>
    </ns1:Body>
</ns1:Envelope>

than what you asked for.

Dibbell answered 4/1, 2011 at 12:49 Comment(2)
+1 Also a good answer that avoids the "not always implemented" namespace axis. But it could end up with a lot of "duplicated" namespace declarations, one on the first element in each branch that uses a given namespace URI.Loricate
Perfectly good answer. It's semantically equivalent; the location of the namespace declarations doesn't matter.Bankable
A
1

Adding to Dimitre's answer: if namespaces that occur only in attribute values should also be preserved, add this condition: @*[contains(.,concat($vPrefix,':'))]:

  <xsl:if test="$vtheElem/descendant::*
                  [namespace-uri() = current() and
                   substring-before(name(),':') = $vPrefix or
                   @*[substring-before(name(),':') = $vPrefix] or
                   @*[contains(.,concat($vPrefix,':'))]
                  ]">
   <xsl:copy-of select="."/>
  </xsl:if>

This will correctly preserve the namespace ns3 because of attrib="ns3:Header" as in the following example.

 <ns1:Envelope xmlns:ns1="http://www.a.com" xmlns:ns2="http://www.b.com" xmlns:ns3="http://www.c.com" xmlns:ns4="http://www.d.com">
    <ns1:Body ns2:x="1">
        <ns2:a>
            <ns2:b attrib="ns3:Header">data1</ns2:b>
            <ns2:c>data2</ns2:c>
        </ns2:a>
    </ns1:Body>
</ns1:Envelope>
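
Note that contains() is a loose heuristic: it also matches any attribute value that merely happens to contain the prefix followed by a colon somewhere in its text. A slightly stricter sketch, under the assumption (not stated in the question) that QName-valued attributes hold a single QName, is to require that the trimmed value starts with the prefix and a colon:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Fallback: copy attributes, text, comments and processing instructions unchanged -->
 <xsl:template match="node()|@*" priority="-2">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*">
  <xsl:element name="{name()}" namespace="{namespace-uri()}">
   <xsl:variable name="vtheElem" select="."/>

   <xsl:for-each select="namespace::*">
    <xsl:variable name="vPrefix" select="name()"/>

    <!-- Same element and attribute-name tests as above, plus a heuristic:
         keep the namespace if an attribute value begins with "prefix:" -->
    <xsl:if test=
     "$vtheElem/descendant::*
        [(namespace-uri() = current()
          and substring-before(name(),':') = $vPrefix)
         or
          @*[substring-before(name(),':') = $vPrefix]
         or
          @*[starts-with(normalize-space(.), concat($vPrefix,':'))]
        ]">
     <xsl:copy-of select="."/>
    </xsl:if>
   </xsl:for-each>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>
</xsl:stylesheet>

This is still a guess about what the attribute value means. It only narrows the set of false positives; it does not handle attributes that contain a list of QNames, and only schema information can say for certain whether a value is a QName.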
Adobe answered 9/2, 2017 at 16:35 Comment(0)
