How do I remove namespaces from XML using Java DOM?
L

9

14

I have the following code

DocumentBuilderFactory dbFactory_ = DocumentBuilderFactory.newInstance();
Document doc_;
DocumentBuilder dBuilder = dbFactory_.newDocumentBuilder();
StringReader reader = new StringReader(s);
InputSource inputSource = new InputSource(reader);
doc_ = dBuilder.parse(inputSource);
doc_.getDocumentElement().normalize();

Then I can do

doc_.getDocumentElement();

and get my first element, but the problem is that instead of being job, the element is tns:job.

I know about and have tried to use:

dbFactory_.setNamespaceAware(true);

but that is not what I'm looking for. I need something that gets rid of namespaces completely.

Any help would be appreciated, Thanks,

Josh

Lobster answered 11/1, 2011 at 18:22 Comment(3)
Why do you want to get rid of namespaces, instead of coping with them?Martingale
I have some legacy code that doesn't support them.Lobster
If it's legacy POS, maybe just use brute-force stripping out of namespace prefixes; even something as simple as regexp would work. It's not the right way in general, but sometimes crap is to be fought with crap. :)Ribera
E
7

For Element and Attribute nodes:

Node node = ...;
String name = node.getLocalName();

will give you the local part of the node's name.

See Node.getLocalName()
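One caveat worth illustrating: per the DOM specification, getLocalName() returns null for nodes created without namespace support, so the parser has to be namespace-aware for this to work. The sketch below is only an illustration; the class name LocalNameDemo and the method parseRoot are hypothetical, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class LocalNameDemo {

    // Parses the given XML string and returns its document element.
    // The namespaceAware flag is passed straight to the factory.
    static Element parseRoot(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }
}
```

For a root element written as tns:job, parseRoot(xml, true).getLocalName() should give job while getNodeName() still reports tns:job. With parseRoot(xml, false), getLocalName() is null, which is why the namespace-aware setting matters here.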

Epanodos answered 11/1, 2011 at 18:29 Comment(2)
Is there any way to completely remove them from the XML? Or are they here to stay?Lobster
As Anon and Tomalak have mentioned, you really don't want to strip namespace info from your XML. This is a good workaround for your particular case, but I would leave the namespace info intact.Epanodos
P
15

Use a regex-based function. This will solve the issue:

public static String removeXmlStringNamespaceAndPreamble(String xmlString) {
  return xmlString
      .replaceAll("(<\\?[^<]*\\?>)?", "") /* remove preamble */
      .replaceAll("xmlns.*?(\"|\').*?(\"|\')", "") /* remove xmlns declaration */
      .replaceAll("(<)(\\w+:)(.*?>)", "$1$3") /* remove opening tag prefix */
      .replaceAll("(</)(\\w+:)(.*?>)", "$1$3"); /* remove closing tags prefix */
}
Perilune answered 7/7, 2011 at 5:12 Comment(3)
Using regexes to remove all namespaces just can't be a good thing, even if this code works.Sophisticated
@Sophisticated I agree with you, but I haven't found a better solution ...Leralerch
@Tomalak's XSLT is a better solution. It uses XML to process XML.Sophisticated
M
8

You can pre-process XML to remove all namespaces, if you absolutely must do so. I'd recommend against it, as removing namespaces from an XML document is in essence comparable to removing namespaces from a programming framework or library - you risk name clashes and lose the ability to differentiate between once-distinct elements. However, it's your funeral. ;-)

This XSLT transformation removes all namespaces from any XML document.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="node()|@*" />
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:apply-templates select="node()|@*" />
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>

Apply it to your XML document. Java examples for doing such a thing should be plenty, even on this site. The resulting document will be exactly of the same structure and layout, just without namespaces.
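As a rough sketch of what applying it from Java might look like, the standard javax.xml.transform API can run a stylesheet over an XML string. The class name XsltNamespaceRemover and the method removeNamespaces are hypothetical. Note that the embedded stylesheet is a slight variant of the one above, not a verbatim copy: it writes attribute values with xsl:value-of and matches text, comments and processing instructions explicitly.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named for illustration only.
class XsltNamespaceRemover {

    // Variant of the stylesheet: elements and attributes are re-created
    // under their local names; other nodes are copied unchanged.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='*'>"
      + "<xsl:element name='{local-name()}'>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</xsl:element>"
      + "</xsl:template>"
      + "<xsl:template match='@*'>"
      + "<xsl:attribute name='{local-name()}'><xsl:value-of select='.'/></xsl:attribute>"
      + "</xsl:template>"
      + "<xsl:template match='text()|comment()|processing-instruction()'>"
      + "<xsl:copy/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the input and returns the result as a string.
    static String removeNamespaces(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        // Leave out the <?xml ...?> declaration so only the content is returned.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

The returned string can then be fed to the existing DocumentBuilder code in the question. If a DOM is wanted directly, a DOMResult could presumably be used in place of the StreamResult.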

Martingale answered 11/1, 2011 at 19:3 Comment(0)
C
3

Rather than

dbFactory_.setNamespaceAware(true);

Use

dbFactory_.setNamespaceAware(false);

Although I agree with Tomalak: in general, namespaces are more helpful than harmful. Why don't you want to use them?


Edit: this answer doesn't answer the OP's question, which was how to get rid of namespace prefixes. RD01 provided the correct answer to that.

Curiosity answered 11/1, 2011 at 18:28 Comment(1)
@Lobster - so is the issue that you're still seeing the prefix when you use a parser that's not namespace aware? If yes, then look at RD01's answer.Curiosity
L
2

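Tomalak, one fix to your XSLT (in the third template). An attribute node has no children for apply-templates to select, so the original third template produces attributes with empty values: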

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node()">
    <xsl:copy>
        <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*">
    <xsl:element name="{local-name()}">
        <xsl:apply-templates select="node() | @*" />
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <!-- Here! -->
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>

  </xsl:template>
</xsl:stylesheet>
Larcenous answered 13/12, 2011 at 12:18 Comment(0)
C
2

The size of the input XML also needs to be considered when choosing a solution. For large documents (around 100k, which is possible if your input comes from a web service), you also need to consider the garbage-collection implications of manipulating a large string. We used String.replaceAll before, and it caused frequent OOM errors in production with a 1.5G heap because of the way replaceAll is implemented.

See http://app-inf.blogspot.com/2013/04/pitfalls-of-handling-large-string.html for our findings.

I am not sure how XSLT deals with large String objects, but we ended up parsing the string manually to remove prefixes in a single pass, to avoid creating additional large Java objects.

public static String removePrefixes(String input1) {
    if (input1 == null) {
        return null;
    }
    //BE CAREFUL : allocate enough size for StringBuffer to avoid expansion
    StringBuffer sb = new StringBuffer(input1.length());
    int strStart = 0;
    while (true) {
        int start = input1.indexOf('<', strStart);
        // Look for '>' only after the '<', so a stray '>' in text is not taken as the tag end
        int end = (start == -1) ? -1 : input1.indexOf('>', start);
        if (end == -1) {
            break;
        }
        // Appending anything before '<', including '<'
        sb.append(input1, strStart, start + 1);

        String tag = input1.substring(start + 1, end);
        if (tag.startsWith("/")) {
            // Appending '/' if it is "</"
            sb.append('/');
            tag = tag.substring(1);
        }

        // Only a colon inside the tag name (before the first space) marks a prefix
        int colon = tag.indexOf(':');
        int space = tag.indexOf(' ');
        if (colon != -1 && (space == -1 || colon < space)) {
            tag = tag.substring(colon + 1);
        }
        // Appending tag with prefix removed, and ">"
        sb.append(tag).append('>');
        strStart = end + 1;
    }
    // Appending whatever follows the last tag
    sb.append(input1, strStart, input1.length());
    //BE CAREFUL : use new String(sb) instead of sb.toString for large Strings
    return new String(sb);
}
Confess answered 14/4, 2013 at 6:41 Comment(0)
H
2
public static void wipeRootNamespaces(Document xml) {       
    Node root = xml.getDocumentElement();
    NodeList rootchildren = root.getChildNodes();
    Element newroot = xml.createElement(root.getNodeName());

    for (int i=0;i<rootchildren.getLength();i++) {
        newroot.appendChild(rootchildren.item(i).cloneNode(true));
    }

    xml.replaceChild(newroot, root);
}
Hijack answered 14/5, 2013 at 19:42 Comment(1)
line #4 ... root.getLocalName(); (?)Maineetloire
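Note that the method above only rebuilds the root element, and the cloned children keep whatever names they had. A possible way to cover the whole tree, sketched here under the assumption that the document was parsed with setNamespaceAware(true), is the DOM Level 3 method Document.renameNode. The class name DomNamespaceStripper and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class DomNamespaceStripper {

    // Parses namespace-aware so that getLocalName() is populated.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Recursively renames elements and attributes to their local names
    // and drops xmlns declarations.
    static void strip(Document doc, Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // Copy the attributes first: the live map changes while we edit.
            NamedNodeMap map = node.getAttributes();
            List<Attr> attrs = new ArrayList<>();
            for (int i = 0; i < map.getLength(); i++) {
                attrs.add((Attr) map.item(i));
            }
            for (Attr attr : attrs) {
                String name = attr.getName();
                if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                    ((Element) node).removeAttributeNode(attr);
                } else if (attr.getNamespaceURI() != null) {
                    doc.renameNode(attr, null, attr.getLocalName());
                }
            }
            // renameNode may return a replacement node, so keep its result.
            node = doc.renameNode(node, null, node.getLocalName());
        }
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling before recursing, in case child is replaced.
            Node next = child.getNextSibling();
            strip(doc, child);
            child = next;
        }
    }
}
```

A call such as strip(doc, doc.getDocumentElement()) would process the whole document in place. If two attributes on one element share a local name under different prefixes, one of them will be lost, which is one more reason to treat this as a last resort.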
T
1

Instead of using TransformerFactory and then calling transform on it (which was injecting the empty namespace), I wrote the document out as follows:

    OutputStream outputStream = new FileOutputStream(new File(xMLFilePath));
    OutputFormat outputFormat = new OutputFormat(doc, "UTF-8", true);
    outputFormat.setOmitComments(true);
    outputFormat.setLineWidth(0);

    XMLSerializer serializer = new XMLSerializer(outputStream, outputFormat);
    serializer.serialize(doc);
    outputStream.close();
Tamartamara answered 21/1, 2015 at 18:25 Comment(0)
S
0

I also faced the namespace issue and was unable to read the XML file in Java. Below is the solution:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(false); // the important line: turns off namespace processing in the parser
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("XML/"+ fileName);
Sena answered 17/9, 2019 at 5:56 Comment(0)
