Make DocumentBuilder.parse ignore DTD references

When I parse my XML file (variable f) in this method, I get this error:

C:\Documents and Settings\joe\Desktop\aicpcudev\OnlineModule\map.dtd (The system cannot find the path specified)

I know I do not have the dtd, nor do I need it. How can I parse this File object into a Document object while ignoring DTD reference errors?

private static Document getDoc(File f, String docId) throws Exception{
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(f);


    return doc;
}
Masseur answered 30/9, 2008 at 21:8 Comment(0)
G
61

A similar approach to the one suggested by @anjanb

    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (systemId.contains("foo.dtd")) {
                return new InputSource(new StringReader(""));
            } else {
                return null;
            }
        }
    });

I found that simply returning an empty InputSource worked just as well.
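For reference, here is a self-contained sketch of the same idea. The class and method names are made up for illustration, and unlike the snippet above it answers every external entity request with an empty source rather than matching one file name, so treat it as a starting point and not a drop-in fix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EmptyDtdResolverExample {

    // Parses the given XML text; every external entity (including the DTD
    // named in the DOCTYPE) is replaced by an empty stream, so nothing is
    // read from disk or the network.
    static Document parseWithEmptyDtd(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        db.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The DTD path below is hypothetical and does not need to exist.
        String xml = "<!DOCTYPE map SYSTEM \"missing/map.dtd\"><map><topic/></map>";
        Document doc = parseWithEmptyDtd(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Blanking every external entity is only safe when the document does not rely on entities or default attribute values declared in the DTD.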

Groat answered 30/9, 2008 at 22:19 Comment(4)
Setting the features on DocumentBuilderFactory worked for me. The solution in this post did not work. - Tetrafluoroethylene
This also worked perfectly for me, even though I thought I didn't use SAX. - Heck
Sadly this didn't work for me. I still got the error. @jt did it for me though. - Offprint
Thanks for the solution, this is the approach recommended by org.xml I think. It looks like there is a lot of material on this topic. See xerces.apache.org/xml-commons/components/resolver/…, or en.wikipedia.org/wiki/XML_Catalog and the javadoc saxproject.org/apidoc/org/xml/sax/EntityResolver.html and saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html - Pop
C
149

Try setting features on the DocumentBuilderFactory:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

dbf.setValidating(false);
dbf.setNamespaceAware(true);
dbf.setFeature("http://xml.org/sax/features/namespaces", false);
dbf.setFeature("http://xml.org/sax/features/validation", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

DocumentBuilder db = dbf.newDocumentBuilder();
...

Ultimately, I think the options are specific to the parser implementation. Here is some documentation for Xerces2 if that helps.
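Several commenters report that the load-external-dtd feature alone was enough. A minimal sketch of that reduced configuration follows; the class and method names are illustrative, and the feature URI is Xerces-specific, so a different JAXP implementation may reject it with a ParserConfigurationException.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LoadExternalDtdExample {

    // Xerces feature that controls whether a non-validating parser
    // still fetches the external DTD subset.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parseWithoutDtd(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        dbf.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The referenced DTD path is hypothetical and does not exist.
        String xml = "<!DOCTYPE map SYSTEM \"no-such-dir/map.dtd\">"
                + "<map><topic id=\"t1\"/></map>";
        Document doc = parseWithoutDtd(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Leaving out the "http://xml.org/sax/features/namespaces" setting also avoids contradicting setNamespaceAware(true), which is the combination one commenter ran into.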

Comb answered 1/10, 2008 at 1:39 Comment(6)
The last one (load-external-dtd) did the trick for me - thanks. - Urogenous
While trying this, I got a DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces. I fixed this with dbf.setNamespaceAware(true); - Ochoa
Just to let you know, the last feature setting (as stated by @Amarghosh) works great with a SAXParserFactory. - Moskva
For me the load-external-dtd setting was enough. - Flaunch
Using all the above features also makes the code fail. Using just the last two features (nonvalidating) makes my code work. - Tremendous
Per the most upvoted comment, load-external-dtd was the trick. Thanks! I was surprised that a feature with http://apache.org/... in the name fixes Java DTD parsing errors, but happy to have a fix that doesn't involve exceptions or XML modification. In my case, this fixed some intermittent failures when parsing Apple PLIST files, either when apple.com was unreachable or when apple.com couldn't respond to the HTTP request. Much obliged. :) - Razz
E
6

I ran into an issue where the DTD file was in the jar file along with the XML. I solved it based on the examples here, as follows:

DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (systemId.contains("doc.dtd")) {
            InputStream dtdStream = MyClass.class
                    .getResourceAsStream("/my/package/doc.dtd");
            return new InputSource(dtdStream);
        } else {
            return null;
        }
    }
});
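To show why supplying a real DTD can matter, here is a small runnable sketch of the same resolver pattern. It assumes a made-up one-line DTD held in a string (standing in for the file bundled in the jar) and a hypothetical entity named company; the class and method names are also illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class InMemoryDtdExample {

    // Hypothetical DTD content; in the answer above this would come from
    // getResourceAsStream("/my/package/doc.dtd") instead.
    static final String DTD = "<!ENTITY company \"ACME\">";

    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        db.setEntityResolver((publicId, systemId) -> {
            // Serve the local copy for our DTD, defer to the default otherwise.
            if (systemId != null && systemId.endsWith("doc.dtd")) {
                return new InputSource(new StringReader(DTD));
            }
            return null;
        });
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc SYSTEM \"doc.dtd\"><doc>&company;</doc>";
        Document doc = parse(xml);
        // The entity reference is expanded using the DTD served by the resolver.
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Because the resolver hands back actual declarations, entity references in the document still expand, which an empty replacement DTD would not allow.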
Errand answered 28/4, 2011 at 13:34 Comment(0)
O
6

Source XML (With DTD)

<!DOCTYPE MYSERVICE SYSTEM "./MYSERVICE.DTD">
<MYACCSERVICE>
   <REQ_PAYLOAD>
      <ACCOUNT>1234567890</ACCOUNT>
      <BRANCH>001</BRANCH>
      <CURRENCY>USD</CURRENCY>
      <TRANS_REFERENCE>201611100000777</TRANS_REFERENCE>
   </REQ_PAYLOAD>
</MYACCSERVICE>

Java DOM implementation for accepting above XML as String and removing DTD declaration

public Document removeDTDFromXML(String payload) throws Exception {

    System.out.println("### Payload received in XMlDTDRemover: " + payload);

    Document doc = null;
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    try {

        dbf.setValidating(false);
        dbf.setNamespaceAware(true);
        dbf.setFeature("http://xml.org/sax/features/namespaces", false);
        dbf.setFeature("http://xml.org/sax/features/validation", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        DocumentBuilder db = dbf.newDocumentBuilder();

        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(payload));
        doc = db.parse(is); 

    } catch (ParserConfigurationException e) {
        System.out.println("Parse Error: " + e.getMessage());
        return null;
    } catch (SAXException e) {
        System.out.println("SAX Error: " + e.getMessage());
        return null;
    } catch (IOException e) {
        System.out.println("IO Error: " + e.getMessage());
        return null;
    }
    return doc;

}

Destination XML (Without DTD)

<MYACCSERVICE>
   <REQ_PAYLOAD>
      <ACCOUNT>1234567890</ACCOUNT>
      <BRANCH>001</BRANCH>
      <CURRENCY>USD</CURRENCY>
      <TRANS_REFERENCE>201611100000777</TRANS_REFERENCE>
   </REQ_PAYLOAD>
</MYACCSERVICE> 
Obstetrics answered 10/11, 2016 at 14:0 Comment(0)
O
2

I know I do not have the dtd, nor do I need it.

I am suspicious of this statement; does your document contain any entity references? If so, you definitely need the DTD.

Anyway, the usual way of preventing this from happening is using an XML catalog to define a local path for "map.dtd".

Overall answered 30/9, 2008 at 21:13 Comment(0)
P
2

Here's another user who had the same issue: http://forums.sun.com/thread.jspa?threadID=284209&forumID=34

User ddssot on that post says:

myDocumentBuilder.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId)
            throws SAXException, java.io.IOException {
        // publicId is null when the DOCTYPE has only a SYSTEM identifier,
        // so compare from the constant side to avoid a NullPointerException
        if ("--myDTDpublicID--".equals(publicId)) {
            // this deactivates the open office DTD
            return new InputSource(new ByteArrayInputStream(
                    "<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
        }
        return null;
    }
});

The user further mentions "As you can see, when the parser hits the DTD, the entity resolver is called. I recognize my DTD with its specific ID and return an empty XML doc instead of the real DTD, stopping all validation..."

Hope this helps.

Palmar answered 30/9, 2008 at 22:11 Comment(0)
O
0

I'm working with SonarQube, and SonarLint for Eclipse showed me the warning "Untrusted XML should be parsed without resolving external data" (squid:S2755).

I managed to solve it using:

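    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    // Note: with this feature enabled, any document that contains a DOCTYPE is rejected with a fatal error
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);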

    // If you can't completely disable DTDs, then at least do the following:
    // Xerces 1 - http://xerces.apache.org/xerces-j/features.html#external-general-entities
    // Xerces 2 - http://xerces.apache.org/xerces2-j/features.html#external-general-entities
    // JDK7+ - http://xml.org/sax/features/external-general-entities
    factory.setFeature("http://xml.org/sax/features/external-general-entities", false);

    // Xerces 1 - http://xerces.apache.org/xerces-j/features.html#external-parameter-entities
    // Xerces 2 - http://xerces.apache.org/xerces2-j/features.html#external-parameter-entities
    // JDK7+ - http://xml.org/sax/features/external-parameter-entities
    factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

    // Disable external DTDs as well
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

    // and these as well, per Timothy Morgan's 2014 paper: "XML Schema, DTD, and Entity Attacks"
    factory.setXIncludeAware(false);
    factory.setExpandEntityReferences(false);
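One consequence worth spelling out for the original question: disallow-doctype-decl does not ignore the DTD reference, it refuses the whole document. The sketch below illustrates this under the assumption that the Xerces-based parser bundled with the JDK is in use; the class and method names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DisallowDoctypeExample {

    // Builds a factory that rejects any DOCTYPE declaration outright.
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Returns true if the hardened parser accepts the document,
    // false if parsing fails with a SAXException.
    static boolean isAccepted(String xml) throws Exception {
        DocumentBuilder db = hardenedFactory().newDocumentBuilder();
        try {
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isAccepted("<doc/>"));
        System.out.println(isAccepted("<!DOCTYPE doc SYSTEM \"doc.dtd\"><doc/>"));
    }
}
```

If the input must keep its DOCTYPE, the entity and load-external-dtd features shown above are the ones that apply; the default error handler may also print the fatal error to stderr when a document is rejected.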
Obstinate answered 5/2, 2020 at 16:4 Comment(0)
