How to stop a DTD connection
I needed to parse an XML document. The XML document had a DTD. I had no network connection. And every single time I tried to parse it, my program died because it couldn’t connect to the DTD’s URL.
You’d think that this would be easy to turn off, but I could not find a way to prevent it without additional code. None of the Xerces or JAXP features or attributes I tried stopped the XML parser from pulling that DTD in somehow, even with validation turned off altogether and the parser told not to resolve external entities.
Eventually I found the answer in the dom4j FAQ. The crushingly obvious and straightforward “create an anonymous inner class extending EntityResolver, match the public_id and call getResourceAsStream() on the class after packaging the DTD inside the classpath” solution. How could I have possibly missed that?
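import java.io.InputStream;
import org.dom4j.Document;
import org.dom4j.io.SAXReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

EntityResolver resolver = new EntityResolver() {
    public InputSource resolveEntity(String publicId, String systemId) {
        // publicId may be null, so compare from the constant side
        if ( "-//Acme//DTD Foo 1.2//EN".equals( publicId ) ) {
            // leading slash: look up from the classpath root,
            // not relative to this class's package
            InputStream in = getClass().getResourceAsStream(
                "/com/acme/foo.dtd"
            );
            return new InputSource( in );
        }
        // returning null lets the parser open the system identifier itself
        return null;
    }
};
SAXReader reader = new SAXReader();
reader.setEntityResolver( resolver );
// SAXReader exposes read(), which throws DocumentException on failure
Document doc = reader.read( "foo.xml" );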
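If you have no copy of the DTD to package, a rougher variant of the same trick is to hand the parser an empty stream for every external entity. The following is a minimal sketch using plain JAXP rather than dom4j. The class name `OfflineDtdExample`, the helpers `emptyResolver` and `parseOffline`, and the `http://dtd.invalid/foo.dtd` URL are made up for illustration. It assumes the document does not rely on entities or default attribute values declared in the DTD, since those are lost when the DTD is replaced with nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class OfflineDtdExample {

    // Hypothetical helper: answers every external entity request with an
    // empty character stream, so the parser has no reason to open a
    // connection to the DTD's URL.
    static EntityResolver emptyResolver() {
        return new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(""));
            }
        };
    }

    // Hypothetical helper: parses an XML string with the resolver installed.
    static Document parseOffline(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(emptyResolver());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The system identifier points at a host that does not exist.
        String xml =
              "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE foo PUBLIC \"-//Acme//DTD Foo 1.2//EN\" "
            + "\"http://dtd.invalid/foo.dtd\">\n"
            + "<foo>hello</foo>";
        Document doc = parseOffline(xml);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Packaging the real DTD, as in the dom4j version, is still the better choice when the document uses anything the DTD declares. The empty stream is only a way to get a parse through when the DTD contributes nothing you need.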
See also http://doctypechanger.sourceforge.net/ :)
I hit this problem while using Ant’s xslt task and managed to get around it by referencing dummy XML catalog references through the xmlcatalog element. That was extremely useful since I didn’t have the luxury of the code-based change you describe. On a related note, have you taken a look at Norman Walsh’s resolver classes?