I was just writing this:

One can prevent DTD validation with DocumentBuilderFactory.setValidating(false);
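
A minimal sketch of the factory approach, for illustration only. Note that a non-validating parser may still fetch the external DTD (for entity definitions and attribute defaults), so the sketch also switches off the "load-external-dtd" feature. That feature URI is Xerces-specific; it is assumed here that the JDK's bundled parser is in use, and other implementations may reject it. The class name NonValidatingDomExample is made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class NonValidatingDomExample {

    // Xerces-specific feature (assumption: the JDK's bundled parser is used).
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses the XML without validating and without loading the external DTD. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);                 // no DTD validation
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // do not fetch the DTD at all
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\" "
            + "\"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n"
            + "<svg/>";
        // Prints the name of the root element of the parsed document.
        System.out.println(parse(xml).getDocumentElement().getNodeName());
    }
}
```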

Alternatively, the DTD could be added as a local resource, with a custom entity resolver set via parser.setEntityResolver(resolver) (as in Egon's email). The resolver would return an InputSource wrapping an InputStream to the local DTD.
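
A rough sketch of the local-resource approach, under a few assumptions: the DTD text is inlined as a string so the example runs on its own (in real code it would more likely come from something like getResourceAsStream), and the names LocalDtdResolverExample, LOCAL_DTD and note.dtd are invented for illustration. The DTD declares a default attribute so one can see that the local copy was actually read.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolverExample {

    // Stand-in for a DTD shipped as a local resource (hypothetical content).
    static final String LOCAL_DTD =
        "<!ELEMENT note (#PCDATA)>\n"
        + "<!ATTLIST note lang CDATA \"en\">";

    // Serves the local copy for any system id ending in "note.dtd".
    static final EntityResolver LOCAL_RESOLVER = (publicId, systemId) -> {
        if (systemId != null && systemId.endsWith("note.dtd")) {
            InputSource source = new InputSource(new StringReader(LOCAL_DTD));
            source.setSystemId(systemId);
            return source;
        }
        return null; // anything else falls back to the parser's default lookup
    };

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(LOCAL_RESOLVER);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE note SYSTEM "
            + "\"http://example.invalid/dtd/note.dtd\"><note>hi</note>";
        // The default value comes from the local DTD, not from the network.
        System.out.println(parse(xml).getDocumentElement().getAttribute("lang"));
    }
}
```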

This would work for tests as well as in production; relying on a remote DTD is not really recommended in either case.


On 6 November 2012 11:18, Egon Willighagen <egon.willighagen@gmail.com> wrote:
Ralf, John,

On Tue, Nov 6, 2012 at 9:56 AM,  <ralf@ark.in-berlin.de> wrote:
> First, the unit test probably needs its copy of the SVG DTD
> somewhere available, I have no idea where. If not, it will
> load the DTD every time from the W3C.

The trick here is to disable the validation... when you have your SAX
parser, you can do:

parser.setFeature("http://xml.org/sax/features/validation", false);
parser.setEntityResolver(new CMLResolver());

The first line turns off the validation, but then, depending on the
parser implementation, it may still try to resolve remote entities...
that is where the resolver comes in. The default resolver goes online
and tries to download the DTD or XML Schema. You can override the entity
resolver and use local copies, or just return an empty InputSource
(returning null hands the lookup back to the parser's default
behaviour), which may be more than enough for the unit test...
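
For a unit test, a resolver that never goes online could look roughly like the sketch below. It is only an illustration: the class name OfflineSaxExample and the helper rootName are made up, and the resolver answers every lookup with an empty InputSource rather than null, since null asks the parser to open the system identifier itself.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class OfflineSaxExample {

    /** Parses the XML without network access and returns the root element name. */
    static String rootName(String xml) throws Exception {
        XMLReader parser =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setFeature("http://xml.org/sax/features/validation", false);
        // Every external entity (including the DTD) resolves to empty content.
        parser.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));

        final String[] root = new String[1];
        parser.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first element seen
                }
            }
        });
        parser.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\" "
            + "\"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\"><svg/>";
        System.out.println(rootName(xml));
    }
}
```

The trade-off is that entities and attribute defaults declared in the real DTD are not available, which is usually acceptable in a test but may not be in production.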

Here are the methods from the CMLResolver impl:

    /**
     * Not implemented, but uses resolveEntity(String publicId, String systemId)
     * instead.
     */
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        return resolveEntity(publicId, systemId);
    }

    /**
     * Resolves SYSTEM and PUBLIC identifiers for CML DTDs.
     *
     * @param publicId the PUBLIC identifier of the DTD (unused)
     * @param systemId the SYSTEM identifier of the DTD
     * @return the CML DTD as an InputSource or null if id's unresolvable
     */
    public InputSource resolveEntity (String publicId, String systemId) {
        logger.debug("CMLResolver: resolving ", publicId, ", ", systemId);
        systemId = systemId.toLowerCase();
        if ((systemId.indexOf("cml-1999-05-15.dtd") != -1) ||
            (systemId.indexOf("cml.dtd") != -1) ||
            (systemId.indexOf("cml1_0.dtd") != -1)) {
            logger.info("File has CML 1.0 DTD");
            return getCMLType( "cml1_0.dtd" );
        } else if ((systemId.indexOf("cml-2001-04-06.dtd") != -1) ||
                   (systemId.indexOf("cml1_0_1.dtd") != -1) ||
                   (systemId.indexOf("cml_1_0_1.dtd") != -1)) {
            logger.info("File has CML 1.0.1 DTD");
            return getCMLType( "cml1_0_1.dtd" );
        } else {
            logger.warn("Could not resolve systemID: ", systemId);
            return null;
        }
    }

The CDK ships a few DTDs, as you can see here...


Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers

Cdk-devel mailing list