#1 New feature for system identifiers

Rob Lugt

This is a request for a new standard feature to be
allocated from the "http://xml.org/sax/features/"
namespace.

Feature: http://xml.org/sax/features/preserve-systemIds

When this feature is set to "true", calls to
DTDHandler::notationDecl and
DTDHandler::unparsedEntityDecl will be changed to
*NOT* turn system identifiers that are relative URLs
into absolute URLs.

Note that, according to the SAX 2.0 spec,
LexicalHandler::startDTD and
DeclHandler::externalEntityDecl already have this
behavior.

There are two prime motivations for this request.
1) This aligns SAX closer with the W3C CR: XML
Information Set [1] which requires the value of system
identifiers to be made available for:
-2.5. Unexpanded Entity Reference Information Items
-2.8. The Document Type Declaration Information Item
-2.9. Unparsed Entity Information Items
-2.10. Notation Information Items

2) The OASIS ERTC [2] are working on an XML Catalog
specification which recommends processors should use
an unmodified system identifier for comparison with
catalog entries. It is not currently possible to
implement this using SAX 2.0.

This proposal does not cater for another important
information item: baseURI, which is also required by
the XML Information Set.

Ideally a new interface should be defined which
provides this, e.g.

resolveEntity(name, publicId, systemId, baseURI).

In the meantime, the baseURI could be made available
as a property.

This will be the subject of another enhancement
request in due course.

Note: this property is not available using
Locator::getSystemId() because the baseURI is
determined by the entity in effect when the "<" of the
entity declaration was read. This is not the value
returned from the Locator.

Rob Lugt
ElCel Technology

[1] http://www.w3.org/TR/xml-infoset/
[2] http://www.oasis-open.org/committees/entity/


    • labels: 339610 --> features and properties
    • assigned_to: nobody --> dmegginson
  • David Brownell

    Attached is a file that checks what various parsers
    do for the specified methods. I checked against
    AElfred2 (current), Crimson (1.1.2beta2), and
    Xerces (1.4.3). (Oracle's parser wasn't handy, I'm
    getting a current copy.)

    Summary: resolveEntity() passes absolute URIs.
    Everywhere else, AElfred2 and Xerces already pass relative
    URIs, while Crimson passes absolutized ones.

    I think it's probably reasonable to expect everything
    except resolveEntity() to pass relative URIs normally.
    Modulo the fact that some parser(s) clearly have bugs
    in that area -- a conformance test issue.

    - Dave

  • David Brownell

    show how parsers handle URIs

  • David Brownell

    OK, the Oracle XDK [beta] parser
    is like Sun's: it absolutizes everything.

    There may indeed need to be a feature flag that
    controls this; managing the installed base will
    be tricky.

  • David Brownell

    XP 0.5 also absolutizes everything. I'll call AElfred's
    behavior a bug (now fixed), ditto Xerces (not fixed).

    I've now documented a new feature flag, 'resolve-dtd-uris'.
    Its default value must be backward compatible ("true"), but
    setting it to "false" causes system IDs in declarations
    (the four methods listed above) to be reported as declared,
    without being absolutized first.
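
A minimal sketch of how an application might toggle this flag and watch the effect on DTDHandler callbacks. The document text, the base URI "http://example.org/docs/doc.xml", and the class and method names are made up for illustration; the sketch assumes the JAXP default parser recognizes the flag (a parser that does not would throw SAXNotRecognizedException from setFeature).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class DtdUriDemo {
    static final String FEATURE = "http://xml.org/sax/features/resolve-dtd-uris";

    // Hypothetical document with two relative system IDs in its internal subset.
    static final String XML =
        "<!DOCTYPE doc [\n"
      + "  <!NOTATION gif SYSTEM \"viewers/gif.exe\">\n"
      + "  <!ENTITY pic SYSTEM \"images/pic.gif\" NDATA gif>\n"
      + "]>\n"
      + "<doc/>";

    // Parses XML with the flag set as requested and returns the system IDs
    // passed to notationDecl and unparsedEntityDecl, in callback order.
    static List<String> reportedSystemIds(boolean resolveDtdUris) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(FEATURE, resolveDtdUris);
            reader.setDTDHandler(new DTDHandler() {
                @Override
                public void notationDecl(String name, String publicId,
                                         String systemId) {
                    seen.add(systemId);
                }
                @Override
                public void unparsedEntityDecl(String name, String publicId,
                                               String systemId,
                                               String notationName) {
                    seen.add(systemId);
                }
            });
            InputSource in = new InputSource(new StringReader(XML));
            // Assumed base URI; nothing is fetched from it.
            in.setSystemId("http://example.org/docs/doc.xml");
            reader.parse(in);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("resolve-dtd-uris=true:  " + reportedSystemIds(true));
        System.out.println("resolve-dtd-uris=false: " + reportedSystemIds(false));
    }
}
```

With the flag set to false, a conforming parser should hand back the system IDs exactly as written in the declarations, which is the form a catalog lookup would want to compare against.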

    That leaves another half of this RFE, likely to
    be packaged as an "EntityResolver2" interface (TBD).

  • David Brownell

    • assigned_to: dmegginson --> dbrownell
  • David Brownell

    EntityResolver2 is now checked in, and an
    initial implementation is available in
    AElfred2. So I'm marking this as resolved.
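
For readers who want to see the other half of the request in use, here is a rough sketch of a resolver built on org.xml.sax.ext.EntityResolver2 as it ships in the SAX2 extensions. Note that the shipped signature is resolveEntity(name, publicId, baseURI, systemId), so the last two arguments are in the opposite order from the one suggested in the original request. The class name, the sample document, its base URI, and the canned replacement text are all assumptions for illustration, and the sketch assumes the JAXP default parser calls EntityResolver2 when one is registered.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.EntityResolver2;
import org.xml.sax.helpers.DefaultHandler;

class RecordingResolver implements EntityResolver2 {
    // Values seen on the most recent four-argument resolveEntity call.
    String lastName, lastBaseURI, lastSystemId;

    @Override
    public InputSource getExternalSubset(String name, String baseURI) {
        return null; // no synthetic external subset in this sketch
    }

    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        lastName = name;
        lastBaseURI = baseURI;
        lastSystemId = systemId; // as declared, suitable for catalog matching
        // Canned replacement text so that nothing is fetched from the network.
        InputSource src = new InputSource(new StringReader("<p/>"));
        if (baseURI != null && systemId != null) {
            // The resolver absolutizes the identifier itself when it needs to.
            src.setSystemId(URI.create(baseURI).resolve(systemId).toString());
        }
        return src;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        return null; // SAX 2.0 fallback; not expected to be used here
    }

    // Parses a hypothetical document that references one external entity.
    static RecordingResolver run() {
        String xml =
            "<!DOCTYPE doc [\n"
          + "  <!ENTITY chap SYSTEM \"chapters/ch1.xml\">\n"
          + "]>\n"
          + "<doc>&chap;</doc>";
        RecordingResolver resolver = new RecordingResolver();
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            reader.setContentHandler(new DefaultHandler());
            InputSource in = new InputSource(new StringReader(xml));
            in.setSystemId("http://example.org/docs/doc.xml"); // assumed base
            reader.parse(in);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return resolver;
    }

    public static void main(String[] args) {
        RecordingResolver r = run();
        System.out.println("name=" + r.lastName + " baseURI=" + r.lastBaseURI
            + " systemId=" + r.lastSystemId);
    }
}
```

Because the resolver receives both the declared system identifier and the base URI, it can match the former against catalog entries and still absolutize it afterwards if no entry applies.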

  • David Brownell

    • status: open --> closed-fixed