#8 Parsing of successive empty parameter entities

open
None
5
2004-09-01
2004-08-28
No

If an element's content model contains two or more
parameter entity references that resolve to empty strings,
each followed by white space, the parser incorrectly
treats the next non-space character it finds (here the
'a' of 'accessrestrict') as a token separator and reports
"Bad separator in content model: a" (XpParser.pas line 2177).

Example DTD (eadfrag.dtd):

<!-- EAD fragment to demonstrate XMLPartner bug with
parameter entities -->

<!ENTITY % m.desc.base.dep
''

>
<!ENTITY % m.organization.dep
''

>

<!ELEMENT ead
((%m.desc.base.dep; %m.organization.dep;
accessrestrict)*)

>

<!ATTLIST ead
id ID #IMPLIED
>

Example document:

<?xml version='1.0'?>
<!DOCTYPE ead PUBLIC '+//ISBN 1-931666-00-8//DTD
ead.dtd (Encoded Archival
Description (EAD) Version 2002)//EN' 'eadfrag.dtd'>
<ead id="test1"/>

As you'll gather from the test data, I actually hit
this bug while trying to parse an EAD (Encoded Archival
Description) document.
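
For reference, here is a minimal sketch of what the content model above should reduce to once the two empty parameter entities are replaced. It is written in Python purely as an illustration and is not XMLPartner code; the names `expand`, `tokenize`, `ENTITIES` and `CONTENT_MODEL` are hypothetical. The tokenizer is simplified and does not handle `#PCDATA`.

```python
import re

# Replacement text of the two parameter entities declared in eadfrag.dtd.
ENTITIES = {"m.desc.base.dep": "", "m.organization.dep": ""}

# Content model of <!ELEMENT ead ...>, including the original line break.
CONTENT_MODEL = "((%m.desc.base.dep; %m.organization.dep;\naccessrestrict)*)"

def expand(model, entities):
    # In a DTD, the replacement text of a parameter entity is included with
    # one space added on each side (XML 1.0, section 4.4.8).
    return re.sub(r"%([^;\s]+);",
                  lambda m: " " + entities[m.group(1)] + " ",
                  model)

def tokenize(model):
    # White space between tokens is skipped; names and punctuation are tokens.
    return re.findall(r"[A-Za-z_:][\w.:-]*|[()|,*+?]", model)

print(tokenize(expand(CONTENT_MODEL, ENTITIES)))
# ['(', '(', 'accessrestrict', ')', '*', ')']
```

In other words, after replacement the declaration is equivalent to `((accessrestrict)*)`, so the 'a' should be read as the start of a name and never tested as a separator.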

Discussion

  • Richard Light

    Richard Light - 2004-08-28

    The one-level case I posted can be dealt with by changing
    lines 2138-9 to:

    if TryRead(Xpc_ParamEntity) then begin
      ParseParameterEntityRef(True, False);  {!!.51}
      SkipWhiteSpace(True);                  {!!.59 rbl}
    end;

    However, adding another level of parameter entity references:

    <!-- EAD fragment to demonstrate XMLPartner bug with
    parameter entities -->

    <!ENTITY % m.desc.base.dep
    ''

    >
    <!ENTITY % m.organization.dep
    ''

    >

    <!ENTITY % m.desc.base
    '%m.desc.base.dep; %m.organization.dep; accessrestrict'

    >

    <!ENTITY % m.desc.full
    '%m.desc.base; | dsc'

    >

    <!ELEMENT ead
    ((%m.desc.full;)*)

    >

    <!ATTLIST ead
    id ID #IMPLIED
    >

    causes the bug to reappear. The fundamental problem, it
    seems to me, is that parameter entities are resolved too
    late: they should be expanded at a lower level in the
    parsing process, before the parser starts trying to read
    tokens at all. What does anyone else think?
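
A minimal sketch of the "resolve first, tokenize later" idea, applied to the two-level DTD above. It is in Python for illustration only and says nothing about how XpParser.pas is actually structured; `expand_pe`, `PE_REF` and the `entities` table are hypothetical names. As a simplification it pads every replacement with spaces, whereas XML 1.0 does this only for references outside entity values; the resulting tokens are the same for this example.

```python
import re

PE_REF = re.compile(r"%([^;\s%]+);")

def expand_pe(text, entities, active=()):
    """Replace every %name; in text, recursing into the replacement text."""
    def replace(match):
        name = match.group(1)
        if name in active:
            # A parameter entity must not refer to itself, even indirectly.
            raise ValueError("recursive parameter entity: " + name)
        if name not in entities:
            raise KeyError("undeclared parameter entity: " + name)
        inner = expand_pe(entities[name], entities, active + (name,))
        # Simplification: always pad with one space on each side.
        return " " + inner + " "
    return PE_REF.sub(replace, text)

# Parameter entities from the two-level example DTD.
entities = {
    "m.desc.base.dep": "",
    "m.organization.dep": "",
    "m.desc.base": "%m.desc.base.dep; %m.organization.dep; accessrestrict",
    "m.desc.full": "%m.desc.base; | dsc",
}

expanded = expand_pe("((%m.desc.full;)*)", entities)
print(" ".join(expanded.split()))
# (( accessrestrict | dsc )*)
```

Because every reference is gone before any token is read, the tokenizer only ever sees names, separators and white space, however deeply the entities are nested.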

     
  • Richard Light

    Richard Light - 2004-09-01
    • assigned_to: nobody --> richardlight
     
