Thread: [Sax-devel] Implementing SAX error ids


I have been looking into the SAX error ids and what they mean for AElfred. I
am frightened, although I don't know that I need to be. Are there plans to
implement this new SAX extension? My initial thought is to have some sort of
currParseID in the XmlParser that keeps track of what (most specifically) is
being checked. There are a couple of areas of the code that already have
similar behavior (for the express purpose of error reporting).

So for example, in James Clark's xmltest not-wf-sa-001 the check is for the
attribute production (rule [41]). The specific part that fails is the check
for the name (rule [5]): the first character is a "?", which fails the test
for Letter (rule [84]) because it is neither a valid BaseChar (rule [85])
nor an Ideographic character (rule [86]). According to the new SAX docs, the
goal is to identify the most specific rule. I am not sure if this is rule
[86] or rule [41]. It must mean [86], right? It is the most specific, and as
the SAX docs mention, if we went in the other direction, [1] (the document
production) would come before [41]. In AElfred we could add the following to
parseAttribute:

    private void parseAttribute (String name)
    throws Exception
    {
        // Store the current ID so it can be restored on success.
        String saveCurrParseID = currParseID;
        String aname;
        String type;
        String value;
        int flags = LIT_ATTRIBUTE | LIT_ENTITY_REF;

        // Read the attribute name.
        currParseID = "http://xml.org/sax/exception/xml/rule-41";
        aname = readNmtoken (true);
        type = getAttributeType (name, aname);

        // Parse '='
        parseEq ();

        // Read the value, normalizing whitespace
        // unless it is CDATA.
        currParseID = "http://xml.org/sax/exception/xml/rule-10";
        if (type == null || "CDATA".equals (type)) {
            value = readLiteral (flags);
        } else {
            value = readLiteral (flags | LIT_NORMALIZE);
        }

        // WFC: no duplicate attributes
        currParseID = "http://xml.org/sax/exception/xml/wfc-UniqueAttSpec";
        for (int i = 0; i < tagAttributePos; i++) {
            if (aname.equals (tagAttributes [i])) {
                error ("duplicate attribute", aname, null);
            }
        }

        // Restore the previous ID. This is only reached when no error was
        // raised, so a failure leaves currParseID at the failing check.
        currParseID = saveCurrParseID;
    }

As you can see, knowing when to switch IDs is vague at best. In essence, all
of the above is covered by rule [41]. But the way AElfred (and probably
every other parser) is built, rules [85] and [86] are the most "specific".
Following the above logic, saving to saveCurrParseID and setting currParseID
would make sense in each function (e.g. in readNmtoken setting it to rule
[5] and in parseEq setting it to rule [25]).
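
To make the per-function idea concrete, here is a minimal, self-contained
sketch. It is not AElfred code: the class ParseIdSketch, its readName and
error helpers, and the ASCII-only letter test are all assumptions made for
illustration. Only the currParseID idea and the rule URIs come from the
discussion above.

```java
// Hypothetical miniature parser, NOT AElfred's real API. It only checks
// "Name Eq" and uses a simplified ASCII test instead of the Letter production.
class ParseIdSketch {
    static final String RULE = "http://xml.org/sax/exception/xml/rule-";

    private final String input;
    private int pos = 0;

    // ID of the production currently being checked.
    String currParseID = RULE + "1";

    ParseIdSketch(String input) {
        this.input = input;
    }

    // Attribute (rule [41]); the attribute value is left out of this sketch.
    void parseAttribute() {
        String saved = currParseID;
        currParseID = RULE + "41";
        readName();
        parseEq();
        currParseID = saved; // only reached when nothing failed
    }

    // Name (rule [5]), simplified to a run of ASCII letters.
    void readName() {
        String saved = currParseID;
        currParseID = RULE + "5";
        if (pos >= input.length() || !isAsciiLetter(input.charAt(pos))) {
            error("expected a name start character");
        }
        while (pos < input.length() && isAsciiLetter(input.charAt(pos))) {
            pos++;
        }
        currParseID = saved;
    }

    // Eq (rule [25]), simplified to a bare '=' with no surrounding whitespace.
    void parseEq() {
        String saved = currParseID;
        currParseID = RULE + "25";
        if (pos >= input.length() || input.charAt(pos) != '=') {
            error("expected '='");
        }
        pos++;
        currParseID = saved;
    }

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // The ID is not restored on failure, so it still names the failing check.
    void error(String message) {
        throw new IllegalStateException(message + " [" + currParseID + "]");
    }
}
```

With this arrangement, parsing "?x=" fails inside readName and the reported
ID is rule-5, while "a?" fails inside parseEq and reports rule-25. The
enclosing rule-41 is lost in both cases, which is the limitation that
motivates the stack idea.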

So perhaps a better solution is some notion of an ID stack. As new IDs are
encountered they are pushed onto the stack, and they are popped off as they
are left. In AElfred this is easy; in other parsers I don't know. But again,
I will bring up the idea of wanting the full stack information at the point
of failure (of course this would be optional...):

http://xml.org/sax/exception/xml/rule-41
http://xml.org/sax/exception/xml/rule-5
http://xml.org/sax/exception/xml/rule-84
http://xml.org/sax/exception/xml/rule-85
http://xml.org/sax/exception/xml/rule-86

could be turned into something (a little) more friendly:

    "parsing an attribute (41) the name of the attribute (5) first
    character (84) not a base character or ideographic character (85,86)"
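
A rough sketch of such an ID stack follows. The class name ParseIdStack and
its methods (enter, leave, mostSpecific, snapshot, describe) are hypothetical
and not part of SAX or AElfred; it assumes a Java version with generics and
java.util.ArrayDeque.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: tracks the chain of productions being checked.
class ParseIdStack {
    private final Deque<String> ids = new ArrayDeque<>();

    // Call when the parser starts checking a production.
    void enter(String id) {
        ids.addLast(id);
    }

    // Call when the production has been checked successfully.
    void leave() {
        ids.removeLast();
    }

    // The innermost (most specific) ID, or null if the stack is empty.
    String mostSpecific() {
        return ids.peekLast();
    }

    // Copy of the whole chain, outermost production first.
    List<String> snapshot() {
        return new ArrayList<>(ids);
    }

    // One-line rendering of the chain, e.g. for an optional error detail.
    String describe() {
        return String.join(" > ", ids);
    }
}
```

A parser would call enter at the top of each parse function and leave just
before returning normally. If the error path skips leave, the stack still
holds the full chain at the point of failure: mostSpecific gives the single
ID to report, and snapshot gives the optional full path.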

Of course, the real stack would probably be much larger, leading all the way
back to [1] (as mentioned in the SAX documentation). I should note that in
James Clark's tests he does not always list the most specific rule but
rather the most "helpful" one. In not-wf-sa-001, as mentioned above, he
cites [41]. Should there be a change to the xmltest references?

Thanks,

Jeff Rafter
Defined Systems
http://www.defined.net
XML Development and Developer Web Hosting
