Thread: [Sax-devel] Implementing SAX error ids
From: Jeff R. <jef...@de...> - 2002-07-08 05:47:52
I have been looking into the SAX error ids and what they mean for AElfred. I am frightened, although I don't know that I need to be. I am wondering if there are plans to implement this new SAX extension.

My initial thought is to have some sort of currParseID in the XmlParser that keeps track of what (most specifically) is being checked. There are a couple of areas of the code that already have similar behavior (for the express purpose of error reporting).

So for example, in James Clark's xmltest not-wf-sa-001 the check is for the Attribute production (rule [41]). The specific part that fails is the check for the name (rule [5]), because the first character is a "?", which fails the test for Letter (rule [84]) because it is neither a valid BaseChar (rule [85]) nor an Ideographic character (rule [86]).

According to the new SAX docs, the goal is to identify the most specific rule. I am not sure if this is rule [86] or rule [41]. It must mean [86], right? It is the most specific, and as the SAX docs mention, if we went the other direction, [1] (the document production) would come before [41].

In AElfred we could add the following to parseAttribute:

    private void parseAttribute (String name)
    throws Exception
    {
        // store current
        String saveCurrParseID = currParseID;

        String aname;
        String type;
        String value;
        int flags = LIT_ATTRIBUTE | LIT_ENTITY_REF;

        // Read the attribute name.
        currParseID = "http://xml.org/sax/exception/xml/rule-41";
        aname = readNmtoken (true);
        type = getAttributeType (name, aname);

        // Parse '='
        parseEq ();

        // Read the value, normalizing whitespace
        // unless it is CDATA.
        currParseID = "http://xml.org/sax/exception/xml/rule-10";
        if (type == "CDATA" || type == null) {
            value = readLiteral (flags);
        } else {
            value = readLiteral (flags | LIT_NORMALIZE);
        }

        currParseID = "http://xml.org/sax/exception/xml/wfc-UniqueAttSpec";
        // WFC: no duplicate attributes
        for (int i = 0; i < tagAttributePos; i++)
            if (aname.equals (tagAttributes [i]))
                error ("duplicate attribute", aname, null);

        // restore to previous
        currParseID = saveCurrParseID;
    }

As you can see, knowing when to switch is vague at best. In essence, all of the above is covered by rule [41]. But the way AElfred (and probably every other parser) is built, rules [85] and [86] are the most "specific". Following the above logic, saving into saveCurrParseID and setting currParseID would make sense in each function (e.g. in readNmtoken setting it to rule [5] and in parseEq setting it to rule [25]).

So perhaps a better solution is some notion of an ID stack. As new IDs are encountered they are pushed onto the stack, and they are popped off as they are left. In AElfred this is easy; in other parsers I don't know. But again, I will bring up the idea of wanting the full stack information at the point of failure (of course this would be optional...):

    http://xml.org/sax/exception/xml/rule-41
    http://xml.org/sax/exception/xml/rule-5
    http://xml.org/sax/exception/xml/rule-84
    http://xml.org/sax/exception/xml/rule-85
    http://xml.org/sax/exception/xml/rule-86

could be turned into something (a little) more friendly:

    "parsing an attribute (41)
     the name of the attribute (5)
     first character (84)
     not a base character or ideographic character (85,86)"

Of course, the stack would then probably be much larger, leading all the way back to [1] (as mentioned in the SAX documentation).

I should note that in James Clark's tests he does not always list the most specific rule but rather the most "helpful" one. In not-wf-sa-001, as mentioned above, he cites [41]. Should there be a change to the xmltest references?
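
To make the ID stack idea above a bit more concrete, here is a rough sketch in Java. It is only an illustration: ParseIdStack and its method names are hypothetical and are not part of AElfred or SAX, and it uses generics and java.util.ArrayDeque from current Java for brevity. The ID strings follow the rule-NN form used in the example above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper (not part of AElfred or SAX): tracks nested rule IDs. */
final class ParseIdStack {
    // Prefix taken from the IDs quoted in this message.
    static final String RULE_PREFIX = "http://xml.org/sax/exception/xml/rule-";

    // Head of the deque is the innermost (most recently entered) rule.
    private final Deque<String> ids = new ArrayDeque<>();

    /** Call when the parser starts checking a production or constraint. */
    void push(String id) {
        ids.push(id);
    }

    /** Call when the parser is done with that production or constraint. */
    void pop() {
        ids.pop();
    }

    /** The most specific ID at this point, or null if nothing is active. */
    String mostSpecific() {
        return ids.peek();
    }

    /** Copy of the full stack, outermost rule first, for optional reporting. */
    List<String> snapshot() {
        List<String> out = new ArrayList<>();
        Iterator<String> it = ids.descendingIterator();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        ParseIdStack stack = new ParseIdStack();
        stack.push(RULE_PREFIX + "41"); // entering the attribute production
        try {
            stack.push(RULE_PREFIX + "5"); // entering the name check
            try {
                // At the point of failure, report both views.
                System.out.println("most specific: " + stack.mostSpecific());
                System.out.println("full stack: " + stack.snapshot());
            } finally {
                stack.pop(); // leaving the name check
            }
        } finally {
            stack.pop(); // leaving the attribute production
        }
    }
}
```

Pairing each push with a pop in a finally block takes the place of the manual save/restore of currParseID, and the snapshot gives the optional full stack at the point of failure.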

Thanks,
Jeff Rafter
Defined Systems
http://www.defined.net
XML Development and Developer Web Hosting