The DTD we have been using so far has never actually been used to
validate SrcML documents. I have decided that, once we start
validating for real, we might as well drop the DTD in favor of a
proper XML Schema.
Right now you can find a 1:1 translation of the DTD as an XML Schema in
api/srcml.xsd. Note that it includes several corrections to the DTD,
which had not been kept up to date with the rest of the project. The
new XML Schema is also used by the api/tests/SchemaTest.java unit
test, which parses all Java files in the SrcML API module and
validates the results against the schema.
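
For anyone who has not validated against an XML Schema from Java before, here is a minimal sketch of the general technique using the standard javax.xml.validation API. It is not the code in SchemaTest.java. The class name SchemaCheck, the helper isValid and the tiny inline schema are made up for illustration and are not taken from api/srcml.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Hypothetical stand-in schema: a <unit> holding any number of <class>
    // elements. The real api/srcml.xsd is not reproduced here.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='unit'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='class' type='xs:string'"
      + " minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true if xml validates against xsd. A SAXException from either
    // schema compilation or validation is reported as "not valid".
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<unit><class>A</class></unit>")); // true
        System.out.println(isValid(XSD, "<unit><method/></unit>"));        // false
    }
}
```

A unit test along these lines would load the schema once and call the validator for each generated document, failing on the first SAXException.
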
Note that srcml.xsd can still be improved to make up for the
deficiencies we had in the DTD. However, I suggest holding off on that
until we start developing a parser for the next language. It will be
easier and probably less error-prone to adjust the schema to both
parsers/languages at that point.
A side effect of XML Schema support and the need for validation is
a new dependency: Xerces-J. Luckily, its SAXReader also works better
than the dom4j internal SAXReader we have been using so far. This means
we get faster parsing of .srcml files and, as a bonus, we can now also
read the encoding information from the .srcml file.
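
To illustrate what "reading the encoding information" can look like, here is a small sketch that uses the JDK's built-in JAXP parser rather than the dom4j plus Xerces-J setup described above, so it is only an approximation of what the project does. The class EncodingProbe and the method declaredEncoding are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EncodingProbe {
    // Parses the bytes and returns the encoding named in the XML declaration,
    // or null if the declaration does not specify one.
    static String declaredEncoding(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getXmlEncoding();
        } catch (Exception e) {
            // Parser configuration, I/O and parse errors are all fatal here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><unit/>"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(declaredEncoding(sample));
    }
}
```

The parser is handed raw bytes, not a Reader, so it can pick the character set from the XML declaration itself.
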
--
Raiser, Frank
Student @ University of Ulm (www.uni-ulm.de)
When debugging, novices insert corrective code;
experts remove defective code. (Richard Pattis)