[ssax-sxml] Portability improvements in SSAX and related libraries
Brought to you by:
oleg
From: <ol...@po...> - 2003-04-10 00:27:00
Hello!

This is a notice of changes to SSAX.scm and related library code in the SSAX CVS repository. The changes improve portability and better align the code with SRFIs (SRFI-13, to be precise). Compliance with the XML Recommendation with respect to the handling of whitespace has been checked. The changes may also result in a small performance improvement.

The previous version of SSAX included a few non-standard named character references, #\tab and #\return. Although many Scheme systems support such an extension of R5RS, some important systems (such as Scheme48 and SCSH) do not. Therefore, when SSAX was ported to SCSH, the lines of SSAX code that contained #\return were commented out. The resulting code was therefore out of compliance with the XML Recommendation in that (admittedly insignificant) respect.

Furthermore, the previous version of SSAX.scm included a great many \r, \n, and \t sequences in character strings. These sequences occurred only in the test code. Again, although such escape sequences are supported by many Scheme systems, they are not standard.

The code of the current version of SSAX has been reworked so as to remove all instances of non-standard character references and string escape sequences. There are no longer any occurrences of #\return, #\tab, \n, \r, or \t. This change should make SSAX more portable.

I've had a vague feeling that SSAX doesn't precisely follow the XML Recommendation in its handling of Carriage Return (CR) characters. Re-reading the Recommendation and examining the code showed that this was not the case. Nevertheless, a few new tests were added to help convince me that CRs are indeed processed correctly.

The procedures ssax:reverse-collect-str and ssax:reverse-collect-str-drop-ws turned out to be quite useful. Therefore, in the new version of SSAX, they have been moved to the top level. Previously, they were internal to ssax:xml->sxml.
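
To illustrate the character-reference issue discussed above, here is a minimal sketch of one portable way to obtain TAB, LF and CR without #\tab, #\return or string escapes. The names tab-char, lf-char, cr-char and crlf-line are hypothetical and are not taken from SSAX. The sketch assumes an ASCII-compatible encoding, which R5RS itself does not guarantee.

```scheme
; Hypothetical names, for illustration only.
; Assumes integer->char maps 9, 10 and 13 to TAB, LF and CR,
; which holds for ASCII-compatible encodings but is not promised by R5RS.
(define tab-char (integer->char 9))
(define lf-char  (integer->char 10))
(define cr-char  (integer->char 13))

; Build a test string containing CR LF without writing "\r\n".
(define crlf-line
  (string-append "abc" (string cr-char lf-char) "def"))

(string-length crlf-line)   ; => 8
```

Because the mapping from integers to characters depends on the platform, definitions of this kind are best kept together in one small file.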
We use ssax:reverse-collect-str-drop-ws to "intelligently" drop "insignificant" whitespace in the parsed SXML. If strict compliance with the XML Recommendation regarding whitespace is desired, please use the procedure ssax:reverse-collect-str instead. A close examination of the code of these procedures has led to a proof of their behavior when applied to a one-element list. The proof justified a fast-path evaluation for that case, which seems to be frequent.

A new file, SSAX/lib/char-encoding.scm, was added to isolate the code that deals with character encoding. Unfortunately, this code must be platform-specific, since R5RS does not say much about the encoding of characters. At present, char-encoding.scm contains the bare minimum of code to make SSAX happy. More code will probably be added in the future. Hopefully, the file will be obsoleted by a future character-encoding SRFI.

The idiom "apply string-append" in SSAX and the library code has been replaced with string-concatenate[-reverse]/shared of SRFI-13. An examination of SSAX/lib/util.scm showed that some code exactly or nearly duplicated SRFI-13. Such code was removed. We strongly prefer SRFI-13 for all relevant string processing.

A new file, SSAX/lib/srfi-13-local.scm, was added with the subset of SRFI-13 that is actually employed in the input parsing library, the XML parser and other code. The file srfi-13-local.scm can therefore be used on those Scheme systems that do not currently offer SRFI-13 natively. A new test case, SSAX/tests/vsrfi-13.scm, was added to validate the required SRFI-13 functions. All the tests pass for srfi-13-local.scm. An implementation that offers a native SRFI-13 may still wish to run vsrfi-13.scm for validation purposes.

The changes in SSAX.scm, besides enhancing portability, ought to slightly improve performance. I need to run the benchmark to know for sure.

The validation tests and SSAX/SXML examples have been updated to account for the changes to SSAX.scm and the library.
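
As a minimal sketch of the "apply string-append" replacement mentioned above, the following compares the older idiom with SRFI-13's string-concatenate-reverse. The variable fragments is a hypothetical example value. The sketch assumes SRFI-13 is available, either natively or via a file such as srfi-13-local.scm.

```scheme
; Hypothetical data: string fragments accumulated in reverse order,
; as happens when a parser conses each new fragment onto a list.
(define fragments '("c" "b" "a"))

; Older idiom: reverse the list, then pass every fragment as a
; separate argument to string-append.
(apply string-append (reverse fragments))   ; => "abc"

; SRFI-13: concatenate the reversed list directly, with no
; intermediate reversed list and no n-ary procedure call.
(string-concatenate-reverse fragments)      ; => "abc"
```

Besides avoiding the extra reversal, the SRFI-13 form does not depend on how many arguments an implementation allows in a single procedure call, which can matter for long fragment lists.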
All validation tests pass on the Gambit-C interpreter, the Bigloo compiler and interpreter, and the SCM interpreter.

I am planning to replace all define-macros in SSAX with syntax-rules. The latter are standard and have become widely supported. A number of high-quality portable macro-expanders (by Dybvig, Hieb and Al Petrofsky) are available. Even Marc Feeley, a devotee of lower-level macros, saw fit to port Dybvig and Hieb's portable high-level macro-expander to Gambit and to mention it on Gambit's web page.

After that, a new release of SSAX should probably be made.

Opinions?