Re: [ssax-sxml] XML parsing bug?
From: <ol...@ok...> - 2011-08-27 03:30:24
Hello!

> <p><b>so,</b> <b>so</b> and <b>so</b></p>
> I get
> (*TOP* (p
> (b "so,")
> (b "so")
> " and "
> (b "so")))
> The space between the first "b"-tags is lost.

SSAX is generally a framework for XML parsing. The SSAX library
includes useful instances of the framework, such as the function
ssax:xml->sxml, which you must be using. That function invokes the
function ssax:reverse-collect-str-drop-ws (see line 2771 of SSAX.scm),
whose comments describe it as follows:

    ; ssax:reverse-collect-str-drop-ws LIST-OF-FRAGS -> LIST-OF-FRAGS
    ; given the list of fragments (some of which are text strings)
    ; reverse the list and concatenate adjacent text strings.
    ; We also drop "unsignificant" whitespace, that is, whitespace
    ; in front, behind and between elements. The whitespace that
    ; is included in character data is not affected.
    ; We use this procedure to "intelligently" drop "insignificant"
    ; whitespace in the parsed SXML. If the strict compliance with
    ; the XML Recommendation regarding the whitespace is desired, please
    ; use the ssax:reverse-collect-str procedure instead.

So, you may wish to re-define ssax:xml->sxml in your project so that it
calls the ssax:reverse-collect-str function instead. Or, add the
definition

    (define ssax:reverse-collect-str-drop-ws ssax:reverse-collect-str)

(The latter may work on some Scheme systems but not on others; it
depends on how top-level `define' is implemented. On Gambit, it also
depends on the `block' option/declaration.)

To recap: the observed behavior is intentional. There is a way to
change it.

Cheers,
Oleg
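
To make the difference between the two collectors concrete, here is a minimal, self-contained sketch. The names all-whitespace?, flush and collect-fragments are hypothetical stand-ins, not the definitions in SSAX.scm. Unlike the real procedures, the sketch takes the fragments in document order (SSAX accumulates them reversed) and simply treats any whitespace-only run of text between elements as droppable.

```scheme
;; Illustrative only: simplified stand-ins, NOT the SSAX.scm code.

;; #t if every character of s is whitespace (the empty string counts).
(define (all-whitespace? s)
  (let loop ((i 0))
    (cond ((= i (string-length s)) #t)
          ((char-whitespace? (string-ref s i)) (loop (+ i 1)))
          (else #f))))

;; Join the pending text pieces (held in reverse order) into one string
;; and push it onto result, unless drop-ws? is set and it is all whitespace.
(define (flush pending result drop-ws?)
  (if (null? pending)
      result
      (let ((s (apply string-append (reverse pending))))
        (if (and drop-ws? (all-whitespace? s))
            result
            (cons s result)))))

;; Concatenate adjacent strings in frags; optionally drop
;; whitespace-only text runs.
(define (collect-fragments frags drop-ws?)
  (let loop ((frags frags) (pending '()) (result '()))
    (cond ((null? frags)
           (reverse (flush pending result drop-ws?)))
          ((string? (car frags))
           (loop (cdr frags) (cons (car frags) pending) result))
          (else
           (loop (cdr frags)
                 '()
                 (cons (car frags) (flush pending result drop-ws?)))))))

;; The children of <p> from the example, as the parser sees them.
(define children '((b "so,") " " (b "so") " and " (b "so")))

(collect-fragments children #t)
;; => ((b "so,") (b "so") " and " (b "so"))      ; the lone " " is dropped
(collect-fragments children #f)
;; => ((b "so,") " " (b "so") " and " (b "so"))  ; the lone " " is kept
```

The " and " fragment survives in both modes because it contains non-whitespace characters; only the fragment consisting of a single space is affected.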