> <p><b>so,</b> <b>so</b> and <b>so</b></p>
> I get
> (*TOP* (p
> (b "so,")
> (b "so")
> " and "
> (b "so")))
> The space between the first "b"-tags is lost.
SSAX is generally a framework for XML parsing. The SSAX library
includes useful instances of the framework, such as the function
ssax:xml->sxml, which you must be using. That function invokes the
procedure ssax:reverse-collect-str-drop-ws (see line 2771 of SSAX.scm),
whose comments describe it:
; ssax:reverse-collect-str-drop-ws LIST-OF-FRAGS -> LIST-OF-FRAGS
; given the list of fragments (some of which are text strings)
; reverse the list and concatenate adjacent text strings.
; We also drop "unsignificant" whitespace, that is, whitespace
; in front, behind and between elements. The whitespace that
; is included in character data is not affected.
; We use this procedure to "intelligently" drop "insignificant"
; whitespace in the parsed SXML. If the strict compliance with
; the XML Recommendation regarding the whitespace is desired, please
; use the ssax:reverse-collect-str procedure instead.
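
The behavior described in that comment can be illustrated with a small
self-contained sketch. This is not the SSAX source: the names
whitespace-only? and collect-frags are hypothetical, the drop-ws? flag
stands in for the choice between the two SSAX procedures, and the logic
is a simplification that assumes the fragments arrive in reverse
document order, as the comment says.

```scheme
;; Illustrative sketch only, NOT the code in SSAX.scm.
;; whitespace-only? and collect-frags are hypothetical names.

;; True if every character of the string s is whitespace ("" included).
(define (whitespace-only? s)
  (let loop ((cs (string->list s)))
    (or (null? cs)
        (and (char-whitespace? (car cs))
             (loop (cdr cs))))))

;; frags: fragments (strings or elements) in REVERSE document order.
;; Returns them in document order with adjacent strings concatenated.
;; If drop-ws? is true, a concatenated string made up only of
;; whitespace is discarded; text containing other characters is kept
;; unchanged.
(define (collect-frags frags drop-ws?)
  ;; Join the pending run of strings (already in document order)
  ;; and push it onto result, unless it is to be dropped.
  (define (flush strs result)
    (if (null? strs)
        result
        (let ((s (apply string-append strs)))
          (if (and drop-ws? (whitespace-only? s))
              result
              (cons s result)))))
  (let loop ((frags frags) (strs '()) (result '()))
    (cond
      ((null? frags) (flush strs result))
      ((string? (car frags))
       (loop (cdr frags) (cons (car frags) strs) result))
      (else
       (loop (cdr frags) '() (cons (car frags) (flush strs result)))))))

;; The content of the <p> element from the question, reversed the way
;; a parser would accumulate it.
(define example-frags
  (reverse '((b "so,") " " (b "so") " and " (b "so"))))

(collect-frags example-frags #t) ; the " " between the first two b's is dropped
(collect-frags example-frags #f) ; the " " is preserved
```

With drop-ws? true, the sketch reproduces the result reported in the
question; with it false, the lone space between the first two b
elements survives.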
So, you may wish to re-define ssax:xml->sxml in your project to call
the ssax:reverse-collect-str function instead. Or, add the definition
(define ssax:reverse-collect-str-drop-ws ssax:reverse-collect-str)
(The latter may work on some Scheme systems but not on others; it
depends on how top-level `define' is implemented. On Gambit, it also
depends on the `block' option/declaration.)
To recap: the observed behavior is intentional. There is a way to
change it.
From: Andy Wingo <wingo@po...> - 2011-08-27 12:12:43
On Sat 27 Aug 2011 05:18, oleg@... writes:
> ; ssax:reverse-collect-str-drop-ws LIST-OF-FRAGS -> LIST-OF-FRAGS
> To recap: the observed behavior is intentional. There is a way to
> change it.
FWIW this seemed reasonable to me when I first started with SSAX, but it
surprised me later on in another context where I needed the whitespace.
By that time I had forgotten about this procedure and so spent some time
in figuring out what went wrong.
I would suggest that packagers of SSAX provide this whitespace-trimming
behavior as an option that is easy to enable, and furthermore that it be
off by default. That's the only way to preserve all information in the