Re: [ssax-sxml] Re: sxml:sxml->xml
From: Shiro K. <sh...@la...> - 2003-10-18 00:17:52
From: MJ Ray <mj...@ds...>
Subject: Re: [ssax-sxml] Re: sxml:sxml->xml
Date: Fri, 17 Oct 2003 01:21:43 +0100

> Speaking of bad characters, I recently ran into the problem with the
> PLT Scheme edition's ssax:xml->sxml hitting a UTF-8 numeric ᤓ or
> similar and barfing. IMO, the real bug is that PLT's integer->char
> doesn't handle anything over 255, but could that sort of entity be
> handled in another way?

Not exactly a bug, since R5RS doesn't precisely define the behavior
of integer->char.

In order to treat the entity reference portably, we need at least
the following API, which each implementation has to customize:

  ucs->char-list :: Integer -> [Char]

    Given an integer UCS-4 code point, returns either
    * a list of characters that encode the Unicode character in
      UTF-8, if the implementation only supports single-byte
      characters,
    * or a list of characters that encode the Unicode character in
      UTF-16, if the implementation supports characters up to two
      bytes wide,
    * or a list of one character, if the implementation supports
      full UCS-4.

On single-byte character implementations, the output happens to work,
since writing out the character sequence obtained by ucs->char-list
produces a valid UTF-8 string. However, to produce a valid
character entity reference, we need some more APIs.

--shiro
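
To make the three cases of ucs->char-list concrete, here is a rough sketch of the arithmetic. It is written in Python only as a model, not as SSAX code: the name `ucs_to_char_list` and the `char_width` parameter are hypothetical stand-ins for the per-implementation customization described above, and the sketch assumes the code point is a Unicode scalar value (at most 0x10FFFF, not a surrogate).

```python
def ucs_to_char_list(codepoint, char_width=1):
    """Model of the proposed ucs->char-list (hypothetical Python rendering).

    char_width is an assumed knob standing in for the implementation's
    character size: 1 = single-byte chars, 2 = two-byte chars,
    4 = full UCS-4. Each list element is one "character" that such an
    implementation could hold.
    """
    if not 0 <= codepoint <= 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % codepoint)
    ch = chr(codepoint)
    if char_width == 1:
        # One character per UTF-8 byte.
        return [chr(b) for b in ch.encode("utf-8")]
    if char_width == 2:
        # One character per UTF-16 code unit (a surrogate pair above U+FFFF).
        data = ch.encode("utf-16-be")
        return [chr(int.from_bytes(data[i:i + 2], "big"))
                for i in range(0, len(data), 2)]
    if char_width == 4:
        # The implementation can represent the code point directly.
        return [ch]
    raise ValueError("char_width must be 1, 2 or 4")


if __name__ == "__main__":
    for width in (1, 2, 4):
        units = ucs_to_char_list(0x1913, width)
        print(width, [hex(ord(c)) for c in units])
```

In the single-byte case, writing the returned characters out as bytes gives the UTF-8 encoding of the original code point, which is why the output "happens to work" there.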