From: David G. <dav...@po...> - 2006-01-26 03:39:30
Shawn Kielty wrote:
> I don't think it follows that it is true in an xml document type declaration
> as well. Is the document type declaration used for other purposes such as
> validation?

Hey Shawn,

The XML parser reads it so it knows the charset of the bytes that follow. But, oddly enough, with tdom it gets rather difficult: you can't give [dom parse] the XML declaration, apparently because of the strange way it operates. Instead of externalizing the string before handing it to expat's XML_Parse(), so that the characters you see are the characters it gets (like any other well-behaved extension), tdom subverts the normal case and hands XML_Parse() Tcl's internal utf-8 rep. IOW, you get this:

(Desktop) 211 % [dom parse {<?xml version="1.0" encoding="ISO-8859-1" ?><foo>ä</foo>}] asXML
<foo>Ã¤</foo>

'ä' != 'Ã¤' (its internal utf-8 rep)

Even though 'ä' is a member of ISO-8859-1, and I'm claiming the document to be as such, that is not how tdom works. I guess one is supposed to manually read the declaration for a charset, strip the declaration, convert the bytes with [encoding convertfrom], then call [dom parse]. Whoa.. like that was obvious :(

<bangheadonkeyboard repeat="4" />
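
For what it's worth, the manual workaround described above (read the charset from the declaration, strip the declaration, convert with [encoding convertfrom]) could be sketched roughly like this. The proc name xmlBytesToString and the simple renaming of XML charset names to Tcl encoding names are my own assumptions for illustration, not anything tdom provides, and the input is assumed to be the raw bytes of the document (e.g. read from a channel in binary mode).

```tcl
# Hypothetical helper (not part of tdom): turn raw XML bytes into a Tcl
# string and drop the XML declaration, following the steps described above.
proc xmlBytesToString {bytes} {
    set enc utf-8   ;# XML's default when no encoding is declared

    # The XML declaration, if present, must be the very first thing.
    if {[regexp -indices {^<\?xml[^>]*\?>} $bytes declRange]} {
        set declEnd [lindex $declRange 1]
        set decl [string range $bytes 0 $declEnd]

        # Pull out the declared charset, e.g. ISO-8859-1.
        if {[regexp {encoding\s*=\s*["']([^"']+)["']} $decl -> xmlName]} {
            # Assumption: a simple rename covers the common cases
            # (ISO-8859-1 -> iso8859-1, windows-1252 -> cp1252).
            set enc [string tolower $xmlName]
            regsub {^iso-} $enc iso enc
            regsub {^windows-} $enc cp enc
        }

        # Strip the declaration so the parser never sees the stale charset.
        set bytes [string range $bytes [expr {$declEnd + 1}] end]
    }

    # Decode the remaining bytes into Tcl's internal representation.
    return [encoding convertfrom $enc $bytes]
}
```

The returned string has no declaration left, so it should be safe to hand to [dom parse] as-is. A charset name that Tcl does not know after the rename makes [encoding convertfrom] raise an error, which seems preferable to silently parsing mangled text.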