From: Michael B. <bu...@im...> - 2006-10-01 12:53:55
Hi Morten,

> When I try to compile a tei dictionary with the tools module
> from freedict.org, I get the following error:

Did you use the Makefile/code from CVS? That doesn't use
xmltei2xmldict.pl and (o)nsgmls for conversion anymore, only as an
additional XML validator. When using sp for validation, this warning
is printed, but the process succeeds.

> XML::ESISParser::parse: unable to parse freedict-eng-dan.tei'
> nsgmls:/usr/share/xml/declaration/xml.dcl:31:27:W: characters
> in the document character set with numbers exceeding 65535 not
> supported
> make: *** [all] Error 29

XML::ESISParser treats this warning as an error.

> I am not sure what it means, other than perhaps it is a problem
> with nsgmls not dealing with unicode characters?

/usr/share/doc/opensp/NEWS.gz says about 1.5pre4:

  "The multibyte version of OpenSP now uses 32bit chars and supports
  the full UTF-16 range 0x0000-0x10ffff."

So I guess sp/sgmls, of which 1.3.4-1.2.1-47 is currently in unstable,
really does not support Unicode characters >65535. But that should not
be a problem for FreeDict, since those characters are not currently in
use in the dictionaries (at least I'm not aware of any).

> onsgmls has no problems, it seems. Exactly how the whole process
> works is a little murky to me. I do not know what to correct to
> make use of onsgmls instead of nsgmls, so I have made
> /usr/bin/nsgmls a symlink to /usr/bin/onsgmls, which is of course
> horrible.

I found that commands like this work:

    SGML_CATALOG_FILES="/usr/share/xml/declaration/xml.soc:/etc/sgml/catalog" \
    SP_ENCODING=XML SP_CHARSET_FIXED=YES \
    nsgmls -wxml -s -E 10 /usr/share/sgml/declaration/xml1n.dcl kha-eng.tei

It will use "xml1n.dcl" instead of "xml.dcl" (to which xml.soc points).
A comment in "xml1n.dcl" says:

  "Note that this declaration is not conformant with the XML 1.0
  specification; it is used for processors that cannot handle Unicode
  characters above 65536."

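For repeated use, the command above could be wrapped in a small shell
function. This is only a sketch: the function name "validate_tei" is
made up for illustration, and the paths are simply the ones from the
command above, so they may differ on other systems.

```shell
# Hypothetical helper (the name validate_tei is not part of the FreeDict tools).
# Validates one TEI file with nsgmls, passing xml1n.dcl explicitly so that
# the declaration referenced by xml.soc (xml.dcl) is not the one used.
validate_tei() {
    tei_file=$1

    # Environment variables and options are copied from the command above;
    # the paths are assumed to match a Debian-style installation.
    SGML_CATALOG_FILES="/usr/share/xml/declaration/xml.soc:/etc/sgml/catalog" \
    SP_ENCODING=XML \
    SP_CHARSET_FIXED=YES \
    nsgmls -wxml -s -E 10 \
        /usr/share/sgml/declaration/xml1n.dcl "$tei_file"
}

# Usage (requires nsgmls and the file to exist):
#   validate_tei kha-eng.tei
```

The function returns whatever exit status nsgmls returns, so it can be
called from a Makefile rule as it stands.
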
We could change xml.soc to point to xml1n.dcl or introduce our own
xml.soc that would point to xml1n.dcl and would have to be used when
nsgmls is used.

We could also set the undocumented "NSGMLSCommand" option when creating
the XML::ESISParser object in tools/xmltei2xmldict.pl...

We should file 2 bugs against XML::ESISParser:

 * treats warnings as errors
 * has undocumented options

But I don't have my CPAN RT login information at the moment...

Kindest regards,
Michael