From: Michael B. <mbe...@mb...> - 2005-11-07 14:11:25
Phillip Oldham wrote:

> I've been tasked with providing an example of storing english, spanish,
> and chinese text within an XML document. Has anyone played about with
> storing multiple languages in single documents? Can anyone provide any tips?

I do this routinely, and I can assure you there will be no problems from the eXist core (or indeed from eXist as wrapped in eXist or Jetty within the standard distribution packages).

Where you may encounter difficulties is if you decide to house eXist in your own Web application framework environment; but if that happens, you will generally be able, with some perseverance, to track down which component is corrupting your encodings on the way into and/or out of eXist. A bigger problem may arise if you want to use a wrapper that the eXist developers don't support and that has an awkward attitude to Unicode.

The one major problem I can anticipate you might have is if the Chinese is not in Unicode. It may well not be. If that's the case, you have two options. Either you hope that the English and Spanish stuff doesn't include any characters missing from the character set concerned (in which case you may get away with declaring the encoding of the whole thing to be whatever encoding the Chinese uses, though I wouldn't really recommend that), or you hope that all the characters used can be losslessly converted to Unicode codepoints, so that the whole thing can be kept in utf-8.

Whether they can be or not will largely depend on what sort of stuff the Chinese includes. If it is modern text that doesn't quote extremely obscure medieval sources, you will probably find lossless transcoding to utf-8 Unicode (and back, if needed) is easy.

The big exception is if the "Chinese" is actually Japanese. If it is, and if it includes personal data, then it may well contain Chinese characters that are used in Japan to write personal and place names and that are not in Unicode. Heck, some of them aren't even in the standard Japanese character sets, since some people regard it as prestigious to invent new character variants to name their kids or their retirement villas. Then we're in nightmare territory. But that probably won't arise in your case.

If you want to talk more about this over an off-list beverage sometime in our fair city, just email me.

Michael Beddow
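
The lossless transcoding check described above can be sketched in Python. This is only an illustration under assumptions that are not in the original message: the legacy Chinese text is taken to be in GB2312, and `round_trips` is a hypothetical helper name, not anything provided by eXist.

```python
def round_trips(raw: bytes, legacy_encoding: str) -> bool:
    """Return True if `raw` survives legacy -> UTF-8 -> legacy unchanged.

    `legacy_encoding` is an assumption about the source data
    (for example "gb2312" or "big5"); adjust it to the real encoding.
    """
    try:
        text = raw.decode(legacy_encoding)   # legacy bytes -> Unicode codepoints
        utf8 = text.encode("utf-8")          # the form you would store in the XML
        back = utf8.decode("utf-8").encode(legacy_encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Some byte sequence or character has no mapping in one direction.
        return False
    return back == raw


if __name__ == "__main__":
    chinese = "中文".encode("gb2312")         # assumed sample of legacy data
    print(round_trips(chinese, "gb2312"))
    print(round_trips(b"\xff\xff", "gb2312"))  # not valid GB2312 bytes
```

If the check passes for the whole of the Chinese material, keeping the English, Spanish, and Chinese together in one utf-8 document should be straightforward; if it fails, the failing passages are the ones to inspect by hand.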