From: Michael C. <ch...@mc...> - 2003-12-06 11:06:34
I am going to re-read the locales FAQs at least one more time, but so far I have had no success getting a Mandrake 9.2 set-up to run on UTF-8.

UTF-8 characters come out of htsearch on my machine as an entitised version of Latin-1, e.g. Genève (G-e-n-e_grave-v-e) comes out as Genaève. I thought UTF-8 was the standard encoding for the web (or at least for XHTML) by now? The above may be down to my own mis-settings.

But Палата (some Cyrillic inserted in an English-language webpage as numeric entities, not transcribed to UTF-8 or any other encoding for that matter) shows up nicely in the original webpage, yet in the htsearch output it is rendered as above (that is, the source code is now &#1055;&#1072;& etc.).

The few non-numeric entities I have checked seem to survive digging/searching (& ®). Is there a way to stop the killing of numeric entities?

Michael Chapman.
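
For illustration, the first symptom can be reproduced outside ht://Dig. The following is a minimal Python sketch, not htsearch's actual code: the function names `utf8_read_as_latin1` and `entitise` are hypothetical, and it assumes only what is described above, namely that the UTF-8 bytes are read as Latin-1 and the non-ASCII results are then written out as entities. It also decodes the two numeric entities quoted above to show which characters they stand for.

```python
import html


def utf8_read_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then misread each byte as one Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


def entitise(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# e-grave is one character in the page, but two bytes (0xC3 0xA8) in UTF-8.
# Read as Latin-1, those two bytes become two separate characters.
garbled = utf8_read_as_latin1("Gen\u00e8ve")
print(len("Gen\u00e8ve"), len(garbled))  # 6 7
print(entitise(garbled))                 # Gen&#195;&#168;ve

# The numeric entities from the Cyrillic example decode to its first two letters.
print(html.unescape("&#1055;&#1072;") == "\u041f\u0430")  # True
```

If the output seen from htsearch matches the second line printed here, the indexer is most likely treating the page as Latin-1 rather than UTF-8.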