From: Neal R. <ne...@ri...> - 2003-12-09 20:27:53
On Sat, 6 Dec 2003, Michael Chapman wrote:
>
> I am going to re-read the locales FAQs, at least one more time ... but thus
> far I have had no success on Mandrake 9.2 set-up to run on UTF-8.

HtDig is not yet UTF-8. This will be a large job... there are many pieces of
code with problematic assumptions about 1 byte = 1 char.

> UTF-8 characters come out on my machine with htsearch as an entit-ised
> version of Latin-1, e.g. Genève (G-e-n-e_grave-n-e) comes out as
> Genaève.
> Thought UTF-8 was the standard language for the web (or at least XHTML) now?
>
> The above may be my mis-settings.
>
> But Палата (that is some Cyrillic
> inserted in an English-language webpage, that is placed in as entities and
> has not been transcribed to UTF-8 (or any other encoding for that matter),
> shows up nicely in the original webpage, but in the htsearch output is
> rendered as above (that is the source code is now &#1055;&#1072;&amp;
> etc.).

I have an unofficial fix for this issue.. there are problems with committing
it in general. I will clean up the fix and post it soon.

Thanks!

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
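
For illustration only, here is a rough Python sketch of the two symptoms discussed above. It is not HtDig code and not the fix mentioned; the helper name entitise_as_latin1 is hypothetical, and the sketch assumes the bytes are being misread as Latin-1 and the entity text is being escaped a second time.

```python
import html


def entitise_as_latin1(text: str) -> str:
    """Hypothetical helper: encode text as UTF-8, misread the bytes as
    Latin-1, then write every non-ASCII character as a numeric entity."""
    misread = text.encode("utf-8").decode("latin-1")
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in misread)


word = "Genève"

# "1 byte = 1 char" does not hold for UTF-8: the e-grave takes two bytes.
print(len(word))                  # number of characters
print(len(word.encode("utf-8")))  # number of bytes

# Treating each UTF-8 byte as one Latin-1 character splits the e-grave in two.
print(entitise_as_latin1(word))

# Numeric entities survive one decode, but escaping them again turns the
# leading "&" into "&amp;", so the browser shows the entity text literally.
entities = "&#1055;&#1072;"
print(html.unescape(entities))
print(html.escape(entities))
```

The point of the sketch is only that both symptoms can come from code that handles text one byte at a time or escapes "&" without checking for existing entities.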