From: Alecs K. <al...@pe...> - 2005-06-28 22:49:34
On Tue, Jun 28, 2005 at 09:38:21AM +0100, Wenzhi Liang wrote:
> Tested on Slackware last night and the installation went fine (as root). I did
> see the problem you had though, and on more files. Here is my list:
> usr_02.txt
> sponsor.txt
> uganda.txt
> usr_10.txt
> intro.txt
> pattern.txt
> map.txt
> windows.txt
> mbyte.txt
> gui_w16.txt
> gui_w32.txt
> if_ole.txt
> os_dos.txt
> os_msdos.txt
> os_win32.txt
> I think the actual encoding of the files is (probably) correct. We just
> need to help Vim to detect it. Will look into it but it will take time.

After playing around with gdb & vim, I found that the cause is indeed in
the files themselves.

> > ie.
> >
> > :h intro@cn (under GBK locale, aka in vim, enc=euc-cn) shows nothing
> > but malformed characters.

I _was_ kinda misleading here. The truth is that euc-cn is not GBK but
GB2312. Some of our docs contain 'evil' characters that were converted
from GBK. These characters exist only in GBK [1] and cannot be converted
to euc-cn (i.e. GB2312), hence the problem.

I traced Vim's execution and replaced all those 'evil' characters with
euc-cn-friendly ones. Most of them are GBK punctuation, some are
traditional Chinese characters, and the rest are invalid characters of
unknown origin. All changes are committed to CVS.

I'm about to release 0.8.0-rc1, which you can test to see whether this
problem remains and/or whether there are any other problems.

When translating, remember to run

    $ iconv -f utf-8 -t euc-cn file.txt >/dev/null

to check whether your doc is euc-cn-friendly.

[1] One exception is pattern.txt, whose 'evil' characters come not from
GBK conversion but from the original English doc. They are not evil in
themselves, only to euc-cn.

-- 
Alecs King
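The iconv check above only tells you whether a file converts; iconv typically stops at the first character it cannot convert. A minimal sketch that lists every offending character at once is below. It is not part of the project's tools: the function names and the script name are hypothetical, and it assumes Python 3, whose `gb2312` codec produces EUC-CN bytes and whose `gbk` codec covers GBK. Each flagged character is marked as GBK-only (the case described above) or not encodable in GBK either.

```python
import sys


def find_non_gb2312(text):
    """Return (line_no, col, char, in_gbk) for every char GB2312/EUC-CN cannot encode."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            try:
                ch.encode("gb2312")  # Python's gb2312 codec emits EUC-CN bytes
            except UnicodeEncodeError:
                # Record whether the wider GBK charset could have held it.
                try:
                    ch.encode("gbk")
                    in_gbk = True
                except UnicodeEncodeError:
                    in_gbk = False
                problems.append((line_no, col, ch, in_gbk))
    return problems


def main(paths):
    """Print one report line per offending char; return 1 if anything was flagged."""
    status = 0
    for path in paths:
        # errors="replace" turns invalid UTF-8 into U+FFFD, which is flagged as well.
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        for line_no, col, ch, in_gbk in find_non_gb2312(text):
            kind = "GBK-only" if in_gbk else "not in GBK either"
            print(f"{path}:{line_no}:{col}: U+{ord(ch):04X} {ch!r} ({kind})")
            status = 1
    return status


# Only run the check when file names are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved as, say, `check_gb2312.py`, it would be run as `python3 check_gb2312.py intro.txt map.txt`; the non-zero exit status on failure lets it sit alongside the iconv check in a script.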