From: SourceForge.net <no...@so...> - 2008-07-11 21:44:38
Bugs item #2011946, was opened at 2008-07-06 22:16 Message generated for change (Comment added) made by rawlik You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=101355&aid=2011946&group_id=1355 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: clx Group: lisp error >Status: Open Resolution: Works For Me Priority: 5 Private: No Submitted By: Drutsa Pavel (rawlik) Assigned to: Bruno Haible (haible) Summary: New-clx "*** - unknown character set "ISO-106" error Initial Comment: In the 'new-clx' package, using fonts with "ISO-10646-1" charset causes an error: *** - unknown character set "ISO-10646-1" I came to functions 'to_XChar2b' and 'cstombs' in the clx.f file ... Adding in the /usr/lib/gconv/gconv-modules a line: 'alias ISO-10646-1// UNICODE//' resolves the problem, but it's not a bug fix yet ;-( ---------------------------------------------------------------------- >Comment By: Drutsa Pavel (rawlik) Date: 2008-07-12 00:44 Message: Logged In: YES user_id=2039435 Originator: YES The charset issues in this bug can be eliminated using Xft backend, as IMHO it's more flexible and handles fonts encodings/recoding on-fly. ---------------------------------------------------------------------- Comment By: Sam Steingold (sds) Date: 2008-07-12 00:10 Message: Logged In: YES user_id=5735 Originator: NO >> Can new-clx "transparently" in clx.f use libXft instead of old fonts backend ? no. and PLEASE keep focused on the charset issues in this bug. if you want to discuss libXft, please open a NEW feature request. ---------------------------------------------------------------------- Comment By: Drutsa Pavel (rawlik) Date: 2008-07-12 00:02 Message: Logged In: YES user_id=2039435 Originator: YES Can new-clx "transparently" in clx.f use libXft instead of old fonts backend ? 
---------------------------------------------------------------------- Comment By: Sam Steingold (sds) Date: 2008-07-11 16:45 Message: Logged In: YES user_id=5735 Originator: NO we can actually go the Emacs way: define a user variable holding a list of functions, each mapping a name to a name: (defvar *canonicalize-encoding-name* ()) (defun canonicalize-encoding-name (name) (loop for current = name then next for next = (reduce (lambda (acc fn) (funcall fn acc)) *canonicalize-encoding-name* :initial-value current) when (string= current next) return current)) in make-encoding, if charset is a string, put it through canonicalize-encoding-name. usually, *canonicalize-encoding-name* will only contain STRING-UPCASE (it is done now explicitly in make-encoding); CLX can do more... ---------------------------------------------------------------------- Comment By: Sam Steingold (sds) Date: 2008-07-11 16:34 Message: Logged In: YES user_id=5735 Originator: NO 1. X11 is a "popular" package, people can reasonably expect the encoding names it uses to be "correct", as in "accepted by other programs" 2. do all encodings we use contain ASCII? 3. http://en.wikipedia.org/wiki/UTF-32 appears to indicate ucs-4 == utf-32 == ISO-10646-1 4. we can always use the host endianness. we already do some encoding normalization in clx.f (isoXXXX -> ISO-XXXX), so we can keep doing it there.... ---------------------------------------------------------------------- Comment By: Bruno Haible (haible) Date: 2008-07-11 10:49 Message: Logged In: YES user_id=5923 Originator: NO > should we? Most likely not. 1) X11 has a lot of X11-specific aliases for encodings, like SJIS -> Shift_JIS, microsoft-cp1251 -> WINDOWS-1251, microsoft-cp1255 -> WINDOWS-1255, microsoft-cp1256 -> WINDOWS-1256, big5hkscs -> BIG5-HKSCS. 2) Here you're dealing with *font* encodings; these are character sets that often don't contain ASCII (e.g. JISX 0208). This suggests that the code which deals with them should be contained in the CLX, NEW-CLX modules, not in the core of clisp. 3) Why should ISO-10646-1 be mapped to UCS-4, not UCS-2? 
As far as I know, the X11 core fonts APIs (here I mean those in libX11, as opposed to those in libXft and libXrender) are based on 16-bit glyph indices into font files. 4) When you say UNICODE-32 or UNICODE-16, the endianness is not specified, which is worrisome because if the byte swapping is done in the wrong manner, the entire output will be wrong. ---------------------------------------------------------------------- Comment By: Sam Steingold (sds) Date: 2008-07-11 06:57 Message: Logged In: YES user_id=5735 Originator: NO OK, I can now reproduce this: *** - unknown character set "ISO-10646-1" but I don't see how this is a CLISP bug. we could, of course, make "ISO-10646-1" to be an alias of CHARSET:UCS-4 = CHARSET:UNICODE-32 = CHARSET:UNICODE-32-BIG-ENDIAN should we? 
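Editor's sketch (not part of the original thread): the name handling discussed in these comments, i.e. upcasing, the isoXXXX -> ISO-XXXX normalization, and X11-specific aliases, could look roughly like this in C. The function name canonical_charset_name and the table are hypothetical and are not taken from clx.f. Mapping ISO10646-1 to "UCS-2BE" is only an assumption that follows the 16-bit and explicit-endianness concerns raised in the thread; the thread itself does not settle on a target.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table: upcased X11 font-encoding name -> iconv name.
   The first five entries are the aliases listed in the thread; the
   ISO10646-1 target is an assumption, not a decision from the thread. */
struct charset_alias { const char *x11_name; const char *iconv_name; };

static const struct charset_alias aliases[] = {
    { "SJIS",             "Shift_JIS"    },
    { "MICROSOFT-CP1251", "WINDOWS-1251" },
    { "MICROSOFT-CP1255", "WINDOWS-1255" },
    { "MICROSOFT-CP1256", "WINDOWS-1256" },
    { "BIG5HKSCS",        "BIG5-HKSCS"   },
    { "ISO10646-1",       "UCS-2BE"      },  /* assumption */
    { "ISO-10646-1",      "UCS-2BE"      },  /* assumption */
};

/* Writes a canonical name for `name` into `out`.
   Returns 0 on success, -1 if `out` is too small. */
int canonical_charset_name(const char *name, char *out, size_t outsize)
{
    assert(name != NULL && out != NULL);
    size_t len = strlen(name);
    if (len + 2 > outsize)            /* room for one inserted '-' and NUL */
        return -1;

    /* Step 1: upcase. */
    for (size_t i = 0; i < len; i++)
        out[i] = (char)toupper((unsigned char)name[i]);
    out[len] = '\0';

    /* Step 2: explicit aliases take precedence. */
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++) {
        if (strcmp(out, aliases[i].x11_name) == 0) {
            if (strlen(aliases[i].iconv_name) + 1 > outsize)
                return -1;
            strcpy(out, aliases[i].iconv_name);
            return 0;
        }
    }

    /* Step 3: ISOxxxx -> ISO-xxxx when a digit follows "ISO". */
    if (strncmp(out, "ISO", 3) == 0 && isdigit((unsigned char)out[3])) {
        memmove(out + 4, out + 3, len - 3 + 1);   /* shift tail incl. NUL */
        out[3] = '-';
    }
    return 0;
}
```

Keeping the table in the CLX module rather than in the CLISP core matches the suggestion that font encodings be handled where the fonts are handled; whether UCS-2 or UCS-4 is the right target remains the open question of the thread.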
---------------------------------------------------------------------- Comment By: Drutsa Pavel (rawlik) Date: 2008-07-11 00:34 Message: Logged In: YES user_id=2039435 Originator: YES I wrote a 'little' demo to study Lisp CLX and CLOS features. I will attach it. The LANG variable should be set to ru_RU.UTF-8. Simply type >$ LANG=ru_RU.UTF-8 ./run but it is necessary to add a line to the gconv-modules file before starting the program: ># echo "alias ISO-10646-1// UNICODE//" >> /usr/lib/gconv/gconv-modules File Added: CLOS_X11.tar.gz ---------------------------------------------------------------------- Comment By: Drutsa Pavel (rawlik) Date: 2008-07-11 00:06 Message: Logged In: YES user_id=2039435 Originator: YES I pressed the "Submit changes" button twice, and can't do "Redo" ... ---------------------------------------------------------------------- Comment By: Sam Steingold (sds) Date: 2008-07-11 00:03 Message: Logged In: YES user_id=5735 Originator: NO first, you re-posted your exact reply. please do not do that again. when I say "how to reproduce a bug", I mean specific instructions: what variables to set, how to invoke clisp, what to type at the clisp prompt. 'using fonts with "ISO-10646-1" charset' is not specific enough. how do I find out what fonts use this charset? $ xlsfonts *iso8859-1 lists quite a few. what does "using fonts" mean? ---------------------------------------------------------------------- Comment By: Drutsa Pavel (rawlik) Date: 2008-07-11 00:02 Message: Logged In: YES user_id=2039435 Originator: YES I will make a little demo program tomorrow. I'm very tired now. Sorry. 
---------------------------------------------------------------------- Comment By: Drutsa Pavel (rawlik) Date: 2008-07-10 23:54 Message: Logged In: YES user_id=2039435 Originator: YES Environment variable LANG=ru_RU.UTF-8 ( it's my native language :-) ) 1) $ uname -a Linux rawlik 2.6.24-gentoo-r8v1 #3 SMP Sat Jun 28 16:54:09 EEST 2008 i686 AMD Athlon(tm) 64 Processor 3000+ AuthenticAMD GNU/Linux $ LC_ALL=C gcc --version gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.1) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. glibc version 2.6.1 xlib version (libX11-1.1.4 ) 2,3) I used the gentoo portage dev-lisp/clisp-2.46 with USE flags "hyperspec X new-clx fastcgi gdbm gtk pari pcre postgres readline svm zlib" I think that I'm very close to make a patch for it by my self. With GDB I found where is the encoding extracted from the font, and where is used a "/* Special hack: use the font's encoding */" GDB show me: The encoding is "iso10646-1" but libc (the iconv subsystem) doesn't "understand" this encoding (very strange behavior, I found this bug also in the Debian distribution). I see a way to workaround, this encoding only, and to replace it by "UNICODE" encoding directly in C code. But I'm new in lisp, and maybe it's not a genuine Lisp Way ... ---------------------------------------------------------------------- Comment By: Sam Steingold (sds) Date: 2008-07-10 23:40 Message: Logged In: YES user_id=5735 Originator: NO so, HOW DO I REPRODUCE THE BUG? 
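Editor's sketch (not part of the original thread): one way to check, outside CLISP, whether the local libc iconv accepts a given charset name is to call iconv_open and look at errno. The helper name iconv_knows_charset is hypothetical. This only probes libc; it does not reproduce the CLISP error itself, and the result for "ISO-10646-1" depends on the aliases present in the system's gconv configuration.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Returns 1 if iconv can convert from `charset` to UTF-8,
   0 if the conversion is not supported (EINVAL),
   -1 on any other error (for example, out of memory). */
int iconv_knows_charset(const char *charset)
{
    assert(charset != NULL);
    iconv_t cd = iconv_open("UTF-8", charset);   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return errno == EINVAL ? 0 : -1;
    iconv_close(cd);
    return 1;
}
```

Calling iconv_knows_charset("ISO-10646-1") before and after adding the alias line to gconv-modules would show whether the libc side of the problem is present on a given machine.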
---------------------------------------------------------------------- Comment By: Drutsa Pavel (rawlik) Date: 2008-07-10 23:24 Message: Logged In: YES user_id=2039435 Originator: YES Environment variable LANG=ru_RU.UTF-8 ( it's my native language :-) ) 1) $ uname -a Linux rawlik 2.6.24-gentoo-r8v1 #3 SMP Sat Jun 28 16:54:09 EEST 2008 i686 AMD Athlon(tm) 64 Processor 3000+ AuthenticAMD GNU/Linux $ LC_ALL=C gcc --version gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.1) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. glibc version 2.6.1 xlib version (libX11-1.1.4 ) 2,3) I used the gentoo portage dev-lisp/clisp-2.46 with USE flags "hyperspec X new-clx fastcgi gdbm gtk pari pcre postgres readline svm zlib" I think that I'm very close to make a patch for it by my self. With GDB I found where is the encoding extracted from the font, and where is used a "/* Special hack: use the font's encoding */" GDB show me: The encoding is "iso10646-1" but libc (the iconv subsystem) doesn't "understand" this encoding (very strange behavior, I found this bug also in the Debian distribution). I see a way to workaround, this encoding only, and to replace it by "UNICODE" encoding directly in C code. But I'm new in lisp, and maybe it's not a genuine Lisp Way ... ---------------------------------------------------------------------- Comment By: Sam Steingold (sds) Date: 2008-07-07 18:43 Message: Logged In: YES user_id=5735 Originator: NO how do I reproduce this error message? ---------------------------------------------------------------------- Comment By: Sam Steingold (sds) Date: 2008-07-07 18:43 Message: Logged In: YES user_id=5735 Originator: NO this is the standard request for more information. 1. what is your platform? ("uname -a" on a Unix system) compiler version? libc (on Linux)? 2. where did you get the sources? when? 
(absolute dates are preferred over relative ones) 3. how did you build CLISP? (what command, options &c) please do a clean build (remove your build directory and build CLISP with "./configure --build build" or at least do a "make distclean" before "make") 4. if you are using pre-built binaries, the problem is likely to be in the incompatibilities between the platform on which the binary was built and yours; please try compiling the sources. 5. what is the output of (lisp-implementation-version)? 6. what is the value of *features*? 7. please supply the full output (copy and paste) of all the error messages. If you cannot build CLISP, you can obviously skip 5 and 6, but then you should provide more information in 1. please see <http://clisp.cons.org/clisp.html#bugs> for more information. Thanks. PS. This bug report is now marked "pending" and will auto-close unless you respond (in which case it will auto-re-open).