[ clisp-Bugs-2011946 ] New-clx "*** - unknown character set "ISO-106" error

Bugs item #2011946, was opened at 2008-07-06 22:16
Message generated for change (Comment added) made by rawlik
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=101355&aid=2011946&group_id=1355

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: clx
Group: lisp error
>Status: Open
Resolution: Works For Me
Priority: 5
Private: No
Submitted By: Drutsa Pavel (rawlik)
Assigned to: Bruno Haible (haible)
Summary: New-clx "*** - unknown character set "ISO-106" error

Initial Comment:
In the 'new-clx' package, using fonts with "ISO-10646-1" charset causes an error:
*** - unknown character set "ISO-10646-1"

I traced it to the functions 'to_XChar2b' and 'cstombs' in the clx.f file ...

Adding the line 'alias   ISO-10646-1//          UNICODE//' to
/usr/lib/gconv/gconv-modules resolves the problem, but it's not a bug fix
yet ;-(

----------------------------------------------------------------------

>Comment By: Drutsa Pavel (rawlik)
Date: 2008-07-12 00:44

Message:
Logged In: YES 
user_id=2039435
Originator: YES

The charset issues in this bug can be eliminated by using the Xft backend;
IMHO it is more flexible and handles font encodings and recoding on the fly.

----------------------------------------------------------------------

Comment By: Sam Steingold (sds)
Date: 2008-07-12 00:10

Message:
Logged In: YES 
user_id=5735
Originator: NO

>> Can new-clx "transparently" in clx.f use libXft instead of the old
>> font backend?

no. and PLEASE keep focused on the charset issues in this bug.
if you want to discuss libXft, please open a NEW feature request.

----------------------------------------------------------------------

Comment By: Drutsa Pavel (rawlik)
Date: 2008-07-12 00:02

Message:
Logged In: YES 
user_id=2039435
Originator: YES

Can new-clx "transparently" in clx.f use libXft instead of the old font
backend?

----------------------------------------------------------------------

Comment By: Sam Steingold (sds)
Date: 2008-07-11 16:45

Message:
Logged In: YES 
user_id=5735
Originator: NO

we can actually go the Emacs way: define a user variable:
(defvar *canonicalize-encoding-name* ()
  "List of functions, each mapping an encoding name to another name.")
(defun canonicalize-encoding-name (name)
  ;; apply all the functions in turn until the name stops changing
  (loop for current = name then next
     for next = (reduce (lambda (s f) (funcall f s))
                        *canonicalize-encoding-name*
                        :initial-value current)
     when (string= current next) return current))
in make-encoding, if charset is a string, put it through
canonicalize-encoding-name
usually, *canonicalize-encoding-name* will only contain STRING-UPCASE (it is
done now explicitly in make-encoding)
CLX can do more...

----------------------------------------------------------------------

Comment By: Sam Steingold (sds)
Date: 2008-07-11 16:34

Message:
Logged In: YES 
user_id=5735
Originator: NO

1. X11 is a "popular" package, people can reasonably expect the encoding
names it uses to be "correct", as in "accepted by other programs"

2. do all encodings we use contain ASCII?

3. http://en.wikipedia.org/wiki/UTF-32 appears to indicate ucs-4 == utf-32
== ISO-10646-1

4. we can always use the host endianness

we already do some encoding normalization in clx.f (isoXXXX -> ISO-XXXX),
so we can keep doing it there....
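
The isoXXXX -> ISO-XXXX normalization mentioned above could look roughly
like the following C sketch. It is not the code in clx.f: the function name
normalize_charset_name and the exact rule (an "iso" prefix followed by a
digit) are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not taken from clx.f): copy NAME into OUT in upper
   case, turning an X11-style "isoNNNN..." prefix into "ISO-NNNN...".
   OUT receives at most OUTSIZE-1 characters plus the terminating NUL. */
void normalize_charset_name (const char *name, char *out, size_t outsize)
{
  size_t i = 0, o = 0;
  if (outsize == 0)
    return;
  if (tolower((unsigned char) name[0]) == 'i'
      && tolower((unsigned char) name[1]) == 's'
      && tolower((unsigned char) name[2]) == 'o'
      && isdigit((unsigned char) name[3])) {
    const char *prefix = "ISO-";
    while (*prefix != '\0' && o + 1 < outsize)
      out[o++] = *prefix++;
    i = 3;                      /* skip the original "iso" */
  }
  /* everything else is only upcased */
  for (; name[i] != '\0' && o + 1 < outsize; i++)
    out[o++] = (char) toupper((unsigned char) name[i]);
  out[o] = '\0';
}
```

With this rule "iso10646-1" becomes "ISO-10646-1", which is exactly the
name iconv rejects in this report, so normalization alone would not be
enough here.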

----------------------------------------------------------------------

Comment By: Bruno Haible (haible)
Date: 2008-07-11 10:49

Message:
Logged In: YES 
user_id=5923
Originator: NO

> should we?

Most likely not.

1) X11 has a lot of X11 specific aliases for encodings, like
   SJIS -> Shift_JIS
   microsoft-cp1251 -> WINDOWS-1251
   microsoft-cp1255 -> WINDOWS-1255
   microsoft-cp1256 -> WINDOWS-1256
   big5hkscs -> BIG5-HKSCS

2) Here you're dealing with *font* encodings; these are character sets
   that often don't contain ASCII (e.g. JISX 0208).

This suggests that the code which deals with it be contained in the CLX,
NEW-CLX modules, not in the core of clisp.

3) Why should ISO-10646-1 be mapped to UCS-4, not UCS-2? As far as I
   know, the X11 core fonts APIs (here I mean those in libX11, as opposed
   to those in libXft and libXrender) are based on 16-bit glyph indices
   into font files.

4) When you say UNICODE-32 or UNICODE-16, the endianness is not
   specified, which is worrisome because if the byte swapping is done in
   the wrong manner, the entire output will be wrong.
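
Points 3 and 4 can be made concrete with a small C sketch: if the core font
API takes 16-bit glyph indices, each index can be split into a high and a
low byte by arithmetic, which gives the same result on any host. The struct
below is a stand-in with the same two fields as Xlib's XChar2b, so that the
example needs no X11 headers; the function name is hypothetical, and this is
not the code of to_XChar2b in clx.f.

```c
#include <stdint.h>

/* Stand-in for Xlib's XChar2b: byte1 is the high-order byte,
   byte2 the low-order byte of a 16-bit index. */
typedef struct {
  unsigned char byte1;
  unsigned char byte2;
} char2b;

/* Hypothetical helper: split CODE by shifting and masking, so the result
   does not depend on the host byte order.
   Returns 0 on success, -1 if CODE does not fit in 16 bits. */
int code_to_char2b (uint32_t code, char2b *out)
{
  if (code > 0xFFFFu)
    return -1;
  out->byte1 = (unsigned char) (code >> 8);
  out->byte2 = (unsigned char) (code & 0xFFu);
  return 0;
}
```

For example, U+0416 yields byte1 = 0x04 and byte2 = 0x16, while any code
point above U+FFFF is rejected instead of being silently truncated.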

----------------------------------------------------------------------

Comment By: Sam Steingold (sds)
Date: 2008-07-11 06:57

Message:
Logged In: YES 
user_id=5735
Originator: NO

OK, I can now reproduce this:
 *** - unknown character set "ISO-10646-1"
but I don't see how this is a CLISP bug.

we could, of course, make "ISO-10646-1" an alias of
CHARSET:UCS-4 = CHARSET:UNICODE-32 = CHARSET:UNICODE-32-BIG-ENDIAN
should we?

----------------------------------------------------------------------

Comment By: Drutsa Pavel (rawlik)
Date: 2008-07-11 00:34

Message:
Logged In: YES 
user_id=2039435
Originator: YES

I wrote a 'little' demo to study Lisp CLX and CLOS features. I will
attach it.
The LANG variable should be set to ru_RU.UTF-8.
Simply type
>$ LANG=ru_RU.UTF-8 ./run
but it is necessary to add a line to the gconv file before starting the
program:
># echo "alias ISO-10646-1//    UNICODE//" >> /usr/lib/gconv/gconv-modules

File Added: CLOS_X11.tar.gz

----------------------------------------------------------------------

Comment By: Drutsa Pavel (rawlik)
Date: 2008-07-11 00:06

Message:
Logged In: YES 
user_id=2039435
Originator: YES

I pressed the "Submit changes" button twice, and can't do "Redo" ...

----------------------------------------------------------------------

Comment By: Sam Steingold (sds)
Date: 2008-07-11 00:03

Message:
Logged In: YES 
user_id=5735
Originator: NO

first, you re-posted your exact reply. please do not do that again.

when I say "how to reproduce a bug", I mean specific instructions:
what variables to set, how to invoke clisp, what to type at the clisp
prompt.
'using fonts with "ISO-10646-1" charset' is not specific enough.
how do I find out what fonts use this charset?
$ xlsfonts *iso8859-1
lists quite a few.
what does "using fonts" mean?

----------------------------------------------------------------------

Comment By: Drutsa Pavel (rawlik)
Date: 2008-07-11 00:02

Message:
Logged In: YES 
user_id=2039435
Originator: YES

I will make a little demo-program tomorrow. I'm very tired now. Sorry.

----------------------------------------------------------------------

Comment By: Sam Steingold (sds)
Date: 2008-07-10 23:40

Message:
Logged In: YES 
user_id=5735
Originator: NO

so, HOW DO I REPRODUCE THE BUG?

----------------------------------------------------------------------

Comment By: Drutsa Pavel (rawlik)
Date: 2008-07-10 23:24

Message:
Logged In: YES 
user_id=2039435
Originator: YES

Environment variable LANG=ru_RU.UTF-8  ( it's my native language :-) )

1) $ uname -a
Linux rawlik 2.6.24-gentoo-r8v1 #3 SMP Sat Jun 28 16:54:09 EEST 2008 i686
AMD Athlon(tm) 64 Processor 3000+ AuthenticAMD GNU/Linux

$ LC_ALL=C gcc --version
gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.1)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

glibc version 2.6.1
xlib  version  (libX11-1.1.4 )

2,3) I used the gentoo portage dev-lisp/clisp-2.46 
     with USE flags "hyperspec X new-clx fastcgi gdbm gtk pari pcre
postgres readline svm zlib"

I think that I'm very close to making a patch for it by myself.
With GDB I found where the encoding is extracted from the font, and where
the "/* Special hack: use the font's encoding */" is used.
GDB shows me that the encoding is "iso10646-1", but libc (the iconv
subsystem) doesn't "understand" this encoding (very strange behavior; I
found this bug in the Debian distribution as well).
I see a way to work around this encoding only, by replacing it with the
"UNICODE" encoding directly in the C code.
But I'm new to Lisp, and maybe it's not a genuine Lisp Way ...
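
The workaround described above, substituting a name that iconv knows before
the converter is opened, might look like the sketch below. The helper names
are hypothetical and this is not the clx.f code. The replacement name
"UCS-2BE" is an assumption, chosen because it states its byte order; the
comment itself proposes "UNICODE".

```c
#include <iconv.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper: map the font's charset name to the name passed to
   iconv_open.  Only the one encoding from this report is special-cased. */
const char *iconv_name_for_font_charset (const char *font_charset)
{
  if (strcasecmp(font_charset, "iso10646-1") == 0
      || strcasecmp(font_charset, "ISO-10646-1") == 0)
    return "UCS-2BE";           /* assumption: explicit byte order */
  return font_charset;
}

/* Open a converter from FROM_CHARSET to the font's charset.
   Returns (iconv_t) -1 with errno set if iconv rejects a name. */
iconv_t open_font_converter (const char *font_charset,
                             const char *from_charset)
{
  return iconv_open(iconv_name_for_font_charset(font_charset),
                    from_charset);
}
```

A caller would still have to check the result against (iconv_t) -1, since
any other font charset is passed through unchanged and may be unknown too.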

----------------------------------------------------------------------

Comment By: Sam Steingold (sds)
Date: 2008-07-07 18:43

Message:
Logged In: YES 
user_id=5735
Originator: NO

how do I reproduce this error message?

----------------------------------------------------------------------

Comment By: Sam Steingold (sds)
Date: 2008-07-07 18:43

Message:
Logged In: YES 
user_id=5735
Originator: NO

this is the standard request for more information.
1. what is your platform? 
   ("uname -a" on a Unix system)
   compiler version?  libc (on Linux)?
2. where did you get the sources?  when? 
   (absolute dates are preferred over the relative ones)
3. how did you build CLISP? (what command, options &c)
   please do a clean build (remove your build directory and
   build CLISP with "./configure --build build" or at least
   do a "make distclean" before "make")
4. if you are using pre-built binaries, the problem is likely
   to be in the incompatibilities between the platform on which
   the binary was built and yours;
   please try compiling the sources.
5. what is the output of (lisp-implementation-version)?
6. what is the value of *features*?
7. please supply the full output (copy and paste) 
   of all the error messages.
If you cannot build CLISP, you can obviously skip 5 and 6, 
but then you should provide more information in 1.
please see <http://clisp.cons.org/clisp.html#bugs> 
for more information.
Thanks.

PS. This bug report is now marked "pending" 
    and will auto-close unless you respond
    (in which case it will auto-re-open).

----------------------------------------------------------------------
