From: Frank S. <sch...@gm...> - 2009-01-22 14:07:39
Hi,

if i read-line from an utf8-file and then make a char-code or char-int
on an utf8 character then the latin1 code is returned.

what can i do to gain the utf8-code?

(with-open-file (f "ae-utf8.txt" :direction :input)
 (let ((l (read-line f)))
  (format t "~s~%" l) ; outputs the utf8 character
  (format t "~s~%" (char-code (aref l 0))) ; outputs latin1 code
  (format t "~s~%" (char-int (aref l 0))) ; outputs latin1 code
 )
)

Regards
From: Sam S. <sd...@gn...> - 2009-01-22 14:35:25
Hi,

Frank Schwidom wrote:
>
> if i read-line from an utf8-file and then make a char-code or char-int
> on an utf8 character then the latin1 code is returned.
>
> what can i do to gain the utf8-code?
>
> (with-open-file (f "ae-utf8.txt" :direction :input)
>  (let ((l (read-line f)))
>   (format t "~s~%" l) ; outputs the utf8 character
>   (format t "~s~%" (char-code (aref l 0))) ; outputs latin1 code
>   (format t "~s~%" (char-int (aref l 0))) ; outputs latin1 code
>  )
> )

latin1 is the same as utf8 on ascii.
does your file contain non-ascii characters?

note also:

http://clisp.cons.org/impnotes/char-sets.html
  CLISP uses the 21-bit UNICODE 3.2 character set (ISO 10646, also known as UCS-4).

http://clisp.cons.org/impnotes/char-int.html
  The integer returned by CHAR-INT is the same as the character's code (CHAR-CODE).

Sam.
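[Editor's note] The reply above rests on the fact that CHAR-CODE returns a Unicode code point, and that for the first 256 code points this number coincides with the Latin-1 code. UTF-8 is a byte sequence derived from that code point, not a different character code. A minimal, portable sketch of the relationship follows. The helper name utf8-octets is hypothetical (it is not a CLISP or ANSI function), and it performs no validation of surrogates or out-of-range values.

```lisp
;; Hypothetical helper for illustration only: encode one Unicode code
;; point as a list of UTF-8 octets. No validation is performed.
(defun utf8-octets (code)
  "Return the list of UTF-8 octets for the Unicode code point CODE."
  (cond ((< code #x80)                       ; 1 octet: plain ASCII
         (list code))
        ((< code #x800)                      ; 2 octets
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)                    ; 3 octets
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t                                   ; 4 octets
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

;; Code point 228 is U+00E4 (a with diaeresis), the same number as in Latin-1.
(utf8-octets 228)   ; => (195 164), i.e. #xC3 #xA4
(utf8-octets 65)    ; => (65), ASCII is unchanged
```

So a character code of 228 and an on-disk byte pair c3 a4 are two views of the same character: one is the code point, the other its UTF-8 encoding.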
From: Frank S. <sch...@gm...> - 2009-01-23 11:19:17
On Thu, Jan 22, 2009 at 11:21:35PM +0100, Pascal J. Bourguignon wrote:
>
> On Jan 22, 2009, at 4:06 PM, Frank Schwidom wrote:
>
>> Hi,
>>
>> if i read-line from an utf8-file and then make a char-code or char-int
>> on an utf8 character then the latin1 code is returned.
>>
>> what can i do to gain the utf8-code?
>>
>> (with-open-file (f "ae-utf8.txt" :direction :input)
>>  (let ((l (read-line f)))
>>   (format t "~s~%" l) ; outputs the utf8 character
>>   (format t "~s~%" (char-code (aref l 0))) ; outputs latin1 code
>>   (format t "~s~%" (char-int (aref l 0))) ; outputs latin1 code
>
> Here, we don't know that the file is read as an utf-8 stream.
> Have you set custom:*default-file-encoding* ? What's its value?
>
> To be sure, you can specify the :external-format:
>
> (with-open-file (f "ae-utf8.txt" :direction :input
>                  :external-format charset:utf-8)
>   ...)
>
> Then of course, you will have to do the same about the *standard-output*,
> since you're writing unicode characters, you must ensure
> custom:*terminal-encoding* is set to a character set able to display them.
>
> Otherwise, as Sam said, ASCII ⊂ ISO-8859-1 ⊂ UNICODE,
> with ASCII, ISO-8859-1 and UNICODE being subsets of CHARACTER × INTEGER,
> such that ( (c1,i1) ∈ UNICODE ∧ (c2,i2) ∈ UNICODE ) ⇒ (c1 = c2 ⇔ i1 = i2).
>
> (char-code c1) = i1
> (code-char i1) = c1
>
> --
> __Pascal Bourguignon__
> http://www.informatimago.com

The file ae-utf8.txt contains the byte sequence c3 a4 0a. I suppose it is
utf-8 because I wrote it using vim with ':set fileencoding=utf-8'.

If I read-line with :external-format "iso-8859-1" or "utf-8", then in both
cases the code is the same (228), but if I use :external-format "utf-16",
then I get what I want: 42179 (== #xa4c3, reversed byte order).

This happens only if I read-line with :external-format, but how can I create
strings of different character sets? I did not find matching parameters or
functions with (apropos 'encoding).

Regards
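[Editor's note] The two numbers reported above can be reproduced by plain arithmetic on the octets c3 a4. The sketch below is portable Common Lisp. Both function names are hypothetical, and the little-endian reading is only an assumption that matches the "reversed byte order" observation; it is not a statement about how CLISP's utf-16 decoder is implemented.

```lisp
;; Hypothetical helpers for illustration only.

(defun decode-utf8-pair (b1 b2)
  "Decode a two-octet UTF-8 sequence B1 B2 into a code point (no validation)."
  (logior (ash (logand b1 #x1F) 6)   ; low 5 bits of the lead octet
          (logand b2 #x3F)))         ; low 6 bits of the continuation octet

(defun le-16 (b1 b2)
  "Combine two octets into one 16-bit unit, first octet as the low byte."
  (logior b1 (ash b2 8)))

(decode-utf8-pair #xC3 #xA4)   ; => 228   (#xE4, the code point of the character)
(le-16 #xC3 #xA4)              ; => 42179 (#xA4C3, a different character entirely)
```

Under that assumption, 42179 is not "the UTF-8 code" of the character: it is the code point of an unrelated character obtained by treating the two UTF-8 octets as a single 16-bit unit.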
From: Sam S. <sd...@gn...> - 2009-01-23 14:37:43
Frank Schwidom wrote:
>
> can i create strings of different character sets? I did not found
> matching parameters and functions '(apropos 'encoding)

a string is a vector of characters, it has no notion of a character set.
the notion of encodings is only applicable to conversions between character
sequences and byte sequences.
what you are looking for is probably
http://clisp.cons.org/impnotes/encoding.html#string-byte
note that normally you should not need these functions, passing
:external-format
http://clisp.cons.org/impnotes/open.html#extfmt
to open is much easier.
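[Editor's note] A minimal sketch of the string/byte conversion mentioned above, assuming CLISP: it uses EXT:CONVERT-STRING-TO-BYTES and EXT:CONVERT-STRING-FROM-BYTES with the CHARSET:UTF-8 and CHARSET:ISO-8859-1 encodings from the CLISP implementation notes. The variable names are illustrative, and the form is guarded with #+clisp because these functions are not part of ANSI Common Lisp.

```lisp
;; CLISP-specific: the EXT: and CHARSET: symbols below do not exist in
;; other implementations, so the whole form is skipped elsewhere.
#+clisp
(let* ((s      (string (code-char 228)))   ; one-character string, U+00E4
       ;; The same string converted to bytes under two encodings.
       (utf8   (ext:convert-string-to-bytes s charset:utf-8))
       (latin1 (ext:convert-string-to-bytes s charset:iso-8859-1))
       ;; Converting the UTF-8 bytes back yields the original character.
       (back   (ext:convert-string-from-bytes utf8 charset:utf-8)))
  (format t "utf-8 bytes:  ~s~%" utf8)     ; expected to hold #xC3 #xA4
  (format t "latin1 bytes: ~s~%" latin1)   ; expected to hold #xE4
  (format t "code after round trip: ~s~%" (char-code (char back 0))))
```

The string itself is the same in every case; only the byte vectors differ, which is the distinction drawn in the reply above.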
From: Frank S. <sch...@gm...> - 2009-01-23 15:18:20
On Fri, Jan 23, 2009 at 09:37:36AM -0500, Sam Steingold wrote:
> Frank Schwidom wrote:
>> can i create strings of different character sets? I did not found
>> matching parameters and functions '(apropos 'encoding)
>
> a string is a vector of characters, it has no notion of a character set.
> the notion of encodings is only applicable to conversions between character
> sequences and byte sequences.
> what you are looking for is probably
> http://clisp.cons.org/impnotes/encoding.html#string-byte
> note that normally you should not need these functions, passing
> :external-format
> http://clisp.cons.org/impnotes/open.html#extfmt
> to open is much easier.

Thanks, this solves my problem.

Regards