On Thu, Jan 22, 2009 at 11:21:35PM +0100, Pascal J. Bourguignon wrote:
> On Jan 22, 2009, at 4:06 PM, Frank Schwidom wrote:
>> If I read-line from a UTF-8 file and then call char-code or char-int
>> on a UTF-8 character, the Latin-1 code is returned.
>> What can I do to get the UTF-8 code?
>> (with-open-file (f "ae-utf8.txt" :direction :input)
>>   (let ((l (read-line f)))
>>     (format t "~s~%" l) ; outputs the utf8 character
>>     (format t "~s~%" (char-code (aref l 0))) ; outputs latin1 code
>>     (format t "~s~%" (char-int (aref l 0))))) ; outputs latin1 code
> Here, we don't know that the file is read as a UTF-8 stream.
> Have you set custom:*default-file-encoding* ? What's its value?
> To be sure, you can specify the :external-format:
> (with-open-file (f "ae-utf8.txt" :direction :input
> :external-format charset:utf-8)
> Then of course, you will have to do the same for *standard-output*:
> since you're writing Unicode characters, you must ensure that
> custom:*terminal-encoding* is set to a character set able to display them.
> Otherwise, as Sam said, ASCII ⊂ ISO-8859-1 ⊂ UNICODE,
> with ASCII, ISO-8859-1 and UNICODE being subsets of CHARACTER × INTEGER,
> such that ( (c1,i1) ∈ UNICODE ∧ (c2,i2) ∈ UNICODE ) ⇒ (c1 = c2 ⇔ i1 = i2).
> (char-code c1) = i1
> (code-char i1) = c1
> __Pascal Bourguignon__
The file ae-utf8.txt contains the byte sequence c3 a4 0a. I suppose it
is UTF-8 because I wrote it using vim with ':set fileencoding=utf-8'.
If I read-line with :external-format "iso-8859-1" or "utf-8", then in both
cases the code is the same (228), but if I use :external-format
"utf-16", then I get what I want: 42179 (== #xa4c3, the two bytes in
reversed order). This only works when I read-line with :external-format,
but how can I create strings in different character sets? I did not find
matching parameters or functions with (apropos 'encoding).
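
For reference, here is a minimal sketch of the arithmetic behind the two
numbers above, using only standard Common Lisp. The function names
(file-bytes, decode-utf8-pair, combine-le) are made up for illustration
and are not part of CLISP. It suggests that 228 is the Unicode code
point obtained by decoding c3 a4 as UTF-8, while 42179 is just the same
two bytes glued together in little-endian order.

```lisp
;; Hypothetical helper: read a file as raw octets, bypassing any
;; character decoding, so the stored bytes can be inspected directly.
(defun file-bytes (path)
  (with-open-file (s path :direction :input
                          :element-type '(unsigned-byte 8))
    (loop for b = (read-byte s nil nil)
          while b
          collect b)))

;; Hypothetical helper: decode a two-byte UTF-8 sequence of the form
;; 110xxxxx 10yyyyyy into the code point xxxxxyyyyyy.
(defun decode-utf8-pair (b1 b2)
  (logior (ash (logand b1 #x1f) 6)
          (logand b2 #x3f)))

;; Hypothetical helper: combine two bytes into one 16-bit unit,
;; first byte low, second byte high (little-endian).
(defun combine-le (b1 b2)
  (logior (ash b2 8) b1))

(format t "~d~%" (decode-utf8-pair #xc3 #xa4)) ; prints 228 (= #xe4)
(format t "~d~%" (combine-le #xc3 #xa4))       ; prints 42179 (= #xa4c3)
```

Calling (file-bytes "ae-utf8.txt") on the file described above should
return (195 164 10), i.e. c3 a4 0a, if the file really holds those bytes.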