Don Geddis <don@...> wrote on Fri, 12 Aug 2011:
> I'm working with cmucl (20B unicode) and sbcl (1.0.50.0.debian) on both
> Debian and Ubuntu. I've got some code that takes strings, and finds a
> "best fit" ASCII representation for them:
> http://don.geddis.org/lisp/asciify.lisp
[...]
> But I can't even load it in sbcl, with the literal unicode characters in
> the strings in the code:
Louis Turk <lou@...> wrote:
> I'm no expert, but perhaps seeing how I work with UTF-8 will help (note the
> :external-format :utf-8 in the code below):
> (with-open-file (stream-out "/home/lat/lisp/interl/8-lines.tmp"
> :external-format :utf-8
> :direction :output
> :if-exists :supersede)
It's a good hint, and I've seen things like this on the web. In fact,
reading from a file is where I first saw the problem; I just thought
literal strings in code would make a simpler test case.
But your suggestion didn't work for me either. I've put a test case here:
http://don.geddis.org/other/unicode/
There's a short function and a data file. The Lisp function
(defun last-line (&optional (file "play.log"))
  (with-open-file (f file :direction :input :external-format :utf-8)
    (loop
      with last-line = ""
      for line = (read-line f nil nil)
      while line
      do (setq last-line line)
      finally (return last-line))))
just tries to duplicate the Unix "tail -1" command.
Adding ":external-format :utf-8" doesn't seem to help. CMUCL works
with or without it; SBCL fails with or without it.
(Interestingly, CMUCL seems to compile the source for its UTF-8
external format on the fly while executing this code!)
FWIW, SBCL gives a _different_ octet sequence for its "decoding error",
depending on whether I add the :external-format or not.
But no matter what, I can't seem to get SBCL to read the characters into
a string.
-- Don
-------------------------------------------------------------------------------
unix:~> echo $LANG
en_US.UTF-8
unix> ls -l
total 12K
-rw-r--r-- 1 geddis geddis 251 Aug 12 19:26 last.lisp
-rw-r--r-- 1 geddis geddis 7.9K Aug 12 19:22 play.log
unix> cat last.lisp
(defun last-line (&optional (file "play.log"))
(with-open-file (f file :direction :input :external-format :utf-8)
(loop
with last-line = ""
for line = (read-line f nil nil)
while line
do (setq last-line line)
finally (return last-line) )))
unix> tail -1 play.log
Zepplin - Kashmir -- Pickin On (5m8s, 5699 KB, 44 kHz, 160 kbps, 10/0)
unix> cmucl
CMU Common Lisp Debian build (20B Unicode), running on yoda
With core: /usr/lib/cmucl/lisp-sse2.core
Dumped on: Tue, 2011-08-09 08:56:17-07:00 on yoda
See <http://www.cons.org/cmucl/> for support information.
Loaded subsystems:
Unicode 1.8.4.1 with Unicode version 5.1.0
Python 1.1, target Intel x86/sse2
CLOS based on Gerd's PCL 2010-03-19 15:19:03
* (load "last.lisp")
; Loading #P"/home/geddis/www/don/other/unicode/last.lisp".
T
* (last-line)
; Comment: $Header: /project/cmucl/cvsroot/src/pcl/simple-streams/external-formats/utf-8.lisp,v 1.14.4.1 2010-08-14 23:51:08 rtoy Exp $
; Compiling DEFINE-EXTERNAL-FORMAT UTF-8:
; Compiling DEFINE-EXTERNAL-FORMAT UTF-8:
; Byte Compiling Top-Level Form:
; In: LAMBDA (STREAM::%SLOTS%)
; (STREAM::OCTETS-TO-CHAR :UTF-8 STREAM::STATE
; (AREF STREAM::OCOUNT STREAM::K)
; (IF # # #)
; ...)
; --> LET IF LET STREAM::OCTETS-TO-CODEPOINT MULTIPLE-VALUE-BIND
; --> MULTIPLE-VALUE-CALL LABELS BLOCK LET DOTIMES DO BLOCK LET TAGBODY LET
; --> TAGBODY LET IF SETF LET* MULTIPLE-VALUE-BIND LET LET
; ==>
; (SETQ #:G29 #:G40)
; Note: Doing signed word to integer coercion (cost 20) to #:G29.
;
; Compilation unit finished.
; 1 note
"Zepplin - Kashmir -- Pickin On (5m8s, 5699 KB, 44 kHz, 160 kbps, 10/0)"
* (quit)
unix> sbcl
This is SBCL 1.0.50.0.debian, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* (load "last.lisp")
T
* (last-line)
debugger invoked on a SB-INT:STREAM-DECODING-ERROR in thread #<THREAD
"initial thread" RUNNING
{AA73909}>:
decoding error on stream
#<SB-SYS:FD-STREAM for "file /home/geddis/www/don/other/unicode/play.log"
{AAD65B9}>
(:EXTERNAL-FORMAT :UTF-8):
the octet sequence (252 114 32 69) cannot be decoded.
Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [ATTEMPT-RESYNC ] Attempt to resync the stream at a character boundary
and continue.
1: [FORCE-END-OF-FILE] Force an end of file.
2: [INPUT-REPLACEMENT] Use string as replacement input, attempt to resync at
a character boundary and continue.
3: [ABORT ] Exit debugger, returning to top level.
(SB-INT:STREAM-DECODING-ERROR
#<SB-SYS:FD-STREAM for "file /home/geddis/www/don/other/unicode/play.log"
{AAD65B9}>
(252 114 32 69))
0] :0
debugger invoked on a SB-INT:STREAM-DECODING-ERROR in thread #<THREAD
"initial thread" RUNNING
{AA73909}>:
decoding error on stream
#<SB-SYS:FD-STREAM for "file /home/geddis/www/don/other/unicode/play.log"
{AAD65B9}>
(:EXTERNAL-FORMAT :UTF-8):
the octet sequence (252 114 32 69) cannot be decoded.
Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [ATTEMPT-RESYNC ] Attempt to resync the stream at a character boundary
and continue.
1: [FORCE-END-OF-FILE] Force an end of file.
2: [INPUT-REPLACEMENT] Use string as replacement input, attempt to resync at
a character boundary and continue.
3: [ABORT ] Exit debugger, returning to top level.
(SB-INT:STREAM-DECODING-ERROR
#<SB-SYS:FD-STREAM for "file /home/geddis/www/don/other/unicode/play.log"
{AAD65B9}>
(252 114 32 69))
0] (quit)
unix>
-------------------------------------------------------------------------------
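One way to look at the octets SBCL complains about, (252 114 32 69), is to feed exactly those bytes to iconv. This is a minimal sketch that assumes a POSIX shell with iconv installed. It reproduces only the four bytes from the error message, not play.log itself. Byte 252 (0xFC) is not a legal UTF-8 lead byte. In ISO-8859-1 (Latin-1), however, it is "ü", so the sequence reads as "ür E". That would suggest play.log is at least partly Latin-1 rather than UTF-8, but this is an inference from four bytes, not a check of the whole file.

```shell
#!/bin/sh
# Emit the four octets from SBCL's error, (252 114 32 69),
# written as octal escapes: 252 = \374, 114 = \162 ('r'),
# 32 = \040 (space), 69 = \105 ('E').
octets() {
  printf '\374\162\040\105'
}

# A strict UTF-8 decoder should reject this input: 0xFC is not a
# legal UTF-8 lead byte, and 'r' is not a continuation byte.
if octets | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
  echo "valid UTF-8"
else
  echo "not valid UTF-8"
fi

# Read as ISO-8859-1 (Latin-1), every byte maps to one character,
# and 0xFC is U+00FC ("ü").
octets | iconv -f ISO-8859-1 -t UTF-8
echo
```

On a typical system this should print "not valid UTF-8" followed by "ür E". If the Latin-1 reading holds, the next thing to try in SBCL would be opening play.log with a Latin-1 external format (SBCL accepts :latin-1) instead of :utf-8. Running the same iconv UTF-8 check over play.log itself should also point at the first byte that is not valid UTF-8.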
_______________________________________________________________________________
Don Geddis http://don.geddis.org/ don@...
Filament magazine: "At what age is it best to crush a child's dreams so that
they have an easier time stepping in to the status quo?"