From: Pascal J. B. <pj...@in...> - 2013-05-19 21:16:37
Matt Kaufmann <kau...@cs...> writes:

> Hi --
>
> I maintain an application that is built on top of Common Lisp, which
> expects iso-8859-1 for the character encoding. I'd like to set things
> up so that on a linux system, my application reads characters from a
> file exactly as they were written. But my attempt to do so failed,
> dropping a #\Return character, as illustrated by the log below. Is
> there something simple I can do to accomplish my goal, or else might
> that be the case in future CLISP releases? Note that I did see the
> following note at http://www.clisp.org/impnotes/clhs-newline.html:
>
>   Justification. Unicode Newline Guidelines say: “Even if you know
>   which characters represents NLF on your particular platform, on
>   input and in interpretation, treat CR, LF, CRLF, and NEL the
>   same. Only on output do you need to distinguish between them.”
>
> However, I'm hoping that since I'm using iso-8859-1 rather than a utf
> encoding, maybe that justification doesn't need to apply.

No, it still applies.

Since you want to read codes such as 13 and 10, you should specify an
element type of (unsigned-byte 8):

    [pjb@kuiper :0.0 ~]$ clisp -ansi -norc -q
    [1]> (deftype octet () '(unsigned-byte 8))
    OCTET
    [2]> (with-open-file (in #P"~/tmp/misc/wang.dos" :element-type 'octet)
           (let ((buffer (make-array 256 :element-type 'octet)))
             (read-sequence buffer in)
             (search #(13 10) buffer)))
    29
    [3]> (quit)
    [pjb@kuiper :0.0 ~]$

-- 
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.
You can take the lisper out of the lisp job, but you can't take the lisp
out of the lisper (; -- antifuchs
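
If the application needs characters rather than raw codes, one possible follow-up to the octet approach above is to read the whole file as octets and then map each octet to the character with the same code. This is only a sketch: the function names read-file-octets and octets-to-latin1-string are made up for illustration, and it assumes an implementation where (code-char n) for n below 256 yields the matching ISO-8859-1 character, which holds when character codes are Unicode code points.

```lisp
(deftype octet () '(unsigned-byte 8))

;; Hypothetical helper: return the whole content of PATH as a vector
;; of octets.  No newline conversion happens on a binary stream.
(defun read-file-octets (path)
  (with-open-file (in path :element-type 'octet)
    (let ((buffer (make-array (file-length in) :element-type 'octet)))
      (read-sequence buffer in)
      buffer)))

;; Hypothetical helper: map each octet to the character with the same
;; code.  Assumes char codes 0..255 coincide with ISO-8859-1.
(defun octets-to-latin1-string (octets)
  (map 'string #'code-char octets))
```

With these definitions, (octets-to-latin1-string (read-file-octets path)) should give a string in which code 13 and code 10 both survive as separate characters, since the conversion happens after the file has been read in binary mode.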