Re: [clisp-list] reading of CR/LF for charset:iso-8859-1


Matt Kaufmann <kau...@cs...> writes:

> Hi --
>
> I maintain an application that is built on top of Common Lisp, which
> expects iso-8859-1 for the character encoding.  I'd like to set things
> up so that on a linux system, my application reads characters from a
> file exactly as they were written.  But my attempt to do so failed,
> dropping a #\Return character, as illustrated by the log below.  Is
> there something simple I can do to accomplish my goal, or else might
> that be the case in future CLISP releases?  Note that I did see the
> following note at http://www.clisp.org/impnotes/clhs-newline.html:
>
>   Justification. Unicode Newline Guidelines say: “Even if you know
>   which characters represents NLF on your particular platform, on
>   input and in interpretation, treat CR, LF, CRLF, and NEL the
>   same. Only on output do you need to distinguish between them.”
>
> However, I'm hoping that since I'm using iso-8859-1 rather than a utf
> encoding, maybe that justification doesn't need to apply.

No, it still applies: the newline conversion on input does not depend
on which character encoding you choose.

Since you want to read codes such as 13 and 10, you should specify an
element type of (unsigned-byte 8):

[pjb@kuiper :0.0 ~]$ clisp -ansi -norc -q
[1]> (deftype octet () '(unsigned-byte 8))
OCTET
[2]> (with-open-file (in #P"~/tmp/misc/wang.dos"
                     :element-type 'octet)
      (let ((buffer (make-array 256 :element-type 'octet)))
        (read-sequence buffer in)
        (search #(13 10) buffer)))
29
[3]> (quit)
[pjb@kuiper :0.0 ~]$ 
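
If you need the data as characters rather than octets, one possible
follow-up is to read the octets as above and convert them yourself.
The sketch below is only an illustration: READ-FILE-AS-LATIN-1 is a
hypothetical helper name, not something CLISP provides, and it assumes
that CODE-CHAR maps the integers 0 to 255 to the corresponding
ISO-8859-1 characters (which holds in implementations whose character
codes follow Unicode).

```lisp
;; Hypothetical helper (not part of CLISP): read a whole file as octets
;; and return a string with one character per octet, so that #\Return
;; and #\Linefeed both survive unchanged.
(defun read-file-as-latin-1 (path)
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let* ((octets (make-array (file-length in)
                               :element-type '(unsigned-byte 8)))
           ;; READ-SEQUENCE returns the index of the first element
           ;; that was not filled in.
           (end (read-sequence octets in)))
      ;; Assumption: ISO-8859-1 octet N is the character with code N,
      ;; so CODE-CHAR is enough and no decoding table is needed.
      (map 'string #'code-char (subseq octets 0 end)))))
```

Because the file is never opened as a character stream, no line
terminator conversion takes place, and (search (map 'string #'code-char
'(13 10)) string) should locate the CR/LF pairs in the result.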

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.
You can take the lisper out of the lisp job, but you can't take the lisp out
of the lisper (; -- antifuchs