Thread: [clisp-list] reading of CR/LF for charset:iso-8859-1

[clisp-list] reading of CR/LF for charset:iso-8859-1

From: Matt K. <kau...@cs...> - 2013-05-19 17:56:55

Hi --

I maintain an application that is built on top of Common Lisp, which
expects iso-8859-1 for the character encoding.  I'd like to set things
up so that on a linux system, my application reads characters from a
file exactly as they were written.  But my attempt to do so failed,
dropping a #\Return character, as illustrated by the log below.  Is
there something simple I can do to accomplish my goal, or else might
that be the case in future CLISP releases?  Note that I did see the
following note at http://www.clisp.org/impnotes/clhs-newline.html:

  Justification. Unicode Newline Guidelines say: “Even if you know
  which characters represents NLF on your particular platform, on
  input and in interpretation, treat CR, LF, CRLF, and NEL the
  same. Only on output do you need to distinguish between them.”

However, I'm hoping that since I'm using iso-8859-1 rather than a utf
encoding, maybe that justification doesn't need to apply.

Here is the log promised above.  It shows that after an attempt to set
custom:*default-file-encoding* appropriately, then after writing a
string to that file containing four characters including a #\Return
character, that character is dropped when reading back in.

dunnottar:~% /usr/bin/clisp
  i i i i i i i       ooooo    o        ooooooo   ooooo   ooooo
  I I I I I I I      8     8   8           8     8     o  8    8
  I  \ `+' /  I      8         8           8     8        8    8
   \  `-+-'  /       8         8           8      ooooo   8oooo
    `-__|__-'        8         8           8           8  8
        |            8     o   8           8     o     8  8
  ------+------       ooooo    8oooooo  ooo8ooo   ooooo   8

Welcome to GNU CLISP 2.49 (2010-07-07) <http://clisp.cons.org/>

Copyright (c) Bruno Haible, Michael Stoll 1992, 1993
Copyright (c) Bruno Haible, Marcus Daniels 1994-1997
Copyright (c) Bruno Haible, Pierpaolo Bernardi, Sam Steingold 1998
Copyright (c) Bruno Haible, Sam Steingold 1999-2000
Copyright (c) Sam Steingold, Bruno Haible 2001-2010

Type :h and hit Enter for context help.

[1]> (setq custom:*default-file-encoding*
           (ext:make-encoding :charset 'charset:iso-8859-1
                              :line-terminator :unix))
#<ENCODING CHARSET:ISO-8859-1 :UNIX>
[2]> (with-open-file
      (str "test.lisp" :direction :output)
      (princ (concatenate 'string "\"" "a"
                           (string #\Return)
                           (string #\Newline)
                           "b" "\"")
             str))
"\"a
b\""
[3]> (with-open-file
      (str "test.lisp" :direction :input)
      (let ((s (read str)))
	(list (length s) (char s 0) (char s 1) (char s 2))))
You are in the top-level Read-Eval-Print loop.
Help (abbreviated :h) = this list
Use the usual editing capabilities.
(quit) or (exit) leaves CLISP.
(3 #\a #\Newline #\b)
[4]> 

Thanks --
-- Matt Kaufmann

Re: [clisp-list] reading of CR/LF for charset:iso-8859-1

From: Pascal J. B. <pj...@in...> - 2013-05-19 21:16:37

Matt Kaufmann <kau...@cs...> writes:

> Hi --
>
> I maintain an application that is build on top of Common Lisp, which
> expects iso-8859-1 for the character encoding.  I'd like to set things
> up so that on a linux system, my application reads characters from a
> file exactly as they were written.  But my attempt to do so failed,
> dropping a #\Return character, as illustrated by the log below.  Is
> there something simple I can do to accomplish my goal, or else might
> that be the case in future CLISP releases?  Note that I did see the
> following note at http://www.clisp.org/impnotes/clhs-newline.html:
>
>   Justification. Unicode Newline Guidelines say: “Even if you know
>   which characters represents NLF on your particular platform, on
>   input and in interpretation, treat CR, LF, CRLF, and NEL the
>   same. Only on output do you need to distinguish between them.”
>
> However, I'm hoping that since I'm using iso-8859-1 rather than a utf
> encoding, maybe that justification doesn't need to apply.

No, it still applies.

Since you want to read codes such as 13 and 10, you should specify an
element type of (unsigned-byte 8):


[pjb@kuiper :0.0 ~]$ clisp -ansi -norc -q
[1]> (deftype octet () '(unsigned-byte 8))
OCTET
[2]> (with-open-file (in #P"~/tmp/misc/wang.dos"
                     :element-type 'octet)
      (let ((buffer (make-array 256 :element-type 'octet)))
        (read-sequence buffer in)
        (search #(13 10) buffer)))
29
[3]> (quit)
[pjb@kuiper :0.0 ~]$ 



-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.
You can take the lisper out of the lisp job, but you can't take the lisp out
of the lisper (; -- antifuchs

Re: [clisp-list] reading of CR/LF for charset:iso-8859-1

From: Matt K. <kau...@cs...> - 2013-05-19 22:20:57

Thank you very much for getting back to me so quickly.  That helps,
but I'd like to be able to read in code 13 using the function
READ-CHAR, and I don't see how to do that in CLISP, even though I can
do it in Allegro CL, CCL, CMUCL, LispWorks, SBCL, and GCL.  My sample
file contains six characters as follows, where the line break consists
of #\Return followed by #\Newline:

"x
y"

Below is a log showing how I get #\Return (code 13) using read-char in
those other lisps, but not CLISP.  Any suggestions?  But first, I
should mention that I tried the following in CLISP (though the error
probably won't surprise you) -- maybe you can suggest an alternative?

  [1]> (deftype octet () '(unsigned-byte 8))
  OCTET
  [2]> (with-open-file (in #P"test0"
			   :element-type 'octet)
		       (read-char in))

  *** - READ-CHAR on #<INPUT BUFFERED FILE-STREAM (UNSIGNED-BYTE 8) #P"test0">
	is illegal
  The following restarts are available:
  ABORT          :R1      Abort main loop
  Break 1 [3]> 

Anyhow, here is the log promised above.

dunnottar:~/temp% acl9
International Allegro CL Enterprise Edition
9.0 [64-bit Linux (x86-64)] (Jul 11, 2012 14:33)
Copyright (C) 1985-2012, Franz Inc., Oakland, CA, USA.  All Rights Reserved.

This development copy of Allegro CL is licensed to:
   [TC20122] University of Texas

;; Optimization settings: safety 1, space 1, speed 1, debug 2.
;; For a complete description of all compiler switches given the
;; current optimization settings evaluate (EXPLAIN-COMPILER-SETTINGS).
CL-USER(1): (setq *locale* (find-locale "C"))
#<locale "C" [:LATIN1-BASE] @ #x100004067b2>
CL-USER(2): (let (ch)
    (with-open-file (in #P"test0")
                    (loop while (setq ch (read-char in nil))
                          collect (char-code ch))))
(34 120 13 10 121 34 13 10)
CL-USER(3): (exit)
; Exiting
dunnottar:~/temp% ccl
Starting 64-bit CCL
Welcome to Clozure Common Lisp Version 1.9-dev-r15542M-trunk  (LinuxX8664)!
? (setq ccl:*default-file-character-encoding* :iso-8859-1)
:ISO-8859-1
? (let (ch)
    (with-open-file (in #P"test0")
                    (loop while (setq ch (read-char in nil))
                          collect (char-code ch))))
(34 120 13 10 121 34 13 10)
? (quit)
dunnottar:~/temp% cmucl
CMU Common Lisp snapshot-2013-05 (20D Unicode), running on dunnottar
With core: /v/filer4b/v11q001/acl2/lisps/cmucl-snapshot-2013-05-20D-Unicode/lib/cmucl/lib/lisp-sse2.core
Dumped on: Sat, 2013-05-11 11:18:42-05:00 on lorien2
See <http://www.cmucl.org/> for support information.
Loaded subsystems:
    Unicode 1.29 with Unicode version 6.2.0
    Python 1.1, target Intel x86/sse2
    CLOS based on Gerd's PCL 2010/03/19 15:19:03
* (setq *default-external-format* :iso-8859-1)

:ISO-8859-1
* (let (ch)
    (with-open-file (in #P"test0")
                    (loop while (setq ch (read-char in nil))
                          collect (char-code ch))))

(34 120 13 10 121 34 13 10)
* (quit)
; dunnottar:~/temp% lispworks
Starting 64-bit Lispworks
LispWorks(R): The Common Lisp Programming Environment
Copyright (C) 1987-2012 LispWorks Ltd.  All rights reserved.
Version 6.1.1
Saved by kaufmann as lw-terminal-only, at 26 Nov 2012 15:23
User kaufmann on dunnottar

CL-USER 1 > (setq stream::*default-external-format* '(:LATIN-1 :EOL-STYLE :LF))
(:LATIN-1 :EOL-STYLE :LF)

CL-USER 2 > (defun our-file-encoding (pathname ef-spec buffer length)
              (system:merge-ef-specs ef-spec '(:LATIN-1 :EOL-STYLE :LF)))
OUR-FILE-ENCODING

CL-USER 3 > (setq system::*file-encoding-detection-algorithm*
                  '(our-file-encoding))
(OUR-FILE-ENCODING)

CL-USER 4 > (let (ch)
              (with-open-file (in #P"test0")
                              (loop while (setq ch (read-char in nil))
                                    collect (char-code ch))))
(34 120 13 10 121 34 13 10)

CL-USER 5 > (quit)
dunnottar:~/temp% sbcl
Starting 64-bit SBCL
This is SBCL 1.1.4, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
* (setq sb-impl::*default-external-format* :iso-8859-1)

:ISO-8859-1
* (let (ch)
    (with-open-file (in #P"test0")
                    (loop while (setq ch (read-char in nil))
                          collect (char-code ch))))

(34 120 13 10 121 34 13 10)
* (quit)
dunnottar:~/temp% gcl
GCL (GNU Common Lisp)  2.6.8 CLtL1    May 11 2013 16:43:51
Source License: LGPL(gcl,gmp), GPL(unexec,bfd,xgcl)
Binary License:  GPL due to GPL'ed components: (XGCL READLINE UNEXEC)
Modifications of this banner must retain notice of a compatible license
Dedicated to the memory of W. Schelter

Use (help) to get some basic information on how to use GCL.
Temporary directory for compiler files set to /tmp/

>(let (ch)
   (with-open-file (in #P"test0")
                   (loop while (setq ch (read-char in nil))
                         collect (char-code ch))))

(34 120 13 10 121 34 13 10)

>(quit)
dunnottar:~/temp% clisp
  i i i i i i i       ooooo    o        ooooooo   ooooo   ooooo
  I I I I I I I      8     8   8           8     8     o  8    8
  I  \ `+' /  I      8         8           8     8        8    8
   \  `-+-'  /       8         8           8      ooooo   8oooo
    `-__|__-'        8         8           8           8  8
        |            8     o   8           8     o     8  8
  ------+------       ooooo    8oooooo  ooo8ooo   ooooo   8

Welcome to GNU CLISP 2.49 (2010-07-07) <http://clisp.cons.org/>

Copyright (c) Bruno Haible, Michael Stoll 1992, 1993
Copyright (c) Bruno Haible, Marcus Daniels 1994-1997
Copyright (c) Bruno Haible, Pierpaolo Bernardi, Sam Steingold 1998
Copyright (c) Bruno Haible, Sam Steingold 1999-2000
Copyright (c) Sam Steingold, Bruno Haible 2001-2010

Type :h and hit Enter for context help.

[1]> (setq custom:*default-file-encoding*
           (ext:make-encoding :charset 'charset:iso-8859-1
                              :line-terminator :unix))
#<ENCODING CHARSET:ISO-8859-1 :UNIX>
[2]> (let (ch)
       (with-open-file (in #P"test0")
                       (loop while (setq ch (read-char in nil))
                             collect (char-code ch))))
(34 120 10 121 34 10)
[3]> (quit)
Bye.
dunnottar:~/temp% 

Thanks --
-- Matt

Re: [clisp-list] reading of CR/LF for charset:iso-8859-1

From: Pascal J. B. <pj...@in...> - 2013-05-19 23:31:56

Matt Kaufmann <kau...@cs...> writes:

> Thank you very much for getting back to me so quickly.  That helps,
> but I'd like to be able to read in code 13 using the function
> READ-CHAR, and I don't see how to do that in CLISP, even though I can
> do it in Allegro CL, CCL, CMUCL, LispWorks, SBCL, and GCL.  My sample
> file contains six characters as follows, where the line break consists
> of #\Return followed by #\Newline:

I told you how to do it.  Read bytes, not characters.
You can always convert bytes to characters later with
ext:convert-string-from-bytes


(deftype octet () '(unsigned-byte 8))

(with-open-file (in #P"~/tmp/misc/wang.dos"
                     :element-type 'octet)
  (loop
    :for byte = (read-byte in nil in)
    :until (eq byte in)
    :do (case byte
          ((13) (princ " CR"))
          ((10) (princ " LF") (princ #\Newline))
          (otherwise
           (if (or (<= 32 byte 126)
                   (<= 160 byte 255))
               (princ (ext:convert-string-from-bytes (vector byte) charset:iso-8859-1))
               (format t "<CODE ~D>" byte)))))
  (values))

Hao Wang, logicien americain. CR LF
 CR LF
L'algorithme en  question  a  ete  publie  en  1960  dans l'IBM Journal, CR LF
article intitule "Toward  Mechanical Mathematics", avec des variantes et CR LF
une  extension au calcul  des  predicats.  Il  s'agit  ici  du  "premier CR LF
programme" de Wang, systeme "P". CR LF
 CR LF
L'article a ete ecrit en 1958, et les experiences effectuees sur IBM 704 CR LF
- machine a lampes, 32 k  mots  de 36 bits, celle-la meme qui vit naitre CR LF
LISP a la meme epoque. Le programme  a  ete ecrit en assembleur (Fortran CR LF
existait, mais il ne s'etait pas encore impose)  et  l'auteur estime que CR LF
"there is very little in the program that is not straightforward". CR LF
 CR LF
Il observe que les preuves engendrees sont "essentiellement des arbres", CR LF
et  annonce  que  la  machine  a  demontre 220 theoremes du  calcul  des CR LF
propositions  (tautologies)  en  3  minutes. Il en tire argument pour la CR LF
superiorite  d'une  approche  algorithmique  par  rapport a une approche CR LF
heuristique comme celle du "Logic Theorist" de Newell, Shaw et  Simon (a CR LF
partir de 1956 sur la machine JOHNNIAC de la Rand Corporation): un debat CR LF
qui dure encore... CR LF
 CR LF
Cet  algorithme  a  ete popularise par J. McCarthy, comme exemple-fanion CR LF
d'application  de LISP. Il figure dans le manuel de la premiere  version CR LF
de  LISP  (LISP  1,  sur IBM 704 justement, le manuel est date  de  Mars CR LF
1960), et il a ete repris dans le celebre "LISP 1.5 Programmer's Manual" CR LF
publie en 1962 par MIT Press, un des maitres-livres de l'Informatique. CR LF
 CR LF
 CR LF
 CR LF



> Below is a log showing how I get #\Return (code 13) using read-char in
> those other lisps, but not CLISP.  Any suggestions?  

You're trying to read a binary stream containing control codes.  So read
it as such, process the control codes, and convert the bytes that encode
characters into strings.  See above.



> CL-USER(1): (setq *locale* (find-locale "C"))
> #<locale "C" [:LATIN1-BASE] @ #x100004067b2>
> CL-USER(2): (let (ch)
>     (with-open-file (in #P"test0")
>                     (loop while (setq ch (read-char in nil))
>                           collect (char-code ch))))
> (34 120 13 10 121 34 13 10)

You should do the reverse: read bytes, and convert them to characters
when they are bytes encoding characters.
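
As a minimal sketch of the byte-level counterpart to the READ-CHAR loop
quoted above, one could collect the raw codes as follows.  The name
file-octet-codes is made up for illustration; it is not part of CLISP
or of any library.

```lisp
;; Hypothetical helper: return every byte of PATHNAME as a list of integers.
;; Opening the file with an (unsigned-byte 8) element type bypasses character
;; decoding, so no line-terminator translation is applied to the data.
(defun file-octet-codes (pathname)
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (loop :for byte = (read-byte in nil nil)   ; NIL signals end of file
          :while byte
          :collect byte)))
```

Called on a file like test0, this should return the CR (13) and LF (10)
codes unchanged, because they are read as plain bytes rather than as
characters.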

Beware also that char-code and code-char use an unspecified code.
Prefer functions such as #+clisp ext:convert-string-from-bytes or
com.informatimago.common-lisp.cesarum.ascii:ascii-string or from the
babel package, which use a definite encoding.


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.
You can take the lisper out of the lisp job, but you can't take the lisp out
of the lisper (; -- antifuchs

Re: [clisp-list] reading of CR/LF for charset:iso-8859-1

From: Matt K. <kau...@cs...> - 2013-05-20 13:12:07

Thank you; that makes sense.

Returning to my original problem, regarding READ rather than
READ-CHAR:

Your first reply told me how to do the read I desired by reading a
string into a buffer.  But for my purposes, I'd like to call READ on
arbitrary objects.  The following example illustrates how I would like
READ to invert PRIN1.  Using the same setups as I sent you before, CLISP
returns nil in the following case, while the other six Lisps I
mentioned all return t.  Although x is a particular string in this
example, imagine that it could be any sort of object for which
equality can be tested by EQUAL.  Is there a way to set up CLISP so
that this never returns nil?

(let ((x (concatenate 'string
                      "a" (string (code-char 13)) (string #\Newline) "b")))
  (delete-file "out0")
  (with-open-file
   (out "out0" :direction :output)
   (prin1 x out))
  (with-open-file
   (in "out0" :direction :input)
   (equal (read in) x)))

Interestingly, I see that the following returns the value
(4 #\a #\Return #\Newline #\b):

(let ((s (read-from-string
          (concatenate 'string
                       "\"" "a" (string (code-char 13)) (string #\Newline) "b" "\""))))
  (list (length s) (char s 0) (char s 1) (char s 2) (char s 3)))

So one answer is to read the entire file into a string using READ-BYTE
and EXT:CONVERT-STRING-FROM-BYTES, and then call READ-FROM-STRING on
that string instead of calling READ on a stream, maintaining the next
position from which to read.  Perhaps I could even arrange for a
string input stream, so that I don't need to maintain the position.
But I'd prefer simply to use just READ, and I'd hoped that I could do
so after evaluating the following, but that's not the case.

(setq custom:*default-file-encoding*
      (ext:make-encoding :charset 'charset:iso-8859-1
                         :line-terminator :unix))

Any suggestions for how I can call read so that it inverts prin1 in
the sense explained above?
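
The workaround described above (read bytes, convert them to a string,
then READ from a string stream) might be sketched as follows.  This is
only a sketch under assumptions: the three function names are invented
for illustration, the CLISP branch assumes that
ext:convert-string-from-bytes leaves code 13 untouched, and the
non-CLISP branch assumes that code-char agrees with ISO-8859-1 for
codes 0 to 255.

```lisp
;; Hypothetical helper: slurp PATHNAME as a vector of octets.
(defun read-file-octets (pathname)
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let ((octets (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence octets in)
      octets)))

;; Hypothetical helper: decode ISO-8859-1 octets into a string.
;; Assumption: neither branch performs any line-terminator conversion.
(defun latin-1-octets-to-string (octets)
  #+clisp (ext:convert-string-from-bytes octets charset:iso-8859-1)
  #-clisp (map 'string #'code-char octets))

;; Hypothetical helper: READ every form from PATHNAME through a string
;; input stream, so the reader sees the characters as they were decoded
;; and there is no need to track a read position by hand.
(defun read-all-forms-verbatim (pathname)
  (let ((text (latin-1-octets-to-string (read-file-octets pathname)))
        (eof (list nil)))                      ; unique end-of-file marker
    (with-input-from-string (in text)
      (loop :for form = (read in nil eof)
            :until (eq form eof)
            :collect form))))
```

The string input stream is used on the strength of the
READ-FROM-STRING experiment above, which kept #\Return inside the
string; whether that holds for every CLISP version is not verified
here.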

Thanks --
-- Matt

Re: [clisp-list] reading of CR/LF for charset:iso-8859-1

From: Pascal J. B. <pj...@in...> - 2013-05-20 16:19:27

Matt Kaufmann <kau...@cs...> writes:

> Thank you; that makes sense.
>
> Returning to my original problem, regarding READ rather than
> READ-CHAR:
>
> You first reply told me how to do the read I desired by reading a
> string into a buffer.  But for my purposes, I'd like to call READ on
> arbitrary objects.  The following example illustrates how I would like
> READ to invert PRIN1.  Using same set-ups as I sent you before, CLISP
> returns nil in the following case, while the other six Lisps I
> mentioned all return t.  Although x is a particular string in this
> example, imagine that it could be any sort of object for which
> equality can be tested by EQUAL.  Is there a way to set up CLISP so
> that this never returns nil?
>
> (let ((x (concatenate 'string
>                       "a" (string (code-char 13)) (string #\Newline) "b")))
>   (delete-file "out0")
>   (with-open-file
>    (out "out0" :direction :output)
>    (prin1 x out))
>   (with-open-file
>    (in "out0" :direction :input)
>    (equal (read in) x)))
>
> Interestingly, I see that the following returns the value
> (4 #\a #\Return #\Newline #\b):
>
> (let ((s (read-from-string
>           (concatenate 'string
>                        "\"" "a" (string (code-char 13)) (string #\Newline) "b" "\""))))
>   (list (length s) (char s 0) (char s 1) (char s 2) (char s 3)))
>
> So one answer is to read the entire file into a string using READ-BYTE
> and EXT:CONVERT-STRING-FROM-BYTES, and then call READ-FROM-STRING on
> that string instead of calling READ on a stream, maintaining the next
> position from which to read.  Perhaps I could even arrange for a
> string input stream, so that I don't need to maintain the position.
> But I'd prefer simply to use just READ, and I'd hoped that I could do
> so after evaluating the following, but that's not the case.
>
> (setq custom:*default-file-encoding*
>       (ext:make-encoding :charset 'charset:iso-8859-1
>                          :line-terminator :unix))
>
> Any suggestions for how I can call read so that it inverts prin1 in
> the sense explained above?

How is READ related to CRLF vs. CR vs. LF?


I still don't understand why you're concerned with how the lines are
terminated.  While it's understandable that you may want to generate
text files with a definite line termination sequence (eg. on MS-Windows
you want CRLF, but on Unix you want LF), while reading text files, why
would you care what line terminator is used?


(defparameter *unix-external-format*
  (ext:make-encoding :charset charset:iso-8859-1
                     :line-terminator :unix
                     :input-error-action :error 
                     :output-error-action :error))

(defparameter *dos-external-format*
  (ext:make-encoding :charset charset:iso-8859-1
                     :line-terminator :dos
                     :input-error-action :error 
                     :output-error-action :error))

(defun dump (pathname &optional (*standard-output* *standard-output*))
  (with-open-file (data pathname :element-type '(unsigned-byte 8))
    (let ((buffer (make-array 16 :element-type '(unsigned-byte 8))))
      (loop
        :for offset :from 0 :by 16
        :for size = (read-sequence buffer data)
        :while (plusp size)
        :do (format t "~&~8,'0X:~{ ~2,'0X~} ~{~A~}~%"
                    offset
                    (coerce (subseq buffer 0 size) 'list)
                    (map 'list (lambda (code)
                                 (if (or (<= 32 code 126)
                                         (<= 160 code 255))
                                     (code-char code)
                                     "?"))
                         (subseq buffer 0 size)))))))

(defun demo ()
  (loop
    :for efname :in '(unix dos)
    :for external-format :in (list *unix-external-format* *dos-external-format*)
    :do (print efname)
    :do (with-open-file (src "/tmp/src.lisp" 
                             :direction :output
                             :external-format external-format
                             :if-does-not-exist :create
                             :if-exists :supersede)
          (let ((*print-right-margin* 20))
            (pprint '(defun fact (x)
                      (if (zerop x)
                          1
                          (* x (fact (1- x)))))
                    src)))
    :do (dump "/tmp/src.lisp")
    :do (with-open-file (src "/tmp/src.lisp" 
                             :direction :input
                             :external-format charset:iso-8859-1)
          (print (read src))
          (terpri) (terpri)))
  (values))

(demo)

UNIX 
00000000: 0A 28 44 45 46 55 4E 20 46 41 43 54 20 28 58 29 ?(DEFUN FACT (X)
00000010: 0A 20 28 49 46 20 28 5A 45 52 4F 50 20 58 29 20 ? (IF (ZEROP X) 
00000020: 31 0A 20 20 28 2A 20 58 0A 20 20 20 28 46 41 43 1?  (* X?   (FAC
00000030: 54 20 28 31 2D 20 58 29 29 29 29 29 T (1- X)))))

(DEFUN FACT (X) (IF (ZEROP X) 1 (* X (FACT (1- X))))) 


DOS 
00000000: 0D 0A 28 44 45 46 55 4E 20 46 41 43 54 20 28 58 ??(DEFUN FACT (X
00000010: 29 0D 0A 20 28 49 46 20 28 5A 45 52 4F 50 20 58 )?? (IF (ZEROP X
00000020: 29 20 31 0D 0A 20 20 28 2A 20 58 0D 0A 20 20 20 ) 1??  (* X??   
00000030: 28 46 41 43 54 20 28 31 2D 20 58 29 29 29 29 29 (FACT (1- X)))))

(DEFUN FACT (X) (IF (ZEROP X) 1 (* X (FACT (1- X))))) 



As you can see, you can write files with either the unix or the dos line
terminator, and when you read them using an unspecified line terminator,
they read as the same text (and therefore the same sexp).  This allows
you to pass files seamlessly between MacOS, Unix (including MacOSX), and
MS-Windows, reading them with the same clisp program.



-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.
You can take the lisper out of the lisp job, but you can't take the lisp out
of the lisper (; -- antifuchs

Re: [clisp-list] reading of CR/LF for charset:iso-8859-1

From: Matt K. <kau...@cs...> - 2013-05-21 04:21:22

Hi --

The problem probably only arises when lines break in the middle of
strings.  To see what I mean, replace '(defun fact (x) ...) in your
definition of demo with the following.

  (concatenate 'string
	       "a" (string (code-char 13)) (string #\Newline) "b")

Here are the results.  Notice that this time, the results are
different (but that probably won't surprise you).

[5]> (demo)

UNIX 
00000000: 0A 22 61 0D 0A 62 22 ?"a??b"

"a
b" 


DOS 
00000000: 0D 0A 22 61 0D 0D 0A 62 22 ??"a???b"

"a

b" 


[6]> 

Anyhow, maybe that answers your question:

>> How is READ related to CRLF vs. CR vs. LF?

That is: the issue is when CRLF or CR or LF is in the middle of a
string object.
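
The two hex dumps above can be reproduced with a small model of the
translation.  This is an illustration only, not CLISP's actual
implementation: encode-newlines and decode-newlines are made-up names,
and the model assumes that output replaces each #\Newline by the chosen
terminator while input treats CR, LF, and CR LF alike.

```lisp
;; Model of output: each Newline character becomes LF (:unix) or CR LF (:dos);
;; every other character, including Return, is written as its own code.
(defun encode-newlines (string line-terminator)
  (let ((codes '()))
    (loop :for ch :across string
          :do (if (char= ch #\Newline)
                  (ecase line-terminator
                    (:unix (push 10 codes))
                    (:dos (push 13 codes) (push 10 codes)))
                  (push (char-code ch) codes)))
    (nreverse codes)))

;; Model of input: CR, LF, and CR LF each read back as one Newline character.
(defun decode-newlines (codes)
  (with-output-to-string (out)
    (loop :while codes
          :do (let ((code (pop codes)))
                (cond ((= code 13)
                       ;; Swallow the LF of a CR LF pair.
                       (when (and codes (= (first codes) 10))
                         (pop codes))
                       (write-char #\Newline out))
                      ((= code 10)
                       (write-char #\Newline out))
                      (t
                       (write-char (code-char code) out)))))))
```

For the four-character string "a", #\Return, #\Newline, "b", the :unix
model produces the codes (97 13 10 98), which decode to a
three-character string, and the :dos model produces (97 13 13 10 98),
which decode to a four-character string containing two newlines.  Both
match the dumps above, and in neither case does the round trip give
back the original string.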

As I mentioned in my preceding email, I would like READ to invert
PRIN1.  This seems a natural thing to want, though I'm not claiming
it's required of CLISP or any other Lisp.  In my preceding email I
gave an example where this isn't the case in CLISP (but it is the case
in the other six Lisps I tested).  As I showed, setting
custom:*default-file-encoding* doesn't help.  Perhaps nothing helps,
but if there is a way for READ to invert PRIN1 (in the sense of the
example I sent in my preceding email), that would be great to know.
If not -- thanks anyhow for your time.

Regards,
Matt

Re: [clisp-list] reading of CR/LF for charset:iso-8859-1

From: Pascal J. B. <pj...@in...> - 2013-05-22 20:30:57

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.
You can take the lisper out of the lisp job, but you can't take the lisp out
of the lisper (; -- antifuchs

Re: [clisp-list] reading of CR/LF for charset:iso-8859-1

From: Pascal J. B. <pj...@in...> - 2013-05-22 21:47:43

Matt Kaufmann <kau...@cs...> writes:

> Hi --
>
> The problem probably only arises when lines break in the middle of
> strings.  To see what I mean, replace '(defun fact (x) ...) in your
> definition of demo with the following.
>
>   (concatenate 'string
> 	       "a" (string (code-char 13)) (string #\Newline) "b")
>
> Here are the results.  Notice that this time, the results are
> different (but that probably won't surprise you).
>

Ok, let's consider a string like:

(defparameter *str* "Hello
World")

Obviously, this string contains a new line.

Again, why do you care whether there's a CRLF code sequence or just a LF
code in the file?

CL-USER> (with-open-file (src "/tmp/a.lisp" :external-format (ext:make-encoding :charset charset:iso-8859-1 
                                                                                :line-terminator :dos))
           (read src))
(DEFPARAMETER *STR*
 "Hello
World")
CL-USER> (load "/tmp/a.lisp" :external-format (ext:make-encoding :charset charset:iso-8859-1 
                                                                 :line-terminator :dos))
;; Loading file /tmp/a.lisp ...
;; Loaded file /tmp/a.lisp
#P"/tmp/a.lisp"
CL-USER> (length *str*)
11
CL-USER> 

On the other hand, if you care whether your sequence contains codes 13
10 or just 10, why do you use strings?

   (concatenate 'vector #(93) #(13) #(10) #(94))
   --> #(93 13 10 94)

or just:

   (vector 93 13 10 94)
   --> #(93 13 10 94)

or just:

   #(93 13 10 94)

Now if you want to insert a lot of ASCII-encoded bytes, you can always
write a reader macro:

(defun c-escaped-character-map (escaped-character)
  (case escaped-character
    ((#\newline) -1)
    ((#\a)        7)
    ((#\b)        8)
    ((#\t)        9)
    ((#\n)       10)
    ((#\v)       11)
    ((#\f)       12)
    ((#\r)       13)
    ((#\x)       :hexa)
    ((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7) :octal)
    (otherwise   :default)))

(defun character-code-reader-macro (stream quotation-mark)
  (declare (ignore quotation-mark))
  (flet ((encode (ch)
           ;; TODO: Use babel or something else to get the unicode
           ;;       code point of the character.
           (char-code ch)))
    (let ((ch (read-char stream)))
      (if (char= #\\ ch)
          ;; LET* so that CODE is computed from the escaped character,
          ;; not from the backslash bound by the outer LET.
          (let* ((ch (read-char stream))
                 (code (c-escaped-character-map ch)))
            (flet ((read-code (*read-base* base-name)
                     (let ((code (read stream t nil t)))
                       (if (and (integerp code) (<= 0 code (1- char-code-limit)))
                           code
                           (error "Invalid ~A character code: ~A" base-name code)))))
              (case code
                (:hexa  (read-code 16 "hexadecimal"))
                (:octal (unread-char ch stream) ; first digit belongs to the number
                        (read-code  8 "octal"))
                (:default ;; In emacs ?\x = ?x
                 (encode ch))
                (otherwise code))))
          ;; or use #+clisp ext:string-to-bytes :
          (encode ch)))))

(set-macro-character #\? 'character-code-reader-macro t)

#(?a ?\a ?\r ?\n ?b ?\b ?\x41 ?\61 ?\\ ?\z ?' ?\')
--> #(97 7 13 10 98 8 65 49 92 122 39 39)

(See also:
http://paste.lisp.org/display/137262
for a C string reader.)

> Anyhow, maybe that answers your question:
>
>>> How is READ related to CRLF vs. CR vs. LF?
>
> That is: the issue is when CRLF or CR or LF is in the middle of a
> string object.
>
> As I mentioned in my preceding email, I would like READ to invert
> PRIN1.  This seems a natural thing to want, though I'm not claiming
> it's required of CLISP or any other Lisp.  

Again, what is in the string is a newline.  What clisp will read is a
newline, and what clisp will print is a newline.  Newlines
everywhere. :-)

If you should care about the codes, then you should use binary streams,
and read and write bytes, not text.  READ and PRIN1 read and write text.
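
As a rough illustration of the binary-stream approach (a sketch only; the helper names and the path are hypothetical), the codes can be written and read back as (UNSIGNED-BYTE 8) elements, so that no line-terminator conversion is involved:

```lisp
;; Hypothetical helper names; the path is illustrative only.
(defun write-codes (pathname codes)
  "Write the integers in CODES (each in 0..255) to PATHNAME as raw bytes."
  (with-open-file (out pathname :direction :output
                                :element-type '(unsigned-byte 8)
                                :if-exists :supersede)
    (write-sequence (coerce codes '(vector (unsigned-byte 8))) out))
  pathname)

(defun read-codes (pathname)
  "Return the raw bytes of PATHNAME as a vector of integers."
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let ((codes (make-array (file-length in)
                             :element-type '(unsigned-byte 8))))
      (read-sequence codes in)
      codes)))

;; Codes 13 and 10 are written and read back unchanged.
(write-codes "/tmp/codes.bin" #(97 13 10 98))
(read-codes "/tmp/codes.bin")
```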

What YOU should not do is insert into strings non-character
characters such as #\return.  For one thing, they make your program
non-conforming, since they are only semi-standard (i.e. an implementation
may just not have them).

    (concatenate 'string
      "\"" "a" (string (code-char 13)) (string #\Newline) "b" "\"")
                       --------------
                              ^
                              |
The error is here ------------+

> [5]> (demo)
>
> UNIX 
> 00000000: 0A 22 61 0D 0A 62 22 ?"a??b"
>
> "a
> b" 

If you consider that this file is wrongly encoded (I could agree with
you on this point, IF I admitted #\return (and other such strange
"characters") in strings), then I will argue that the following file is
also ill-formed:

> DOS 
> 00000000: 0D 0A 22 61 0D 0D 0A 62 22 ??"a???b"
>
> "a
>
> b" 

Because a stray CR in a DOS file is not a good idea either.

Again, are we talking about text files?  
Or about teletype control binary streams?

It is not only #\return and its ilk that you should avoid in strings.

Let's take for example #\xd800.  You should not insert this so-called
"character" into strings either, because it is not a character.  It's a
Unicode code point that doesn't encode any character (or even any
character part!)

If you were to put such a "character" in a clisp string, and write out a
file (e.g. using utf-8 or utf-16 encoding), you would most probably
create an invalid file.  Just like your two files above.  (The first
is not a valid unix text file, the second is not a valid DOS text file).

By the way, some implementations just don't have a character with code
#xd800:

    #+ccl (code-char #xd800) --> NIL

The codes between 0 and 31, 127, and between 128 and 159, to talk only
of the codes in the iso-8859-1 range, are similar: they don't encode
characters, and you should just NOT include them in any string, and of
course, not write them in a TEXT file (you can write those codes in a
binary file, if such a binary file format requires them).
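
A small sketch of how a program could check a string for such codes before writing it to a text file. The function names are hypothetical, #\Newline is deliberately let through, and the code assumes CHAR-CODE returns the iso-8859-1/Unicode code of a character, which the standard does not guarantee:

```lisp
(defun control-code-p (code)
  "True if CODE lies in 0..31 or 127..159, the ranges listed above."
  (or (<= 0 code 31) (<= 127 code 159)))

(defun control-character-positions (string)
  "Return the positions in STRING of control characters other than #\\Newline."
  (loop :for ch :across string
        :for position :from 0
        ;; Assumption: CHAR-CODE yields the iso-8859-1/Unicode code.
        :when (and (char/= ch #\Newline)
                   (control-code-p (char-code ch)))
          :collect position))

;; A string with (code-char 13) before the newline is flagged at position 1.
(control-character-positions (format nil "a~C~%b" (code-char 13)))

;; A string that contains only a newline is accepted (returns NIL).
(control-character-positions (format nil "Hello~%World"))
```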


Re: [clisp-list] reading of CR/LF for charset:iso-8859-1

From: Matt K. <kau...@cs...> - 2013-05-23 00:29:02

Hi --

Certainly I'd never knowingly write a #\Return, i.e. (code-char 13),
into a text file.  This problem showed up when a user of my
application created a file with #\Return characters, probably on a
Windows system, that I read back in on a Linux system.  Even then,
it's not exactly a huge problem, since the #\Return in front of each
#\Newline was dropped by READ ("newlines everywhere", as you say).
But we support 7 host Lisps, and our application, ACL2, was
complaining because a checksum computed was different for CLISP than
for the other six Lisps.  (I don't want to go into the whole story
about how ACL2 "certifies books", computes checksums, etc....)

Anyhow, thanks for your time.  I think I see your point and I may
simply not worry about the dropped #\Return characters.  Or perhaps we
should indeed disallow non-text characters such as #\Return in text
files; I may consider that.  As I've stated a couple of times, we
would like to read back in what was written; but I don't want to
defend that.  I can live with CLISP not behaving like those other six
Lisps, and I think you've answered my original question:

   Again, what is in the string is a newline.  What clisp will read is a
   newline, and what clisp will print is a newline.  Newlines
   everywhere. :-)

-- Matt

Re: [clisp-list] reading of CR/LF for charset:iso-8859-1

From: Pascal J. B. <pj...@in...> - 2013-05-23 18:33:59

Matt Kaufmann <kau...@cs...> writes:

> Hi --
>
> Certainly I'd never knowingly write a #\Return, i.e. (code-char 13),
> into a text file.  This problem showed up when a user of my
> application created a file with #\Return characters, probably on a
> Windows system, that I read back in on a Linux system.  Even then,
> it's not exactly a huge problem, since the #\Return in front of each
> #\Newline was dropped by READ ("newlines everywhere", as you say).
> But we support 7 host Lisps, and our application, ACL2, was
> complaining because a checksum computed was different for CLISP than
> for the other six Lisps.  (I don't want to go into the whole story
> about how ACL2 "certifies books", computes checksums, etc....)

Too bad because that's obviously where the problem lies!

You should do the checksum either at the level of the characters in
lisp, or at the level of the bytes in the file.  Having checksums differ
because clisp ignores a CR while reading a text file obviously shows
that the checksum is not done as it should be.

But I won't elaborate, since you don't want to go into it. ;-)
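
For what it's worth, a checksum at the level of the bytes in the file could be sketched as follows. This is only an illustration with hypothetical names and paths, using a plain byte sum modulo 2^32; it is not ACL2's actual checksum algorithm.

```lisp
(defun byte-checksum (pathname)
  "Sum of all bytes of PATHNAME modulo 2^32 (illustrative algorithm only)."
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (loop :with sum = 0
          :for byte = (read-byte in nil nil)
          :while byte
          :do (setf sum (mod (+ sum byte) (expt 2 32)))
          :finally (return sum))))

(defun write-bytes (pathname bytes)
  "Write the list BYTES to PATHNAME as raw bytes (helper for the example)."
  (with-open-file (out pathname :direction :output
                                :element-type '(unsigned-byte 8)
                                :if-exists :supersede)
    (dolist (byte bytes)
      (write-byte byte out)))
  pathname)

;; "a<LF>b" and "a<CR><LF>b" differ by one byte with code 13.
(write-bytes "/tmp/lf.txt"   '(97 10 98))
(write-bytes "/tmp/crlf.txt" '(97 13 10 98))

(list (byte-checksum "/tmp/lf.txt") (byte-checksum "/tmp/crlf.txt"))
;; => (205 218)
```

Since the file is opened with an (UNSIGNED-BYTE 8) element type, no encoding or line terminator is involved, so every Lisp sees the same bytes and the two files are simply different files.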
