From: Sam Steingold <sds@gn...> - 2002-07-25 17:14:15
It has been requested on many occasions that CLISP provide an option to
treat CR/LF/CR+LF differently on character input (right now all three
are read as #\Newline when the STREAM-ELEMENT-TYPE is CHARACTER).
The answer to these requests has been to use binary i/o.
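For reference, the binary i/o workaround amounts to reading raw octets and
deciding yourself what counts as a line end. The following is only a sketch in
portable ANSI Common Lisp, assuming an ASCII-compatible file (CR = 13,
LF = 10); CLASSIFY-LINE-TERMINATORS is a hypothetical helper, not part of CLISP.

```lisp
;; Hypothetical helper: report every line terminator in a file as
;; :CR, :LF or :CRLF by reading it as (UNSIGNED-BYTE 8), so no
;; ENCODING line-terminator translation takes place.
;; Assumes an ASCII-compatible file (CR = 13, LF = 10).
(defun classify-line-terminators (pathname)
  "Return a list of :CR, :LF and :CRLF, one per line terminator in PATHNAME."
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let ((result '())
          (pending-cr nil))               ; T when the previous octet was CR
      (loop for byte = (read-byte in nil nil)
            while byte
            do (cond ((= byte 10)         ; LF, possibly completing CR+LF
                      (push (if pending-cr :crlf :lf) result)
                      (setf pending-cr nil))
                     (t
                      (when pending-cr    ; previous CR was not followed by LF
                        (push :cr result))
                      (setf pending-cr (= byte 13)))))
      (when pending-cr                    ; file ended right after a CR
        (push :cr result))
      (nreverse result))))
```

Because the stream element type is (UNSIGNED-BYTE 8), the caller sees the
distinction between a lone LF and CR+LF that a CHARACTER stream hides.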
Six months ago it was suggested that a :LINE-TERMINATOR-STRICT-P option
be added to the ENCODING object.
The problem is that this feature will produce unexpected results:
READ-LINE will return strings with embedded #\Newline!
ANSI does not appear to forbid it.
In CLISP, #\Newline is identical to #\Linefeed (which is specifically
permitted by <http://www.lisp.org/HyperSpec/Body/sec_13-1-7.html>).
Therefore, if the file is exactly this string:
  (concatenate 'string "foo" (string #\Linefeed) "bar"
               (string #\Return) (string #\Linefeed))
and we open it with
  (setq e (make-encoding :charset "ascii" :line-terminator :dos))
  (setq s (open "foo" :external-format e))
then, under the proposed strict behavior, the string returned by
(READ-LINE s) will contain an embedded #\Newline between "foo" and "bar":
a lone #\Linefeed is not a line terminator in the specified encoding, so
it will not make READ-LINE return, but it _is_ a CLISP #\Newline!
Therefore, the files "foo" and "bar", where "bar" is written with
  (with-open-file (o "bar" :direction :output :external-format e)
    (with-open-file (i "foo" :external-format e)
      (write-line (read-line i) o)))
will be different:
---- foo ----
---- bar ----
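The difference can be modelled on strings. This sketch assumes that a :DOS
encoding writes every #\Newline (i.e. #\Linefeed in CLISP) as CR+LF, including
the one WRITE-LINE appends, and that the strict READ-LINE returned "foo<LF>bar".
DOS-ENCODE-LINE is a hypothetical helper, not a CLISP function.

```lisp
;; Hypothetical model of WRITE-LINE on a :DOS stream: every #\Linefeed
;; in LINE, plus the trailing newline WRITE-LINE adds, becomes CR+LF.
(defun dos-encode-line (line)
  "Return the characters WRITE-LINE would put in a :DOS file for LINE."
  (with-output-to-string (out)
    (loop for ch across line
          do (if (char= ch #\Linefeed)
                 (progn (write-char #\Return out)
                        (write-char #\Linefeed out))
                 (write-char ch out)))
    ;; the newline appended by WRITE-LINE
    (write-char #\Return out)
    (write-char #\Linefeed out)))
```

Under these assumptions "foo" holds foo<LF>bar<CR><LF> (9 characters) while
"bar" holds foo<CR><LF>bar<CR><LF> (10 characters): the embedded lone LF comes
back as CR+LF.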
We already have this behavior (unless the ENCODING's LINE-TERMINATOR is
:UNIX); the point here is that :LINE-TERMINATOR-STRICT-P does _not_ fix it.
Is anyone still interested in this :LINE-TERMINATOR-STRICT-P feature?
Do you see any problems with the behavior I just described?
Sam Steingold (http://www.podval.org/~sds) running RedHat7.3 GNU/Linux
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
You can have it good, soon or cheap. Pick two...