From: <don...@is...> - 2004-03-16 23:59:11
> please read my two messages referred to in the RFE above and explain
> here - or in a comment to the RFE - what behavior, and _why_(!), you
> consider correct in the specific cases I mention there.
>
> specifically, is it OK for READ-LINE to return a string with an
> embedded newline? why? why not?
> how do you propose to avoid it if you do not consider it OK?

To me the issue is what exactly is a newline. I don't think it has to
be the same as #\return or #\linefeed. I think that if the line
terminator mode is :unix then it should be ok for READ-LINE to return a
string with an embedded #\return, and if the line terminator mode is
:mac then it should be ok to return a string with an embedded
#\linefeed, and if :dos then it should be ok to return a string with
both of those (but not #\return followed immediately by #\linefeed).

I can't tell for sure but it appears to me that currently in clisp
(eq #\linefeed #\newline). I suggest that this not be the case, that
newline, return and linefeed be three different characters, or
alternatively, that whether (eq #\linefeed #\newline) is true should
depend on the "current" line terminator mode. This gets into
implementation issues that I don't know about.

http://article.gmane.org/gmane.lisp.clisp.general/6970

>   (with-open-file (s "foo.dos" :direction :output
>                                :element-type '(unsigned-byte 8))
>     (write-sequence (mapcar #'char-code
>                             '(#\f #\o #\o #\Newline
>                               #\b #\a #\r #\Return #\Newline))
>                     s))
>
> now, what should
>
>   (with-open-file (s "foo.dos" :direction :input
>                                :element-type 'character
>                                :external-format :dos)
>     (read-line s))
>
> return?

There's a problem here, which is that you write and read with different
element types and different external formats. I therefore think that
lots of different answers are permissible.
I think the first form should be interpreted according to some
translation of newline and return into unsigned byte 8, and the second
should be interpreted according to some understanding of what the
element type and external format mean for whatever was produced by the
first form.

> You might reasonably argue that the right string to return is
> "foo\fbar". Unfortunately, "\f" (== (code-char 10)) is #\Newline in
> CLISP, so READ-LINE would return a string with an embedded newline,
> which, if not outright non-compliant, would be quite surprising to a
> user. Because of this problem, CLISP reads CR, LF and CRLF as
> #\Newline.

http://article.gmane.org/gmane.lisp.clisp.general/4718

> It has been requested on many occasions that CLISP provide an option to

probably mostly by me

> treat CR/LF/CR+LF differently on character input (right now all three
> are read as #\Newline when STREAM-ELEMENT-TYPE is CHARACTER).
> The answer to these requests has been to use binary i/o.
> 6 months ago it was suggested that a :LINE-TERMINATOR-STRICT-P option
> be added to the ENCODING object.
>
> The problem is that this feature will produce unexpected results:
> READ-LINE will return strings with embedded #\Newline!
> ANSI does not appear to forbid it.
>
> In CLISP, #\Newline is identical to #\Linefeed (which is specifically
> permitted by <http://www.lisp.org/HyperSpec/Body/sec_13-1-7.html>).
> Therefore, if the file is exactly this string:
>
>   (concatenate 'string "foo" (string #\Linefeed) "bar"
>                (string #\Return) (string #\Linefeed))
>
> and we open it with
>
>   (setq e (make-encoding :charset "ascii" :line-terminator :dos
>                          :line-terminator-strict-p t))
>   (setq s (open "foo" :external-format e))
>
> then the string returned by (READ-LINE s) will contain an embedded
> #\Newline between "foo" and "bar" (because a single #\Linefeed is not
> a #\Newline in the specified encoding, it will not make READ-LINE
> return, but it _is_ a CLISP #\Newline!)

I'd like it to be #\return and for that not to be the same as #\newline.
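
Whether #\Newline coincides with the linefeed or return character can be
checked directly in a given implementation. The following is only a
sketch: NEWLINE-REPORT is a hypothetical helper name, and it assumes that
character codes 10 and 13 are linefeed and return, which ANSI does not
guarantee. The result is implementation-dependent, so no expected output
is shown.

```lisp
;; Hypothetical helper: report how this implementation relates #\Newline
;; to the characters with codes 10 and 13.
;; Assumption: codes 10 and 13 denote linefeed and return (ASCII-compatible).
(defun newline-report ()
  (let ((lf (code-char 10))
        (cr (code-char 13)))
    (list :newline-code (char-code #\Newline)
          :newline-is-lf (char= #\Newline lf)
          :newline-is-cr (char= #\Newline cr)
          :lf-name (char-name lf)
          :cr-name (char-name cr))))

(print (newline-report))
```

If :NEWLINE-IS-LF comes back true, then any string holding a bare
linefeed also holds a #\Newline as far as that implementation is
concerned, which is the situation described above.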
> Therefore, files "foo" and "bar", written with
>
>   (with-open-file (o "bar" :direction :output :external-format e)
>     (with-open-file (i "foo" :external-format e)
>       (write-line (read-line i) o)))

ok, so they both use e which is strict dos
So the read-line should return only one line containing a linefeed
in the middle. Then the write-line should write only one line
containing a linefeed in the middle. In both cases there will be a
crlf at the end. Or am I missing something here?

> will be different:
>
> ---- foo ----
> foo^Jbar
> -------------
> ---- bar ----
> foo
> bar
> -------------
>
> We already have this behavior (unless the ENCODING's LINE-TERMINATOR
> is :UNIX), the point here is that :LINE-TERMINATOR-STRICT-P does _not_
> fix this.
>
> Is anyone still interested in this :LINE-TERMINATOR-STRICT-P feature?
> Do you see any problems with the behavior I just described?
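
For illustration, the strict reading discussed in this thread can be
sketched in portable Common Lisp on an in-memory string. This is not
CLISP's implementation and not the proposed :LINE-TERMINATOR-STRICT-P
code: SPLIT-LINES-STRICT is a hypothetical helper, and it assumes that
character codes 10 and 13 are linefeed and return. Only the terminator
belonging to the given mode ends a line; any other CR or LF stays
embedded in the returned strings.

```lisp
;; Hypothetical helper, not part of CLISP: split TEXT into lines using only
;; the terminator that belongs to MODE (:unix, :mac or :dos).
;; Assumption: codes 10 and 13 denote linefeed and return.
(defun split-lines-strict (text mode)
  (let* ((cr (string (code-char 13)))
         (lf (string (code-char 10)))
         (terminator (ecase mode
                       (:unix lf)
                       (:mac cr)
                       (:dos (concatenate 'string cr lf))))
         (lines '())
         (start 0))
    (loop
      (let ((pos (search terminator text :start2 start)))
        (when (null pos)
          ;; Keep a trailing unterminated line, if there is one.
          (when (< start (length text))
            (push (subseq text start) lines))
          (return (nreverse lines)))
        ;; Everything up to the terminator is one line; skip the terminator.
        (push (subseq text start pos) lines)
        (setf start (+ pos (length terminator)))))))

;; The file contents from the example: "foo" LF "bar" CR LF.
(defparameter *sample*
  (concatenate 'string "foo" (string (code-char 10)) "bar"
               (string (code-char 13)) (string (code-char 10))))

(print (list (length (split-lines-strict *sample* :dos))
             (length (split-lines-strict *sample* :unix))
             (length (split-lines-strict *sample* :mac))))
;; prints (1 2 2)
```

In :dos mode the sample is a single line of 7 characters with the bare
linefeed still in the middle. Whether that embedded character should
count as a #\Newline is exactly the open question.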