CLISP READ-CHAR reads bytes 10 and 13 as #\Newline:
<http://article.gmane.org/gmane.lisp.clisp.general/6970>
<http://article.gmane.org/gmane.lisp.clisp.general/4718>
Is it possible to read them differently?
Logged In: YES
user_id=5923
No. Accepting CR, LF and CRLF as different variations of
#\Newline implements the recommendations of the Unicode
consortium in
http://www.unicode.org/reports/tr13/tr13-9.html. Quote:
"Even if you know which characters represents NLF on your
particular platform, on input and in interpretation, treat
CR, LF, CRLF ...L the same. Only on output do you need to
distinguish between them."
It also reflects user wishes: 1) For years, GCC used to give
parse errors on some C input files that used CRLF as line
terminators, whereas with just LF the parse succeeded. 2)
GNU gettext had similar problems, and it was reported as a
bug, because apparently users on Unix sometimes have
Windows-written files on their disks.
The way CLISP does it prevents this kind of bug from the
outset.
There is no need to add complexity to CLISP to implement
the paradigms of the 1980s, which are no longer valid in
today's world.
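The input rule described in this comment can be sketched in portable Common Lisp. This is only an illustration under stated assumptions, not CLISP's actual stream code: NORMALIZE-NEWLINES is a hypothetical name, the input is modeled as a list of byte values, and non-newline bytes are decoded naively with CODE-CHAR.

```lisp
;; Illustration only: NORMALIZE-NEWLINES is a hypothetical helper, not part
;; of CLISP.  It treats CR (13), LF (10) and CRLF (13 10) alike on input.
(defun normalize-newlines (bytes)
  "Convert a list of byte values to a string, mapping CR, LF and CRLF
each to a single newline character."
  (let ((chars '()))
    (loop while bytes
          do (let ((b (pop bytes)))
               (cond ((= b 13)
                      ;; A CR swallows an immediately following LF.
                      (when (and bytes (= (first bytes) 10))
                        (pop bytes))
                      (push #\Newline chars))
                     ((= b 10)
                      (push #\Newline chars))
                     (t
                      (push (code-char b) chars)))))
    (coerce (nreverse chars) 'string)))
```

With this sketch, a file that mixes all three conventions yields the same string as one that uses a single convention throughout, which is the property the GCC and gettext anecdotes are about.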
Logged In: YES
user_id=5735
this item is now closed as invalid.
thanks to Bruno for clarifying it.
see <impnotes.html#clhs-newline>
for the exhaustive treatment of the matter.
Logged In: YES
user_id=5735
looks like this is more than just a user issue
https://sourceforge.net/tracker/index.php?func=detail&aid=1578179&group_id=1355&atid=101355
Logged In: YES
user_id=5735
Originator: YES
Suppose we add :line-terminator-strict slot to encodings, making the newline input "faithful":
      :UNIX               :MAC                 :DOS
CR    #\Return            #\Newline            #\Return
LF    #\Newline           #\Linefeed           #\Linefeed
CRLF  #\Return#\Newline   #\Newline#\Linefeed  #\Newline
(row: input characters; column: line terminator of the encoding).
alas, in CLISP #\Linefeed == #\Newline (as the standard explicitly permits), so the reality is thus:
      :UNIX               :MAC                :DOS
CR    #\Return            #\Newline           #\Return
LF    #\Newline           #\Newline           #\Newline
CRLF  #\Return#\Newline   #\Newline#\Newline  #\Newline
which plain sucks for everything but the :UNIX line terminator.
How about using something other than 10 for Newline?
How about 0? (i.e., #\Null = #\Newline)
0 does not normally occur in _text_ streams, so it will not cause the confusion we are experiencing.
just about any control character (except bs/tab/nl/ret) would do too.
http://en.wikipedia.org/wiki/ASCII
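The two tables in this comment can be reproduced with a small symbolic model. This is a sketch, not CLISP code: READ-FAITHFUL and READ-ACTUAL are hypothetical names, and characters are modeled as keywords so that :NEWLINE can be kept distinct from :LINEFEED, which is exactly what CLISP does not do.

```lisp
;; Illustration only.  Input is a list of :CR and :LF; output is a list of
;; :RETURN, :LINEFEED and :NEWLINE.
(defun read-faithful (input terminator)
  "TERMINATOR is :UNIX, :MAC or :DOS.  Only the encoding's own line
terminator becomes :NEWLINE; everything else is passed through."
  (let ((out '()))
    (loop while input
          do (let ((c (pop input)))
               (cond ((and (eq terminator :dos) (eq c :cr)
                           (eq (first input) :lf))
                      (pop input)            ; consume the LF of CRLF
                      (push :newline out))
                     ((and (eq terminator :unix) (eq c :lf))
                      (push :newline out))
                     ((and (eq terminator :mac) (eq c :cr))
                      (push :newline out))
                     ((eq c :cr) (push :return out))
                     (t (push :linefeed out)))))
    (nreverse out)))

(defun read-actual (input terminator)
  "Same as READ-FAITHFUL, but with :LINEFEED identified with :NEWLINE."
  (substitute :newline :linefeed (read-faithful input terminator)))
```

For example, (read-faithful '(:cr :lf) :mac) gives (:NEWLINE :LINEFEED), while (read-actual '(:cr :lf) :mac) gives (:NEWLINE :NEWLINE), matching the CRLF row of the :MAC column in the first and second table respectively.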
Logged In: YES
user_id=5735
Originator: YES
actually, using #\Code128==#\U0080 seems to be a good option!
Logged In: YES
user_id=5923
Originator: NO
Such a :line-terminator-strict option is indeed theoretically possible.
You would need to assign #\Newline to a different code point, outside the
Unicode range, for example #x110000. (The Unicode people for some time
favoured the use of #x85 as a 3rd newline character, but apparently
dropped the idea.)
So reading in normal mode would produce:
      :UNIX               :MAC                 :DOS
CR    #\Return            #\Newline            #\Return
LF    #\Newline           #\Linefeed           #\Linefeed
CRLF  #\Return#\Newline   #\Newline#\Linefeed  #\Newline
And reading in :line-terminator-strict would produce:
      :UNIX               :MAC                :DOS
CR    #\Return            #\Return            #\Return
LF    #\Linefeed          #\Linefeed          #\Linefeed
CRLF  #\Return#\Linefeed  #\Return#\Linefeed  #\Return#\Linefeed
But what would be the effect of such a change:
- No longer (eql #\Newline #\Linefeed) -> backward compatibility problem,
- No longer (= (char-code #\Newline) 10) -> Unix compatibility problem
(because we would be copying a DOS concept into a Unix world),
- .fas files that are edited with an editor on Windows (and thus get
LF converted into CRLF) change their meaning when being saved.
So forget about it. It creates more problems than it solves.
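The first two identities in the list above can be inspected at a REPL. The helper below is a hypothetical convenience, not something from CLISP; it assumes the implementation recognizes the semi-standard character name #\Linefeed. According to this thread, CLISP currently reports the two characters as identical and the code as 10; other implementations may differ.

```lisp
;; Illustration only: NEWLINE-REPORT is a hypothetical helper that shows how
;; the running implementation relates #\Newline and #\Linefeed.
(defun newline-report ()
  "Return a property list describing the implementation's #\Newline."
  (list :newline-is-linefeed (eql #\Newline #\Linefeed)
        :newline-code (char-code #\Newline)))
```

Code that relies on either property, for example by comparing (char-code ch) against 10 to detect line ends, is what the backward-compatibility concern refers to.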
Logged In: YES
user_id=5735
Originator: YES
>Such a :line-terminator-strict option is indeed theoretically possible.
>You would need to assign #\Newline to a different code point, outside the
>Unicode range, for example #x110000.
I don't see why I cannot use #x80 (#\Code128==#\U0080) for newline.
I am not inventing a new unicode char, I am assigning an integer to a CLISP character, and this integer (128) is not used at this time.
also, your tables indicate that you are missing the point of my message.
Your first table (identical to my first table) is what you get if :line-terminator-strict is non-nil and #\newline is distinct from both #\lf and #\cr.
your second table is relevant only to binary input and cannot be produced under any combinations of :line-terminator-strict and separate #\nl proposals.
Logged In: YES
user_id=5735
Originator: YES
I don't see any compatibility issues.
any text stream knows its preferred encoding, so #\Newline is never written as its char-code.
the woe32 editing of fas files issue is fairly rare, and the only problem there would occur if there are embedded newlines in strings.
this should be addressed by always quoting CR&LF in all strings, symbols and package names in compiled files (we know that we are reading from a compiled file when stream is the same as *load-file*).
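The claim above that a text stream never writes #\Newline as its char-code can be illustrated with a sketch of the output side. ENCODE-NEWLINES is a hypothetical name and the list-of-codes representation is an assumption made for brevity; CLISP's real encodings work on streams.

```lisp
;; Illustration only: the in-memory newline is translated to the byte
;; sequence of the chosen line terminator, whatever its own char-code is.
(defun encode-newlines (string terminator)
  "Return a list of character codes for STRING, writing each newline
as LF (:UNIX), CR (:MAC) or CR LF (:DOS)."
  (loop for ch across string
        if (char= ch #\Newline)
          append (ecase terminator
                   (:unix (list 10))
                   (:mac  (list 13))
                   (:dos  (list 13 10)))
        else
          collect (char-code ch)))
```

Because the translation is driven by the terminator and not by (char-code #\Newline), the sketch would behave the same if #\Newline were assigned a different code point, which is the point being argued here.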