From: SourceForge.net <no...@so...> - 2003-05-03 20:56:15
|
Feature Requests item #731952, was opened at 2003-05-03 16:56
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=351355&aid=731952&group_id=1355

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Sam Steingold (sds)
Assigned to: Nobody/Anonymous (nobody)
Summary: faithful character i/o

Initial Comment:
CLISP READ-CHAR reads bytes 10 and 13 as #\Newline:
<http://article.gmane.org/gmane.lisp.clisp.general/6970>
<http://article.gmane.org/gmane.lisp.clisp.general/4718>
Is it possible to read them differently?
|
From: SourceForge.net <no...@so...> - 2006-10-16 14:17:25
|
Feature Requests item #731952, was opened at 2003-05-03 16:56
Message generated for change (Comment added) made by sds

Category: None
Group: None
>Status: Open
>Resolution: None
Priority: 5
Submitted By: Sam Steingold (sds)
Assigned to: Bruno Haible (haible)
Summary: faithful character i/o

----------------------------------------------------------------------

>Comment By: Sam Steingold (sds)
Date: 2006-10-16 10:17

Message:
Logged In: YES
user_id=5735

looks like this is more than just a user issue
https://sourceforge.net/tracker/index.php?func=detail&aid=1578179&group_id=1355&atid=101355

----------------------------------------------------------------------

Comment By: Sam Steingold (sds)
Date: 2004-05-25 10:53

Message:
Logged In: YES
user_id=5735

this item is now closed as invalid.
thanks to Bruno for clarifying it.
see <impnotes.html#clhs-newline> for the exhaustive treatment of the matter.

----------------------------------------------------------------------

Comment By: Bruno Haible (haible)
Date: 2004-03-18 06:55

Message:
Logged In: YES
user_id=5923

No. Accepting CR, LF and CRLF as different variations of #\Newline
implements the recommendations of the Unicode consortium in
http://www.unicode.org/reports/tr13/tr13-9.html. Quote:
"Even if you know which characters represents NLF on your particular
platform, on input and in interpretation, treat CR, LF, CRLF ...L
the same. Only on output do you need to distinguish between them."

It also reflects user wishes:
1) For years, GCC used to give parse errors on some C input files that
   used CRLF as line terminators, whereas with just LF the parse succeeded.
2) GNU gettext had similar problems, and it was reported as a bug, because
   apparently users on Unix sometimes have Windows-written files on their
   disks.

The way CLISP does it prevents this kind of bug a priori.

There is no need to add complexities to CLISP to implement the paradigms
of the 1980s, which are just not valid any more in today's world.
|
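The input rule described in the 2004-03-18 comment (CR, LF and CRLF are all read as one #\Newline) can be modelled in a few lines. The following Python sketch is only an illustration of that rule under the assumption of ASCII input; it is not CLISP's implementation, and the name `read_chars` is hypothetical.

```python
def read_chars(data: bytes) -> list[str]:
    """Model of the lenient input rule: CR, LF and CRLF each yield one newline.

    Illustration only; this is not how CLISP is implemented.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 13:  # CR
            # Swallow an immediately following LF so CRLF counts once.
            if i + 1 < len(data) and data[i + 1] == 10:
                i += 1
            out.append("\n")
        elif b == 10:  # LF
            out.append("\n")
        else:
            out.append(chr(b))
        i += 1
    return out


# Three different terminators, one result per line break.
print(read_chars(b"a\rb\nc\r\nd"))
```

With this rule a reader cannot tell afterwards which byte sequence ended a line, which is exactly the information the feature request asks to preserve.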
From: SourceForge.net <no...@so...> - 2006-11-17 18:47:13
|
Feature Requests item #731952, was opened at 2003-05-03 16:56
Message generated for change (Comment added) made by sds

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Sam Steingold (sds)
Assigned to: Bruno Haible (haible)
Summary: faithful character i/o

----------------------------------------------------------------------

>Comment By: Sam Steingold (sds)
Date: 2006-11-17 13:47

Message:
Logged In: YES
user_id=5735
Originator: YES

Suppose we add :line-terminator-strict slot to encodings,
making the newline input "faithful":

        :UNIX              :MAC                 :DOS
CR      #\Return           #\Newline            #\Return
LF      #\Newline          #\Linefeed           #\Linefeed
CRLF    #\Return#\Newline  #\Newline#\Linefeed  #\Newline

(row: input characters; column: line terminator of the encoding).

alas, in CLISP #\Linefeed == #\Newline (as explicitly permitted &c),
so the reality is thus:

        :UNIX              :MAC                 :DOS
CR      #\Return           #\Newline            #\Return
LF      #\Newline          #\Newline            #\Newline
CRLF    #\Return#\Newline  #\Newline#\Newline   #\Newline

which plain sucks for everything but the :UNIX line terminator.

How about using something other than 10 for Newline?
How about 0? (i.e., #\Null = #\Newline)
0 does not normally occur in _text_ streams, so it will not cause the
confusion we are experiencing.
just about any control character (except bs/tab/nl/ret) would do too.
http://en.wikipedia.org/wiki/ASCII
|
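The two tables in the 2006-11-17 comment follow from one rule: the encoding's own terminator sequence reads as #\Newline, and any other CR or LF is passed through as #\Return or #\Linefeed. The Python sketch below reproduces both tables under that reading. It is a model of the proposal, not CLISP code; `read_strict`, `TERMINATORS` and the `linefeed_is_newline` flag are hypothetical names introduced here.

```python
# Terminator byte sequence for each line-terminator setting in the tables.
TERMINATORS = {"unix": b"\n", "mac": b"\r", "dos": b"\r\n"}


def read_strict(data: bytes, mode: str, linefeed_is_newline: bool = False) -> list[str]:
    """Model of the proposed "faithful" reading.

    Only the encoding's own terminator becomes "Newline"; stray CR and LF
    bytes stay distinct. With linefeed_is_newline=True, Linefeed and Newline
    are the same character, which gives the second table.
    """
    term = TERMINATORS[mode]
    lf_name = "Newline" if linefeed_is_newline else "Linefeed"
    out = []
    i = 0
    while i < len(data):
        if data.startswith(term, i):
            out.append("Newline")
            i += len(term)
        elif data[i] == 13:
            out.append("Return")
            i += 1
        elif data[i] == 10:
            out.append(lf_name)
            i += 1
        else:
            out.append(chr(data[i]))
            i += 1
    return out


# Print the CRLF row of both tables.
for mode in ("unix", "mac", "dos"):
    print(mode,
          read_strict(b"\r\n", mode),
          read_strict(b"\r\n", mode, linefeed_is_newline=True))
```

Setting the flag shows why the proposal needs a #\Newline that is distinct from #\Linefeed: without it, the :MAC and :DOS columns can no longer tell LF apart from the real terminator.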
From: SourceForge.net <no...@so...> - 2006-11-17 18:50:14
|
Feature Requests item #731952, was opened at 2003-05-03 16:56
Message generated for change (Comment added) made by sds

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Sam Steingold (sds)
Assigned to: Bruno Haible (haible)
Summary: faithful character i/o

----------------------------------------------------------------------

>Comment By: Sam Steingold (sds)
Date: 2006-11-17 13:50

Message:
Logged In: YES
user_id=5735
Originator: YES

actually, using #\Code128==#\U0080 seems to be a good option!
|
From: SourceForge.net <no...@so...> - 2006-11-20 16:22:54
|
Feature Requests item #731952, was opened at 2003-05-03 22:56
Message generated for change (Comment added) made by haible

Category: None
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Private: No
Submitted By: Sam Steingold (sds)
Assigned to: Bruno Haible (haible)
Summary: faithful character i/o

----------------------------------------------------------------------

>Comment By: Bruno Haible (haible)
Date: 2006-11-20 17:22

Message:
Logged In: YES
user_id=5923
Originator: NO

Such a :line-terminator-strict option is indeed theoretically possible.
You would need to assign #\Newline to a different code point, outside the
Unicode range, for example #x110000. (The Unicode people for some time
favoured the use of #x85 as a 3rd newline character, but apparently
dropped the idea.)

So reading in normal mode would produce:

        :UNIX              :MAC                 :DOS
CR      #\Return           #\Newline            #\Return
LF      #\Newline          #\Linefeed           #\Linefeed
CRLF    #\Return#\Newline  #\Newline#\Linefeed  #\Newline

And reading in :line-terminator-strict would produce:

        :UNIX               :MAC                :DOS
CR      #\Return            #\Return            #\Return
LF      #\Linefeed          #\Linefeed          #\Linefeed
CRLF    #\Return#\Linefeed  #\Return#\Linefeed  #\Return#\Linefeed

But what would be the effect of such a change:
  - No longer (eql #\Newline #\Linefeed)
    -> backward compatibility problem,
  - No longer (= (char-code #\Newline) 10)
    -> Unix compatibility problem (because we would be copying a DOS
       concept into a Unix world),
  - .fas files that are edited with an editor on Windows (and thus get
    LF converted into CRLF) change their meaning when being saved.

So forget about it. It creates more problems than it solves.
|
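The third objection in the 2006-11-20 comment (a .fas file changes meaning once a Windows editor turns LF into CRLF) can be checked with a small model. The Python sketch below compares a lenient reading with a strict :UNIX reading of the same text before and after such a conversion. Both functions are hypothetical stand-ins for the behaviours discussed in the thread, not CLISP code, and the sample bytes are made up for the example.

```python
import re


def read_lenient(data: bytes) -> str:
    """Behaviour described in the thread: CR, LF and CRLF all read as one newline."""
    return re.sub(r"\r\n|\r|\n", "\n", data.decode("ascii"))


def read_strict_unix(data: bytes) -> str:
    """Hypothetical strict :UNIX reading: only LF is a newline, CR stays a
    separate Return character."""
    return data.decode("ascii")


# A made-up compiled-file fragment with a newline embedded in a string.
compiled = b'(print "two\nlines")\n'
# What an editor that converts LF to CRLF might save.
edited = compiled.replace(b"\n", b"\r\n")

print(read_lenient(compiled) == read_lenient(edited))          # True
print(read_strict_unix(compiled) == read_strict_unix(edited))  # False
```

Under the lenient rule the edit is invisible to the reader; under the strict rule the string gains a Return character, which is the change in meaning the comment warns about.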
From: SourceForge.net <no...@so...> - 2006-11-20 16:48:12
|
Feature Requests item #731952, was opened at 2003-05-03 16:56
Message generated for change (Comment added) made by sds

Category: None
Group: None
Status: Closed
Resolution: Rejected
Priority: 5
Private: No
Submitted By: Sam Steingold (sds)
Assigned to: Bruno Haible (haible)
Summary: faithful character i/o

----------------------------------------------------------------------

>Comment By: Sam Steingold (sds)
Date: 2006-11-20 11:48

Message:
Logged In: YES
user_id=5735
Originator: YES

>Such a :line-terminator-strict option is indeed theoretically possible.
>You would need to assign #\Newline to a different code point, outside the
>Unicode range, for example #x110000.

I don't see why I cannot use #x80 (#\Code128==#\U0080) for newline.
I am not inventing a new unicode char, I am assigning an integer to
a CLISP character, and this integer (128) is not used at this time.

also, your tables indicate that you are missing the point of my message.
Your first table (identical to my first table) is what you get if
:line-terminator-strict is non-nil and #\newline is distinct from both
#\lf and #\cr.
your second table is relevant only to binary input and cannot be produced
under any combinations of :line-terminator-strict and separate #\nl
proposals.
|
From: SourceForge.net <no...@so...> - 2006-12-26 22:36:15
|
Feature Requests item #731952, was opened at 2003-05-03 16:56
Message generated for change (Comment added) made by sds

Category: None
Group: None
Status: Closed
Resolution: Rejected
Priority: 5
Private: No
Submitted By: Sam Steingold (sds)
Assigned to: Bruno Haible (haible)
Summary: faithful character i/o

----------------------------------------------------------------------

>Comment By: Sam Steingold (sds)
Date: 2006-12-26 17:36

Message:
Logged In: YES
user_id=5735
Originator: YES

I don't see any compatibility issues.
any text stream knows its preferred encoding, so #\Newline is never
written as its char-code.
the woe32 editing of fas files issue is fairly rare, and the only problem
there would occur if there are embedded newlines in strings.
this should be addressed by always quoting CR&LF in all strings, symbols
and package names in compiled files (we know that we are reading from a
compiled file when stream is the same as *load-file*).
|
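The 2006-12-26 comment proposes always quoting CR and LF inside strings in compiled files, so that an editor's line-terminator conversion cannot alter string contents. The Python sketch below shows the idea with backslash escapes. The escape syntax and the names `quote_literal` and `unquote_literal` are assumptions made for this example; the thread does not specify a syntax, and this is not the .fas format.

```python
def quote_literal(s: str) -> str:
    """Write s as a double-quoted literal with no raw CR or LF in it.

    The backslash escapes used here are an assumption for illustration.
    """
    escaped = (s.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\r", "\\r")
                .replace("\n", "\\n"))
    return '"' + escaped + '"'


def unquote_literal(literal: str) -> str:
    """Inverse of quote_literal."""
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            i += 1
            nxt = body[i]
            # \n and \r decode to LF and CR; any other escaped char is itself.
            out.append({"n": "\n", "r": "\r"}.get(nxt, nxt))
        else:
            out.append(ch)
        i += 1
    return "".join(out)


text = 'line1\nline2\r"q"\\'
literal = quote_literal(text)
# Simulate a file whose line ends are converted from LF to CRLF by an editor.
file_text = "(setq x " + literal + ")\n"
edited = file_text.replace("\n", "\r\n")
print(literal in edited)  # True
```

Because the literal contains no raw line terminators, the conversion only touches the line ends between forms, and the string reads back unchanged.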
From: SourceForge.net <no...@so...> - 2006-12-31 15:45:58
|
Feature Requests item #731952, was opened at 2003-05-03 16:56 Message generated for change (Settings changed) made by sds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=351355&aid=731952&group_id=1355 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Open >Resolution: None Priority: 5 Private: No Submitted By: Sam Steingold (sds) Assigned to: Bruno Haible (haible) Summary: faithful character i/o Initial Comment: CLISP READ-CHAR reads bytes 10 and 13 as #\Newline: <http://article.gmane.org/gmane.lisp.clisp.general/6970> <http://article.gmane.org/gmane.lisp.clisp.general/4718> Is it possible to read them differently? ---------------------------------------------------------------------- Comment By: Sam Steingold (sds) Date: 2006-12-26 17:36 Message: Logged In: YES user_id=5735 Originator: YES I don't see any compatibility issues. any text stream knows its preferred encoding, so #\Newline is never written as its char-code. the woe32 editing of fas files issue is fairly rare, and the only problem there would occur if there are embedded newlines in strings. this should be addressed by always quoting CR&LF in all strings, symbols and package names in compiled files (we know that we are reading from a compiled file when stream is the same as *load-file*). ---------------------------------------------------------------------- Comment By: Sam Steingold (sds) Date: 2006-11-20 11:48 Message: Logged In: YES user_id=5735 Originator: YES >Such a :line-terminator-strict option is indeed theoretically possible. >You would need to assign #\Newline to a different code point, outside the >Unicode range, for example #x110000. I don't see why I cannot use #x80 (#\Code128==#\U0080) for newline. 
I am not inventing a new unicode char, I am assigning an integer to a CLISP character, and this integer (128) is not used at this time. also, your tables indicate that you are missing the point of my message. Your first table (identical to my first table) is what you get if :line-terminator-strict is non-nil and #\newline is distinct from both #\lf and #\cr. your second table is relevant only to binary input and cannot be produced under any combinations of :line-terminator-strict and separate #\nl proposals.
----------------------------------------------------------------------
Comment By: Bruno Haible (haible) Date: 2006-11-20 11:22 Message: Logged In: YES user_id=5923 Originator: NO
Such a :line-terminator-strict option is indeed theoretically possible. You would need to assign #\Newline to a different code point, outside the Unicode range, for example #x110000. (The Unicode people for some time favoured the use of #x85 as a 3rd newline character, but apparently dropped the idea.)
So reading in normal mode would produce:

        :UNIX              :MAC                 :DOS
CR      #\Return           #\Newline            #\Return
LF      #\Newline          #\Linefeed           #\Linefeed
CRLF    #\Return#\Newline  #\Newline#\Linefeed  #\Newline

And reading in :line-terminator-strict would produce:

        :UNIX               :MAC                :DOS
CR      #\Return            #\Return            #\Return
LF      #\Linefeed          #\Linefeed          #\Linefeed
CRLF    #\Return#\Linefeed  #\Return#\Linefeed  #\Return#\Linefeed

But what would be the effect of such a change:
- No longer (eql #\Newline #\Linefeed) -> backward compatibility problem,
- No longer (= (char-code #\Newline) 10) -> Unix compatibility problem (because we would be copying a DOS concept into a Unix world),
- .fas files that are edited with an editor on Windows (and thus get LF converted into CRLF) change their meaning when being saved.
So forget about it. It creates more problems than it solves.
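
For readers who want to check the two tables in the comment above mechanically, here is a minimal sketch in Python. It is only a model of the tables, not CLISP code: the names `read_chars`, `TERMINATORS`, `NAMES` and the `strict` flag are hypothetical, and it assumes, as the comment does, that #\Newline is a character distinct from both #\Return and #\Linefeed. The flag follows the labels used in that comment ("normal mode" vs. ":line-terminator-strict").

```python
# Hypothetical model of the two tables above; this is NOT CLISP's reader.
# Assumption: #\Newline is distinct from both #\Return and #\Linefeed.

# Byte sequence that each line-terminator setting treats as "the" newline.
TERMINATORS = {":UNIX": "\n", ":MAC": "\r", ":DOS": "\r\n"}

# Printed names for the two control characters of interest.
NAMES = {"\r": "#\\Return", "\n": "#\\Linefeed"}


def read_chars(text, line_terminator, strict=False):
    """Return the list of character names a reader would yield for `text`.

    strict=False: the terminator sequence of `line_terminator` becomes
                  #\\Newline; any other CR or LF is passed through as
                  #\\Return or #\\Linefeed (first table).
    strict=True:  nothing is converted to #\\Newline (second table).
    """
    term = TERMINATORS[line_terminator]
    out = []
    i = 0
    while i < len(text):
        if not strict and text.startswith(term, i):
            out.append("#\\Newline")
            i += len(term)
        else:
            ch = text[i]
            out.append(NAMES.get(ch, ch))
            i += 1
    return out


if __name__ == "__main__":
    # Print both tables, one row per input sequence.
    for strict in (False, True):
        print("strict =", strict)
        for label, data in (("CR", "\r"), ("LF", "\n"), ("CRLF", "\r\n")):
            row = ["".join(read_chars(data, lt, strict))
                   for lt in (":UNIX", ":MAC", ":DOS")]
            print(label.ljust(6), "  ".join(row))
```

Running the file prints one row per input sequence (CR, LF, CRLF) and one column per line terminator, which can be compared against the tables directly.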
----------------------------------------------------------------------
Comment By: Sam Steingold (sds) Date: 2006-11-17 13:50 Message: Logged In: YES user_id=5735 Originator: YES
actually, using #\Code128==#\U0080 seems to be a good option!
----------------------------------------------------------------------
Comment By: Sam Steingold (sds) Date: 2006-11-17 13:47 Message: Logged In: YES user_id=5735 Originator: YES
Suppose we add :line-terminator-strict slot to encodings, making the newline input "faithful":

        :UNIX              :MAC                 :DOS
CR      #\Return           #\Newline            #\Return
LF      #\Newline          #\Linefeed           #\Linefeed
CRLF    #\Return#\Newline  #\Newline#\Linefeed  #\Newline

(row: input characters; column: line terminator of the encoding).
alas, in CLISP #\Linefeed == #\Newline (as explicitly permitted &c), so the reality is thus:

        :UNIX              :MAC                :DOS
CR      #\Return           #\Newline           #\Return
LF      #\Newline          #\Newline           #\Newline
CRLF    #\Return#\Newline  #\Newline#\Newline  #\Newline

which plain sucks for everything but the :UNIX line terminator. How about using something other than 10 for Newline? How about 0? (i.e., #\Null = #\Newline) 0 does not normally occur in _text_ streams, so it will not cause the confusion we are experiencing. just about any control character (except bs/tab/nl/ret) would do too. http://en.wikipedia.org/wiki/ASCII
----------------------------------------------------------------------
Comment By: Sam Steingold (sds) Date: 2006-10-16 10:17 Message: Logged In: YES user_id=5735
looks like this is more than just a user issue https://sourceforge.net/tracker/index.php?func=detail&aid=1578179&group_id=1355&atid=101355
----------------------------------------------------------------------
Comment By: Sam Steingold (sds) Date: 2004-05-25 10:53 Message: Logged In: YES user_id=5735
this item is now closed as invalid. thanks to Bruno for clarifying it. see <impnotes.html#clhs-newline> for the exhaustive treatment of the matter.
|