#24 fix cygwin default encoding

closed-rejected
Sam Steingold
None
5
2007-01-30
2007-01-11
Reini Urban
No

src/encoding.d (encoding_from_name) has wrong Cygwin textmount logic:

#if defined(WIN32) || (defined(UNIX) && (O_BINARY != 0))
pushSTACK(S(Kdos)); /* :line-terminator */

That is: not binary => textmount => DOS line endings (doseol).

Attached patch (with changelog) fixes that. Thanks to Aaron Brown <arundelo@hotmail.com> for finding this.

Discussion

  • Reini Urban
    2007-01-11

     
    Attachments
  • Sam Steingold
    2007-01-11

    • assigned_to: nobody --> sds
    • status: open --> pending-rejected
     
  • Sam Steingold
    2007-01-11

    Logged In: YES
    user_id=5735
    Originator: NO

    This patch does not seem right.
    On Linux, O_BINARY == 0, and I see no reason to default the line terminator to :DOS there.
    I am rejecting it pending your convincing me that I am wrong.

     
  • Logged In: YES
    user_id=1312539
    Originator: NO

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

     
    • status: pending-rejected --> closed-rejected
     
  • Reini Urban
    2007-01-27

    Logged In: YES
    user_id=13755
    Originator: YES

    File Added: cyg-encoding2.patch

     
  • Reini Urban
    2007-01-27

    patch fixed

     
    Attachments
  • Reini Urban
    2007-01-27

    • status: closed-rejected --> open-rejected
     
  • Reini Urban
    2007-01-27

    Logged In: YES
    user_id=13755
    Originator: YES

    Sorry, I thought the UNIX branch applied only to Cygwin.
    cyg-encoding2.patch fixes that by checking for WIN32 and __CYGWIN__.

     
  • Sam Steingold
    2007-01-28

    • summary: [PATCH] fix cygwin default encoding --> fix cygwin default encoding
     
  • Sam Steingold
    2007-01-28

    Logged In: YES
    user_id=5735
    Originator: NO

    How is O_BINARY defined on Cygwin?
    echo '#include <fcntl.h>' > .zzz.c; gcc -E -dM .zzz.c | grep BINARY; rm -f .zzz.c
    It appears that it is not defined on Linux at all.
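    The same probe can be written in C. This is an illustrative sketch only; the helper names are hypothetical, not CLISP functions. It also shows why `O_BINARY != 0` is false on Linux: inside an `#if` expression, any identifier that is not a defined macro is replaced by 0, so a missing O_BINARY behaves exactly like O_BINARY == 0.

```c
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: the value of O_BINARY on this platform,
   or 0 when <fcntl.h> does not define it (e.g. on Linux). */
long o_binary_value(void)
{
#ifdef O_BINARY
    return (long)O_BINARY;
#else
    return 0;
#endif
}

/* In an #if expression an undefined identifier such as O_BINARY is
   replaced by 0, so this test is false wherever O_BINARY is missing. */
int has_separate_binary_mode(void)
{
#if O_BINARY != 0
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: report the value and what it implies. */
void print_o_binary(void)
{
    printf("O_BINARY = %#lx, separate binary mode: %s\n",
           o_binary_value(), has_separate_binary_mode() ? "yes" : "no");
}
```

    On Linux, print_o_binary() reports 0 and no separate binary mode; with the Cygwin headers shown in the next reply it would report 0x10000.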

     
  • Reini Urban
    2007-01-28

    Logged In: YES
    user_id=13755
    Originator: YES

    $ grep _FBINARY /usr/include/sys/fcntl.h
    #define _FBINARY 0x10000
    #define O_BINARY _FBINARY

    $ echo '#include <fcntl.h>' > .zzz.c; gcc -E -dM .zzz.c | grep BINARY;
    #define _O_BINARY O_BINARY
    #define _O_RAW O_BINARY
    #define _FBINARY 0x10000
    #define O_BINARY _FBINARY

     
  • Sam Steingold
    2007-01-28

    Logged In: YES
    user_id=5735
    Originator: NO

    What should the logic be?
    - win32 ==> :dos
    - unix: normal ==> :unix
    -- cygwin ==> ???
    Where is O_BINARY documented?
    Its presence appears to indicate that files can be opened either as text or as binary.
    Why (and how) should it affect the default encoding?
    Keep in mind that CLISP _always_ opens files with O_BINARY.
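    For context, the usual portability idiom for passing O_BINARY unconditionally is to define it as 0 where the system lacks it. OR-ing it into the open() flags is then a no-op on POSIX systems and disables CRLF translation where a text mode exists. This is a generic sketch, not code taken from CLISP; read_raw and write_raw are hypothetical helpers.

```c
#include <fcntl.h>
#include <unistd.h>

/* Where the system has no separate binary mode, define O_BINARY as 0
   so that OR-ing it into the open() flags changes nothing. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: write n bytes to path verbatim (no LF -> CRLF
   translation, even on systems with a text mode).
   Returns the number of bytes written, or -1 on error. */
ssize_t write_raw(const char *path, const char *buf, size_t n)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
    if (fd < 0)
        return -1;
    ssize_t put = write(fd, buf, n);
    close(fd);
    return put;
}

/* Hypothetical helper: read up to n bytes from path verbatim, so any
   CRLF in the file reaches the caller untouched.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_raw(const char *path, char *buf, size_t n)
{
    int fd = open(path, O_RDONLY | O_BINARY);
    if (fd < 0)
        return -1;
    ssize_t got = read(fd, buf, n);
    close(fd);
    return got;
}
```

    Because the bytes come back untranslated, any CRLF handling has to happen in the character-stream layer, which is where the :line-terminator of an encoding comes in.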

     
  • Reini Urban
    2007-01-28

    Logged In: YES
    user_id=13755
    Originator: YES

    The logic should be:
    - win32 ==> :dos
    - unix: normal ==> :unix
    -- cygwin ==> :unix
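    As a hypothetical sketch only (cyg-encoding2.patch itself is not reproduced in this thread), the proposed rule could be written in the style of the encoding.d test quoted earlier. It assumes that WIN32 is defined only for native Windows builds and relies on `__CYGWIN__`, which Cygwin's GCC predefines; the enum and function names are made up for illustration.

```c
/* Hypothetical names, for illustration only. */
enum line_terminator { LT_UNIX, LT_DOS };

/* Sketch of the proposed default (not the actual patch):
   native Win32 -> :DOS, every UNIX including Cygwin -> :UNIX. */
enum line_terminator proposed_default_line_terminator(void)
{
#if defined(WIN32) && !defined(__CYGWIN__)
    return LT_DOS;    /* native Windows build */
#else
    return LT_UNIX;   /* plain UNIX and Cygwin alike */
#endif
}
```

    On Linux and on Cygwin this returns LT_UNIX; only a native Win32 build would get LT_DOS.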

    On Cygwin one could check whether the default mount point uses DOS or Unix line endings, but we at Cygwin don't want to do that for now.
    Using a Cygwin default "textmount" (DOS EOL) is no longer recommended.

    Why?
    [1]> *default-file-encoding*
    #<ENCODING CHARSET:ASCII :DOS>

    See the discussion starting with http://sourceware.org/ml/cygwin/2007-01/msg00052.html

     
  • Sam Steingold
    2007-01-29

    Logged In: YES
    user_id=5735
    Originator: NO

    Thanks for the reference to the Cygwin mailing list.

    The original problem is best solved by a
    (setq *default-file-encoding* :unix)
    in ~/.clisprc.lisp

    Now to the alleged CLISP bug.
    The logic behind the original code:
    #if defined(WIN32) || (defined(UNIX) && (O_BINARY != 0))
    pushSTACK(S(Kdos)); /* :line-terminator */
    #else
    pushSTACK(S(Kunix)); /* :line-terminator */
    #endif
    is the following:
    :external-format and encodings are only used for character (text) streams,
    so we need to guess which encodings the files on this system usually expect.
    (Note that on _input_ CLISP recognizes all 3 possible line terminators; see
    http://clisp.cons.org/impnotes/clhs-newline.html
    http://www.unicode.org/reports/tr13/tr13-9.html
    so this whole issue really matters only for output.)
    So the question is: what line terminators do OTHER programs expect from TEXT files?
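    To illustrate why input is not the problem, here is a minimal C sketch of a line splitter that accepts LF, CR and CRLF alike, in the spirit of the newline guidelines cited above. It is not CLISP's implementation; next_line is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical helper: return the length of the next line in
   text[*pos .. len), excluding its terminator, and advance *pos past
   the terminator.  LF, CR and CRLF all count as one line terminator. */
size_t next_line(const char *text, size_t len, size_t *pos)
{
    size_t start = *pos, i = start;
    while (i < len && text[i] != '\n' && text[i] != '\r')
        i++;
    size_t line_len = i - start;
    if (i < len && text[i] == '\r') {
        i++;                          /* CR ...                        */
        if (i < len && text[i] == '\n')
            i++;                      /* ... optionally followed by LF */
    } else if (i < len) {
        i++;                          /* lone LF                       */
    }
    *pos = i;
    return line_len;
}
```

    Splitting "a\r\nbb\ncc\rd" this way yields the lines "a", "bb", "cc" and "d", whichever terminator each one used.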

    If we are running on a Windows machine, most text files are probably CRLF, and most programs expect that.
    Note that even the Cygwin CLISP is expected to write files usable by other (non-Cygwin) programs,
    so the fact that those programs expect CRLF does matter to us.
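    On output, the whole difference between the two defaults is the byte sequence appended after each line: :UNIX writes LF, :DOS writes CR LF. A minimal C sketch; put_line and enum eol are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

enum eol { EOL_UNIX, EOL_DOS };   /* hypothetical: :UNIX and :DOS */

/* Hypothetical helper: append line plus its terminator to out
   (capacity cap), starting at offset used.  Returns the new offset,
   or (size_t)-1 if it does not fit.  EOL_UNIX appends "\n",
   EOL_DOS appends "\r\n". */
size_t put_line(char *out, size_t cap, size_t used,
                const char *line, enum eol kind)
{
    const char *term = (kind == EOL_DOS) ? "\r\n" : "\n";
    size_t n = strlen(line), t = strlen(term);
    if (used + n + t > cap)
        return (size_t)-1;
    memcpy(out + used, line, n);
    memcpy(out + used + n, term, t);
    return used + n + t;
}
```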

    If we are running on a UNIX box with a nonzero O_BINARY, there is a separate BINARY mode
    for some files (like *.gz) and a separate TEXT mode for other files (like *.c).
    The "educated guess" here is to use CRLF.

    Both heuristics (win32 and unix/O_BINARY) imply that the Cygwin default should indeed be :DOS.
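    The original preprocessor test can be mirrored as an ordinary function to compare the three platforms side by side. This is an illustrative sketch: the two macros become the hypothetical parameters is_win32 and is_unix, and o_binary takes the values reported in this thread (undefined, hence 0, on Linux; 0x10000 on Cygwin).

```c
enum line_term { TERM_UNIX, TERM_DOS };   /* hypothetical names */

/* Runtime mirror of
     #if defined(WIN32) || (defined(UNIX) && (O_BINARY != 0))
   from encoding.d.  is_win32 / is_unix stand in for the two macros,
   o_binary for the value of O_BINARY (0 when undefined). */
enum line_term original_default(int is_win32, int is_unix, long o_binary)
{
    if (is_win32 || (is_unix && o_binary != 0))
        return TERM_DOS;
    return TERM_UNIX;
}
```

    Linux (0, 1, 0) yields TERM_UNIX, Cygwin (0, 1, 0x10000) yields TERM_DOS, and native Win32 (1, 0, 0) yields TERM_DOS. The Cygwin case is the only one where the existing code and the proposed :unix default disagree.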

     
  • Sam Steingold
    2007-01-29

    • status: open-rejected --> pending-rejected
     
  • Reini Urban
    2007-01-30

    • status: pending-rejected --> closed-rejected
     
  • Reini Urban
    2007-01-30

    Logged In: YES
    user_id=13755
    Originator: YES

    Thanks,
    Let's stick with that.

     
  • Sam Steingold
    2007-01-31

    Logged In: YES
    user_id=5735
    Originator: NO

    Good.
    If Aaron is not happy, please direct him to clisp-list.
    I cannot post to the Cygwin lists; my mail there is rejected.

     
  • Bruno Haible
    2007-01-31

    Logged In: YES
    user_id=5923
    Originator: NO

    I concur with Sam: The essential question is "what do other programs on the same
    platform expect?". When I ported clisp to Woe32 in 1997, Notepad (the default
    editor for small text files) did not display files with Unix newlines correctly.

    The best choice of default line terminator depends on what programs the user
    uses; this is a personal preference and therefore the ~/.clisprc.lisp is
    exactly the right place for handling it.

    The expression defined(UNIX) && (O_BINARY != 0) is just a more general way
    of writing #ifdef UNIX_CYGWIN.