Tracker: Patches

fix cygwin default encoding - ID: 1633552
Last Update: Comment added ( haible )

src/encoding.d (encoding_from_name) has incorrect cygwin textmount logic:

#if defined(WIN32) || (defined(UNIX) && (O_BINARY != 0))
pushSTACK(S(Kdos)); /* :line-terminator */

not binary => textmount: doseol

Attached patch (with changelog) fixes that. Thanks to Aaron Brown
<arundelo@hotmail.com> for finding this.


Submitted: Reini Urban ( rurban ) - 2007-01-11 21:35
Priority: 5
Status: Closed
Resolution: Rejected
Assigned: Sam Steingold
Category: None
Group: None
Visibility: Public


Comments ( 13 )

Date: 2007-01-31 23:50
Sender: haible (Project Admin)

I concur with Sam: The essential question is "what do other programs on
the same platform expect?". When I ported clisp to Woe32 in 1997, Notepad -
the default editor for small text files - did not display files with Unix
NLs right.

The best choice of default line terminator depends on what programs the
user uses; this is a personal preference and therefore the ~/.clisprc.lisp
is exactly the right place for handling it.

The expression defined(UNIX) && (O_BINARY != 0) is only a generalizing way
of writing #ifdef UNIX_CYGWIN.



Date: 2007-01-31 15:10
Sender: sds (Project Admin)

good.
if Aaron is not happy, please direct him to clisp-list.
I cannot post to cygwin lists, my mail there is rejected.


Date: 2007-01-30 07:12
Sender: rurban (Project Donor)

Thanks,
Let's stick with that.


Date: 2007-01-29 03:56
Sender: sds (Project Admin)

http://clisp.podval.org/impnotes/encoding.html#line-term-default


Date: 2007-01-29 03:13
Sender: sds (Project Admin)

thanks for the reference to the cygwin mailing list.

The original problem is best solved by a
(setq *default-file-encoding* :unix)
in ~/.clisprc.lisp

now to the alleged CLISP bug.
the logic behind the original code:
#if defined(WIN32) || (defined(UNIX) && (O_BINARY != 0))
pushSTACK(S(Kdos)); /* :line-terminator */
#else
pushSTACK(S(Kunix)); /* :line-terminator */
#endif
is the following:
:external-format and encodings are only used for character (text) streams,
so we need to guess what kind of encodings the files on this system would
usually expect.
(note that on _input_ CLISP will recognize all 3 possible line terminators:
http://clisp.cons.org/impnotes/clhs-newline.html
http://www.unicode.org/reports/tr13/tr13-9.html
so this whole issue only really matters for output).
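
To illustrate why the choice of default only matters for output, here is a minimal sketch of a reader that treats CR, LF and CRLF alike. It is not CLISP's actual implementation; the function name normalize_newlines is hypothetical and chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from CLISP.
   Copies src to dst, mapping each of CR, LF and CRLF to a single '\n'.
   dst must have room for strlen(src) + 1 bytes.
   Returns the number of characters written, not counting the final NUL. */
size_t normalize_newlines(const char *src, char *dst)
{
    size_t n = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == '\r') {
            if (src[i + 1] == '\n')
                i++;            /* CRLF: consume the LF as well */
            dst[n++] = '\n';    /* lone CR or CRLF becomes one newline */
        } else {
            dst[n++] = src[i];  /* LF and ordinary characters pass through */
        }
    }
    dst[n] = '\0';
    return n;
}
```

A reader built this way accepts files written with any of the three conventions, so only the writer has to pick one.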
so, the question is: what line terminators do OTHER programs expect from
TEXT files?

if we are running on a windows machine, most text files are probably CRLF
and most programs expect that.
note that even the cygwin CLISP is expected to write files useful for other
(non-cygwin) programs, so the fact that they really expect CRLF does matter
to us.

if we are running on a UNIX box with a non-0 O_BINARY, this means that
there is a separate BINARY mode for some files (like *.gz) and a separate
TEXT mode for other files (like *.c).
the "educated guess" here is to use CRLF.

both these heuristics (win32 and unix/o_binary) mean that the cygwin
default should indeed be :DOS.
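
The heuristic described above can be restated as an ordinary function, which makes the three cases easy to check. This is only a sketch: the real code decides at compile time via the preprocessor, and the names line_terminator and original_default are hypothetical.

```c
#include <stdbool.h>

typedef enum { LT_UNIX, LT_DOS } line_terminator;

/* Run-time restatement of
     #if defined(WIN32) || (defined(UNIX) && (O_BINARY != 0))
   is_win32 and is_unix stand in for the WIN32 and UNIX macros;
   o_binary is the platform's O_BINARY value (0 where there is none). */
line_terminator original_default(bool is_win32, bool is_unix, int o_binary)
{
    if (is_win32 || (is_unix && o_binary != 0))
        return LT_DOS;   /* native Windows, or a Unix with a text mode */
    return LT_UNIX;      /* everything else */
}
```

With the Cygwin value 0x10000 reported in this thread, original_default(false, true, 0x10000) yields LT_DOS, while a Linux-like call with o_binary == 0 yields LT_UNIX.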



Date: 2007-01-28 23:24
Sender: rurban (Project Donor)

The logic should be:
- win32 ==> :dos
- unix: normal ==> :unix
-- cygwin ==> :unix

One could check on cygwin the default mountpoint, if it defines dos eol or
unix eol, but we at cygwin don't want to do that for now.
Using a cygwin default "textmount" (dos eol) is not recommended anymore.

Why?
[1]> *default-file-encoding*
#<ENCODING CHARSET:ASCII :DOS>

See the discussion starting with
http://sourceware.org/ml/cygwin/2007-01/msg00052.html
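
For comparison, the logic proposed in this comment can be sketched the same way. The names eol_style and proposed_default are hypothetical, and the sketch assumes that Cygwin is detected separately (for example via __CYGWIN__) and takes precedence over any WIN32 check; it is not the content of the attached patch.

```c
#include <stdbool.h>

typedef enum { EOL_UNIX, EOL_DOS } eol_style;

/* Proposed rule: only native win32 defaults to :dos;
   Cygwin is treated like any other Unix. */
eol_style proposed_default(bool is_win32, bool is_cygwin)
{
    if (is_cygwin)
        return EOL_UNIX;   /* cygwin ==> :unix */
    if (is_win32)
        return EOL_DOS;    /* win32 ==> :dos */
    return EOL_UNIX;       /* normal unix ==> :unix */
}
```

The only input on which this differs from the existing heuristic is Cygwin, where the existing code yields :dos.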



Date: 2007-01-28 20:56
Sender: sds (Project Admin)

what should the logic be?
- win32 ==> :dos
- unix: normal ==> :unix
-- cygwin ==> ???
where is O_BINARY documented?
its presence appears to indicate that files can be opened either as text
or binary.
why (and how) should it affect the default encoding?
keep in mind that CLISP _always_ opens files with O_BINARY.


Date: 2007-01-28 19:04
Sender: rurban (Project Donor)

$ grep _FBINARY /usr/include/sys/fcntl.h
#define _FBINARY 0x10000
#define O_BINARY _FBINARY

$ echo '#include <fcntl.h>' > .zzz.c; gcc -E -dM .zzz.c | grep BINARY;
#define _O_BINARY O_BINARY
#define _O_RAW O_BINARY
#define _FBINARY 0x10000
#define O_BINARY _FBINARY




Date: 2007-01-28 01:57
Sender: sds (Project Admin)

how is O_BINARY defined in cygwin?
echo '#include <fcntl.h>' > .zzz.c; gcc -E -dM .zzz.c | grep BINARY; rm -f .zzz.c
it appears that it is not defined on linux at all.
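
A note on how the test behaves when O_BINARY is missing: inside a C #if, an identifier that is not defined as a macro evaluates to 0, so (O_BINARY != 0) is simply false there. Ordinary code that passes O_BINARY to open() still needs a fallback. The sketch below shows that common idiom; whether CLISP's sources use exactly this fallback is an assumption, and both helper names are hypothetical.

```c
#include <fcntl.h>

/* Assumption: give O_BINARY a harmless value where <fcntl.h> lacks it,
   so it can be OR-ed into open() flags unconditionally. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: nonzero iff the platform distinguishes text and
   binary open modes (e.g. Cygwin, where O_BINARY is 0x10000). */
int has_text_binary_modes(void)
{
    return O_BINARY != 0;
}

/* Hypothetical helper: flags for a read-only open with no newline
   translation, valid on both kinds of platform. */
int binary_read_flags(void)
{
    return O_RDONLY | O_BINARY;
}
```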



Date: 2007-01-27 23:02
Sender: rurban (Project Donor)

Sorry, I thought the UNIX logic applied to CYGWIN only.
Fixed that in cyg-encoding2.patch to check for WIN32 and __CYGWIN__.


Date: 2007-01-27 23:00
Sender: rurban (Project Donor)

File Added: cyg-encoding2.patch


Date: 2007-01-26 03:20
Sender: sf-robot (SourceForge.net Site Admin)

This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).


Date: 2007-01-11 22:06
Sender: sds (Project Admin)

This patch does not seem right.
on linux O_BINARY==0 and I see no reason to default line termination to
:DOS there.
I am rejecting it pending your convincing me that I am wrong here.


Comments have been closed for this artifact.

Attached Files ( 2 )

Filename             Description
cyg-encoding.patch
cyg-encoding2.patch  patch fixed

Changes ( 16 )

Field          Old Value                            Date              By
close_date     2007-01-29 03:13                     2007-01-30 07:12  rurban
status_id      Pending                              2007-01-30 07:12  rurban
close_date     -                                    2007-01-29 03:13  sds
status_id      Open                                 2007-01-29 03:13  sds
summary        [PATCH] fix cygwin default encoding  2007-01-28 01:57  sds
File Added     213401: cyg-encoding2.patch          2007-01-27 23:00  rurban
status_id      Closed                               2007-01-27 23:00  rurban
close_date     2007-01-26 03:20                     2007-01-27 23:00  rurban
close_date     2007-01-11 22:06                     2007-01-26 03:20  sf-robot
status_id      Pending                              2007-01-26 03:20  sf-robot
data_type      301355                               2007-01-11 22:06  sds
close_date     -                                    2007-01-11 22:06  sds
resolution_id  None                                 2007-01-11 22:06  sds
assigned_to    nobody                               2007-01-11 22:06  sds
status_id      Open                                 2007-01-11 22:06  sds
File Added     210898: cyg-encoding.patch           2007-01-11 21:35  rurban