From: LRN <lr...@gm...> - 2011-03-04 11:04:49
Attachments:
envt.pl
|
Attached a testscript that exposes the "backticks-with-modified-env-will-trigger-a-fork-failure" bug. Funny thing about it is that when /bin/perl is used, fork works correctly. This bug does not seem to depend on perl version - i've built 5.6.1-2 from msys tarball, and it demonstrates this bug as well. To fix that i've been running tests with /bin/perl AFTER installing it. That required some fixes to the tests, since some of them do not respect $ENV{PERL} and force ./perl I think it is a reasonable assumption that msys-perl will always be running from /bin As a result i've got: Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------------- ../ext/Cwd/t/cwd.t 2 512 29 2 6.90% 22-23 ../ext/Cwd/t/taint.t 8 2048 17 8 47.06% 2 4 6 8 10 12 14 16 ../ext/Socket/t/socketpair.t 23 5888 45 23 51.11% 22-35 37-45 ../lib/File/Copy.t 4 1024 60 4 6.67% 28-29 55-56 ../lib/File/Find/t/find.t 199 62 31.16% 26-28 38-40 51- 53 62-64 80-82 109-113 170 184 186 188 200-227 ../lib/File/Find/t/taint.t 5 1280 45 8 17.78% 26-28 32 46-48 ../lib/File/Spec/t/Spec.t 472 1 0.21% 468 ../lib/Net/Ping/t/450_service.t 26 2 7.69% 9 18 ../lib/Test/t/multiline.t 2 1 50.00% 2 lib/filefind-taint.t 255 65280 45 85 188.89% 3-45 op/stat.t 86 2 2.33% 9 32 op/taint.t 9 2304 238 420 176.47% 1-3 5 31-238 64 tests and 286 subtests skipped. Failed 12/989 test scripts, 98.79% okay. 306/116626 subtests failed, 99.74% okay. All these failures are either mysterious, or related to symlinks or are TODOS (multiline). One is a rebase failure. Not sure why it happens - i've rebased the contents of perl source tree prior to installing it (rebased with the rebase address from perlrebase script). Details: op/stat.....................................# Failed at op/stat.t line 113 # it should not be '1299234589' # but it is. # Check if you are on a tmpfs of some sort. Building in /tmp sometimes # has this problem. Building on the ClearCase VOBS filesystem may also # cause this failure. # # Darwin's UFS doesn't have a ctime concept, and thus is expected to fail # this test. # Failed at op/stat.t line 204 FAILED tests 9, 32 Failed 2/86 tests, 97.67% okay op/taint....................................F:\Msys-got\1.0\bin\perl.exe: *** unable to remap f:\src\perl-5.8.8-trunkfix\perl-5.8.8\lib\auto\Fcntl\Fcntl.dll to same address as parent -- 0x3F0000 0 [main] perl 10684 sync_with_child: child 10664(0x108) died before initialization with status code 0x1 18818 [main] perl 10684 sync_with_child: *** child state child loading dlls F:\Msys-got\1.0\bin\perl.exe: *** unable to remap f:\src\perl-5.8.8-trunkfix\perl-5.8.8\lib\auto\Fcntl\Fcntl.dll to same address as parent -- 0x3D0000 5091622 [main] perl 10684 sync_with_child: child 5136(0x124) died before initialization with status code 0x1 5091641 [main] perl 10684 sync_with_child: *** child state child loading dlls F:\Msys-got\1.0\bin\perl.exe: *** unable to remap f:\src\perl-5.8.8-trunkfix\perl-5.8.8\lib\auto\Fcntl\Fcntl.dll to same address as parent -- 0x3D0000 10154750 [main] perl 10684 sync_with_child: child 6068(0x138) died before initialization with status code 0x1 10154770 [main] perl 10684 sync_with_child: *** child state child loading dlls F:\Msys-got\1.0\bin\perl.exe: *** unable to remap f:\src\perl-5.8.8-trunkfix\perl-5.8.8\lib\auto\Fcntl\Fcntl.dll to same address as parent -- 0x3D0000 15217449 [main] perl 10684 sync_with_child: child 13012(0x144) died before initialization with status code 0x1 15217470 [main] perl 10684 sync_with_child: *** child state child loading dlls F:\Msys-got\1.0\bin\perl.exe: *** unable to remap f:\src\perl-5.8.8-trunkfix\perl-5.8.8\lib\auto\Fcntl\Fcntl.dll to same address as parent -- 0x410000 20279772 [main] perl 10684 sync_with_child: child 10004(0x150) died before initialization with status code 0x1 20279791 [main] perl 10684 sync_with_child: *** child state child loading dlls Insecure dependency in `` while running with -T switch at op/taint.t line 317. # Looks like you planned 238 tests but ran 30. dubious Test returned status 9 (wstat 2304, 0x900) DIED. FAILED tests 1-3, 5, 31-238 Failed 212/238 tests, 10.92% okay lib/filefind-taint..........................dubious Test returned status 255 (wstat 65280, 0xff00) DIED. FAILED tests 3-45 Failed 43/45 tests, 4.44% okay ../ext/Cwd/t/cwd............................ # Failed test in ../ext/Cwd/t/cwd.t at line 171. # '/f/src/perl-5.8.8-trunkfix/perl-5.8.8/t/linktest' # doesn't match '(?-xism:t/_ptrslt_/_path_/_to_/_a_/_dir_$)' # Failed test in ../ext/Cwd/t/cwd.t at line 172. # '/f/src/perl-5.8.8-trunkfix/perl-5.8.8/t/linktest' # doesn't match '(?-xism:t/_ptrslt_/_path_/_to_/_a_/_dir_$)' # Looks like you failed 2 tests of 29. dubious Test returned status 2 (wstat 512, 0x200) DIED. FAILED tests 22-23 Failed 2/29 tests, 93.10% okay (less 1 skipped test: 26 okay, 89.66%) ../ext/Cwd/t/taint.......................... # Failed test 'its return value should be tainted' # in ../ext/Cwd/t/taint.t at line 31. # Failed test 'its return value should be tainted' # in ../ext/Cwd/t/taint.t at line 31. # Failed test 'its return value should be tainted' # in ../ext/Cwd/t/taint.t at line 31. # Failed test 'its return value should be tainted' # in ../ext/Cwd/t/taint.t at line 31. # Failed test 'its return value should be tainted' # in ../ext/Cwd/t/taint.t at line 31. # Failed test 'its return value should be tainted' # in ../ext/Cwd/t/taint.t at line 31. # Failed test 'its return value should be tainted' # in ../ext/Cwd/t/taint.t at line 31. # Failed test 'its return value should be tainted' # in ../ext/Cwd/t/taint.t at line 31. # Looks like you failed 8 tests of 17. dubious Test returned status 8 (wstat 2048, 0x800) DIED. FAILED tests 2, 4, 6, 8, 10, 12, 14, 16 Failed 8/17 tests, 52.94% okay ../ext/Socket/t/socketpair.................. # Failed test 'socketpair (LEFT, RIGHT, AF_UNIX, SOCK_DGRAM, PF_UNSPEC)' # in ../ext/Socket/t/socketpair.t at line 176. binmode() on closed filehandle LEFT at ../ext/Socket/t/socketpair.t line 181. binmode() on closed filehandle RIGHT at ../ext/Socket/t/socketpair.t line 182. syswrite() on closed filehandle LEFT at ../ext/Socket/t/socketpair.t line 187. # Failed test 'syswrite to left' # in ../ext/Socket/t/socketpair.t at line 187. # got: undef # expected: '6' syswrite() on closed filehandle LEFT at ../ext/Socket/t/socketpair.t line 187. # Failed test 'syswrite to left' # in ../ext/Socket/t/socketpair.t at line 187. # got: undef # expected: '6' syswrite() on closed filehandle RIGHT at ../ext/Socket/t/socketpair.t line 191. # Failed test 'syswrite to right' # in ../ext/Socket/t/socketpair.t at line 191. # got: undef # expected: '5' syswrite() on closed filehandle RIGHT at ../ext/Socket/t/socketpair.t line 191. # Failed test 'syswrite to right' # in ../ext/Socket/t/socketpair.t at line 191. # got: undef # expected: '6' sysread() on closed filehandle LEFT at ../ext/Socket/t/socketpair.t line 199. # Failed test 'read on left' # in ../ext/Socket/t/socketpair.t at line 199. # got: undef # expected: '5' # Failed test 'content what we expected?' # in ../ext/Socket/t/socketpair.t at line 200. # got: '' # expected: 'perl ' sysread() on closed filehandle LEFT at ../ext/Socket/t/socketpair.t line 199. # Failed test 'read on left' # in ../ext/Socket/t/socketpair.t at line 199. # got: undef # expected: '6' # Failed test 'content what we expected?' # in ../ext/Socket/t/socketpair.t at line 200. # got: '' # expected: 'rules!' sysread() on closed filehandle RIGHT at ../ext/Socket/t/socketpair.t line 205. # Failed test 'read on right' # in ../ext/Socket/t/socketpair.t at line 205. # got: undef # expected: '6' # Failed test 'content what we expected?' # in ../ext/Socket/t/socketpair.t at line 206. # got: '' # expected: 'hello ' sysread() on closed filehandle RIGHT at ../ext/Socket/t/socketpair.t line 205. # Failed test 'read on right' # in ../ext/Socket/t/socketpair.t at line 205. # got: undef # expected: '6' # Failed test 'content what we expected?' # in ../ext/Socket/t/socketpair.t at line 206. # got: '' # expected: 'world # ' shutdown() on closed socket LEFT at ../ext/Socket/t/socketpair.t line 209. # Failed test 'shutdown left for writing' # in ../ext/Socket/t/socketpair.t at line 209. sysread() on closed filehandle RIGHT at ../ext/Socket/t/socketpair.t line 222. # Failed test 'alarm should have fired' # in ../ext/Socket/t/socketpair.t at line 224. # got: '0' # expected: '1' syswrite() on closed filehandle RIGHT at ../ext/Socket/t/socketpair.t line 232. # Failed test 'syswrite to right' # in ../ext/Socket/t/socketpair.t at line 232. # got: undef # expected: '1' syswrite() on closed filehandle RIGHT at ../ext/Socket/t/socketpair.t line 232. # Failed test 'syswrite to right' # in ../ext/Socket/t/socketpair.t at line 232. # got: undef # expected: '1' sysread() on closed filehandle LEFT at ../ext/Socket/t/socketpair.t line 238. # Failed test 'read on left' # in ../ext/Socket/t/socketpair.t at line 238. # got: undef # expected: '1' # Failed test 'content what we expected?' # in ../ext/Socket/t/socketpair.t at line 239. # got: '' # expected: ' ' sysread() on closed filehandle LEFT at ../ext/Socket/t/socketpair.t line 238. # Failed test 'read on left' # in ../ext/Socket/t/socketpair.t at line 238. # got: undef # expected: '1' # Failed test 'content what we expected?' # in ../ext/Socket/t/socketpair.t at line 239. # got: '' # expected: '' # Failed test 'close left' # in ../ext/Socket/t/socketpair.t at line 242. # Failed test 'close right' # in ../ext/Socket/t/socketpair.t at line 243. # Looks like you failed 23 tests of 45. dubious Test returned status 23 (wstat 5888, 0x1700) DIED. FAILED tests 22-35, 37-45 Failed 23/45 tests, 48.89% okay ../lib/File/Copy............................ # Failed test in ../lib/File/Copy.t at line 156. # Failed test in ../lib/File/Copy.t at line 158. # '' # doesn't match '(?-xism:are identical)' # Failed test in ../lib/File/Copy.t at line 156. # Failed test in ../lib/File/Copy.t at line 158. # '' # doesn't match '(?-xism:are identical)' # Looks like you failed 4 tests of 60. dubious Test returned status 4 (wstat 1024, 0x400) DIED. FAILED tests 28-29, 55-56 Failed 4/60 tests, 93.33% okay ../lib/File/Find/t/find.....................# Use of uninitialized value in pattern match (m//) at ../lib/File/Find/t/find.t line 688. rm: cannot lstat `for_find/dangling_dir': Permission denied rm: cannot lstat `for_find/fa/faa/faa_sl': Permission denied rm: cannot lstat `for_find/fb/fba': Permission denied FAILED tests 26-28, 38-40, 51-53, 62-64, 80-82, 109-113, 170, 184, 186, 188, 200-227 Failed 52/199 tests, 73.87% okay ../lib/File/Find/t/taint.................... # Failed test 'Expected and found fa/fsl/fb_ord' # in ../lib/File/Find/t/taint.t at line 107. # Failed test 'Expected and found fa/fsl/fba' # in ../lib/File/Find/t/taint.t at line 107. # Failed test 'Expected and found fa/fsl/fba/fba_ord' # in ../lib/File/Find/t/taint.t at line 107. # Failed test 'Bad untaint pattern causes death in cwd (good)' # in ../lib/File/Find/t/taint.t at line 314. # '' # doesn't match '(?-xism:insecure cwd)' # Failed test 'Cwd not untainted with bad pattern (good)' # in ../lib/File/Find/t/taint.t at line 384. # '' # doesn't match '(?-xism:insecure cwd)' rm: cannot lstat `for_find/fb/fba': Permission denied # Looks like you planned 45 tests but ran 3 extra. # Looks like you failed 5 tests of 48 run. dubious Test returned status 5 (wstat 1280, 0x500) DIED. FAILED tests 26-28, 32, 46-48 Failed 7/45 tests, 84.44% okay ../lib/File/Spec/t/Spec.....................# Test 468 got: "///../../..//a//b/../c" (../lib/File/Spec/t/Spec.t at line 672 fail #385) # Expected: "/a/b/../c" (File::Spec::Epoc->canonpath('///../../..//./././a//b/.././c/././')) # ../lib/File/Spec/t/Spec.t line 672 is: ok $got, $expected, $function; FAILED test 468 Failed 1/472 tests, 99.79% okay (less 83 skipped tests: 388 okay, 82.20%) ../lib/Net/Ping/t/450_service...............# Failed test 9 in ../lib/Net/Ping/t/450_service.t at line 84 # ../lib/Net/Ping/t/450_service.t line 84 is: ok $p -> ping("127.0.0.1"); # Failed test 18 in ../lib/Net/Ping/t/450_service.t at line 143 # ../lib/Net/Ping/t/450_service.t line 143 is: ok $p -> ack(); FAILED tests 9, 18 Failed 2/26 tests, 92.31% okay ../lib/Test/t/multiline.....................FAILED test 2 Failed 1/2 tests, 50.00% okay |
From: LRN <lr...@gm...> - 2011-03-07 05:28:01
|
The next episode of everyone's favourite "Shaping up msys-perl" show! After i concluded that perl is as good as i am ever going to make it, i've decided to work a bit more on the packaging - so that it can be shipped in its current state. Until now i've been promptly ignoring CPAN, but that had to change, since both Msys and msysgit require some extra perl modules that are not part of the default installation. And first thing i've tried to install was Compression::Zlib. Alas, CPAN refused to install it, pointing out that some testcases have failed. I've looked them up. The first one to fail is zlib-generic.pl, line 93: ok $x->close ; Banged my head against that one for some time. Eventually i've discovered that fclose() in msys returns 0 (which is good) with errno == EBADF (which is bad) almost every time. I guess that code of most programs is not paranoid enough to check errno after fclose(), especially if fclose() itself returns 0, and that is why this bug was not noticed until now. Or maybe i'm wrong. Also, perl will save and restore errno by itself when calling fclose() with file descriptor substituted to a bad one (perl wants to make CRT free the FILE stream, while keeping the underlying FD intact), and that will mask the error (intentionally, since perl expects EBADF in that case). I've tried to rig perl to immediately close every FILE it opens, and that doesn't seem to produce non-zero errno. Obviously, it's difficult for me to track all the operations made on such stream between its opening and closing, so i do not know what causes this bug. I've made a hack that resets errno to 0 if fclose() returned 0 and Compression::Zlib passed the test. Not sure how it will affect perl's own testsuide, have not run it with this hack yet. Also, i think i'd need an updated list of CPAN modules that should be installed. The list in 5.6.1 CYGWIN BUNDLE might be redundant (i think that some of these modules have made it into perl source tree; also, i've found that Digest:MD5 is not only installed, but also used by perl when it runs, making it impossible to upgrade it with CPAN). |
From: Charles W. <cwi...@us...> - 2011-03-07 14:59:27
|
On 3/7/2011 12:27 AM, LRN wrote: > The next episode of everyone's favourite "Shaping up msys-perl" show! > > After i concluded that perl is as good as i am ever going to make it, > i've decided to work a bit more on the packaging - so that it can be > shipped in its current state. Until now i've been promptly ignoring > CPAN, but that had to change, since both Msys and msysgit require some > extra perl modules that are not part of the default installation. > And first thing i've tried to install was Compression::Zlib. > Alas, CPAN refused to install it, pointing out that some testcases have > failed. I've looked them up. The first one to fail is zlib-generic.pl, > line 93: > ok $x->close ; > > Banged my head against that one for some time. Eventually i've > discovered that fclose() in msys returns 0 (which is good) with errno == > EBADF (which is bad) almost every time. This sounds like an MSYS bug. E.g. fclose should return EOF on error, and only then set errno: http://pubs.opengroup.org/onlinepubs/009695399/functions/fclose.html "Upon successful completion, fclose() shall return 0; otherwise, it shall return EOF and set errno to indicate the error." e.g. it should only set errno in the "otherwise" case -- e.g. unsuccessful completion. Instead of trying to work around the bug in perl, I'd suggest figuring out why msys is doing the wrong thing, and fix that. Do you have a simple test case (C, not perl?) that demonstrates the erroneous behavior? > Also, i think i'd need an updated list of CPAN modules that should be > installed. The list in 5.6.1 CYGWIN BUNDLE might be redundant I'd go with the list of bundled packages in the cygwin 5.8.8 releases: exts="Win32API-File-0.1001 \ Pod-Escapes-1.04 Pod-Simple-3.05 Test-Pod-1.26 Devel-Symdump-2.07 Pod-Coverage-0.18 Test-Pod-Coverage-1.08 \ IO-Compress-Base-2.005 \ Compress-Raw-Zlib-2.005 IO-Compress-Zlib-2.005 Compress-Zlib-2.005 \ Compress-Raw-Bzip2-2.005 IO-Compress-Bzip2-2.005 Compress-Bzip2-2.09 \ IO-Zlib-1.05 \ IO-String-1.08 \ MD5-2.03 \ Archive-Tar-1.32 Archive-Zip-1.20 \ Math-BigInt-FastCalc-0.15 \ Term-ReadLine-Perl-1.0302 Term-ReadLine-Gnu-1.16 TermReadKey-2.30 \ XML-NamespaceSupport-1.09 XML-SAX-0.15 XML-LibXML-Common-0.13 XML-LibXML-1.63 \ XML-Parser-2.34 \ Proc-ProcessTable-0.41 \ File-Temp-0.18 YAML-0.62 \ Config-Tiny-2.10 File-Copy-Recursive-0.33 IPC-Run3-0.037 Probe-Perl-0.01 \ Tee-0.13 IO-CaptureOutput-1.03 File-pushd-0.99 File-HomeDir-0.65 \ Digest-SHA-5.45 \ Module-Signature-0.55 \ URI-1.35 HTML-Tagset-3.10 HTML-Parser-3.56 libwww-perl-5.805 \ CPAN-1.9102 \ Test-Reporter-1.27 CPAN-Reporter-0.44 \ Net-Telnet-3.03 \ Module-ScanDeps-0.75 PAR-Dist-0.23 \ ExtUtils-CBuilder-0.19 ExtUtils-ParseXS-2.18 Regexp-Common-2.120 \ version-0.7203 podlators-2.0.5 Pod-Readme-0.09 Module-Build-0.2808 \ B-Generate-1.09 PadWalker-1.5 Alias-2.32" Plus, handled as a special case, perl-${ver}-ext-Win32CORE.tar.bz2. In fact, I'd take a very close look at the contents and build script included in cygwin's perl-5.8.8-4-src.tar.bz2 package: http://mirrors.kernel.org/sourceware/cygwin/release-legacy/perl/ Note that the cygwin-perl build script automatically builds and installs [1] each of the exts. The cygwin package ALSO includes a number of cygwin-specific patches for both the perl core and some of these exts, that with suitable munging, might also be useful on MSYS. (FYI, I would probably hesitate to "update" these external packages to their latest current release, and would lean towards using, exactly, these older versions. (A) because in some cases they have accompanying patches for cygwin, and (B) it's possible newer versions require newer perl...) [1] using install_vendor. Because modules installed into site_perl (as opposed to vendor_perl) have higher precedence, end users will be able to update to newer versions if they so desire using CPAN. This is a worry on cygwin, given that setup.exe-installed stuff is usually writable only by Administrator. MSys-perl probably doesn't need to worry about this (as we don't really have much in the way of access control on installed files), so we COULD install bundled extensions into site_perl instead...but then, mingw-get might become confused if the installed fileset doesn't match the manifest when it comes time to update. So, I'd stick with install_vendor. -- Chuck |
From: LRN <lr...@gm...> - 2011-03-07 16:02:02
|
On 07.03.2011 17:59, Charles Wilson wrote: > On 3/7/2011 12:27 AM, LRN wrote: >> The next episode of everyone's favourite "Shaping up msys-perl" show! >> >> After i concluded that perl is as good as i am ever going to make it, >> i've decided to work a bit more on the packaging - so that it can be >> shipped in its current state. Until now i've been promptly ignoring >> CPAN, but that had to change, since both Msys and msysgit require some >> extra perl modules that are not part of the default installation. >> And first thing i've tried to install was Compression::Zlib. >> Alas, CPAN refused to install it, pointing out that some testcases have >> failed. I've looked them up. The first one to fail is zlib-generic.pl, >> line 93: >> ok $x->close ; >> >> Banged my head against that one for some time. Eventually i've >> discovered that fclose() in msys returns 0 (which is good) with errno == >> EBADF (which is bad) almost every time. > This sounds like an MSYS bug. E.g. fclose should return EOF on error, > and only then set errno: > > http://pubs.opengroup.org/onlinepubs/009695399/functions/fclose.html > "Upon successful completion, fclose() shall return 0; otherwise, it > shall return EOF and set errno to indicate the error." > > e.g. it should only set errno in the "otherwise" case -- e.g. > unsuccessful completion. > > Instead of trying to work around the bug in perl, I'd suggest figuring > out why msys is doing the wrong thing, and fix that. This is fclose from msysCORE: int _DEFUN (fclose, (fp), register FILE * fp) { int r; if (fp == NULL) return (0); /* on NULL */ CHECK_INIT (fp); if (fp->_flags == 0) /* not open! */ return (0); r = fp->_flags & __SWR ? fflush (fp) : 0; if (fp->_close != NULL && (*fp->_close) (fp->_cookie) < 0) r = EOF; if (fp->_flags & __SMBF) _free_r (fp->_data, (char *) fp->_bf._base); if (HASUB (fp)) FREEUB (fp); if (HASLB (fp)) FREELB (fp); fp->_flags = 0; /* release this FILE for reuse */ return (r); } AFAICS, it does not set errno by itself, msvcrt's close() does that. > Do you have a > simple test case (C, not perl?) that demonstrates the erroneous behavior? This is the testcase: #include <stdio.h> #include <errno.h> int main (int argc, char **argv) { FILE *f; int n; printf ("errno == %d\n", errno); f = fopen ("tmpfile.txt", "wb"); printf ("Opened %p, errno == %d\n", f, errno); n = fprintf (f, "Hello World!\n"); printf ("Wrote %d bytes into %p, errno == %d\n", n, f, errno); n = fclose (f); printf ("Closed %p, got %d, errno == %d\n", f, n, errno); return 0; } To compile: gcc -Wall main.c -o msysfclose.exe I get: $ msysfclose.exe errno == 0 Opened 0xa01079c, errno == 0 Wrote 13 bytes into 0xa01079c, errno == 0 Closed 0xa01079c, got 0, errno == 9 When compiled with mingw i get correct results: $ mingwfclose.exe errno == 0 Opened 75A32960, errno == 0 Wrote 13 bytes into 75A32960, errno == 0 Closed 75A32960, got 0, errno == 0 |
From: LRN <lr...@gm...> - 2011-03-07 17:56:33
|
On 07.03.2011 19:01, LRN wrote: > On 07.03.2011 17:59, Charles Wilson wrote: >> On 3/7/2011 12:27 AM, LRN wrote: >>> The next episode of everyone's favourite "Shaping up msys-perl" show! >>> >>> After i concluded that perl is as good as i am ever going to make it, >>> i've decided to work a bit more on the packaging - so that it can be >>> shipped in its current state. Until now i've been promptly ignoring >>> CPAN, but that had to change, since both Msys and msysgit require some >>> extra perl modules that are not part of the default installation. >>> And first thing i've tried to install was Compression::Zlib. >>> Alas, CPAN refused to install it, pointing out that some testcases have >>> failed. I've looked them up. The first one to fail is zlib-generic.pl, >>> line 93: >>> ok $x->close ; >>> >>> Banged my head against that one for some time. Eventually i've >>> discovered that fclose() in msys returns 0 (which is good) with >>> errno == >>> EBADF (which is bad) almost every time. >> This sounds like an MSYS bug. E.g. fclose should return EOF on error, >> and only then set errno: >> >> http://pubs.opengroup.org/onlinepubs/009695399/functions/fclose.html >> "Upon successful completion, fclose() shall return 0; otherwise, it >> shall return EOF and set errno to indicate the error." >> >> e.g. it should only set errno in the "otherwise" case -- e.g. >> unsuccessful completion. >> >> Instead of trying to work around the bug in perl, I'd suggest figuring >> out why msys is doing the wrong thing, and fix that. > [snip] > I've modified fclose.c, here's the part that differs: if (fp->_close != NULL) { int msvcrt_r; printf ("fclose() -> _close:%p (%p), errno == %d\n", fp->_close, fp->_cookie, errno); msvcrt_r = (*fp->_close) (fp->_cookie); printf ("fclose() <- _close:%p (%p) returned %d, errno == %d\n", fp->_close, fp->_cookie, msvcrt_r, errno); if (msvcrt_r < 0) { printf ("fclose() will return %d\n", EOF); r = EOF; } } I've also reduced the testcase a bit, removing all statements between fopen() and fclose(). With a debug version of msys and with the testcase built against msys running under gdb i get this: fclose() -> _close:0x608916a6 (0xa011e64), errno == 0 warning: MsYs00000058 10 4948240 [main] msysfclose2 6336 fhandler_console::write: 54 = write_console (,..54) warning: MsYs0000004A 80 4950019 [main] msysfclose2 6336 _write: 54 = write (1, 0xA011A50, 54) warning: MsYs00000036 10 4951529 [main] msysfclose2 6336 _close: close (3) warning: MsYs00000047 10 4952787 [main] msysfclose2 6336 fhandler_base::close: handle 0x114 warning: MsYs00000083 10 4954233 [main] msysfclose2 6336 seterrno_from_win_error: /f/src/msysCORE-1.0.16-1/source/winsup/cygwin/syscalls.cc:970 errno 6 warning: MsYs00000058 10 4956782 [main] msysfclose2 6336 geterrno_from_win_error: windows error 6 == errno 9 warning: MsYs0000003A 10 4958722 [main] msysfclose2 6336 _close: 0 = close (3) warning: MsYs00000045 80 4962820 [main] msysfclose2 6336 _write: write (1, 0xA011A50, 65) warning: MsYs00000049 40 4964247 [main] msysfclose2 6336 fhandler_console::write: A011A50, 65 warning: MsYs00000052 40 4965629 [main] msysfclose2 6336 fhandler_console::write: at 102(f) state is 1 fclose() <- _close:0x608916a6 (0xa011e64) returned 0, errno == 9 If msys to be believed, the errno comes from a failing FlushFileBuffers() call in fsync() in syscalls.cc |
From: LRN <lr...@gm...> - 2011-03-07 19:56:45
|
On 07.03.2011 20:56, LRN wrote: > On 07.03.2011 19:01, LRN wrote: >> On 07.03.2011 17:59, Charles Wilson wrote: >>> On 3/7/2011 12:27 AM, LRN wrote: >>>> The next episode of everyone's favourite "Shaping up msys-perl" show! >>>> >>>> After i concluded that perl is as good as i am ever going to make it, >>>> i've decided to work a bit more on the packaging - so that it can be >>>> shipped in its current state. Until now i've been promptly ignoring >>>> CPAN, but that had to change, since both Msys and msysgit require some >>>> extra perl modules that are not part of the default installation. >>>> And first thing i've tried to install was Compression::Zlib. >>>> Alas, CPAN refused to install it, pointing out that some testcases >>>> have >>>> failed. I've looked them up. The first one to fail is zlib-generic.pl, >>>> line 93: >>>> ok $x->close ; >>>> >>>> Banged my head against that one for some time. Eventually i've >>>> discovered that fclose() in msys returns 0 (which is good) with >>>> errno == >>>> EBADF (which is bad) almost every time. >>> This sounds like an MSYS bug. E.g. fclose should return EOF on error, >>> and only then set errno: >>> >>> http://pubs.opengroup.org/onlinepubs/009695399/functions/fclose.html >>> "Upon successful completion, fclose() shall return 0; otherwise, it >>> shall return EOF and set errno to indicate the error." >>> >>> e.g. it should only set errno in the "otherwise" case -- e.g. >>> unsuccessful completion. >>> >>> Instead of trying to work around the bug in perl, I'd suggest figuring >>> out why msys is doing the wrong thing, and fix that. >> [snip] >> > [snip] > > If msys to be believed, the errno comes from a failing > FlushFileBuffers() call in fsync() in syscalls.cc This is _close() from syscalls.cc: extern "C" int _close (int fd) { int res; sigframe thisframe (mainthread); syscall_printf ("close (%d)", fd); MALLOC_CHECK; if (cygheap->fdtab.not_open (fd)) { debug_printf ("handle %d not open", fd); set_errno (EBADF); res = -1; } else { SetResourceLock (LOCK_FD_LIST,WRITE_LOCK|READ_LOCK," close"); res = cygheap->fdtab[fd]->close (); fsync(fd); cygheap->fdtab.release (fd); ReleaseResourceLock (LOCK_FD_LIST,WRITE_LOCK|READ_LOCK," close"); } syscall_printf ("%d = close (%d)", res, fd); MALLOC_CHECK; return res; } After tracing it in gdb i've discovered that this line: res = cygheap->fdtab[fd]->close (); Ultimately calls CloseHandle(h). After that anything fsync(fd) does with the same value of (h) leads to an error. The extra fsync(fd); appears somewhere between 1.0.10 and 1.0.11 Positioning it before `res = cygheap->fdtab[fd]->close ();' seems to solve the bug (the testcase reports errno == 0 after fclose()). |
From: Keith M. <kei...@us...> - 2011-03-07 20:28:37
|
On 07/03/11 19:56, LRN wrote: > The extra fsync(fd); appears somewhere between 1.0.10 and 1.0.11 > Positioning it before `res = cygheap->fdtab[fd]->close ();' seems to > solve the bug (the testcase reports errno == 0 after fclose()). But errno isn't required to be zero after a successful fclose(); the only bug I see here is in your flawed expectations -- you expect a particular outcome from undefined behaviour! -- Regards, Keith. |
From: Earnie <ea...@us...> - 2011-03-07 20:41:42
|
Keith Marshall wrote: > On 07/03/11 19:56, LRN wrote: >> The extra fsync(fd); appears somewhere between 1.0.10 and 1.0.11 >> Positioning it before `res = cygheap->fdtab[fd]->close ();' seems to >> solve the bug (the testcase reports errno == 0 after fclose()). > > But errno isn't required to be zero after a successful fclose(); the > only bug I see here is in your flawed expectations -- you expect a > particular outcome from undefined behaviour! > But you may set errno to zero before calling fclose and on successful fclose the variable should remain zero unless some other function was called and set it. The setting of errno is based always on error and never on success by the standard libraries regardless of the standard function being called. -- Earnie -- http://www.for-my-kids.com |
From: Keith M. <kei...@us...> - 2011-03-07 21:12:16
|
On 07/03/11 20:41, Earnie wrote: > But you may set errno to zero before calling fclose and on > successful fclose the variable should remain zero unless some other > function was called and set it. You may, but you have no guarantee that it will remain so. > The setting of errno is based always on error and never on success by > the standard libraries regardless of the standard function being > called. Exactly. The system function you called may have called some other function internally. That function may have set errno; the one you called may have handled the exception, such that it can then return successfully, and the value of errno is then undefined. The only circumstance under which you may legitimately test errno, and expect a defined value, is *after* you have tested the return value from a function you've called, *and* verified an *unsuccessful* outcome. -- Regards, Keith. |
From: Charles W. <cwi...@us...> - 2011-03-07 20:51:44
|
On 3/7/2011 3:28 PM, Keith Marshall wrote: > On 07/03/11 19:56, LRN wrote: >> The extra fsync(fd); appears somewhere between 1.0.10 and 1.0.11 >> Positioning it before `res = cygheap->fdtab[fd]->close ();' seems to >> solve the bug (the testcase reports errno == 0 after fclose()). > > But errno isn't required to be zero after a successful fclose(); the > only bug I see here is in your flawed expectations -- you expect a > particular outcome from undefined behaviour! I disagree. I realize this is arguing from silence, but the spec does NOT say that errno can be set when success -- it only says that errno SHALL be set on failure. By implication, this means it should not be *explicitly* set on success. It may (or may not) be explicitly cleared by the successful call to fclose, depending on whether the runtime wants to allow failure codes arising from earlier failed syscalls to persist or not -- but that's not what's going on in this test case. Remember that we're talking about MSys here, not msvcrt directly. MSys, like cygwin, should give POSIX behavior whenever possible. cygwin has recently (last several years) adopted the view that, when the POSIX specification is unclear, and there is no overriding reason (*) to choose otherwise, behavior should match GNU/Linux. On Fedora 14 (2.6.35.10-74.fc14.x86_64) with glibc-2.13-1.x86_64 and gcc-4.5.1-4.fc14.x86_64, I get this behavior: errno == 0 Opened 0xb8f010, errno == 0 Wrote 13 bytes into 0xb8f010, errno == 0 Closed 0xb8f010, got 0, errno == 0 Now, I haven't had a chance to look into the Msys guts yet, but if it really is as simple as moving an fsync() -- and that change has no other ill effects -- then I don't see why we wouldn't do that. This is, of course, a *separate* question as to why perl is checking errno after a "successful" call to fclose in the first place. (*) such as "it would really complicate the internal cygwin (msys) code, for only minimal gain" -- Chuck |
From: Keith M. <kei...@us...> - 2011-03-07 21:45:27
|
On 07/03/11 20:51, Charles Wilson wrote: > On 3/7/2011 3:28 PM, Keith Marshall wrote: >> On 07/03/11 19:56, LRN wrote: >>> The extra fsync(fd); appears somewhere between 1.0.10 and 1.0.11 >>> Positioning it before `res = cygheap->fdtab[fd]->close ();' seems >>> to solve the bug (the testcase reports errno == 0 after >>> fclose()). >> >> But errno isn't required to be zero after a successful fclose(); >> the only bug I see here is in your flawed expectations -- you >> expect a particular outcome from undefined behaviour! > > I disagree. Then we must agree to disagree; however, I don't see disagreement in your follow up... > I realize this is arguing from silence, but the spec does NOT say > that errno can be set when success -- it only says that errno SHALL > be set on failure. By implication, this means it should not be > *explicitly* set on success. Exactly so; it *may* be set to anything at all, or untouched. > It may (or may not) be explicitly cleared by the successful call to > fclose, Again, exactly so; it *may* be cleared, but it isn't *required* that it be; thus, it *is* a bug to expect that it is. > Remember that we're talking about MSys here, not msvcrt directly. > MSys, like cygwin, should give POSIX behavior whenever possible. > cygwin has recently (last several years) adopted the view that, when > the POSIX specification is unclear, and there is no overriding > reason (*) to choose otherwise, behavior should match GNU/Linux. > > On Fedora 14 (2.6.35.10-74.fc14.x86_64) with glibc-2.13-1.x86_64 and > gcc-4.5.1-4.fc14.x86_64, I get this behavior: Where the standard doesn't require any particular behaviour, what Fedora does is irrelevant. > errno == 0 Opened 0xb8f010, errno == 0 Wrote 13 bytes into 0xb8f010, > errno == 0 Closed 0xb8f010, got 0, errno == 0 > > > Now, I haven't had a chance to look into the Msys guts yet, but if > it really is as simple as moving an fsync() -- and that change has no > other ill effects -- then I don't see why we wouldn't do that. That is entirely down to you and Cesar to choose; you *may* do so, but you aren't *required* to. > This is, of course, a *separate* question as to why perl is checking > errno after a "successful" call to fclose in the first place. Exactly my point; after a successful call, the standard doesn't require that errno be set to *any* particular value; it *is* a bug (in perl) to expect that it is, and that this bug may not be caught on some systems doesn't make it any less a bug. -- Regards, Keith. |
From: Paul L. <sa2...@cy...> - 2011-03-08 09:51:42
|
> I disagree. I realize this is arguing from silence, but the spec does > NOT say that errno can be set when success -- it only says that errno > SHALL be set on failure. By implication, this means it should not be > *explicitly* set on success. It may (or may not) be explicitly cleared > by the successful call to fclose, depending on whether the runtime wants > to allow failure codes arising from earlier failed syscalls to persist > or not -- but that's not what's going on in this test case. It does actually say that errno can be set on success - see below. As an aside, the IEEE Std 1003.1 specification page for fclose (linked somewhere earlier in this thread) explicitly states that it defers to the ISO C standard, and that any conflict is unintentional (however, that page doesn't address this issue anyway). The actual ISO C spec says, on page 186, note 3: > The value of errno is zero at program startup, but is never set to zero by any library > function. The value of errno may be set to nonzero by a library function call > whether or not there is an error, provided the use of errno is not documented in the > description of the function in this International Standard. The 'fclose' description on page 270 says nothing further. If perl/whatever is really checking for a non-zero errno when fclose returns zero (I've lost track), then it's just plain wrong. My recollection is that libc info pages on Linux are also pretty explicit about this, but I don't a system on at the moment. |
From: Charles W. <cwi...@us...> - 2011-03-08 16:07:44
|
On 3/8/2011 4:51 AM, Paul Leder wrote: > It does actually say that errno can be set on success - see below. > ...The actual ISO C spec says, on page 186, note 3: > >> The value of errno is zero at program startup, but is never set to zero by any library >> function. The value of errno may be set to nonzero by a library function call >> whether or not there is an error, provided the use of errno is not documented in the >> description of the function in this International Standard. Well, that settles it. The primary bug is perl, checking something that ought not be checked. If we want to change MSys to act more like GNU/Linux/glibc in the fclose() case, we can, but it's purely gravy. So, to LRN: I'd modify perl first, and don't worry about when/if MSys behavior may change. To Cesar: it's up to you; Keith's argument about the Msys::fclose() coder calling msvcrt::_fsync() after msvcrt::_fclose() is a good one. If *I* had made the same choice, then I would have /wanted/ to explicitly set errno to zero after the "failed" call to _fsync -- but THAT is disallowed by the ANSI C spec, apparently. So...the current implementation of Msys::fclose is ok, if a little odd. I'm going to check modern cygwin::fclose and see if they've changed or kept this odd behavior... -- Chuck |
From: LRN <lr...@gm...> - 2011-03-07 21:44:36
|
On 07.03.2011 23:28, Keith Marshall wrote: > On 07/03/11 19:56, LRN wrote: >> The extra fsync(fd); appears somewhere between 1.0.10 and 1.0.11 >> Positioning it before `res = cygheap->fdtab[fd]->close ();' seems to >> solve the bug (the testcase reports errno == 0 after fclose()). > But errno isn't required to be zero after a successful fclose(); the > only bug I see here is in your flawed expectations -- you expect a > particular outcome from undefined behaviour! > s/your/perl's or Compress::Zlib's/ s/you/perl or Compress::Zlib/ There IS an ACTUAL bug in there - an attempt to fsync() a closed file. I expect that it will be fixed, which will automagically fix errno to remain 0 (or whatever it was before the call). Thus, the point is largely moot. In the worst case i still can fix perl itself. Everything else is just bikeshedding. By the way, i am still unsure of WHY perl freaks out - i mean, fclose() DOES return 0, and as you've said, perl has no reason to check errno - and in the function that i've seen it indeed doesn't; if it does, that happens somewhere between perl's *close() and zlib test calling $x->close(), because according to perldoc the `ok' function that is used to check the teststep only checks the return value. |
From: LRN <lr...@gm...> - 2011-03-08 08:54:45
|
On 08.03.2011 0:43, LRN wrote: > On 07.03.2011 23:28, Keith Marshall wrote: >> On 07/03/11 19:56, LRN wrote: >>> The extra fsync(fd); appears somewhere between 1.0.10 and 1.0.11 >>> Positioning it before `res = cygheap->fdtab[fd]->close ();' seems to >>> solve the bug (the testcase reports errno == 0 after fclose()). >> But errno isn't required to be zero after a successful fclose(); the >> only bug I see here is in your flawed expectations -- you expect a >> particular outcome from undefined behaviour! >> > s/your/perl's or Compress::Zlib's/ > s/you/perl or Compress::Zlib/ > > There IS an ACTUAL bug in there - an attempt to fsync() a closed file. > I expect that it will be fixed, which will automagically fix errno to > remain 0 (or whatever it was before the call). Thus, the point is > largely moot. In the worst case i still can fix perl itself. > Everything else is just bikeshedding. A quick update: After testing the new msys-1.0.dll i've discovered that it affects some applications in a way i do not understand. The result is that perl configure script freezes at random points. To investigate that i've compiled 3 fresh version of msysCORE - the "stock" one (no modifications), the "moved" one (fsync() call is moved before close()) and the "commented" one (fsync() call is simply commented out). Tried to configure perl with each one of them. Only the "moved" one makes it freeze. So i am led to think that the best (most compatible, less disturbing) way to fix this is to simply comment-out fsync() rather than move it. Because at the moment (in the "stock" version) fsync's only contribution to fclose() is updating errno to 9, FlushFileBuffers() has no effect. Moving it before close() changes the situation - it actually flushes before closing. My guess is that this flush is not something that applications (or maybe the other parts of msys - i've no idea) expect, and that breaks things. Commenting out fsync() call will remove the unneeded errno (fixing the original symptom for me) without altering the behaviour. |
From: Albrecht S. <vms...@go...> - 2011-03-08 10:25:44
|
Disclaimer: I don't know the MSYS (code) internals, but I followed the discussion with interest. As has been clearly said, checking errno after a successful fclose() call is wrong, and should not be done. I'm trying to argue just from a logical programmer's view... On 08.03.2011 09:54, LRN wrote: >> There IS an ACTUAL bug in there - an attempt to fsync() a closed file. Then this one should be fixed independently of perl's suspected bug. Really. >> I expect that it will be fixed, which will automagically fix errno to >> remain 0 (or whatever it was before the call). Thus, the point is >> largely moot. In the worst case i still can fix perl itself. >> Everything else is just bikeshedding. IMHO the point isn't moot. In the contrary: as a developer I'd be glad to have such a reproducible test case. With this, the first step should be to check what's going on in perl and fix this. Nobody guarantees that there can't be another case where fclose() *might* set errno to *any* other value in the future (this has been discussed here). > A quick update: After testing the new msys-1.0.dll i've discovered that > it affects some applications in a way i do not understand. The result is > that perl configure script freezes at random points. > To investigate that i've compiled 3 fresh version of msysCORE - the > "stock" one (no modifications), the "moved" one (fsync() call is moved > before close()) and the "commented" one (fsync() call is simply > commented out). Tried to configure perl with each one of them. Only the > "moved" one makes it freeze. > So i am led to think that the best (most compatible, less disturbing) > way to fix this is to simply comment-out fsync() rather than move it. Wrong conclusion. The best solution would be the *correct* way to fix it, not the most compatible with the previous, potenially wrong behavior. > Because at the moment (in the "stock" version) fsync's only contribution > to fclose() is updating errno to 9, FlushFileBuffers() has no effect. That's probably true, but... > Moving it before close() changes the situation - it actually flushes > before closing. My guess is that this flush is not something that > applications (or maybe the other parts of msys - i've no idea) expect, As a programmer I *would* *strongly* expect that file buffers would be flushed when I call fclose(). Thus, if this hasn't been done (or will be done in any other part of fclose), then it should be done and not commented out. > and that breaks things. Commenting out fsync() call will remove the > unneeded errno (fixing the original symptom for me) without altering the > behaviour. ... and thus keeping a potential bug (not to flush buffers). Fixing symptoms alone is not a correct solution. As a user of MSYS, I'd appreciate if someone could do the right thing and not fix symptoms only. I apologize that I can't contribute code or patches, but I hope that a user's (and developer's) view helps a little bit... Thanks to all that contribute to MinGW/MSYS development, BTW ! Albrecht |
From: LRN <lr...@gm...> - 2011-03-08 11:39:23
|
On 08.03.2011 13:25, Albrecht Schlosser wrote: > Disclaimer: I don't know the MSYS (code) internals, but I followed the > discussion with interest. As has been clearly said, checking errno after > a successful fclose() call is wrong, and should not be done. > > I'm trying to argue just from a logical programmer's view... > > On 08.03.2011 09:54, LRN wrote: > >>> There IS an ACTUAL bug in there - an attempt to fsync() a closed file. > Then this one should be fixed independently of perl's suspected bug. > Really. Agreed. > >>> I expect that it will be fixed, which will automagically fix errno to >>> remain 0 (or whatever it was before the call). Thus, the point is >>> largely moot. In the worst case i still can fix perl itself. >>> Everything else is just bikeshedding. > IMHO the point isn't moot. In the contrary: as a developer I'd be glad > to have such a reproducible test case. With this, the first step should > be to check what's going on in perl and fix this. Nobody guarantees that > there can't be another case where fclose() *might* set errno to *any* > other value in the future (this has been discussed here). I'm pressing this for one simple reason: this close<->fsync thing in msys i KNOW. I've FOUND it. I've TESTED it. I've found a way to fix it. Fixing it doesn't change its behaviour _by design_, it changes only its undefined behaviour (see below). "What's going on in perl" is anybody's guess. I hate perl as a language and i hate perl interpreter as a C program, and since my aim is to bring some particular pieces of software into shape, i would rather just fix it the quickest way possible (as long as it does not upset the balance of things). > >> A quick update: After testing the new msys-1.0.dll i've discovered that >> it affects some applications in a way i do not understand. The result is >> that perl configure script freezes at random points. >> To investigate that i've compiled 3 fresh version of msysCORE - the >> "stock" one (no modifications), the "moved" one (fsync() call is moved >> before close()) and the "commented" one (fsync() call is simply >> commented out). Tried to configure perl with each one of them. Only the >> "moved" one makes it freeze. >> So i am led to think that the best (most compatible, less disturbing) >> way to fix this is to simply comment-out fsync() rather than move it. > Wrong conclusion. The best solution would be the *correct* way to fix > it, not the most compatible with the previous, potenially wrong > behavior. Can you REALLY claim that not flushing a WinAPI HANDLE explicitly before closing it is "potentially wrong" - considering the fact that *this is how Msys worked for _years_*, and considering the fact that this evidently (if my personal experience counts for evidence) breaks some other applications. > >> Because at the moment (in the "stock" version) fsync's only contribution >> to fclose() is updating errno to 9, FlushFileBuffers() has no effect. > That's probably true, but... > >> Moving it before close() changes the situation - it actually flushes >> before closing. My guess is that this flush is not something that >> applications (or maybe the other parts of msys - i've no idea) expect, > As a programmer I *would* *strongly* expect that file buffers would be > flushed when I call fclose(). Thus, if this hasn't been done (or will > be done in any other part of fclose), then it should be done and not > commented out. From MSDN: "When an application is finished using the object handle returned by CreateFile, use the *CloseHandle* function to close the handle. This not only frees up system resources, but *can have wider influence on things like sharing the file or device and committing data to disk*." That is, it might just do the right thing. Ensuring that by calling FlushFileBuffers() might be redundant. Or it might not be. I do not know that (unless someone digs up WinAPI sources) and i do not care to find it out. As long as it Just Works (TM). > >> and that breaks things. Commenting out fsync() call will remove the >> unneeded errno (fixing the original symptom for me) without altering the >> behaviour. > ... and thus keeping a potential bug (not to flush buffers). Fixing > symptoms alone is not a correct solution. > > As a user of MSYS, I'd appreciate if someone could do the right thing > and not fix symptoms only. I apologize that I can't contribute code or > patches, but I hope that a user's (and developer's) view helps a > little bit... > > Thanks to all that contribute to MinGW/MSYS development, BTW ! > > Albrecht > I do not feel myself responsible for correctness of Msys' implementation, nor do i have any interest in debugging this any further than i have to. While i do not propose to just hackfix-and-forget it, i must point out that finding someone who WOULD be willing to solve this problem the "right" way (if we assume that my way is not the "right" one) might take some time. Like, months. Or years. All the while the problem will remain, in its worst (most disruptive to applications) form. Making a quick fix removes the symptoms, allowing Compression::Zlib to install without troubles, which would allow me to finally push msys-perl-5.8.8 (in foreseeable future). And it won't make things any worse (unless there's a completely weird piece of code somewhere that EXPECTS fclose() to return 0 and set errno to EBADF - yeah, that one would fail). And future generations of developers will be free revert this change (mark it with a comment to make it easier to spot in the code, or file a bug on the tracker to keep track of it; i can even help filing the bug or coming up with a nice witty in-code comment), recompile msys with it and happily plunge into depths of msys hacking - as long as i am not involved in any way. I think the choice is simple. At least, from my point of view (considering the state of all objects involved and the information i possess at the moment). Just a matter of approving this change with msys maintainer(s?). |
From: Keith M. <kei...@us...> - 2011-03-08 12:43:03
|
On 8 March 2011 11:38, LRN <lr...@gm...> wrote: > On 08.03.2011 13:25, Albrecht Schlosser wrote: >> Disclaimer: I don't know the MSYS (code) internals, but I followed the >> discussion with interest. As has been clearly said, checking errno after >> a successful fclose() call is wrong, and should not be done. >> >> I'm trying to argue just from a logical programmer's view... >> >> On 08.03.2011 09:54, LRN wrote: >>>> There IS an ACTUAL bug in there - an attempt to fsync() a closed file. >> >> Then this one should be fixed independently of perl's suspected bug. >> Really. > > Agreed. > >>>> I expect that it will be fixed, which will automagically fix errno to >>>> remain 0 (or whatever it was before the call). Thus, the point is >>>> largely moot. In the worst case i still can fix perl itself. >>>> Everything else is just bikeshedding. >> >> IMHO the point isn't moot. In the contrary: as a developer I'd be glad >> to have such a reproducible test case. With this, the first step should >> be to check what's going on in perl and fix this. Nobody guarantees that >> there can't be another case where fclose() *might* set errno to *any* >> other value in the future (this has been discussed here). > > I'm pressing this for one simple reason: this close<->fsync thing in > msys i KNOW. I've FOUND it. I've TESTED it. I've found a way to fix it. Okay. I'm playing Devil's Advocate here; I will continue to do so, while you persist in flogging this dead horse. I'll ask it again -- more directly this time: why are you so hell bent on "fixing" SOMETHING WHICH ISN'T BROKEN in the first place? Sure, the way MSYS' fclose() is implemented may not be 100% rational, but, in terms of the applicable standards, IT ISN'T BROKEN! Yes, it has a side effect which exposes a REAL bug, in the perl code you are hacking; THAT is the bug you should be tracking down, and fixing. >>> So i am led to think that the best (most compatible, less disturbing) >>> way to fix this is to simply comment-out fsync() rather than move it. >> >> Wrong conclusion. Like Albrecht says -- wrong conclusion; and yet, you still persist in pursuing it. Yes, you have identified something within MSYS which may not be entirely rational, but IT ISN'T A BUG (in the sense that it causes a problem for calling code, if that code is not itself defective). Your "fix" does no more than modify a side effect, replacing one manifestation of undefined behaviour with another. However, undefined behaviour means just that; your "fix" is NO MORE CORRECT than what it replaces, (because there is no "correct" implementation of undefined behaviour). You haven't fixed anything. Sorry to be blunt about it; until you "get this", you are just wasting everybody's time, not least your own [*]. [*] Yes, I'm sure the MSYS Developers will be grateful for the head's up on the odd behaviour, but SOMEBODY (maybe not you) needs to focus on the real bug here, (misuse of errno in perl code), rather than faffing at attacking some unexpected side effect, which is firmly in the realm of undefined behaviour but certainly not illegitimate, just because that side effect happens to expose the real bug. -- Regards, Keith. |
From: LRN <lr...@gm...> - 2011-03-08 13:13:14
|
On 08.03.2011 15:42, Keith Marshall wrote: > On 8 March 2011 11:38, LRN<lr...@gm...> wrote: >> On 08.03.2011 13:25, Albrecht Schlosser wrote: >>> Disclaimer: I don't know the MSYS (code) internals, but I followed the >>> discussion with interest. As has been clearly said, checking errno after >>> a successful fclose() call is wrong, and should not be done. >>> >>> I'm trying to argue just from a logical programmer's view... >>> >>> On 08.03.2011 09:54, LRN wrote: >>>>> There IS an ACTUAL bug in there - an attempt to fsync() a closed file. >>> Then this one should be fixed independently of perl's suspected bug. >>> Really. >> Agreed. >> >>>>> I expect that it will be fixed, which will automagically fix errno to >>>>> remain 0 (or whatever it was before the call). Thus, the point is >>>>> largely moot. In the worst case i still can fix perl itself. >>>>> Everything else is just bikeshedding. >>> IMHO the point isn't moot. In the contrary: as a developer I'd be glad >>> to have such a reproducible test case. With this, the first step should >>> be to check what's going on in perl and fix this. Nobody guarantees that >>> there can't be another case where fclose() *might* set errno to *any* >>> other value in the future (this has been discussed here). >> I'm pressing this for one simple reason: this close<->fsync thing in >> msys i KNOW. I've FOUND it. I've TESTED it. I've found a way to fix it. > Okay. I'm playing Devil's Advocate here; I will continue to do so, > while you persist in flogging this dead horse. > > I'll ask it again -- more directly this time: why are you so hell bent > on "fixing" SOMETHING WHICH ISN'T BROKEN in the first place? Sure, the > way MSYS' fclose() is implemented may not be 100% rational, but, in > terms of the applicable standards, IT ISN'T BROKEN! Yes, it has a side > effect which exposes a REAL bug, in the perl code you are hacking; THAT > is the bug you should be tracking down, and fixing. Why - easier/faster. >>>> So i am led to think that the best (most compatible, less disturbing) >>>> way to fix this is to simply comment-out fsync() rather than move it. >>> Wrong conclusion. > Like Albrecht says -- wrong conclusion; and yet, you still persist in > pursuing it. > > Yes, you have identified something within MSYS which may not be entirely > rational, but IT ISN'T A BUG (in the sense that it causes a problem for > calling code, if that code is not itself defective). Your "fix" does no > more than modify a side effect, replacing one manifestation of undefined > behaviour with another. However, undefined behaviour means just that; > your "fix" is NO MORE CORRECT than what it replaces, (because there is > no "correct" implementation of undefined behaviour). You haven't fixed > anything. > > Sorry to be blunt about it; until you "get this", you are just wasting > everybody's time, not least your own [*]. > > [*] Yes, I'm sure the MSYS Developers will be grateful for the head's up > on the odd behaviour, but SOMEBODY (maybe not you) needs to focus on the > real bug here, (misuse of errno in perl code), rather than faffing at > attacking some unexpected side effect, which is firmly in the realm of > undefined behaviour but certainly not illegitimate, just because that > side effect happens to expose the real bug. > I think this particular discussion is going nowhere. Actually, it's been going nowhere for some time. The problem is known. 2 of 3 possible solutions. Advantages and disadvantages of both (and some anticipated advantages/disadvantages of the third one) are presented. Dixi. /me goes back to trying to install CPAN modules. |
From: Keith M. <kei...@us...> - 2011-03-08 13:49:49
|
On 8 March 2011 13:12, LRN wrote: >> I'll ask it again -- more directly this time: why are you so hell bent >> on "fixing" SOMETHING WHICH ISN'T BROKEN in the first place? Sure, the >> way MSYS' fclose() is implemented may not be 100% rational, but, in >> terms of the applicable standards, IT ISN'T BROKEN! Yes, it has a side >> effect which exposes a REAL bug, in the perl code you are hacking; THAT >> is the bug you should be tracking down, and fixing. > > Why - easier/faster. And ultimately ineffective; all you are doing is masking the real bug, until it rears its ugly head again. > I think this particular discussion is going nowhere. Agreed. > Actually, it's been > going nowhere for some time. The problem is known. 2 of 3 possible > solutions. Advantages and disadvantages of both (and some anticipated > advantages/disadvantages of the third one) are presented. And the major disadvantage of your proposal is that it isn't actually a solution; it's a work around which happens to get you past an immediate problem, by sweeping it under the carpet. Not good. -- Keith. |
From: Charles W. <cwi...@us...> - 2011-03-08 20:39:28
|
On 3/8/2011 8:12 AM, LRN wrote: > On 08.03.2011 15:42, Keith Marshall wrote: >> [*] Yes, I'm sure the MSYS Developers will be grateful for the head's up >> on the odd behaviour, but SOMEBODY (maybe not you) needs to focus on the >> real bug here, (misuse of errno in perl code), rather than faffing at >> attacking some unexpected side effect, which is firmly in the realm of >> undefined behaviour but certainly not illegitimate, just because that >> side effect happens to expose the real bug. >> > I think this particular discussion is going nowhere. Actually, it's been > going nowhere for some time. The problem is known. 2 of 3 possible > solutions. Advantages and disadvantages of both (and some anticipated > advantages/disadvantages of the third one) are presented. Dixi. LRN, I've come to the conclusion that MSys's behavior, while odd, is not actually a bug. Also, I've found that the "offending" call to fysnc() was *added* by MSys. It does not appear in the cygwin version of _close on which msys's was based: Here's the cygwin impl from that time period: extern "C" int _close (int fd) { int res; sigframe thisframe (mainthread); syscall_printf ("close (%d)", fd); MALLOC_CHECK; if (cygheap->fdtab.not_open (fd)) { debug_printf ("handle %d not open", fd); set_errno (EBADF); res = -1; } else { SetResourceLock (LOCK_FD_LIST,WRITE_LOCK|READ_LOCK," close"); res = cygheap->fdtab[fd]->close (); cygheap->fdtab.release (fd); ReleaseResourceLock (LOCK_FD_LIST,WRITE_LOCK|READ_LOCK," close"); } syscall_printf ("%d = close (%d)", res, fd); MALLOC_CHECK; return res; } The *only* difference between msys's version and cygwin's (at the time) is the addition, in msys's implementation, of the call to fsync(). This was added here: 2007.01.12 Earnie Boyd <...> * dir.cc (rmdir): Add usleep kludge for ACCESS_DENIED errors. * fhandler.cc (fhandler_base:open): Ditto. * dtable.cc (dtable::release): Add resource lock/unlock for. * syscalls.cc (_unlink): Rework usleep kludge for speed. (_close): Add fsync(fd) before releasing descriptor. Remove usleep kludge because of speed. (stat_worker): Remove secondary exit path. (_rename): Add usleep kludge. (cygwin, soon after msys's fork, completely reworked how fd's are managed, so we can't look to them for any similar change). Now, Earnie obviously found it necessary to add this, so I would NOT go willy-nilly commenting it out. And, since Earnie explicitly decided to (a) call it after cygheap->fdtab[fd]->close(), and (b) ignore the return value, I assume he had a good reason for doing so. (BTW, you can't rely on CloseHandle "committing data to disk" because that only applies to data that is in the Win32 subsystem's buffers. MSys can possibly maintain its own level of buffering for data bound to the fd, and IT needs to flush that data INTO the Win32 subsystem's handle's buffer, before CloseHandle would be able to do any such thing. Again, Earnie does not add code to MSYS for no reason; I am 99.40% positive that there was an observed bug, which this addition fixed. Removing it breaks something -- dunno what. Earnie? Is there perhaps a ticket for this item?) My bet is that this was added to handle a case where the call to cygheap->fdtab[fd]->close() failed during exit(), and data was lost. By adding fsync, you ensure that the disk actually gets all the data as intended, and then the fd is closed by the windows subsys on _exit(). Now, PERHAPS a workaround in MSYS would be to do this, instead: + if (! cygheap->fdtab.not_open (fd))) fsync(fd); but, as Keith has pointed out, that just papers over a bug in perl, which is checking errno when it shouldn't. And, by papering over the bug, it might show up in other contexts, so we need to fix the perl bug. > /me goes back to trying to install CPAN modules. And that's fine. I'll try to determine how to fix perl-core to ignore this (not really an) error in the runtime. What gets me is this: > The first one to fail is zlib-generic.pl, line 93: > ok $x->close ; Now, the 'ok' function -- when called this way -- fails only if the *return value* of the function is 0 (e.g. failure, in perl speak). So, $x->close IS returning failure, regardless of what the underlying MSYS close() is returning. Now, note that perl-5.8.x and above -- unlike perl-5.6.x -- use the "PerlIO" subsystem, and NOT regular stdio (unless you build with -Uuseperlio or run with PERLIO=:stdio -- but that is NOT recommended). So, underneath the hood, the PerlIO subsystem is implemented using the system (Msys) open and close, NOT fopen and fclose. So...I'm wondering how you ever got IN to fclose in the first place. Did you build with -Uuseperlio? If so, I think that is a mistake... Do you have any kind of backtrace (either in interpreted perl, or in the underlying C impl) that shows the callstack leading up to the call to msys fclose()? -- Chuck |
From: Earnie <ea...@us...> - 2011-03-08 22:21:55
|
Charles Wilson wrote: > > Now, Earnie obviously found it necessary to add this, so I would NOT go > willy-nilly commenting it out. And, since Earnie explicitly decided to > (a) call it after cygheap->fdtab[fd]->close(), and (b) ignore the return > value, I assume he had a good reason for doing so. > And I certainly do not remember why but could have been due to strangeness with the version of Windows I was using at the time which was W95. I also do not know why the call is positioned after the fclose and will leave it to Cesar to make the final call on how to handle it. -- Earnie -- http://www.for-my-kids.com |
From: Cesar S. <ces...@gm...> - 2011-03-09 02:48:37
|
On 3/8/2011 7:21 PM, Earnie wrote: > Charles Wilson wrote: >> >> Now, Earnie obviously found it necessary to add this, so I would NOT go >> willy-nilly commenting it out. And, since Earnie explicitly decided to >> (a) call it after cygheap->fdtab[fd]->close(), and (b) ignore the return >> value, I assume he had a good reason for doing so. >> > > And I certainly do not remember why but could have been due to > strangeness with the version of Windows I was using at the time which > was W95. I also do not know why the call is positioned after the fclose > and will leave it to Cesar to make the final call on how to handle it. > From what I could see, fsync just calls FlushFileBuffers, with no other side effects. Since the Windows handle was just closed, the call doesn't currently seem to have any purpose (besides setting errno, as LRN found out). My inclination is to remove the fsync call (as a clean-up of the MSYS code-base, not as a workaround for a Perl bug). Regards, Cesar |
From: Charles W. <cwi...@us...> - 2011-03-09 15:54:24
|
On 3/8/2011 9:48 PM, Cesar Strauss wrote: > On 3/8/2011 7:21 PM, Earnie wrote: > From what I could see, fsync just calls FlushFileBuffers, with no other > side effects. Since the Windows handle was just closed, the call doesn't > currently seem to have any purpose (besides setting errno, as LRN found > out). > > My inclination is to remove the fsync call (as a clean-up of the MSYS > code-base, not as a workaround for a Perl bug). After looking around in the tracker and mailing lists, the only reference I found to this change: http://sourceforge.net/mailarchive/forum.php?thread_name=E1H5UGL-00050h-7B%40mail.sourceforge.net&forum_name=mingw-cvs was a slightly different proposed patch http://sourceforge.net/tracker/index.php?func=detail&aid=1636199&group_id=2435&atid=302435 from around the same time period, that affected some of the same code, but was not accepted. However, the proposed (and rejected) patch did not include the bit that added fsync(fd) to _close(). My hunch is that the proposed patch triggered Earnie to investigate further, and devise a better patch. But, I can find no evidence that the fsync(fd) was actually necessary -- it smells like a "hmm...these changes might mean the cygheap fd object wrapper gets deleted before the data is flushed...better fix that..." stab in the dark. If you want to keep that "stab in the dark", my (untested) suggestion is this: else { SetResourceLock (LOCK_FD_LIST,WRITE_LOCK|READ_LOCK," close"); res = cygheap->fdtab[fd]->close (); - fsync(fd) + /* If failed, try one more time but flush the handle first */ + if (res != 0) + { + HANDLE h = cygheap->fdtab[fd]->get_handle (); + (void) FlushFileBuffers (h); /* ignore any errors */ + res = cygheap->fdtab[fd]->close (); + } cygheap->fdtab.release (fd); ReleaseResourceLock (LOCK_FD_LIST,WRITE_LOCK|READ_LOCK," close"); } But I think it would actually be safe to remove the fsync(fd) entirely, and not replace it with anything. I'm still going to try to figure out what's going on with perl, though. -- Chuck |
From: Charles W. <cwi...@us...> - 2011-03-28 22:17:14
|
On 3/9/2011 10:54 AM, Charles Wilson wrote: > If you want to keep that "stab in the dark", my (untested) suggestion is > this: > > else > { > SetResourceLock (LOCK_FD_LIST,WRITE_LOCK|READ_LOCK," close"); > res = cygheap->fdtab[fd]->close (); > - fsync(fd) > + /* If failed, try one more time but flush the handle first */ > + if (res != 0) > + { > + HANDLE h = cygheap->fdtab[fd]->get_handle (); > + (void) FlushFileBuffers (h); /* ignore any errors */ > + res = cygheap->fdtab[fd]->close (); > + } > cygheap->fdtab.release (fd); > ReleaseResourceLock (LOCK_FD_LIST,WRITE_LOCK|READ_LOCK," close"); > } > > But I think it would actually be safe to remove the fsync(fd) entirely, > and not replace it with anything. > > I'm still going to try to figure out what's going on with perl, though. Well, I thought it would be enough to simply have perl, in its wrapper around close(), do this: C pseudo code, in perl's PerlSIO_close(?) implementation: rv = close(...); + if (rv == 0 && errno != 0) + errno = 0; And sure enough, that "solves" the problem when perl itself is opening or closing a file. However...it does NOT solve the problem: here's one example: perl pseudocode: handle_some_errors if system ("/bin/prog-returns-nonzero-exitcode --args >/tmp/output-redirect") Now, perl's implementation of system doesn't 'know' that the command has an output redirect. So, it goes ahead and sets up a pipe so that the stdout of the command can be captured, in case the (perl) caller wants to analyze it. strace shows that the LAST thing to happen, in perl's process, AFTER the child processes have all completed and exited, is the pipe that captured the stdout/stderr output is closed...which on MSYS means that errno gets set to EBADF (even though there was nothing bad about the pipe fd we just closed; it's just that pesky fsync() that is erroneously(?) called by msys itself). In the particular failure I am seeing (using msys-1.0.16 stock, plus perl-5.8.8, running the autoconf-2.67 test suite), what happens is Autom4te::FileUtil.pm's error handler is expected to be called -- and it is called -- because the test launches a command it expects to fail. HOWEVER, the expected output is not matched. Here's the FileUtil errhandler code: 246 sub handle_exec_errors ($;$$) 247 { 248 my ($command, $expected, $hint) = @_; ... 259 $command = (split (' ', $command))[0]; 260 if ($!) 261 { 262 fatal "failed to run $command: $!" . $hint; 263 } 264 else 265 { 266 use POSIX qw (WIFEXITED WEXITSTATUS WIFSIGNALED WTERMSIG); 267 268 if (WIFEXITED ($?)) 270 my $status = WEXITSTATUS ($?); 271 # Propagate exit codes. 272 fatal ('', 273 "$command failed with exit status: $status", The test *expects* to end up at 268, and the expected output is therefore autom4te: /bin/m4 failed with exit status: 1 However, because of the close of the stdout pipe perl's system() implementation, Autom4te::FileUtil sees a nonzero errno code (EBADF) and takes the branch at 260, and the actual output is autom4te: failed to run /bin/m4: Bad file number But this is not, in fact true: /bin/m4 did run as expected. The exit code propagated back to perl is in fact '1' as expected...but we don't see that because we check $! first. So, I can go into the implementation of PERL_pp_system() and after each call to close (actually, to the macro PerlLIO_close) add the same check 'if (rv == 0 && errno != 0) { errno = 0; }'. And repeat for all 62 other instances, some of which require code reorg thanks to existing constructs like: while (PerlLIO_close(fd) != 0) { So, obviously, just modifying the PerlIOStdio_close function to check for non-zero errno + successful return value, and reset errno, would be good enough. And, it seems that perl's own test suite is happy with just that one simple fix. But...it appears that external clients, not so much, and I'll need to tackle the other 61 instances. Alternatively: perl currently uses the following macro: #ifdef PERL_IMPLICIT_SYS #define PerlLIO_close(fd) (*PL_LIO->pClose)(PL_LIO, (fd)) #else #define PerlLIO_close(fd) close((fd)) #endif throughout, rather than calling 'close' directly. (PERL_IMPLICIT_SYS is not used on cygwin or msys; it is used on 'native' win32/wince+ithreads, and netware.) Now, I suppose I could do something like this: #ifdef PERL_IMPLICIT_SYS #define PerlLIO_close(fd) (*PL_LIO->pClose)(PL_LIO, (fd)) #else # if defined(__MSYS__) # int msys_close(int); # define PerlLIO_close(fd) msys_close((fd)) # else # define PerlLIO_close(fd) close((fd)) # endif #endif and then, somewhere, do #ifdef __MSYS__ int msys_close(int fd) { int rv = close(fd); if (rv == 0 && errno != 0) errno = 0; return rv; } #endif But (a) I'm not sure about exporting the symbol from libperl so that it can be used by extensions, and (b) let's be honest. For all the talk about how POSIX allows for errno to nonzero when close() returns successfully, EVEN IF you explicitly set it to zero right before calling close() -- it really is QUITE counter-intuitive for a successful close() to go mucking with errno on success. This is a perfect case in point: perl is trying to propagate an error value that might have occurred due to a failed fork/exec, and is getting tripped up because it *successfully* closed the pipe capturing the output of a *successful* fork/exec. Apparently MSYS is the only platform on which perl has been ported that behaves in this way. Even if the current behavior is allowed by POSIX, can we please be like all the other kids on the block, and remove the (destined to fail) call to fsync() in msys's _close()? Pretty please? And then I'll just say "perl-5.8.8 needs msys-1.0.17 for best results"... -- Chuck |