From: SourceForge.net <no...@so...> - 2006-06-13 09:51:41
Patches item #1503729, was opened at 2006-06-09 22:09
Message generated for change (Settings changed) made by dkf
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=310894&aid=1503729&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: 39. Dynamic Loading
Group: None
>Status: Pending
>Resolution: Invalid
Priority: 5
Submitted By: Kenneth Cox (kenstir)
>Assigned to: Donal K. Fellows (dkf)
Summary: TclpDlopen latent bug now crashes after SunOS linker patch

Initial Comment:
There is a long-standing bug in tclLoadDl.c which is exacerbated by
recent Solaris linker patches. Basically, after a failed dlopen(), you
must call dlerror() right away, before any further dynamic linking
activity. Otherwise, you risk the dlerror string being corrupted. It
doesn't seem to be as simple as dlerror() returning NULL, because that
wouldn't cause the crash.
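
[Editorial sketch] The pattern described above (read dlerror() immediately
after a failed dlopen(), and keep the result in a local variable before
passing it anywhere else) might look roughly like the following. This is
only an illustration, not the code in tclLoadDl.c: the helper name
open_library, its signature, and the fallback message are hypothetical,
and the later comments in this thread attribute the actual crash to the
compiler rather than to call ordering.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Tcl): try to load a shared library.
 * On failure, dlerror() is called immediately, before any other dl*
 * call can run, and its result is held in a local variable and copied
 * into the caller's buffer.
 */
static void *open_library(const char *path, char *errbuf, size_t errlen)
{
    void *handle;

    assert(path != NULL && errbuf != NULL && errlen > 0);

    handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        /* Capture the message right away, in a local. */
        const char *msg = dlerror();

        /* Assumed fallback text in case dlerror() reports nothing. */
        if (msg == NULL) {
            msg = "unknown dynamic linker error";
        }
        snprintf(errbuf, errlen, "couldn't load file \"%s\": %s", path, msg);
    }
    return handle;
}
```

A caller would pass a buffer such as char err[512] and print it when the
returned handle is NULL. Copying the message out also means the caller
does not depend on the lifetime of the string returned by dlerror().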

SYMPTOM OF CRASH

$ ./tclsh
% load xxx
Segmentation Fault (core dumped)
puccini:~/build/rel50/src/vendor/tcl/tcl8.4.2/unix $ pstack core
core 'core' of 26650:   ./tclsh
 ff0331b4 strlen (ffbee770, 14, 4, ffbee860, 1, 10) + 1c
 ff30ec98 Tcl_AppendResult (22c08, ff357dd8, 1, ff357df0, ff3df8f8, 0) + 1c
 ff3268fc TclpDlopen (22c08, 33458, ffbeea34, ffbeed9c, ff326830, 42048) + cc
 ff2f50c0 Tcl_FSLoadFile (22c08, 33458, ff34de08, ffbeeb2c, ffbeea3c, ffbeea34) + 54
 ff2fb234 Tcl_LoadObjCmd (22c08, 0, 2, 260d4, 0, ff345a9c) + 530
 ff2a9868 TclEvalObjvInternal (24748, 2, 0, 0, 0, 1) + 188
 ff2d5eb0 TclExecuteByteCode (ff34df24, ff34df2c, 2d014, 0, 260d4, 1) + 688
 ff2d54e8 TclCompEvalObj (0, 163, ff345a9c, 2cfa8, 2c990, 22c08) + 184
 ff2aa848 Tcl_EvalObjEx (0, 0, 20000, ff345a9c, 22c08, 2c990) + 60
 ff2e5010 Tcl_RecordAndEvalObj (20000, 2ca98, 20000, 22c08, 2c990, ff345a9c) + b8
 ff2fbdb4 Tcl_Main (1, 22c08, 1082c, ffbef334, 222b0, 2) + 4b0
 0001080c main (1, ffbef334, ffbef33c, 20800, 0, 0) + 24
 000107c0 _start (0, 0, 0, 0, 0, 0) + f8

EXPECTED BEHAVIOR

$ ./tclsh
% load xxx
couldn't load file "xxx": ld.so.1: tclsh: fatal: xxx: open failed: No such file or directory

SYSTEM PATCH INFORMATION

Linker patch 109147-40 (latest as of this writing) exhibits the problem.
Linker patch 109147-34 does not. I am unsure of other versions.

In order to see the problem you have to compile tclLoadDl.c optimized
with the Sun compiler.

----------------------------------------------------------------------

>Comment By: Donal K. Fellows (dkf)
Date: 2006-06-13 10:51

Message:
Logged In: YES
user_id=79902

Tcl most certainly isn't waiting a long time between calling dlopen()
and dlerror(); it only does a few calls between to perform minor memory
management and which are unlikely to cause any OS traps at all (malloc
implementations being the way they are). As you note, the problem is
the compiler.
According to the Sun documentation, the pragma should mean that
TclpDlopen() doesn't get optimized - not a big deal from our
perspective and surely not that hard for a compiler to do! - and
therefore the bug is definitely compiler-caused. So not our fault! :-)

A workaround might be to try to compile that file with gcc by hand...?
Messy though. Another possibility might be to put the result of
dlerror() into a local variable before passing it to
Tcl_AppendResult(); if that stops the compiler from going wrong, please
reopen this issue and let me know so that we can add a suitable
kludge...

----------------------------------------------------------------------

Comment By: Kenneth Cox (kenstir)
Date: 2006-06-12 19:20

Message:
Logged In: YES
user_id=246646

I take it back. The Tcl code was not the root cause of the crash.
Though it is probably bad style to wait a long-ish time after calling
dlopen() and before calling dlerror(), it is not an error unless you
call some other dl* function in between. I verified with truss that
Tcl was not.

The real problem appears to be an optimizer bug in the Sun Forte 6
(cc: Sun WorkShop 6 2000/06/19 C 5.1 Patch 109491-02) compiler
exacerbated by the linker patch. The linker patch included a patch to
a system header file which did this:

    #pragma unknown_control_flow(dlopen, dlsym, dlclose, dlerror)

With this change, the compiler generated different (and apparently bad)
assembler code.

----------------------------------------------------------------------