From: SourceForge.net <no...@so...> - 2011-03-24 18:40:42
Bugs item #1961211, was opened at 2008-05-09 13:24
Message generated for change (Comment added) made by andreas_kupries
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=1961211&group_id=10894

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: 40. Dynamic Loading
Group: obsolete: 8.5.2
Status: Closed
Resolution: Fixed
Priority: 8
Private: No
Submitted By: Sébastien BARRE (sebbarre)
Assigned to: Daniel A. Steffen (das)
Summary: TclpDlopen failing on MacOSX 10.4 and later

Initial Comment:
Hi

A change was made to tclLoadDyld.c to use the dlfcn API (dlopen) on MacOSX 10.4 or later instead of the obsolete/deprecated NSModule API. However, the call to dlopen is problematic:

    dlHandle = dlopen(nativePath, RTLD_NOW | RTLD_LOCAL);

RTLD_LOCAL is *not* the default value for dlopen, the default is RTLD_GLOBAL. RTLD_LOCAL prevents Tcl/Tk from being able to load any dynamic library/module which depends on (i.e. was linked against) a previously loaded dynamic library/module.

Unless I missed the rationale for picking RTLD_LOCAL, could Tcl/Tk please use the default RTLD_GLOBAL?

From the man page:
http://developer.apple.com/documentation/Darwin/Reference/ManPages/man3/dlopen.3.html

RTLD_GLOBAL: Symbols exported from this image (dynamic library or bundle) will be available to any images build with -flat_namespace option to ld(1) or to calls to dlsym() when using a special handle.

RTLD_LOCAL: Symbols exported from this image (dynamic library or bundle) are generally hidden and only available to dlsym() when directly using the handle returned by this call to dlopen().

In the second case, a library/module A will load correctly but will hide its symbols. Tcl/Tk will fail to load a library/module B if B depends on symbols in A (i.e. was dynamically linked against A).
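A minimal sketch of the flag choice described above, assuming a POSIX <dlfcn.h>. The helper names load_flags and open_image are hypothetical and are not taken from tclLoadDyld.c; the sketch only shows where RTLD_GLOBAL versus RTLD_LOCAL enters the dlopen() call and how the dlerror() text for a failed load is obtained.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: pick the dlopen() mode for a [load]-style call.
 * export_symbols != 0 -> RTLD_GLOBAL (later images may bind to the symbols)
 * export_symbols == 0 -> RTLD_LOCAL  (symbols reachable only via the handle) */
int load_flags(int export_symbols)
{
    return RTLD_NOW | (export_symbols ? RTLD_GLOBAL : RTLD_LOCAL);
}

/* Hypothetical helper: open the image at `path` and report the dynamic
 * loader's own error message on failure. Returns NULL if the load fails. */
void *open_image(const char *path, int export_symbols)
{
    void *handle;

    assert(path != NULL);
    handle = dlopen(path, load_flags(export_symbols));
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    }
    return handle;
}
```

Opening library A with export_symbols set to 1 is the case that lets a later flat-namespace image B bind to A's symbols. The behaviour when neither flag is passed differs between platforms, so passing one explicitly avoids relying on the default.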
Example:

barre [562] $ /opt/tcltk8.5.0/bin/wish8.5
% load libvtkCommonTCL.dylib
% load libvtkFilteringTCL.dylib
dlopen(libvtkFilteringTCL.dylib, 6): Symbol not found: __Z14vtkTclInDeleteP10Tcl_Interp
  Referenced from: /Users/barre/build/VTK-VTK-5-2-tcl85-debug/bin/libvtkFilteringTCL.dylib
  Expected in: flat namespace

The __Z14vtkTclInDeleteP10Tcl_Interp symbol *is* actually in libvtkCommonTCL.dylib.

barre [563] $ otool -L libvtkFilteringTCL.dylib
libvtkFilteringTCL.dylib:
        libvtkFilteringTCL.5.2.dylib (compatibility version 0.0.0, current version 0.0.0)
        libvtkFiltering.5.2.dylib (compatibility version 0.0.0, current version 0.0.0)
        libvtkCommonTCL.5.2.dylib (compatibility version 0.0.0, current version 0.0.0)
        libvtkCommon.5.2.dylib (compatibility version 0.0.0, current version 0.0.0)
        /opt/tcltk8.5.0/lib/libtcl8.5.dylib (compatibility version 8.5.0, current version 8.5.0)
        [...]

Manually changing RTLD_LOCAL to RTLD_GLOBAL, I was able to load both libraries without any problem inside Tcl/Tk 8.5.2.

Thank you

----------------------------------------------------------------------

>Comment By: Andreas Kupries (andreas_kupries)
Date: 2011-03-24 11:40

Message:
The recent change to the core fixing #3216070 reintroduced this bug, by switching back to RTLD_LOCAL.

If you are getting tripped up by this, notably on Darwin, i.e. OS X, see below:

If your library is linked using -flat_namespace and fails to load with a message like

    dyld: Symbol not found: ...
      Referenced from: ...
      Expected in: flat namespace
    Trace/BPT trap

then this option has to be removed, and a possibly present -undefined suppress|warning as well, to make the library loadable again.

This happened for Metakit.

----------------------------------------------------------------------

Comment By: Daniel A.
Steffen (das)
Date: 2009-04-10 15:32

Message:
committed switch to RTLD_GLOBAL to HEAD and core-8-5-branch

----------------------------------------------------------------------

Comment By: Jan Nijtmans (nijtmans)
Date: 2008-05-16 01:22

Message:
Logged In: YES
user_id=61031
Originator: NO

> but don't call other people's work
> "broken" when it's obviously not.

I don't consider your work broken, I consider the Mac option -flat_namespace broken when used in combination with Tcl extensions. Tcl expects its extensions to be linked using the command defined in tclConfig.sh (see TCL_SHLIB_LD in that file). Additional flags are not Tcl's responsibility, sorry for that.....

So, what you can do is create a shared library containing the XXX_Init function only. All it does is dlopen the rest of the libraries with the RTLD_GLOBAL flag, and call whatever functions it wants. All other libraries can be legacy libraries, and compiled with or without -flat_namespace, whatever you like. Only the library that is loaded by Tcl cannot be compiled with -flat_namespace. This way, you can have your legacy libraries as you like, while the Tcl extension itself, which is only a small wrapper, conforms with the Tcl guidelines.

So, yes, Tcl supports legacy libraries just fine, you only have to wrap them up following the Tcl guidelines. If those guidelines conflict with how the legacy libraries are built, then that's a (solvable) problem.

Regards,
Jan Nijtmans

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-05-15 21:28

Message:
Logged In: NO

It's funny how those builds used to be very much "not broken" before that change occurred in tclLoadDyld.c...

If you do not want to support legacy libraries, that's not a problem, you are entitled to, just document it, but don't call other people's work "broken" when it's obviously not.
Removing -flat_namespace was just a workaround; if you switch to RTLD_LOCAL on all other Unix platforms, and there is no such workaround available, you will probably hear people complaining about their "broken" build again, rightfully so.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-05-15 19:27

Message:
Logged In: NO

Tcl does not need to support every combination of broken build ever conceived of. Truly.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-05-15 08:24

Message:
Logged In: NO

And maybe they cannot rebuild their extension. Maybe it's old. Maybe they don't have the sources. etc.

With or without -flat_namespace, you could use RTLD_GLOBAL so that any extension is supported.

----------------------------------------------------------------------

Comment By: Jan Nijtmans (nijtmans)
Date: 2008-05-15 08:17

Message:
Logged In: YES
user_id=61031
Originator: NO

Wow, I didn't expect your problem to be fixed that easily. Shouldn't we document then that Tcl extensions cannot be compiled with -flat_namespace on the Mac?

So, I change my recommendation to just do nothing (except possibly modify the documentation), and close this issue. At least on Mac we don't have the problem of possible symbol conflicts. On other platforms we still have it (see tktoolkit-Bugs-1958367), but that is a separate issue.

Regards,
Jan Nijtmans

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-05-15 07:39

Message:
Logged In: NO

Removing -flat_namespace did the trick for me on my 10.5 testing machine. X11 didn't fire properly on our regression machines last night, so I'll keep you posted about 10.4.

Thanks!
----------------------------------------------------------------------

Comment By: Jan Nijtmans (nijtmans)
Date: 2008-05-15 00:41

Message:
Logged In: YES
user_id=61031
Originator: NO

This is related to [ tktoolkit-Bugs-1958367 ] "Tk no longer builds correctly on Tru64", which in fact is a TEA bug. How can we expect Tcl extensions to link with all dependent libraries, if TEA and Tk don't do it correctly for all platforms!!!

Therefore, I suggest changing the flag to RTLD_GLOBAL now, then making sure that [tktoolkit-Bugs-1958367] gets fixed and that Sébastien's problem gets solved independently of the flag value (did removing the '-flat_namespace' help???). Only when those steps succeed can we even think of changing the flag to RTLD_LOCAL.

> I think your example will happen much less often than the scenario I
> described initially.

Agreed. But if the real bug here can be fixed, then we can have both.

----------------------------------------------------------------------

Comment By: Sébastien BARRE (sebbarre)
Date: 2008-05-13 16:40

Message:
Logged In: YES
user_id=214100
Originator: YES

> nobody:
> Anyway, I rest my case.

I think your example will happen much less often than the scenario I described initially. Anyway.

> das:
>i.e. compile with
> -DTCL_DYLD_USE_DLFCN=0 -DTCL_DYLD_USE_NSMODULE=1 -DTCL_DEBUG_LOAD
>and with
> -DTCL_DYLD_USE_DLFCN=1 -DTCL_DYLD_USE_NSMODULE=0 -DTCL_DEBUG_LOAD
>and if there is a difference, paste results of your [load]s above.

Yes, I had tried that last week, since I test using different major versions of Tcl/Tk compiled from source; sadly, the difference is that Tcl 8.5 would hang while loading the second library, on MacOSX 10.5. Though I had not tried with TCL_DEBUG_LOAD, I can try that again tomorrow.

> Please make sure to test on both OSX 10.4 and 10.5 if possible and with
> binaries linked on 10.5 as well as on 10.4, the dyld and linker

10.4 and 10.5 failed in the exact same way for me.
One of our regression machines is 10.4, which is where I spotted the problem. I then tried on my own Mac, running 10.5, and this failed as well. Another one of our regression computers is running 10.3, and has no problem (since it's not using dlopen).

> BTW, have you considered not linking with -flat_namespace? that is a
> legacy option that completely changes how symbols are resolved, using
> two-level namespaces will record which library a given symbol comes

I wasn't aware of that, and will try.

Thanks

----------------------------------------------------------------------

Comment By: Daniel A. Steffen (das)
Date: 2008-05-13 16:23

Message:
Logged In: YES
user_id=90580
Originator: NO

can you confirm that the current implementation via NSModule behaves differently than implementation via dlfcn in this case?
i.e. compile with
    -DTCL_DYLD_USE_DLFCN=0 -DTCL_DYLD_USE_NSMODULE=1 -DTCL_DEBUG_LOAD
and with
    -DTCL_DYLD_USE_DLFCN=1 -DTCL_DYLD_USE_NSMODULE=0 -DTCL_DEBUG_LOAD
and if there is a difference, paste results of your [load]s above.

Please make sure to test on both OSX 10.4 and 10.5 if possible and with binaries linked on 10.5 as well as on 10.4, the dyld and linker implementations in 10.5 are very different from 10.4...

if the behaviour is indeed different between NSModule and dlfcn in 10.5, I'd want to fix it irrespective of the general RTLD_GLOBAL vs RTLD_LOCAL debate.

BTW, have you considered not linking with -flat_namespace? that is a legacy option that completely changes how symbols are resolved, using two-level namespaces will record which library a given symbol comes from at link time, which may take care of the problem at hand...

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-05-13 15:59

Message:
Logged In: NO

> My question is, and I'm not certain you answered it: what would Tcl
> *break* by using RTLD_GLOBAL consistently?
Using RTLD_LOCAL would allow two versions of the same library to cooperate without problems. e.g. suppose we have a library libtclcrypt.so that depends on libssl.so.1 and another libtclssl.so which depends on libssl.so.2. Then, even if libssl.so.1 and libssl.so.2 have the same symbols, both Tcl extensions can cooperate fine. Using RTLD_GLOBAL the outcome is platform-dependent.

If there is only one version of each library, then it is highly unlikely that such a thing happens. But you can never be sure that two different libraries don't define the same symbol (e.g. myalloc()) and by accident both export it. Which myalloc() will be used then?......

Anyway, I rest my case.

----------------------------------------------------------------------

Comment By: Sébastien BARRE (sebbarre)
Date: 2008-05-13 15:24

Message:
Logged In: YES
user_id=214100
Originator: YES

> No, RTLD_LOCAL does not mean that all static members are duplicated.

OK, good to know.

> there is a problem with run-time resolution of symbols in your libraries.

I know Tcl/Tk is solid, but this C++ toolkit (VTK) is more than 10 years old. We have wrapped it using Tcl for pretty much the same amount of time (as well as Python and Java), and have performed "package require" or calls to Tcl's "load" in hundreds of Tcl tests to exercise the toolkit every night, for years, on dozens of Unix platforms and Win32 platforms, using many, many compilers. It has always been divided into different shared libraries. Problems arose only recently when testing on MacOSX > 10.4 with Tcl/Tk 8.5, since it is now using dlopen (instead of NSModule).

> It might be that library A and B are correct, but the problem is in
> C which is used by both A and B. It's hard to tell from here.

Please check my first message. I'm firing the shell, then loading A (libvtkCommonTCL). Then loading B (libvtkFilteringTCL).
The symbol it is clearly complaining about (__Z14vtkTclInDeleteP10Tcl_Interp) *is* in A, not in a library that would be in a dependency of A.

> I'm not trying to break currently working code. But some platforms,
> like win32, don't support undefined symbols in dll's at all, so
> if your libraries have a symbol resolution problem it will be
> impossible to port your libraries to win32.

This toolkit has been cross-platform from its origin. Regression testing shows we have no resolution problems on Win32 platforms, from Win 2000 to Vista, using MsDev6 to VisualStudio8. Therefore, I'm inclined to think there is no symbol resolution issue at the moment, though I might be wrong, but we do stress test VTK *a lot*, every night, and in a continuous manner during the day.

I think there is probably a good reason why the default is RTLD_GLOBAL and not RTLD_LOCAL, on MacOSX, as opposed to some other OS...

My question is, and I'm not certain you answered it: what would Tcl *break* by using RTLD_GLOBAL consistently?

I also did some quick Googling:

http://groups.google.com/group/comp.lang.perl.misc/msg/a2877cf7e0c656fe
=> this Perl user seemed to have the exact same problem while loading a Perl module.

Another msg that seems to indicate that RTLD_LOCAL cannot be used if you load two libs A and B, B depending on A (whereas RTLD_GLOBAL would allow it to work, and not break other examples, unless I missed something):
http://gcc.gnu.org/ml/gcc/2002-05/msg02034.html

Thank you

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-05-13 14:38

Message:
Logged In: NO

> But even if it did work with RTLD_LOCAL, there would be a problem with
> global variables (i.e. static members of classes for example). They would
> be duplicated if each library gets its own version of them, and hell would
> break loose.

No, RTLD_LOCAL does not mean that all static members are duplicated.
It only changes the visibility of the symbols, not the way the library is loaded. It will still be loaded once for each application (executable) no matter how many dlopen's are done.

I have handled similar problems in the past, and I am still convinced that there is a problem with run-time resolution of symbols in your libraries. It might be that library A and B are correct, but the problem is in C which is used by both A and B. It's hard to tell from here. One way to find out is to make a dependency graph of all your libraries, and try to load them separately from bottom to top. Does your linker have an option like --no-undefined? Then your linker can find out about such problems at build-time.

I'm not trying to break currently working code. But some platforms, like win32, don't support undefined symbols in dll's at all, so if your libraries have a symbol resolution problem it will be impossible to port your libraries to win32.

Good luck. If you have more questions, feel free to ask.

----------------------------------------------------------------------

Comment By: Sébastien BARRE (sebbarre)
Date: 2008-05-13 08:31

Message:
Logged In: YES
user_id=214100
Originator: YES

Jan,

Thanks for your comment. However, I think your statement might be incorrect:

> The 'correct' way to solve this is make sure that when compiling B,
> make sure to add '-lA' to the link line, then B will
> see the symbols. Additional advantage: loading B will load A
> automatically when it is not already done.

Our libraries were linked that way. If you check my first email, you will see that I ran "otool -L" against libvtkFilteringTCL, and it correctly reports libvtkCommonTCL as a known dependency. I also just double-checked our link line, it is correct. So we suspect the problem is on the dlopen side and that very specific flag.

But even if it did work with RTLD_LOCAL, there would be a problem with global variables (i.e. static members of classes for example).
They would be duplicated if each library gets its own version of them, and hell would break loose.

> Because tclLoadDl uses RTLD_GLOBAL as well, probably there are already
> libraries out there who fail to indicate all linked libraries.
> Therefore, I would recommend to change it to RTLD_GLOBAL
> in tclLoadDyld.c as well.

Would be great.

> However, I suggest to change it to RTLD_LOCAL in Tcl 8.6, and
> document the change clearly.

I'm afraid I don't follow the rationale here. Not only would the (reasonable) example I describe in my first email fail on MacOSX >= 10.4 (it does fail, I assure you), but it would start failing on all our other Unix platforms. We have nightly regression tests here that show it works fine on our Unix systems and MacOSX < 10.4. It would be unfortunate if that situation was reverted.

Thank you

----------------------------------------------------------------------

Comment By: Jan Nijtmans (nijtmans)
Date: 2008-05-13 01:17

Message:
Logged In: YES
user_id=61031
Originator: NO

2008/5/9 SourceForge.net <no...@so...>:
> However, the call to dlopen is problematic:
> dlHandle = dlopen(nativePath, RTLD_NOW | RTLD_LOCAL);
>
> RTLD_LOCAL is *not* the default value for dlopen, the default is RTLD_GLOBAL. RTLD_LOCAL prevents Tcl/Tk from being able to load any dynamic library/module which depends (i.e. was linked against) a previously loaded dynamic library/module.
>
> Unless I missed the rational for picking RTLD_LOCAL, could Tcl/Tk please use the default RTLD_GLOBAL?

Generally, undefined symbols in libraries are a bad idea, because at run-time those unresolved symbols must be resolved. Therefore, RTLD_LOCAL is faster but has the disadvantage that all libraries must know which other libraries they depend on. In my view, RTLD_LOCAL is preferred whenever possible.

> In the second case, a library/module A will load correctly but will hide its symbols. Tcl/Tk will fail to load a library/module B if B depends on symbols in A (i.e.
was dynamically linked against A).

The 'correct' way to solve this is to make sure that, when linking B, you add '-lA' to the link line; then B will see the symbols. Additional advantage: loading B will load A automatically when it is not already done.

> I'll try to track down where the decision to use RTLD_LOCAL came from but
> at first glance I agree that this appears to be a bug, thanks for the
> report.

Because tclLoadDl uses RTLD_GLOBAL as well, there are probably already libraries out there that fail to indicate all linked libraries. Therefore, I would recommend changing it to RTLD_GLOBAL in tclLoadDyld.c as well. However, I suggest changing it to RTLD_LOCAL in Tcl 8.6, and documenting the change clearly.

Regards,
Jan Nijtmans

----------------------------------------------------------------------

Comment By: Daniel A. Steffen (das)
Date: 2008-05-09 13:31

Message:
Logged In: YES
user_id=90580
Originator: NO

I'll try to track down where the decision to use RTLD_LOCAL came from but at first glance I agree that this appears to be a bug, thanks for the report.

----------------------------------------------------------------------
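The wrapper approach suggested in the 2008-05-16 comment (a small library containing only XXX_Init, which opens the remaining libraries with RTLD_GLOBAL) might be sketched roughly as follows. This is only an illustration under assumptions: preload_global is a hypothetical helper name, the library paths would be supplied by the extension author, and the Tcl side is left as a comment because it needs <tcl.h>.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper for a wrapper extension: open each legacy library
 * with RTLD_GLOBAL, in the order given (dependencies first), and stop at
 * the first failure. Returns the number of libraries opened successfully,
 * so a return value smaller than `count` signals an error.
 */
size_t preload_global(const char *const *paths, size_t count)
{
    size_t i;

    assert(paths != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (dlopen(paths[i], RTLD_NOW | RTLD_GLOBAL) == NULL) {
            fprintf(stderr, "preload of %s failed: %s\n", paths[i], dlerror());
            break;
        }
    }
    return i;
}

/*
 * In the wrapper's init function (named XXX_Init in the thread), one would
 * call preload_global() with the legacy library paths and return TCL_ERROR
 * if fewer libraries were opened than requested. That part is omitted here.
 */
```

With this layout only the small wrapper has to follow the Tcl link guidelines; the legacy libraries keep whatever link options they were built with.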