Xavier, all,
I noticed that Modules-Tcl can be thrown off kilter when:
* The tclsh that interprets <MODULES_TCL>/libexec/modulecmd.tcl is dynamically linked, and
* LD_LIBRARY_PATH happens to contain an incompatible libtcl.so version under the same name that this tclsh wants.
== STEPS TO REPRODUCE ==
First, a contrived example that works without a wayward libtcl.so on hand (Bash syntax, on Linux):
read libc_used libtcl_wanted <<<"$( ldd /usr/bin/tclsh | sort | awk '/libc.so/ { print $3 } /libtcl.*so/ {print $1}' | xargs )"
echo $libc_used $libtcl_wanted
mod_tmp=$( mktemp -d )
ln -s $libc_used $mod_tmp/$libtcl_wanted
LD_LIBRARY_PATH=$mod_tmp module --version
On a CentOS-6.9 system, I get:
$ echo $libc_used $libtcl_wanted
/lib64/libc.so.6 libtcl8.5.so
$ LD_LIBRARY_PATH=$mod_tmp module --version
/usr/bin/tclsh: symbol lookup error: /usr/bin/tclsh: undefined symbol: Tcl_Main
(The exact error message does not matter here, the key is that tclsh took the bait.)
In case the contrivance above is too much, what actually happened was that one of my modules contained:
> $ module show python-env/2.7/anaconda-4
> …
> prepend-path LD_LIBRARY_PATH /opt/apps/python-env/2.7.11-09-anaconda-4-EL6/lib
> …
> $ find /opt/apps/python-env/2.7.11-09-anaconda-4-EL6/lib -name libtcl\*
> /opt/apps/python-env/2.7.11-09-anaconda-4-EL6/lib/libtclstub8.5.a
> /opt/apps/python-env/2.7.11-09-anaconda-4-EL6/lib/libtcl.so
> /opt/apps/python-env/2.7.11-09-anaconda-4-EL6/lib/libtcl8.5.so
Uh-oh! Normally, tclsh uses the system-provided libtcl.so:
> $ ldd /usr/bin/tclsh | grep tcl
> libtcl8.5.so => /usr/lib64/libtcl8.5.so (0x0000003ac4000000)
But with the module above loaded, that tclsh instance picked up the application's idea of libtcl.so:
> $ ldd /usr/bin/tclsh | grep tcl
> libtcl8.5.so => /opt/apps/python-env/2.7.11-09-anaconda-4-EL6/lib/libtcl8.5.so (0x00002b39346bd000)
And thus, the following happens with each module(1) invocation, e.g. --version:
> $ module --version
> application-specific initialization failed: Can't find a usable init.tcl in the following directories:
> /opt/anaconda1anaconda2anaconda3/lib/tcl8.5 /usr/lib/tcl8.5 /lib/tcl8.5 /usr/library /library /tcl8.5.18/library /tcl8.5.18/library
>
>
>
> This probably means that Tcl wasn't installed properly.
>
> Modules Release Tcl 1.962 (2017-08-09)
FWIW, the tclsh version coming from the interfering application works just fine. Also, it was probably compiled using an -rpath incantation:
$ ldd /opt/apps/python-env/2.7.11-09-anaconda-4-EL6/bin/tclsh8.5 | grep libtcl
libtcl8.5.so => /opt/apps/python-env/2.7.11-09-anaconda-4-EL6/bin/../lib/libtcl8.5.so (0x00002b90dddfd000)
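To check quickly whether a given environment is affected, something along the following lines might help. It is only a sketch: find_in_ld_path is a helper name I made up, not part of Modules, and a hit only means that the library name is shadowed, not that the shadowing copy is actually incompatible.

```shell
#!/usr/bin/env bash
# List every file named NAME found in the directories of a colon-separated
# search path (default: the current $LD_LIBRARY_PATH).
# find_in_ld_path is a hypothetical helper name, not part of Modules.
find_in_ld_path() {
    local name=$1 path=${2-${LD_LIBRARY_PATH-}} dir
    local -a dirs
    # Split the path on ':' without triggering pathname expansion.
    IFS=: read -r -a dirs <<<"$path"
    for dir in "${dirs[@]}"; do
        # ld.so treats an empty element as the current directory; skip it here.
        [ -n "$dir" ] || continue
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    return 0
}

# Intended use (not run here):
#   libtcl_wanted=$( ldd /usr/bin/tclsh | awk '/libtcl.*so/ { print $1 }' )
#   find_in_ld_path "$libtcl_wanted"
```

Any output means that tclsh would resolve that library name from LD_LIBRARY_PATH before it consults the system directories.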
== DISCUSSION ==
There should (must) be a way to isolate the modules-internal tclsh instance from side effects caused by the user's environment. Naively, one could clear LD_LIBRARY_PATH, LD_PRELOAD, etc. before starting tclsh, but if I'm not mistaken, that would prevent modulecmd.tcl from reading and updating the existing values of these very variables for the user.
I had two ideas, both of which need changes to modulecmd.tcl beyond my comfort level with Tcl:
(1) .../tclsh modulecmd.tcl is run under a sanitized LD_LIBRARY_PATH; it will read the user's original LD_LIBRARY_PATH from a saved or detour variable (LD_LIBRARY_PATH_modshare?), but issues commands to update LD_LIBRARY_PATH as usual.
(2) If no module subcommand ever reads stdin (!?), then modulecmd.tcl could get the user's environment not natively via ::env() but instead read it from stdin, in the manner of GNU which(1): "env | which --read-alias"
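For the shell side of idea (1), I imagine something like the sketch below. The MODULES_ORIG_* variable names and the run_with_detour function are hypothetical, and an unmodified modulecmd.tcl does not read those variables, so this shows only the wrapper half of the change.

```shell
#!/usr/bin/env bash
# Sketch of idea (1): hide the loader variables from the command being run,
# but pass the user's original values along under detour names.
# run_with_detour and MODULES_ORIG_* are hypothetical names for illustration.
run_with_detour() {
    (
        # The subshell keeps the caller's own environment untouched.
        local var
        for var in LD_LIBRARY_PATH LD_PRELOAD; do
            # ${!var+set} is non-empty if the named variable is set, even to "".
            if [ -n "${!var+set}" ]; then
                export "MODULES_ORIG_$var=${!var}"
                unset "$var"
            fi
        done
        "$@"
    )
}
```

The module shell function would then put run_with_detour in front of its tclsh call. modulecmd.tcl would have to take the current value from the detour variable while still emitting commands that set LD_LIBRARY_PATH itself.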
Would it be sensible to pursue either idea? (1) looks to be small in scope but might have to be made more cognizant of other ld(1) intricacies and of non-Linux platforms. (2) is rather more serious and would mean working a tclsh wrapper into the Modules-Tcl install procedure.
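To make idea (2) a bit more concrete, here is a shell stand-in for the reading side. In a real implementation this logic would live in modulecmd.tcl, written in Tcl; env_value_from_stdin is a hypothetical name. I assume the environment arrives NUL-delimited, as produced by "env -0" from GNU coreutils, because values may contain newlines.

```shell
#!/usr/bin/env bash
# Sketch of idea (2): look up a variable in an environment dump read from
# stdin rather than in the process's own environment.
# Input format: NUL-terminated NAME=VALUE records, e.g. from GNU "env -0".
# env_value_from_stdin is a hypothetical name for illustration.
env_value_from_stdin() {
    local name=$1 entry
    while IFS= read -r -d '' entry; do
        # Everything before the first '=' is the variable name.
        if [ "${entry%%=*}" = "$name" ]; then
            printf '%s\n' "${entry#*=}"
            return 0
        fi
    done
    return 1   # variable not present in the dump
}

# Intended use (not run here): the reader runs without LD_LIBRARY_PATH
# but still learns the user's value from the pipe.
#   env -0 | ( unset LD_LIBRARY_PATH; env_value_from_stdin LD_LIBRARY_PATH )
```

The left side of the pipe still sees the user's environment, so the value survives even though the reading process itself was started clean.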
Best wishes,
--
Michael Sternberg, Ph.D.
Principal Scientific Computing Administrator
Center for Nanoscale Materials
Argonne National Laboratory