Re: [Lmod-users] Question about startup module behavior
A Lua based environment module system that reads TCL modulefiles.
From: Shahzeb S. <sha...@lb...> - 2021-09-10 21:16:28
The `/etc/profile.d/zz-cray-pe.sh` script is written in a bizarre way, which I personally don't like. The startup modules come from `init_module_list`, which is defined in a separate configuration file that the script sources. I would have preferred that Cray simply provide a single modulefile that handles the startup set; instead, they tie the Lmod configuration to an external script that defines the list of modules.
siddiq90@login01> cat /etc/profile.d/zz-cray-pe.sh
#! /bin/sh
###############################################################################
#
# /etc/bash.bashrc.local
# /etc/profile.d/zz-cray-pe.sh (RHEL systems)
#
# Initializes the Cray PE environment. To customize what modules are
# loaded by default, modify /etc/cray-pe.d/cray-pe-configuration.sh.
#
# Copyright 2021 Hewlett Packard Enterprise Development LP
#
###############################################################################
# Shasta PE images exist on UAI, UAN, NCN, and Compute nodes.
# NCNs don't use the Cray Programming Environment.
if [[ -z $(echo $HOSTNAME | grep ncn) ]] ; then
    # Source configuration file
    . /etc/cray-pe.d/cray-pe-configuration.sh
    if [ ! -z "$BASH" ] ; then
        my_shell=${BASH##*/}
    elif [ ! -z "$ZSH_NAME" ] ; then
        my_shell=$ZSH_NAME
    else
        my_shell=$(/usr/bin/ps -p $$ -ocomm=)
    fi
    if [[ $module_prog = "environment modules" ]]; then
        mod_paths="/opt/cray/pe/craype-targets/default/modulefiles $mpaths"
        if [ -e /opt/cray/pe/modules/default/init/$my_shell ] ; then
            . /opt/cray/pe/modules/default/init/$my_shell
        elif [ -e /opt/cray/pe/modules/default/init/sh ] ; then
            . /opt/cray/pe/modules/default/init/sh
        fi
    else
        mod_paths="/opt/cray/pe/lmod/modulefiles/core
                   /opt/cray/pe/lmod/modulefiles/craype-targets/default
                   $mpaths
                   /opt/cray/modulefiles
                   /opt/modulefiles"
        if [ -e /usr/share/lmod/lmod/init/profile ] ; then
            . /usr/share/lmod/lmod/init/profile
        elif [ -e /usr/local/lmod/lmod/init/profile ] ; then
            . /usr/local/lmod/lmod/init/profile
        fi
    fi
    if [ -z "$PELOCAL_PRGENV" ] ; then
        # Initialize PE
        if [[ -z $MODULESHOME ]] ; then
            echo "Error: Unable to initialize $module_prog." > /dev/stderr
        else
            # Add module paths
            for p in $(echo $mod_paths) ; do
                if [ -d $p ] ; then
                    MODULEPATH=$MODULEPATH:$p
                fi
            done
            export MODULEPATH
            if [[ $module_prog = "environment modules" ]]; then
                # In systems that haven't updated their conf files to
                # use the new variables, loading the PrgEnv gets you
                # everything.
                for m in $(echo ${init_module_list:-PrgEnv-$default_prgenv} \
                           | sed "s,:, ,g") ; do
                    module load $m
                done
            else
                LMOD_SYSTEM_DEFAULT_MODULES=${LMOD_SYSTEM_DEFAULT_MODULES:-$(echo ${init_module_list:-PrgEnv-$default_prgenv} | sed "s_ *_:_g")}
                export LMOD_SYSTEM_DEFAULT_MODULES
                module --initial_load --no_redirect restore
            fi
            # Protect against multiple execution
            readonly PELOCAL_PRGENV=true
            export PELOCAL_PRGENV
        fi
    else
        if [[ $module_prog = "lmod" ]] ; then
            module refresh
        fi
    fi
fi
siddiq90@login01> cat /etc/cray-pe.d/cray-pe-configuration.sh
#! /bin/bash
################################################################################
#
# /etc/cray-pe.d/cray-pe-configuration.sh
#
# Defines site preferences for:
# Module command, i.e. Environment Modules vs Lmod
# Default PrgEnv
# Additional, non-Cray module paths to use,
# modules to load on initialization,
# modules to be part of the PrgEnv module set.
#
# Sourced by:
# /etc/profile.d/cray-pe.sh
# /etc/cray-pe.d/gen-prgenv.sh
#
#
# Copyright 2020-2021 Hewlett Packard Enterprise Development LP
#
################################################################################
# Define the module command to use:
# environment modules (TCL) or lmod
module_prog="lmod"
# Define the default PrgEnv to use
default_prgenv="nvidia"
# Define any additional module paths to use
mpaths=""
# Define the list of modules to be loaded on login,
# e.g. workload managers, site modules.
# This list can be space or colon separated.
init_module_list="
craype-x86-rome
craype-network-ofi
perftools-base
xpmem
PrgEnv-$default_prgenv
cray-pmi
cray-pmi-lib
xalt/2.10.2
darshan/3.2.1
"
# Define the list of modules in the PrgEnv collection
# excluding cpe-$env, craype, and compiler as those
# are added by the PrgEnv module itself.
# This list can be space or colon separated.
prgenv_module_list="cray-dsmml cray-mpich cray-libsci"
# Define set_default scripts to run.
# This enables a product default version
# outside of the default release.
# E.G. If 21.02 is the default release, but
# you want the craype version from
# the 20.12 release, you would add
# /opt/cray/pe/admin-pe/set_default_files/set_default_craype_2.7.4
one_off_set_defaults=""
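The Lmod branch of zz-cray-pe.sh turns the multi-line `init_module_list` above into the colon-separated `LMOD_SYSTEM_DEFAULT_MODULES` value with `echo` and `sed`. A minimal sketch of the same normalization in plain POSIX sh, using a hypothetical helper name `to_colon_list` and treating spaces, newlines, and colons all as separators (the config comment says the list may be space or colon separated):

```shell
#!/bin/sh
# Hypothetical helper (not part of the Cray PE): normalize a space-, newline-
# or colon-separated module list into a colon-separated string.
# The body runs in a subshell so `set -f` does not leak to the caller.
to_colon_list() (
    set -f                      # never glob-expand module names
    out=""
    # Turn colons into spaces, then let default IFS word splitting
    # (space, tab, newline) separate the entries and drop empty ones.
    for m in $(printf '%s\n' "$1" | tr ':' ' '); do
        out="${out:+$out:}$m"   # join with ':' without a leading colon
    done
    printf '%s\n' "$out"
)
```

For example, `LMOD_SYSTEM_DEFAULT_MODULES=${LMOD_SYSTEM_DEFAULT_MODULES:-$(to_colon_list "${init_module_list:-PrgEnv-$default_prgenv}")}` would mirror the fallback logic in the Cray script.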
The Lmod configuration that we provide contains the tweaks we need to make Lmod work for us, including SitePackages, hooks, etc. Our configuration file is shown below:
siddiq90@login01> cat /etc/profile.d/zzz-lmod.sh
#!/bin/bash
# root must never source anything in a network filesystem, these settings are for users
if [ "$UID" != 0 ]; then
    NERSC_MODULES_REV=$(cat /etc/nersc_modules_rev)
    if [ -e ~/.nersc_modules_rev ]; then
        NERSC_MODULES_REV=$(cat ~/.nersc_modules_rev)
    fi
    if [ -e /etc/nomodules ]; then
        NERSC_MODULES_REV=""
    elif [ -e ~/.nomodules ]; then
        NERSC_MODULES_REV=""
    elif [ "${USER}" = "root" ]; then
        NERSC_MODULES_REV=""
    fi
    # See Lmod Configuration: https://lmod.readthedocs.io/en/latest/090_configuring_lmod.html#environment-variables-only
    # Used to set LMOD System Name for shared filesystem see
    # https://lmod.readthedocs.io/en/latest/120_shared_home_directories.html#shared-home-file-system
    export LMOD_SYSTEM_NAME="muller"
    # We want a separate config path for muller and perlmutter
    LMOD_CONFIG_DIR=/global/common/software/nersc/$LMOD_SYSTEM_NAME/$NERSC_MODULES_REV/lmod-configuration
    # to debug changes uncomment line below assuming you are sourcing z01_lmod.sh
    # LMOD_CONFIG_DIR=$PWD
    # sets site name to be used in Lmod family such as NERSC_FAMILY_COMPILER=nvidia.
    # The Site Name gets injected in form <Site>_FAMILY_*
    export LMOD_SITE_NAME="NERSC"
    # used for Module Tracking feature see https://lmod.readthedocs.io/en/latest/300_tracking_module_usage.html#tracking-usage
    export LMOD_SYSHOST="muller"
    # need to figure out the final location of this path
    export LMOD_PACKAGE_PATH=$LMOD_CONFIG_DIR/sitepackages
    # need to figure out the final location of this path
    export LMOD_RC=$LMOD_PACKAGE_PATH/lmodrc.lua
    # case independent sorting
    export LMOD_CASE_INDEPENDENT_SORTING=yes
    # LMOD_CACHED_LOADS improves performance by reading from the spider cache instead
    # of traversing MODULEPATH. This option can be used if LMOD_IGNORE_CACHE is not
    # set. See https://lmod.readthedocs.io/en/latest/320_improving_perf.html
    ##### SPIDER CACHE LOGIC #####
    # DISABLING the spider cache because the cache needs to consider node-local,
    # PE, and /global/common module contributions and must not assume those are
    # correctly integrated into a single cache (given the complexity of a rolling
    # update).
    export LMOD_CACHED_LOADS=no
    export LMOD_IGNORE_CACHE=1
    # path where lmod writes system spider cache (note LMOD_SPIDER_CACHE_DIR is not just a variable we set)
    # export LMOD_SPIDER_CACHE_DIR=/global/common/software/nersc/$NERSC_MODULES_REV/lmod_spider_cache_dir
    ##### SPIDER CACHE LOGIC #####
    # customize 'module avail' output to prefer the grouped style. One can run
    # 'module -s system avail' or 'module -s grouped avail' to change the style
    # of output. The first style listed is the default, so with this change the
    # default is 'grouped' instead of 'system'.
    export LMOD_AVAIL_STYLE="grouped:system"
    # uncomment line below in case we need to switch default
    #export LMOD_AVAIL_STYLE="system:grouped"
    # file used for setting module defaults, hide versions, alias see
    # https://lmod.readthedocs.io/en/latest/093_modulerc.html
    export LMOD_MODULERCFILE=$LMOD_PACKAGE_PATH/modulerc.lua
    # Lmod Default modules (lmod, settarg)
    module use $LMOD_PKG/modulefiles/Core
    # Setup NERSC module locations (quoted so an empty revision skips this block)
    if [ -n "$NERSC_MODULES_REV" ]; then
        if [ ! -d /global/common/software/nersc/${NERSC_MODULES_REV} ]; then
            echo "NERSC modules revision $NERSC_MODULES_REV does not exist"
        fi
        #
        # Add general module paths
        #
        if [ -d /global/common/software/nersc/${NERSC_MODULES_REV}/modulefiles ]; then
            module use --append /global/common/software/nersc/${NERSC_MODULES_REV}/modulefiles
        fi
        if [ -d /global/common/software/nersc/${NERSC_MODULES_REV}/extra_modulefiles ]; then
            module use --append /global/common/software/nersc/${NERSC_MODULES_REV}/extra_modulefiles
        fi
    fi
fi
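One detail in the revision check before the NERSC paths are added: the variable must be quoted. With an empty value, `[ -n $NERSC_MODULES_REV ]` collapses to `[ -n ]`, a one-argument test that only asks whether the string `-n` is non-empty, so it is always true. A small self-contained demonstration:

```shell
#!/bin/sh
# Demonstrates why `[ -n $VAR ]` must be written `[ -n "$VAR" ]`.
NERSC_MODULES_REV=""

# Unquoted: expands to `[ -n ]`, which tests the literal string "-n" -> true.
if [ -n $NERSC_MODULES_REV ]; then unquoted=true; else unquoted=false; fi

# Quoted: expands to `[ -n "" ]`, which is false as intended.
if [ -n "$NERSC_MODULES_REV" ]; then quoted=true; else quoted=false; fi

echo "unquoted=$unquoted quoted=$quoted"   # prints: unquoted=true quoted=false
```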
Removing `zzz-lmod.sh` won't solve the problem, because that script is responsible for adding our /global/common/… locations, where NERSC provides its modules, to MODULEPATH.
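Robert's suggestion in the quoted reply is to get the NERSC directories onto $MODULEPATH before the first `module` command in zz-cray-pe.sh runs, using plain exports instead of `module use`. A minimal, untested sketch of that idea, assuming a hypothetical early file such as `/etc/profile.d/z00-nersc-modulepath.sh` (any name that sorts before `zz-cray-pe.sh`) and reusing the revision and opt-out logic from zzz-lmod.sh; `append_modulepath` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical early profile.d script: put NERSC module directories on
# MODULEPATH with plain shell before zz-cray-pe.sh runs any `module` command.

# Print the path in $1 with each existing directory from the remaining
# arguments appended, skipping directories that are already present.
append_modulepath() {
    _mp=$1
    shift
    for _d in "$@"; do
        [ -d "$_d" ] || continue                # only add existing directories
        case ":$_mp:" in
            *":$_d:"*) ;;                       # already on the path
            *) _mp="${_mp:+$_mp:}$_d" ;;
        esac
    done
    printf '%s\n' "$_mp"
}

# Same opt-outs as zzz-lmod.sh: skip root and the nomodules marker files.
if [ "$(id -u)" != 0 ] && [ -r /etc/nersc_modules_rev ] \
   && [ ! -e /etc/nomodules ] && [ ! -e "$HOME/.nomodules" ]; then
    rev=$(cat /etc/nersc_modules_rev)
    if [ -r "$HOME/.nersc_modules_rev" ]; then
        rev=$(cat "$HOME/.nersc_modules_rev")
    fi
    if [ -n "$rev" ]; then
        base=/global/common/software/nersc/$rev
        MODULEPATH=$(append_modulepath "$MODULEPATH" \
            "$base/modulefiles" "$base/extra_modulefiles")
        export MODULEPATH
    fi
fi
```

Because zz-cray-pe.sh appends its own directories with `MODULEPATH=$MODULEPATH:$p`, the NERSC directories would end up ahead of the Cray ones (assuming Lmod's own init script keeps existing entries), which is the reverse of the `module use --append` order in zzz-lmod.sh; that matters if any module names collide. A csh counterpart would also be needed alongside zz-cray-pe.csh.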
> On Sep 10, 2021, at 4:11 PM, Robert McLay <mc...@ta...> wrote:
>
> It is useful to include the version of Lmod you are using. But I know that you are using Lmod 8+ because you got the message:
> Resetting modules to system default. Resetting $MODULEPATH back to system default. All extra directories will be removed from $MODULEPATH.
>
> The issue is that you need to define $MODULEPATH to have all the paths you want before the first module command, which in your case is in "zz-cray-pe.*".
> You have not shown how Cray is setting $MODULEPATH. You will probably want to get in front of that by using export MODULEPATH=... commands instead of module use.
>
> The other issue you are seeing might be related to Cray sourcing the Lmod startup files and you doing it again with zzz-lmod.*
>
> What happens if you remove zzz-lmod.* files and define $MODULEPATH with NERSC locations before zz-cray-pe.*?
>
> In other words, what is the purpose of the zzz-lmod.* files ?
>
> Best,
> Robert.
>
>
>
> On Fri, Sep 10, 2021 at 2:37 PM Shahzeb Siddiqui <sha...@lb...> wrote:
> Unrelated to this: when we log in to the system, our Lmod configuration does not show the color-coding property. My guess is that this is due to the order of the startup scripts. However, when I purge and restore, I do see our configuration.
> <Screen Shot 2021-09-10 at 3.30.14 PM.png>
>
>
> <Screen Shot 2021-09-10 at 3.31.37 PM.png>
>
>
> The Cray configuration is marked in green and ours is in yellow.
>
> siddiq90@login01> ls -l /etc/profile.d/
> total 149
> -rw-r--r-- 1 root root 1220 Dec 3 2020 alljava.csh
> -rw-r--r-- 1 root root 1679 Dec 3 2020 alljava.sh
> -rw-r--r-- 1 root root 662 Oct 29 2018 bash_completion.sh
> -rw-r--r-- 1 root root 30230 Apr 12 20:14 bindkey.tcsh
> -rw-r--r-- 1 root root 39814 Apr 12 20:14 complete.tcsh
> -rw-r--r-- 1 root root 603 Apr 9 2018 csh.ssh
> -rw-r--r-- 1 root root 1107 May 25 2018 gawk.csh
> -rw-r--r-- 1 root root 757 May 25 2018 gawk.sh
> -rw-r--r-- 1 root root 186 Mar 5 2021 git.csh
> -rw-r--r-- 1 root root 378 Nov 9 2020 krb5.csh
> -rw-r--r-- 1 root root 366 Nov 9 2020 krb5.sh
> -rw-r--r-- 1 root root 3053 Apr 9 2018 lang.csh
> -rw-r--r-- 1 root root 2444 Apr 9 2018 lang.sh
> -rwxr-xr-x 1 root root 772 Nov 4 2020 mpi-selector.csh
> -rwxr-xr-x 1 root root 743 Nov 4 2020 mpi-selector.sh
> -rw-r--r-- 1 root root 3963 Apr 9 2018 profile.csh
> -rw-r--r-- 1 root root 3487 Apr 9 2018 profile.sh
> -rw-r--r-- 1 root root 520 Sep 10 08:54 ps1.sh
> -rw-r--r-- 1 root root 690 Dec 3 2020 sh.ssh
> -rw-r--r-- 1 root root 983 Apr 9 2018 xdg-environment.csh
> -rw-r--r-- 1 root root 1425 Apr 9 2018 xdg-environment.sh
> -rw-r--r-- 1 root root 2839 Sep 10 10:00 zz-cray-pe.csh
> -rw-r--r-- 1 root root 3124 Sep 10 10:00 zz-cray-pe.sh
> -rw-r--r-- 1 root root 14013 Mar 10 2021 zzz-glib2.csh
> -rw-r--r-- 1 root root 11799 Mar 10 2021 zzz-glib2.sh
> -rw-r--r-- 1 root root 381 May 25 2018 zzz-groff.csh
> -rw-r--r-- 1 root root 256 May 25 2018 zzz-groff.sh
> -rw-r--r-- 1 root root 3696 Sep 10 10:02 zzz-lmod.csh
> -rw-r--r-- 1 root root 3666 Sep 10 10:01 zzz-lmod.sh
> -rw-r--r-- 1 root root 1481 Sep 10 10:01 zzz-nerscenv.csh
> -rw-r--r-- 1 root root 1639 Sep 10 12:27 zzz-nerscenv.sh
>
>
> The zz-cray-pe.sh script is provided by the CPE, so it may change over time. The problem is that they put the logic for $LMOD_SYSTEM_DEFAULT_MODULES in their script, so it only works with Cray modules and not with ours. We couldn't move our configuration before their script, because their script also sets up Lmod, with this if condition based on the module system:
>
> if [[ $module_prog = "environment modules" ]]; then
>     mod_paths="/opt/cray/pe/craype-targets/default/modulefiles $mpaths"
>     if [ -e /opt/cray/pe/modules/default/init/$my_shell ] ; then
>         . /opt/cray/pe/modules/default/init/$my_shell
>     elif [ -e /opt/cray/pe/modules/default/init/sh ] ; then
>         . /opt/cray/pe/modules/default/init/sh
>     fi
> else
>     mod_paths="/opt/cray/pe/lmod/modulefiles/core
>                /opt/cray/pe/lmod/modulefiles/craype-targets/default
>                $mpaths
>                /opt/cray/modulefiles
>                /opt/modulefiles"
>     if [ -e /usr/share/lmod/lmod/init/profile ] ; then
>         . /usr/share/lmod/lmod/init/profile
>     elif [ -e /usr/local/lmod/lmod/init/profile ] ; then
>         . /usr/local/lmod/lmod/init/profile
>     fi
> fi
>
>
> Let me know if there is a workaround for this issue. My initial thought was to hack up their script and move the "module --initial_load" to a later point.
>
>
>> On Sep 10, 2021, at 3:22 PM, Shahzeb Siddiqui <Sha...@lb...> wrote:
>>
>> This is a question about how startup configuration works for Perlmutter.
>>
>> Cray has set up the following lines in /etc/profile.d/zz-cray-pe.sh to load modules at startup:
>>
>>
>>
>> LMOD_SYSTEM_DEFAULT_MODULES=${LMOD_SYSTEM_DEFAULT_MODULES:-$(echo ${init_module_list:-PrgEnv-$default_prgenv} | sed "s_ *_:_g")}
>> export LMOD_SYSTEM_DEFAULT_MODULES
>> module --initial_load --no_redirect restore
>>
>>
>> Our startup modules look as follows
>>
>> siddiq90@login01> echo $LMOD_SYSTEM_DEFAULT_MODULES
>> craype-x86-rome:craype-network-ofi:perftools-base:xpmem:PrgEnv-nvidia:cray-pmi:cray-pmi-lib:xalt/2.10.2:darshan/3.2.1
>>
>>
>> A few of the startup modules, darshan and xalt, come from our NERSC module tree, and they are not loaded after restore.
>>
>> siddiq90@login01> module purge
>> siddiq90@login01> module restore
>> Resetting modules to system default. Reseting $MODULEPATH back to system default. All extra directories will be removed from $MODULEPATH.
>> siddiq90@login01> module -t list
>> craype-x86-rome
>> libfabric/1.11.0.4.79
>> craype-network-ofi
>> perftools-base/21.09.0
>> xpmem/2.2.40-7.0.1.0_2.7__g1d7a24d.shasta
>> nvidia/21.7
>> craype/2.7.10
>> cray-dsmml/0.2.1
>> cray-mpich/8.1.9
>> cray-libsci/21.08.1.2
>> PrgEnv-nvidia/8.1.0
>> cray-pmi/6.0.13
>> cray-pmi-lib/6.0.13
>>
>> Our startup Lmod configuration is /etc/profile.d/zzz-lmod.sh, which is sourced after /etc/profile.d/zz-cray-pe.sh and adds the NERSC software stack to MODULEPATH. The problem is that the "module --initial_load" command doesn't find modules in our MODULEPATH, because our paths are not there yet when it runs.
>>
>>
>> The only option I see is to delay running "module --initial_load --no_redirect restore" until a later script in the profile.d order. Is that correct?
>>
>>
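To confirm which entries of $LMOD_SYSTEM_DEFAULT_MODULES were not loaded (xalt/2.10.2 and darshan/3.2.1 in the `module -t list` output quoted above), a small diagnostic can compare the two lists. A minimal sketch with a hypothetical function name `missing_defaults`; it assumes an unversioned entry such as `PrgEnv-nvidia` should match any loaded version, while a versioned entry such as `xalt/2.10.2` must match exactly:

```shell
#!/bin/sh
# Hypothetical diagnostic: print entries of a colon-separated default list ($1)
# that do not appear in a newline-separated loaded list ($2), e.g. the output
# of `module -t list`. Runs in a subshell so IFS/set -f changes stay local.
missing_defaults() (
    set -f                                  # module names are not globs
    nl='
'
    IFS=:
    for want in $1; do                      # split the defaults on ':'
        [ -n "$want" ] || continue
        found=no
        IFS=$nl
        for have in $2; do                  # split the loaded list on newlines
            case $have in
                # exact match, or any version of an unversioned entry
                "$want"|"$want"/*) found=yes; break ;;
            esac
        done
        IFS=:
        [ "$found" = yes ] || printf '%s\n' "$want"
    done
)
```

On a login node this could be run as `missing_defaults "$LMOD_SYSTEM_DEFAULT_MODULES" "$(module -t list 2>&1)"`; Lmod normally writes the list to stderr, hence the `2>&1`.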
>
> _______________________________________________
> Lmod-users mailing list
> Lmo...@li...
> https://lists.sourceforge.net/lists/listinfo/lmod-users
>
>
> --
> Robert McLay, Ph.D.
> Manager of Software Tools, HPC
> mc...@ta...
>