Hello Xavier,
From your summary below, it is not clear to me whether your proposed 'modulepath-only' and 'modulepath-ignore' commands would only modify the modulepath or whether they would also work at a per-file level (e.g. to exclude certain file types from being considered modulefiles).
The use case I was hoping to address with the new feature is the following: for the sake of modularity, we have some files in module directories that are, for example, read by modulefiles but which are not valid modulefiles themselves. Currently these are hidden files (i.e. their names start with a dot, e.g. '.some_function.tcl') so that the module command does not 'see' them.
When managing the module directory tree, these hidden files sometimes cause confusion. If we were able to define a modulefile name filter (e.g. ignore all *.tcl files), we could make such files regular files without any impact on the traversal of the modulepath. Hence the idea of using the gitignore syntax, as this would enable ignoring both (classes of) files and directories, or explicitly including them via negation.
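To illustrate the kind of name filter meant here, the following is a minimal sketch in Python (not Tcl, and not an existing Modules feature). It assumes a strongly simplified gitignore-like rule set: plain glob patterns matched against the file name only, a leading '!' negating a pattern, and the last matching pattern winning. The function name `is_ignored` and the file name `keep_me.tcl` are hypothetical.

```python
from fnmatch import fnmatchcase


def is_ignored(name, patterns):
    """Return True if `name` is ignored; the last matching pattern wins."""
    ignored = False
    for pattern in patterns:
        # A leading '!' re-includes names excluded by an earlier pattern.
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if fnmatchcase(name, pattern):
            ignored = not negate
    return ignored


# Hypothetical filter: ignore all *.tcl files except one explicitly kept.
patterns = ["*.tcl", "!keep_me.tcl"]
for name in ["some_function.tcl", "keep_me.tcl", "1.0"]:
    print(name, is_ignored(name, patterns))
```

With such a filter, 'some_function.tcl' could be a regular (non-hidden) file and still be skipped, while the real gitignore syntax additionally covers directories and path-anchored patterns, which this sketch leaves out.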
Best Regards,
Martin
-----Original Message-----
From: Xavier Delaruelle <xav...@gm...>
Sent: Friday, 17 January 2025 07:52
To: Environment Modules usage and discussion. <mod...@li...>
Subject: Re: [Modules] Prevent modulefile search from looking at all files in directory
🛑 CAUTION: This email does not originate from AMTC or Tekscend Photomask. Please check the sender and context carefully before opening links or attachments.
Many thanks, Martin, for helping to craft this new feature.
This is how I see it currently:
* multiple patterns may be specified on one command, and the command may be defined several times
* this will give a list of all patterns to ignore (or to only include)
* as soon as a "modulepath-only" command is set, "modulepath-ignore" patterns are all ignored
* patterns are prefixed by the location of the modulerc where they are defined;
  for instance, "modulepath-ignore 1*" in "modpath/foo/.modulerc" produces the "modpath/foo/1*" ignore pattern
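The rules listed above could be sketched roughly as follows. This is only an illustration in Python (Modules itself is written in Tcl) and assumes plain glob matching, as in the simpler first implementation step. The names `PatternSet`, `add_ignore`, `add_only` and `is_walked` are hypothetical and not part of Modules.

```python
from fnmatch import fnmatchcase
import posixpath


class PatternSet:
    """Hypothetical collector for modulepath-ignore / modulepath-only patterns."""

    def __init__(self):
        self.ignore = []
        self.only = []

    def _prefix(self, modulerc_path, patterns):
        # Patterns are anchored at the directory holding the .modulerc file.
        base = posixpath.dirname(modulerc_path)
        return [posixpath.join(base, p) for p in patterns]

    def add_ignore(self, modulerc_path, *patterns):
        # Several patterns per command, and several commands, accumulate.
        self.ignore.extend(self._prefix(modulerc_path, patterns))

    def add_only(self, modulerc_path, *patterns):
        self.only.extend(self._prefix(modulerc_path, patterns))

    def is_walked(self, path):
        # As soon as one "only" pattern exists, ignore patterns are disregarded.
        if self.only:
            return any(fnmatchcase(path, p) for p in self.only)
        return not any(fnmatchcase(path, p) for p in self.ignore)


rules = PatternSet()
rules.add_ignore("modpath/foo/.modulerc", "1*")
print(rules.ignore)
print(rules.is_walked("modpath/foo/1.0"))
print(rules.is_walked("modpath/foo/2.0"))
```

Note that `fnmatchcase` lets `*` match across `/`, which differs from gitignore; a real implementation would have to decide how patterns apply to nested directories.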
I quite agree with your approach of basing things on something that already exists and is well used and well designed. I also have to balance this principle against having a performant solution (it is important to use a new command in the .modulerc file, to avoid having to check for the existence of an additional file in every directory walked down) and against making a progressive implementation: something simpler (just glob patterns, for instance) implemented first, then later refined to get full gitignore-like syntax capability.
I have looked at the negation pattern. From my understanding, it is there to undo an ignore pattern. So another command will be needed to express the *include only* list.
Many thanks again, this truly helps to design this new feature.
Regards,
Xavier
On Wed, 15 Jan 2025 at 11:19, Bloecker, Martin <Mar...@am...> wrote:
>
> Xavier, Paul,
>
> I understand the reasoning behind implementing this via the .modulerc file. There are a couple of questions that need to be thought of when going down that route:
> 1. Can multiple search and/or ignore patterns be specified?
> - In a single command?
> - Via multiple subsequent commands?
> 2. Can search and ignore patterns be mixed? What takes precedence?
> 3. In case of multiple search and/or ignore patterns: Can other module commands be used between them? What would be the impact on implementation and module behavior?
>
> Apart from answering these questions, I think it is still worthwhile considering the gitignore format to specify search and ignore patterns. I'm pretty sure the git developers have thought carefully about any conceivable filtering requirement 😉. Namely, there is a syntax to negate patterns, which would allow a single .modulerc command to cover both inclusion (i.e. search) and exclusion (i.e. ignore) patterns.
>
> Regards,
>
> Martin
> -----Original Message-----
> From: Xavier Delaruelle <xav...@gm...>
> Sent: Wednesday, 15 January 2025 07:43
> To: Environment Modules usage and discussion. <mod...@li...>
> Subject: Re: [Modules] Prevent modulefile search from looking at all files in directory
>
> Many thanks, Martin and Paul. It is a very good idea to have a mechanism to define patterns to ignore when walking down a modulepath tree.
>
> I also prefer to have this implemented through new commands in .modulerc file.
>
> In addition to a command to ignore patterns, I feel that a command defining the only patterns to take into account may be useful, especially in a case like this one, where modulefiles are a few files among a large number of other kinds of files.
>
> I have added a comment to the GitHub issue (https://github.com/envmodules/modules/issues/561) to specify how things could be implemented. Feel free to comment.
>
> Regards,
> Xavier
>
> On Tue, 14 Jan 2025 at 15:50, Paul Markfort <pau...@gm...> wrote:
> >
> > I would say, just add this functionality to .modulerc (why create another file?) - it is already read first.
> > And - yes, restriction to an extension (or set of extensions) should not be the default setting (just an option that can be turned on).
> >
> >
> > On 2025-01-14 2:59 AM, Bloecker, Martin wrote:
> > > All,
> > >
> > > following the discussion below one thought crossed my mind: Exclusion of certain (module)files from reading/processing could be done in much the same way as git is doing with .gitignore i.e. via a (to be implemented) .modulesignore file. From my perspective this approach would have several benefits:
> > >
> > > 1. No change in behavior for existing installations that don't require the new functionality.
> > > 2. Existing installations/modulefile trees can easily be retrofitted by adding .modulesignore file(s). Neither changes in modulefiles nor other modifications to existing files/directories should be required.
> > > 3. Assuming a similar function set as the .gitignore file, using .modulesignore would enable
> > > - Excluding complete subdirectory trees from further processing (e.g. enable SW installations in the same tree as the respective modulefiles).
> > > - Restricting processing to certain file types (e.g. only '*.modfl' files).
> > > - Etc.
> > >
> > > I would assume that such a .modulesignore file would need to be the very first file being processed in every directory (i.e. right before the .modulerc file).
> > >
> > > Would this be a feasible option?
> > >
> > > Regards,
> > >
> > > Martin
> > >
> > > -----Original Message-----
> > > From: Xavier Delaruelle <xav...@gm...>
> > > Sent: Tuesday, 14 January 2025 07:41
> > > To: Environment Modules usage and discussion. <mod...@li...>
> > > Subject: Re: [Modules] Prevent modulefile search from looking at all files in directory
> > >
> > > Hello Eric,
> > >
> > > On Mon, 13 Jan 2025 at 19:48, Miller, Eric via Modules-interest <mod...@li...> wrote:
> > >> The symlink approach requires modifying files that use the $ModulesCurrentModulefile variable.
> > >
> > > The following Tcl commands will help to get back the real path:
> > > [file normalize [file readlink $ModulesCurrentModulefile]]
> > >
> > >> For testing purposes I just replaced the variable with the full path to the original file.
> > >>
> > >> I tested the modulefile-only tree by loading a package having 435 versions and module load package/1.2.3, which resolves to an actual version file.
> > >> 3.2.10: 70ms
> > >> 5.5: 570ms
> > >> OS disk caching (i.e. repeated runs) knocks that down to 50ms and 150ms respectively.
> > >>
> > >> This is better than using the release repository for the module path, but still too slow to justify switching from 3.2.10.
> > >
> > > I think introducing an option to avoid looking at adjacent files would be interesting for your setup.
> > >
> > > I have thought about Paul's suggestion on file extensions: while it would help with non-modulefiles, I think it will not solve the issue in the case of a large number of adjacent version files.
> > >
> > > I have created an issue to track this feature request:
> > > https://github.com/envmodules/modules/issues/561
> > >
> > > I would be glad to get some sponsoring to work on this. You can contact me directly to discuss that.
> > >
> > > Regards,
> > > Xavier
> > >
> > >> The modulecache file is 7.3 MB and loads take ~220ms, regardless of OS caching.
> > >> For packages with a small number of versions, using the cache file is much slower.
> > >>
> > >> Eric
> > >>
> > >> -----Original Message-----
> > >> From: Paul Markfort <pau...@gm...>
> > >> Sent: Monday, January 13, 2025 10:13 AM
> > >> To: mod...@li...
> > > >> Subject: Re: [Modules] Prevent modulefile search from looking at all files in directory
> > >>
> > >> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> > >>
> > >>
> > > >> First, make sure you read this (if you haven't already):
> > > >> https://modules.readthedocs.io/en/stable/modulefile.html#locating-modulefiles
> > >>
> > > >> An option you might try is to make a module tree that is simply populated with links to the actual module files, and point MODULEPATH to that tree. If that tree were structured exactly the same as your current tree (but only containing the modulefiles), the modulefiles would not have to be rewritten.
> > >>
> > >>
> > >> I suspect, if modules could be told to restrict modulefiles to a particular file extension (what is the extension used for the windows version of modules?), the processing would speed up considerably (as it would not have to open every file to check if it is a module file). I should point out this is NOT a current feature - just asking if such an OPTIONAL feature would even be useful.
> > >>
> > > >> Assuming that extension were .modfl (it should probably be different), you would not have to change the name of the modulefile, just add the extension to the name (the extension wouldn't be needed when referencing the file, as it would be assumed).
> > >>
> > >> So the module file would be:
> > >> /release_path/package_name/version_directory/modulefile.modfl
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On 2025-01-13 10:47 AM, Miller, Eric via Modules-interest wrote:
> > >>> [AMD Official Use Only - AMD Internal Distribution Only]
> > >>>
> > >>> Hi Xavier-
> > >>>
> > >>> Thank you for the quick reply.
> > >>>
> > >>> You are correct, we do see poor performance when running "module avail" under 3.2.10.
> > >>> However, we never use that command. Perhaps we would if the performance was better.
> > >>>
> > >>> It is not a hardware issue, but rather a scaling issue. We have thousands of software packages with millions of non-modulefile files across the version directories.
> > >>> All our packages are structured as:
> > >>> /release_path/package_name/version_directory/modulefile
> > >>>
> > >>> For now we are specifying the full path just as you recommended in the second email, but this approach is just a short term fix.
> > >>>
> > >>> Another workaround we have tried is isolating the modulefiles from the release repository into a standalone module_path. However, this requires us to rewrite all our existing modulefiles (~10k).
> > >>> I'll continue to do experiments in this area. Perhaps we can automate the conversion and add it to our release process.
> > >>>
> > >>> The cache currently has 2 issues that prevent us from using it with our current setup:
> > >>> 1. Creating the cache is slow and uses lots of memory. I killed cachebuild after running for 2.5hrs and using 81GB of memory on modern enterprise hardware.
> > >>> 2. Even if the cache is created, using it is still somewhat slow. Removing the module-invalid commands in the cache file helps, but is still slower than the search behavior in 3.2.10.
> > >>>
> > >>> I would ask for the following enhancements to bring the search performance back to parity with version 3 (or even improve on it):
> > >>> 1. Grant the ability to tune the modulefile search so it does not inspect all files in a version directory.
> > >>> 2. If there is an exact path match to the search specifier, use it without searching or inspecting the modulefiles.
> > >>> 3. Improve the cache file search performance and memory usage (i.e. do not load the entire cache into memory). This can be tricky.
> > >>>
> > >>> Regards,
> > >>> Eric Miller
> > >>>
> > >>> -----Original Message-----
> > >>> From: Xavier Delaruelle <xav...@gm...>
> > >>> Sent: Sunday, January 12, 2025 11:58 PM
> > > >>> To: Environment Modules usage and discussion. <mod...@li...>
> > > >>> Subject: Re: [Modules] Prevent modulefile search from looking at all files in directory
> > >>>
> > > >>> Just a reminder of an alternative, which is to refer to the module by its full path name:
> > >>>
> > >>> module load /path/to/ModulePath/MyModule/1.0/modulefile
> > >>>
> > >>> This way, no other files will be scanned/evaluated.
> > >>>
> > >>> This solution may help you to find the speed you expect.
> > >>>
> > >>> Regards,
> > >>> Xavier
> > >>>
> > > >>> On Sun, 12 Jan 2025 at 20:40, Xavier Delaruelle <xav...@gm...> wrote:
> > >>>>
> > >>>> Hello Eric,
> > >>>>
> > > >>>> The best solution I can think of is using the module cache.
> > > >>>>
> > > >>>> It is a bit slower, but the difference should not be noticeable. If it is,
> > > >>>> I would suggest looking at the performance of the underlying
> > > >>>> storage system. If there is no issue with the storage system, I
> > > >>>> would be happy to get some debugging output in --timer mode.
> > > >>>>
> > > >>>> With a large number of non-modulefiles in modulepaths, load is
> > > >>>> slower on newer versions of Modules than on 3.2 because they collect
> > > >>>> all module symbols that apply to the module being loaded. Even if load
> > > >>>> is efficient with this kind of setup on version 3.2, bad
> > > >>>> performance should be observed on "module avail".
> > >>>>
> > >>>> Regards,
> > >>>> Xavier
> > >>>>
> > > >>>> On Fri, 10 Jan 2025 at 16:59, Miller, Eric via
> > > >>>> Modules-interest <mod...@li...> wrote:
> > >>>>>
> > >>>>> My question: Is there a way to optimize the modulefile search algorithm for cases where the directory contains many files that are NOT modulefiles?
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Background:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Given a directory structure like:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> ModulePath/MyModule/1.0/
> > >>>>>
> > >>>>> ModulePath/MyModule/1.1/
> > >>>>>
> > >>>>> ...
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> And given that each version directory contains one file named "modulefile" at the root of the directory and also contains a large number of other subdirectories and files.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> The command: "module load MyModule/1.0" will be very slow.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> The root cause appears to be that the modulefile search algorithm stats and reads every file in each of the ModulePath/MyModule/* directories.
> > >>>>>
> > >>>>> Going back to modulecmd 3.2.10 we do not see this issue.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> I've tried setting MODULES_MCOOKIE_CHECK=eval, but that caused errors when the loader encountered non-modulefile files.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> I've also tried using the cachebuild command, and while it helps, it is still slower than 3.2.10.
> > >>>>>
> > >>>>> The cachefile also appears to be tied to that specific version of modulecmd due to the header: #%Module5.5.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Thanks!
> > >>>>>
> > >>>>> Eric Miller
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> Modules-interest mailing list
> > >>>>> Mod...@li...
> > >>>>> https://lists.sourceforge.net/lists/listinfo/modules-interest
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> Modules-interest mailing list
> > >>> Mod...@li...
> > >>> https://lists.sourceforge.net/lists/listinfo/modules-interest
> > >>>
> > >>> _______________________________________________
> > >>> Modules-interest mailing list
> > >>> Mod...@li...
> > >>> https://lists.sourceforge.net/lists/listinfo/modules-interest
> > >>>
> > >>
> > >> --
> > >> --------------------------------------------------------
> > >> The views and opinions expressed above are strictly those of the author(s). The content of this message has not been reviewed nor approved by any entity whatsoever.
> > >> --------------------------------------------------------
> > >> Paul FM Info: http://paulfm.com/~paulfm/
> > >> --------------------------------------------------------
> > >>
> > >>
> > >> _______________________________________________
> > >> Modules-interest mailing list
> > >> Mod...@li...
> > >> https://lists.sourceforge.net/lists/listinfo/modules-interest
> > >>
> > >>
> > >> _______________________________________________
> > >> Modules-interest mailing list
> > >> Mod...@li...
> > >> https://lists.sourceforge.net/lists/listinfo/modules-interest
> > >
> > >
> > > _______________________________________________
> > > Modules-interest mailing list
> > > Mod...@li...
> > > https://lists.sourceforge.net/lists/listinfo/modules-interest
> > >
> > > _______________________________________________
> > > Modules-interest mailing list
> > > Mod...@li...
> > > https://lists.sourceforge.net/lists/listinfo/modules-interest
> > >