Hello Eric,
I would suggest setting the "cache_buffer_bytes" configuration option to
its maximum (1000000) to see whether it improves cache read performance.
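A minimal sketch of applying this setting for one shell session. It assumes a Modules release that has the cache feature (the thread uses 5.5) and that MODULES_CACHE_BUFFER_BYTES is the environment variable backing the "cache_buffer_bytes" option; treat both as assumptions to check against your installed version.

```shell
# Raise the cache read/write buffer to the maximum mentioned above.
# Assumption: MODULES_CACHE_BUFFER_BYTES overrides the configured default.
MODULES_CACHE_BUFFER_BYTES=1000000
export MODULES_CACHE_BUFFER_BYTES

# When the module command is defined in this shell, the same value can be
# set through the config sub-command and then read back for confirmation.
if command -v module >/dev/null 2>&1; then
    module config cache_buffer_bytes 1000000
    module config cache_buffer_bytes
fi
```

Timing a load before and after the change (for instance with the shell's "time" on a "module load" of a placeholder module such as package/x.y.z) would show whether the larger buffer helps on a multi-MB cache file.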
Currently the cache file exists to reduce the number of I/O operations to
a minimum (one file read to get everything). The drawback is that the
whole file must be read to get the information needed to access a specific
module.
We have several thousand modulefiles in our setup and this works like a
charm. With ~10k modulefiles you say it takes around 200ms to load a
module, which is not so bad I think.
I do not plan to change the cache strategy by introducing another kind of
cache file with different information and a different mechanism. But I see
several improvements that could be made:
* compressing the modulefile content when saving it into the cache file
might speed up the evaluation of large cache files
* a configuration option could be added to list the module commands that
ignore the cache file
With this second point, your "module load" would skip reading the cache, as
your setup performs better without it. But "avail" would still use the cache
(unless it is also listed in the configuration option).
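A rough way to gauge the first idea (compression) is to compare raw and compressed sizes of modulefile-like text. This sketch uses gzip on hypothetical sample content; it only shows the size reduction and says nothing about whether decompression time would offset the smaller read inside the Tcl cache reader.

```shell
# Build a sample of repetitive, modulefile-like text (hypothetical content).
sample=$(mktemp)
i=0
while [ "$i" -lt 200 ]; do
    printf '#%%Module\nprepend-path PATH /release_path/pkg/1.%s/bin\n' "$i" >> "$sample"
    i=$((i + 1))
done

# Compare the raw size with the gzip-compressed size, in bytes.
# The arithmetic expansion strips any padding that wc may print.
raw_bytes=$(( $(wc -c < "$sample") ))
gz_bytes=$(( $(gzip -c "$sample" | wc -c) ))
echo "raw: $raw_bytes bytes, gzip: $gz_bytes bytes"
rm -f "$sample"
```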
Regards,
Xavier
On Wed, Jan 15, 2025 at 19:10, Miller, Eric via Modules-interest <
mod...@li...> wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Thanks Xavier. That ticket captures the issue we see when creating the
> cachefile in a tree with many non-modulefiles.
>
> I think there is a secondary performance feature that could be useful,
> even in modulefile-only trees.
>
> In the case where the MODULEPATH contains hundreds of packages,
> encompassing many thousands of modulefiles, the cache file can still be
> many MB in size.
> In this scenario a module load package/x.y.z can be slower when using a
> cache file than without. Loading and interpreting the entire cache file
> takes longer than using the filesystem.
>
> To improve performance, I think indexes are needed.
>
> One index would map the package name to a list of discovered versions.
> This index would be used to resolve range specifiers and show avail results
> very quickly.
>
> The other index would map the package/version to the resolved path of the
> modulefile, or perhaps the byte offset of the entry in the cache file.
>
> Ideally the index file(s) would be structured-access files (e.g. binary
> search or nearest-neighbor hashing). This approach would avoid
> reading and interpreting the entire file.
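The byte-offset idea can be sketched on a toy file. Everything here is hypothetical: the one-line-per-module layout is NOT the real Modules cache format, and the file names and the lookup_entry helper are invented for illustration. The index is scanned linearly for brevity; a sorted index with binary search, or a hashed one, would replace that step.

```shell
# Toy cache file: one line per module, "name/version<TAB>payload".
# This is an illustrative layout, NOT the real Modules cache format.
workdir=$(mktemp -d)
cache="$workdir/cache.txt"
index="$workdir/cache.idx"
printf 'gcc/12.1\tsetenv CC gcc-12.1\n' > "$cache"
printf 'gcc/13.2\tsetenv CC gcc-13.2\n' >> "$cache"
printf 'python/3.11\tsetenv PY python3.11\n' >> "$cache"

# Index: "name/version<TAB>byte offset<TAB>byte length" for each entry.
# LC_ALL=C makes awk count bytes; the +1 accounts for the newline.
LC_ALL=C awk -F '\t' '
    { printf "%s\t%d\t%d\n", $1, off, length($0); off += length($0) + 1 }
' "$cache" > "$index"

# Read a single entry: consult the small index, then read only that
# byte range of the cache with dd instead of parsing the whole file.
lookup_entry() {
    key=$1
    set -- $(LC_ALL=C awk -F '\t' -v key="$key" \
        '$1 == key { print $2, $3; exit }' "$index")
    [ "$#" -eq 2 ] || return 1
    dd if="$cache" bs=1 skip="$1" count="$2" 2>/dev/null
}

lookup_entry gcc/13.2
```

The first index described above (package name to list of versions) could be derived from the same keys by grouping on the part before the slash. Reading with bs=1 is slow for large entries; it is used here only to keep the byte arithmetic obvious.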
>
> I realize these are fairly esoteric asks. But in large-scale systems even
> small performance gains can add up to noticeable improvements.
>
> Thanks again for all the constructive discussion and suggestions.
>
> Eric Miller
>
> -----Original Message-----
> From: Xavier Delaruelle <xav...@gm...>
> Sent: Wednesday, January 15, 2025 12:34 AM
> To: Environment Modules usage and discussion. <mod...@li...>
> Subject: Re: [Modules] Prevent modulefile search from looking at all
> files in directory
>
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
>
>
> Hello Eric,
>
> I have created another issue regarding the cache mechanism
> (https://github.com/envmodules/modules/issues/562).
>
> It would be interesting to see whether the ignore/only mechanism described
> on the other ticket improves cache performance.
>
> Regards,
> Xavier
>
> On Tue, Jan 14, 2025 at 16:34, Miller, Eric via Modules-interest
> <mod...@li...> wrote:
> >
> > Thanks Xavier - The github issue captured the request correctly.
> >
> > I'll check into sponsorship. Probably the response will be "stick with
> > 3.2.10 because it works".
> > Or, if I have time perhaps I can work up a patch.
> >
> > I'd also like to see the cachefile indexed for faster loading. As
> > currently implemented there are common situations where using the cache
> > actually slows things down.
> > If the cache were more performant, that might provide an incentive to
> > upgrade from 3.2.10.
> >
> > Thank you all for the discussion. At a minimum the issue has been
> > confirmed and my workarounds have been validated.
> >
> > Eric Miller
> >
> > PS. Just a quick note. This does not work as you would expect:
> > [file normalize [file readlink $ModulesCurrentModulefile]]
> > See:
> > https://wiki.tcl-lang.org/page/file+normalize#:~:text=Resolving%20symlinks%20in%20the%20last%20component%20of%20a%20path
> > Tcl is weird.
> >
> >
> > -----Original Message-----
> > From: Xavier Delaruelle <xav...@gm...>
> > Sent: Monday, January 13, 2025 11:41 PM
> > To: Environment Modules usage and discussion.
> > <mod...@li...>
> > Subject: Re: [Modules] Prevent modulefile search from looking at all
> > files in directory
> >
> >
> > Hello Eric,
> >
> > On Mon, Jan 13, 2025 at 19:48, Miller, Eric via Modules-interest
> > <mod...@li...> wrote:
> > > The symlink approach requires modifying files that use the
> > > $ModulesCurrentModulefile variable.
> >
> > The following Tcl commands will help to get back the real path: [file
> > normalize [file readlink $ModulesCurrentModulefile]]
> >
> > > For testing purposes I just replaced the variable with the full path
> > > to the original file.
> > >
> > > I tested the modulefile-only tree with a package that has 435
> > > versions, running module load package/1.2.3, which resolves to an actual
> > > version file.
> > > 3.2.10: 70ms
> > > 5.5: 570ms
> > > OS disk caching (i.e. repeated runs) knocks that down to 50ms and
> > > 150ms respectively.
> > >
> > > This is better than using the release repository for the module path,
> > > but still too slow to justify switching from 3.2.10.
> >
> > I think introducing an option to avoid looking at adjacent files would
> > be interesting for your setup.
> >
> > I have thought about Paul's suggestion on file extension: while it would
> > help with non-modulefiles, I think it would not solve the issue when
> > there is a large number of adjacent version files.
> >
> > I have created an issue to track this feature request:
> > https://github.com/envmodules/modules/issues/561
> >
> > I would be glad to get some sponsorship to work on this. You can contact
> > me directly to discuss that.
> >
> > Regards,
> > Xavier
> >
> > > The modulecache file is 7.3 MB and loads take ~220ms, regardless of
> > > OS caching.
> > > For packages with a small number of versions, using the cache file is
> > > much slower.
> > >
> > > Eric
> > >
> > > -----Original Message-----
> > > From: Paul Markfort <pau...@gm...>
> > > Sent: Monday, January 13, 2025 10:13 AM
> > > To: mod...@li...
> > > Subject: Re: [Modules] Prevent modulefile search from looking at all
> > > files in directory
> > >
> > >
> > > First, make sure you read this (if you haven't already):
> > > https://modules.readthedocs.io/en/stable/modulefile.html#locating-modulefiles
> > >
> > > An option you might try is to make a module tree that is simply
> > > populated with links to the actual module files, and point MODULEPATH to
> > > that tree. If that tree were structured exactly the same as your current
> > > tree (but only containing the modulefiles), the modulefiles would not
> > > have to be rewritten.
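A minimal sketch of building such a link tree, assuming the /release_path/package_name/version_directory/modulefile layout described elsewhere in this thread. The build_link_tree helper and the example destination path are hypothetical. The glob only touches the package and version directory levels, so the other files inside each version directory are never walked.

```shell
# Mirror package/version/modulefile from a release tree into a
# links-only tree. Usage: build_link_tree SOURCE_ROOT DEST_ROOT
build_link_tree() {
    src=$1
    dst=$2
    for mf in "$src"/*/*/modulefile; do
        # Skip the unexpanded pattern and anything that is not a file.
        [ -f "$mf" ] || continue
        version_dir=${mf%/modulefile}      # .../package/version
        rel=${version_dir#"$src"/}         # package/version
        mkdir -p "$dst/$rel"
        ln -sf "$mf" "$dst/$rel/modulefile"
    done
}

# Example call with placeholder paths:
#   build_link_tree /release_path /tmp/module_tree
#   MODULEPATH=/tmp/module_tree; export MODULEPATH
```

Modulefiles that refer to $ModulesCurrentModulefile would see the link path rather than the original location, so those still need attention.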
> > >
> > >
> > > I suspect, if modules could be told to restrict modulefiles to a
> > > particular file extension (what is the extension used for the windows
> > > version of modules?), the processing would speed up considerably (as it
> > > would not have to open every file to check if it is a module file). I
> > > should point out this is NOT a current feature - just asking if such an
> > > OPTIONAL feature would even be useful.
> > >
> > > Assuming that extension were .modfl (it should probably be
> > > different), you would not have to change the name of the modulefile, just
> > > add the extension to the name (the extension wouldn't be needed when
> > > referencing the file - as it would be assumed).
> > >
> > > So the module file would be:
> > > /release_path/package_name/version_directory/modulefile.modfl
> > >
> > > On 2025-01-13 10:47 AM, Miller, Eric via Modules-interest wrote:
> > > > Hi Xavier-
> > > >
> > > > Thank you for the quick reply.
> > > >
> > > > You are correct, we do see poor performance when running "module
> > > > avail" under 3.2.10.
> > > > However, we never use that command. Perhaps we would if the
> > > > performance was better.
> > > >
> > > > It is not a hardware issue, but rather a scaling issue. We have
> > > > thousands of software packages with millions of non-modulefile files
> > > > across the version directories.
> > > > All our packages are structured as:
> > > > /release_path/package_name/version_directory/modulefile
> > > >
> > > > For now we are specifying the full path just as you recommended in
> > > > the second email, but this approach is just a short term fix.
> > > >
> > > > Another workaround we have tried is isolating the modulefiles from
> > > > the release repository into a standalone module_path. However, this
> > > > requires us to rewrite all our existing modulefiles (~10k).
> > > > I'll continue to do experiments in this area. Perhaps we can
> > > > automate the conversion and add it to our release process.
> > > >
> > > > The cache currently has 2 issues that prevent us from using it with
> > > > our current setup:
> > > > 1. Creating the cache is slow and uses lots of memory. I killed
> > > > cachebuild after running for 2.5hrs and using 81GB of memory on modern
> > > > enterprise hardware.
> > > > 2. Even if the cache is created, using it is still somewhat
> > > > slow. Removing the module-invalid commands in the cache file helps, but
> > > > is still slower than the search behavior in 3.2.10.
> > > >
> > > > I would ask for the following enhancements to bring the search
> > > > performance back to parity with version 3 (or even improve on it):
> > > > 1. Grant the ability to tune the modulefile search so it does
> > > > not inspect all files in a version directory.
> > > > 2. If there is an exact path match to the search specifier, use
> > > > it without searching or inspecting the modulefiles.
> > > > 3. Improve the cache file search performance and memory usage
> > > > (i.e. do not load the entire cache into memory). This can be tricky.
> > > >
> > > > Regards,
> > > > Eric Miller
> > > >
> > > > -----Original Message-----
> > > > From: Xavier Delaruelle <xav...@gm...>
> > > > Sent: Sunday, January 12, 2025 11:58 PM
> > > > To: Environment Modules usage and discussion.
> > > > <mod...@li...>
> > > > Subject: Re: [Modules] Prevent modulefile search from looking at
> > > > all files in directory
> > > >
> > > >
> > > > Just a reminder of an alternative: refer to the module by its full
> > > > pathname:
> > > >
> > > > module load /path/to/ModulePath/MyModule/1.0/modulefile
> > > >
> > > > This way, no other files will be scanned/evaluated.
> > > >
> > > > This solution may help you to find the speed you expect.
> > > >
> > > > Regards,
> > > > Xavier
> > > >
> > > > On Sun, Jan 12, 2025 at 20:40, Xavier Delaruelle
> > > > <xav...@gm...> wrote:
> > > >>
> > > >> Hello Eric,
> > > >>
> > > >> The best solution I can think of is using the module cache.
> > > >>
> > > >> It is a bit slower but it should not be noticeable. If it is, I
> > > >> would suggest looking at the performance of the underlying
> > > >> storage system. If there is no issue with the storage system, I
> > > >> would be happy to get some debugging output in --timer mode.
> > > >>
> > > >> With a large number of non-modulefiles in modulepaths, load is
> > > >> slower on newer versions of Modules than on 3.2 because all
> > > >> module symbols that apply to the loading module are collected. Even
> > > >> if load is efficient with this kind of setup on version 3.2, poor
> > > >> performance should be observed on "module avail".
> > > >>
> > > >> Regards,
> > > >> Xavier
> > > >>
> > > >> On Fri, Jan 10, 2025 at 16:59, Miller, Eric via Modules-interest
> > > >> <mod...@li...> wrote:
> > > >>>
> > > >>>
> > > >>> My question: Is there a way to optimize the modulefile search
> > > >>> algorithm for cases where the directory contains many files that are
> > > >>> NOT modulefiles?
> > > >>>
> > > >>>
> > > >>>
> > > >>> Background:
> > > >>>
> > > >>>
> > > >>>
> > > >>> Given a directory structure like:
> > > >>>
> > > >>>
> > > >>>
> > > >>> ModulePath/MyModule/1.0/
> > > >>>
> > > >>> ModulePath/MyModule/1.1/
> > > >>>
> > > >>> ...
> > > >>>
> > > >>>
> > > >>>
> > > >>> And given that each version directory contains one file named
> > > >>> "modulefile" at the root of the directory and also contains a large
> > > >>> number of other subdirectories and files.
> > > >>>
> > > >>>
> > > >>>
> > > >>> The command: "module load MyModule/1.0" will be very slow.
> > > >>>
> > > >>>
> > > >>>
> > > >>> The root cause appears to be that the modulefile search algorithm
> > > >>> stats and reads every file in each of the ModulePath/MyModule/*
> > > >>> directories.
> > > >>>
> > > >>> Going back to modulecmd 3.2.10 we do not see this issue.
> > > >>>
> > > >>>
> > > >>>
> > > >>> I've tried setting MODULES_MCOOKIE_CHECK=eval, but that caused
> > > >>> errors when the loader encountered non-modulefile files.
> > > >>>
> > > >>>
> > > >>>
> > > >>> I've also tried using the cachebuild command, and while it helps,
> > > >>> it is still slower than 3.2.10.
> > > >>>
> > > >>> The cachefile also appears to be tied to that specific version of
> > > >>> modulecmd due to the header: #%Module5.5.
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> Thanks!
> > > >>>
> > > >>> Eric Miller
> > > >>>
> > > >>> _______________________________________________
> > > >>> Modules-interest mailing list
> > > >>> Mod...@li...
> > > >>> https://lists.sourceforge.net/lists/listinfo/modules-interest
> > >
> > > --
> > > --------------------------------------------------------
> > > The views and opinions expressed above are strictly those of the
author(s). The content of this message has not been reviewed nor approved
by any entity whatsoever.
> > > --------------------------------------------------------
> > > Paul FM Info: http://paulfm.com/~paulfm/
> > > --------------------------------------------------------