[Rdkit-devel] lapack dependency finally gone
From: Greg L. <gre...@gm...> - 2009-10-30 19:52:25
Dear all,

This morning I finally figured out how to get rid of the lapack dependencies in the RDKit in a reasonable (and backwards-compatible) manner.

Lapack was being used to generate eigenvalues of the distance matrix for molecules and fragments of molecules. These eigenvalues were used, along with the Balaban J index, as invariants to distinguish molecular subgraphs from each other in the Subgraphs and FragCatalog code. I've replaced the invariant calculation completely and now use a hashing approach based on the Morgan algorithm, similar to that used in the MorganFingerprint code. The new, simpler code is in $RDBASE/Code/GraphMol/Subgraphs/SubgraphUtils.cpp. There's not much in the way of documentation in the new code, and it could definitely stand to be cleaned up, but the big step is taken.

A particularly pleasant side effect of all this is that the findUniqueSubgraphs family of functions is now substantially faster (about a factor of three). The fragment catalog stuff should also be a lot faster, but I haven't done the benchmarking there yet.

The last remaining lapack dependency is in the regression tests for the power eigenvalue solver. I'll get rid of this in the next few days, clean up the build system, and remove all traces of lapack from the code; that's easy stuff.

This one has been nagging for a while... it's nice to have it done.

-greg
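
For readers who have not seen a Morgan-style invariant before, here is a rough illustration of the general idea in plain Python. It is only a sketch under my own assumptions, not the code in SubgraphUtils.cpp: the function names (`_hash64`, `subgraph_invariant`), the use of atomic number plus in-subgraph degree as the starting label, and the choice of blake2b as the hash are all hypothetical. Each atom starts with a local label, the labels are repeatedly re-hashed together with the sorted labels of their neighbours, and the sorted multiset of final labels is hashed into a single number for the subgraph.

```python
import hashlib


def _hash64(*parts):
    """Deterministic 64-bit hash of a tuple of ints (hypothetical helper)."""
    digest = hashlib.blake2b(repr(parts).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def subgraph_invariant(atom_labels, bonds, n_rounds=None):
    """Morgan-style invariant for a subgraph (illustrative sketch only).

    atom_labels: dict mapping atom id -> integer label (assumed: atomic number)
    bonds:       list of (atom_a, atom_b, bond_order) tuples in the subgraph
    n_rounds:    refinement rounds; defaults to the number of atoms
    """
    # Build an adjacency list restricted to the subgraph.
    neighbours = {a: [] for a in atom_labels}
    for a, b, order in bonds:
        neighbours[a].append((b, order))
        neighbours[b].append((a, order))

    # Starting invariant: the atom label and its degree within the subgraph.
    inv = {a: _hash64(atom_labels[a], len(neighbours[a])) for a in atom_labels}

    rounds = len(atom_labels) if n_rounds is None else n_rounds
    for _ in range(rounds):
        # Fold in the neighbours' current invariants; sorting makes the
        # result independent of the order in which bonds were listed.
        inv = {
            a: _hash64(
                inv[a],
                tuple(sorted((order, inv[b]) for b, order in neighbours[a])),
            )
            for a in atom_labels
        }

    # Sorting the per-atom values makes the result independent of atom ids.
    return _hash64(tuple(sorted(inv.values())))


# C-C-O written with two different atom numberings, and C-O-C for contrast.
cco = subgraph_invariant({0: 6, 1: 6, 2: 8}, [(0, 1, 1), (1, 2, 1)])
occ = subgraph_invariant({5: 8, 7: 6, 9: 6}, [(7, 5, 1), (9, 7, 1)])
coc = subgraph_invariant({0: 6, 1: 8, 2: 6}, [(0, 1, 1), (1, 2, 1)])
```

In this sketch `cco` and `occ` come out equal because nothing in the calculation depends on atom numbering or bond ordering, while `coc` differs because the oxygen has a different degree within the subgraph. Equal invariants only suggest that two subgraphs match; like any hash, collisions are possible in principle.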