From: Egon W. <ewi...@un...> - 2007-02-08 19:08:04
Hi all,

pending the results we might see from Nina on the performance boost from the recent AllRingsFinder patch (using the SpanningTree to isolate ring systems), recalculation of the fingerprints for the 3D model builder gives some interesting statistics too.

The set contains 10759 ring templates, for each of which a fingerprint is calculated, which takes about 82 minutes in total. This is how it breaks down (TCT = total computation time):

Calculating the fingerprint for:
- 60% of the molecules in <10 ms (1.3% TCT)
- 80% of the molecules in <20 ms (2.2% TCT)
- 88% of the molecules in <30 ms (2.7% TCT)
- 92% of the molecules in <40 ms (3.0% TCT)
- 95% of the molecules in <70 ms (3.4% TCT)
- 99% of the molecules in <300 ms (5% TCT)
- 23 molecules take more than 10 seconds (92% TCT)
- 9 molecules take more than 1 minute (84% TCT)
  - 8 molecules take between 1 and 10 minutes
  - 1 molecule takes 35 minutes (43% TCT)

I will report on those slow structures later, I hope...

This is quite comparable with the results generated by Nina in her CDK News article in June 2005. This was done on an "Intel(R) Pentium(R) 4 CPU 2.80GHz"... Same GHz as Nina, Pentium too.

So, hereby the First Law of Chemoinformatics: "99% of chemoinformatics takes 5% of the effort."

(If you prefer it the other way around) The Second Law of Chemoinformatics: "95% of the effort is spent on 1% of chemoinformatics."

I'm extrapolating these computation time results to programming efforts, if you don't mind :)

You might note that the maximum time recorded by Nina, in the article, was about 18 seconds for a compound with 24 cyclic bonds, while it is 35 minutes here... I am not sure about the number of cyclic bonds in that structure, but one compound with 43 cyclic bonds took 36 seconds, indicating that the 35-minute molecule might be considerably more complex in number of possible rings than the slowest molecule in the article.

Egon

--
CUBIC blog: http://chem-bla-ics.blogspot.com/
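
For readers who want to produce a similar breakdown from their own timings, the following is a minimal sketch of the arithmetic behind the "% of molecules" and "% TCT" columns above. It is not the code used for the measurements in this message: the function name `time_breakdown` and the sample timings are hypothetical and purely illustrative.

```python
def time_breakdown(timings_ms, thresholds_ms):
    """For each threshold, return a tuple of
    (threshold, % of molecules faster than it, % of total time they use)."""
    total = sum(timings_ms)  # total computation time (TCT)
    count = len(timings_ms)
    rows = []
    for limit in thresholds_ms:
        fast = [t for t in timings_ms if t < limit]
        pct_molecules = 100.0 * len(fast) / count
        pct_tct = 100.0 * sum(fast) / total
        rows.append((limit, pct_molecules, pct_tct))
    return rows


# Hypothetical per-molecule timings in milliseconds (NOT the measured data).
sample = [5, 5, 5, 5, 5, 5, 15, 15, 25, 1000]

for limit, pct_molecules, pct_tct in time_breakdown(sample, [10, 20, 30]):
    print(f"{pct_molecules:.0f}% of the molecules in <{limit} ms "
          f"({pct_tct:.1f}% TCT)")
```

Even in this toy sample a single slow entry dominates the total, which is the same pattern as the one 35-minute molecule in the list above.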