From: Craig A. J. <cj...@em...> - 2009-03-19 14:52:32

ern...@ba... wrote:
> Can somebody point me to information what additional info could be
> precomputed and stored in order to help the SMARTS matcher?
>
> Frankly I don't really understand the code, but i see that it makes calls
> like IsAromatic(), CountRingBonds(), IsInRing(), KBOSum() etc.
>
> None of those are explicitly calculated e.g. in MDLFormat. Are these
> properties calculated lazy on first request? And which of them could be
> calculated and stored upfront?

You may make small gains by storing more information about the molecule, but the real problem with the SMARTS matcher is that it's inherently slow, because of its design.

There are two tricks to speeding up a SMARTS match: Determine the symmetry of the target molecule, and determine the symmetry of the SMARTS pattern. This divides atoms up into symmetrically-equivalent classes, which allows the matcher to avoid walking down paths that are symmetrically equivalent, and only pursue possible matches that haven't been tried before.

The SMARTS matcher doesn't do this very well. In fact, as far as I can tell, it doesn't take symmetry into account at all! It's hard to believe, so maybe I overlooked something. But the SMARTS matcher seems to just do a brute-force atom-by-atom attempt, without any regard to the symmetry classes of the atoms, and it definitely doesn't try to take the symmetry classes of the SMARTS into account.

So, SMARTS matches are slow because it's the most brute-force algorithm possible. The parsmarts.cpp class does a good job of implementing the language (the atom expressions and recursive-SMARTS features can be VERY tricky to get right), but it can never be fast until it at least takes the atom symmetry classes into account.

Craig
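
To illustrate the symmetry argument above, here is a minimal Python sketch of a toy backtracking matcher, not the parsmarts.cpp code. Everything in it is an assumption made for illustration: the names `make_graph` and `has_match`, the representation of atoms as bare element symbols, and the `sym_class` list, which is assumed to come from a separate symmetry-perception step and to describe true symmetry-equivalent atoms of the target. It only applies the pruning to the first pattern atom, and only answers the yes/no question "does any match exist".

```python
def make_graph(elements, bonds):
    """Build a toy molecule: (list of element symbols, adjacency dict)."""
    adj = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return list(elements), adj


def has_match(pattern, target, sym_class=None):
    """Backtracking substructure search on toy graphs.

    Returns (found, tries), where tries counts attempted atom pairings.
    If sym_class is given (one class id per target atom, assumed to be
    true symmetry classes), a target atom is skipped as the starting
    point when an equivalent atom has already failed as a starting point.
    """
    p_elem, p_adj = pattern
    t_elem, t_adj = target
    mapping = {}   # pattern atom index -> target atom index
    used = set()   # target atoms already taken by the mapping
    tries = 0

    def extend(i):
        nonlocal tries
        if i == len(p_elem):
            return True
        failed_classes = set()
        for t in range(len(t_elem)):
            if t in used:
                continue
            prune = i == 0 and sym_class is not None
            if prune and sym_class[t] in failed_classes:
                continue  # an equivalent start already failed
            tries += 1
            # Elements must agree, and every bond to an already-mapped
            # pattern atom must exist between the corresponding target atoms.
            ok = p_elem[i] == t_elem[t] and all(
                mapping[n] in t_adj[t] for n in p_adj[i] if n in mapping
            )
            if ok:
                mapping[i] = t
                used.add(t)
                if extend(i + 1):
                    return True
                del mapping[i]
                used.discard(t)
            if prune:
                failed_classes.add(sym_class[t])
        return False

    found = extend(0)
    return found, tries


# Benzene-like ring of six carbons; all six atoms are equivalent.
ring = make_graph(["C"] * 6, [(i, (i + 1) % 6) for i in range(6)])
# Pattern C-C-C-O, which cannot match the ring.
chain = make_graph(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])

print(has_match(chain, ring))                     # brute force
print(has_match(chain, ring, sym_class=[0] * 6))  # with symmetry pruning
```

On a failing search like this one, the brute-force run repeats the same hopeless walk from every ring atom, while the pruned run gives up after the first. The skip is only safe under the stated assumption: if the classes come from an approximate refinement that can lump together atoms that are not truly equivalent, a match could be missed. Pruning below the first atom, or enumerating all matches, would need more care than this sketch takes.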