From: Joos K. <jo...@su...> - 2012-02-06 14:47:51
Hi Egon,

About substructure search: the basic principle is simple. It first screens
candidates based on a fingerprint, then uses UIT for subgraph matching. The
user can configure any implementation of IFingerprinter.

"Optimizations":
- You can specify the number of search threads (e.g. threads running UIT),
  so one can make use of modern multi-core CPUs.
- You can configure it to keep the molecules in memory (in the format used
  in OrChem, i.e. as Strings).
- You can limit the maximum number of hits.
- Results are available immediately, so the first search hits can already be
  displayed while the search continues in the background.
- The search can be limited to a list of IDs (ID = primary key column). The
  idea behind this is that if you search for substructure + property, you
  limit the substructure search to the structures matching the property value.

Best regards,
Joos

On 06.02.2012 14:17, Egon Willighagen wrote:
> Hi Joos,
>
> On Sun, Feb 5, 2012 at 6:34 PM, Joos Kiener<jo...@su...> wrote:
>> for my Master Thesis in computer science (Master of Advanced Studies) I have
>> created a simple framework based on the CDK for storing and searching
>> chemical structures in a relational database.
> How does it optimize substructure searching?
>
>> Note that newest
>> CDK versions won't work as IMolecule interface is used in the framework.
>> Will need to change that for next release.
> CDK 1.4.x will be the stable release for at least the next 6 months...
> Also, the 'master' branch is far from being frozen and we expect more
> API changes... just keep that in mind :)
>
> Egon
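
For readers of the archive, the screen-then-verify principle described in the reply can be sketched roughly as below. This is only an illustration and not the framework's actual code: the class `SubstructureScreen`, its `Entry` type and the `search` signature are hypothetical names, fingerprints are modelled as plain `java.util.BitSet` objects rather than the output of an IFingerprinter, and the exact subgraph matcher (UIT in the framework) is passed in as a generic predicate. The sketch is single-threaded and returns all hits at once, so the thread pool and the incremental delivery of results mentioned above are left out.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

/** Hypothetical sketch: fingerprint screen first, exact matching second. */
final class SubstructureScreen {

    /** One stored structure: primary key, fingerprint bits, structure kept as a String. */
    static final class Entry {
        final long id;
        final BitSet fingerprint;
        final String structure;

        Entry(long id, BitSet fingerprint, String structure) {
            this.id = id;
            this.fingerprint = fingerprint;
            this.structure = structure;
        }
    }

    /** A candidate passes the screen if every bit set in the query is also set in it. */
    static boolean passesScreen(BitSet query, BitSet candidate) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(candidate);   // bits required by the query but absent in the candidate
        return missing.isEmpty();
    }

    /**
     * Returns the IDs of matching entries, in storage order.
     *
     * @param matcher    stand-in for the expensive exact subgraph test (query, target)
     * @param allowedIds optional restriction to a list of primary keys; null means no restriction
     * @param maxHits    upper bound on the number of hits returned
     */
    static List<Long> search(List<Entry> entries, BitSet queryFingerprint, String query,
                             BiPredicate<String, String> matcher,
                             Set<Long> allowedIds, int maxHits) {
        List<Long> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (hits.size() >= maxHits) {
                break;                                   // hit limit reached
            }
            if (allowedIds != null && !allowedIds.contains(e.id)) {
                continue;                                // outside the ID restriction
            }
            if (!passesScreen(queryFingerprint, e.fingerprint)) {
                continue;                                // cheap rejection, matcher never runs
            }
            if (matcher.test(query, e.structure)) {
                hits.add(e.id);                          // confirmed by the exact test
            }
        }
        return hits;
    }
}
```

The ordering of the checks is the point of the design: the ID restriction and the bitwise fingerprint test are cheap, so the costly exact matcher only runs on candidates that survive both. In a toy call, a matcher such as `(q, s) -> s.contains(q)` can stand in for real subgraph matching.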