[octet-devel] Demystify 'in silico' cheminformatics and feature calculation identifier hashing
From: Joerg K. W. <we...@in...> - 2004-12-29 11:17:28
Hi all,

Cross-posting: please reply ONLY to the following lists if your reply does not concern your own library or software package: joe...@li... and qsa...@li...

As you all know, 'in silico' chemistry is pretty complex, and I have now implemented a trace-back routine to identify how complex. The routine was implemented for JOELib2 (not JOELib1, so use: cvs co joelib2). The complexity is more or less analogous to that of OpenBabel, since both libraries use the same SMARTS matching and text definition files.

As already discussed several times, we had serious problems tracing back algorithms (chemical expert systems) and feature calculation algorithms, because the complex dependency trees are hidden from users. The only way to get a feeling for those dependencies was to read the source code. This is even more serious when using algorithms that depend on the underlying expert systems, such as file conversion (force fields), QSAR descriptor (feature) calculation, SMARTS substructure search, and so on.

Until now we were unable to formalize and assign a unique version number to the applied algorithms. This may be critical for QSAR, because a single change in the aromaticity typer (source code or text definition file) propagates to all dependent algorithms. But which algorithms depend on aromaticity, and does a change there affect the algorithm of interest? Yes, definitely yes!

Now, for JOELib2, we can calculate a hashed dependency tree identifier for each expert system and feature calculation algorithm. This version identifier is calculated automatically from the source code CVS tags, which are unique! I'm pretty proud to present this hack, because even with Java reflection this was tricky.

Here is an admonitory remark for the new year 2005 to all users:

'Never mess around with chemical expert systems, and never believe anyone telling you that a fingerprint is a simple 1D descriptor for screening.'

Why not?

1. '1D': As you can see, the calculation requires the full chemical expert system, and this is at least a 2D dependency. 1D is only valid for primitive elemental counts without any neighborhood.
2. 'simple': Even if you apply a 'simple' query (e.g. SMARTS) matching task, the underlying complexity is far from simple (it requires 47 algorithms and feature calculation methods for 'SSKey3DS').

Here are two small examples. The full dependency tree can be calculated with JOELib2 using 'sh joelibKernel.sh' (requires Java 1.5) and is available online:
http://www-ra.informatik.uni-tuebingen.de/software/joelib/KernelLog.txt

Now, the examples:

1. Chemical expert system identifier:

jk:k-1566956551:softDependencies:joelib2.data.AtomTyper (992650183)
joelib2.data.AtomTyper
http://joelib.sf.net
joelib2/data/plain/atomtype.txt
1.1.1.1
2004-12-06_15-33-18

2. Dependencies for a 'simple' pharmacophore fingerprint:

dependency class is: joelib2.feature.types.SSKey3DS
dependency version hash code is: 1113779298 (including chemistry kernel hash: -1566956551)
dependency algorithm complexity is at least: 47 (+ basic user input graph, + recursive dependencies, + data structure dependencies, + forgotten dependencies)

class joelib2.feature.types.SSKey3DS(version 1.4) depends on
class joelib2.feature.types.count.AromaticBonds(version 1.2) depends on
class joelib2.feature.types.atomlabel.AtomIsHydrogen(version 1.2)
class joelib2.feature.types.bondlabel.BondInAromaticSystem(version 1.3) depends on
class joelib2.data.AromaticityTyper(version 1.4) depends on
class joelib2.smarts.SMARTSPattern(version 1.3) depends on
class joelib2.smarts.SMARTSParser(version 1.4) depends on (WARNING: recursively defined dependencies)
    class joelib2.feature.types.atomlabel.AtomBondOrderSum
    class joelib2.feature.types.atomlabel.AtomExplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomHeavyValence
    class joelib2.feature.types.atomlabel.AtomHybridisation
    class joelib2.feature.types.atomlabel.AtomImplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomImplicitValence
    class joelib2.feature.types.atomlabel.AtomInAromaticSystem
    class joelib2.feature.types.atomlabel.AtomInRing
    class joelib2.feature.types.atomlabel.AtomInRingsCount
    class joelib2.feature.types.atomlabel.AtomIsElectronegative
    class joelib2.feature.types.atomlabel.AtomIsHydrogen
    class joelib2.feature.types.atomlabel.AtomKekuleBondOrderSum
    class joelib2.feature.types.bondlabel.BondInAromaticSystem
    class joelib2.feature.types.bondlabel.BondInRing
class joelib2.feature.types.atomlabel.AtomHybridisation(version 1.3) depends on
class joelib2.data.AtomTyper(version 1.4) depends on
class joelib2.smarts.SMARTSPattern(version 1.3) depends on
class joelib2.smarts.SMARTSParser(version 1.4) depends on (WARNING: recursively defined dependencies)
    class joelib2.feature.types.atomlabel.AtomBondOrderSum
    class joelib2.feature.types.atomlabel.AtomExplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomHeavyValence
    class joelib2.feature.types.atomlabel.AtomHybridisation
    class joelib2.feature.types.atomlabel.AtomImplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomImplicitValence
    class joelib2.feature.types.atomlabel.AtomInAromaticSystem
    class joelib2.feature.types.atomlabel.AtomInRing
    class joelib2.feature.types.atomlabel.AtomInRingsCount
    class joelib2.feature.types.atomlabel.AtomIsElectronegative
    class joelib2.feature.types.atomlabel.AtomIsHydrogen
    class joelib2.feature.types.atomlabel.AtomKekuleBondOrderSum
    class joelib2.feature.types.bondlabel.BondInAromaticSystem
    class joelib2.feature.types.bondlabel.BondInRing
class joelib2.feature.types.atomlabel.AtomImplicitValence(version 1.3) depends on
class joelib2.data.AtomTyper(version 1.4) depends on
class joelib2.smarts.SMARTSPattern(version 1.3) depends on
class joelib2.smarts.SMARTSParser(version 1.4) depends on (WARNING: recursively defined dependencies)
    class joelib2.feature.types.atomlabel.AtomBondOrderSum
    class joelib2.feature.types.atomlabel.AtomExplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomHeavyValence
    class joelib2.feature.types.atomlabel.AtomHybridisation
    class joelib2.feature.types.atomlabel.AtomImplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomImplicitValence
    class joelib2.feature.types.atomlabel.AtomInAromaticSystem
    class joelib2.feature.types.atomlabel.AtomInRing
    class joelib2.feature.types.atomlabel.AtomInRingsCount
    class joelib2.feature.types.atomlabel.AtomIsElectronegative
    class joelib2.feature.types.atomlabel.AtomIsHydrogen
    class joelib2.feature.types.atomlabel.AtomKekuleBondOrderSum
    class joelib2.feature.types.bondlabel.BondInAromaticSystem
    class joelib2.feature.types.bondlabel.BondInRing
class joelib2.feature.types.atomlabel.AtomInRing(version 1.3) depends on
class joelib2.ring.RingDetector(version 1.3)
class joelib2.feature.types.atomlabel.AtomIsHydrogen(version 1.2)
class joelib2.feature.types.bondlabel.BondInRing(version 1.2) depends on
class joelib2.ring.RingDetector(version 1.3)
class joelib2.feature.types.bondlabel.BondIsClosure(version 1.2)
class joelib2.feature.types.atomlabel.AtomIsHeteroatom(version 1.2) depends on
class joelib2.feature.types.atomlabel.AtomIsHalogen(version 1.3)
class joelib2.ring.RingFinderSSSR(version 1.2) depends on
class joelib2.feature.types.atomlabel.AtomInRing(version 1.3) depends on
class joelib2.ring.RingDetector(version 1.3)
class joelib2.feature.types.bondlabel.BondInRing(version 1.2) depends on
class joelib2.ring.RingDetector(version 1.3)
class joelib2.feature.types.bondlabel.BondIsClosure(version 1.2)
class joelib2.feature.types.FractionRotatableBonds(version 1.3) depends on
class joelib2.feature.types.atomlabel.AtomIsHydrogen(version 1.2)
class joelib2.feature.types.bondlabel.BondIsRotor(version 1.3) depends on
class joelib2.feature.types.atomlabel.AtomHeavyValence(version 1.3) depends on
class joelib2.feature.types.atomlabel.AtomIsHydrogen(version 1.2)
class joelib2.feature.types.atomlabel.AtomHybridisation(version 1.3) depends on
class joelib2.data.AtomTyper(version 1.4) depends on
class joelib2.smarts.SMARTSPattern(version 1.3) depends on
class joelib2.smarts.SMARTSParser(version 1.4) depends on (WARNING: recursively defined dependencies)
    class joelib2.feature.types.atomlabel.AtomBondOrderSum
    class joelib2.feature.types.atomlabel.AtomExplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomHeavyValence
    class joelib2.feature.types.atomlabel.AtomHybridisation
    class joelib2.feature.types.atomlabel.AtomImplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomImplicitValence
    class joelib2.feature.types.atomlabel.AtomInAromaticSystem
    class joelib2.feature.types.atomlabel.AtomInRing
    class joelib2.feature.types.atomlabel.AtomInRingsCount
    class joelib2.feature.types.atomlabel.AtomIsElectronegative
    class joelib2.feature.types.atomlabel.AtomIsHydrogen
    class joelib2.feature.types.atomlabel.AtomKekuleBondOrderSum
    class joelib2.feature.types.bondlabel.BondInAromaticSystem
    class joelib2.feature.types.bondlabel.BondInRing
class joelib2.feature.types.count.HBA1(version 1.2) depends on
class joelib2.smarts.ProgrammableAtomTyper(version 1.4) depends on
class joelib2.smarts.SMARTSPattern(version 1.3) depends on
class joelib2.smarts.SMARTSParser(version 1.4) depends on (WARNING: recursively defined dependencies)
    class joelib2.feature.types.atomlabel.AtomBondOrderSum
    class joelib2.feature.types.atomlabel.AtomExplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomHeavyValence
    class joelib2.feature.types.atomlabel.AtomHybridisation
    class joelib2.feature.types.atomlabel.AtomImplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomImplicitValence
    class joelib2.feature.types.atomlabel.AtomInAromaticSystem
    class joelib2.feature.types.atomlabel.AtomInRing
    class joelib2.feature.types.atomlabel.AtomInRingsCount
    class joelib2.feature.types.atomlabel.AtomIsElectronegative
    class joelib2.feature.types.atomlabel.AtomIsHydrogen
    class joelib2.feature.types.atomlabel.AtomKekuleBondOrderSum
    class joelib2.feature.types.bondlabel.BondInAromaticSystem
    class joelib2.feature.types.bondlabel.BondInRing
class joelib2.feature.types.count.HBA2(version 1.2) depends on
class joelib2.smarts.ProgrammableAtomTyper(version 1.4) depends on
class joelib2.smarts.SMARTSPattern(version 1.3) depends on
class joelib2.smarts.SMARTSParser(version 1.4) depends on (WARNING: recursively defined dependencies)
    class joelib2.feature.types.atomlabel.AtomBondOrderSum
    class joelib2.feature.types.atomlabel.AtomExplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomHeavyValence
    class joelib2.feature.types.atomlabel.AtomHybridisation
    class joelib2.feature.types.atomlabel.AtomImplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomImplicitValence
    class joelib2.feature.types.atomlabel.AtomInAromaticSystem
    class joelib2.feature.types.atomlabel.AtomInRing
    class joelib2.feature.types.atomlabel.AtomInRingsCount
    class joelib2.feature.types.atomlabel.AtomIsElectronegative
    class joelib2.feature.types.atomlabel.AtomIsHydrogen
    class joelib2.feature.types.atomlabel.AtomKekuleBondOrderSum
    class joelib2.feature.types.bondlabel.BondInAromaticSystem
    class joelib2.feature.types.bondlabel.BondInRing
class joelib2.smarts.SMARTSPattern(version 1.3) depends on
class joelib2.smarts.SMARTSParser(version 1.4) depends on (WARNING: recursively defined dependencies)
    class joelib2.feature.types.atomlabel.AtomBondOrderSum
    class joelib2.feature.types.atomlabel.AtomExplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomHeavyValence
    class joelib2.feature.types.atomlabel.AtomHybridisation
    class joelib2.feature.types.atomlabel.AtomImplicitHydrogenCount
    class joelib2.feature.types.atomlabel.AtomImplicitValence
    class joelib2.feature.types.atomlabel.AtomInAromaticSystem
    class joelib2.feature.types.atomlabel.AtomInRing
    class joelib2.feature.types.atomlabel.AtomInRingsCount
    class joelib2.feature.types.atomlabel.AtomIsElectronegative
    class joelib2.feature.types.atomlabel.AtomIsHydrogen
    class joelib2.feature.types.atomlabel.AtomKekuleBondOrderSum
    class joelib2.feature.types.bondlabel.BondInAromaticSystem
    class joelib2.feature.types.bondlabel.BondInRing

And a Happy New Year to all!

Kind regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. (E. Hemingway)
Never mistake action for meaningful action. (Hugo Kubinyi, 2004)
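
For readers who want a feel for how a hashed dependency identifier of the kind described in this message could work, here is a minimal Java sketch. It is not JOELib2 code: the class DependencyHashSketch, its methods, and the reduced toy graph are hypothetical, the hash is simply String.hashCode over sorted "name@version" entries, and the dependency edges are a simplification loosely based on the class names and versions in the log above. A visited set stops the traversal on recursively defined dependencies.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only, NOT JOELib2 code. All names here are hypothetical.
class DependencyHashSketch {
    // class name -> version tag (e.g. a CVS revision such as "1.4")
    private final Map<String, String> versions = new LinkedHashMap<>();
    // class name -> names of the classes it directly depends on
    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>();

    void register(String name, String version, String... dependencies) {
        versions.put(name, version);
        dependsOn.put(name, List.of(dependencies));
    }

    // Returns the root plus everything reachable from it, in sorted order.
    Set<String> closure(String root) {
        Set<String> seen = new TreeSet<>();
        collect(root, seen);
        return seen;
    }

    private void collect(String name, Set<String> seen) {
        if (!seen.add(name)) {
            return; // already visited: a shared or recursively defined dependency
        }
        for (String dep : dependsOn.getOrDefault(name, List.of())) {
            collect(dep, seen);
        }
    }

    // Hashes the sorted "name@version" entries of the whole closure, so a
    // version bump anywhere below the root changes the root's identifier.
    int versionHash(String root) {
        StringBuilder sb = new StringBuilder();
        for (String name : closure(root)) {
            sb.append(name).append('@')
              .append(versions.getOrDefault(name, "unknown")).append(';');
        }
        return sb.toString().hashCode();
    }

    // Reduced toy graph (an assumption for illustration, not the real tree).
    static DependencyHashSketch toyGraph(String aromaticityTyperVersion) {
        DependencyHashSketch g = new DependencyHashSketch();
        g.register("SSKey3DS", "1.4", "AromaticBonds");
        g.register("AromaticBonds", "1.2", "BondInAromaticSystem");
        g.register("BondInAromaticSystem", "1.3", "AromaticityTyper");
        g.register("AromaticityTyper", aromaticityTyperVersion, "SMARTSPattern");
        g.register("SMARTSPattern", "1.3", "SMARTSParser");
        g.register("SMARTSParser", "1.4", "BondInAromaticSystem"); // cycle
        g.register("AtomInRing", "1.3", "RingDetector");
        g.register("RingDetector", "1.3");
        return g;
    }

    public static void main(String[] args) {
        DependencyHashSketch before = toyGraph("1.4");
        DependencyHashSketch after = toyGraph("1.5"); // only the aromaticity typer changes
        for (String root : List.of("SSKey3DS", "AtomInRing")) {
            System.out.println(root + ": " + (before.closure(root).size() - 1)
                    + " dependencies, hash " + before.versionHash(root)
                    + " -> " + after.versionHash(root));
        }
    }
}
```

In this toy graph, bumping the aromaticity typer version changes the identifier of SSKey3DS but leaves AtomInRing untouched, which is the kind of question the message raises ("which one depends on the aromaticity?"). Hashing the sorted closure rather than the tree shape keeps the result independent of traversal order; a real implementation would presumably also have to cover the text definition files.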