From: Joerg K. W. <we...@in...> - 2004-04-28 09:03:54
|
Hello Rich,

> What are the advantages of an Octet Atom inheriting from Node? The definition of the Atom interface is very short and contains mainly methods for identifying neighboring atoms and bonding systems. Octet doesn't use Bond, but rather BondingSystem, which allows for the connection of any number of Atoms using any number of electrons so that structures like ferrocene and transition metal complexes can be handled the same way as any purely organic molecule.

I also have some (internal) code for maximum common substructure search, and in such cases we work on abstract nodes. Of course, these can use typical atom labels, but also a lot of other stuff. So for general graph algorithms, I would prefer a more abstract interface.

> I thought about being able to store keys, properties, etc. in Atom, Molecule. However, since the design of Octet is based on the implementation of interfaces, doing so puts a burden on the implementor to provide this functionality. Marking atoms, bonding systems, and molecules could just as easily be done externally to those interfaces using a vector of visited atoms, for example. In fact, Octet uses this approach in, for example, the DepthFirstTraverser class. Is there something else I'm missing?

Let's iterate ...

> Do you consider AtomPair a "descriptor"? I noticed it is present in JOELib. Octet also has an AtomPair interface. However, in Octet, AtomPair simply represents an association between two atoms (no electrons involved - that happens through BondingSystem). To find all the atoms that are associated in a Molecule, use Molecule.iterateAtomPairs(). Your point about hashCode() is well-taken.

It's not that I'm taking it as a descriptor. IT IS a descriptor in the medicinal chemistry community. So I have no problem with this definition, but there are already some papers out there that use the same name! You can generalize if you add distance and occurrence variables. But here, too, an atom can be a more abstract label than a simple atom, e.g. a mix of special atom properties. So I would prefer a redesign or at least another name.

> Your point about copy()/clone() is also well-taken. However, this can't be forced through the interface definition but can be incorporated into the reference implementations.

I know.

> Can you give me an example of the readAsString() method and its advantages in handling corrupted file entries compared to just throwing an exception with the existing MoleculeReader methods? You're right about these methods needing to declare an exception.

E.g.: start reading a file normally with LineReader. When a corrupted line is found, either everything is skipped or you allow special skipping rules, which are difficult to handle. Reading a full molecule entry from start tag to end tag is bad for runtime, but pretty easy to implement. So for convertSkip.sh I simply load all molecules as String, then parse each String to a molecule. Corrupted entries can be saved in a skip file without knowing the error and without cancelling the read. Of course an error/warning is written, so if a company converts 100000 molecules, they can convert everything at once and look at the corrupted entries in the skip file later. Otherwise you force them to correct every error by hand. I promise, if you do so they will flame you.

> I'm currently working on implementing some of the other features you asked about such as a descriptor framework, substructure/similarity searching. However, these features are independent of the interface definitions for the key model-level objects (Molecule, BondingSystem, and AtomPair). I've had a look at JOELib's descriptor framework, and it looks like a flexible way to implement descriptor functionality.

Yes, of course. See joelib/desc, joelib/math/similarity and the helper classes at joelib/util. Take what you need.

> Can you explain what a "descriptor IO helper class" is and why it is necessary?

There are three relevant main classes:

1. DescriptorFactory (factory pattern): loads the calculation class for a descriptor by its name. No caching at the moment, but clear() methods are already available.
2. DescriptorHelper: allows you to load/calculate descriptors. E.g. if I'm interested in 'PSA', I call descFromMol(mol,"PolarSurfaceArea", true); If the descriptor already exists, it is just returned. If not, it is calculated and immediately added to the molecule (caching). This is important for expensive matrix or array descriptors, to avoid calculating them several times.
3. ResultFactory (parser factory): if the descriptor type was not already assigned during loading (CML does this only partially, because it stores only SOME types explicitly), we need to map the unparsed descriptors to a data type, e.g. int, double, int matrix, boolean array, atom pair, ... The mapping can be defined by name or regular expression in joelib/desc/data/plain/extKnownResults.txt, so we are also able to load external descriptors from other programs, like MolConnZ, Petra, MOE, Dragon, whatever ... The only thing we need is the mapping and Java reflection !!!

Special IO functionalities, e.g. CML properties like the delimiter, are stored directly in each descriptor result class.

Kind regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. (E. Hemingway)
Never mistake action for meaningful action. (Hugo Kubinyi,2004)
|
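The "return it if it exists, otherwise calculate it and add it to the molecule" behaviour described for DescriptorHelper in the message above can be sketched roughly as follows. This is only an illustration, not JOELib's code: the Mol class, the CALCULATORS registry, the toy "TitleLength" descriptor, and the reading of the third argument as "calculate if missing" are all assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class DescriptorCacheSketch {

    /** Hypothetical molecule stand-in that can carry descriptor results. */
    static final class Mol {
        final String title;
        final Map<String, Object> results = new HashMap<>();

        Mol(String title) {
            this.title = title;
        }
    }

    /** Factory role (assumed): maps a descriptor name to its calculation. */
    static final Map<String, Function<Mol, Object>> CALCULATORS = new HashMap<>();

    /** Counts real calculations so the effect of caching is visible. */
    static int calculations = 0;

    static {
        // Toy descriptor for illustration only; not a real chemical property.
        CALCULATORS.put("TitleLength", mol -> mol.title.length());
    }

    /**
     * Helper role: return the stored result if present; otherwise calculate it
     * (when allowed), store it on the molecule, and return it.
     * The meaning of the boolean flag is an assumption of this sketch.
     */
    static Object descFromMol(Mol mol, String name, boolean calculateIfMissing) {
        Object cached = mol.results.get(name);
        if (cached != null) {
            return cached;
        }
        if (!calculateIfMissing) {
            return null;
        }
        Function<Mol, Object> calculator = CALCULATORS.get(name);
        if (calculator == null) {
            throw new IllegalArgumentException("Unknown descriptor: " + name);
        }
        Object result = calculator.apply(mol);
        calculations++;
        mol.results.put(name, result); // cache on the molecule itself
        return result;
    }

    public static void main(String[] args) {
        Mol mol = new Mol("ferrocene");
        System.out.println(descFromMol(mol, "TitleLength", true)); // calculated
        System.out.println(descFromMol(mol, "TitleLength", true)); // from cache
        System.out.println("calculations: " + calculations);
    }
}
```

Storing the result on the molecule rather than in a global map means the cache lives and dies with the molecule, which matters most for the expensive matrix or array descriptors mentioned in the message.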
From: E.L. W. <eg...@sc...> - 2004-04-29 07:01:23
|
On Wednesday 28 April 2004 11:05, Joerg K. Wegner wrote:
> > Do you consider AtomPair a "descriptor"? I noticed it is present in
> > JOELib. Octet also has an AtomPair interface. However, in Octet, AtomPair
> > simply represents an association between two atoms (no electrons involved
> > - that happens through BondingSystem). To find all the atoms that are
> > associated in a Molecule, use Molecule.iterateAtomPairs(). Your point
> > about hashCode() is well-taken.
>
> It's not that I'm taking it as a descriptor. IT IS a descriptor in the
> medicinal chemistry community. So I have no problem with this definition,
> but there are already some papers out there that use the same name!
> You can generalize if you add distance and occurrence variables. But
> here, too, an atom can be a more abstract label than a simple atom, e.g. a
> mix of special atom properties.
> So I would prefer a redesign or at least another name.

I don't think anyone can restrict atom pair to match a specific descriptor... it's something like making "windows" a registered trade mark. "Atom pair" is a general term and cannot be restricted to denote just one descriptor.

Fortunately, we will have dictionaries... where the descriptor "AtomPair" can be explained...

Egon

--
eg...@sc...
PhD on Molecular Representation in Chemometrics
Nijmegen University
http://www.cac.sci.kun.nl/people/egonw/
GPG: 1024D/D6336BA6
|
From: Joerg K. W. <we...@in...> - 2004-04-29 09:18:42
|
Hi all,

> I don't think anyone can restrict atom pair to match a specific descriptor...
> it's something like making "windows" a registered trade mark. "Atom pair" is
> a general term and cannot be restricted to denote just one descriptor.

As I said, it's an object-oriented design question. I prefer objects that are as general as possible, so if the current AtomPair also supported a distance variable (which need not be 1) and an occurrence count variable, I would want only one object. If you think this is too much overhead (greater space complexity) for all Atom-BondWhatever-Atom definitions, including the primitive ones, I would prefer two different inherited interfaces. Nikolas, any comments? :-)

> Fortunately, we will have dictionaries... where the descriptor "AtomPair" can
> be explained...

Which one?

Kind regards, Joerg
|
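The "two different inherited interfaces" alternative raised in the message above could look something like the sketch below. None of these names come from Octet or JOELib: Node, AtomPair as declared here, CountedAtomPair, and the implementing classes are hypothetical, and treating distance as a bond count between the two nodes is an assumption of the sketch.

```java
public class AtomPairSketch {

    /** Hypothetical label holder: an element symbol or something more abstract. */
    interface Node {
        String label();
    }

    /** Plain association between two nodes, with no extra bookkeeping. */
    interface AtomPair {
        Node first();
        Node second();
    }

    /** Extended variant for descriptor use: adds distance and occurrence count. */
    interface CountedAtomPair extends AtomPair {
        int distance();   // assumed: bonds between the two nodes; need not be 1
        int occurrence(); // assumed: how often this pair occurs in a molecule
    }

    static final class SimpleNode implements Node {
        private final String label;
        SimpleNode(String label) { this.label = label; }
        public String label() { return label; }
    }

    /** Lightweight pair: only two references, no per-pair counters. */
    static class SimplePair implements AtomPair {
        private final Node first;
        private final Node second;
        SimplePair(Node first, Node second) {
            this.first = first;
            this.second = second;
        }
        public Node first() { return first; }
        public Node second() { return second; }
    }

    /** Heavier pair that pays for the two extra fields only when needed. */
    static final class CountedPair extends SimplePair implements CountedAtomPair {
        private final int distance;
        private final int occurrence;
        CountedPair(Node first, Node second, int distance, int occurrence) {
            super(first, second);
            this.distance = distance;
            this.occurrence = occurrence;
        }
        public int distance() { return distance; }
        public int occurrence() { return occurrence; }
    }

    /** Code written against AtomPair accepts both variants. */
    static String describe(AtomPair pair) {
        return pair.first().label() + "-" + pair.second().label();
    }

    public static void main(String[] args) {
        Node c = new SimpleNode("C");
        Node n = new SimpleNode("N.aromatic"); // abstract label, not just an element
        AtomPair plain = new SimplePair(c, n);
        CountedAtomPair counted = new CountedPair(c, n, 3, 2);
        System.out.println(describe(plain));
        System.out.println(describe(counted) + " d=" + counted.distance()
                + " n=" + counted.occurrence());
    }
}
```

The trade-off is the one named in the message: a single general object is simpler to reason about, while the split keeps primitive pairs free of the two extra fields.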