octet-devel Mailing List for Octet (Page 4)


I found the API you mentioned here: http://www.xml-cml.org/cmldom/htmlDoc/index.html (but I still wasn't able to find MoleculeTool). It's very interesting that interface definitions can be produced from an XML specification.

One of the points I considered when developing Octet was whether the core model-level interfaces such as Atom, AtomPair, and Molecule should be immutable or not. I ended up choosing the "Read-Only Object" variant of the "Immutable" design pattern described by Grand.

I won't go into the many reasons behind this decision here other than to mention that once Molecules are created, there was no situation (other than a graphical structure editor) that I could think of where the need to connect/disconnect atoms, change bond orders, or otherwise tinker with the Molecule's internal state would come up post creation. More importantly, the entire process of working with Molecules becomes simpler, more robust, and less error-prone if immutability of model-level objects can be assumed by clients.

What I noticed from the CML interface definitions is that they define read-write access for every property. This means that no assumptions about Molecule, Atom, or Bond immutability can be made. What would you think about an interface that removes the mutator methods? In Octet, I handled the need to get Molecules created in the first place by writing a concrete implementation of the Molecule interface (BasicMolecule) that has mutator methods, and by using the "Builder" pattern for general molecule construction. Of course, clients can always try to guess the concrete implementation of the Molecule they encounter, downcast to it, and use the mutator methods, but they really have to consider whether this is a good thing to be doing.
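
To make the arrangement above concrete, here is a minimal Java sketch. The names ReadOnlyMolecule, SimpleMolecule, and MoleculeBuilder are hypothetical and chosen only for illustration; they are not Octet's actual classes, and SimpleMolecule merely plays the role described for BasicMolecule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical read-only view: clients see no mutator methods.
interface ReadOnlyMolecule {
    int countAtoms();
    int countBonds();
    String getAtomLabel(int atomIndex);
    boolean areBonded(int a, int b);
}

// Hypothetical concrete implementation; the mutators are not part of the interface.
final class SimpleMolecule implements ReadOnlyMolecule {
    private final List<String> labels = new ArrayList<>();
    private final List<int[]> bonds = new ArrayList<>();

    void addAtom(String label) {
        labels.add(label);
    }

    void connect(int a, int b) {
        if (a == b || a < 0 || b < 0 || a >= labels.size() || b >= labels.size()) {
            throw new IllegalArgumentException("invalid atom indices: " + a + ", " + b);
        }
        bonds.add(new int[] {a, b});
    }

    public int countAtoms() { return labels.size(); }

    public int countBonds() { return bonds.size(); }

    public String getAtomLabel(int atomIndex) { return labels.get(atomIndex); }

    public boolean areBonded(int a, int b) {
        for (int[] bond : bonds) {
            if ((bond[0] == a && bond[1] == b) || (bond[0] == b && bond[1] == a)) {
                return true;
            }
        }
        return false;
    }
}

// Hypothetical builder: the only code that touches the mutators.
final class MoleculeBuilder {
    private SimpleMolecule molecule = new SimpleMolecule();

    MoleculeBuilder atom(String label) {
        molecule.addAtom(label);
        return this;
    }

    MoleculeBuilder bond(int a, int b) {
        molecule.connect(a, b);
        return this;
    }

    // Hands out the molecule through its read-only type and starts a fresh one,
    // so later builder calls cannot alter what was already returned.
    ReadOnlyMolecule build() {
        ReadOnlyMolecule result = molecule;
        molecule = new SimpleMolecule();
        return result;
    }
}
```

A client would write something like new MoleculeBuilder().atom("O").atom("H").atom("H").bond(0, 1).bond(0, 2).build() and from then on hold only a ReadOnlyMolecule. Downcasting to SimpleMolecule is still possible, which is exactly the loophole mentioned above.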

Another thing that struck me when looking at the CMLMolecule, CMLAtom, and CMLBond interface definitions was that they are essentially interfaces to a data structure. If that data structure were to change for some reason, the resulting refactorings could be somewhat painful. Since I'm a fan of OO programming and encapsulation, I've been conditioned to avoid this kind of situation. What are your thoughts on this?

One more thing: there are features like 2-D and 3-D coordinates that will be unused in many cheminformatics applications but which are defined in the CML interfaces. This means that overhead will be incurred when it might not be necessary. In Octet/Structure, I've handled this with a "pay-as-you-go" approach. The net.sourceforge.octet.molecule.Molecule interface defines the bare minimum functionality needed to work with molecular graph objects. If 2-D coordinates are desired, then clients can apply the "Decorator" design pattern and use a net.sourceforge.structure.molecule.Molecule2D implementation, which is itself a subtype of Molecule with additional 2-D coordinate functionality. What are your thoughts on this kind of approach to defining a Molecule interface?
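
As a rough illustration of the pay-as-you-go idea, here is a minimal Java sketch of a decorator that adds 2-D coordinates to a bare molecular graph. All names (GraphMolecule, GraphMolecule2D, LabelOnlyMolecule, CoordinateDecorator) are hypothetical and do not reflect the real Molecule or Molecule2D signatures.

```java
// Hypothetical minimal interface: graph information only, no coordinates.
interface GraphMolecule {
    int countAtoms();
    String getAtomLabel(int atomIndex);
}

// Hypothetical extended interface: the same graph plus 2-D coordinates.
interface GraphMolecule2D extends GraphMolecule {
    double getX(int atomIndex);
    double getY(int atomIndex);
}

// Bare implementation that carries no coordinate storage at all.
final class LabelOnlyMolecule implements GraphMolecule {
    private final String[] labels;

    LabelOnlyMolecule(String... labels) {
        this.labels = labels.clone();
    }

    public int countAtoms() { return labels.length; }

    public String getAtomLabel(int atomIndex) { return labels[atomIndex]; }
}

// Decorator: wraps any GraphMolecule and layers coordinates on top of it.
final class CoordinateDecorator implements GraphMolecule2D {
    private final GraphMolecule delegate;
    private final double[] xs;
    private final double[] ys;

    CoordinateDecorator(GraphMolecule delegate, double[] xs, double[] ys) {
        if (xs.length != delegate.countAtoms() || ys.length != delegate.countAtoms()) {
            throw new IllegalArgumentException("one (x, y) pair is required per atom");
        }
        this.delegate = delegate;
        this.xs = xs.clone();
        this.ys = ys.clone();
    }

    // Graph queries are forwarded unchanged to the wrapped molecule.
    public int countAtoms() { return delegate.countAtoms(); }

    public String getAtomLabel(int atomIndex) { return delegate.getAtomLabel(atomIndex); }

    // Coordinate queries are answered by the decorator itself.
    public double getX(int atomIndex) { return xs[atomIndex]; }

    public double getY(int atomIndex) { return ys[atomIndex]; }
}
```

Code that only needs the graph keeps working with GraphMolecule and never pays for coordinate arrays; code that needs a depiction wraps the same object in a CoordinateDecorator.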

cheers,
rich

Peter Murray-Rust <pm...@ca...> wrote:
[Reply to QSAR list only]

At 19:03 27/04/2004 -0700, rich apodaca wrote:
>Thanks for your comments, Peter. I'm especially interested in your 
>comments on cml. I've been watching cml at a distance for some time, but I 
>didn't realize you had defined interfaces for molecule and atom behavior. 
>Could you more precisely point me to where these interfaces are? I visited 
>the link you sent but wasn't able to find them.

http://wwmm.ch.cam.ac.uk/moin/ChemicalMarkupLanguage

You will find schema elements for about 100 concepts.
http://wwmm.ch.cam.ac.uk/moin/CmlElements
This list is autogenerated from the schema, so can be updated every time 
the schema is modified. There are similar lists for
http://wwmm.ch.cam.ac.uk/moin/CmlAttributes
and
http://wwmm.ch.cam.ac.uk/moin/CmlSimpleComplexTypes

These are then automatically compiled into target code (Java, C++, Python, 
F90). In Java this results in a (Java) interface for every Element 
including appropriate methods for every attribute. The code obviously 
generates Javadoc. Rather than displaying this we distribute the complete 
system:
http://wwmm.ch.cam.ac.uk/moin/CmlAtNesc
and ask people to generate their own.

I now believe that we should try to define interfaces, etc in XML rather 
than a target language. I am not a fan of UML (costs money) so somewhat 
reluctantly use XMLSchema.

>
>I'm not very familiar with xml, but if I understand correctly, a DOM is 
>used to produce an in-memory representation of the structure of an XML 
>document. Minimally,

Absolutely right

>it provides an exact representation of the content of the XML document. If 
>I'm correct so far, then I imagine that a CML DOM provides an exact 
>representation of the structure of a CML document.
>

Yes.

>In addition to providing an interface to access the data, what behaviors 
>do the CML interfaces define for model-level objects like Atom and 
>Molecule? To me, an example of pure Atom data would be an atom label 
>property, whereas an example of Atom behavior is the capability to report 
>what bonding systems an Atom belongs to and what Atoms it is a neighbor 
>of. The choice of behavior is critical: too much functionality and the 
>interface becomes bloated and hard to understand - too little and 
>developers are frustrated at how much work it takes to do simple things. 
>I'm very interested in knowing what the right balance is.
>

Fully agreed. That is why I have developed a Tool approach. Every element 
has a Tool which adds functionality. Thus Molecule has MoleculeTool. The 
tool has behavioural methods like:
MoleculeTool.getMolecularMass().
MoleculeTool.get2DCentroid().

I originally wrote these in Java but am now starting to develop a 
pseudocode so that the other target languages can be supported. In this 
way we get a complete interface for behaviour which - hopefully - will 
lead to increasing consistency of implementation.

>It sounds like the approach you've taken in using interfaces is similar to 
>mine. Like you, I am keenly interested in taking advantage of the rich 
>functionality of CDK and JOELib. As a first pass, I've been working on a 
>two-way adapter class for CDK. Its definition looks something like this:

See MoleculeTool in our distrib. At present it uses CDK as the engine but 
could easily use JOELib, etc. I am sure this is the right way to go.

>
>public class CDKMolecule extends org.openscience.cdk.Molecule
> implements net.sourceforge.octet.molecule.Molecule
>{
> // override org.openscience.cdk.Molecule methods where appropriate
>
> // implement net.sourceforge.octet.molecule.Molecule interface
>}

Yes. Seems reasonable

I tend to use a delegation method:

public class MoleculeToolImpl implements MoleculeTool {
    // body is implementor dependent
    org.openscience.cdk.Molecule theMolecule; // used for computation
}
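
A minimal, self-contained sketch of this delegation style is given here. The names SketchMoleculeTool, MassEngine, TableMassEngine, and DelegatingMoleculeTool are hypothetical; MassEngine stands in for a real engine such as CDK and is not CDK's API, and the small mass table covers only four elements.

```java
// Hypothetical behavioural interface, in the spirit of a Tool.
interface SketchMoleculeTool {
    double getMolecularMass();
}

// Hypothetical engine interface; a real engine (CDK, JOELib) would sit behind it.
interface MassEngine {
    double massOf(String[] elementSymbols);
}

// Toy engine backed by a tiny table of standard atomic weights.
final class TableMassEngine implements MassEngine {
    private static double atomicMass(String symbol) {
        switch (symbol) {
            case "H": return 1.008;
            case "C": return 12.011;
            case "N": return 14.007;
            case "O": return 15.999;
            default: throw new IllegalArgumentException("unknown element: " + symbol);
        }
    }

    public double massOf(String[] elementSymbols) {
        double total = 0.0;
        for (String symbol : elementSymbols) {
            total += atomicMass(symbol);
        }
        return total;
    }
}

// The tool holds the data and an engine, and forwards the computation to the engine.
final class DelegatingMoleculeTool implements SketchMoleculeTool {
    private final String[] elementSymbols;
    private final MassEngine engine;

    DelegatingMoleculeTool(String[] elementSymbols, MassEngine engine) {
        this.elementSymbols = elementSymbols.clone();
        this.engine = engine;
    }

    public double getMolecularMass() {
        return engine.massOf(elementSymbols);
    }
}
```

Because callers depend only on SketchMoleculeTool, the engine can be swapped without touching client code, which is the point of keeping the body implementor dependent.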

>
>The advantage here is that a CDKMolecule can be used from within either 
>CDK or Octet without the need for a conversion step. I plan to do the same 
>thing for joelib.molecule.JOEMol.

Yes.
I am now starting to use workflow tools (Kepler, Taverna - see sf) and 
these require small atomic units (in the CS sense). It is important that 
their interface to the external world is implementation independent.

>
>In particular, it would be helpful to use the file format read/write 
>capabilities of CDK. The problem I'm currently facing is that IO classes 
>such as org.openscience.cdk.io.MDLReader provide their own instance of 
>org.openscience.cdk.Molecule that is created during a call to read(). If 
>this method used an instance of org.openscience.cdk.Molecule passed into 
>the read() method instead, then I could just pass in my CDKMolecule, and 
>the reader would not be the wiser. What would be the consequences of 
>modifying the IO classes to allow for this?
>
>With regard to directly supporting CML, I'm interested in trying my hand 
>at it with Octet. The Octet model for bonding is somewhat different from 
>the other Java cheminformatics packages I've seen in that it directly 
>supports multicenter, multielectron bonding arrangements.

So does CML. A bond can be between 2, 3, 4 or many atoms. It can also be 
between atoms and bonds or bonds and bonds. To be fair, we haven't 
implemented this.

>So, the bonding arrangement of ferrocene, benzyne, borane clusters, or the 
>homotropylium cation are handled exactly the same way as those of hexane. 
>This implementation is based on a paper by Dietz (JCICS 1995, 35, 787). 
>What are your thoughts on CML providing the syntax necessary to represent 
>these "non-traditional" kinds of bonding arrangements?

See if it works!!!

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069



octet-devel — Octet developer list.