octet-devel Mailing List for Octet (Page 4)
Status: Alpha
Brought to you by: r_apodaca
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2004 | | | | (3) | (11) | (7) | (12) | (10) | | (2) | (10) | (14) |
| 2005 | (3) | | (1) | | | (1) | (1) | (1) | (1) | | | |
| 2006 | | | | (2) | | | | (2) | (5) | (31) | (13) | |
From: rich a. <che...@ya...> - 2004-04-29 01:57:24
I found the API you mentioned here: http://www.xml-cml.org/cmldom/htmlDoc/index.html (but I still wasn't able to find MoleculeTool). It's very interesting that interface definitions can be produced from an XML specification.

One of the points I considered when developing Octet was whether the core model-level interfaces such as Atom, AtomPair, and Molecule should be immutable. I ended up choosing the "Read-Only Object" variant of the "Immutable" design pattern described by Grand. I won't go into all the reasons behind this decision here, other than to mention that once a Molecule has been created, I could think of no situation (other than a graphical structure editor) that would require connecting or disconnecting atoms, changing bond orders, or otherwise tinkering with the Molecule's internal state. More importantly, working with Molecules becomes simpler, more robust, and less error-prone if clients can assume that model-level objects are immutable.

What I noticed in the CML interface definitions is that they define read-write access for every property. This means that no assumptions about Molecule, Atom, or Bond immutability can be made. What would you think about an interface that removes the mutator methods?

In Octet, I handled the need to create Molecules in the first place by writing a concrete implementation of the Molecule interface (BasicMolecule) that has mutator methods, and by using the "Builder" pattern for general molecule construction. Of course, clients can always guess which concrete implementation of Molecule they have been given, downcast to it, and use its mutator methods, but they really have to consider whether that is a good thing to be doing.

Another thing that struck me when looking at the CMLMolecule, CMLAtom, and CMLBond interface definitions was that they are essentially interfaces to a data structure.
If that data structure were to change for some reason, the resulting refactorings could be somewhat painful. Since I'm a fan of OO programming and encapsulation, I've been conditioned to avoid this kind of situation. What are your thoughts on this?

One more thing: there are features like 2-D and 3-D coordinates that will be unused in many cheminformatics applications but which are defined in the CML interfaces. This means that overhead will be incurred when it might not be necessary. In Octet/Structure, I've handled this with a "pay-as-you-go" approach. The net.sourceforge.octet.molecule.Molecule interface defines the bare minimum functionality needed to work with molecular graph objects. If 2-D coordinates are desired, clients can apply the "Decorator" design pattern and use a net.sourceforge.structure.molecule.Molecule2D implementation, which itself extends Molecule with additional 2-D coordinate functionality. What are your thoughts on this kind of approach to defining a Molecule interface?

cheers,
rich

Peter Murray-Rust <pm...@ca...> wrote:
[Reply to QSAR list only]
At 19:03 27/04/2004 -0700, rich apodaca wrote:
>Thanks for your comments, Peter. I'm especially interested in your
>comments on cml. I've been watching cml at a distance for some time, but I
>didn't realize you had defined interfaces for molecule and atom behavior.
>Could you more precisely point me to where these interfaces are? I visited
>the link you sent but wasn't able to find them.

http://wwmm.ch.cam.ac.uk/moin/ChemicalMarkupLanguage
You will find schema elements for about 100 concepts.
http://wwmm.ch.cam.ac.uk/moin/CmlElements
This list is autogenerated from the schema, so it can be updated every time the schema is modified. There are similar lists for
http://wwmm.ch.cam.ac.uk/moin/CmlAttributes
and
http://wwmm.ch.cam.ac.uk/moin/CmlSimpleComplexTypes
These are then automatically compiled into target code (Java, C++, Python, F90).
In Java this results in a (Java) interface for every Element, including appropriate methods for every attribute. The code obviously generates Javadoc. Rather than displaying this we distribute the complete system:
http://wwmm.ch.cam.ac.uk/moin/CmlAtNesc
and ask people to generate their own.

I now believe that we should try to define interfaces, etc. in XML rather than a target language. I am not a fan of UML (costs money) so somewhat reluctantly use XMLSchema.
>
>I'm not very familiar with xml, but if I understand correctly, a DOM is
>used to produce an in-memory representation of the structure of an XML
>document. Minimally,
Absolutely right
>it provides an exact representation of the content of the XML document. If
>I'm correct so far, then I imagine that a CML DOM provides an exact
>representation of the structure of a CML document.
>
Yes.
>In addition to providing an interface to access the data, what behaviors
>do the CML interfaces define for model-level objects like Atom and
>Molecule? To me, an example of pure Atom data would be an atom label
>property, whereas an example of Atom behavior is the capability to report
>what bonding systems an Atom belongs to and what Atoms it is a neighbor
>of. The choice of behavior is critical: too much functionality and the
>interface becomes bloated and hard to understand - too little and
>developers are frustrated at how much work it takes to do simple things.
>I'm very interested in knowing what the right balance is.
>
Fully agreed. That is why I have developed a Tool approach. Every element has a Tool which adds functionality. Thus Molecule has MoleculeTool. The tool has behavioural methods like:
MoleculeTool.getMolecularMass()
MoleculeTool.get2DCentroid()
I originally wrote these in Java but am now starting to develop a pseudocode so that the other target languages can be supported.
In this way we get a complete interface for behaviour which - hopefully - will lead to increasing consistency of implementation.
>It sounds like the approach you've taken in using interfaces is similar to
>mine. Like you, I am keenly interested in taking advantage of the rich
>functionality of CDK and JOELib. As a first pass, I've been working on a
>two-way adapter class for CDK. Its definition looks something like this:
See MoleculeTool in our distrib. At present it uses CDK as the engine but could easily use JOELib, etc. I am sure this is the right way to go.
>
>public class CDKMolecule extends org.openscience.cdk.Molecule
>    implements net.sourceforge.octet.molecule.Molecule
>{
>    // override org.openscience.cdk.Molecule methods where appropriate
>
>    // implement net.sourceforge.octet.molecule.Molecule interface
>}
Yes. Seems reasonable.
I tend to use a delegation method:
public class MoleculeToolImpl implements MoleculeTool {
    // body is implementor dependent
    org.openscience.cdk.Molecule theMolecule; // used for computation
}
>
>The advantage here is that a CDKMolecule can be used from within either
>CDK or Octet without the need for a conversion step. I plan to do the same
>thing for joelib.molecule.JOEMol.
Yes. I am now starting to use workflow tools (Kepler, Taverna - see sf) and these require small atomic units (in the CS sense). It is important that their interface to the external world is implementation independent.
>
>In particular, it would be helpful to use the file format read/write
>capabilities of CDK. The problem I'm currently facing is that IO classes
>such as org.openscience.cdk.io.MDLReader provide their own instance of
>org.openscience.cdk.Molecule that is created during a call to read(). If
>this method used an instance of org.openscience.cdk.Molecule passed into
>the read() method instead, then I could just pass in my CDKMolecule, and
>the reader would not be the wiser. What would be the consequences of
>modifying the IO classes to allow for this?
>
>With regard to directly supporting CML, I'm interested in trying my hand
>at it with Octet. The Octet model for bonding is somewhat different from
>the other Java cheminformatics packages I've seen in that it directly
>supports multicenter, multielectron bonding arrangements.
So does CML. A bond can be between 2, 3, 4 or many atoms. It can also be between atoms and bonds or bonds and bonds. To be fair, we haven't implemented this.
>So, the bonding arrangement of ferrocene, benzyne, borane clusters, or the
>homotropylium cation are handled exactly the same way as those of hexane.
>This implementation is based on a paper by Dietz (JCICS 1995, 35, 787).
>What are your thoughts on CML providing the syntax necessary to represent
>these "non-traditional" kinds of bonding arrangements?
See if it works!!!
P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069

_______________________________________________
Qsar-devel mailing list
Qsa...@li...
https://lists.sourceforge.net/lists/listinfo/qsar-devel
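The read-only interface, builder, and pay-as-you-go decorator described in the first message can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Octet's or Structure's code: ReadOnlyMolecule, MoleculeBuilder, ReadOnlyMolecule2D, and Coordinates2DDecorator are hypothetical names, and a molecule is reduced to element symbols plus pairwise connections (no bonding systems).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical read-only view of a molecular graph: accessors only, no mutators.
interface ReadOnlyMolecule {
    int countAtoms();
    String getSymbol(int atom);
    List<Integer> getNeighbors(int atom); // indices of connected atoms
}

// Hypothetical builder: the only place where atoms and connections can be added.
final class MoleculeBuilder {
    private final List<String> symbols = new ArrayList<>();
    private final List<List<Integer>> neighbors = new ArrayList<>();

    // Adds an atom and returns its index.
    int addAtom(String symbol) {
        symbols.add(symbol);
        neighbors.add(new ArrayList<Integer>());
        return symbols.size() - 1;
    }

    // Records an undirected connection between two existing atoms.
    void connect(int a, int b) {
        neighbors.get(a).add(b);
        neighbors.get(b).add(a);
    }

    // Snapshots the current state; later builder calls do not affect the result.
    ReadOnlyMolecule build() {
        final List<String> s = Collections.unmodifiableList(new ArrayList<>(symbols));
        final List<List<Integer>> n = new ArrayList<>();
        for (List<Integer> list : neighbors) {
            n.add(Collections.unmodifiableList(new ArrayList<>(list)));
        }
        return new ReadOnlyMolecule() {
            @Override public int countAtoms() { return s.size(); }
            @Override public String getSymbol(int atom) { return s.get(atom); }
            @Override public List<Integer> getNeighbors(int atom) { return n.get(atom); }
        };
    }
}

// Hypothetical opt-in extension: only clients that need 2-D coordinates pay for them.
interface ReadOnlyMolecule2D extends ReadOnlyMolecule {
    double getX(int atom);
    double getY(int atom);
}

// Decorator: wraps any ReadOnlyMolecule and adds coordinates without changing it.
final class Coordinates2DDecorator implements ReadOnlyMolecule2D {
    private final ReadOnlyMolecule inner;
    private final double[] xs;
    private final double[] ys;

    Coordinates2DDecorator(ReadOnlyMolecule inner, double[] xs, double[] ys) {
        if (xs.length != inner.countAtoms() || ys.length != inner.countAtoms()) {
            throw new IllegalArgumentException("need one (x, y) pair per atom");
        }
        this.inner = inner;
        this.xs = xs.clone(); // defensive copies keep the decorator immutable
        this.ys = ys.clone();
    }

    @Override public int countAtoms() { return inner.countAtoms(); }
    @Override public String getSymbol(int atom) { return inner.getSymbol(atom); }
    @Override public List<Integer> getNeighbors(int atom) { return inner.getNeighbors(atom); }
    @Override public double getX(int atom) { return xs[atom]; }
    @Override public double getY(int atom) { return ys[atom]; }
}
```

Keeping the mutators on the builder mirrors the BasicMolecule-plus-Builder arrangement described in the message: a client holding a ReadOnlyMolecule has no setter to call, and the unmodifiable neighbor lists reject attempts to change the graph through the accessors.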
From: rich a. <che...@ya...> - 2004-04-28 02:03:41
Thanks for your comments, Peter. I'm especially interested in your comments on CML. I've been watching CML at a distance for some time, but I didn't realize you had defined interfaces for molecule and atom behavior. Could you point me more precisely to where these interfaces are? I visited the link you sent but wasn't able to find them.

I'm not very familiar with XML, but if I understand correctly, a DOM is used to produce an in-memory representation of the structure of an XML document. Minimally, it provides an exact representation of the content of the XML document. If I'm correct so far, then I imagine that a CML DOM provides an exact representation of the structure of a CML document.

In addition to providing an interface to access the data, what behaviors do the CML interfaces define for model-level objects like Atom and Molecule? To me, an example of pure Atom data would be an atom label property, whereas an example of Atom behavior is the capability to report which bonding systems an Atom belongs to and which Atoms are its neighbors. The choice of behavior is critical: too much functionality and the interface becomes bloated and hard to understand; too little and developers are frustrated at how much work it takes to do simple things. I'm very interested in knowing what the right balance is.

It sounds like the approach you've taken in using interfaces is similar to mine. Like you, I am keenly interested in taking advantage of the rich functionality of CDK and JOELib. As a first pass, I've been working on a two-way adapter class for CDK. Its definition looks something like this:

public class CDKMolecule extends org.openscience.cdk.Molecule
    implements net.sourceforge.octet.molecule.Molecule
{
    // override org.openscience.cdk.Molecule methods where appropriate

    // implement net.sourceforge.octet.molecule.Molecule interface
}

The advantage here is that a CDKMolecule can be used from within either CDK or Octet without the need for a conversion step.
I plan to do the same thing for joelib.molecule.JOEMol.

In particular, it would be helpful to use the file format read/write capabilities of CDK. The problem I'm currently facing is that IO classes such as org.openscience.cdk.io.MDLReader provide their own instance of org.openscience.cdk.Molecule that is created during a call to read(). If this method used an instance of org.openscience.cdk.Molecule passed into the read() method instead, then I could just pass in my CDKMolecule, and the reader would be none the wiser. What would be the consequences of modifying the IO classes to allow for this?

With regard to directly supporting CML, I'm interested in trying my hand at it with Octet. The Octet model for bonding is somewhat different from the other Java cheminformatics packages I've seen in that it directly supports multicenter, multielectron bonding arrangements. So the bonding arrangements of ferrocene, benzyne, borane clusters, and the homotropylium cation are handled exactly the same way as that of hexane. This implementation is based on a paper by Dietz (JCICS 1995, 35, 787). What are your thoughts on CML providing the syntax necessary to represent these "non-traditional" kinds of bonding arrangements?

cheers,
rich

Peter Murray-Rust <pm...@ca...> wrote:
At 08:03 21/04/2004 -0700, rich apodaca wrote:
>I agree that a common method for the representation of molecular objects
>is critical for the development of portable and verifiable cheminformatics
>protocols.
>
First - I welcome new contributors in the OpenSource molecular sciences domain!
>A core principle of object-oriented design is that designs are most
>reusable when you program to interfaces, not implementations.
I agree fully. In practice this is difficult to achieve. The areas where I have found it work best are SUN's Java libraries, SAX (which we developed as an interface) and DOM.
>
>I would propose that any discussion of a QSAR framework should take into
>consideration the need to first define Java interfaces for core objects
>such as Atom and Molecule. The QSAR framework would be useful to the
>greatest number of developers if each developer is free to provide their
>own implementation of the core interfaces that will work without
>modification in the QSAR framework. Defining these interfaces means that
>the irreducible core functionality of Molecule, Atom, etc. with which the
>framework will need to work must be decided on.
I agree. May I suggest XML as the approach to define the functionality. We now have opensource tools (JUMBO4.3, http://wwmm.ch.cam.ac.uk/moin) which automatically generate DOM interfaces and implementations for Java, C++, Python and F90 for any XML schema. We have done this for CML and can automatically do this for any sub- or superset of CML within minutes. We have a pseudocode language for adding non-DOM functionality to DOM objects so that the whole of the code can be represented in XML. An advantage of doing this is that documentation, examples, rendering and behaviour are much easier to maintain and that multiple target languages can be used. The advantage of XML over UML is that it is much more widely used and the tools are free.
>
>The advantage of this approach is true design reuse. Because the QSAR
>framework only knows about Java interfaces, all a developer needs to do to
>use all of the functionality of the framework is to provide an
>implementation of those interfaces. Of course, reference implementations
>should be provided by the framework as well.
Agreed. We do this for CML and for the additional non-DOM functionality.
Thus we have:
CMLMolecule (interface)
MoleculeImpl (implementation - can be provided by anyone)
These have automatically generated methods such as:
void Molecule.setTitle()
CMLAtom AtomArray.getAtomChild(int serial)
There are also factory methods for generation, so that object construction can be provided by different developers. To provide additional functionality we provide wrappers such as:
MoleculeTool (interface)
MoleculeToolImpl (impl)
MoleculeTool MoleculeToolImpl.getMoleculeTool(CMLMolecule)
double MoleculeTool.getMolecularWeight()
This has great reusability - we currently use CDK methods within the Tools (rather than write our own). However, we could easily add or replace JOELib methods without changing the user code. Libraries can be linked at runtime. Indeed, the code could even poll the class libraries to see which can be resolved.
>
>I've taken this approach in a cheminformatics framework called "Octet"
>(http://octet.sourceforge.net) and in a 2-D
>molecular visualization framework called "Structure"
>(http://structure.sourceforge.net). The
>approach in these frameworks differs significantly from both JOELib and
>CDK in that a developer is never required to use my reference
>implementations of Molecule or Atom.
Thanks - I have had a look at the site and agree with the design. Please take the following comments as constructive.
- a. If you are intending to write your own code there will be a huge amount. I did essentially this for CML1.0 and submission to the OMG. It involved over 1000 method interfaces. You will soon find you have a great many to maintain.
- you will need to provide a reference implementation for each method to provide that the system is self-consistent. You may be able to borrow some functionality from CDK or JOELib - that's what I do.
- you will need to convince collaborators of the value of your interface over other available ones. I'm neutral on this, but I would urge that any emerging interfaces support CML.
>
>For example, it is possible to provide performance-optimized
>implementations of these interfaces that would be suitable for large
>numbers of molecules, or the rapid construction of molecules. The framework
>only knows about interfaces, and this is the key to code reuse.
>
>I would be willing to provide any code and/or experiences from these
>projects to the development of a QSAR framework.
>
I suspect this message is therefore on the wrong list and should be sent to qsar-devel.
P.
Note I have not replied to the crossposted lists in the original mail.
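The two-way adapter and the proposed change to the reader (filling an instance passed in by the caller rather than creating one inside read()) can be sketched with self-contained stand-ins. LibraryMolecule plays the role of org.openscience.cdk.Molecule, FrameworkMolecule the role of the Octet Molecule interface, and TargetFillingReader parses a made-up format; all of these names are hypothetical and none of this is the actual CDK or Octet API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a toolkit's concrete molecule class.
class LibraryMolecule {
    protected final List<String> symbols = new ArrayList<>();

    void addAtom(String symbol) { symbols.add(symbol); }
    int getAtomCount() { return symbols.size(); }
}

// Hypothetical stand-in for a framework-level molecule interface.
interface FrameworkMolecule {
    int countAtoms();
    String getSymbol(int index);
}

// Two-way adapter: accepted anywhere a LibraryMolecule or a FrameworkMolecule is expected.
class AdapterMolecule extends LibraryMolecule implements FrameworkMolecule {
    @Override public int countAtoms() { return getAtomCount(); }
    @Override public String getSymbol(int index) { return symbols.get(index); }
}

// Hypothetical reader that populates a caller-supplied instance instead of creating its own.
class TargetFillingReader {
    // Toy format (an assumption for this sketch): whitespace-separated element symbols.
    <T extends LibraryMolecule> T read(String text, T target) {
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) { // split of an empty string yields one empty token
                target.addAtom(token);
            }
        }
        return target;
    }
}
```

Because read returns the instance it was given, typed as T, the caller gets its adapter back without a downcast. The trade-off is that the reader no longer decides which concrete class to instantiate, which is exactly the control the message asks to hand back to the caller.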
From: rich a. <che...@ya...> - 2004-04-27 15:01:18
Hello Joerg,

Thanks for looking over Octet. Your comments are very helpful, and I had a couple of questions:

What are the advantages of an Octet Atom inheriting from Node? The definition of the Atom interface is very short and contains mainly methods for identifying neighboring atoms and bonding systems. Octet doesn't use Bond, but rather BondingSystem, which allows for the connection of any number of Atoms using any number of electrons, so that structures like ferrocene and transition metal complexes can be handled the same way as any purely organic molecule.

I thought about being able to store keys, properties, etc. in Atom and Molecule. However, since the design of Octet is based on the implementation of interfaces, doing so puts a burden on the implementor to provide this functionality. Marking atoms, bonding systems, and molecules could just as easily be done externally to those interfaces, using a vector of visited atoms, for example. In fact, Octet's DepthFirstTraverser class uses this approach. Is there something else I'm missing?

Do you consider AtomPair a "descriptor"? I noticed it is present in JOELib. Octet also has an AtomPair interface. However, in Octet, AtomPair simply represents an association between two atoms (no electrons involved - that happens through BondingSystem). To find all the atoms that are associated in a Molecule, use Molecule.iterateAtomPairs().

Your point about hashCode() is well-taken. Your point about copy()/clone() is also well-taken. However, this can't be enforced through the interface definition, but it can be incorporated into the reference implementations.

Can you give me an example of the readAsString() method and its advantages in handling corrupted file entries, compared to just throwing an exception from the existing MoleculeReader methods? You're right about these methods needing to declare an exception.
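To illustrate keeping traversal marks outside the Atom interface, here is a depth-first traversal whose visited set belongs to the traversal rather than to the atoms. GraphAtom, SimpleAtom, and DepthFirstTraversal are hypothetical names, not Octet's DepthFirstTraverser; the sketch assumes only that an atom can list its neighbors.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical minimal atom: it can report its neighbors but carries no traversal marks.
interface GraphAtom {
    String getLabel();
    List<GraphAtom> getNeighbors();
}

// Trivial implementation used only to wire up examples.
final class SimpleAtom implements GraphAtom {
    private final String label;
    private final List<GraphAtom> neighbors = new ArrayList<>();

    SimpleAtom(String label) { this.label = label; }

    @Override public String getLabel() { return label; }
    @Override public List<GraphAtom> getNeighbors() { return Collections.unmodifiableList(neighbors); }

    static void connect(SimpleAtom a, SimpleAtom b) {
        a.neighbors.add(b);
        b.neighbors.add(a);
    }
}

final class DepthFirstTraversal {
    private DepthFirstTraversal() {}

    // Returns every atom reachable from start, in depth-first preorder.
    static List<GraphAtom> traverse(GraphAtom start) {
        // The visited marks live here, not in the atoms; an identity-based set
        // also avoids depending on how implementors override equals()/hashCode().
        Set<GraphAtom> visited = Collections.newSetFromMap(new IdentityHashMap<GraphAtom, Boolean>());
        List<GraphAtom> order = new ArrayList<>();
        Deque<GraphAtom> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            GraphAtom atom = stack.pop();
            if (!visited.add(atom)) {
                continue; // already reached along another path (e.g. around a ring)
            }
            order.add(atom);
            // Push in reverse so the first-listed neighbor is explored first.
            List<GraphAtom> neighbors = atom.getNeighbors();
            for (int i = neighbors.size() - 1; i >= 0; i--) {
                if (!visited.contains(neighbors.get(i))) {
                    stack.push(neighbors.get(i));
                }
            }
        }
        return order;
    }
}
```

Because each traversal owns its visited set, two traversals can run over the same molecule at the same time without interfering, which is harder to guarantee when marks are stored on the atoms themselves.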
I'm currently working on implementing some of the other features you asked about, such as a descriptor framework and substructure/similarity searching. However, these features are independent of the interface definitions for the key model-level objects (Molecule, BondingSystem, and AtomPair). I've had a look at JOELib's descriptor framework, and it looks like a flexible way to implement descriptor functionality. Can you explain what a "descriptor IO helper class" is and why it is necessary?

cheers,
rich

"Joerg K. Wegner" <we...@in...> wrote:
Hi,
I've had a short look and I'm missing some functionality in Octet:
- I would prefer Node and Edge objects as the Atom and Bond base
- I would prefer general NodeKey, EdgeKey, MoleculeKey, and RingKey objects for labelling the attributed molecular graph
Both things are required for general graph algorithms. For the keys a factory pattern could/should be used, especially for assigning default labels. This avoids calculating e.g. a ring search twice by using:
if (!mol.hasKey(myRingSearchKey)) mol.calculateRingSearch();
- The AtomPair is ambiguous: there exists a descriptor with an additional distance parameter; here you are always using one. Hashing is important here.
- Force Copy/Clone/Hash methods.
- The reader should provide readAsString and readToMoleculeObject, so we can catch corrupted file entries. Don't ask me why there are so many corrupted entries, but they exist.
- Add MoleculeIOException to read/write to catch these corrupted entries; this will enable us to write skip files.
- A general SubstructureSearch object would be fine, also a UniqueSubstructureSearch object or a transformer object.
- General descriptor objects are missing completely. They can be handled by the hashed MoleculeKey objects, although we may need to distinguish between keys which can handle only one object (hashed) and keys which can handle multiple objects, so we need a GeneralPropertyHandler which accepts single and multiple entries by key.
- For descriptors, IO helper classes are required, which have read(IOType) and write(IOType).
Kind regards,
Joerg
--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. (E. Hemingway)
Never mistake action for meaningful action. (Hugo Kubinyi, 2004)

_______________________________________________
Cdk-devel mailing list
Cdk...@li...
https://lists.sourceforge.net/lists/listinfo/cdk-devel
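Joerg's suggestion of separating "read the raw record" (readAsString) from "parse it into a molecule", with a checked MoleculeIOException for parse failures, is what makes a skip file possible: a corrupted record can be saved verbatim while reading continues. The sketch below is one hypothetical way to do this, not a JOELib or Octet API. It assumes records end with a line containing only $$$$ (the SD-file record separator) and uses a toy record body of whitespace-separated element symbols; RecordReader, parse, and readAll are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical checked exception for a record that was read but could not be parsed.
class MoleculeIOException extends Exception {
    MoleculeIOException(String message) { super(message); }
}

final class RecordReader {
    private final BufferedReader in;

    RecordReader(Reader source) { this.in = new BufferedReader(source); }

    // Returns the next raw record without its "$$$$" terminator, or null at end of input.
    String readAsString() throws IOException {
        StringBuilder record = new StringBuilder();
        boolean sawLine = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().equals("$$$$")) {
                return record.toString();
            }
            record.append(line).append('\n');
            sawLine = true;
        }
        return sawLine ? record.toString() : null; // trailing record without terminator
    }

    // Toy parser (an assumption): a record is whitespace-separated element symbols.
    static List<String> parse(String record) throws MoleculeIOException {
        List<String> symbols = new ArrayList<>();
        for (String token : record.trim().split("\\s+")) {
            if (!token.matches("[A-Z][a-z]?")) {
                throw new MoleculeIOException("bad element symbol: '" + token + "'");
            }
            symbols.add(token);
        }
        return symbols;
    }

    // Reads all records; unparseable ones are copied verbatim into skipped.
    static List<List<String>> readAll(Reader source, List<String> skipped) throws IOException {
        RecordReader reader = new RecordReader(source);
        List<List<String>> molecules = new ArrayList<>();
        String record;
        while ((record = reader.readAsString()) != null) {
            try {
                molecules.add(parse(record));
            } catch (MoleculeIOException e) {
                skipped.add(record); // a real tool would append this to a skip file
            }
        }
        return molecules;
    }
}
```

Compared with a reader that only throws, the split keeps the raw text of the failed record in hand, so it can be written out unchanged for later inspection instead of being lost with the exception.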