Re: [cml/ccml-discuss] [Blueobelisk-discuss] Conformers

Is this a place to use the ref attribute? "...This is similar to a 
pointer and it can be thought of a strongly typed hyperlink. It may 
also be used for "subclassing" or "overriding" elements..."

<!--main molecule-->
<molecule id="m1">
   <atomArray id="aa1">
     <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
     <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
...

<molecule ref="m1" id="conformer1" convention="bo:conformer">
     <atom ref="a1" x3="0.9" y3="0.7" z3="0.5"/>
     <atom ref="a2" x3="1.6" y3="0.4" z3="0.2"/>
...

Only the atoms that were different would need to be specified; there 
could be sub-sub classing; putting convention on the molecule allows 
multiple molecules in the file. It would be backward compatible if 
parsers were to ignore an element with the ref attribute unless they 
knew how to handle it. Such a parser would see only the main 
molecule(s). Other atom and bond properties could maybe be treated in 
the same way: stereo and other isomers, tautomers, etc.
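
To make the idea concrete, here is a minimal Python sketch of how a ref-aware reader might resolve such a conformer. It is only an illustration under my own assumptions: the function name resolve_conformer is hypothetical, the override rule (a conformer atom replaces only the attributes it restates) is my reading of "subclassing", the snippet ignores namespaces, and none of this is how JUMBO or any existing CML parser behaves.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the example above; only atom a1 is overridden.
CML = """
<cml>
  <molecule id="m1">
    <atomArray id="aa1">
      <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
      <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
    </atomArray>
  </molecule>
  <molecule ref="m1" id="conformer1" convention="bo:conformer">
    <atom ref="a1" x3="0.9" y3="0.7" z3="0.5"/>
  </molecule>
</cml>
"""


def resolve_conformer(root, conformer_id):
    """Return {atom id: attributes} for a conformer with its overrides applied.

    Hypothetical helper: the override semantics are an assumption.
    """
    molecules = list(root.iter("molecule"))
    conformer = next(m for m in molecules if m.get("id") == conformer_id)
    parent = next(m for m in molecules if m.get("id") == conformer.get("ref"))

    # Start from a copy of every atom in the referenced (main) molecule.
    atoms = {a.get("id"): dict(a.attrib) for a in parent.iter("atom")}

    # Overlay only the attributes that the conformer restates.
    for override in conformer.iter("atom"):
        target = atoms[override.get("ref")]
        target.update({k: v for k, v in override.attrib.items() if k != "ref"})
    return atoms


root = ET.fromstring(CML)
atoms = resolve_conformer(root, "conformer1")

# A reader that does not understand ref simply skips those molecules.
main_only = [m.get("id") for m in root.iter("molecule") if m.get("ref") is None]
```

Here a1 picks up the new coordinates but keeps its elementType from m1, a2 is inherited unchanged, and a ref-unaware reader sees only m1.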

Chris

Peter Murray-Rust wrote:
> Thanks everyone for the discussion and contributions.
> Sam has given a good example of how energies can be held for each set of 
> conformers (which can also include molecular dynamics and optimisation 
> by compChem), and we use this approach in analysing QM output.
> 
> It's certainly possible to hold the coordinates in a <matrix> but it 
> loses the transparency of XML. It makes it difficult to search the file 
> for semantics - there is nothing to indicate what the information means 
> (the matrix could hold vibrational frequencies, kpoints, whatever).
> 
> It's certainly possible to hold the coordinates in atoms without the 
> bonds or element types. For example
> 
> <cml convention="bo:...">
> <molecule id="m1">
>   <atomArray id="aa1">
>     <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
>     <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
>   </atomArray>
>   <bondArray>
>     <bond id="b1" atomRefs2="a1 a2"/>
>   </bondArray>
> </molecule>
> 
> <molecule>
>   <propertyList>... conformer stuff ... </propertyList>
>   <atomArray id="aa2">
>     <atom id="a1" x3="0.0" y3="0.0" z3="0.0"/>
>     <atom id="a2" x3="1.6" y3="0.0" z3="0.0"/>
>   </atomArray>
> </molecule>
> 
> <molecule>
>   <propertyList>... conformer stuff ... </propertyList>
>   <atomArray id="aa3">
>     <atom id="a1" x3="0.0" y3="0.0" z3="0.0"/>
>     <atom id="a2" x3="1.7" y3="0.0" z3="0.0"/>
>   </atomArray>
> </molecule>
> 
> </cml>
> 
> However if you are still worried about size it's possible to use the 
> array form of the atomArray:
> 
> <atomArray
>   x3Array="1 2 3 4 5 6"
>   y3Array="9 8 7 6 5 4"
>   z3Array="1 9 1 9 1 9"/>
> 
> This will hold exactly the x3 y3 z3 coordinates with complete semantics 
> and almost no verbosity. It requires a bit more programming and is less 
> semantic.
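
As a rough illustration of the "bit more programming" this costs, here is a small Python sketch that unpacks the array form back into per-atom coordinates. The function name read_coordinate_arrays is my own, and because the example above carries no atom ids, the sketch assumes the i-th value in each array belongs to the i-th atom of the main molecule.

```python
import xml.etree.ElementTree as ET


def read_coordinate_arrays(atom_array):
    """Turn x3Array/y3Array/z3Array attributes into a list of (x, y, z) tuples.

    Hypothetical helper; atoms are matched by position only.
    """
    columns = [
        atom_array.get(name, "").split()
        for name in ("x3Array", "y3Array", "z3Array")
    ]
    # All three arrays must describe the same number of atoms.
    if len({len(column) for column in columns}) != 1:
        raise ValueError("x3Array, y3Array and z3Array differ in length")
    return [tuple(float(value) for value in row) for row in zip(*columns)]


element = ET.fromstring(
    '<atomArray x3Array="1 2 3 4 5 6" '
    'y3Array="9 8 7 6 5 4" z3Array="1 9 1 9 1 9"/>'
)
coords = read_coordinate_arrays(element)
```

The attribute names still say what each number means, which is the difference from a bare matrix, but the link to a particular atom now depends on position rather than on an id.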
> 
> 
> 
> There is always a balance between terseness and explicitness. It's 
> tempting to remove all markup and use the known sequence of numbers to 
> define the object. This gives something like:
> 
> <cml>
> <molecule id="m1">
>   <atomArray id="aa1">
>     <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
>     <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
>   </atomArray>
>   <bondArray>
>     <bond id="b1" atomRefs2="a1 a2"/>
>   </bondArray>
> </molecule>
> 
>   <propertyList>... conformer stuff ... </propertyList>
>   <matrix>0.0 0.0 0.0 1.5 0.0 0.0</matrix>
> 
>   <propertyList>... conformer stuff ... </propertyList>
>   <matrix>0.0 0.0 0.0 1.6 0.0 0.0</matrix>
> 
>   <propertyList>... conformer stuff ... </propertyList>
>   <matrix>0.0 0.0 0.0 1.7 0.0 0.0</matrix>
> </cml>
> 
> The difficulty of this is that there is no explicit markup and anyone 
> who doesn't know the convention would have to guess at the order of 
> rows, etc. It's harder to use semantic search engines to find 
> information. The primary points of CML are:
> * to make it explicit to human readers what the information is.
> * to balance flexibility against robustness
> * to make it easier to write software of high quality.
> Remember that in CML the order of atoms is not defined (their identity 
> comes from their ids). Many of the current problems of quality in 
> chemoinformatics come from guessed semantics and over-fluid semantics. 
> CML offers an increase in robustness and ease of programming at a 
> relatively small cost in filesize (especially when compressed).
> 
> 
> I had a typical example yesterday. We were creating MOL files from CML 
> and got the second numeric field in the atom record wrong. It holds the 
> charges and spin in a transformed and completely opaque way. It wasted 
> time and we narrowly avoided filling the repository with junk. With CML 
> that would have been impossible but we had to use a MOL file. I'd argue 
> that writing:
> formalCharge="1"
> instead of a left-justified 3 (or is it 5?) is a justification for using 
> a few more characters.
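
For anyone who has not met that field, the sketch below shows the kind of lookup a CML-to-MOL writer needs. The code table is the V2000 atom-block charge convention as I remember it from the CTfile documentation (0 = uncharged, 1 = +3, 2 = +2, 3 = +1, 4 = doublet radical, 5 = -1, 6 = -2, 7 = -3), so please check it against the spec before relying on it. The function names are mine and purely illustrative.

```python
# V2000 atom-block charge codes, as recalled from the CTfile documentation:
# 0 = uncharged, 1 = +3, 2 = +2, 3 = +1, 4 = doublet radical,
# 5 = -1, 6 = -2, 7 = -3. Code 4 has no plain formal-charge equivalent.
CHARGE_TO_MOL_CODE = {0: 0, 3: 1, 2: 2, 1: 3, -1: 5, -2: 6, -3: 7}


def mol_charge_code(formal_charge):
    """Translate a formal charge into the opaque MOL atom-block code."""
    try:
        return CHARGE_TO_MOL_CODE[formal_charge]
    except KeyError:
        raise ValueError(
            f"charge {formal_charge} has no atom-block code"
        ) from None


def cml_charge_attribute(formal_charge):
    """The CML spelling needs no lookup table at all."""
    return f'formalCharge="{formal_charge}"'


mol_code = mol_charge_code(1)           # the "3" in the MOL atom record
cml_text = cml_charge_attribute(1)      # what CML writes instead
```

So a charge of +1 is written as 3 and a charge of -1 as 5, which is exactly the kind of thing that is easy to get wrong without the table in front of you.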
> 
> I think the atomArray may be the best solution. I haven't used it 
> recently but it's completely supported in JUMBO.
> 
> P.
> 
> 
> -- 
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069