From: Nina J. <jel...@gm...> - 2011-02-19 07:19:37
On 19 February 2011 06:51, Andrew Dalke <da...@da...> wrote:

> On Feb 18, 2011, at 7:23 PM, Nina Jeliazkova wrote:
>>>> What do you think? Is this the expected behaviour?
>
>> Yes, because you are aware of the internal representation in CDK.
>
>> However, this could easily be part of a larger structural alert in a
>> predictive model and the user would normally expect predictions to be
>> independent of the way the input structure is submitted. Not?
>
> The SMARTS [#6]=[#6] should not match the biphenyl system if it is in
> aromatic form. The "=" specifically matches double bonds which are not
> aromatic. (If you need "double or aromatic" then you can do
> [#6]=,:[#6].)
>
>> The question is, who is responsible for proper chemical perception?
>
> In a system like you describe, it is the responsibility of the author
> of the structural alert system to process all input structures so that
> they can be used correctly. This may include format conversion,
> aromaticity perception, salt removal, and tautomer identification.
>
> It appears that you are correctly doing this normalization step.
>
> I don't know the CDK that well, so I don't know if there's something
> in the implementation which is causing this to be misunderstood. I see
> that CDKHueckelAromaticityDetector.detectAromaticity ends up adding
> flags to those atoms and bonds which are aromatic and that's all it
> does.

Yes, it adds flags only, and the underlying representation (single-only
or alternating single-double bonds) stays unchanged.

> Could it be that the CDK SMARTS matcher for "=" is only looking at the
> bond order, when it should also be checking the aromaticity flag?

Yes, it's the matchers that should be fixed (IMHO). There are two
OrderQueryBond classes, with slightly different implementations, but
neither is quite correct in this case.

org.openscience.cdk.isomorphism.matchers.OrderQueryBond
http://pele.farmbio.uu.se/nightly/cdk-javadoc-1.5.0.git/org/openscience/cdk/isomorphism/matchers/OrderQueryBond.html

    public boolean matches(IBond bond) {
        if (this.getOrder() == bond.getOrder()) {
            // that's it: looking at bond orders only
            // bond orders match
            return true;
        } else if (this.getFlag(CDKConstants.ISAROMATIC) && bond.getFlag(CDKConstants.ISAROMATIC)) {
            // or both are aromatic
            // <---- nothing here, so the check is essentially ignored and
            //       it will return false as per the last statement
        } // else
        return false;
    };

org.openscience.cdk.isomorphism.matchers.smarts.OrderQueryBond
http://pele.farmbio.uu.se/nightly/cdk-javadoc-1.5.0.git/org/openscience/cdk/isomorphism/matchers/smarts/OrderQueryBond.html

    public boolean matches(IBond bond) {
        if (getOrder() == IBond.Order.SINGLE) {
            // this will NOT match single-non-aromatic bond to single-aromatic bond
            return !bond.getFlag(CDKConstants.ISAROMATIC) && getOrder() == bond.getOrder();
        } else
            // this WILL match double-non-aromatic bond to double-aromatic bond
            return getOrder() == bond.getOrder();
    }

What I would expect OrderQueryBond to do is:

    public boolean matches(IBond bond) {
        boolean queryAromatic = this.getFlag(CDKConstants.ISAROMATIC);
        boolean targetAromatic = bond.getFlag(CDKConstants.ISAROMATIC);
        if (queryAromatic != targetAromatic) {
            // aromatic flags are different
            return false;
        } else if (queryAromatic) {
            // aromatic flags are the same and true; this essentially
            // considers the aromatic flag as another bond type
            return true;
        } else {
            return getOrder() == bond.getOrder();
        }
    }

Best regards,
Nina

> Cheers,
>
> Andrew
> da...@da...
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Cdk-devel mailing list
> Cdk...@li...
> https://lists.sourceforge.net/lists/listinfo/cdk-devel
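
To make the difference concrete, here is a minimal standalone sketch of the rule I described above, next to an order-only comparison. It does not use the CDK at all: BondMatchSketch, Bond, Order, matchesOrderOnly and matchesFlagAware are hypothetical stand-ins invented for this illustration, and the boolean field plays the role of the CDKConstants.ISAROMATIC flag. It is only meant to show the intended behaviour, not how the fix should look in the actual matcher classes.

```java
class BondMatchSketch {

    /** Stand-in for IBond.Order; only the values needed here (assumption). */
    enum Order { SINGLE, DOUBLE }

    /** Hypothetical minimal bond: a bond order plus an aromaticity flag. */
    static final class Bond {
        final Order order;
        final boolean aromatic;

        Bond(Order order, boolean aromatic) {
            this.order = order;
            this.aromatic = aromatic;
        }
    }

    /** Order-only comparison: the aromaticity flag is never consulted. */
    static boolean matchesOrderOnly(Bond query, Bond target) {
        return query.order == target.order;
    }

    /** Proposed rule: the aromatic flag is treated as another bond type. */
    static boolean matchesFlagAware(Bond query, Bond target) {
        if (query.aromatic != target.aromatic) {
            return false;               // flags differ: never a match
        }
        if (query.aromatic) {
            return true;                // both aromatic: order is irrelevant
        }
        return query.order == target.order; // both non-aromatic: compare orders
    }

    public static void main(String[] args) {
        // Query bond for SMARTS "=": double and not aromatic.
        Bond queryDouble = new Bond(Order.DOUBLE, false);
        // Ring bond stored as double (Kekule form) but flagged aromatic.
        Bond flaggedRingBond = new Bond(Order.DOUBLE, true);
        // Ordinary non-aromatic double bond.
        Bond plainDouble = new Bond(Order.DOUBLE, false);

        System.out.println("order-only, flagged ring bond: "
                + matchesOrderOnly(queryDouble, flaggedRingBond));
        System.out.println("flag-aware, flagged ring bond: "
                + matchesFlagAware(queryDouble, flaggedRingBond));
        System.out.println("flag-aware, plain double bond: "
                + matchesFlagAware(queryDouble, plainDouble));
    }
}
```

With the order-only comparison the "=" query matches the flagged ring bond, while the flag-aware version rejects it and still matches the plain double bond, so the result no longer depends on whether the input was submitted in Kekule or aromatic form.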