On 25 February 2012 17:36, Rajarshi Guha <rajarshi.guha@gmail.com> wrote:

On Sat, Feb 25, 2012 at 4:46 AM, Nina Jeliazkova <jeliazkova.nina@gmail.com> wrote:
"Updated code to handle the * SMARTS pattern so that it ignores H's
unless they have an isotopic mass specification. This means * no
longer matches [H][H] but will match [1H][1H]. Added unit tests to
take care of this and also updated Javadocs for the query tool to note
this behavior. Note that even matching isotopic hydrogens is slightly
ambiguous since it implicitly assumes that one is dealing with an
explicit H (i.e., an IAtom object with symbol H). Ideally, * should
not match any hydrogen at all, and this may become the behavior in
a future version"

This makes sense if the expected behaviour of the SMARTS star pattern
is to ignore hydrogens. Is this the case?

If I remember correctly, this was based on a discussion with Andrew Dalke about how explicit H's should be matched in SMARTS. So, yes, I updated it so that * does ignore H's.
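To make the rule concrete, here is a minimal sketch of the matching decision described in the commit message. It is not the CDK implementation: the `Atom` class is a hypothetical stand-in for an explicit IAtom, and treating a null mass number as "no isotope specified" is an assumption made only for illustration.

```java
// Hypothetical sketch of the '*' rule; none of these names come from CDK.
final class WildcardSketch {

    // Stand-in for an explicit atom (assumption: not the CDK IAtom interface).
    static final class Atom {
        final String symbol;
        final Integer massNumber; // null is assumed to mean "no isotope specified"

        Atom(String symbol, Integer massNumber) {
            this.symbol = symbol;
            this.massNumber = massNumber;
        }
    }

    // '*' skips hydrogens unless they carry an isotopic mass specification;
    // every other element matches.
    static boolean matchesStar(Atom atom) {
        if ("H".equals(atom.symbol)) {
            return atom.massNumber != null;
        }
        return true;
    }

    public static void main(String[] args) {
        Atom[] atoms = {
            new Atom("H", null), // as in [H][H]
            new Atom("H", 1),    // as in [1H][1H]
            new Atom("C", null)
        };
        for (Atom a : atoms) {
            System.out.println(a.symbol + " mass=" + a.massNumber
                    + " matches *: " + matchesStar(a));
        }
    }
}
```

Under these assumptions a plain H is rejected, while an H with an explicit mass number and any non-hydrogen atom are accepted.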

Just for the record, other wildcard matchers (e.g. those matching "A") should perhaps be modified in the same way, as "A" currently matches explicit hydrogens.

Best regards,
There is still an inconsistency in how massNumbers are set:
SmilesParser will leave null values, while reading the same structure
from an SD file will result in mass numbers correctly set.  It would be
good to have a consistent behaviour; I am not sure when the mass
numbers should be set - probably on atom type configuration?

This inconsistency should be resolved. I would assume that the SmilesParser should use the most common mass number (unless a particular isotopic form is specified), using the same code as the MDL reader.
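As a rough illustration of that proposal, the fallback could look like the sketch below. It is only a sketch under stated assumptions: `resolveMassNumber` and the small `MOST_COMMON` table are hypothetical and are not taken from SmilesParser or the MDL reader, which would presumably consult CDK's own isotope data instead.

```java
import java.util.Map;

// Hypothetical sketch of defaulting a missing mass number; not CDK code.
final class MassNumberSketch {

    // Mass numbers of the most abundant isotope for a few elements
    // (illustrative table only).
    static final Map<String, Integer> MOST_COMMON =
            Map.of("H", 1, "C", 12, "N", 14, "O", 16);

    // Keep an explicitly specified mass number; otherwise fall back to the
    // most common one, or null if the element is not in the table.
    static Integer resolveMassNumber(String symbol, Integer parsed) {
        if (parsed != null) {
            return parsed;
        }
        return MOST_COMMON.get(symbol);
    }

    public static void main(String[] args) {
        System.out.println(resolveMassNumber("C", null)); // no isotope given
        System.out.println(resolveMassNumber("C", 13));   // explicit isotope
    }
}
```

Note that if mass numbers are always filled in this way, a check along the lines of "mass number is not null" can no longer tell an isotopic hydrogen from a plain one, so the * matching would need some other signal.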

Rajarshi Guha
NIH Chemical Genomics Center

Cdk-devel mailing list