From Andrew Dalke on list:
One of the RDKit MACCS key definitions is
[!#6;!#1]~[!#6;!#1;!H0]
I'm working on my test suite for those definitions, as mentioned in my previous email.
Here's a test case:
import pybel  # Open Babel 2.x; with Open Babel 3.x use "from openbabel import pybel"
mol = pybel.readstring("smi", "[U]S(C)C")
matcher = pybel.Smarts("[!#6;H0]")
matcher.findall(mol)
[(1,), (2,)]
matcher = pybel.Smarts("[!#6;!#1]~[!#6;!#1;!H0]")
matcher.findall(mol)
[]
RDKit, OEChem, and Daylight all say that this pattern matches that structure, because all three programs say that the "S" has an implicit hydrogen on it.
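Daylight says that sulfur has valence levels of "S (2,4,6)":
http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
The S here has three single bonds, so the lowest level that fits is 4, which leaves one implicit hydrogen. Open Babel reports none, so this looks to be a bug in the code which calculates the implicit hydrogen count.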
Here's another case where the implicit h-count is wrong, this time with P.
Daylight says the valence levels for P in SMILES are (3,5).
Given N=PPCC:
The second atom (the first P) has a double bond and a single bond,
so its valence is filled. It should have no implicit hydrogens.
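The valence arithmetic behind both cases can be sketched in a few lines of plain Python. This is only an illustration of the Daylight rule (use the lowest listed valence that accommodates the explicit bonds), not any toolkit's actual code. The names DAYLIGHT_VALENCES and implicit_h_count are hypothetical, and the sketch assumes an unbracketed, non-aromatic, uncharged atom.

```python
# Daylight's normal valence levels for the SMILES organic-subset atoms.
# (Hypothetical helper table for illustration; not taken from any toolkit.)
DAYLIGHT_VALENCES = {
    "B": (3,),
    "C": (4,),
    "N": (3, 5),
    "O": (2,),
    "P": (3, 5),
    "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}

def implicit_h_count(symbol, bond_orders):
    """Implicit hydrogens on an unbracketed, non-aromatic, uncharged atom.

    Picks the lowest listed valence that is >= the sum of the explicit
    bond orders and fills the remainder with hydrogens.
    """
    used = sum(bond_orders)
    for valence in DAYLIGHT_VALENCES[symbol]:
        if valence >= used:
            return valence - used
    return 0  # explicit bonds already exceed every listed valence

# S in [U]S(C)C: three single bonds, so the next valence level is 4.
s_h = implicit_h_count("S", [1, 1, 1])
# First P in N=PPCC: one double and one single bond fill valence 3.
p1_h = implicit_h_count("P", [2, 1])
# Second P in N=PPCC: two single bonds leave one slot open.
p2_h = implicit_h_count("P", [1, 1])
print(s_h, p1_h, p2_h)  # 1 0 1
```

Under this rule the S gets one implicit hydrogen and the first P gets none, which is what RDKit, OEChem, and Daylight report.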
However, here's the first RDKit MACCS pattern which passed, unexpectedly, in OpenBabel:
mol = pybel.readstring("smi", "N=PPCC")
matcher = pybel.Smarts("[!#6;!#1;!H0]~[!#6;!#1;!H0]")
matcher.findall(mol)
[(1, 2), (2, 3)]
Hmatcher = pybel.Smarts("[!H0]")
Hmatcher.findall(mol)
[(1,), (2,), (3,), (4,), (5,)]
You can see it's because the matcher thinks every atom, including the first P, has at least one hydrogen.
Compare this to RDKit, which correctly gives the first P no implicit hydrogens:
from rdkit import Chem
mol = Chem.MolFromSmiles("N=PPCC")
pat = Chem.MolFromSmarts("[!#6;!#1;!H0]~[!#6;!#1;!H0]")
mol.GetSubstructMatches(pat)
()
Hpat = Chem.MolFromSmarts("[!H0]")
mol.GetSubstructMatches(Hpat)
((0,), (2,), (3,), (4,))
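To compare the two [!H0] results directly, keep in mind that pybel reports atom indices starting at 1 while RDKit starts at 0. A minimal sketch, using only the match tuples copied from the sessions above (the helper name to_zero_based is made up for this example):

```python
# [!H0] matches on N=PPCC, copied from the two sessions above.
openbabel_matches = [(1,), (2,), (3,), (4,), (5,)]  # pybel: 1-based indices
rdkit_matches = ((0,), (2,), (3,), (4,))            # RDKit: 0-based indices

def to_zero_based(matches):
    """Shift 1-based match tuples to 0-based so the two results line up."""
    return {tuple(i - 1 for i in match) for match in matches}

# Atoms that only Open Babel considers to carry a hydrogen.
only_in_openbabel = sorted(to_zero_based(openbabel_matches) - set(rdkit_matches))
print(only_in_openbabel)  # [(1,)]
```

The only disagreement is atom 1 (0-based), which is the first P.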