From: Simon S. <Sim...@cs...> - 2015-08-26 00:32:34
Thanks for doing this, Greg. Fixing those SMARTS queries always looked like it would be a real...pain.

I've dropped your GitHub file into the KNIME workflow, and the RDKit version of the workflow (using RDKit nodes 2.5.0.201505221301) now hits 770 structures in the WEHI-10k test set. But that includes 19 false positives that weren't being caught by the SLN filters.

One filter alone is responsible for 17 of those false positives:

anil_di_alk_C(246)
old: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7](-[#6;X4])-[$([#1]),$([#6;X4])]
new: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7;!H0,$([#7]-[#6;X4])]-[#6;X4]

An example of one of the false positive structures is the aniline sulfonamide WEHI-18518.

I've checked with Johnathan, and the intention of that query is "... that the nitrogen has a single bond to a carbon that has four atoms bonded to it (i.e. sp3), and that the other atom singly bonded to the nitrogen atom is anything so long as it is either H or an sp3 carbon".

So the filter should not match sulfonamides, nor the acetamides (sp2 C) that are also showing up as hits.

--
Cheers,
Simon