[Rdkit-discuss] Query on a failed molecule from SureChEMBL
From: Lewis M. <lew...@gm...> - 2021-12-14 23:20:09
Hi All,

Reading molecules from a bulk download of SureChEMBL, I come across a fair few molecules that fail to parse. Not sure whether they SHOULD parse or not.

Here is an example:
https://www.surechembl.org/chemical/SCHEMBL386
with SMILES code:
COC(=O)C1=C(C=CC=C1)C1=CC=C(C[N+]#[N]=[N-])C=C1

Even reading the SMILES code one can see that there are too many bonds in there - a nitrogen triply bonded and doubly bonded to other atoms.

Another example:
https://www.surechembl.org/chemical/SCHEMBL33957
SMILES:
NC(N)=[NH]C1=NC(CSCC[NH]=CNS(=O)(=O)C2=CC=C(Br)C=C2)=CS1

Again, valence for a nitrogen is off.

Should I expect to parse these with RDKit? Might there be some way around this? It's a significant fraction of the molecules in SureChEMBL.

Thanks team!
Lewis
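
P.S. For anyone wanting to reproduce this, below is a minimal sketch of how the offending atoms could be located. It assumes a reasonably recent RDKit in which Chem.DetectChemistryProblems is available. The helper name find_valence_problems is made up for illustration. The idea is to parse with sanitize=False, so that the SMILES is read without chemistry checks, and then ask RDKit which atoms it objects to. This only diagnoses the problem; it does not repair the structures.

```python
from rdkit import Chem


def find_valence_problems(smiles):
    """Return (atom_index, atom_symbol, message) for each atom with a bad valence.

    Hypothetical helper for illustration only; not part of RDKit.
    """
    # Skip sanitization so that valence errors do not make parsing return None.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        # The string is not even syntactically valid SMILES.
        raise ValueError("could not parse SMILES syntax: " + smiles)

    problems = []
    for problem in Chem.DetectChemistryProblems(mol):
        # Other problem types (e.g. kekulization failures) are ignored here.
        if problem.GetType() == "AtomValenceException":
            idx = problem.GetAtomIdx()
            symbol = mol.GetAtomWithIdx(idx).GetSymbol()
            problems.append((idx, symbol, problem.Message()))
    return problems


if __name__ == "__main__":
    examples = {
        "SCHEMBL386": "COC(=O)C1=C(C=CC=C1)C1=CC=C(C[N+]#[N]=[N-])C=C1",
        "SCHEMBL33957": "NC(N)=[NH]C1=NC(CSCC[NH]=CNS(=O)(=O)C2=CC=C(Br)C=C2)=CS1",
    }
    for name, smi in examples.items():
        for idx, symbol, message in find_valence_problems(smi):
            print(name, idx, symbol, message)
```

Running the same function over the whole bulk download and counting the results would give a rough idea of how many failures are valence problems on nitrogen as opposed to something else.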