[Rdkit-discuss] canonical fragment SMILES

Hello,

   I have run into an issue with fragment SMILES. Perhaps someone can 
suggest a workaround.
   I have attached the notebook, but will also reproduce some of the code 
here.

   We have a structure with two N atoms. For each N, we take the N atom 
and its adjacent atoms and generate a fragment SMILES. The two results 
differ, although both SMILES represent the same pattern (only the order 
of the atoms is different). I guess this happens because the 
canonicalization algorithm takes into account additional information 
that is missing from the output SMILES (e.g. ring membership). For 
instance, if we break the saturated ring (bond 8-9), we get identical 
SMILES output.

from rdkit import Chem

mol = Chem.MolFromSmiles('CCn1c2cccc3CCn(c23)c2ccccc12')

print(Chem.MolFragmentToSmiles(mol, [1,2,3,17], canonical=True))
print(Chem.MolFragmentToSmiles(mol, [9,10,11,12], canonical=True))

cN(C)c
cN(c)C

   So, the question is how to work around this issue. We already have 
millions of such patterns, so it would be enough if we could 
canonicalize them. However, standard canonicalization does not work, 
because we have to disable sanitization when parsing the SMILES: the 
output is identical to the input SMILES. Any ideas are appreciated.

print(Chem.MolToSmiles(Chem.MolFromSmiles('cN(C)c', sanitize=False)))
print(Chem.MolToSmiles(Chem.MolFromSmiles('cN(c)C', sanitize=False)))

cN(C)c
cN(c)C
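
   One possible stopgap, sketched here only for patterns that consist of a 
single central atom plus its directly bonded neighbors (as in the example 
above), is to build an order-independent key by sorting the neighbor 
tokens instead of relying on SMILES canonicalization. The function name 
star_fragment_key and the "|"-separated key format are hypothetical, the 
key is not a valid SMILES, and the sketch ignores charges, isotopes and 
stereochemistry.

```python
def star_fragment_key(center, neighbors):
    """Return an order-independent key for a central atom plus its neighbors.

    center    -- symbol of the central atom as written in SMILES, e.g. "N"
    neighbors -- iterable of neighbor tokens (optional bond symbol followed
                 by the atom symbol), e.g. ["c", "C", "c"] or ["=O", "C"]

    Hypothetical helper: the key format is an assumption made for this
    sketch and is not a SMILES string.
    """
    # Sorting removes any dependence on the order in which neighbors were listed.
    ordered = sorted(neighbors)
    return "|".join([center] + ordered)


key_a = star_fragment_key("N", ["c", "C", "c"])  # neighbor order of cN(C)c
key_b = star_fragment_key("N", ["c", "c", "C"])  # neighbor order of cN(c)C
print(key_a)           # N|C|c|c
print(key_b)           # N|C|c|c
print(key_a == key_b)  # True
```

   Such keys could be used to group equivalent patterns, but they would not 
help for fragments that are larger than one atom and its neighbors.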

   This issue actually came up in our code for identifying functional 
groups.

Kind regards,
Pavel
