Re: [Rdkit-discuss] Clearing isotope info
From: Andrew D. <da...@da...> - 2019-12-12 19:42:05
On Dec 12, 2019, at 17:39, Rafal Roszak <rmr...@gm...> wrote:
> I also had a situation where I needed to generate SMILES with either
> isotopes or stereochemistry but not both. Maybe it is worth adding two
> options to the Chem.MolToSmiles function:
>
> dontIncludeStereochemistry=True/False
> dontIncludeIsotopes=True/False
>
> Right now it is not straightforward to generate SMILES w/o isotopes
> (but with stereochemistry) - one needs to remove the isotopes, export to
> SMILES and restore the isotopes.
Bear in mind a few complications. I believe the following correctly implements what you describe:
from rdkit import Chem

def MolWithoutIsotopesToSmiles(mol):
    atom_data = [(atom, atom.GetIsotope()) for atom in mol.GetAtoms()]
    for atom, isotope in atom_data:
        if isotope:
            atom.SetIsotope(0)
    smiles = Chem.MolToSmiles(mol)
    for atom, isotope in atom_data:
        if isotope:
            atom.SetIsotope(isotope)
    return smiles

>>> mol = Chem.MolFromSmiles("[19F][13C@H]([16OH])[35Cl]")
>>> MolWithoutIsotopesToSmiles(mol)
'O[C@@H](F)Cl'
Testing reveals two problems with my implementation:
1) isotopic hydrogens
Consider the same structure with tritium instead of fluorine:
>>> mol = Chem.MolFromSmiles("[3H][13C@H]([16OH])[35Cl]")
>>> MolWithoutIsotopesToSmiles(mol)
'[H][C@H](O)Cl'
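That output should be 'OCCl': once the tritium label is cleared, the atom is an ordinary hydrogen that belongs in the carbon's implicit hydrogen count, and a carbon with two hydrogens is no longer a stereocenter.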
2) stereochemistry assignment needs to be recalculated after the isotopes have been removed, because a center that was distinguished only by an isotope label is no longer a stereocenter:
>>> mol = Chem.MolFromSmiles("C[C@H]([13CH3])CI")
>>> MolWithoutIsotopesToSmiles(mol)
'C[C@@H](C)CI'
>>> Chem.CanonSmiles("C[C@H]([CH3])CI")
'CC(C)CI'
2b) This includes directional bonds, which mark double-bond stereochemistry:
>>> mol = Chem.MolFromSmiles("C/C(=C/CO)/[11CH3]")
>>> MolWithoutIsotopesToSmiles(mol)
'C/C(C)=C/CO'
>>> Chem.CanonSmiles("C/C(=C/CO)/[CH3]")
'CC(C)=CCO'
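One possible workaround for both problems, as a minimal sketch rather than a vetted fix: clear the isotopes on a copy of the molecule, write the SMILES, and then pass that SMILES through Chem.CanonSmiles so that RDKit re-parses it, folds the now-ordinary hydrogens into the heavy atoms, and re-perceives stereochemistry. The function name MolWithoutIsotopesToSmilesRoundTrip is hypothetical, and the approach assumes the cost of an extra parse is acceptable.

```python
from rdkit import Chem

def MolWithoutIsotopesToSmilesRoundTrip(mol):
    # Hypothetical helper name; a sketch, not part of RDKit.
    # Work on a copy so the caller's molecule keeps its isotope labels.
    mol_copy = Chem.Mol(mol)
    for atom in mol_copy.GetAtoms():
        if atom.GetIsotope():
            atom.SetIsotope(0)
    # This intermediate SMILES can still carry stale stereo marks
    # and explicit [H] atoms left over from isotopic hydrogens.
    smiles = Chem.MolToSmiles(mol_copy)
    # Re-parsing the SMILES lets RDKit remove ordinary explicit hydrogens
    # and recompute stereochemistry before writing the canonical form.
    return Chem.CanonSmiles(smiles)
```

For the examples above, the result should agree with calling Chem.CanonSmiles on the same structure written without isotope labels. Because the work is done on a copy, there is also no need to restore the isotopes afterwards.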
Andrew
da...@da...