[Rdkit-discuss] cis/trans info lost when generating SMILES from molfile
From: Eloy F. <elo...@gm...> - 2018-01-25 14:14:38
Hi RDKitters,

I'm having trouble writing SMILES that include cis/trans info for some molecules I load from a molfile using RDKit. Openbabel and Indigo are generating the expected SMILES.

https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL501674

# RDKIT

    from rdkit import Chem
    mol = Chem.MolFromMolFile('CHEMBL501674.mol')
    Chem.MolToSmiles(mol, isomericSmiles=True)

    'CO[C@H]1[C@H](OC[C@H]2[C@@H]3O[C@@H]3C=CC(=O)[C@H](C)CC[C@H](O[C@@H]3O[C@H](C)C[C@H](O)[C@H]3O)[C@@H](C)C=CC(=O)O[C@@H]2C)O[C@H](C)[C@@H](O)[C@H]1OC'

# OPENBABEL

    import pybel  # `from openbabel import pybel` on Open Babel 3.x
    # readfile() returns a generator, so take the first molecule from it
    mol_ob = next(pybel.readfile('mol', 'CHEMBL501674.mol'))
    mol_ob.write('can')

    'CO[C@H]1[C@H](OC[C@@H]2[C@@H](C)OC(=O)/C=C/[C@H](C)[C@H](CC[C@H](C(=O)/C=C/[C@@H]3[C@H]2O3)C)O[C@@H]2O[C@H](C)C[C@@H]([C@H]2O)O)O[C@@H]([C@H]([C@H]1OC)O)C'

# INDIGO

    from indigo import Indigo
    indigoObject = Indigo()
    mol = indigoObject.loadMoleculeFromFile('CHEMBL501674.mol')
    mol.smiles()

    'C1(=O)[C@H](C)CC[C@H](O[C@@]2([H])[C@H](O)[C@@H](O)C[C@@H](C)O2)[C@@H](C)C=CC(=O)O[C@H](C)[C@@H](CO[C@]2([H])O[C@H](C)[C@@H](O)[C@@H](OC)[C@H]2OC)[C@]2([H])O[C@]2([H])C=C1 |t:21,51,&1:2,6,8,10,12,15,18,25,27,30,33,35,37,40,43,46|'

Indigo is using ChemAxon extended notation, but it is also recognising the double-bond stereo (t:21,51).

If I check the double-bond stereo info for the RDKit molecule:

    for bond in mol.GetBonds():
        if bond.GetBondType() == Chem.BondType.DOUBLE:
            print(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetStereo())

    6 7 STEREONONE
    11 12 STEREONONE
    0 17 STEREONONE
    5 43 STEREONONE

No E/Z info in the bonds.

InChI bond stereo layer:

    generated from the molfile:                                          /b12-10+,13-9+
    from the RDKit-generated SMILES (with and without Compute2DCoords):  /b12-10-,13-9-
    from the obabel-generated SMILES:                                    /b12-10+,13-9+

This might be a bug in the piece of code that detects bond stereo info from the molfile, or maybe I'm just missing something :P

Thanks for your great job!
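
For anyone who wants to compare the InChI bond stereo layers programmatically rather than by eye, here is a minimal sketch using only the Python standard library. The helper names `bond_stereo_layer` and `differing_bonds` are hypothetical (they are not part of RDKit, Open Babel or Indigo), and the parsing assumes the simple `atom-atom` plus parity-character form of the `/b` layer quoted in the message. It only reports where two layers disagree and makes no claim about which parity corresponds to E or Z.

```python
import re


def bond_stereo_layer(inchi_or_layer):
    """Parse the first '/b' layer into {(atom_a, atom_b): parity}.

    Accepts a full InChI string or a bare layer such as '/b12-10+,13-9+'.
    Returns an empty dict when no '/b' layer is present.
    """
    match = re.search(r"/b([^/]+)", inchi_or_layer)
    if match is None:
        return {}
    parities = {}
    for entry in match.group(1).split(","):
        # Each entry is assumed to look like '12-10+' (two atom numbers, one parity char).
        m = re.fullmatch(r"(\d+)-(\d+)([+\-?u])", entry)
        if m is None:
            raise ValueError(f"unexpected /b entry: {entry!r}")
        parities[(int(m.group(1)), int(m.group(2)))] = m.group(3)
    return parities


def differing_bonds(layer_a, layer_b):
    """Return {bond: (parity_in_a, parity_in_b)} for bonds whose parity differs."""
    a = bond_stereo_layer(layer_a)
    b = bond_stereo_layer(layer_b)
    return {
        bond: (a.get(bond), b.get(bond))
        for bond in sorted(a.keys() | b.keys())
        if a.get(bond) != b.get(bond)
    }


# Layers quoted in the message above.
from_molfile = "/b12-10+,13-9+"
from_rdkit_smiles = "/b12-10-,13-9-"
from_obabel_smiles = "/b12-10+,13-9+"

print(differing_bonds(from_molfile, from_rdkit_smiles))
print(differing_bonds(from_molfile, from_obabel_smiles))
```

With the three layers from the message, the first comparison flags both double bonds (12-10 and 13-9) as differing, while the molfile and obabel layers agree.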