[Rdkit-discuss] MolBlock for a fragment
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Andrew D. <da...@da...> - 2012-05-24 01:38:17
|
My MCS code finds a subset of the atoms and bonds in each input structure. This is a connected fragment. It could be as simple as two aromatic atoms bonded by an aromatic bond. I would like to save the fragment to an SD file. This seems reasonable on the surface. But I don't know enough about the SD file to know how meaningful that is. (For example, the above as a 'fragment SMILES' is 'cc', which some programs accept and many don't.) RDKit frowns on me trying to do that. >>> from rdkit import Chem >>> mol = Chem.MolFromSmiles("c1ccccc1") >>> emol = Chem.EditableMol(Chem.Mol()) >>> bond = list(mol.GetBonds())[0] >>> atoms = [bond.GetBeginAtom(), bond.GetEndAtom()] >>> a1 = emol.AddAtom(atoms[0]) >>> a2 = emol.AddAtom(atoms[1]) >>> a1, a2 (0, 1) >>> emol.AddBond(a1, a2, bond.GetBondType()) 1 >>> mol2 = emol.GetMol() >>> Chem.MolToMolBlock(mol2) [02:47:11] non-ring atom 0 marked aromatic Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: Sanitization error: non-ring atom 0 marked aromatic >>> What I did to support this (again, I don't know if it's really legitimate, according to the SD file conventions) is to use "kekulize=False" >>> print Chem.MolToMolBlock(mol2, kekulize=False) RDKit 2 1 0 0 0 0 0 0 0 0999 V2000 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1 2 4 0 M END It this reasonable for what I'm doing? 2) I seem to have a problems with the code which extracts the fragment from a molecule. Here's a reproducible, where I think I turn carbon monoxide into a copy of itself, but the copy can't be used to make a molecule block. from rdkit import Chem def subgraph_to_fragment(mol, atom_indices, bond_indices): emol = Chem.EditableMol(Chem.Mol()) atom_map = {} for atom_index in atom_indices: emol.AddAtom(mol.GetAtomWithIdx(atom_index)) atom_map[atom_index] = len(atom_map) for bond_index in bond_indices: bond = mol.GetBondWithIdx(bond_index) emol.AddBond(atom_map[bond.GetBeginAtomIdx()], atom_map[bond.GetEndAtomIdx()], bond.GetBondType()) return emol.GetMol() mol = Chem.MolFromSmiles("O=C") bond_indices = [0] ## bond = mol.GetBondWithIdx(0) ## atom_indices = [bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()] atom_indices = [0, 1] fragment = subgraph_to_fragment(mol, atom_indices, bond_indices) print "====Original\n", Chem.MolToMolBlock(mol, kekulize=False) print "SMILES", Chem.MolToSmiles(fragment) print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False) However, that gives the output and exception: ====Original RDKit 2 1 0 0 0 0 0 0 0 0999 V2000 0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1 2 2 0 M END SMILES C=O ====Fragment [03:12:30] **** Pre-condition Violation not initialized Violation occurred on line 67 in file /tmp/homebrew-rdkit-HEAD-dB4i/Code/GraphMol/RingInfo.cpp Failed Expression: df_init **** Traceback (most recent call last): File "fail.py", line 26, in <module> print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False) RuntimeError: Pre-condition Violation What am I doing wrong here? I know from private discussion with Greg on another topic that a fix to this is either Chem.GetSSSR(fragment) or Chem.FastFindRings(fragment), as in, for example: fragment = subgraph_to_fragment(mol, atom_indices, bond_indices) Chem.FastFindRings(fragment) That seems to work for me. Why is it needed? And why *isn't* it needed for the SMILES output? When building a molecule from scratch, I'm supposed to call sanitizeMol on the result in order to perceive the chemistry. However, I don't want to do that on a fragment for fear that it will modify chemistry under the assumption that it's a full molecule. The only other previous match on Google to "Failed Expression: df_init" is a bug report from JP, where the same exception is raised after doing ReplaceSubstructs. Greg wrote: Marking this as "Won't Fix", because the current behavior is what's supposed to happen: it is expected that the caller will take care of sanitizing the results of ReplaceSubstructs if they need them sanitized. The chemical reaction code behaves similarly. This seems like a rough edge that's caught a few people and should be smoothed. Andrew da...@da... |