Re: [Rdkit-discuss] MolBlock for a fragment
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
|
From: Greg L. <gre...@gm...> - 2012-05-24 15:13:14
|
On Thu, May 24, 2012 at 3:38 AM, Andrew Dalke <da...@da...> wrote:
>
> I would like to save the fragment to an SD file. This seems
> reasonable on the surface. But I don't know enough about the SD
> file to know how meaningful that is. (For example, the above as
> a 'fragment SMILES' is 'cc', which some programs accept and
> many don't.)
>
> RDKit frowns on me trying to do that.
>
>>>> from rdkit import Chem
>>>> mol = Chem.MolFromSmiles("c1ccccc1")
>>>> emol = Chem.EditableMol(Chem.Mol())
>>>> bond = list(mol.GetBonds())[0]
>>>> atoms = [bond.GetBeginAtom(), bond.GetEndAtom()]
>>>> a1 = emol.AddAtom(atoms[0])
>>>> a2 = emol.AddAtom(atoms[1])
>>>> a1, a2
> (0, 1)
>>>> emol.AddBond(a1, a2, bond.GetBondType())
> 1
>>>> mol2 = emol.GetMol()
>>>> Chem.MolToMolBlock(mol2)
> [02:47:11] non-ring atom 0 marked aromatic
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ValueError: Sanitization error: non-ring atom 0 marked aromatic
>>>>
>
>
> What I did to support this (again, I don't know if it's
> really legitimate, according to the SD file conventions)
> is to use "kekulize=False"
>
>>>> print Chem.MolToMolBlock(mol2, kekulize=False)
>
> RDKit
>
> 2 1 0 0 0 0 0 0 0 0999 V2000
> 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 1 2 4 0
> M END
>
>
> It this reasonable for what I'm doing?
Absolutely. You're defining a query, not a molecule, so it's
appropriate to use the aromatic bond type. As Gianluca points out in
his message: the CTAB spec says that this bond type is intended for
queries.
> 2) I seem to have a problems with the code which extracts
> the fragment from a molecule. Here's a reproducible, where
> I think I turn carbon monoxide into a copy of itself, but
> the copy can't be used to make a molecule block.
>
>
> from rdkit import Chem
>
> def subgraph_to_fragment(mol, atom_indices, bond_indices):
> emol = Chem.EditableMol(Chem.Mol())
> atom_map = {}
> for atom_index in atom_indices:
> emol.AddAtom(mol.GetAtomWithIdx(atom_index))
> atom_map[atom_index] = len(atom_map)
>
> for bond_index in bond_indices:
> bond = mol.GetBondWithIdx(bond_index)
> emol.AddBond(atom_map[bond.GetBeginAtomIdx()],
> atom_map[bond.GetEndAtomIdx()],
> bond.GetBondType())
>
> return emol.GetMol()
>
> mol = Chem.MolFromSmiles("O=C")
> bond_indices = [0]
> ## bond = mol.GetBondWithIdx(0)
> ## atom_indices = [bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()]
> atom_indices = [0, 1]
> fragment = subgraph_to_fragment(mol, atom_indices, bond_indices)
> print "====Original\n", Chem.MolToMolBlock(mol, kekulize=False)
> print "SMILES", Chem.MolToSmiles(fragment)
> print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False)
>
>
> However, that gives the output and exception:
>
> ====Original
>
> RDKit
>
> 2 1 0 0 0 0 0 0 0 0999 V2000
> 0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
> 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 1 2 2 0
> M END
>
> SMILES C=O
> ====Fragment
> [03:12:30]
>
> ****
> Pre-condition Violation
> not initialized
> Violation occurred on line 67 in file /tmp/homebrew-rdkit-HEAD-dB4i/Code/GraphMol/RingInfo.cpp
> Failed Expression: df_init
> ****
>
> Traceback (most recent call last):
> File "fail.py", line 26, in <module>
> print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False)
> RuntimeError: Pre-condition Violation
>
>
>
> What am I doing wrong here?
Nothing really... it's a wart. The code to generate mol blocks needs
ring information to figure out which bonds it should wedge to mark
stereochemistry (if a choice is available, the writer favors wedging
non-ring bonds). The error comes if you haven't run the
ring-perception algorithm on your fragments. I'm about to check in a
fix for this. I also checked in a fix that makes that error message at
least a little bit more useful... it now says "RingInfo not
initialized" instead of "not initialized"
>
> I know from private discussion with Greg on another topic that
> a fix to this is either Chem.GetSSSR(fragment) or
> Chem.FastFindRings(fragment), as in, for example:
>
> fragment = subgraph_to_fragment(mol, atom_indices, bond_indices)
> Chem.FastFindRings(fragment)
>
> That seems to work for me. Why is it needed? And why *isn't*
> it needed for the SMILES output?
The SMILES writer doesn't need the ring information to do its work.
> When building a molecule from scratch, I'm supposed to call
> sanitizeMol on the result in order to perceive the chemistry.
> However, I don't want to do that on a fragment for fear that
> it will modify chemistry under the assumption that it's a
> full molecule.
Yeah, you definitely don't want full sanitization. A call to make
fragments maximally useful would be:
Chem.SanitizeMol(m,Chem.SanitizeFlags.SANITIZE_PROPERTIES|Chem.SanitizeFlags.SANITIZE_SYMMRINGS)
This does ring perception and tallies up the valences of each atom,
but shouldn't actually cause any structural changes.
>
> The only other previous match on Google to "Failed Expression: df_init"
> is a bug report from JP, where the same exception is raised
> after doing ReplaceSubstructs. Greg wrote:
>
> Marking this as "Won't Fix", because the current behavior is
> what's supposed to happen: it is expected that the caller
> will take care of sanitizing the results of ReplaceSubstructs
> if they need them sanitized. The chemical reaction code behaves
> similarly.
>
>
> This seems like a rough edge that's caught a few people
> and should be smoothed.
Point taken. I'll look at it.
-greg
|