Re: [Rdkit-discuss] MolBlock for a fragment

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Thu, May 24, 2012 at 3:38 AM, Andrew Dalke <da...@da...> wrote:
>
> I would like to save the fragment to an SD file. This seems
> reasonable on the surface. But I don't know enough about the SD
> file to know how meaningful that is. (For example, the above as
> a 'fragment SMILES' is 'cc', which some programs accept and
> many don't.)
>
> RDKit frowns on me trying to do that.
>
>>>> from rdkit import Chem
>>>> mol = Chem.MolFromSmiles("c1ccccc1")
>>>> emol = Chem.EditableMol(Chem.Mol())
>>>> bond = list(mol.GetBonds())[0]
>>>> atoms = [bond.GetBeginAtom(), bond.GetEndAtom()]
>>>> a1 = emol.AddAtom(atoms[0])
>>>> a2 = emol.AddAtom(atoms[1])
>>>> a1, a2
> (0, 1)
>>>> emol.AddBond(a1, a2, bond.GetBondType())
> 1
>>>> mol2 = emol.GetMol()
>>>> Chem.MolToMolBlock(mol2)
> [02:47:11] non-ring atom 0 marked aromatic
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> ValueError: Sanitization error: non-ring atom 0 marked aromatic
>>>>
>
>
> What I did to support this (again, I don't know if it's
> really legitimate, according to the SD file conventions)
> is to use "kekulize=False"
>
>>>> print Chem.MolToMolBlock(mol2, kekulize=False)
>
>     RDKit
>
>  2  1  0  0  0  0  0  0  0  0999 V2000
>    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>  1  2  4  0
> M  END
>
>
> It this reasonable for what I'm doing?

Absolutely. You're defining a query, not a molecule, so it's
appropriate to use the aromatic bond type. As Gianluca points out in
his message: the CTAB spec says that this bond type is intended for
queries.

> 2)  I seem to have a problems with the code which extracts
> the fragment from a molecule. Here's a reproducible, where
> I think I turn carbon monoxide into a copy of itself, but
> the copy can't be used to make a molecule block.
>
>
> from rdkit import Chem
>
> def subgraph_to_fragment(mol, atom_indices, bond_indices):
>    emol = Chem.EditableMol(Chem.Mol())
>    atom_map = {}
>    for atom_index in atom_indices:
>        emol.AddAtom(mol.GetAtomWithIdx(atom_index))
>        atom_map[atom_index] = len(atom_map)
>
>    for bond_index in bond_indices:
>        bond = mol.GetBondWithIdx(bond_index)
>        emol.AddBond(atom_map[bond.GetBeginAtomIdx()],
>                     atom_map[bond.GetEndAtomIdx()],
>                     bond.GetBondType())
>
>    return emol.GetMol()
>
> mol = Chem.MolFromSmiles("O=C")
> bond_indices = [0]
> ## bond = mol.GetBondWithIdx(0)
> ## atom_indices = [bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()]
> atom_indices = [0, 1]
> fragment = subgraph_to_fragment(mol, atom_indices, bond_indices)
> print "====Original\n", Chem.MolToMolBlock(mol, kekulize=False)
> print "SMILES", Chem.MolToSmiles(fragment)
> print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False)
>
>
> However, that gives the output and exception:
>
> ====Original
>
>     RDKit
>
>  2  1  0  0  0  0  0  0  0  0999 V2000
>    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>  1  2  2  0
> M  END
>
> SMILES C=O
> ====Fragment
> [03:12:30]
>
> ****
> Pre-condition Violation
> not initialized
> Violation occurred on line 67 in file /tmp/homebrew-rdkit-HEAD-dB4i/Code/GraphMol/RingInfo.cpp
> Failed Expression: df_init
> ****
>
> Traceback (most recent call last):
>  File "fail.py", line 26, in <module>
>    print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False)
> RuntimeError: Pre-condition Violation
>
>
>
> What am I doing wrong here?

Nothing really... it's a wart. The code to generate mol blocks needs
ring information to figure out which bonds it should wedge to mark
stereochemistry (if a choice is available, the writer favors wedging
non-ring bonds). The error comes if you haven't run the
ring-perception algorithm on your fragments. I'm about to check in a
fix for this. I also checked in a fix that makes that error message at
least a little bit more useful... it now says "RingInfo not
initialized" instead of "not initialized"

>
> I know from private discussion with Greg on another topic that
> a fix to this is either Chem.GetSSSR(fragment) or
> Chem.FastFindRings(fragment), as in, for example:
>
> fragment = subgraph_to_fragment(mol, atom_indices, bond_indices)
> Chem.FastFindRings(fragment)
>
> That seems to work for me. Why is it needed? And why *isn't*
> it needed for the SMILES output?

The SMILES writer doesn't need the ring information to do its work.

> When building a molecule from scratch, I'm supposed to call
> sanitizeMol on the result in order to perceive the chemistry.
> However, I don't want to do that on a fragment for fear that
> it will modify chemistry under the assumption that it's a
> full molecule.

Yeah, you definitely don't want full sanitization. A call to make
fragments maximally useful would be:

  Chem.SanitizeMol(m,Chem.SanitizeFlags.SANITIZE_PROPERTIES|Chem.SanitizeFlags.SANITIZE_SYMMRINGS)

This does ring perception and tallies up the valences of each atom,
but shouldn't actually cause any structural changes.

>
> The only other previous match on Google to "Failed Expression: df_init"
> is a bug report from JP, where the same exception is raised
> after doing ReplaceSubstructs. Greg wrote:
>
>   Marking this as "Won't Fix", because the current behavior is
>   what's supposed to happen: it is expected that the caller
>   will take care of sanitizing the results of ReplaceSubstructs
>   if they need them sanitized. The chemical reaction code behaves
>   similarly.
>
>
> This seems like a rough edge that's caught a few people
> and should be smoothed.

Point taken. I'll look at it.

-greg

Re: [Rdkit-discuss] MolBlock for a fragment

Open-Source Cheminformatics and Machine Learning

Re: [Rdkit-discuss] MolBlock for a fragment