[Rdkit-discuss] MolBlock for a fragment

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

My MCS code finds a subset of the atoms and bonds
in each input structure.

This is a connected fragment. It could be as simple as two
aromatic atoms bonded by an aromatic bond.

I would like to save the fragment to an SD file. This seems
reasonable on the surface. But I don't know enough about the SD
file to know how meaningful that is. (For example, the above as
a 'fragment SMILES' is 'cc', which some programs accept and
many don't.)

RDKit frowns on me trying to do that.

>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("c1ccccc1")
>>> emol = Chem.EditableMol(Chem.Mol())
>>> bond = list(mol.GetBonds())[0]
>>> atoms = [bond.GetBeginAtom(), bond.GetEndAtom()]
>>> a1 = emol.AddAtom(atoms[0])
>>> a2 = emol.AddAtom(atoms[1])
>>> a1, a2
(0, 1)
>>> emol.AddBond(a1, a2, bond.GetBondType())
1
>>> mol2 = emol.GetMol()
>>> Chem.MolToMolBlock(mol2)
[02:47:11] non-ring atom 0 marked aromatic
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Sanitization error: non-ring atom 0 marked aromatic
>>> 

What I did to support this (again, I don't know if it's
really legitimate, according to the SD file conventions)
is to use "kekulize=False"

>>> print Chem.MolToMolBlock(mol2, kekulize=False)

     RDKit          

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  4  0
M  END

It this reasonable for what I'm doing?

2)  I seem to have a problems with the code which extracts
the fragment from a molecule. Here's a reproducible, where
I think I turn carbon monoxide into a copy of itself, but
the copy can't be used to make a molecule block.

from rdkit import Chem

def subgraph_to_fragment(mol, atom_indices, bond_indices):
    emol = Chem.EditableMol(Chem.Mol())
    atom_map = {}
    for atom_index in atom_indices:
        emol.AddAtom(mol.GetAtomWithIdx(atom_index))
        atom_map[atom_index] = len(atom_map)

    for bond_index in bond_indices:
        bond = mol.GetBondWithIdx(bond_index)
        emol.AddBond(atom_map[bond.GetBeginAtomIdx()],
                     atom_map[bond.GetEndAtomIdx()],
                     bond.GetBondType())

    return emol.GetMol()

mol = Chem.MolFromSmiles("O=C")
bond_indices = [0]
## bond = mol.GetBondWithIdx(0)
## atom_indices = [bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()]
atom_indices = [0, 1]
fragment = subgraph_to_fragment(mol, atom_indices, bond_indices)
print "====Original\n", Chem.MolToMolBlock(mol, kekulize=False)
print "SMILES", Chem.MolToSmiles(fragment)
print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False)

However, that gives the output and exception:

====Original

     RDKit          

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
M  END

SMILES C=O
====Fragment
[03:12:30] 

****
Pre-condition Violation
not initialized
Violation occurred on line 67 in file /tmp/homebrew-rdkit-HEAD-dB4i/Code/GraphMol/RingInfo.cpp
Failed Expression: df_init
****

Traceback (most recent call last):
  File "fail.py", line 26, in <module>
    print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False)
RuntimeError: Pre-condition Violation

What am I doing wrong here?

I know from private discussion with Greg on another topic that
a fix to this is either Chem.GetSSSR(fragment) or
Chem.FastFindRings(fragment), as in, for example:

fragment = subgraph_to_fragment(mol, atom_indices, bond_indices)
Chem.FastFindRings(fragment)

That seems to work for me. Why is it needed? And why *isn't*
it needed for the SMILES output?

When building a molecule from scratch, I'm supposed to call
sanitizeMol on the result in order to perceive the chemistry.
However, I don't want to do that on a fragment for fear that
it will modify chemistry under the assumption that it's a
full molecule.

The only other previous match on Google to "Failed Expression: df_init"
is a bug report from JP, where the same exception is raised
after doing ReplaceSubstructs. Greg wrote:

   Marking this as "Won't Fix", because the current behavior is
   what's supposed to happen: it is expected that the caller
   will take care of sanitizing the results of ReplaceSubstructs
   if they need them sanitized. The chemical reaction code behaves
   similarly.

This seems like a rough edge that's caught a few people
and should be smoothed.

				Andrew
				da...@da...

[Rdkit-discuss] MolBlock for a fragment

Open-Source Cheminformatics and Machine Learning

[Rdkit-discuss] MolBlock for a fragment