Re: [Rdkit-discuss] Save files with new atom properties and read again
From: Jan H. J. <ja...@bi...> - 2015-07-05 08:49:19
Hi Greg,
I was running with both RDKit 2014.09.2 and 2013.09.1. In those versions "V " lines
are accepted and read on input, but no "V " lines are produced in the mol block output.
I downloaded 2015.03.1, and that version does produce "V " lines in the mol block
output. Another good reason to upgrade :-).
2014.09.2:
>>> from rdkit import rdBase
>>> rdBase.rdkitVersion
'2014.09.2'
>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('CO')
>>> m.GetAtomWithIdx(0).SetProp('molFileValue','a1')
>>> print Chem.MolToMolBlock(m)
RDKit
2 1 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
M END
>>>
2015.03.1:
>>> from rdkit import rdBase
>>> rdBase.rdkitVersion
'2015.03.1'
>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('CO')
>>> m.GetAtomWithIdx(0).SetProp('molFileValue','a1')
>>> print Chem.MolToMolBlock(m)
RDKit
2 1 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
V 1 a1
M END
>>>
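A quick way to check whether a given build writes atom values is to scan the mol block text for "V" lines. The sketch below is plain Python with no RDKit calls. `read_atom_values` is a hypothetical helper name, and the code assumes the fixed-column V2000 layout ("V", two spaces, a 3-character atom number, one space, then the text), which is wider than the spacing that survives in the listings shown here.

```python
def read_atom_values(mol_block):
    """Return {1-based atom index: text} for the "V" lines of a V2000 mol block.

    Hypothetical helper; assumes the fixed-column "V  aaa text" layout.
    """
    values = {}
    # Skip the three header lines: a molecule name could itself start with "V".
    for line in mol_block.splitlines()[3:]:
        if line.startswith("M  END"):
            break
        if line.startswith("V  "):
            # Columns 4-6 hold the atom number, the text starts at column 8.
            values[int(line[3:6])] = line[7:]
    return values
```

Given the outputs above, `read_atom_values(Chem.MolToMolBlock(m))` would be expected to return `{1: 'a1'}` with 2015.03.1 and an empty dict with 2014.09.2.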
Cheers
-- Jan
On 2015-07-05 04:59, Greg Landrum wrote:
> Hmm, that's strange. Unless that atom also has a query associated with
> it, the atom values definitely should be written. It certainly works
> at least some of the time:
>
> In [5]: m = Chem.MolFromSmiles('CO')
>
> In [6]: m.GetAtomWithIdx(0).SetProp('molFileValue','a1')
>
> In [7]: mb = Chem.MolToMolBlock(m)
>
> In [8]: print(mb)
>
> RDKit
>
> 2 1 0 0 0 0 0 0 0 0999 V2000
> 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
> 1 2 1 0
> V 1 a1
> M END
>
> Jan: what version of the RDKit are you using?
>
> -greg
>
>
>
> On Sat, Jul 4, 2015 at 8:08 PM, Jan Holst Jensen <ja...@bi...> wrote:
>
> Hi Hitesh,
>
> The V2000 molfile format has a feature that can be used to set a
> simple text value for an atom by adding "V " lines to the
> molfile. The RDKit molfile *reader* supports this feature as seen
> below (I have seen this feature used to e.g. tag reactive centers
> in a molecule when doing RDKit reaction-based enumeration).
>
> >>> from rdkit import Chem
> >>> molfile_with_values = "".join(open("C:/temp/cns-with-values.mol").readlines())
> >>> print molfile_with_values
>
> -ISIS- 07041519212D
>
> 3 2 0 0 0 0 0 0 0 0999 V2000
> 0.0958 -2.6833 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 0.8083 -2.2708 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
> 1.5208 -2.6792 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0
> 1 2 1 0 0 0 0
> 2 3 1 0 0 0 0
> V 1 Carbs
> V 3 Sulfuric
> M END
>
> >>> m = Chem.MolFromMolBlock(molfile_with_values)
> >>> m.GetAtoms()[0].GetProp('molFileValue')
> 'Carbs'
> >>> m.GetAtoms()[1].GetProp('molFileValue')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> KeyError: 'molFileValue'
> >>> m.GetAtoms()[2].GetProp('molFileValue')
> 'Sulfuric'
>
>
>
> As you can see, the "V " lines in the molfile are put into RDKit
> atom "molFileValue" properties.
>
> Unfortunately, the atom values are not written when RDKit outputs
> a molfile:
>
> >>> print Chem.MolToMolBlock(m)
>
>
> RDKit 2D
>
> 3 2 0 0 0 0 0 0 0 0999 V2000
> 0.0958 -2.6833 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 0.8083 -2.2708 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
> 1.5208 -2.6792 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0
> 1 2 1 0
> 2 3 1 0
> M END
>
>
>
> But, it is fairly easy to add them with this function:
>
> def MolToMolBlock_WithAtomValues(mol):
>     mol_block = Chem.MolToMolBlock(mol).split("\n")
>     # Delete the "M  END" line (and the empty string left by the trailing newline).
>     mol_block = mol_block[:-2]
>     # Add appropriate "V" lines.
>     for atom in mol.GetAtoms():
>         if atom.HasProp("molFileValue"):
>             mol_block.append("V  %3d %s" % (atom.GetIdx() + 1,
>                                             atom.GetProp("molFileValue")))
>     mol_block.append("M  END")
>     return "\n".join(mol_block)
>
> This lets you persist atom text values. Disclaimer: I have no idea
> if this will break in the presence of other property lines, e.g.
> "M CHG" etc., but ... it's a start.
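One case the helper above does not cover: on versions that already write "V" lines (2015.03.1, per the top of this thread), appending them again would duplicate them. A minimal sketch of a variant that works on the mol block text alone follows. `set_atom_values` is a hypothetical name, it assumes a V2000 block containing an "M  END" line, and it leaves other property lines such as "M  CHG" where they are.

```python
def set_atom_values(mol_block, values):
    """Return mol_block with its "V" lines replaced by the entries of `values`.

    `values` maps 1-based atom indices to text. Hypothetical helper that
    edits the text only; it does not validate the rest of the block.
    """
    lines = mol_block.splitlines()
    # Locate the terminator instead of assuming it is the last line.
    end = next(i for i, line in enumerate(lines) if line.startswith("M  END"))
    header, body = lines[:3], lines[3:end]
    # Drop "V" lines that are already present so they are not duplicated.
    body = [line for line in body if not line.startswith("V  ")]
    body += ["V  %3d %s" % (idx, text) for idx, text in sorted(values.items())]
    return "\n".join(header + body + ["M  END"]) + "\n"
```

The values could be collected from the molecule with the same `HasProp`/`GetProp` loop used in the function above, keyed by `atom.GetIdx() + 1`.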
>
> As an example, let's first create the "CNS" molecule without atom
> values.
>
> >>> m = Chem.MolFromSmiles("CNS")
> >>> print MolToMolBlock_WithAtomValues(m)
>
> RDKit
>
> 3 2 0 0 0 0 0 0 0 0999 V2000
> 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 0.0000 0.0000 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
> 0.0000 0.0000 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0
> 1 2 1 0
> 2 3 1 0
> M END
>
>
>
> Add two atom values and use the new function to persist the atom
> values to the V2000 molfile output:
>
> >>> m.GetAtoms()[0].SetProp("molFileValue", "C-atom")
> >>> m.GetAtoms()[2].SetProp("molFileValue", "Here is an S-atom")
> >>> print MolToMolBlock_WithAtomValues(m)
>
> RDKit
>
> 3 2 0 0 0 0 0 0 0 0999 V2000
> 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> 0.0000 0.0000 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
> 0.0000 0.0000 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0
> 1 2 1 0
> 2 3 1 0
> V 1 C-atom
> V 3 Here is an S-atom
> M END
>
>
>
> Check that the output can be read back in:
>
> >>> molblock_test = MolToMolBlock_WithAtomValues(m)
> >>> m_test = Chem.MolFromMolBlock(molblock_test)
> >>> m_test.GetAtoms()[0].GetProp("molFileValue")
> 'C-atom'
> >>> m_test.GetAtoms()[1].GetProp("molFileValue")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> KeyError: 'molFileValue'
> >>> m_test.GetAtoms()[2].GetProp("molFileValue")
> 'Here is an S-atom'
>
>
>
> If you have multiple properties they would have to be encoded into
> the text value as e.g. key-value pairs. The text values are in
> principle limited to max. 70-80 characters (72 ?) by the MDL
> molfile specification, but RDKit probably accepts longer strings
> (I would guess, but have not tried).
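The key-value idea could look roughly like the sketch below. The "key=value;key=value" syntax and the names `encode_props` and `decode_props` are assumptions made for illustration, not an RDKit or MDL convention. The resulting string would be stored in the atom's "molFileValue" property; its length is not checked here, so the limit mentioned above still applies.

```python
def encode_props(props):
    """Pack {key: value} string pairs into one "key=value;key=value" string."""
    items = []
    for key in sorted(props):
        value = props[key]
        # The separators must not occur in the data, or decoding gets ambiguous.
        if any(ch in key + value for ch in "=;\n"):
            raise ValueError("unsupported character in %r=%r" % (key, value))
        items.append("%s=%s" % (key, value))
    return ";".join(items)


def decode_props(text):
    """Inverse of encode_props; an empty string decodes to an empty dict."""
    if not text:
        return {}
    return dict(item.split("=", 1) for item in text.split(";"))
```

For example, `atom.SetProp("molFileValue", encode_props({"role": "donor", "site": "A1"}))` before writing, and `decode_props(atom.GetProp("molFileValue"))` after reading back.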
>
> A more generic solution would be to map RDKit atom and bond
> properties to molfile S-group data - but that's a bit more
> involved and is not supported at the moment.
>
> Cheers
> -- Jan Holst Jensen
>
> On 2015-07-04 18:28, Greg Landrum wrote:
>
> Hi,
>
> >
> > On Friday, July 3, 2015, Hitesh Patel <hit...@gm...> wrote:
> >
> > Hi Greg,
> >
> > As a first priority, I would use the mol2 format. As shown in the mol2
> > format explanation, we can set user-specified atom attributes. I copied
> > the text below for your convenience. See the bold text.
> >
> >
> > The rdkit does not yet have a mol2 writer, so that isn't an option.
> >
> >
> > As a second priority, I can use mol files. There I would have to set the
> > properties block:
> >
> > * M ALS - atom list and exclusive list
> > * M APO - Rgroup attachment point
> > * M CHG - charge
> > * .....
> >
> > But, I am not sure, whether the user defined property block is
> > allowed or not.
> >
> >
> > M CHG and M ALS are already used by the rdkit when atoms have charges
> > or there are list queries. M APO is not used, but that's because I
> > have never managed to figure out how to adapt the MDL R Group idea to
> > something sensible in the context of the rdkit.
> >
> > Which one looks feasible??
> >
> >
> > I'm still not sure of what kind of custom properties you are looking
> > to write.
> >
> > -greg
> >
> >
> >
> >
> >
> > On Fri, Jul 3, 2015 at 4:45 PM Greg Landrum <gre...@gm...> wrote:
> >
> > Hitesh,
> >
> > It is certainly possible to set atom properties. I don't think any of
> > the output formats the rdkit can generate really support atom
> > properties though. What format did you envision writing and how would
> > the atom properties be encoded?
> >
> > -greg
> >
> >
> > On Friday, July 3, 2015, Hitesh Patel <hit...@gm...> wrote:
> >
> > Hi Josh, Thanks for your quick reply. But, sorry, I want to set atom
> > properties, not molecule properties. Like,
> >
> > atom = m.GetAtomWithIdx(5)
> > atom.SetProp('my_property', 'value_of_my_property')
> >
> > I want to save this property associated with each atom.
> >
> > Regards,
> >
> > Hitesh Patel
> >
> > On Fri, Jul 3, 2015 at 3:41 PM, Campbell J.E. <je...@so...> wrote:
> >
> > Hi Hitesh
> >
> >
> >
> > I use the PropertyMol object to save molecules with properties,
> > setting a property for a molecule is fairly simple,
> >
> >
> >
> > m.SetProp("_Name", "mol_name")
> >
> >
> >
> > for m in mol_lst:
> >     pm = AllChem.PropertyMol(m)
> >     pm.SetProp("_Name", name)
> >     pm.SetProp("_Energy", None)
> >     dump_list.append(pm)
> >
> > cPickle.dump(dump_list, open(p_name, "w+"))
> >
> >
> >
> > Then something like this will allow you to act on the molecules
> > again.
> >
> >
> >
> > mol_list = cPickle.load( open(p_name, "rb" ) )
> >
> >
> >
> > Hope this helps.
> >
> >
> >
> > Josh Campbell
> >
> >
> >
> > From: Hitesh Patel [mailto:hit...@gm...]
> > Sent: 03 July 2015 14:21
> > To: rdk...@li...
> > Subject: [Rdkit-discuss] Save files with new atom properties and read again
> >
> >
> >
> > Hi there,
> >
> > I am new to RDKit.
> >
> > Is there a way to set a custom property for each atom, save it to a
> > file in any format, and read it back in again?
> >
> >
> > --
> >
> > Regards,
> >
> > Dr. Hitesh Patel Post-Doctoral Fellow, Technische Universität
> > Dortmund, Chemische Biologie, Otto-Hahn-Straße 6, 44227, Dortmund,
> > Germany
> >
>
>
>