Re: [Rdkit-discuss] MolFromInchi with Amides
From: Markus S. <mar...@gm...> - 2018-06-15 05:26:24
Hi Jeff,
That is because InChI is a structure identifier, not a structure representation. The difference is that a structure identifier normalizes the structure to a form it regards as the standard representation of the molecule, so that the molecule can be identified (and the same identifier is calculated) regardless of the state in which it arrives from an input source.
For Standard InChI, the decision was made to make it insensitive to tautomers (within the limitations of the InChI algorithm). Unfortunately, this normalizes most amides to a form that chemists regard as the incorrect one. The second unfortunate thing is that you can convert the InChI back to a structure representation, which is then of course the normalized (standardized) form of the molecule.
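To make the tautomer point concrete, here is a minimal sketch that checks whether two input structures end up with the same Standard InChI. It assumes an RDKit build with InChI support; the helper name `same_standard_inchi` is made up for illustration and is not part of RDKit.

```python
def same_standard_inchi(smiles_a: str, smiles_b: str) -> bool:
    """Return True if both SMILES give the same Standard InChI.

    Hypothetical helper for illustration. RDKit is imported inside the
    function, and an InChI-enabled RDKit build is assumed.
    """
    from rdkit import Chem

    inchis = []
    for smiles in (smiles_a, smiles_b):
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f"could not parse SMILES: {smiles!r}")
        # MolToInchi produces a Standard InChI by default.
        inchis.append(Chem.MolToInchi(mol))
    return inchis[0] == inchis[1]
```

For the acetamide pair, `same_standard_inchi('CC(=O)N', 'CC(=N)O')` should return True, because the amide and the iminol are collapsed into one identifier.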
So if you want to be sure to keep the original representation of a molecule, don’t use InChI as your representation format (calculate the InChI as an identifier field next to it). If your input source only provides InChI or Standard InChI, then you are of course out of luck.
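One way to follow that advice is sketched below: keep the structure exactly as supplied and store the InChI only as a lookup key beside it. The names `MoleculeRecord`, `rdkit_inchi` and `make_record` are hypothetical, and an RDKit build with InChI support is assumed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MoleculeRecord:
    """Hypothetical record: the structure as supplied, plus an identifier."""
    smiles: str  # representation: kept exactly as supplied, tautomer included
    inchi: str   # identifier: used only for lookup and deduplication


def rdkit_inchi(smiles: str) -> str:
    """Compute an InChI with RDKit (imported lazily; InChI support assumed)."""
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToInchi(mol)


def make_record(
    smiles: str, to_inchi: Callable[[str], str] = rdkit_inchi
) -> MoleculeRecord:
    # The structure is never rebuilt from the InChI, so the amide form
    # written in the input SMILES is what stays on the record.
    return MoleculeRecord(smiles=smiles, inchi=to_inchi(smiles))
```

With this layout, `make_record('CC(=O)N').smiles` is still the amide as written, while the `inchi` field can be used to match the molecule against other sources.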
Best,
Markus
-------------------------------------
| Markus Sitzmann
| mar...@gm...
> On 14. Jun 2018, at 23:33, Jeff van Santen <jef...@sf...> wrote:
>
> Hi all,
>
>
> I have some questions about how RDKit handles amides. For context, I am working with a large set of molecules, many of which contain peptides. I have been running into a problem when using RDKit: when I try to load a molecule from its InChI, the wrong tautomer is loaded. As a simple example, consider acetamide:
>
>
> """
>
> from rdkit import Chem
> from rdkit.Chem import rdMolDescriptors
>
> FromInchi = Chem.MolFromInchi('InChI=1S/C2H5NO/c1-2(3)4/h1H3,(H2,3,4)')
> print(rdMolDescriptors.CalcNumAmideBonds(FromInchi))
> > 0
> print(Chem.MolToSmiles(FromInchi))
> > CC(=N)O
>
> FromSmiles = Chem.MolFromSmiles('CC(=O)N')
> print(rdMolDescriptors.CalcNumAmideBonds(FromSmiles))
> > 1
> print(Chem.MolToSmiles(FromSmiles))
> > CC(N)=O
>
> """
>
>
> I realize that Standard InChI does not have a mechanism for distinguishing between the two tautomers, so I am wondering why RDKit considers the iminol to be the better representation. Also, is there any way to get the amide instead (without using MolVS)?
>
>
> Thanks,
>
> Jeff
>
>