Re: [Rdkit-devel] Rethinking the RDKit's implicit hydrogen handling
From: Greg L. <gre...@gm...> - 2012-12-10 17:04:12
On Mon, Dec 10, 2012 at 4:51 PM, Andrew Dalke <da...@da...> wrote:
> On Dec 8, 2012, at 7:01 AM, Greg Landrum wrote:
> > It's a pretty small API change, but there's a huge amount of code that
> > needs to be changed in the back and lots and lots of testing that has to be
> > done, so this is going to take a while.
>
> Speaking of hydrogens, I came across a strange query (part of the
> BindingDB_structure set from my Structure Query Collection).
>
> C1N[H]OC[H]1
>
> Because of RDKit's sanitization, the '[H]' gets absorbed into the atoms.
> This breaks a bond, so what started as a ring with poor chemistry ends
> up as two linear pieces with sane chemistry.
>
> >>> mol = Chem.MolFromSmiles("C1N[H]OC[H]1")
> >>> Chem.MolToSmiles(mol)
> 'CN.OC'
> >>>
>
> The above query is nonsensical, but I don't think sanitization should
> modify the topology of the input.
>
>
That's hilarious.
Wrong, but hilarious.
It's also easy to prevent, so that you at least get an "unallowed valence"
error instead of a silently broken ring.
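
To make the "unallowed valence" idea concrete, here is a minimal sketch in plain Python. It deliberately does not use RDKit, so it says nothing about how any RDKit release actually behaves; the function name check_hydrogen_valence and the hand-built atom/bond lists are assumptions made purely for illustration. The point is only that a hydrogen with two bonds can be reported as an error before any hydrogen is merged into a neighbor.

```python
# Hypothetical, RDKit-free sketch: reject any hydrogen with more than one bond
# instead of silently absorbing it into a neighboring atom.
from collections import Counter


def check_hydrogen_valence(symbols, bonds):
    """Raise ValueError if any H atom takes part in more than one bond.

    symbols: list of element symbols, indexed by atom
    bonds:   list of (i, j) atom-index pairs, one per single bond
    """
    degree = Counter()
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    for idx, symbol in enumerate(symbols):
        if symbol == "H" and degree[idx] > 1:
            raise ValueError(
                f"unallowed valence: H atom {idx} has {degree[idx]} bonds"
            )


# Hand-built graph for C1N[H]OC[H]1 (atom order follows the SMILES string;
# the last bond is the ring closure).
symbols = ["C", "N", "H", "O", "C", "H"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

try:
    check_hydrogen_valence(symbols, bonds)
except ValueError as err:
    print(err)  # unallowed valence: H atom 2 has 2 bonds
```

Running the check before any hydrogen removal means the input is rejected with its ring intact, rather than rewritten into two fragments.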

> Divalent hydrogen does exist in a very small number of real records.
> I once came across a couple of structures with a 4-membered boron/hydrogen
> ring, like this:
>
> >>> mol = Chem.MolFromSmiles("[B]1[H][B][H]1")
> >>> Chem.MolToSmiles(mol)
> '[BH].[BH]'
> >>>
>
> I am unable to find a public record with that core, to act as a
> more realistic test case.

The bonds in those molecules are not classic electron-pair bonds (they are
three-center two-electron bonds), so this isn't going to produce a
reasonable molecule, but it should at least fail correctly.
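
One way to picture "fail correctly" for the boron/hydrogen ring is a guard that compares the number of connected fragments before and after hydrogens are dropped. The sketch below is plain Python with hand-built atom and bond lists; count_fragments and strip_hydrogens are hypothetical helpers written for this example, not RDKit functions, and this is not a description of what RDKit does internally.

```python
# Hypothetical, RDKit-free sketch: detect that dropping bridging hydrogens
# would change the topology (the fragment count) of the input.

def count_fragments(n_atoms, bonds):
    """Count connected components with a small union-find."""
    parent = list(range(n_atoms))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in bonds:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n_atoms)})


def strip_hydrogens(symbols, bonds):
    """Drop H atoms and every bond touching them; renumber what is left."""
    keep = [i for i, s in enumerate(symbols) if s != "H"]
    new_index = {old: new for new, old in enumerate(keep)}
    heavy_bonds = [
        (new_index[i], new_index[j])
        for i, j in bonds
        if i in new_index and j in new_index
    ]
    return [symbols[i] for i in keep], heavy_bonds


# Hand-built graph for [B]1[H][B][H]1 (last bond is the ring closure).
ring_symbols = ["B", "H", "B", "H"]
ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 0)]

before = count_fragments(len(ring_symbols), ring_bonds)
heavy_symbols, heavy_bonds = strip_hydrogens(ring_symbols, ring_bonds)
after = count_fragments(len(heavy_symbols), heavy_bonds)

if after != before:
    print(f"hydrogen removal would split {before} fragment(s) into {after}")
```

Here the single ring becomes two isolated boron atoms once the hydrogens are dropped, which matches the two-fragment '[BH].[BH]' output quoted above; a guard like this could turn that case into an error.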
-greg