Re: [Rdkit-discuss] Faulty valence for nitrogen in aromatic ring

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Greg,

thanks for your prompt and thorough reply. Wow!

> Now that I've at least tried to clear up what is going on, maybe I can
> be more helpful: was there a specific question you were trying to
> answer that led you to your discovery that the RDKit behaves strangely
> in this special case?

What I'm trying to do can be inspected here:

http://chimpsky.uwaterloo.ca/mol2chemfig/index

Briefly, I'm building a program for converting molecular structures  
from smiles or molfile format to TeX code, using the syntax defined by  
the chemfig package as the target. rdkit does all the heavy lifting. I  
was using the GetImplicitHs method to determine how many hydrogens to  
attach to carbons and heteroatoms and then noticed that the number of  
hydrogens on nitrogen in rings was off.

 From your answer, it seems I should be using GetTotalNumHs. However,  
I would still like to be able to distinguish between hydrogens that  
were specified in a molfile, with coordinates, and those that weren't.

Another question I ran into was accessing the coordinates of an atom,  
either loaded from molfile or, with smiles, computed with  
AllChem.Compute2DCoords. Does the atom object have a method to get at  
those? Right now, I'm using some embarrassing workaround.

Thanks and best, Michael

Quoting Greg Landrum <gre...@gm...>:

> Hi Michael,
>
> On Tue, Aug 14, 2012 at 7:06 PM, Michael Palmer <mp...@uw...> wrote:
>>
>> it seems that nitrogen atoms in aromatic rings are being assigned
>> faulty valences. See example below:
>>
>>>>> from rdkit import Chem
>>>>> pyrrole = Chem.MolFromSmiles('C1=CNC=C1')
>>>>> for atom in pyrrole.GetAtoms():
>> ...  print atom.GetSymbol(), atom.GetExplicitValence(),
>> atom.GetNumImplicitHs()
>> ...
>> C 3 1
>> C 3 1
>> N 3 0
>> C 3 1
>> C 3 1
>>
>> The nitrogen should have a valence of two and one implicit hydrogen.
>
> This is confusing because of a bad choice of terminology that I made a
> long time ago.
>
> First the terminology bit: valence has to do with bond orders, degree
> has to do with neighbors. Most of the time (the bad exception is
> explained below), explicit refers to atoms that are in the graph and
> implicit refers to atoms that are not in the graph (i.e. Hs).
>
> So, given that the ring is aromatic, the explicit valence of each of
> the atoms (ignores the Hs that are not present in the graph) in
> pyrrole is 3:
>
>  In [6]: for atom in pyrrole.GetAtoms():
>     print atom.GetSymbol(), atom.GetExplicitValence()
>    ...:
> C 3
> C 3
> N 3
> C 3
> C 3
>
> If you want the H count, you should use GetTotalNumHs(); the total
> number of Hs for each atom is one:
>
> In [8]: for atom in pyrrole.GetAtoms():
>     print atom.GetSymbol(), atom.GetTotalNumHs()
>    ...:
> C 1
> C 1
> N 1
> C 1
> C 1
>
> Unfortunately, due to a bad decision from >10 years ago, there's some
> overlapping and confusing nomenclature around the use of the words
> "explicit" and "implicit" when it comes to Hs. When you specify the Hs
> for an atom inside of square brackets in the SMILES, it becomes an
> "explicit" H as far as atom.GetNumExplicitHs() is concerned.
>
> Here's an example:
>
> In [14]: pyrrole = Chem.MolFromSmiles('C1=CNC=C1')
>
> In [15]: mol1 = Chem.MolFromSmiles('C1=CNCC1')
>
> In [16]: mol2 = Chem.MolFromSmiles('C1=C[NH]CC1')
>
> In [17]: for atom in pyrrole.GetAtoms():
>     print atom.GetSymbol(), atom.GetNumExplicitHs(), atom.GetNumImplicitHs()
>    ....:
> C 0 1
> C 0 1
> N 1 0
> C 0 1
> C 0 1
>
> In [18]: for atom in mol1.GetAtoms():
>     print atom.GetSymbol(), atom.GetNumExplicitHs(), atom.GetNumImplicitHs()
>    ....:
> C 0 1
> C 0 1
> N 0 1
> C 0 2
> C 0 2
>
> In [19]: for atom in mol2.GetAtoms():
>     print atom.GetSymbol(), atom.GetNumExplicitHs(), atom.GetNumImplicitHs()
>    ....:
> C 0 1
> C 0 1
> N 1 0
> C 0 2
> C 0 2
>
> This is really exposing an implementation detail as far as the end
> user/client is concerned, it creates confusion, and it's not something
> that should be there. It definitely could be fixed, but I am afraid
> that it would break a lot of code.
>
> Now that I've at least tried to clear up what is going on, maybe I can
> be more helpful: was there a specific question you were trying to
> answer that led you to your discovery that the RDKit behaves strangely
> in this special case?
>
> Does this help?
> -greg
>

-- 
Michael Palmer
University of Waterloo
Waterloo, Canada
science.uwaterloo.ca/~mpalmer

Re: [Rdkit-discuss] Faulty valence for nitrogen in aromatic ring

Open-Source Cheminformatics and Machine Learning

Re: [Rdkit-discuss] Faulty valence for nitrogen in aromatic ring