Re: [Rdkit-discuss] Faulty valence for nitrogen in aromatic ring
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Michael P. <mp...@uw...> - 2012-08-15 15:10:14
|
Hi Greg, thanks for your prompt and thorough reply. Wow! > Now that I've at least tried to clear up what is going on, maybe I can > be more helpful: was there a specific question you were trying to > answer that led you to your discovery that the RDKit behaves strangely > in this special case? What I'm trying to do can be inspected here: http://chimpsky.uwaterloo.ca/mol2chemfig/index Briefly, I'm building a program for converting molecular structures from smiles or molfile format to TeX code, using the syntax defined by the chemfig package as the target. rdkit does all the heavy lifting. I was using the GetImplicitHs method to determine how many hydrogens to attach to carbons and heteroatoms and then noticed that the number of hydrogens on nitrogen in rings was off. From your answer, it seems I should be using GetTotalNumHs. However, I would still like to be able to distinguish between hydrogens that were specified in a molfile, with coordinates, and those that weren't. Another question I ran into was accessing the coordinates of an atom, either loaded from molfile or, with smiles, computed with AllChem.Compute2DCoords. Does the atom object have a method to get at those? Right now, I'm using some embarrassing workaround. Thanks and best, Michael Quoting Greg Landrum <gre...@gm...>: > Hi Michael, > > On Tue, Aug 14, 2012 at 7:06 PM, Michael Palmer <mp...@uw...> wrote: >> >> it seems that nitrogen atoms in aromatic rings are being assigned >> faulty valences. See example below: >> >>>>> from rdkit import Chem >>>>> pyrrole = Chem.MolFromSmiles('C1=CNC=C1') >>>>> for atom in pyrrole.GetAtoms(): >> ... print atom.GetSymbol(), atom.GetExplicitValence(), >> atom.GetNumImplicitHs() >> ... >> C 3 1 >> C 3 1 >> N 3 0 >> C 3 1 >> C 3 1 >> >> The nitrogen should have a valence of two and one implicit hydrogen. > > This is confusing because of a bad choice of terminology that I made a > long time ago. > > First the terminology bit: valence has to do with bond orders, degree > has to do with neighbors. Most of the time (the bad exception is > explained below), explicit refers to atoms that are in the graph and > implicit refers to atoms that are not in the graph (i.e. Hs). > > So, given that the ring is aromatic, the explicit valence of each of > the atoms (ignores the Hs that are not present in the graph) in > pyrrole is 3: > > In [6]: for atom in pyrrole.GetAtoms(): > print atom.GetSymbol(), atom.GetExplicitValence() > ...: > C 3 > C 3 > N 3 > C 3 > C 3 > > If you want the H count, you should use GetTotalNumHs(); the total > number of Hs for each atom is one: > > In [8]: for atom in pyrrole.GetAtoms(): > print atom.GetSymbol(), atom.GetTotalNumHs() > ...: > C 1 > C 1 > N 1 > C 1 > C 1 > > Unfortunately, due to a bad decision from >10 years ago, there's some > overlapping and confusing nomenclature around the use of the words > "explicit" and "implicit" when it comes to Hs. When you specify the Hs > for an atom inside of square brackets in the SMILES, it becomes an > "explicit" H as far as atom.GetNumExplicitHs() is concerned. > > Here's an example: > > In [14]: pyrrole = Chem.MolFromSmiles('C1=CNC=C1') > > In [15]: mol1 = Chem.MolFromSmiles('C1=CNCC1') > > In [16]: mol2 = Chem.MolFromSmiles('C1=C[NH]CC1') > > In [17]: for atom in pyrrole.GetAtoms(): > print atom.GetSymbol(), atom.GetNumExplicitHs(), atom.GetNumImplicitHs() > ....: > C 0 1 > C 0 1 > N 1 0 > C 0 1 > C 0 1 > > In [18]: for atom in mol1.GetAtoms(): > print atom.GetSymbol(), atom.GetNumExplicitHs(), atom.GetNumImplicitHs() > ....: > C 0 1 > C 0 1 > N 0 1 > C 0 2 > C 0 2 > > In [19]: for atom in mol2.GetAtoms(): > print atom.GetSymbol(), atom.GetNumExplicitHs(), atom.GetNumImplicitHs() > ....: > C 0 1 > C 0 1 > N 1 0 > C 0 2 > C 0 2 > > This is really exposing an implementation detail as far as the end > user/client is concerned, it creates confusion, and it's not something > that should be there. It definitely could be fixed, but I am afraid > that it would break a lot of code. > > Now that I've at least tried to clear up what is going on, maybe I can > be more helpful: was there a specific question you were trying to > answer that led you to your discovery that the RDKit behaves strangely > in this special case? > > Does this help? > -greg > -- Michael Palmer University of Waterloo Waterloo, Canada science.uwaterloo.ca/~mpalmer |