Re: [Rdkit-discuss] Capturing offending atom in error message
From: Greg L. <gre...@gm...> - 2019-10-07 04:53:09
This doesn't immediately help, but it's worth mentioning that the upcoming 2019.09 release has functionality that should help here:

    In [18]: m = Chem.MolFromSmiles('CN(C)(C)C',sanitize=False)

    In [19]: problems = Chem.DetectChemistryProblems(m)
    [06:47:43] Explicit valence for atom # 1 N, 4, is greater than permitted

    In [20]: len(problems)
    Out[20]: 1

    In [21]: problems[0].GetType()
    Out[21]: 'AtomValenceException'

    In [22]: problems[0].GetAtomIdx()
    Out[22]: 1

    In [23]: problems[0].Message()
    Out[23]: 'Explicit valence for atom # 1 N, 4, is greater than permitted'

    In [24]: m2 = Chem.MolFromSmiles('c1cncc1',sanitize=False)

    In [25]: problems = Chem.DetectChemistryProblems(m2)
    [06:48:19] Can't kekulize mol. Unkekulized atoms: 0 1 2 3 4

    In [26]: len(problems)
    Out[26]: 1

    In [27]: problems[0].GetType()
    Out[27]: 'KekulizeException'

    In [28]: problems[0].GetAtomIndices()
    Out[28]: (0, 1, 2, 3, 4)

    In [29]: problems[0].Message()
    Out[29]: "Can't kekulize mol. Unkekulized atoms: 0 1 2 3 4\n"

For your case, since you have Hs and bonds, I would suggest directly setting the charge on any 4-valent neutral nitrogen to +1. One thing to also check is how the representation you are using handles nitro groups.

-greg

On Fri, Oct 4, 2019 at 6:54 PM Chaya Stern <cha...@ch...> wrote:
> Hello all,
>
> I am trying to create a molecule from geometry (a numpy array n_atoms x
> 3), symbols (list of atom symbols) and a connectivity map (list of lists
> where each list is [atom_1_idx, atom_2_idx, bond_type]). The information
> also has all hydrogens.
> The following code works most of the time:
>
> from rdkit import Chem
> from rdkit.Geometry.rdGeometry import Point3D
>
> _BO_DISPATCH_TABLE = {1: Chem.BondType.SINGLE, 2: Chem.BondType.DOUBLE, 3: Chem.BondType.TRIPLE}
>
> conformer = Chem.Conformer(len(symbols))
>
> molecule = Chem.Mol()
> em = Chem.RWMol(molecule)
> for i, s in enumerate(symbols):
>     atom = em.AddAtom(Chem.Atom(cmiles.utils._symbols[s]))
>     atom_position = Point3D(geometry[i][0], geometry[i][1], geometry[i][2])
>     conformer.SetAtomPosition(atom, atom_position)
>
> # Add connectivity
> for bond in connectivity:
>     bond_type = _BO_DISPATCH_TABLE[bond[-1]]
>     em.AddBond(bond[0], bond[1], bond_type)
>
> molecule = em.GetMol()
> Chem.SanitizeMol(molecule)
>
> However, if a molecule has a tetravalent nitrogen, the data that I have
> does not have the explicit formal charge for each atom, so I get the
> following error:
>
> ValueError: Sanitization error: Explicit valence for atom # 0 N, 4, is greater than permitted
>
> Given that I have all the hydrogens and the total charge of the molecule, I can go in and add the charge to the problematic nitrogen and check that the total charge is still the same. But I am not sure how to capture the offending atom instance. I can get the information from parsing the error message (which is the hack I use now), but I was wondering if there is a better way to do it.
>
> Thank you,
>
> Chaya
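A minimal sketch of how the `DetectChemistryProblems` output could replace the error-message parsing, assuming RDKit 2019.09 or later: build the molecule without sanitizing, ask RDKit which atoms fail the valence check, put a +1 formal charge on the ones that are neutral nitrogens, then sanitize. The helper names `mol_from_graph` and `charge_tetravalent_nitrogens` are hypothetical. Unlike the quoted snippet, it uses `Chem.Atom(s)` (which accepts an element symbol directly) instead of the `cmiles.utils._symbols` lookup, and it attaches the conformer with `AddConformer`, a step the quoted code leaves out.

```python
from rdkit import Chem
from rdkit.Geometry import Point3D

# Bond orders as given in the [atom_1_idx, atom_2_idx, bond_type] connectivity lists.
_BOND_TYPES = {1: Chem.BondType.SINGLE, 2: Chem.BondType.DOUBLE, 3: Chem.BondType.TRIPLE}


def mol_from_graph(symbols, connectivity, geometry=None):
    """Hypothetical helper: build an unsanitized Mol from symbols, bonds and
    (optionally) an n_atoms x 3 array of coordinates."""
    em = Chem.RWMol()
    for s in symbols:
        em.AddAtom(Chem.Atom(s))  # Chem.Atom accepts an element symbol
    for i, j, order in connectivity:
        em.AddBond(int(i), int(j), _BOND_TYPES[int(order)])
    if geometry is not None:
        conf = Chem.Conformer(len(symbols))
        for idx, (x, y, z) in enumerate(geometry):
            conf.SetAtomPosition(idx, Point3D(float(x), float(y), float(z)))
        em.AddConformer(conf, assignId=True)  # attach the coordinates to the molecule
    return em.GetMol()


def charge_tetravalent_nitrogens(mol):
    """Hypothetical helper: set +1 on every neutral N that RDKit reports as
    over-valent. Returns the indices of the atoms that were changed."""
    changed = []
    for problem in Chem.DetectChemistryProblems(mol):
        if problem.GetType() != 'AtomValenceException':
            continue  # e.g. kekulization problems are left alone
        atom = mol.GetAtomWithIdx(problem.GetAtomIdx())
        if atom.GetAtomicNum() == 7 and atom.GetFormalCharge() == 0:
            atom.SetFormalCharge(1)
            changed.append(atom.GetIdx())
    return changed
```

After `Chem.SanitizeMol(mol)` succeeds, `Chem.GetFormalCharge(mol)` gives the net charge to compare against the known total charge. Over-valent atoms other than neutral nitrogen are deliberately skipped, so they will still make sanitization fail rather than being silently "fixed".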
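If moving to 2019.09 isn't an option, the charges can be assigned before RDKit ever sees the molecule, since the connectivity already carries bond orders and every hydrogen is an explicit atom: sum the bond orders around each atom, give every nitrogen with a valence of 4 a +1 charge, and check the result against the known total charge. A minimal, dependency-free sketch; the function name `assign_nitrogen_charges` and its arguments are hypothetical and assume the input layout described in the quoted message.

```python
from collections import defaultdict


def assign_nitrogen_charges(symbols, connectivity, total_charge):
    """Hypothetical helper: return per-atom formal charges with +1 on each
    nitrogen whose bond orders sum to 4 (all hydrogens must be explicit atoms).

    Raises ValueError if the assigned charges do not add up to total_charge.
    """
    # Sum of bond orders per atom, from [atom_1_idx, atom_2_idx, bond_order] entries.
    valence = defaultdict(int)
    for i, j, order in connectivity:
        valence[i] += order
        valence[j] += order

    charges = [0] * len(symbols)
    for idx, symbol in enumerate(symbols):
        if symbol == 'N' and valence[idx] == 4:
            charges[idx] = 1

    # Cross-check against the known net charge of the molecule.
    if sum(charges) != total_charge:
        raise ValueError(
            f"assigned charges sum to {sum(charges)}, expected {total_charge}"
        )
    return charges
```

Each returned charge can be applied with `mol.GetAtomWithIdx(i).SetFormalCharge(c)` before calling `Chem.SanitizeMol`. The total-charge check also catches the nitro caveat: a charge-separated nitro nitrogen (bond orders 2 + 1 + 1) gets +1 here while its singly bonded oxygen stays neutral, so the sum no longer matches. A pentavalent N(=O)=O nitro group is not touched at all and will still fail sanitization, so nitro groups need a rule of their own.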