Re: [Rdkit-discuss] can't kekulize molecule
From: Francois B. <ber...@bi...> - 2017-08-16 07:55:55
On 08/16/2017 03:36 PM, Greg Landrum wrote:
> Hi Shuai,
>
> The RDKit Mol2 parser is really only validated for the atom types
> generated by corina. I'm not surprised that the output from open babel
> would not be understood. This is documented:
> http://rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File

It would be really nice if open babel MOL2 output could be read directly
by rdkit. I often find myself running

$ obabel in.mol2 -O out.sdf

just for that purpose.

> An aside: If you have an SDF file you can read that directly into the
> RDKit. It seems like you shouldn't need the openbabel translation step
> at all.
>
> -greg
>
>
> On Wed, Aug 16, 2017 at 12:13 AM, David Liu <sdh...@gm...
> <mailto:sdh...@gm...>> wrote:
>
> Dear all,
>
> I am having trouble kekulizing a molecule with rdkit. Below is an example.
>
> The example.mol2 file looks like this:
>
> @<TRIPOS>MOLECULE
> example
> 46 49 0 0 0
> SMALL
> GASTEIGER
>
> @<TRIPOS>ATOM
> 1 C -4.5556 -0.2844 1.1718 C.3 1 LIG1 -0.0109
> 2 C -6.0291 -0.7271 1.2334 C.3 1 LIG1 0.0493
> 3 C -6.4413 -0.5958 -1.0493 C.3 1 LIG1 0.0493
> 4 C -5.1977 0.3130 -1.1927 C.3 1 LIG1 -0.0109
> 5 C 5.5992 -2.5640 -0.8780 C.ar 1 LIG1 -0.0253
> 6 O -6.3822 -1.4588 0.0764 O.3 1 LIG1 -0.3796
> 7 C 2.8943 1.6722 0.9911 C.ar 1 LIG1 0.2664
> 8 C 5.1745 -2.0407 0.3480 C.ar 1 LIG1 0.1371
> 9 C -1.6179 0.4017 0.1577 C.ar 1 LIG1 0.2173
> 10 C -4.0573 -0.1702 -0.2838 C.3 1 LIG1 0.0275
> 11 C 0.8767 -0.2307 1.1489 C.ar 1 LIG1 0.0370
> 12 C 2.1438 -0.5325 1.6439 C.ar 1 LIG1 -0.0306
> 13 C 6.1958 -1.7294 -1.8279 C.ar 1 LIG1 -0.0590
> 14 C 6.3717 -0.3702 -1.5525 C.ar 1 LIG1 -0.0605
> 15 C 5.9487 0.1564 -0.3282 C.ar 1 LIG1 -0.0452
> 16 C 0.6358 1.0320 0.5744 C.ar 1 LIG1 0.1483
> 17 C -0.1716 -1.1537 1.2042 C.ar 1 LIG1 0.0418
> 18 C 3.1618 0.4153 1.5592 C.ar 1 LIG1 0.0780
> 19 C 5.3424 -0.6749 0.6231 C.ar 1 LIG1 0.0480
> 20 C 1.3530 3.2786 -0.1013 C.3 1 LIG1 0.0167
> 21 F 4.6032 -2.8623 1.2640 F 1 LIG1 -0.2043
> 22 S 4.7969 0.0115 2.1898 S.3 1 LIG1 -0.0812
> 23 N -1.3906 -0.8211 0.7091 N.ar 1 LIG1 -0.2222
> 24 O 3.8206 2.5277 0.9363 O.2 1 LIG1 -0.2664
> 25 N 1.6412 1.9659 0.5033 N.ar 1 LIG1 -0.2949
> 26 N -0.6088 1.3106 0.0937 N.ar 1 LIG1 -0.1964
> 27 N -2.9091 0.7394 -0.3655 N.pl3 1 LIG1 -0.3104
> 28 H -3.9262 -1.0225 1.7144 H 1 LIG1 0.0305
> 29 H -4.4544 0.6942 1.6907 H 1 LIG1 0.0305
> 30 H -6.1785 -1.3738 2.1237 H 1 LIG1 0.0560
> 31 H -6.6965 0.1565 1.3647 H 1 LIG1 0.0560
> 32 H -7.3658 0.0220 -1.0063 H 1 LIG1 0.0560
> 33 H -6.5227 -1.2302 -1.9574 H 1 LIG1 0.0560
> 34 H -4.8575 0.3261 -2.2513 H 1 LIG1 0.0305
> 35 H -5.4753 1.3532 -0.9112 H 1 LIG1 0.0305
> 36 H 5.4676 -3.6168 -1.0922 H 1 LIG1 0.0646
> 37 H -3.7461 -1.1771 -0.6436 H 1 LIG1 0.0500
> 38 H 2.3428 -1.4998 2.0895 H 1 LIG1 0.0638
> 39 H 6.5237 -2.1362 -2.7758 H 1 LIG1 0.0618
> 40 H 6.8363 0.2748 -2.2870 H 1 LIG1 0.0618
> 41 H 6.0904 1.2094 -0.1219 H 1 LIG1 0.0630
> 42 H -0.0243 -2.1352 1.6372 H 1 LIG1 0.0838
> 43 H 2.2342 3.9528 -0.1073 H 1 LIG1 0.0457
> 44 H 0.5450 3.7853 0.4685 H 1 LIG1 0.0457
> 45 H 1.0258 3.1432 -1.1544 H 1 LIG1 0.0457
> 46 H -3.0166 1.6655 -0.8392 H 1 LIG1 0.1492
> @<TRIPOS>BOND
> 1 1 2 1
> 2 1 10 1
> 3 2 6 1
> 4 3 4 1
> 5 3 6 1
> 6 4 10 1
> 7 5 8 ar
> 8 5 13 ar
> 9 7 18 ar
> 10 7 24 2
> 11 7 25 ar
> 12 8 19 ar
> 13 8 21 1
> 14 9 23 ar
> 15 9 26 ar
> 16 9 27 1
> 17 10 27 1
> 18 11 12 ar
> 19 11 16 ar
> 20 11 17 ar
> 21 12 18 ar
> 22 13 14 ar
> 23 14 15 ar
> 24 15 19 ar
> 25 16 25 ar
> 26 16 26 ar
> 27 17 23 ar
> 28 18 22 1
> 29 19 22 1
> 30 20 25 1
> 31 1 28 1
> 32 1 29 1
> 33 2 30 1
> 34 2 31 1
> 35 3 32 1
> 36 3 33 1
> 37 4 34 1
> 38 4 35 1
> 39 5 36 1
> 40 10 37 1
> 41 12 38 1
> 42 13 39 1
> 43 14 40 1
> 44 15 41 1
> 45 17 42 1
> 46 20 43 1
> 47 20 44 1
> 48 20 45 1
> 49 27 46 1
>
> And the example.py code looks like this:
>
> from rdkit.Chem import AllChem
> from rdkit import Chem
>
> rdkit_mol = Chem.MolFromMol2File("example.mol2", sanitize=False,
> removeHs=False)
> mol = AllChem.RemoveHs(rdkit_mol)
>
> Running example.py returns the following error:
>
> ValueError: Sanitization error: Can't kekulize mol. Unkekulized
> atoms: 8 10 11 15 16 17 22 24 25
>
> It seems rdkit cannot understand the molecule when it tries to remove
> the hydrogens, probably because of the format of the mol2 file I
> used here. I used openbabel to convert the mol2 file from an sdf
> file. So I wonder whether there is a plan to parse mol2 files like
> this, or whether I need to process the mol2 file further. I would
> appreciate any advice!
>
>
> Thanks,
>
> Shuai
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> <mailto:Rdk...@li...>
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> <https://lists.sourceforge.net/lists/listinfo/rdkit-discuss>
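
For anyone scripting the workaround discussed in this message, here is a minimal sketch in Python. It wraps the `obabel in.mol2 -O out.sdf` command from the reply and then reads the resulting SDF with RDKit, as Greg suggests. The function names (`obabel_command`, `mol2_to_sdf`, `read_sdf`) are hypothetical helpers invented for illustration, not part of RDKit or Open Babel, and the sketch assumes the `obabel` executable is on the PATH and that RDKit is installed.

```python
import shutil
import subprocess
from pathlib import Path


def obabel_command(mol2_path, sdf_path=None):
    """Build the `obabel in.mol2 -O out.sdf` command quoted in the thread.

    Hypothetical helper: if no output path is given, the input name is
    reused with an .sdf suffix.
    """
    src = Path(mol2_path)
    dst = Path(sdf_path) if sdf_path is not None else src.with_suffix(".sdf")
    return ["obabel", str(src), "-O", str(dst)]


def mol2_to_sdf(mol2_path, sdf_path=None):
    """Run the conversion; assumes the obabel executable is on PATH."""
    cmd = obabel_command(mol2_path, sdf_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("obabel executable not found on PATH")
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


def read_sdf(sdf_path):
    """Read molecules from an SDF file with RDKit, keeping hydrogens.

    RDKit is imported here so the helpers above work without it.
    Entries that RDKit fails to parse come back as None and are skipped.
    """
    from rdkit import Chem

    supplier = Chem.SDMolSupplier(str(sdf_path), removeHs=False)
    return [mol for mol in supplier if mol is not None]


if __name__ == "__main__":
    # Only prints the command; call mol2_to_sdf() to actually convert.
    print(" ".join(obabel_command("in.mol2", "out.sdf")))
```

If the original SDF file is still available, calling `read_sdf` on it directly avoids the Open Babel round trip altogether, which is the point of Greg's aside.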