Re: [Rdkit-discuss] how to output multiple Kekule structures
From: Jason B. <jas...@gm...> - 2017-09-11 21:38:09
But keep in mind that the kekulized mols you create with the resonance supplier will not match the SMARTS patterns given.

    Chem.MolToSmiles(mol2, kekuleSmiles=True)
    > 'C1C=CC=CC=1'
    mol2.HasSubstructMatch(Chem.MolFromSmarts('[C]=[C]-[C]'))
    > False
    mol2.HasSubstructMatch(Chem.MolFromSmarts('[c]=[c]-[c]'))
    > True

The atoms apparently keep their aromatic flags even though the bonds are written as single and double, so [C] (aliphatic carbon) fails where [c] (aromatic carbon) matches. So at the very least, you need to change the SMARTS strings to use [#6] (any carbon) instead of [C].

Jason Biggs

On Mon, Sep 11, 2017 at 2:53 PM, Paolo Tosco <pao...@un...> wrote:

> Hi Jim,
>
> you can indeed enumerate all Kekulé structures for a molecule within the
> RDKit using Chem.ResonanceMolSupplier():
>
> from rdkit import Chem
>
> mol = Chem.MolFromSmiles('c1ccccc1')
>
> suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)
>
> len(suppl)
>
> 2
>
> for i in range(len(suppl)):
>     print (Chem.MolToSmiles(suppl[i], kekuleSmiles=True))
>
> C1C=CC=CC=1
> C1=CC=CC=C1
>
> Best,
> Paolo
>
>
> On 09/11/2017 05:22 PM, James T. Metz via Rdkit-discuss wrote:
>
> Greg,
>
> Thanks! Yes, very helpful. I will need to digest the detailed information
> you have provided. I am somewhat familiar with recursive SMARTS. Thanks
> again.
>
> Regards,
> Jim Metz
>
>
> -----Original Message-----
> From: Greg Landrum <gre...@gm...>
> To: James T. Metz <jam...@ao...>
> Cc: RDKit Discuss <rdk...@li...>
> Sent: Mon, Sep 11, 2017 11:15 am
> Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures
>
>
> On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz <jam...@ao...> wrote:
>
> Greg,
>
> I need to be able to use SMARTS patterns to identify substructures in
> molecules that can be aromatic, and I need to be able to handle cases
> where there can be differences in the way that the molecule was entered
> or drawn by a user.
>
>
> That particular problem is a big part of the reason that we tend to use
> the aromatic representation of things.
>
>
> For example, consider the following alkenyl-substituted pyridine, there
> are two possible Kekule structures
>
> m1 = 'C=CC1=NC=CC=C1'
> m2 = 'C=CC1N=CC=CC1'
>
>
> Fixing what I assume is a typo for m2, I can do the following:
>
> In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')
>
> In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')
>
> In [13]: q1 = Chem.MolFromSmarts('cccc')
>
> In [14]: q2 = Chem.MolFromSmarts('cccn')
>
> In [15]: list(m1.GetSubstructMatch(q1))
> Out[15]: [2, 7, 6, 5]
>
> In [16]: list(m1.GetSubstructMatch(q2))
> Out[16]: [6, 5, 4, 3]
>
> In [17]: list(m2.GetSubstructMatch(q1))
> Out[17]: [2, 7, 6, 5]
>
> In [18]: list(m2.GetSubstructMatch(q2))
> Out[18]: [6, 5, 4, 3]
>
>
> Those particular queries were going for the aromatic species and will only
> match inside the ring, but if you want to be more generic you could tune
> your queries like this:
>
> In [28]: q3 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]-=,:[*])]')
>
> In [29]: q4 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#7;$([#7]-=,:[*])]')
>
> In [30]: list(m1.GetSubstructMatch(q3))
> Out[30]: [0, 1, 2, 7]
>
> In [31]: list(m1.GetSubstructMatch(q4))
> Out[31]: [0, 1, 2, 3]
>
> In [32]: list(m2.GetSubstructMatch(q3))
> Out[32]: [0, 1, 2, 7]
>
> In [33]: list(m2.GetSubstructMatch(q4))
> Out[33]: [0, 1, 2, 3]
>
> If you aren't familiar with recursive SMARTS, this construct:
> "[#6;$([#6]=,:[*])]" means "a carbon that has either a double bond or an
> aromatic bond to another atom". So you can interpret q3 as "four carbons
> that each have either a double or aromatic bond and that are connected to
> each other by single, double, or aromatic bonds".
>
> Is this starting to approximate what you're looking for?
> -greg
>
>
> Now consider two SMARTS
>
> pattern1 = '[C]=[C]-[C]=[C]'
> pattern2 = '[C]=[C]-[C]=[N]'
>
> I need to be able to detect the existence of each pattern in the
> molecule
>
> If m1 is the only available generated Kekule structure, then pattern2
> will be recognized.
> If m2 is the only available generated Kekule structure, then pattern1
> will be recognized.
>
> Hence, I am getting different answers for the same input molecule just
> because it was drawn in different Kekule structures.
>
> Regards,
> Jim Metz
>
>
> -----Original Message-----
> From: Greg Landrum <gre...@gm...>
> To: James T. Metz <jam...@ao...>
> Cc: RDKit Discuss <rdk...@li...>
> Sent: Mon, Sep 11, 2017 10:31 am
> Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures
>
> Hi Jim,
>
> The code currently has no way to enumerate Kekule structures. I don't
> recall this coming up in the past and, to be honest, it doesn't seem all
> that generally useful.
>
> Perhaps there's an alternate way to solve the problem; what are you trying
> to do?
>
> -greg
>
>
> On Mon, Sep 11, 2017 at 5:04 PM, James T. Metz via Rdkit-discuss <
> rdk...@li...> wrote:
>
> Hello,
>
> Suppose I read in an aromatic SMILES e.g., for benzene
>
> c1ccccc1
>
> I would like to generate the major canonical resonance forms
> and save the results as two separate molecules. Essentially
> I am trying to generate
>
> m1 = 'C1=CC=CC-C1'
> m2 = 'C1C=CC=CC1'
>
> Can this be done in RDKit? I have found a KEKULE_ALL
> option in the detailed documentation which seems to be what I
> am trying to do, but I don't understand how this option is to be used,
> or the proper syntax.
>
> If it is necessary to somehow renumber the atoms and re-generate
> Kekule structures, that is OK. Thank you.
>
> Regards,
> Jim Metz
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
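
As a rough illustration of why Chem.KEKULE_ALL returns two structures for benzene: drawing a Kekulé structure amounts to pairing up the ring atoms so that each atom takes part in exactly one double bond. The sketch below enumerates those pairings in plain Python. It is not RDKit's implementation; the function name kekule_structures and the bond-list input are made up for this example, and it assumes every atom in the system needs exactly one double bond (true for benzene or pyridine, not for pyrrole-type nitrogens).

```python
def kekule_structures(bonds):
    """Enumerate double-bond placements where each atom gets exactly one.

    bonds: list of (i, j) atom-index pairs in the conjugated ring system.
    Returns a list of frozensets, each holding the bonds drawn as double.
    Hypothetical helper for illustration only, not an RDKit function.
    """
    atoms = sorted({a for bond in bonds for a in bond})
    neighbours = {a: [] for a in atoms}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    results = []

    def extend(unmatched, doubles):
        # All atoms paired: one complete Kekulé structure found.
        if not unmatched:
            results.append(frozenset(doubles))
            return
        # The lowest unpaired atom must pair with one of its free neighbours,
        # so each structure is generated exactly once.
        first = min(unmatched)
        for other in neighbours[first]:
            if other in unmatched:
                pair = (min(first, other), max(first, other))
                extend(unmatched - {first, other}, doubles + [pair])

    extend(frozenset(atoms), [])
    return results


# Benzene: six atoms in a ring, atom i bonded to atom (i + 1) % 6.
benzene = [(i, (i + 1) % 6) for i in range(6)]
forms = kekule_structures(benzene)
for form in forms:
    print(sorted(form))
# [(0, 1), (2, 3), (4, 5)]
# [(0, 5), (1, 2), (3, 4)]
```

Each result lists the bonds drawn as double. The two results for benzene correspond to the SMILES C1=CC=CC=C1 and C1C=CC=CC=1 in Paolo's output, although the order in which RDKit returns them may differ.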