Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules
From: Greg L. <gre...@gm...> - 2017-02-21 06:36:25
On Mon, Feb 20, 2017 at 6:17 PM, Thomas Evangelidis <te...@gm...> wrote:

> Thank you for your useful hints. All the compounds that I want to align
> are supposed to belong to the same analogue series so they should have a
> common substructure with substantial size.

In that case, using an MCS-based alignment should work reasonably well, particularly if you do the MCS of the entire series instead of doing it pairwise. The approach there would be to find the MCS and then use the RDKit's RMSD-based alignment (AllChem.AlignMol), where you provide the atomMap argument to specify the atom-to-atom mapping for the alignment. Here's a short example of how to do that:

    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem import rdFMCS

    # mols is your list of RDKit molecules for the series
    # generate 3d structures (in case you don't have them already):
    mhs = [Chem.AddHs(x) for x in mols]
    for x in mhs:
        AllChem.EmbedMolecule(x, AllChem.ETKDG())
    mols = [Chem.RemoveHs(x) for x in mhs]

    # Find the MCS:
    mcs = rdFMCS.FindMCS(mols, threshold=0.8, completeRingsOnly=True,
                         ringMatchesRingOnly=True)

    # align everything to the first molecule
    patt = Chem.MolFromSmarts(mcs.smartsString)
    refMol = mols[0]
    refMatch = refMol.GetSubstructMatch(patt)
    rmsVs = []
    for probeMol in mols[1:]:
        mv = probeMol.GetSubstructMatch(patt)
        if not mv:
            # threshold=0.8 lets the MCS miss some molecules; skip those
            rmsVs.append(None)
            continue
        rms = AllChem.AlignMol(probeMol, refMol, atomMap=list(zip(mv, refMatch)))
        rmsVs.append(rms)

> What I want to emulate is the "core restrained docking" with glide, where
> you specify the common core of the query and the reference ligand using a
> SMARTS pattern and then glide docks the query compound to the binding
> pocket but takes care to overlay the core atoms of the query to the core
> atoms of the reference compound. Since RDKit does not do docking, I just
> generate 30 conformers of each query compound and select the best one by
> measuring the RMSD between the core of the query and the core of the
> reference after the alignment. Of course the conformations of the core
> atoms between the query and the reference are never identical hence the bad
> alignment. Is there any smarter way to emulate the "core restrained
> docking" with RDKit?

The docking part is not doable in any straightforward way at the moment since it's hard to take information about the protein into account. There's an idea for a student summer project to solve this problem floating around; let's see if that gets funded and we find the right student.

If the goal is to generate a set of conformations where the cores are aligned with each other, this blog post may be interesting:
http://rdkit.blogspot.ch/2013/12/using-allchemconstrainedembed.html

-greg
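
The conformer-selection workflow described in the question (generate a set of conformers for the query, align each one on the core, keep the one with the lowest core RMSD) can be combined with the AlignMol atomMap approach. What follows is only a minimal sketch: the helper names embed_reference and best_core_conformer, the random seed, and the example SMILES are made up for illustration, and the core is taken to be the pairwise MCS of the reference and the query rather than a hand-written SMARTS.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdFMCS, rdMolAlign


def embed_reference(smiles, seed=42):
    """Hypothetical helper: heavy-atom reference molecule with one 3D conformer."""
    mol_h = Chem.AddHs(Chem.MolFromSmiles(smiles))
    if AllChem.EmbedMolecule(mol_h, randomSeed=seed) < 0:
        raise ValueError("embedding failed for " + smiles)
    return Chem.RemoveHs(mol_h)


def best_core_conformer(ref, probe, n_confs=30, seed=42):
    """Hypothetical helper: embed n_confs conformers of probe, align each
    to ref on the MCS atoms, and return the conformer with the lowest RMSD.

    ref needs a 3D conformer; both molecules should be heavy-atom only so
    that hydrogens do not end up in the MCS.
    """
    mcs = rdFMCS.FindMCS([ref, probe], completeRingsOnly=True,
                         ringMatchesRingOnly=True)
    if mcs.numAtoms == 0:
        raise ValueError("no common substructure found")
    patt = Chem.MolFromSmarts(mcs.smartsString)
    ref_match = ref.GetSubstructMatch(patt)
    probe_match = probe.GetSubstructMatch(patt)
    # (probe atom index, reference atom index) pairs for the core atoms
    atom_map = list(zip(probe_match, ref_match))

    # AddHs appends hydrogens after the heavy atoms, so the heavy-atom
    # indices in atom_map remain valid for probe_h.
    probe_h = Chem.AddHs(probe)
    conf_ids = AllChem.EmbedMultipleConfs(probe_h, numConfs=n_confs,
                                          randomSeed=seed)

    best_cid, best_rms = None, float("inf")
    for cid in conf_ids:
        # AlignMol moves conformer cid onto ref and returns the core RMSD
        rms = rdMolAlign.AlignMol(probe_h, ref, prbCid=cid, atomMap=atom_map)
        if rms < best_rms:
            best_cid, best_rms = cid, rms
    return probe_h, best_cid, best_rms, len(atom_map)


if __name__ == "__main__":
    ref = embed_reference("c1ccccc1CCN")        # illustrative reference
    probe = Chem.MolFromSmiles("c1ccccc1CCO")   # illustrative query
    mol, cid, rms, n_core = best_core_conformer(ref, probe, n_confs=30)
    print("best conformer", cid, "core atoms", n_core, "RMSD", round(rms, 3))
```

This only picks the best of whatever conformers happen to be generated, so the core geometry will still differ somewhat from the reference; the constrained-embedding approach in the blog post avoids that by fixing the core coordinates during embedding.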