Re: [Rdkit-discuss] Conformer generation
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Greg L. <gre...@gm...> - 2017-10-26 15:18:29
|
I was about to reply "yeah, the first ID is the cluster centroid", but that may not be true for all definitions of "centroid". According to the Butina paper, the first point added is considered to be the centroid. The definition of that is that all the other points in the cluster are within the exclusion distance of the first point. It does not necessarily mean that its the point in the cluster that would satisfy a geometric definition of centroid. But that's a detail. According to the definitions of the algorithm, the first point is the centroid. :-) -greg On Thu, Oct 26, 2017 at 4:51 PM, Paul Hawkins <pha...@ey...> wrote: > Thanks Greg. In the results from the Butina clustering is the cluster > centre the first ID in the list? If so then to recover the cluster centres > I’d just need to do something like: > > > > For c in cs: > > conf_id_I_want = c[0] > > > > > > Paul. > > > > *From:* Greg Landrum [mailto:gre...@gm...] > *Sent:* Thursday, October 26, 2017 12:41 AM > *To:* Paul Hawkins <pha...@ey...> > *Cc:* rdk...@li...; Sereina < > ser...@gm...> > *Subject:* Re: [Rdkit-discuss] Conformer generation > > > > > > > > On Wed, Oct 25, 2017 at 6:52 PM, Sereina <ser...@gm...> > wrote: > > Hi Paul, > > > > Regarding your second question: > > > > On 25 Oct 2017, at 18:36, Paul Hawkins <pha...@ey...> wrote: > > > > Also, once I generate the conformers what is best way to cluster them by > RMSD so that each conformer has a minimum RMSD to all the others in the set? > > > > I think the function AllChem.GetConformerRMSMatrix() might do (parts of) > what you want. > > > > And since we're doing the "each reply refines the answer" thing, here's a > bit of code that does Butina clustering to group the conformers: > > > > In [71]: m = Chem.AddHs(Chem.MolFromSmiles('OCCc1ncccc1CCCC')) > > > > In [72]: AllChem.EmbedMultipleConfs(m,50,AllChem.ETKDG()) > > Out[72]: <rdkit.rdBase._vecti at 0x11630ac90> > > > > In [73]: dm = AllChem.GetConformerRMSMatrix(m) > > > > In [76]: cs = Butina.ClusterData(dm,m.GetNumConformers(),1.5,isDistData= > True,reordering=True) > > > > In [77]: len(cs) > > Out[77]: 9 > > > > In [78]: for c in cs: > > ...: print(c) > > ...: > > (36, 2, 3, 4, 7, 8, 9, 11, 14, 15, 16, 17, 21, 29, 30, 33, 34, 40, 43, 44, > 46, 47) > > (10, 0, 32, 35, 38, 45, 49, 18, 19, 22, 23, 25, 27) > > (48, 6, 39, 42, 12, 24, 26) > > (5, 41, 31) > > (37,) > > (28,) > > (20,) > > (13,) > > (1,) > > > > The 1.5 argument in the call to Butina.ClusterData is the distance > threshold for things to be considered neighbors. > > > > -greg > > > |