Re: [Rdkit-discuss] Leaky Memory?
From: Greg L. <gre...@gm...> - 2014-06-11 09:02:15
On Wed, Jun 11, 2014 at 4:35 AM, Nicholas Firth <Nic...@ic...>
wrote:
> I want to show some numbers from a compatible fragmentation scheme to my
> own one. Which means generating all the leaves from the hierarchy and then
> doing some post-processing to merge these fragments. This isn't a problem
> on some of the more drug-like data sets; however, with ChEMBL this is
> causing me some stress.
>
If you're ok using BRICS instead of RECAP, you can do something like this:
In [24]: mol = Chem.MolFromSmiles('CC[C@H](C)[C@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](NC(=O)[C@@H](N)CCSC)[C@@H](C)O)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](C)C(=O)N[C@@H](Cc1c[nH]cn1)C(=O)N[C@@H](CC(=O)N)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](C)C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)NCC(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N2CCC[C@H]2C(=O)N3CCC[C@H]3C(=O)NCC(=O)N[C@@H](CO)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N')
In [25]: frags = Chem.GetMolFrags(Chem.FragmentOnBRICSBonds(mol),asMols=True)
In [26]: smis = set([Chem.MolToSmiles(x,True) for x in frags])
In [27]: len(smis)
Out[27]: 17
In [28]: smis
Out[28]:
{'[1*]C(=O)C[4*]',
'[1*]C(=O)[C@@H](N)CC[4*]',
'[1*]C(=O)[C@@H]([4*])C',
'[1*]C(=O)[C@@H]([4*])CC(C)C',
'[1*]C(=O)[C@@H]([4*])CC(N)=O',
'[1*]C(=O)[C@@H]([4*])CCC(N)=O',
'[1*]C(=O)[C@@H]([4*])CCCN=C(N)N',
'[1*]C(=O)[C@@H]([4*])CO',
'[1*]C(=O)[C@@H]([4*])C[8*]',
'[1*]C(=O)[C@H]([4*])[C@@H](C)CC',
'[1*]C(=O)[C@H]([4*])[C@@H](C)O',
'[1*]C([6*])=O',
'[11*]SC',
'[14*]c1c[nH]cn1',
'[4*][C@@H](CCCN=C(N)N)C(N)=O',
'[5*]N1CCC[C@@H]1[13*]',
'[5*]N[5*]'}
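
For a larger set such as ChEMBL, the same three calls can be wrapped in a loop that keeps only the set of fragment SMILES and never builds a hierarchy. This is a minimal sketch, not something from the original session: the function name `brics_leaf_smiles` is made up for illustration, and whether it actually keeps memory use acceptable on a data set of that size is an assumption that has not been measured here.

```python
from rdkit import Chem


def brics_leaf_smiles(smiles_iter):
    """Collect the unique BRICS fragment SMILES over many input SMILES.

    Hypothetical helper: only the RDKit calls come from the session above.
    """
    seen = set()
    for smi in smiles_iter:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # Unparseable input: skip it rather than abort the whole run.
            continue
        # Break every BRICS bond at once; dummy atoms mark the cut points.
        pieces = Chem.FragmentOnBRICSBonds(mol)
        for frag in Chem.GetMolFrags(pieces, asMols=True):
            # The second argument (True) keeps stereochemistry in the SMILES,
            # matching the MolToSmiles(x, True) call in the session.
            seen.add(Chem.MolToSmiles(frag, True))
    return seen
```

Because `smiles_iter` can be any iterable, it can be fed lazily, for example from a generator that reads one SMILES per line from a file, so that only one molecule and the growing set of strings are held at a time.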
Doing the same thing with the RECAP rules is not quite as trivial, but it
should be doable.
-greg