Re: [Rdkit-discuss] multiprocessing & rdkit
From: Greg L. <gre...@gm...> - 2011-10-11 14:23:33
Hi Paul,
On Tue, Oct 11, 2011 at 7:55 AM, <Pau...@me...> wrote:
>
> Dear RDkitters,
>
> I'm trying to use Python's multiprocessing module in conjunction with
> RDKit.
>
> It should be applied in 2 cases:
> (1) fingerprint calculation
> &
> (2) Picking Diverse Molecules
>
>
>
> (1)
> "
> from multiprocessing import Pool
> p4 = Pool(processes=4)
> def fps_calc(m):
>     fps = [GetMorganFingerprint(x,3) for x in m]
>     return fps
> fps = p4.map(fps_calc,ms)
> "
> ==>
> "TypeError: 'Mol' object is not iterable"
I think what you want to do here is:
#-------------------
def fps_calc(m):
    # m is a single molecule: map passes one element of ms per call
    fp = GetMorganFingerprint(m,3)
    return fp
fps = p4.map(fps_calc,ms)
#-------------------
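The map method takes a function and a sequence of objects, and applies
that function to each object in the sequence. Your fps_calc therefore
receives one Mol per call, not the whole list, which is why iterating
over m fails.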
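To make the one-element-per-call behaviour concrete, here is a minimal sketch that runs without RDKit installed. The worker fake_fingerprint is a hypothetical stand-in for GetMorganFingerprint(m,3), and parallel_fingerprints is an illustrative helper name, neither is part of RDKit. With real molecules the worker body would call GetMorganFingerprint instead.

```python
from multiprocessing import Pool


def fake_fingerprint(smiles):
    """Hypothetical stand-in for GetMorganFingerprint(mol, 3).

    Returns the set of adjacent character pairs in a SMILES string,
    just so there is something cheap to compute per item.
    """
    return frozenset(smiles[i:i + 2] for i in range(len(smiles) - 1))


def parallel_fingerprints(items, processes=4):
    # Pool.map calls fake_fingerprint once per element of items, so the
    # worker always receives a single item, never the whole list.
    with Pool(processes=processes) as pool:
        return pool.map(fake_fingerprint, items)


if __name__ == "__main__":
    smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
    fps = parallel_fingerprints(smiles, processes=2)
    # Results come back in input order, matching the serial built-in map.
    assert fps == list(map(fake_fingerprint, smiles))
```

Note that the pool here is created after the worker function is defined. With fork-based start methods the worker processes only see what existed when the pool was created, so defining the function first is the safer ordering.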
> (2)
> "
> from multiprocessing import Pool
> p4 = Pool(processes=4)
> def distij(i,j,fps=fps):
>     return 1-DataStructs.DiceSimilarity(fps[i],fps[j])
>
> def DivSelection(distij,nfps,quantity_train):
>     picker = MaxMinPicker()
>     picked_indices = picker.LazyPick(distij,nfps,quantity_train)
>     return picked_indices
> "
> pickIndices = p4.map(DivSelection, ???)
The MaxMinPicker does not provide any way to parallelize the picking
with multiprocessing.
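One plausible way to see why a single pick does not split across a pool: MaxMin selection is greedy, and each new pick is the item whose nearest already-picked neighbour is farthest away, so every step depends on all earlier picks. The sketch below is a pure-Python illustration of that idea under stated assumptions, not RDKit's MaxMinPicker implementation. The function name maxmin_pick and the first parameter (a fixed starting index) are hypothetical, and it assumes n_pick <= n_items with distinct items.

```python
def maxmin_pick(dist, n_items, n_pick, first=0):
    """Greedy MaxMin selection (illustrative, not RDKit's code).

    dist(i, j) returns the distance between items i and j.
    """
    picked = [first]
    # min_dist[i] is the distance from item i to its nearest picked item.
    min_dist = [dist(i, first) for i in range(n_items)]
    while len(picked) < n_pick:
        # The next pick is chosen from min_dist, which encodes every
        # earlier pick, so the picks have to happen one after another.
        nxt = max(range(n_items), key=lambda i: min_dist[i])
        picked.append(nxt)
        min_dist = [min(min_dist[i], dist(i, nxt)) for i in range(n_items)]
    return picked


if __name__ == "__main__":
    points = [0, 1, 5, 9, 10]
    picks = maxmin_pick(lambda i, j: abs(points[i] - points[j]), len(points), 3)
    print(picks)  # [0, 4, 2]
```

The distance evaluations inside one step are independent of each other, but the sequence of picks is not, which is consistent with the picker offering no multiprocessing hook.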
-greg