[Rdkit-discuss] Missing Properties for Mol in Multiprocessing Pool
From: Paul N. <pau...@gm...> - 2017-01-13 23:39:57
Hello All,

I am getting strange behaviour for mols passed into multiprocessing Pools. All of the SD properties for the mol seem to disappear within the worker process.

In the following, I am attempting to retrieve the 'ChemDiv_IDNUMBER' property from a series of mols. When I do this in a loop outside of a worker process, the value is retrieved as expected. Within the worker, however, the property does not exist.

    from multiprocessing import Pool
    from rdkit import Chem

    compFile = Chem.SDMolSupplier('mols.sdf')

    iterator = []
    for i in range(5):
        iterator.append(compFile[i])
        print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
        print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')

    def lookupForI(mol):
        thisresult = [0,0,0,0,0,0]
        print(mol.GetNumHeavyAtoms(), 'atoms in worker')
        print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
        return mol.GetNumHeavyAtoms()

    pool = Pool(3)
    result=pool.map(lookupForI, iterator)
    pool.close()
    pool.join()

    for ares in result:
        print(ares)

gives the following:

    20 atoms in loop
    000L-0408 is ID in loop
    18 atoms in loop
    000L-1176 is ID in loop
    18 atoms in loop
    000L-1268 is ID in loop
    26 atoms in loop
    000L-2413 is ID in loop
    18 atoms in loop
    000L-5632 is ID in loop
    20 atoms in worker
    18 atoms in worker
    18 atoms in worker
    26 atoms in worker
    18 atoms in worker
    ---------------------------------------------------------------------------
    RemoteTraceback                           Traceback (most recent call last)
    RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "<ipython-input-98-b305529073c1>", line 16, in lookupForI
        print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
    KeyError: 'ChemDiv_IDNUMBER'
    """

    The above exception was the direct cause of the following exception:

    KeyError                                  Traceback (most recent call last)
    <ipython-input-98-b305529073c1> in <module>()
         35
         36 pool = Pool(3)
    ---> 37 result=pool.map(lookupForI, iterator)
         38 pool.close()
         39 pool.join()

    /opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in map(self, func, iterable, chunksize)
        258         in a list that is returned.
        259         '''
    --> 260         return self._map_async(func, iterable, mapstar, chunksize).get()
        261
        262     def starmap(self, func, iterable, chunksize=None):

    /opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in get(self, timeout)
        606             return self._value
        607         else:
    --> 608             raise self._value
        609
        610     def _set(self, i, obj):

    KeyError: 'ChemDiv_IDNUMBER'

When I check whether any properties are associated with the mol using GetPropNames, I find no properties in the worker process, although all of the properties exist within the loop.

    iterator = []
    for i in range(5):
        iterator.append(compFile[i])
        print(len([x for x in compFile[i].GetPropNames()]), 'properties in loop')
        print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
        print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')

    def lookupForI(mol):
        thisresult = [0,0,0,0,0,0]
        print(len([x for x in mol.GetPropNames()]), 'properties in worker')
        print(mol.GetNumHeavyAtoms(), 'atoms in worker')
        print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
        return mol.GetNumHeavyAtoms()

    ...

gives:

    76 properties in loop
    20 atoms in loop
    000L-0408 is ID in loop
    76 properties in loop
    18 atoms in loop
    000L-1176 is ID in loop
    76 properties in loop
    18 atoms in loop
    000L-1268 is ID in loop
    76 properties in loop
    26 atoms in loop
    000L-2413 is ID in loop
    76 properties in loop
    18 atoms in loop
    000L-5632 is ID in loop
    0 properties in worker
    0 properties in worker
    20 atoms in worker
    0 properties in worker
    18 atoms in worker
    18 atoms in worker
    18 atoms in worker
    0 properties in worker
    26 atoms in worker
    0 properties in worker
    ---------------------------------------------------------------------------
    RemoteTraceback                           Traceback (most recent call last)
    ...
Any ideas on where the missing data went, or how to overcome this issue?

Thanks in advance for your thoughts!

Best
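
One possible explanation for the behaviour described above, offered as an assumption rather than a confirmed diagnosis: Pool.map pickles each argument before sending it to a worker, so anything the object's pickled form leaves out will be missing on the other side. The sketch below uses only the standard library. FakeMol and send_to_worker are hypothetical names made up for illustration (they are not RDKit classes or functions), and FakeMol deliberately drops its properties when pickled to mimic the symptom.

```python
import pickle


class FakeMol:
    """Hypothetical stand-in for a molecule object; not an RDKit class."""

    def __init__(self, n_atoms, props):
        self.n_atoms = n_atoms
        self.props = dict(props)

    def __getstate__(self):
        # Assumption: the pickled form keeps the structure but omits properties.
        return {"n_atoms": self.n_atoms, "props": {}}

    def __setstate__(self, state):
        self.__dict__.update(state)


def send_to_worker(obj):
    """Simulate what Pool.map does to each argument: pickle, then unpickle."""
    return pickle.loads(pickle.dumps(obj))


mol = FakeMol(20, {"ChemDiv_IDNUMBER": "000L-0408"})

# Sending the object alone: the atom count survives, the properties do not.
received = send_to_worker(mol)
props_after_plain_send = len(received.props)

# Workaround sketch: ship the properties alongside the object as a plain dict.
received_mol, received_props = send_to_worker((mol, dict(mol.props)))
id_after_paired_send = received_props["ChemDiv_IDNUMBER"]
```

If the same mechanism is at work with real mols, two things may be worth trying: passing (mol, mol.GetPropsAsDict()) pairs to the pool, or wrapping each mol in rdkit.Chem.PropertyMol.PropertyMol, which is intended to keep properties through pickling. Neither has been checked against the file in this post.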