From: Mathieu D. <dub...@ya...> - 2013-07-11 19:49:33
Hello,

I wanted to use PyTables in conjunction with multiprocessing for some embarrassingly parallel tasks. However, it seems that this is not possible. In the following (very stupid) example, X is a CArray of shape (100, 10) stored in the file test.hdf5:

    import random
    import multiprocessing
    import tables

    # Reload the data
    h5file = tables.openFile('test.hdf5', mode='r')
    X = h5file.root.X
    n_features = X.shape[1]

    # Use multiprocessing to perform a simple computation (column average)
    def f(X):
        name = multiprocessing.current_process().name
        column = random.randint(0, n_features - 1)
        print '%s uses column %i' % (name, column)
        return X[:, column].mean()

    p = multiprocessing.Pool(2)
    col_mean = p.map(f, [X, X, X])

When executing it, I get the following error:

    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
        self.run()
      File "/usr/lib/python2.7/threading.py", line 504, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
        put(task)
    PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed

I have googled for weakref and pickle but can't find a solution. Any help?

By the way, I have noticed that by slicing a CArray, I get a NumPy array (I created the HDF5 file with NumPy). Therefore, everything is copied into memory. Is there a way to avoid that?

Mathieu
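[Editor's note: the error arises because the arguments to Pool.map must be pickled to cross the process boundary, and an open PyTables node holds unpicklable weakrefs. A common workaround is to pass only picklable arguments (a filename plus a column index) and have each worker open the file itself. Below is a minimal, self-contained sketch of that pattern; the worker function `col_mean`, the temporary text file, and the toy data are all illustrative stand-ins for the HDF5 store, so with PyTables you would open the file inside the worker instead of reading a text file.]

    # Sketch of the workaround: ship picklable (path, column) tuples to the
    # workers instead of the open HDF5 node. A plain text file stands in for
    # test.hdf5 so the sketch runs on its own; a real version would open the
    # HDF5 file inside col_mean.
    import multiprocessing
    import os
    import tempfile

    def col_mean(args):
        path, col = args          # only picklable objects cross the boundary
        with open(path) as fh:    # each worker opens its own handle
            rows = [[float(v) for v in line.split()] for line in fh]
        values = [row[col] for row in rows]
        return sum(values) / len(values)

    if __name__ == '__main__':
        # toy data: 4 rows x 3 columns
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, 'w') as fh:
            for row in ([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]):
                fh.write(' '.join(map(str, row)) + '\n')
        pool = multiprocessing.Pool(2)
        means = pool.map(col_mean, [(path, c) for c in range(3)])
        pool.close()
        pool.join()
        print(means)  # one average per column
        os.remove(path)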