From: Anthony S. <sc...@gm...> - 2013-07-11 22:44:40
Hi Mathieu,

I think you should try opening a new file handle per process. The following works for me on v3.0:

import tables
import random
import multiprocessing

# Use multiprocessing to perform a simple computation (column average)
def f(filename):
    # Reload the data: each call opens its own read-only handle
    h5file = tables.openFile(filename, mode='r')
    name = multiprocessing.current_process().name
    # randint includes both endpoints; X has 10 columns (indices 0-9)
    column = random.randint(0, 9)
    print '%s use column %i' % (name, column)
    rtn = h5file.root.X[:, column].mean()
    h5file.close()
    return rtn

p = multiprocessing.Pool(2)
col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])

Be well
Anthony

On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois <dub...@ya...> wrote:

> On 11/07/2013 21:56, Anthony Scopatz wrote:
>
> On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois <dub...@ya...> wrote:
>
>> Hello,
>>
>> I wanted to use PyTables in conjunction with multiprocessing for some
>> embarrassingly parallel tasks.
>>
>> However, it seems that it is not possible. In the following (very
>> stupid) example, X is a CArray of size (100, 10) stored in the file
>> test.hdf5:
>>
>> import tables
>> import multiprocessing
>>
>> # Reload the data
>> h5file = tables.openFile('test.hdf5', mode='r')
>> X = h5file.root.X
>>
>> # Use multiprocessing to perform a simple computation (column average)
>> def f(X):
>>     name = multiprocessing.current_process().name
>>     column = random.randint(0, n_features)
>>     print '%s use column %i' % (name, column)
>>     return X[:, column].mean()
>>
>> p = multiprocessing.Pool(2)
>> col_mean = p.map(f, [X, X, X])
>>
>> Executing it gives the following error:
>>
>> Exception in thread Thread-2:
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>>     self.run()
>>   File "/usr/lib/python2.7/threading.py", line 504, in run
>>     self.__target(*self.__args, **self.__kwargs)
>>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
>>     put(task)
>> PicklingError: Can't pickle <type 'weakref'>: attribute lookup
>> __builtin__.weakref failed
>>
>> I have googled for weakref and pickle but can't find a solution.
>>
>> Any help?
>>
>
> Hello Mathieu,
>
> I have used multiprocessing and files opened in read mode many times so
> I am not sure what is going on here.
>
> Thanks for your answer. Maybe you can point me to a working example?
>
> Could you provide the test.hdf5 file so that we could try to reproduce
> this.
>
> Here is the script that I have used to generate the data:
>
> import tables
> import numpy
>
> # Create data & store it
> n_features = 10
> n_obs = 100
> X = numpy.random.rand(n_obs, n_features)
> h5file = tables.openFile('test.hdf5', mode='w')
> Xatom = tables.Atom.from_dtype(X.dtype)
> Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
> Xhdf5[:] = X
> h5file.close()
>
> I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu
> 12.04 (libhdf5 is 1.8.4patch1).
>
>> By the way, I have noticed that by slicing a CArray, I get a numpy array
>> (I created the HDF5 file with numpy). Therefore, everything is copied to
>> memory. Is there a way to avoid that?
>>
>
> Only the slice that you ask for is brought into memory and it is returned
> as a non-view numpy array.
>
> OK. I may be careful about that.
>
> Be Well
> Anthony
>
>> Mathieu
>>
>> ------------------------------------------------------------------------------
>> See everything from the browser to the database with AppDynamics
>> Get end-to-end visibility with application monitoring from AppDynamics
>> Isolate bottlenecks and diagnose root cause in seconds.
>> Start your free trial of AppDynamics Pro today!
>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Pytables-users mailing list
>> Pyt...@li...
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
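
The PicklingError in the original report is consistent with how multiprocessing hands work to a pool: Pool.map pickles each argument before sending it to a worker process, and the traceback shows that pickling failed on a weakref. The following Python 3 sketch uses only the standard library to illustrate why a filename can be passed to a worker while an open node apparently cannot. FakeFile and FakeNode are hypothetical stand-ins, not PyTables classes, and the idea that a node keeps a weak reference to its file is an assumption inferred from the traceback rather than taken from the PyTables source.

```python
import pickle
import weakref


class FakeFile:
    """Hypothetical stand-in for an open file object (not a PyTables class)."""


class FakeNode:
    """Hypothetical stand-in for a node that weakly references its file.

    Assumption: this mimics whatever weakref the real node carries in its state.
    """

    def __init__(self, parent):
        self._parent = weakref.ref(parent)


def can_pickle(obj):
    """Return True if obj can be pickled, as Pool.map requires of its arguments."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


fake_file = FakeFile()
node = FakeNode(fake_file)

# The node's state contains a weakref, so it cannot be sent to a worker.
node_ok = can_pickle(node)

# A plain string pickles fine, so each worker can receive the path
# and open its own handle.
name_ok = can_pickle('test.hdf5')

print('node picklable:', node_ok)
print('filename picklable:', name_ok)
```

In the reply above, the picklable string corresponds to the 'test.hdf5' entries passed to p.map, with the file opened and closed inside f.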