From: Anthony S. <sc...@gm...> - 2013-07-12 16:13:25
On Fri, Jul 12, 2013 at 1:51 AM, Mathieu Dubois <dub...@ya...> wrote:

> Hi Anthony,
>
> Thank you very much for your answer (it works). I will try to remodel my
> code around this trick, but I'm not sure it's possible because I use a
> framework that needs arrays.
>

I think that this method still works. You can always pull a numpy array out
of the file in a subprocess and send it back to the main process.

> Can somebody explain what is going on? I was thinking that PyTables keeps
> a weakref to the file for lazy loading, but I'm not sure.
>
> How
>
> In any case, the PyTables community is very helpful.
>

Glad to help!

Be Well
Anthony

>
> Thanks,
> Mathieu
>
> On 12/07/2013 00:44, Anthony Scopatz wrote:
>
> Hi Mathieu,
>
> I think you should try opening a new file handle per process. The
> following works for me on v3.0:
>
> import tables
> import random
> import multiprocessing
>
> # Use multiprocessing to perform a simple computation (column average).
> # Each worker opens its own read-only handle, so only the (picklable)
> # filename is sent to the pool.
> def f(filename):
>     h5file = tables.openFile(filename, mode='r')
>     name = multiprocessing.current_process().name
>     column = random.randint(0, 9)  # randint is inclusive; X has 10 columns
>     print('%s use column %i' % (name, column))
>     rtn = h5file.root.X[:, column].mean()
>     h5file.close()
>     return rtn
>
> if __name__ == '__main__':
>     p = multiprocessing.Pool(2)
>     col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])
>
> Be well
> Anthony
>
>
> On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois <dub...@ya...> wrote:
>
>> On 11/07/2013 21:56, Anthony Scopatz wrote:
>>
>> On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois <dub...@ya...> wrote:
>>
>>> Hello,
>>>
>>> I wanted to use PyTables in conjunction with multiprocessing for some
>>> embarrassingly parallel tasks.
>>>
>>> However, it seems that it is not possible.
>>> In the following (very stupid) example, X is a CArray of size (100, 10)
>>> stored in the file test.hdf5:
>>>
>>> import tables
>>> import multiprocessing
>>> import random
>>>
>>> n_features = 10
>>>
>>> # Reload the data
>>> h5file = tables.openFile('test.hdf5', mode='r')
>>> X = h5file.root.X
>>>
>>> # Use multiprocessing to perform a simple computation (column average)
>>> def f(X):
>>>     name = multiprocessing.current_process().name
>>>     column = random.randint(0, n_features - 1)
>>>     print('%s use column %i' % (name, column))
>>>     return X[:, column].mean()
>>>
>>> p = multiprocessing.Pool(2)
>>> col_mean = p.map(f, [X, X, X])
>>>
>>> When executing it, I get the following error:
>>>
>>> Exception in thread Thread-2:
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>>>     self.run()
>>>   File "/usr/lib/python2.7/threading.py", line 504, in run
>>>     self.__target(*self.__args, **self.__kwargs)
>>>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
>>>     put(task)
>>> PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed
>>>
>>> I have googled for weakref and pickle but can't find a solution.
>>>
>>> Any help?
>>>
>>
>> Hello Mathieu,
>>
>> I have used multiprocessing and files opened in read mode many times, so
>> I am not sure what is going on here.
>>
>> Thanks for your answer. Maybe you can point me to a working example?
>>
>>
>> Could you provide the test.hdf5 file so that we could try to reproduce
>> this.
>>
>> Here is the script that I have used to generate the data:
>>
>> import tables
>> import numpy
>>
>> # Create data & store it
>> n_features = 10
>> n_obs = 100
>> X = numpy.random.rand(n_obs, n_features)
>>
>> h5file = tables.openFile('test.hdf5', mode='w')
>> Xatom = tables.Atom.from_dtype(X.dtype)
>> Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
>> Xhdf5[:] = X
>> h5file.close()
>>
>>
>> I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu
>> 12.04 (libhdf5 is 1.8.4patch1).
>>
>>
>>> By the way, I have noticed that by slicing a CArray, I get a numpy array
>>> (I created the HDF5 file with numpy). Therefore, everything is copied to
>>> memory. Is there a way to avoid that?
>>>
>>
>> Only the slice that you ask for is brought into memory, and it is
>> returned as a non-view numpy array.
>>
>> OK. I will be careful about that.
>>
>>
>> Be Well
>> Anthony
>>
>>>
>>> Mathieu
>>>
>>> ------------------------------------------------------------------------------
>>> See everything from the browser to the database with AppDynamics
>>> Get end-to-end visibility with application monitoring from AppDynamics
>>> Isolate bottlenecks and diagnose root cause in seconds.
>>> Start your free trial of AppDynamics Pro today!
>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Pytables-users mailing list
>>> Pyt...@li...
>>> https://lists.sourceforge.net/lists/listinfo/pytables-users
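On Mathieu's question of what is going on: multiprocessing.Pool.map pickles every argument before handing it to a worker, and the traceback shows that pickling stops at a weakref object. The stdlib-only sketch below reproduces that failure with two hypothetical stand-in classes, FileLike and NodeLike. The assumption that a PyTables node holds a weak reference back to its file is inferred from the traceback, not taken from the PyTables source. The sketch also shows why Anthony's version works: a filename (or a filename, node path and column index) pickles without trouble, so each worker can open its own handle.

```python
import pickle
import weakref


class FileLike(object):
    """Hypothetical stand-in for an open HDF5 file handle."""

    def __init__(self, filename):
        self.filename = filename


class NodeLike(object):
    """Hypothetical stand-in for a PyTables node. Assumption: like the node
    in the traceback, it keeps a weak reference back to its file."""

    def __init__(self, h5file, path):
        self.path = path
        self._file_ref = weakref.ref(h5file)  # weakref objects cannot be pickled


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError):
        return False
    return True


h5file = FileLike('test.hdf5')
node = NodeLike(h5file, '/X')

# What p.map(f, [X, X, X]) has to do for each task: pickle the node itself.
print(can_pickle(node))                     # False
# What p.map(f, ['test.hdf5', ...]) sends instead: plain, picklable data.
print(can_pickle('test.hdf5'))              # True
print(can_pickle(('test.hdf5', '/X', 3)))   # True
```

Only plain data has to cross the process boundary in this pattern; the file handle is opened and closed inside the worker. In the other direction, a numpy array or scalar returned by the worker pickles normally, so it can travel back to the main process.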