From: Mathieu D. <dub...@ya...> - 2013-07-12 06:51:34
Hi Anthony,

Thank you very much for your answer (it works). I will try to restructure my code around this trick, but I'm not sure it's possible because I use a framework that needs arrays.

Can somebody explain what is going on? I was thinking that PyTables keeps a weakref to the file for lazy loading, but I'm not sure.

In any case, the PyTables community is very helpful.

Thanks,
Mathieu

On 12/07/2013 00:44, Anthony Scopatz wrote:
> Hi Mathieu,
>
> I think you should try opening a new file handle per process. The
> following works for me on v3.0:
>
> import tables
> import random
> import multiprocessing
>
> # Use multiprocessing to perform a simple computation (column average).
> # Each worker receives only the file name (a picklable string) and
> # opens its own read-only handle to the file.
>
> def f(filename):
>     h5file = tables.openFile(filename, mode='r')
>     name = multiprocessing.current_process().name
>     # X has 10 columns (indices 0..9); randint's upper bound is inclusive
>     column = random.randint(0, 9)
>     print '%s use column %i' % (name, column)
>     rtn = h5file.root.X[:, column].mean()
>     h5file.close()
>     return rtn
>
> p = multiprocessing.Pool(2)
> col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])
>
> Be well
> Anthony
>
>
> On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois
> <dub...@ya...> wrote:
>
> On 11/07/2013 21:56, Anthony Scopatz wrote:
>>
>> On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois
>> <dub...@ya...> wrote:
>>
>> Hello,
>>
>> I wanted to use PyTables in conjunction with multiprocessing
>> for some embarrassingly parallel tasks.
>>
>> However, it seems that it is not possible.
>> In the following (very stupid) example, X is a CArray of shape
>> (100, 10) stored in the file test.hdf5:
>>
>> import random
>> import multiprocessing
>> import tables
>>
>> n_features = 10
>>
>> # Reload the data
>> h5file = tables.openFile('test.hdf5', mode='r')
>> X = h5file.root.X
>>
>> # Use multiprocessing to perform a simple computation (column average)
>> def f(X):
>>     name = multiprocessing.current_process().name
>>     # randint's upper bound is inclusive, so stop at n_features - 1
>>     column = random.randint(0, n_features - 1)
>>     print '%s use column %i' % (name, column)
>>     return X[:, column].mean()
>>
>> p = multiprocessing.Pool(2)
>> # Pool.map pickles each item of the list to send it to a worker
>> col_mean = p.map(f, [X, X, X])
>>
>> When executing it, I get the following error:
>>
>> Exception in thread Thread-2:
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>>     self.run()
>>   File "/usr/lib/python2.7/threading.py", line 504, in run
>>     self.__target(*self.__args, **self.__kwargs)
>>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
>>     put(task)
>> PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed
>>
>> I have googled for weakref and pickle but can't find a solution.
>>
>> Any help?
>>
>>
>> Hello Mathieu,
>>
>> I have used multiprocessing and files opened in read mode many
>> times, so I am not sure what is going on here.
> Thanks for your answer. Maybe you can point me to a working example?
>
>> Could you provide the test.hdf5 file so that we could try to
>> reproduce this?
> Here is the script that I have used to generate the data:
>
> import tables
> import numpy
>
> # Create data & store it
> n_features = 10
> n_obs = 100
> X = numpy.random.rand(n_obs, n_features)
>
> h5file = tables.openFile('test.hdf5', mode='w')
> Xatom = tables.Atom.from_dtype(X.dtype)
> Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
> Xhdf5[:] = X
> h5file.close()
>
> I hope it's not a stupid mistake.
> I am using PyTables 2.3.1 on Ubuntu 12.04 (libhdf5 is 1.8.4patch1).
>
>> By the way, I have noticed that by slicing a CArray, I get a
>> numpy array (I created the HDF5 file with numpy). Therefore, everything
>> is copied to memory. Is there a way to avoid that?
>>
>>
>> Only the slice that you ask for is brought into memory, and it is
>> returned as a non-view numpy array.
> OK. I will need to be careful about that.
>
>>
>> Be Well
>> Anthony
>>
>>
>> Mathieu
>>
>> _______________________________________________
>> Pytables-users mailing list
>> Pyt...@li...
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
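On the question of what is going on: `multiprocessing.Pool.map` pickles every argument so it can send it to the worker processes, and the traceback shows the failure inside the pool's `_handle_tasks` thread at `put(task)`. The error only says that something reachable from `X` is a `weakref`, and weak references cannot be pickled; it does not say which attribute holds it. The minimal sketch below reproduces that kind of failure with the standard library alone. `FileHandle` and `Node` are hypothetical stand-ins, not PyTables classes, and the sketch makes no claim about how PyTables actually stores its references.

```python
import pickle
import weakref


class FileHandle:
    """Hypothetical stand-in for an open file object (not a PyTables class)."""


class Node:
    """Hypothetical stand-in for a dataset node that keeps only a weak
    reference back to its file: the kind of object the traceback rejects."""

    def __init__(self, handle):
        self._file_ref = weakref.ref(handle)


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, roughly what Pool.map needs."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


handle = FileHandle()
node = Node(handle)

print(is_picklable(node))         # False: the weakref attribute cannot be pickled
print(is_picklable("test.hdf5"))  # True: a plain filename pickles fine
```

Passing the filename instead of `X`, as in Anthony's reply, sidesteps the problem because only a string has to cross the process boundary.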