From: Mathieu D. <dub...@ya...> - 2013-07-12 06:51:34
Hi Anthony,
Thank you very much for your answer (it works). I will try to remodel my
code around this trick, but I'm not sure it's possible because I use a
framework that needs arrays.
Can somebody explain what is going on? I was thinking that PyTables keeps
a weakref to the file for lazy loading, but I'm not sure.
In any case, the PyTables community is very helpful.
Thanks,
Mathieu
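
On the question above of what is going on: multiprocessing has to pickle every argument it sends to a worker, and the traceback quoted below shows that something reachable from X is a weakref, which pickle refuses. The following is a minimal sketch of that mechanism using only the standard library. FakeFile, FakeNode and can_pickle are hypothetical names made up for illustration; they are not PyTables classes and say nothing certain about how PyTables is built internally.

```python
import pickle
import weakref


class FakeFile(object):
    """Hypothetical stand-in for an open file handle (not a PyTables class)."""

    def __init__(self, filename):
        self.filename = filename


class FakeNode(object):
    """Hypothetical stand-in for an array node that points back at its file."""

    def __init__(self, parent_file):
        # Assumption for illustration: the node holds a weak reference
        # to the file it belongs to.
        self._parent = weakref.ref(parent_file)


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, as Pool.map requires."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError):
        # Depending on the Python version, an unpicklable weakref
        # surfaces as a PicklingError or as a TypeError.
        return False


handle = FakeFile('test.hdf5')
node = FakeNode(handle)

print(can_pickle(node))             # the node drags a weakref along
print(can_pickle(handle.filename))  # a plain string pickles fine
```

This should print False and then True, which is consistent with why passing a filename and reopening the file inside each worker sidesteps the error, while passing the node itself does not.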
On 12/07/2013 00:44, Anthony Scopatz wrote:
> Hi Mathieu,
>
> I think you should try opening a new file handle per process. The
> following works for me on v3.0:
>
> import tables
> import random
> import multiprocessing
>
> # Use multiprocessing to perform a simple computation (column average).
> # Each task opens its own read-only handle instead of sharing one.
>
> def f(filename):
>     h5file = tables.openFile(filename, mode='r')
>     name = multiprocessing.current_process().name
>     # randint is inclusive at both ends; X has 10 columns (indices 0-9)
>     column = random.randint(0, 9)
>     print('%s use column %i' % (name, column))
>     rtn = h5file.root.X[:, column].mean()
>     h5file.close()
>     return rtn
>
> p = multiprocessing.Pool(2)
> col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])
>
> Be well
> Anthony
>
>
> On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois <dub...@ya...> wrote:
>
> On 11/07/2013 21:56, Anthony Scopatz wrote:
>>
>>
>>
>> On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois <dub...@ya...> wrote:
>>
>> Hello,
>>
>> I wanted to use PyTables in conjunction with multiprocessing for some
>> embarrassingly parallel tasks.
>>
>> However, it seems that it is not possible. In the following (very
>> stupid) example, X is a CArray of size (100, 10) stored in the file
>> test.hdf5:
>>
>> import random
>> import tables
>> import multiprocessing
>>
>> n_features = 10
>>
>> # Reload the data
>> h5file = tables.openFile('test.hdf5', mode='r')
>> X = h5file.root.X
>>
>> # Use multiprocessing to perform a simple computation (column average)
>> def f(X):
>>     name = multiprocessing.current_process().name
>>     # randint is inclusive at both ends, hence n_features - 1
>>     column = random.randint(0, n_features - 1)
>>     print('%s use column %i' % (name, column))
>>     return X[:, column].mean()
>>
>> p = multiprocessing.Pool(2)
>> # Passing the open CArray node to the workers is what triggers the error
>> col_mean = p.map(f, [X, X, X])
>>
>> Executing it produces the following error:
>>
>> Exception in thread Thread-2:
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>>     self.run()
>>   File "/usr/lib/python2.7/threading.py", line 504, in run
>>     self.__target(*self.__args, **self.__kwargs)
>>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
>>     put(task)
>> PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed
>>
>>
>> I have googled for weakref and pickle but can't find a solution.
>>
>> Any help?
>>
>>
>> Hello Mathieu,
>>
>> I have used multiprocessing and files opened in read mode many
>> times so I am not sure what is going on here.
> Thanks for your answer. Maybe you can point me to a working example?
>
>
>> Could you provide the test.hdf5 file so that we could try to
>> reproduce this.
> Here is the script that I have used to generate the data:
>
> import tables
> import numpy
>
> # Create data & store it
> n_features = 10
> n_obs = 100
> X = numpy.random.rand(n_obs, n_features)
>
> h5file = tables.openFile('test.hdf5', mode='w')
> Xatom = tables.Atom.from_dtype(X.dtype)
> Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
> Xhdf5[:] = X
> h5file.close()
>
> I hope it's not a stupid mistake. I am using PyTables 2.3.1 on
> Ubuntu 12.04 (libhdf5 is 1.8.4patch1).
>
>
>> By the way, I have noticed that by slicing a CArray, I get a numpy array
>> (I created the HDF5 file with numpy). Therefore, everything is copied to
>> memory. Is there a way to avoid that?
>>
>>
>> Only the slice that you ask for is brought into memory and it is
>> returned as a non-view numpy array.
> OK. I will need to be careful about that.
>
>
>>
>> Be Well
>> Anthony
>>
>>
>> Mathieu
>>
>> ------------------------------------------------------------------------------
>> See everything from the browser to the database with AppDynamics
>> Get end-to-end visibility with application monitoring from
>> AppDynamics
>> Isolate bottlenecks and diagnose root cause in seconds.
>> Start your free trial of AppDynamics Pro today!
>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Pytables-users mailing list
>> Pyt...@li...
>> <mailto:Pyt...@li...>
>> https://lists.sourceforge.net/lists/listinfo/pytables-users