Re: [Pytables-users] PyTables and Multiprocessing

On Fri, Jul 12, 2013 at 1:51 AM, Mathieu Dubois <dub...@ya...> wrote:

>  Hi Anthony,
>
> Thank you very much for your answer (it works). I will try to remodel my
> code around this trick but I'm not sure it's possible because I use a
> framework that needs arrays.
>

I think that this method still works.  You can always send back a numpy
array to the main process that you pull out from a subprocess.
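
A minimal sketch of that idea follows. It assumes the PyTables 3.x
spellings open_file and create_carray (the code quoted in this thread uses
the older openFile and createCArray), and read_column and main are
hypothetical helper names. Each worker opens its own read-only handle,
slices the column it needs, and returns the resulting numpy array, which
can be pickled on its way back to the parent.

```python
import multiprocessing
import os
import tempfile

import numpy as np
import tables


def read_column(args):
    """Open a private read-only handle and return one column as a numpy array."""
    filename, column = args
    # Only the file name and column index cross the process boundary.
    with tables.open_file(filename, mode="r") as h5file:
        # Slicing a CArray copies just that slice into a new numpy array.
        return h5file.root.X[:, column]


def main():
    # Build a small file shaped like the test.hdf5 described in the thread.
    path = os.path.join(tempfile.mkdtemp(), "test.hdf5")
    data = np.random.rand(100, 10)
    with tables.open_file(path, mode="w") as h5file:
        h5file.create_carray(h5file.root, "X", obj=data)

    pool = multiprocessing.Pool(2)
    try:
        columns = pool.map(read_column, [(path, c) for c in (0, 3, 7)])
    finally:
        pool.close()
        pool.join()

    # Each result is an ordinary in-memory array that other code can consume.
    print([col.shape for col in columns])


if __name__ == "__main__":
    main()
```

The arrays returned this way are plain copies, so a framework that expects
numpy arrays should be able to use them without knowing about the HDF5 file.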

> Can somebody explain what is going on? I was thinking that PyTables keeps
> a weakref to the file for lazy loading but I'm not sure.
>
> How
>
> In any case, the PyTables community is very helpful.
>

Glad to help!
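
On the weakref question above: multiprocessing.Pool.map pickles every
argument in order to send it to a worker, and the traceback shows pickle
failing on a weakref reachable from X. Which PyTables attribute holds that
weakref is not established in this thread. The sketch below reproduces the
same kind of failure with the standard library only; Parent, Node, and
try_pickle are hypothetical names made up for illustration and are not part
of PyTables.

```python
import pickle
import weakref


class Parent(object):
    """Hypothetical owner object, standing in for an open file."""


class Node(object):
    """Hypothetical object that keeps only a weak reference to its owner."""

    def __init__(self, parent):
        self._parent = weakref.ref(parent)


def try_pickle(obj):
    """Return True if obj can be pickled, False if pickle rejects it."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError):
        # Python 2 raises PicklingError for a weakref, Python 3 raises TypeError.
        return False


if __name__ == "__main__":
    parent = Parent()
    print(try_pickle("test.hdf5"))   # a file name pickles fine
    print(try_pickle(Node(parent)))  # the weakref attribute blocks pickling
```

This is why passing the file name and reopening the file in each worker
avoids the error: a string can be pickled, while an object that carries a
weak reference cannot.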

Be Well
Anthony

>
> Thanks,
> Mathieu
>
> On 12/07/2013 00:44, Anthony Scopatz wrote:
>
> Hi Mathieu,
>
>  I think you should try opening a new file handle per process.  The
> following works for me on v3.0:
>
> import tables
> import random
> import multiprocessing
>
> # Use multiprocessing to perform a simple computation (column average)
>
> def f(filename):
>     # Open a separate read-only handle inside each worker process
>     h5file = tables.openFile(filename, mode='r')
>     name = multiprocessing.current_process().name
>     # randint is inclusive at both ends; X has 10 columns (indices 0-9)
>     column = random.randint(0, 9)
>     print '%s use column %i' % (name, column)
>     rtn = h5file.root.X[:, column].mean()
>     h5file.close()
>     return rtn
>
> p = multiprocessing.Pool(2)
> col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])
>
>  Be well
> Anthony
>
>
> On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois <
> dub...@ya...> wrote:
>
>>  On 11/07/2013 21:56, Anthony Scopatz wrote:
>>
>>
>>
>>
>> On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois <
>> dub...@ya...> wrote:
>>
>>> Hello,
>>>
>>> I wanted to use PyTables in conjunction with multiprocessing for some
>>> embarrassingly parallel tasks.
>>>
>>> However, it seems that it is not possible. In the following (very
>>> stupid) example, X is a CArray of shape (100, 10) stored in the file
>>> test.hdf5:
>>>
>>> import random
>>> import tables
>>> import multiprocessing
>>>
>>> # Reload the data
>>> h5file = tables.openFile('test.hdf5', mode='r')
>>> X = h5file.root.X
>>> n_features = X.shape[1]
>>>
>>> # Use multiprocessing to perform a simple computation (column average)
>>> def f(X):
>>>      name = multiprocessing.current_process().name
>>>      # randint includes both endpoints, so stop at n_features - 1
>>>      column = random.randint(0, n_features - 1)
>>>      print '%s use column %i' % (name, column)
>>>      return X[:, column].mean()
>>>
>>> p = multiprocessing.Pool(2)
>>> # X itself has to be pickled to reach the workers
>>> col_mean = p.map(f, [X, X, X])
>>>
>>> When executing it, I get the following error:
>>>
>>> Exception in thread Thread-2:
>>> Traceback (most recent call last):
>>>    File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>>>      self.run()
>>>    File "/usr/lib/python2.7/threading.py", line 504, in run
>>>      self.__target(*self.__args, **self.__kwargs)
>>>    File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
>>>      put(task)
>>> PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed
>>>
>>>
>>> I have googled for weakref and pickle but can't find a solution.
>>>
>>> Any help?
>>>
>>
>>  Hello Mathieu,
>>
>>  I have used multiprocessing and files opened in read mode many times so
>> I am not sure what is going on here.
>>
>>  Thanks for your answer. Maybe you can point me to a working example?
>>
>>
>>   Could you provide the test.hdf5 file so that we could try to reproduce
>> this?
>>
>>  Here is the script that I have used to generate the data:
>>
>> import tables
>> import numpy
>>
>> # Create data & store it
>> n_features = 10
>> n_obs      = 100
>> X = numpy.random.rand(n_obs, n_features)
>>
>> h5file = tables.openFile('test.hdf5', mode='w')
>> Xatom = tables.Atom.from_dtype(X.dtype)
>> Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
>> Xhdf5[:] = X
>> h5file.close()
>>
>>
>> I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu
>> 12.04 (libhdf5 is 1.8.4patch1).
>>
>>
>>
>>
>>> By the way, I have noticed that by slicing a CArray, I get a numpy array
>>> (I created the HDF5 file with numpy). Therefore, everything is copied to
>>> memory. Is there a way to avoid that?
>>>
>>
>>  Only the slice that you ask for is brought into memory and it is
>> returned as a non-view numpy array.
>>
>>  OK. I will have to be careful about that.
>>
>>
>>
>>  Be Well
>> Anthony
>>
>>
>>>
>>> Mathieu
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> See everything from the browser to the database with AppDynamics
>>> Get end-to-end visibility with application monitoring from AppDynamics
>>> Isolate bottlenecks and diagnose root cause in seconds.
>>> Start your free trial of AppDynamics Pro today!
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Pytables-users mailing list
>>> Pyt...@li...
>>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>>
>>