Re: [Pytables-users] Chunk selection for optimized data access


On 6/5/13 11:45 AM, Andreas Hilboll wrote:
> On 05.06.2013 10:31, Andreas Hilboll wrote:
>> On 05.06.2013 03:29, Tim Burgess wrote:
>>> I was playing around with in-memory HDF5 prior to the 3.0 release.
>>> Here's an example based on what I was doing.
>>> I looked over the docs and it does mention that there is an option to
>>> throw away the 'file' rather than write it to disk.
>>> Not sure how to do that and can't actually think of a use case where I
>>> would want to :-)
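The option Tim mentions appears to be the `driver_core_backing_store` parameter (`DRIVER_CORE_BACKING_STORE` in `tables.parameters`). Here is a minimal sketch, assuming PyTables 3.x. When the core driver is used with the backing store set to 0, the file lives only in RAM and nothing is written to the given path on close, which can be handy for scratch data. The node name `data` and the scratch path are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables


def build_scratch_file(path):
    """Create an HDF5 file that exists only in RAM (H5FD_CORE driver).

    driver_core_backing_store=0 asks HDF5 not to write the in-memory
    image back to `path` when the file is closed.
    """
    with tables.open_file(path, "w", driver="H5FD_CORE",
                          driver_core_backing_store=0) as h5f:
        # Hypothetical node; any PyTables node works the same way.
        arr = h5f.create_array("/", "data", np.arange(10, dtype=np.int32))
        total = int(arr[:].sum())  # data is readable while the file is open
    return total


# Hypothetical scratch location; nothing should be created there.
scratch = os.path.join(tempfile.mkdtemp(), "scratch.h5")
total = build_scratch_file(scratch)
print(total, os.path.exists(scratch))
```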
>>>
>>> And be wary, it is H5FD_CORE.
>>>
>>>
>>> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz <sc...@gm...> wrote:
>>>> I think that you want to set parameters.DRIVER to H5DF_CORE [1].  I
>>>> haven't ever used this personally, but it would be great to have an
>>>> example script, if someone wants to write one ;)
>>>>
>>>   
>>>
>>> import numpy as np
>>> import tables
>>>
>>> CHUNKY = 30
>>> CHUNKX = 8640
>>>
>>> if __name__ == '__main__':
>>>
>>>      # create dataset and add global attrs
>>>
>>>      file_path = 'demofile_chunk%dx%d.h5' % (CHUNKY, CHUNKX)
>>>
>>>      with tables.open_file(file_path, 'w',
>>>                            title='PyTables HDF5 In-memory example',
>>>                            driver='H5FD_CORE') as h5f:
>>>          
>>>          # dummy some data
>>>          lats = np.empty([4320])
>>>          lons = np.empty([8640])
>>>
>>>          # create some simple arrays
>>>          lat_node = h5f.create_array('/', 'lat', lats, title='latitude')
>>>          lon_node = h5f.create_array('/', 'lon', lons, title='longitude')
>>>
>>>          # create a 365 x 4320 x 8640 CArray of 32bit float
>>>          shape = (365, 4320, 8640)
>>>          atom = tables.Float32Atom(dflt=np.nan)
>>>
>>>          # chunk into daily slices and then further chunk days
>>>          sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape,
>>>                                       chunkshape=(1, CHUNKY, CHUNKX))
>>>
>>>          # dummy up an ndarray
>>>          sst = np.empty([4320, 8640], dtype=np.float32)
>>>          sst.fill(30.0)
>>>
>>>          # write ndarray to a 2D plane in the HDF5
>>>          sst_node[0] = sst
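Some back-of-the-envelope numbers for the array in the example above (float32, 4 bytes per item) show why the core driver deserves care at this size. This sketch only does the arithmetic. It assumes HDF5's default behaviour for chunked datasets, where storage for a chunk is allocated only once the chunk is written, so writing just day 0 touches one day's worth of chunks.

```python
import math

import numpy as np

# Values taken from the example above.
SHAPE = (365, 4320, 8640)                  # days x latitudes x longitudes
CHUNKSHAPE = (1, 30, 8640)                 # (1, CHUNKY, CHUNKX)
ITEMSIZE = np.dtype(np.float32).itemsize   # 4 bytes per Float32Atom item

# Uncompressed sizes in bytes.
chunk_bytes = math.prod(CHUNKSHAPE) * ITEMSIZE
day_bytes = math.prod(SHAPE[1:]) * ITEMSIZE
total_bytes = math.prod(SHAPE) * ITEMSIZE

# Chunks making up one daily plane (ceil covers partial edge chunks).
chunks_per_day = (math.ceil(SHAPE[1] / CHUNKSHAPE[1])
                  * math.ceil(SHAPE[2] / CHUNKSHAPE[2]))

print(f"one chunk:  {chunk_bytes / 2**20:.2f} MiB")
print(f"one day:    {day_bytes / 2**20:.1f} MiB in {chunks_per_day} chunks")
print(f"full array: {total_bytes / 2**30:.1f} GiB")
```

That comes to roughly 1 MiB per chunk, about 142 MiB per day (144 chunks), and about 50.8 GiB for the full year without compression. An in-memory file would have to hold all of that in RAM if every day were filled.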
>> Thanks Tim,
>>
>> I adapted your example for my use case (I'm using the EArray class,
>> because I need to continuously update my database), and it works well.
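The thread does not show Andreas's actual code. An EArray variant could look roughly like the following minimal sketch, assuming PyTables 3.x. The first dimension is extendable and each `append()` adds one day. Because the core driver's backing store is on by default, the in-memory image is written to the path when the file is closed. The grid size, node name and path are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables

NLAT, NLON = 43, 86  # hypothetical small grid so the sketch runs quickly


def append_days(h5f, ndays):
    """Create an extendable /sst EArray and append `ndays` daily planes."""
    sst = h5f.create_earray(h5f.root, "sst",
                            atom=tables.Float32Atom(dflt=np.nan),
                            shape=(0, NLAT, NLON),  # 0 marks the extendable axis
                            chunkshape=(1, NLAT, NLON))
    for day in range(ndays):
        plane = np.full((1, NLAT, NLON), day, dtype=np.float32)
        sst.append(plane)  # grows the array by one day along axis 0
    return sst.shape


path = os.path.join(tempfile.mkdtemp(), "sst_earray.h5")  # hypothetical path
with tables.open_file(path, "w", driver="H5FD_CORE") as h5f:
    shape_in_memory = append_days(h5f, 3)

# With the default backing store, the image was written to `path` on close.
with tables.open_file(path, "r") as h5f:
    shape_on_disk = h5f.root.sst.shape
    last_value = float(h5f.root.sst[-1, 0, 0])

print(shape_in_memory, shape_on_disk, last_value)
```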
>>
>> However, when I use this with my own data (but also creating the arrays
>> like you did), I'm running into errors like "Could not wait on barrier".
>> It seems like the HDF5 library is spawning several threads.
>>
>> Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at
>> runtime?
> Update:
>
> When setting max_blosc_threads=2 and max_numexpr_threads=2, everything
> seems to work as expected (but a bit on the slow side ...).
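For reference, here is a minimal sketch of setting such limits per file, assuming PyTables 3.x. Extra keyword arguments to `open_file()` override the matching upper-case entries of `tables.parameters` (`MAX_BLOSC_THREADS`, `MAX_NUMEXPR_THREADS`) for that file only. Reading them back through the File object's `params` dictionary relies on an implementation detail that may differ between versions. Assigning to `tables.parameters.MAX_BLOSC_THREADS` before opening should instead change the default for every file opened afterwards.

```python
import os
import tempfile

import tables


def effective_thread_limits(nthreads):
    """Open a throwaway in-memory file with both thread pools capped at
    `nthreads` and report the limits PyTables recorded for that file."""
    path = os.path.join(tempfile.mkdtemp(), "threads_demo.h5")  # hypothetical
    with tables.open_file(path, "w", driver="H5FD_CORE",
                          driver_core_backing_store=0,
                          max_blosc_threads=nthreads,
                          max_numexpr_threads=nthreads) as h5f:
        # Lower-case keywords are upper-cased and merged into h5f.params.
        return (h5f.params["MAX_BLOSC_THREADS"],
                h5f.params["MAX_NUMEXPR_THREADS"])


limits = effective_thread_limits(2)
print(limits)
```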

BTW, can you really notice the difference between using 1, 2 or 4 
threads?  Can you show some figures?  Just curious.
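A small timing loop is one way to collect such figures. The following is a rough sketch, assuming PyTables 3.x with its bundled Blosc and assuming `open_file()` applies `max_blosc_threads` to that file's writes. It appends the same random plane to a Blosc-compressed, in-memory EArray while varying the Blosc thread limit. Numexpr is used for in-kernel queries and `tables.Expr`, not for plain appends, so only `max_blosc_threads` is varied. The grid size, number of days and compression level are hypothetical.

```python
import os
import tempfile
import time

import numpy as np
import tables

NLAT, NLON = 432, 864   # hypothetical grid, a tenth of the example's per axis
NDAYS = 20              # hypothetical number of daily planes per run


def time_appends(nthreads):
    """Seconds spent appending NDAYS Blosc-compressed planes to an
    in-memory EArray while Blosc is limited to `nthreads` threads."""
    path = os.path.join(tempfile.mkdtemp(), "bench.h5")  # never written to disk
    plane = np.random.default_rng(0).random((1, NLAT, NLON), dtype=np.float32)
    filters = tables.Filters(complevel=5, complib="blosc")
    with tables.open_file(path, "w", driver="H5FD_CORE",
                          driver_core_backing_store=0,
                          max_blosc_threads=nthreads) as h5f:
        arr = h5f.create_earray(h5f.root, "sst", tables.Float32Atom(),
                                shape=(0, NLAT, NLON),
                                chunkshape=(1, 30, NLON),
                                filters=filters)
        start = time.perf_counter()
        for _ in range(NDAYS):
            arr.append(plane)
        h5f.flush()  # push buffered data through the filter pipeline
        elapsed = time.perf_counter() - start
        nrows = arr.nrows
    return elapsed, nrows


results = {n: time_appends(n) for n in (1, 2, 4)}
for n, (seconds, nrows) in results.items():
    print(f"{n} Blosc thread(s): {nrows} planes in {seconds:.3f} s")
```

Absolute numbers will vary from machine to machine. Repeating each run a few times and keeping the best time gives steadier comparisons.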

-- 
Francesc Alted