From: Francesc A. <fa...@gm...> - 2013-06-05 13:10:45
On 6/5/13 11:45 AM, Andreas Hilboll wrote:
> On 05.06.2013 10:31, Andreas Hilboll wrote:
>> On 05.06.2013 03:29, Tim Burgess wrote:
>>> I was playing around with in-memory HDF5 prior to the 3.0 release.
>>> Here's an example based on what I was doing.
>>> I looked over the docs and it does mention that there is an option to
>>> throw away the 'file' rather than write it to disk.
>>> Not sure how to do that and can't actually think of a use case where I
>>> would want to :-)
>>>
>>> And be wary, it is H5FD_CORE.
>>>
>>>
>>> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz <sc...@gm...> wrote:
>>>> I think that you want to set parameters.DRIVER to H5DF_CORE [1]. I
>>>> haven't ever used this personally, but it would be great to have an
>>>> example script, if someone wants to write one ;)
>>>>
>>>
>>>
>>> import numpy as np
>>> import tables
>>>
>>> CHUNKY = 30
>>> CHUNKX = 8640
>>>
>>> if __name__ == '__main__':
>>>
>>>     # create dataset and add global attrs
>>>     file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX)
>>>
>>>     with tables.open_file(file_path, 'w',
>>>                           title='PyTables HDF5 In-memory example',
>>>                           driver='H5FD_CORE') as h5f:
>>>
>>>         # dummy some data
>>>         lats = np.empty([4320])
>>>         lons = np.empty([8640])
>>>
>>>         # create some simple arrays
>>>         lat_node = h5f.create_array('/', 'lat', lats, title='latitude')
>>>         lon_node = h5f.create_array('/', 'lon', lons, title='longitude')
>>>
>>>         # create a 365 x 4320 x 8640 CArray of 32bit float
>>>         shape = (365, 4320, 8640)
>>>         atom = tables.Float32Atom(dflt=np.nan)
>>>
>>>         # chunk into daily slices and then further chunk days
>>>         sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape,
>>>                                      chunkshape=(1, CHUNKY, CHUNKX))
>>>
>>>         # dummy up an ndarray
>>>         sst = np.empty([4320, 8640], dtype=np.float32)
>>>         sst.fill(30.0)
>>>
>>>         # write ndarray to a 2D plane in the HDF5
>>>         sst_node[0] = sst
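
On the option to throw away the 'file' rather than write it to disk: as far as I can tell this is the driver_core_backing_store parameter, which can be passed to open_file next to the driver name. A minimal sketch, assuming PyTables >= 3.0; the file name and the temporary directory are made up for illustration, and the name only serves as a label for the in-memory image:

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical path; with the backing store disabled nothing should be
# written there, the name is only a label for the in-memory file.
scratch_path = os.path.join(tempfile.mkdtemp(), 'throwaway.h5')

# driver_core_backing_store=0 asks the CORE driver not to save the
# in-memory image to disk when the file is closed.
with tables.open_file(scratch_path, 'w', driver='H5FD_CORE',
                      driver_core_backing_store=0) as h5f:
    data = np.arange(10, dtype=np.float32)
    h5f.create_array('/', 'data', data, title='scratch data')
    # read the data back while the in-memory file is still open
    total = float(h5f.root.data[:].sum())

# check whether anything ended up on disk after closing
file_on_disk = os.path.exists(scratch_path)
print(total, file_on_disk)
```

A possible use case is scratch space for intermediate results that should get the HDF5 layout (chunking, compression) but never need to persist.
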
>> Thanks Tim,
>>
>> I adapted your example for my use case (I'm using the EArray class,
>> because I need to continuously update my database), and it works well.
>>
>> However, when I use this with my own data (but also creating the arrays
>> like you did), I'm running into errors like "Could not wait on barrier".
>> It seems like the HDF library is spawning several threads.
>>
>> Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at
>> runtime?
> Update:
>
> When setting max_blosc_threads=2 and max_numexpr_threads=2, everything
> seems to work as expected (but a bit on the slow side ...).
BTW, can you really notice the difference between using 1, 2 or 4
threads? Can you show some figures? Just curious.
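
Something along these lines could produce such figures. It is only a rough sketch, assuming PyTables >= 3.0 with Blosc available: the time_write helper, the array size and the compression level are made up for illustration, and the data is far smaller than the real 365 x 4320 x 8640 dataset, so the numbers will only hint at the trend.

```python
import time

import numpy as np
import tables


def time_write(nthreads, data):
    """Write `data` into a Blosc-compressed in-memory CArray.

    Hypothetical helper: returns the elapsed wall-clock time in seconds.
    """
    filters = tables.Filters(complevel=5, complib='blosc')
    start = time.perf_counter()
    # The thread limits are passed as open_file keyword arguments, the
    # same parameters mentioned above. Nothing is written to disk.
    with tables.open_file('timing.h5', 'w', driver='H5FD_CORE',
                          driver_core_backing_store=0,
                          max_blosc_threads=nthreads,
                          max_numexpr_threads=nthreads) as h5f:
        node = h5f.create_carray('/', 'sst', tables.Float32Atom(),
                                 data.shape, filters=filters,
                                 chunkshape=(1, 30, data.shape[2]))
        node[:] = data
    return time.perf_counter() - start


# Small stand-in for the sst data: 4 "days" of a 432 x 864 grid.
data = np.linspace(0.0, 30.0, 4 * 432 * 864,
                   dtype=np.float32).reshape(4, 432, 864)

timings = {n: time_write(n, data) for n in (1, 2, 4)}
for n in sorted(timings):
    print('%d thread(s): %.4f s' % (n, timings[n]))
```

Repeating each measurement a few times and keeping the best one would give more trustworthy figures, since a single run is easily dominated by noise.
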
--
Francesc Alted