From: Anthony S. <sc...@gm...> - 2013-06-04 22:38:31
On Tue, Jun 4, 2013 at 12:30 PM, Seref Arikan <ser...@gm...> wrote:
> I think I've seen this in the release notes of 3.0. This is actually
> something that I'm looking into as well, so any experience/feedback about
> creating files in memory would be much appreciated.

I think you want to set parameters.DRIVER to H5FD_CORE [1]. I haven't
used this personally, but it would be great to have an example script,
if someone wants to write one ;)

Be Well
Anthony

1. http://pytables.github.io/usersguide/parameter_files.html#hdf5-driver-management

> Best regards
> Seref
>
> On Tue, Jun 4, 2013 at 2:09 PM, Andreas Hilboll <li...@hi...> wrote:
>> On 04.06.2013 05:35, Tim Burgess wrote:
>>> My thoughts are:
>>>
>>> - Try it without any compression. Assuming 32-bit floats, your monthly
>>>   5760 x 2880 slab is only about 65 MB. Uncompressed data may perform
>>>   well, and at the least it will give you a baseline to work from, which
>>>   will help if you are investigating I/O tuning.
>>>
>>> - I have found that the automatic chunkshape for CArray works fairly
>>>   well. Experiment with that chunkshape and with some chunkshapes that
>>>   you think are more appropriate (maybe temporal rather than spatial in
>>>   your case).
>>>
>>> On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <li...@hi...> wrote:
>>>
>>>> On 03.06.2013 14:43, Andreas Hilboll wrote:
>>>>> Hi,
>>>>>
>>>>> I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
>>>>> (the last dimension represents time; once per month there'll be one
>>>>> more 5760 x 2880 array to add at the end).
>>>>>
>>>>> Now, extracting timeseries at one index location is slow; e.g., for
>>>>> four indices, it takes several seconds:
>>>>>
>>>>> In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
>>>>>
>>>>> In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
>>>>> CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
>>>>> Wall time: 7.17 s
>>>>>
>>>>> I have the feeling that this performance could be improved, but I'm
>>>>> not sure how to properly use the `chunkshape` parameter in my case.
>>>>>
>>>>> Any help is greatly appreciated :)
>>>>>
>>>>> Cheers, Andreas.
>>>>
>>>> PS: If I could get significant performance gains by not using an EArray
>>>> and therefore re-creating the whole database each month, then this
>>>> would also be an option.
>>>>
>>>> -- Andreas.
>>
>> Thanks a lot, Anthony and Tim! I was able to bring the readout time down
>> considerably using chunkshape=(32, 32, 256) for my 5760 x 2880 x 150
>> array. Reading times are now about as fast as I expected.
>>
>> The downside is that building up the database now takes a lot of time,
>> because I get the data in chunks of 5760 x 2880 x 1. So I guess that
>> writing the data to disk like this causes a load of I/O operations ...
>>
>> My new question: Is there a way to create a file in memory? If possible,
>> I could then build up my database in memory and then, once it's done,
>> just copy the arrays to an on-disk file. Is that possible? If so, how?
>>
>> Thanks a lot for your help!
>>
>> -- Andreas.
>> _______________________________________________
>> Pytables-users mailing list
>> Pyt...@li...
>> https://lists.sourceforge.net/lists/listinfo/pytables-users