From: Seref A. <ser...@gm...> - 2013-06-04 17:30:24
I think I've seen this in the release notes of 3.0. This is actually something that I'm looking into as well, so any experience/feedback about creating files in memory would be much appreciated.

Best regards,
Seref

On Tue, Jun 4, 2013 at 2:09 PM, Andreas Hilboll <li...@hi...> wrote:

> On 04.06.2013 05:35, Tim Burgess wrote:
> > My thoughts are:
> >
> > - Try it without any compression. Assuming 32-bit floats, your monthly
> > 5760 x 2880 array is only about 65 MB. Uncompressed data may perform well,
> > and at the least it will give you a baseline to work from - and will help
> > if you are investigating IO tuning.
> >
> > - I have found with CArray that the auto chunksize works fairly well.
> > Experiment with that chunksize and with some chunksizes that you think
> > are more appropriate (maybe temporal rather than spatial in your case).
> >
> > On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <li...@hi...> wrote:
> >
> >> On 03.06.2013 14:43, Andreas Hilboll wrote:
> >> > Hi,
> >> >
> >> > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
> >> > (the last dimension represents time, and once per month there'll be one
> >> > more 5760 x 2880 array to add to the end).
> >> >
> >> > Now, extracting timeseries at one index location is slow; e.g., for four
> >> > indices, it takes several seconds:
> >> >
> >> > In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
> >> >
> >> > In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
> >> > CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
> >> > Wall time: 7.17 s
> >> >
> >> > I have the feeling that this performance could be improved, but I'm not
> >> > sure about how to properly use the `chunkshape` parameter in my case.
> >> >
> >> > Any help is greatly appreciated :)
> >> >
> >> > Cheers, Andreas.
> >>
> >> PS: If I could get significant performance gains by not using an EArray
> >> and therefore re-creating the whole database each month, then this would
> >> also be an option.
> >>
> >> -- Andreas.
>
> Thanks a lot, Anthony and Tim! I was able to get the readout time down
> considerably by using chunkshape=(32, 32, 256) for my 5760x2880x150 array.
> Now, reading times are about as fast as I expected.
>
> The downside is that building up the database now takes a lot of time,
> because I get the data in chunks of 5760x2880x1. So I guess that writing
> the data to disk like this causes a load of IO operations ...
>
> My new question: Is there a way to create a file in-memory? If possible,
> I could then build up my database in-memory and then, once it's done,
> just copy the arrays to an on-disk file. Is that possible? If so, how?
>
> Thanks a lot for your help!
>
> -- Andreas.
>
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. A cloud service to automate IT design, transition and operations
> 2. Dashboards that offer high-level views of enterprise services
> 3. A single system of record for all IT processes
> http://p.sf.net/sfu/servicenow-d2d-j
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
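
To make the chunkshape discussion above concrete, here is a minimal sketch assuming the PyTables 3.0 API; the file name, node name, and filter settings are illustrative, not taken from the thread. It creates an EArray with an extendable time axis and the chunkshape=(32, 32, 256) that Andreas reported, appends one monthly 5760 x 2880 slab, and reads a per-pixel timeseries:

    import numpy as np
    import tables

    f = tables.open_file("data.h5", mode="w")
    arr = f.create_earray(
        f.root, "data",
        atom=tables.Float32Atom(),
        shape=(5760, 2880, 0),        # 0 marks the extendable (time) axis
        chunkshape=(32, 32, 256),     # small spatial footprint, long time extent
        filters=tables.Filters(complevel=1, complib="blosc"),  # optional; try uncompressed first
    )

    # Append one monthly 5760 x 2880 slab (placeholder data here).
    month = np.zeros((5760, 2880, 1), dtype=np.float32)
    arr.append(month)

    # Timeseries at a single (i, j) location; only the chunks that
    # intersect that column need to be read.
    ts = arr[5000, 1000, :]
    f.close()

The write slowdown Andreas describes follows from this layout: each monthly 5760 x 2880 x 1 append touches (5760/32) x (2880/32) = 16200 chunks, so faster reads along the time axis are paid for with more IO per append.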
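
On the in-memory question, the 3.0 feature Seref mentions is HDF5 driver selection in open_file. A sketch, assuming the H5FD_CORE driver options documented for PyTables 3.0 (file names are illustrative):

    import tables

    # CORE driver with the backing store disabled: the whole file lives
    # in memory and nothing is written to disk while it is being built.
    mem = tables.open_file("scratch.h5", mode="w",
                           driver="H5FD_CORE",
                           driver_core_backing_store=0)

    # ... create and fill the arrays under mem.root ...

    # When finished, copy everything into a real on-disk file.
    disk = tables.open_file("monthly.h5", mode="w")
    mem.copy_children(mem.root, disk.root, recursive=True)
    disk.close()
    mem.close()

If the driver documentation is to be trusted, leaving driver_core_backing_store at its default instead makes the CORE driver write the in-memory image to the named file when it is closed, which would avoid the explicit copy step.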