[Pytables-users] Chunk selection for optimized data access

Hi,

I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
(the last dimension represents time, and once per month there'll be one
more 5760x2880 array to add to the end).

However, extracting the time series at a single (i, j) location is slow;
for example, reading four locations takes several seconds:

   In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))

   In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
   CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
   Wall time: 7.17 s

I have the feeling that this performance could be improved, but I'm not
sure how to use the `chunkshape` parameter properly in my case.
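
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It relies on the fact that HDF5 reads and decompresses whole chunks, so one `_a[i, j]` read has to touch every chunk that intersects that time series. The candidate chunkshapes and the 4-byte item size below are assumptions chosen for illustration, not the values actually used in my file, and `timeseries_read_cost` is a hypothetical helper, not part of PyTables.

```python
import math

def timeseries_read_cost(chunkshape, n_time, itemsize=4):
    """Estimate the cost of reading one full time series _a[i, j].

    Returns (chunks touched, uncompressed bytes that must be decoded).
    itemsize=4 assumes a 32-bit dtype; adjust for the real atom.
    """
    cx, cy, ct = chunkshape
    # Number of chunks stacked along the time axis at a fixed (i, j)
    n_chunks = math.ceil(n_time / ct)
    # Each touched chunk is decompressed in full, not just one row of it
    nbytes = n_chunks * cx * cy * ct * itemsize
    return n_chunks, nbytes

n_time = 150  # current length of the time axis
# Hypothetical candidates: two map-oriented layouts, one time-oriented layout
for shape in [(1, 2880, 1), (64, 64, 1), (16, 16, 150)]:
    n_chunks, nbytes = timeseries_read_cost(shape, n_time)
    print(f"chunkshape={shape}: {n_chunks} chunks, {nbytes / 1e6:.2f} MB decoded")
```

Under these assumptions, a chunk that is long along the time axis and small in the two spatial dimensions touches far fewer chunks per time series than a chunk that spans only one time step. The price is that appending a new monthly 5760x2880 slice then has to update many more chunks.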

Any help is greatly appreciated :)

Cheers, Andreas.