From: Anthony S. <sc...@gm...> - 2013-07-12 18:40:58
Hi Robert,

Glad these materials can be helpful. (Note: these questions really should be asked on the pytables-users mailing list -- CC'd here -- so please join that list: https://lists.sourceforge.net/lists/listinfo/pytables-users)

On Fri, Jul 12, 2013 at 12:48 PM, Robert Nelson <rrn...@at...> wrote:

> Dr. Scopatz,
>
> I came across your SciPy 2012 "HDF5 is for lovers" video and thought you
> might be able to help me.
>
> I'm trying to read large (>1GB) HDF files and do multidimensional indexing
> (with repeated values) on them. I saw a post
> <http://www.mail-archive.com/pyt...@li.../msg02586.html>
> of yours from over a year ago saying that the best solution would be to
> convert it to a NumPy array but this takes too long.

I think the strategy is the same as before. The original asker (to the best of my recollection) did not open an issue, so no changes have been made to PyTables to handle this. Also, with this strategy you should only be loading in the indices to start with. I doubt (though I could be wrong) that you have 1 GB worth of index data alone. The whole idea is to do a unique (set) and a sort operation on the much smaller index data AND THEN use fancy indexing to pull the actual data back out.

As always, some sample code and a sample file would be extremely helpful. I don't think I can do much more for you without them.

Be Well,
Anthony

> Have there been any updates in PyTables that would make this possible?
>
> Thank you!
>
> Robert Nelson
> Colorado State University
> Rob...@gm...
> 763-354-8411
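
For reference, here is a minimal sketch of the unique-then-fancy-index approach described above. The file name "data.h5", the table node "/mytable", and the column "index_col" are placeholders and not from the original thread; adjust them to your own layout.

import numpy as np
import tables as tb

# Placeholder file, node, and column names -- not from the original thread.
with tb.open_file("data.h5", mode="r") as h5f:
    table = h5f.root.mytable

    # Load only the (much smaller) index column, not the full table.
    idx = table.col("index_col")

    # Unique + sort in one pass over the index data.
    uniq = np.unique(idx)

    # Pick whichever index values you care about (here: a placeholder
    # subset), find the matching row numbers, then fancy-index to pull
    # just those rows back out of the file.
    wanted_values = uniq[:10]
    row_numbers = np.nonzero(np.in1d(idx, wanted_values))[0]
    rows = table.read_coordinates(row_numbers)

Because only the index column and the selected rows are ever read from disk, memory use scales with the size of the index and the number of matching rows rather than with the full table.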