Thread: [Pytables-users] Histogramming 1000x too slow

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi all,
I am trying to find the best way to make histograms from large data 
sets.  Up to now, I've been just loading entire columns into in-memory 
numpy arrays and making histograms from those.  However, I'm currently 
working on a handful of datasets where this is prohibitively memory 
intensive (causing an out-of-memory kernel panic on a shared machine 
that you have to open a ticket to have rebooted makes you a little 
gun-shy), so I am now exploring other options.

I know that the Column object is rather nicely set up to act, in some 
circumstances, like a numpy ndarray.  So my first thought is to try just 
creating the histogram out of the Column object directly. This is, 
however, 1000x slower than loading it into memory and creating the 
histogram from the in-memory array.  Please see my test notebook at: 
http://www-cdf.fnal.gov/~jsw/pytables%20test%20stuff.html

For such a small table, loading into memory is not an issue.  For larger 
tables, though, it is a problem, and I had hoped that pytables was 
optimized so that histogramming directly from disk would proceed no 
slower than loading into memory and histogramming. Is there some other 
way of accessing the column (or Array or CArray) data that will make 
faster histograms?
Regards,
Jon

Thread: [Pytables-users] Histogramming 1000x too slow

pytables-users