From: David W. <dav...@gm...> - 2012-11-17 20:31:53
I've been using (and recommend) Pandas (http://pandas.pydata.org/) along with this book:
http://shop.oreilly.com/product/0636920023784.do

Good luck,
Dave

On Fri, Nov 16, 2012 at 11:02 AM, Jon Wilson <js...@fn...> wrote:
> Hi all,
> I am trying to find the best way to make histograms from large data
> sets. Up to now, I've been just loading entire columns into in-memory
> numpy arrays and making histograms from those. However, I'm currently
> working on a handful of datasets where this is prohibitively memory
> intensive (causing an out-of-memory kernel panic on a shared machine
> that you have to open a ticket to have rebooted makes you a little
> gun-shy), so I am now exploring other options.
>
> I know that the Column object is rather nicely set up to act, in some
> circumstances, like a numpy ndarray. So my first thought is to try just
> creating the histogram out of the Column object directly. This is,
> however, 1000x slower than loading it into memory and creating the
> histogram from the in-memory array. Please see my test notebook at:
> http://www-cdf.fnal.gov/~jsw/pytables%20test%20stuff.html
>
> For such a small table, loading into memory is not an issue. For larger
> tables, though, it is a problem, and I had hoped that pytables was
> optimized so that histogramming directly from disk would proceed no
> slower than loading into memory and histogramming. Is there some other
> way of accessing the column (or Array or CArray) data that will make
> faster histograms?
> Regards,
> Jon

--
David C. Wilson
(612) 460-1329
dav...@gm...
http://www.linkedin.com/in/davidcwilson
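For reference, one common middle ground between "load the whole column" and "histogram the Column object element by element" is to read the data in large slices and sum per-slice histograms. The sketch below assumes NumPy and assumes the column object supports len() and slicing that returns a NumPy array (true of an in-memory ndarray, and, as far as I know, of PyTables Column, Array and CArray objects). The function names chunked_range and chunked_histogram and the default chunk_size are hypothetical, chosen for illustration; this is not a PyTables or Pandas API. Because per-chunk counts can only be added when every chunk uses the same bins, the bin edges are fixed up front, here from a first chunked pass that finds the global min and max.

```python
import numpy as np


def chunked_range(column, chunk_size=1_000_000):
    """First pass: global (min, max) of `column`, one slice at a time.

    Assumes the data contain no NaNs (a NaN would propagate into the result).
    """
    lo, hi = np.inf, -np.inf
    for start in range(0, len(column), chunk_size):
        chunk = np.asarray(column[start:start + chunk_size])
        if chunk.size:
            lo = min(lo, chunk.min())
            hi = max(hi, chunk.max())
    return lo, hi


def chunked_histogram(column, bin_edges, chunk_size=1_000_000):
    """Second pass: accumulate histogram counts slice by slice.

    Only one slice of `column` is held in memory at a time. Fixed
    `bin_edges` let the per-slice counts simply be summed.
    """
    bin_edges = np.asarray(bin_edges, dtype=float)
    counts = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    for start in range(0, len(column), chunk_size):
        chunk = np.asarray(column[start:start + chunk_size])
        chunk_counts, _ = np.histogram(chunk, bins=bin_edges)
        counts += chunk_counts
    return counts, bin_edges


if __name__ == "__main__":
    # An in-memory array stands in for an on-disk column here.
    data = np.random.default_rng(0).normal(size=250_000)
    lo, hi = chunked_range(data, chunk_size=10_000)
    edges = np.linspace(lo, hi, 51)
    counts, _ = chunked_histogram(data, edges, chunk_size=10_000)
    reference, _ = np.histogram(data, bins=edges)
    assert np.array_equal(counts, reference)
```

With PyTables, the same functions would be called on something like `h5file.root.mytable.cols.energy` (hypothetical node and column names). Each slice then becomes one bulk read from disk instead of many per-element accesses, and peak memory is bounded by chunk_size rather than by the column length.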