From: Anthony S. <sc...@gm...> - 2012-11-17 01:11:08
|
On Fri, Nov 16, 2012 at 9:02 AM, Jon Wilson <js...@fn...> wrote: > Hi all, > I am trying to find the best way to make histograms from large data > sets. Up to now, I've been just loading entire columns into in-memory > numpy arrays and making histograms from those. However, I'm currently > working on a handful of datasets where this is prohibitively memory > intensive (causing an out-of-memory kernel panic on a shared machine > that you have to open a ticket to have rebooted makes you a little > gun-shy), so I am now exploring other options. > > I know that the Column object is rather nicely set up to act, in some > circumstances, like a numpy ndarray. So my first thought is to try just > creating the histogram out of the Column object directly. This is, > however, 1000x slower than loading it into memory and creating the > histogram from the in-memory array. Please see my test notebook at: > http://www-cdf.fnal.gov/~jsw/pytables%20test%20stuff.html > > For such a small table, loading into memory is not an issue. For larger > tables, though, it is a problem, and I had hoped that pytables was > optimized so that histogramming directly from disk would proceed no > slower than loading into memory and histogramming. Is there some other > way of accessing the column (or Array or CArray) data that will make > faster histograms? > Hi Jon, This is not surprising since the column object itself is going to be iterated over per row. As you found, reading in each row individually will be prohibitively expensive as compared to reading in all the data at once. To do this in the right way for data that is larger than system memory, you need to read it in in chunks. Luckily there are tools to help you automate this process already in PyTables. I would recommend that you use expressions [1] or queries [2] to do your histogramming more efficiently. Be Well Anthony 1. http://pytables.github.com/usersguide/libref/expr_class.html 2. http://pytables.github.com/usersguide/libref/structured_storage.html?#table-methods-querying > Regards, > Jon > > > ------------------------------------------------------------------------------ > Monitor your physical, virtual and cloud infrastructure from a single > web console. Get in-depth insight into apps, servers, databases, vmware, > SAP, cloud infrastructure, etc. Download 30-day Free Trial. > Pricing starts from $795 for 25 servers or applications! > http://p.sf.net/sfu/zoho_dev2dev_nov > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
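A minimal sketch of the chunked approach Anthony describes above, using plain block reads plus numpy.histogram with fixed bin edges; this is an alternative to the Expr/query route he links to, and the file, node, column name and bin range are hypothetical.

```python
import numpy as np
import tables

h5f = tables.openFile("data.h5", "r")        # hypothetical file
table = h5f.root.mygroup.mytable             # hypothetical table

edges = np.linspace(0.0, 1.0, 101)           # same bin edges for every chunk
counts = np.zeros(len(edges) - 1, dtype=np.int64)

chunksize = 1000000
for start in xrange(0, table.nrows, chunksize):
    stop = min(start + chunksize, table.nrows)
    block = table.read(start, stop, field='energy')   # one chunk in memory at a time
    counts += np.histogram(block, bins=edges)[0]

h5f.close()
```

Only one chunk is ever resident, so memory stays bounded regardless of table size, at the cost of scanning the column once.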
From: Jon W. <js...@fn...> - 2012-11-16 17:20:37
|
Hi all, I am trying to find the best way to make histograms from large data sets. Up to now, I've been just loading entire columns into in-memory numpy arrays and making histograms from those. However, I'm currently working on a handful of datasets where this is prohibitively memory intensive (causing an out-of-memory kernel panic on a shared machine that you have to open a ticket to have rebooted makes you a little gun-shy), so I am now exploring other options. I know that the Column object is rather nicely set up to act, in some circumstances, like a numpy ndarray. So my first thought is to try just creating the histogram out of the Column object directly. This is, however, 1000x slower than loading it into memory and creating the histogram from the in-memory array. Please see my test notebook at: http://www-cdf.fnal.gov/~jsw/pytables%20test%20stuff.html For such a small table, loading into memory is not an issue. For larger tables, though, it is a problem, and I had hoped that pytables was optimized so that histogramming directly from disk would proceed no slower than loading into memory and histogramming. Is there some other way of accessing the column (or Array or CArray) data that will make faster histograms? Regards, Jon |
From: Juan M. V. T. <jmv...@gm...> - 2012-11-11 00:39:40
|
Hello, I have to deal in pytables with a very large dataset. The file already compressed with blosc5 is about 5GB. Is it possible to store objects within the same file, each of them containing a reference to a certain search over the dataset? It is like having a large numpy array and a mask of it in the same pytables file. Thank you, Juanma |
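One way to do what Juanma asks, sketched under the assumption that the "search" can be expressed as a Table query; the node names and condition are invented. The row numbers returned by the query are stored as a small array next to the data in the same file and re-used later with readCoordinates().

```python
import tables

h5f = tables.openFile("bigdata.h5", "a")         # hypothetical file
table = h5f.root.dataset                         # hypothetical table

# Save the row numbers matching a search as a node in the same file.
rows = table.getWhereList("pressure > 10")       # hypothetical condition
h5f.createArray("/selections", "high_pressure", rows, createparents=True)

# Later (possibly in another session): reuse the stored selection.
sel = h5f.root.selections.high_pressure.read()
subset = table.readCoordinates(sel)

h5f.close()
```

A boolean mask CArray of the same length as the dataset would work the same way if the selection is dense.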
From: Jim K. <jim...@sp...> - 2012-11-09 21:26:46
|
Thanks for taking the time. Most of our tables are very wide lots of col.... and simple conditions are common.... so that is why in-kernel makes almost no impact for me. -----Original Message----- From: Francesc Alted [mailto:fa...@gm...] Sent: Friday, November 09, 2012 11:27 AM To: pyt...@li... Subject: Re: [Pytables-users] pyTable index from c++ Well, expected performance of in-kernel (numexpr powered) queries wrt regular (python) queries largely depends on where the bottleneck is. If your table has a lot of columns, then the bottleneck is going to be more on the I/O side, so you cannot expect a large difference in performance. However, if your table has a small number of columns, then there is more likelihood that bottleneck is CPU, and your chances to experiment a difference are higher. Of course, having complex queries (i.e. queries that take conditions over several columns, or just combinations of conditions in the same column) makes the query more CPU intensive, and in-kernel normally wins by a comfortable margin. Finally, what indexing is doing is to reduce the number of rows where the conditions have to be evaluated, so depending on the cardinality of the query and the associated index, you can get more or less speedup. Francesc On 11/9/12 5:12 PM, Jim Knoll wrote: > > Thanks for the reply. I will put some investigation of C++ access on > my list for items to look at over the slow holiday season. > > For the short term we will store a C++ ready index as a different > table object in the same h5 file. It will work... just a bit of a waste > on disk space. > > One follow up question > > Why would my performance of > > for row in node.where('stringField == "SomeString"'): > > *not*be noticeably faster than > > for row in node: > > if row.stringField == "SomeString" : > > Specifically when there is no index. I understand and see the speed > improvement only when I have a index. I expected to see some benefit > from numexpr even with no index. I expected node.where() to be much > faster. What I see is identical performance. Is numexpr benefit only > seen for complex math like (floatField ** intField > otherFloatField) > I did not see that to be the case on my first attempt.... Seems that I > only benefit from a index. > > *From:*Anthony Scopatz [mailto:sc...@gm...] > *Sent:* Friday, November 09, 2012 12:24 AM > *To:* Discussion list for PyTables > *Subject:* Re: [Pytables-users] pyTable index from c++ > > On Thu, Nov 8, 2012 at 10:19 PM, Jim Knoll > <jim...@sp... <mailto:jim...@sp...>> > wrote: > > I love the index function and promote the internal use of PyTables at > my company. The availability of a indexed method to speed the search > is the main reason why. > > We are a mixed shop using c++ to create H5 (just for the raw speed ... > need to keep up with streaming data) End users start with python > pyTables to consume the data. (Often after we have created indexes > from python pytables.col.col1.createIndex()) > > Sometimes the users come up with something we want to do thousands of > times and performance is critical. But then we are falling back to c++ > We can use our own index method but would like to make dbl use of the > PyTables index. > > I know the python table.where( is implemented in C. > > Hi Jim, > > This is only kind of true. Querying (ie all of the where*() methods) > are actually mostly written in Python in the tables.py and > expressions.py files. However, they make use of numexpr [1]. > > Is there a way to access that from c or c++? 
Don't mind if I need > to do work to get the result I think in my case the work may be > worth it. > > *PLAN 1:* One possibility is that the parts of PyTables are written in > Cython. We could maybe try (without making any edits to these files) > to convert them to Cython. This has the advantage that for Cython > files, if you write the appropriate C++ header file and link against > the shared library correctly, it is possible to access certain > functions from C/C++. BUT, I am not sure how much of speed boost you > would get out of this since you would still be calling out to the > Python interpreter to get these result. You are just calling Python's > virtual machine from C++ rather than calling it from Python (like > normal). This has the advantage that you would basically get access to > these functions acting on tables from C++. > > *PLAN 2:* Alternatively, numexpr itself is mostly written in C++ > already. You should be able to call core numexpr functions directly. > However, you would have to feed it data that you read from the tables > yourself. These could even be table indexes. On a personal note, if > you get code working that does this, I would be interested in seeing > your implementation. (I have another project where I have tables that > I want to query from C++) > > Let us know what route you ultimately end up taking or if you have any > further questions! > > Be Well > > Anthony > > 1. http://code.google.com/p/numexpr/source/browse/#hg%2Fnumexpr > > ------------------------------------------------------------------------ > > *Jim Knoll** > *Data Developer** > > Spot Trading L.L.C > 440 South LaSalle St., Suite 2800 > Chicago, IL 60605 > Office: 312.362.4550 <tel:312.362.4550> > Direct: 312-362-4798 <tel:312-362-4798> > Fax: 312.362.4551 <tel:312.362.4551> > jim...@sp... <mailto:jim...@sp...> > www.spottradingllc.com <http://www.spottradingllc.com/> > > ------------------------------------------------------------------------ > > The information contained in this message may be privileged and > confidential and protected from disclosure. If the reader of this > message is not the intended recipient, or an employee or agent > responsible for delivering this message to the intended recipient, > you are hereby notified that any dissemination, distribution or > copying of this communication is strictly prohibited. If you have > received this communication in error, please notify us immediately > by replying to the message and deleting it from your computer. > Thank you. Spot Trading, LLC > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_nov > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > <mailto:Pyt...@li...> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_nov > > > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users -- Francesc Alted ------------------------------------------------------------------------------ Everyone hates slow websites. So do we. 
Make your web apps faster with AppDynamics Download AppDynamics Lite for free today: http://p.sf.net/sfu/appdyn_d2d_nov _______________________________________________ Pytables-users mailing list Pyt...@li... https://lists.sourceforge.net/lists/listinfo/pytables-users |
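For reference, a small sketch of the query styles being compared in this thread, plus index creation; the file, table, and column names are placeholders.

```python
import tables

h5f = tables.openFile("ticks.h5", "a")           # placeholder file
table = h5f.root.quotes                          # placeholder table

# Plain Python iteration: every row is materialized and tested in Python.
plain = [r['price'] for r in table if r['symbol'] == "SPY"]

# In-kernel (numexpr) query: the condition is evaluated on whole chunks.
# For very wide tables the read itself dominates, so these two can be close.
in_kernel = [r['price'] for r in table.where('symbol == "SPY"')]

# With an index on the queried column, where() can skip most of the table.
table.cols.symbol.createIndex()
indexed = [r['price'] for r in table.where('symbol == "SPY"')]

h5f.close()
```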
From: Francesc A. <fa...@gm...> - 2012-11-09 17:27:14
|
Well, expected performance of in-kernel (numexpr powered) queries wrt regular (python) queries largely depends on where the bottleneck is. If your table has a lot of columns, then the bottleneck is going to be more on the I/O side, so you cannot expect a large difference in performance. However, if your table has a small number of columns, then there is more likelihood that bottleneck is CPU, and your chances to experiment a difference are higher. Of course, having complex queries (i.e. queries that take conditions over several columns, or just combinations of conditions in the same column) makes the query more CPU intensive, and in-kernel normally wins by a comfortable margin. Finally, what indexing is doing is to reduce the number of rows where the conditions have to be evaluated, so depending on the cardinality of the query and the associated index, you can get more or less speedup. Francesc On 11/9/12 5:12 PM, Jim Knoll wrote: > > Thanks for the reply. I will put some investigation of C++ access on > my list for items to look at over the slow holiday season. > > For the short term we will store a C++ ready index as a different > table object in the same h5 file. It will work… just a bit of a waste > on disk space. > > One follow up question > > Why would my performance of > > for row in node.where('stringField == "SomeString"'): > > *not*be noticeably faster than > > for row in node: > > if row.stringField == "SomeString" : > > Specifically when there is no index. I understand and see the speed > improvement only when I have a index. I expected to see some benefit > from numexpr even with no index. I expected node.where() to be much > faster. What I see is identical performance. Is numexpr benefit only > seen for complex math like (floatField ** intField > otherFloatField) > I did not see that to be the case on my first attempt…. Seems that I > only benefit from a index. > > *From:*Anthony Scopatz [mailto:sc...@gm...] > *Sent:* Friday, November 09, 2012 12:24 AM > *To:* Discussion list for PyTables > *Subject:* Re: [Pytables-users] pyTable index from c++ > > On Thu, Nov 8, 2012 at 10:19 PM, Jim Knoll > <jim...@sp... <mailto:jim...@sp...>> > wrote: > > I love the index function and promote the internal use of PyTables at > my company. The availability of a indexed method to speed the search > is the main reason why. > > We are a mixed shop using c++ to create H5 (just for the raw speed … > need to keep up with streaming data) End users start with python > pyTables to consume the data. (Often after we have created indexes > from python pytables.col.col1.createIndex()) > > Sometimes the users come up with something we want to do thousands of > times and performance is critical. But then we are falling back to c++ > We can use our own index method but would like to make dbl use of the > PyTables index. > > I know the python table.where( is implemented in C. > > Hi Jim, > > This is only kind of true. Querying (ie all of the where*() methods) > are actually mostly written in Python in the tables.py and > expressions.py files. However, they make use of numexpr [1]. > > Is there a way to access that from c or c++? Don’t mind if I need > to do work to get the result I think in my case the work may be > worth it. > > *PLAN 1:* One possibility is that the parts of PyTables are written in > Cython. We could maybe try (without making any edits to these files) > to convert them to Cython. 
This has the advantage that for Cython > files, if you write the appropriate C++ header file and link against > the shared library correctly, it is possible to access certain > functions from C/C++. BUT, I am not sure how much of speed boost you > would get out of this since you would still be calling out to the > Python interpreter to get these result. You are just calling Python's > virtual machine from C++ rather than calling it from Python (like > normal). This has the advantage that you would basically get access to > these functions acting on tables from C++. > > *PLAN 2:* Alternatively, numexpr itself is mostly written in C++ > already. You should be able to call core numexpr functions directly. > However, you would have to feed it data that you read from the tables > yourself. These could even be table indexes. On a personal note, if > you get code working that does this, I would be interested in seeing > your implementation. (I have another project where I have tables that > I want to query from C++) > > Let us know what route you ultimately end up taking or if you have any > further questions! > > Be Well > > Anthony > > 1. http://code.google.com/p/numexpr/source/browse/#hg%2Fnumexpr > > ------------------------------------------------------------------------ > > *Jim Knoll** > *Data Developer** > > Spot Trading L.L.C > 440 South LaSalle St., Suite 2800 > Chicago, IL 60605 > Office: 312.362.4550 <tel:312.362.4550> > Direct: 312-362-4798 <tel:312-362-4798> > Fax: 312.362.4551 <tel:312.362.4551> > jim...@sp... <mailto:jim...@sp...> > www.spottradingllc.com <http://www.spottradingllc.com/> > > ------------------------------------------------------------------------ > > The information contained in this message may be privileged and > confidential and protected from disclosure. If the reader of this > message is not the intended recipient, or an employee or agent > responsible for delivering this message to the intended recipient, > you are hereby notified that any dissemination, distribution or > copying of this communication is strictly prohibited. If you have > received this communication in error, please notify us immediately > by replying to the message and deleting it from your computer. > Thank you. Spot Trading, LLC > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_nov > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > <mailto:Pyt...@li...> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_nov > > > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users -- Francesc Alted |
From: Jim K. <jim...@sp...> - 2012-11-09 16:12:56
|
Thanks for the reply. I will put some investigation of C++ access on my list for items to look at over the slow holiday season. For the short term we will store a C++ ready index as a different table object in the same h5 file. It will work... just a bit of a waste on disk space. One follow up question Why would my performance of for row in node.where('stringField == "SomeString"'): not be noticeably faster than for row in node: if row.stringField == "SomeString" : Specifically when there is no index. I understand and see the speed improvement only when I have a index. I expected to see some benefit from numexpr even with no index. I expected node.where() to be much faster. What I see is identical performance. Is numexpr benefit only seen for complex math like (floatField ** intField > otherFloatField) I did not see that to be the case on my first attempt.... Seems that I only benefit from a index. From: Anthony Scopatz [mailto:sc...@gm...] Sent: Friday, November 09, 2012 12:24 AM To: Discussion list for PyTables Subject: Re: [Pytables-users] pyTable index from c++ On Thu, Nov 8, 2012 at 10:19 PM, Jim Knoll <jim...@sp...<mailto:jim...@sp...>> wrote: I love the index function and promote the internal use of PyTables at my company. The availability of a indexed method to speed the search is the main reason why. We are a mixed shop using c++ to create H5 (just for the raw speed ... need to keep up with streaming data) End users start with python pyTables to consume the data. (Often after we have created indexes from python pytables.col.col1.createIndex()) Sometimes the users come up with something we want to do thousands of times and performance is critical. But then we are falling back to c++ We can use our own index method but would like to make dbl use of the PyTables index. I know the python table.where( is implemented in C. Hi Jim, This is only kind of true. Querying (ie all of the where*() methods) are actually mostly written in Python in the tables.py and expressions.py files. However, they make use of numexpr [1]. Is there a way to access that from c or c++? Don't mind if I need to do work to get the result I think in my case the work may be worth it. PLAN 1: One possibility is that the parts of PyTables are written in Cython. We could maybe try (without making any edits to these files) to convert them to Cython. This has the advantage that for Cython files, if you write the appropriate C++ header file and link against the shared library correctly, it is possible to access certain functions from C/C++. BUT, I am not sure how much of speed boost you would get out of this since you would still be calling out to the Python interpreter to get these result. You are just calling Python's virtual machine from C++ rather than calling it from Python (like normal). This has the advantage that you would basically get access to these functions acting on tables from C++. PLAN 2: Alternatively, numexpr itself is mostly written in C++ already. You should be able to call core numexpr functions directly. However, you would have to feed it data that you read from the tables yourself. These could even be table indexes. On a personal note, if you get code working that does this, I would be interested in seeing your implementation. (I have another project where I have tables that I want to query from C++) Let us know what route you ultimately end up taking or if you have any further questions! Be Well Anthony 1. 
http://code.google.com/p/numexpr/source/browse/#hg%2Fnumexpr ________________________________ Jim Knoll Data Developer Spot Trading L.L.C 440 South LaSalle St., Suite 2800 Chicago, IL 60605 Office: 312.362.4550<tel:312.362.4550> Direct: 312-362-4798<tel:312-362-4798> Fax: 312.362.4551<tel:312.362.4551> jim...@sp...<mailto:jim...@sp...> www.spottradingllc.com<http://www.spottradingllc.com/> ________________________________ The information contained in this message may be privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to the message and deleting it from your computer. Thank you. Spot Trading, LLC ------------------------------------------------------------------------------ Everyone hates slow websites. So do we. Make your web apps faster with AppDynamics Download AppDynamics Lite for free today: http://p.sf.net/sfu/appdyn_d2d_nov _______________________________________________ Pytables-users mailing list Pyt...@li...<mailto:Pyt...@li...> https://lists.sourceforge.net/lists/listinfo/pytables-users |
From: Anthony S. <sc...@gm...> - 2012-11-09 06:24:36
|
On Thu, Nov 8, 2012 at 10:19 PM, Jim Knoll <jim...@sp...>wrote: > I love the index function and promote the internal use of PyTables at > my company. The availability of a indexed method to speed the search is > the main reason why.**** > > ** ** > > We are a mixed shop using c++ to create H5 (just for the raw speed … need > to keep up with streaming data) End users start with python pyTables to > consume the data. (Often after we have created indexes from python > pytables.col.col1.createIndex()) **** > > ** ** > > Sometimes the users come up with something we want to do thousands of > times and performance is critical. But then we are falling back to c++ We > can use our own index method but would like to make dbl use of the PyTables > index. **** > > ** ** > > I know the python table.where( is implemented in C. > Hi Jim, This is only kind of true. Querying (ie all of the where*() methods) are actually mostly written in Python in the tables.py and expressions.py files. However, they make use of numexpr [1]. > **** > > ** Is there a way to access that from c or c++? Don’t mind if I need > to do work to get the result I think in my case the work may be worth it. > *PLAN 1:* One possibility is that the parts of PyTables are written in Cython. We could maybe try (without making any edits to these files) to convert them to Cython. This has the advantage that for Cython files, if you write the appropriate C++ header file and link against the shared library correctly, it is possible to access certain functions from C/C++. BUT, I am not sure how much of speed boost you would get out of this since you would still be calling out to the Python interpreter to get these result. You are just calling Python's virtual machine from C++ rather than calling it from Python (like normal). This has the advantage that you would basically get access to these functions acting on tables from C++. *PLAN 2:* Alternatively, numexpr itself is mostly written in C++ already. You should be able to call core numexpr functions directly. However, you would have to feed it data that you read from the tables yourself. These could even be table indexes. On a personal note, if you get code working that does this, I would be interested in seeing your implementation. (I have another project where I have tables that I want to query from C++) Let us know what route you ultimately end up taking or if you have any further questions! Be Well Anthony 1. http://code.google.com/p/numexpr/source/browse/#hg%2Fnumexpr > > > ------------------------------ > > * Jim Knoll* * > **Data Developer* > > Spot Trading L.L.C > 440 South LaSalle St., Suite 2800 > Chicago, IL 60605 > Office: 312.362.4550 > Direct: 312-362-4798 > Fax: 312.362.4551 > jim...@sp... > www.spottradingllc.com > ------------------------------ > > The information contained in this message may be privileged and > confidential and protected from disclosure. If the reader of this message > is not the intended recipient, or an employee or agent responsible for > delivering this message to the intended recipient, you are hereby notified > that any dissemination, distribution or copying of this communication is > strictly prohibited. If you have received this communication in error, > please notify us immediately by replying to the message and deleting it > from your computer. Thank you. Spot Trading, LLC > > > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. 
> Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_nov > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
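A rough Python-side illustration of the "PLAN 2" idea above: read a block out of the table yourself and hand it straight to numexpr. A C++ caller would do the equivalent against the numexpr core. File and column names here are invented.

```python
import numexpr as ne
import tables

h5f = tables.openFile("ticks.h5", "r")           # invented file
table = h5f.root.quotes                          # invented table

chunk = table.read(0, 1000000)                   # structured numpy array
mask = ne.evaluate('price > 100.0',
                   local_dict={'price': chunk['price']})
selected = chunk[mask]

h5f.close()
```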
From: Anthony S. <sc...@gm...> - 2012-11-09 05:57:24
|
Hello Jim, The major hurdle here is exposing 7Zip to HDF5. Luckily it appears as if this may have been taken care of for you by the HDF-group already [1]. You should google around to see what has already been done and how hard it is to install. The next step is to expose this as a compression option for filters [2]. I am fairly certain that this is just a matter of adding a simple flag and making sure 7Zip works if available. This should not be too difficult at all and we would happily consider/review any pull request that implemented this. Barring any major concerns, I feel that it would likely be accepted. Be Well Anthony 1. http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.6/hdf5-1.6.7/src/unpacked/release_docs/INSTALL_Windows_From_Command_Line.txt 2. http://pytables.github.com/usersguide/libref/helper_classes.html#the-filters-class On Thu, Nov 8, 2012 at 9:52 PM, Jim Knoll <jim...@sp...>wrote: > I would like to squeeze out as much compression as I can get. I do not > mind spending time on the front end as long as I do not kill my read > performance. Seems like 7Zip is well suited to my data. Is it possible to > have 7Zip used as the native internal compression for a pytable?**** > > ** ** > > If not now hard would it be to add this option?**** > > > ------------------------------ > > * Jim Knoll* * > **Data Developer* > > Spot Trading L.L.C > 440 South LaSalle St., Suite 2800 > Chicago, IL 60605 > Office: 312.362.4550 > Direct: 312-362-4798 > Fax: 312.362.4551 > jim...@sp... > www.spottradingllc.com > ------------------------------ > > The information contained in this message may be privileged and > confidential and protected from disclosure. If the reader of this message > is not the intended recipient, or an employee or agent responsible for > delivering this message to the intended recipient, you are hereby notified > that any dissemination, distribution or copying of this communication is > strictly prohibited. If you have received this communication in error, > please notify us immediately by replying to the message and deleting it > from your computer. Thank you. Spot Trading, LLC > > > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_nov > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
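For comparison, the compressors that can already be selected through the Filters class Anthony links to; an LZMA/7-Zip filter would have to be registered with HDF5 and exposed as an extra complib, as he describes. Sketch only, with invented node names.

```python
import tables

# complib can be 'zlib', 'lzo', 'bzip2' or 'blosc'; higher complevel
# trades write time for size.
filters = tables.Filters(complevel=9, complib='zlib', shuffle=True)

h5f = tables.openFile("compressed.h5", "w")
carr = h5f.createCArray("/", "data", tables.Float64Atom(),
                        shape=(1000, 1000), filters=filters)
h5f.close()
```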
From: Jim K. <jim...@sp...> - 2012-11-09 04:19:40
|
I love the index function and promote the internal use of PyTables at my company. The availability of a indexed method to speed the search is the main reason why. We are a mixed shop using c++ to create H5 (just for the raw speed … need to keep up with streaming data) End users start with python pyTables to consume the data. (Often after we have created indexes from python pytables.col.col1.createIndex()) Sometimes the users come up with something we want to do thousands of times and performance is critical. But then we are falling back to c++ We can use our own index method but would like to make dbl use of the PyTables index. I know the python table.where( is implemented in C. Is there a way to access that from c or c++? Don’t mind if I need to do work to get the result I think in my case the work may be worth it. ________________________________ Jim Knoll Data Developer Spot Trading L.L.C 440 South LaSalle St., Suite 2800 Chicago, IL 60605 Office: 312.362.4550 Direct: 312-362-4798 Fax: 312.362.4551 jim...@sp... www.spottradingllc.com<http://www.spottradingllc.com/> ________________________________ The information contained in this message may be privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to the message and deleting it from your computer. Thank you. Spot Trading, LLC |
From: Jim K. <jim...@sp...> - 2012-11-09 04:04:37
|
I would like to squeeze out as much compression as I can get. I do not mind spending time on the front end as long as I do not kill my read performance. Seems like 7Zip is well suited to my data. Is it possible to have 7Zip used as the native internal compression for a pytable? If not, how hard would it be to add this option? ________________________________ Jim Knoll Data Developer Spot Trading L.L.C 440 South LaSalle St., Suite 2800 Chicago, IL 60605 Office: 312.362.4550 Direct: 312-362-4798 Fax: 312.362.4551 jim...@sp... www.spottradingllc.com<http://www.spottradingllc.com/> ________________________________ The information contained in this message may be privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to the message and deleting it from your computer. Thank you. Spot Trading, LLC |
From: Aquil H. A. <aqu...@gm...> - 2012-11-08 16:58:21
|
Thanks Anthony, This also did the trick: import tables h5f_in = tables.openFile('CO.h5') tbl_in = h5f_in.root.CO.DATA h5f_out = tables.openFile('test.h5', 'w') g = h5f_out.createGroup('/','CO') ot = tbl_in.copy(newparent=g) -- Aquil H. Abdullah "I never think of the future. It comes soon enough" - Albert Einstein On Thursday, November 8, 2012 at 11:19 AM, Anthony Scopatz wrote: > Hey Aquil, > > I think File.copyNode() [1] with the newparent argument as group on another file will do what you want. > > Be Well > Anthony > > 1. http://pytables.github.com/usersguide/libref/file_class.html?highlight=copy#tables.File.copyNode > > > On Thu, Nov 8, 2012 at 10:02 AM, Aquil H. Abdullah <aqu...@gm... (mailto:aqu...@gm...)> wrote: > > I create the tables in an HDF5 file from three different python processes. I needed to modify one of the processes, but not the others. Is there an easy way to copy the two tables that did not change to the new file? > > > > -- > > Aquil H. Abdullah > > "I never think of the future. It comes soon enough" - Albert Einstein > > > > > > ------------------------------------------------------------------------------ > > Everyone hates slow websites. So do we. > > Make your web apps faster with AppDynamics > > Download AppDynamics Lite for free today: > > http://p.sf.net/sfu/appdyn_d2d_nov > > _______________________________________________ > > Pytables-users mailing list > > Pyt...@li... (mailto:Pyt...@li...) > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_nov > > _______________________________________________ > Pytables-users mailing list > Pyt...@li... (mailto:Pyt...@li...) > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Anthony S. <sc...@gm...> - 2012-11-08 16:19:58
|
Hey Aquil, I think File.copyNode() [1] with the newparent argument as group on another file will do what you want. Be Well Anthony 1. http://pytables.github.com/usersguide/libref/file_class.html?highlight=copy#tables.File.copyNode On Thu, Nov 8, 2012 at 10:02 AM, Aquil H. Abdullah <aqu...@gm... > wrote: > I create the tables in an HDF5 file from three different python processes. > I needed to modify one of the processes, but not the others. Is there an > easy way to copy the two tables that did not change to the new file? > > -- > Aquil H. Abdullah > "I never think of the future. It comes soon enough" - Albert Einstein > > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_nov > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Aquil H. A. <aqu...@gm...> - 2012-11-08 16:02:36
|
I create the tables in an HDF5 file from three different python processes. I needed to modify one of the processes, but not the others. Is there an easy way to copy the two tables that did not change to the new file? -- Aquil H. Abdullah "I never think of the future. It comes soon enough" - Albert Einstein |
From: Owen M. <owe...@bc...> - 2012-11-03 16:04:25
|
If you're reading the data out of the file from inside a generator (ie - if load_items returns a generator that accesses the HDF5 file) then as the worker processes consume the work items the file is actually being opened and read from a worker thread in the master process. Regards, Owen On 2 November 2012 21:49, Ben Elliston <bj...@ai...> wrote: > Hi Francesc > > > Hmm, now that I think, Blosc is not thread safe, and that can bring > > these sort of problems if you use it from several threads (but it > > should be safe when using several *processes*). > > I am using multiprocessing.Pool, like so: > > if __name__ == '__main__': > pool = Pool(processes=2) # start 2 worker processes > items = load_items () > pool.map (process_items, items) > > Ben > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.10 (GNU/Linux) > > iQIVAwUBUJQx0+ZqTimv57y9AQigFQ//XWpDoIve2PDag4SG/JBu6Y4D5X8pZcfA > froFDmju5dGtlCaRKUc/puioFRukyD2s5QCV9hvYIGQ2EYkts1eKLQnYRvcV/37D > J6QVEGUcuqfdj6lnGZSiDHr24rCeT3oGozbYO0/6casyV4iIuRjOzghWgKjV7ko1 > N/dy2UGK0S1S2Ws/OnkzDlbXiZShHfjLw3au2TtCdXaPcA4X1aMs1qLzEzvd+gJb > MLHy3MVtGCxjtnB3Vzi/2UmgMfB6hFuGugD2Yp2i0SxFIlTS7cIXYB2beL+x+y4Q > MtcZZO7QOvTvoExRje0BWR0e0BAOumKACKiU7Uq8L0+rwT6NWPfOVfe6KhieHvi/ > bh8oiNl2tekB+UE5JQ6Yi13YwfReyA1M8RFRsrQ3fXCWaQ6+Hx3m8+t/q4bfDhy4 > wFrC4N3hIkqMNI589aju8vVWerSdKDrzqFjcLBp8zfY718qm0ulGz0bBsKNPzr8y > 7AN+/5StSos29oBTv3p3s7YDLjXoDi4XwZPR0aMt9pO2JrPxVUda+oi8/1AT46RS > QoyFGmkqotucR22rWXbv9wZdHm9SsxGZakNpeQYxN9uf2aaqP0eodbgmSPx1ST8P > 8vU+uGqrCb564KW5eJmQCgNZMXo1uCAqCdF5LZT+h1ncWy9//bv9gW2mityDNs6j > YOqKMSOj9e0= > =ZvxS > -----END PGP SIGNATURE----- > > > ------------------------------------------------------------------------------ > LogMeIn Central: Instant, anywhere, Remote PC access and management. > Stay in control, update software, and manage PCs from one command center > Diagnose problems and improve visibility into emerging IT issues > Automate, monitor and manage. Do more in less time with Central > http://p.sf.net/sfu/logmein12331_d2d > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Francesc A. <fa...@gm...> - 2012-11-02 21:30:10
|
On 11/2/12 5:19 PM, Ben Elliston wrote: > On Fri, Nov 02, 2012 at 04:56:55PM -0400, Francesc Alted wrote: > >> Hmm, that's strange. Using lzo or zlib works for you? > Well, it seems that switching compression algorithms could be a > nightmare (or can I do this with ptrepack?). Yes, ptrepack can do that very easily. > However, I may have a > workaround: I now open the HDF5 file with tables.openFile at the start > of each process rather than inherit the file descriptor from the > parent. That works, since it's just concurrent file I/O on the same > read-only file, and the start-up overhead is acceptable in this case. Mmh, I think that makes sense. I think the problem before was that you was sharing the same file description with different processes, and hence you ended with sync problems. Having different descriptors for different processes is definitely the way to go. > > Happy to try lzo or zlib, though, if you like. Provided the above, I don't think you need to (I mean, I'd say that lzo and zlib would have exactly the same problem). -- Francesc Alted |
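The pattern Ben and Francesc converge on above, sketched in its simplest form (one open per task, read-only); the file name, node, and slicing are placeholders.

```python
import tables
from multiprocessing import Pool

FILENAME = "results.h5"                          # placeholder

def process_item(i):
    # Each task opens its own descriptor instead of inheriting one
    # from the parent process.
    h5f = tables.openFile(FILENAME, "r")
    block = h5f.root.bigcarray[i * 1000:(i + 1) * 1000]
    h5f.close()
    return block.sum()

if __name__ == '__main__':
    pool = Pool(processes=2)
    totals = pool.map(process_item, range(10))
```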
From: Ben E. <bj...@ai...> - 2012-11-02 21:19:50
|
On Fri, Nov 02, 2012 at 04:56:55PM -0400, Francesc Alted wrote: > Hmm, that's strange. Using lzo or zlib works for you? Well, it seems that switching compression algorithms could be a nightmare (or can I do this with ptrepack?). However, I may have a workaround: I now open the HDF5 file with tables.openFile at the start of each process rather than inherit the file descriptor from the parent. That works, since it's just concurrent file I/O on the same read-only file, and the start-up overhead is acceptable in this case. Happy to try lzo or zlib, though, if you like. Cheers, Ben |
From: Francesc A. <fa...@gm...> - 2012-11-02 20:57:03
|
On 11/2/12 4:49 PM, Ben Elliston wrote: > Hi Francesc > >> Hmm, now that I think, Blosc is not thread safe, and that can bring >> these sort of problems if you use it from several threads (but it >> should be safe when using several *processes*). > I am using multiprocessing.Pool, like so: > > if __name__ == '__main__': > pool = Pool(processes=2) # start 2 worker processes > items = load_items () > pool.map (process_items, items) > Hmm, that's strange. Using lzo or zlib works for you? -- Francesc Alted |
From: Ben E. <bj...@ai...> - 2012-11-02 20:49:35
|
Hi Francesc > Hmm, now that I think, Blosc is not thread safe, and that can bring > these sort of problems if you use it from several threads (but it > should be safe when using several *processes*). I am using multiprocessing.Pool, like so: if __name__ == '__main__': pool = Pool(processes=2) # start 2 worker processes items = load_items () pool.map (process_items, items) Ben |
From: Francesc A. <fa...@gm...> - 2012-11-02 20:41:26
|
On 11/2/12 4:22 PM, Ben Elliston wrote: > My reading of the PyTables FAQ is that concurrent read access should > be safe with PyTables. However, when using a pool of worker processes > to read different parts of a large blosc-compressed CArray, I see: > > HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 140476163647232: > #000: ../../../src/H5Dio.c line 174 in H5Dread(): can't read data > major: Dataset > minor: Read failed > #001: ../../../src/H5Dio.c line 448 in H5D_read(): can't read data > major: Dataset > minor: Read failed > etc. Hmm, now that I think, Blosc is not thread safe, and that can bring these sort of problems if you use it from several threads (but it should be safe when using several *processes*). In case your worker processes are threads, then it might help to deactivate threading in Blosc by setting the MAX_BLOSC_THREADS parameter: http://pytables.github.com/usersguide/parameter_files.html?#tables.parameters.MAX_BLOSC_THREADS to 1. HTH, -- Francesc Alted |
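The parameter Francesc points to can be changed globally, or (assuming a recent 2.x release) passed as a keyword override when opening the file; a sketch:

```python
import tables

# Globally, before any file is opened...
tables.parameters.MAX_BLOSC_THREADS = 1

# ...or just for one file, as a keyword override to openFile().
h5f = tables.openFile("data.h5", "r", MAX_BLOSC_THREADS=1)
```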
From: Ben E. <bj...@ai...> - 2012-11-02 20:22:42
|
My reading of the PyTables FAQ is that concurrent read access should be safe with PyTables. However, when using a pool of worker processes to read different parts of a large blosc-compressed CArray, I see: HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 140476163647232: #000: ../../../src/H5Dio.c line 174 in H5Dread(): can't read data major: Dataset minor: Read failed #001: ../../../src/H5Dio.c line 448 in H5D_read(): can't read data major: Dataset minor: Read failed etc. Am I misunderstanding something? Thanks, Ben |
From: Francesc A. <fa...@gm...> - 2012-11-02 01:25:28
|
On 11/1/12 9:02 PM, Ben Elliston wrote: > Hi all. > > I have a very large CArray that I need to extend (of course, you can't > extend a CArray). I want to do this in the least operation intensive > way I can. > > What's the easiest way? Create a new CArray of the right size, copy > the data into it and delete the old one? Yes, this is the best approach. > I understand that deleting > the old one will not reclaim any space in the database -- is that > right? Yes, that's correct. For claiming the old space you should 'repack' the file either using ptrepack PyTable's own tool or the HDF5 native tool called h5repack. HTH, -- Francesc Alted |
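A sketch of the copy-and-replace approach described above (array names invented); for really big arrays the copy should go chunk by chunk, and an EArray avoids the problem entirely if the growth axis is known in advance.

```python
import tables

h5f = tables.openFile("big.h5", "a")
old = h5f.root.data                              # invented node name

new = h5f.createCArray("/", "data_tmp", old.atom,
                       shape=(old.shape[0] + 100000,) + old.shape[1:],
                       filters=old.filters)
new[:old.shape[0]] = old[:]                      # copy in chunks for huge arrays
old.remove()
new.rename("data")
h5f.close()

# Reclaim the space left behind by the removed node:
#   ptrepack big.h5:/ big_packed.h5:/
```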
From: Ben E. <bj...@ai...> - 2012-11-02 01:20:34
|
Hi all. I have a very large CArray that I need to extend (of course, you can't extend a CArray). I want to do this in the least operation intensive way I can. What's the easiest way? Create a new CArray of the right size, copy the data into it and delete the old one? I understand that deleting the old one will not reclaim any space in the database -- is that right? Thanks, Ben |
From: Andrea G. <and...@gm...> - 2012-10-31 20:59:31
|
Hi Francesc and All, On 31 October 2012 21:02, Francesc Alted wrote: > On 10/31/12 10:12 AM, Andrea Gavana wrote: >> Hi Francesc & All, >> >> On 31 October 2012 14:13, Francesc Alted wrote: >>> On 10/31/12 4:30 AM, Andrea Gavana wrote: >>>> Thank you for all your suggestions. I managed to slightly modify the >>>> script you attached and I am also experimenting with compression. >>>> However, in the newly attached script the underlying table is not >>>> modified, i.e., this assignment: >>>> >>>> for p in table: >>>> p['results'][:NUM_SIM, :, :] = numpy.random.random(size=(NUM_SIM, >>>> len(ALL_DATES), 7)) >>>> table.flush() >>> For modifying row values you need to assign a complete row object. >>> Something like: >>> >>> for i in range(len(table)): >>> myrow = table[i] >>> myrow['results'][:NUM_SIM, :, :] = >>> numpy.random.random(size=(NUM_SIM, len(ALL_DATES), 7)) >>> table[i] = myrow >>> >>> You may also use Table.modifyColumn() for better efficiency. Look at >>> the different modification methods here: >>> >>> http://pytables.github.com/usersguide/libref/structured_storage.html#table-methods-writing >>> >>> and experiment with them. >> Thank you, I have tried different approaches and they all seem to run >> more or less at the same speed (see below). I had to slightly modify >> your code from: >> >> table[i] = myrow >> >> to >> >> table[i] = [myrow] >> >> To avoid exceptions. >> >> In the newly attached file, I switched to blosc for compression (but >> with compression level 1) and run a few sensitivities. By calling the >> attached script as: >> >> python pytables_test.py NUM_SIM >> >> where "NUM_SIM" is an integer, I get the following timings and file sizes: >> >> C:\MyProjects\Phaser\tests>python pytables_test.py 10 >> Number of simulations : 10 >> H5 file creation time : 0.879s >> Saving results for table: 6.413s >> H5 file size (MB) : 193 >> >> >> C:\MyProjects\Phaser\tests>python pytables_test.py 100 >> Number of simulations : 100 >> H5 file creation time : 4.155s >> Saving results for table: 86.326s >> H5 file size (MB) : 1935 >> >> >> I dont think I will try the 1,000 simulations case :-) . I believe I >> still don't understand what the best strategy would be for my problem. >> I basically need to save all the simulation results for all the 1,200 >> "objects", each of which has a timeseries matrix of 600x7 size. In the >> GUI I have, these 1,200 "objects" are grouped into multiple >> categories, and multiple categories can reference the same "object", >> i.e.: >> >> Category_1: object_1, object_23, object_543, etc... >> Category_2: object_23, object_100, object_543, etc... >> >> So my idea was to save all the "objects" results to disk and, upon the >> user's choice, build the categories results "on the fly", i.e. by >> seeking the H5 file on disk for the "objects" belonging to that >> specific category and summing up all their results (over time, i.e. >> the 600 time-steps). Maybe I would be better off with a 4D array >> (NUM_OBJECTS, NUM_SIM, TSTEPS, 7) as a table, but then I will lose the >> ability to reference the "objects" by their names... > > You should keep trying experimenting with different approaches and > discover the one that works for you the best. Regarding using the 4D > array as a table, I might be misunderstanding your problem, but you can > still reference objects by name by using: > > row = table.where("name == %s" % my_name) > table[row.nrow] = ... > > You may want to index the 'name' column for better performance. 
I did spend quite some time experimenting today (actually almost the whole day), but even the task of writing a 4D array (created via createCArray) to disk is somehow overwhelming from a GUI point of view. My 4D array is a 1,200x100x600x7, and on my PC at work (16 cores, 96 GB RAM, 3.0 GHz Windows Vista 64bit with PyTables pro) it takes close to 80 seconds to populate it with random arrays and save it to disk. This is almost a lifetime in the GUI world, and 100 simulation is possibly the simplest case I have. As I said before, I am probably completely missing the point; the fact that my script seems to be "un-improvable" in terms of speed is quite a demonstration of it. But given the constraints (in terms of time and GUI responsiveness) we have I will probably go back to my previous approach of saving the simulation results for the higher level categories only, discarding the ones for the 1,200 "objects". I love the idea of being able to seek results in real-time via a HDF5 file on my drive and having all the simulation results readily available, but the actual "saving" time is somehow a showstopper. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://www.infinity77.net # ------------------------------------------------------------- # def ask_mailing_list_support(email): if mention_platform_and_version() and include_sample_app(): send_message(email) else: install_malware() erase_hard_drives() # ------------------------------------------------------------- # |
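For what it's worth, a sketch of the 4D-array layout discussed in this thread, writing one object's block at a time so that only a 100x600x7 slab is ever in memory; the shapes match the numbers quoted above, everything else (file name, compression level, fill pattern) is made up for illustration.

```python
import numpy
import tables

NUM_OBJECTS, NUM_SIM, TSTEPS, NRES = 1200, 100, 600, 7

h5f = tables.openFile("simulations.h5", "w")
filters = tables.Filters(complevel=1, complib='blosc')
results = h5f.createCArray("/", "results", tables.Float64Atom(),
                           shape=(NUM_OBJECTS, NUM_SIM, TSTEPS, NRES),
                           filters=filters)

for i in xrange(NUM_OBJECTS):
    # One object's simulated time series written per iteration.
    results[i, :, :, :] = numpy.random.random((NUM_SIM, TSTEPS, NRES))

h5f.flush()
h5f.close()
```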
From: Francesc A. <fa...@gm...> - 2012-10-31 20:10:58
|
On 10/31/12 4:05 PM, Francesc Alted wrote: > On 10/31/12 4:02 PM, Francesc Alted wrote: >> On 10/31/12 10:12 AM, Andrea Gavana wrote: >>> Hi Francesc & All, >>> >>> On 31 October 2012 14:13, Francesc Alted wrote: >>>> On 10/31/12 4:30 AM, Andrea Gavana wrote: >>>>> Thank you for all your suggestions. I managed to slightly modify the >>>>> script you attached and I am also experimenting with compression. >>>>> However, in the newly attached script the underlying table is not >>>>> modified, i.e., this assignment: >>>>> >>>>> for p in table: >>>>> p['results'][:NUM_SIM, :, :] = >>>>> numpy.random.random(size=(NUM_SIM, >>>>> len(ALL_DATES), 7)) >>>>> table.flush() >>>> For modifying row values you need to assign a complete row object. >>>> Something like: >>>> >>>> for i in range(len(table)): >>>> myrow = table[i] >>>> myrow['results'][:NUM_SIM, :, :] = >>>> numpy.random.random(size=(NUM_SIM, len(ALL_DATES), 7)) >>>> table[i] = myrow >>>> >>>> You may also use Table.modifyColumn() for better efficiency. Look at >>>> the different modification methods here: >>>> >>>> http://pytables.github.com/usersguide/libref/structured_storage.html#table-methods-writing >>>> >>>> >>>> and experiment with them. >>> Thank you, I have tried different approaches and they all seem to run >>> more or less at the same speed (see below). I had to slightly modify >>> your code from: >>> >>> table[i] = myrow >>> >>> to >>> >>> table[i] = [myrow] >>> >>> To avoid exceptions. >>> >>> In the newly attached file, I switched to blosc for compression (but >>> with compression level 1) and run a few sensitivities. By calling the >>> attached script as: >>> >>> python pytables_test.py NUM_SIM >>> >>> where "NUM_SIM" is an integer, I get the following timings and file >>> sizes: >>> >>> C:\MyProjects\Phaser\tests>python pytables_test.py 10 >>> Number of simulations : 10 >>> H5 file creation time : 0.879s >>> Saving results for table: 6.413s >>> H5 file size (MB) : 193 >>> >>> >>> C:\MyProjects\Phaser\tests>python pytables_test.py 100 >>> Number of simulations : 100 >>> H5 file creation time : 4.155s >>> Saving results for table: 86.326s >>> H5 file size (MB) : 1935 >>> >>> >>> I dont think I will try the 1,000 simulations case :-) . I believe I >>> still don't understand what the best strategy would be for my problem. >>> I basically need to save all the simulation results for all the 1,200 >>> "objects", each of which has a timeseries matrix of 600x7 size. In the >>> GUI I have, these 1,200 "objects" are grouped into multiple >>> categories, and multiple categories can reference the same "object", >>> i.e.: >>> >>> Category_1: object_1, object_23, object_543, etc... >>> Category_2: object_23, object_100, object_543, etc... >>> >>> So my idea was to save all the "objects" results to disk and, upon the >>> user's choice, build the categories results "on the fly", i.e. by >>> seeking the H5 file on disk for the "objects" belonging to that >>> specific category and summing up all their results (over time, i.e. >>> the 600 time-steps). Maybe I would be better off with a 4D array >>> (NUM_OBJECTS, NUM_SIM, TSTEPS, 7) as a table, but then I will lose the >>> ability to reference the "objects" by their names... >> >> You should keep trying experimenting with different approaches and >> discover the one that works for you the best. 
Regarding using the 4D >> array as a table, I might be misunderstanding your problem, but you >> can still reference objects by name by using: >> >> row = table.where("name == %s" % my_name) >> table[row.nrow] = ... > > Uh, I rather meant: > > row = table.readWhere("name == %s" % my_name) > table[row.nrow] = ... > > but you probably got the idea already. > Ups, that does not work either. It is probably something more like: rowid = table.readWhereList("name == %s" % my_name)[0] myrow = table[rowid] table[rowid] = ... (assuming that 'name' is a primary key here, i.e. values are not repeated). -- Francesc Alted |
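Putting Francesc's last suggestion into runnable form; in the PyTables API the lookup method is spelled Table.getWhereList(), and the write-back uses the table[i] = [myrow] form Andrea mentions earlier in the thread. File, table, and field names follow the examples above but are otherwise hypothetical.

```python
import numpy
import tables

h5f = tables.openFile("simulations.h5", "a")     # hypothetical file
table = h5f.root.objects                         # hypothetical table

# Row number of the (unique) object with this name.
rowid = table.getWhereList('name == "object_23"')[0]
myrow = table[rowid]
myrow['results'][:] = numpy.random.random(myrow['results'].shape)
table[rowid] = [myrow]                           # write the modified record back

h5f.close()
```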
From: Francesc A. <fa...@gm...> - 2012-10-31 20:05:38
|
On 10/31/12 4:02 PM, Francesc Alted wrote: > On 10/31/12 10:12 AM, Andrea Gavana wrote: >> Hi Francesc & All, >> >> On 31 October 2012 14:13, Francesc Alted wrote: >>> On 10/31/12 4:30 AM, Andrea Gavana wrote: >>>> Thank you for all your suggestions. I managed to slightly modify the >>>> script you attached and I am also experimenting with compression. >>>> However, in the newly attached script the underlying table is not >>>> modified, i.e., this assignment: >>>> >>>> for p in table: >>>> p['results'][:NUM_SIM, :, :] = >>>> numpy.random.random(size=(NUM_SIM, >>>> len(ALL_DATES), 7)) >>>> table.flush() >>> For modifying row values you need to assign a complete row object. >>> Something like: >>> >>> for i in range(len(table)): >>> myrow = table[i] >>> myrow['results'][:NUM_SIM, :, :] = >>> numpy.random.random(size=(NUM_SIM, len(ALL_DATES), 7)) >>> table[i] = myrow >>> >>> You may also use Table.modifyColumn() for better efficiency. Look at >>> the different modification methods here: >>> >>> http://pytables.github.com/usersguide/libref/structured_storage.html#table-methods-writing >>> >>> >>> and experiment with them. >> Thank you, I have tried different approaches and they all seem to run >> more or less at the same speed (see below). I had to slightly modify >> your code from: >> >> table[i] = myrow >> >> to >> >> table[i] = [myrow] >> >> To avoid exceptions. >> >> In the newly attached file, I switched to blosc for compression (but >> with compression level 1) and run a few sensitivities. By calling the >> attached script as: >> >> python pytables_test.py NUM_SIM >> >> where "NUM_SIM" is an integer, I get the following timings and file >> sizes: >> >> C:\MyProjects\Phaser\tests>python pytables_test.py 10 >> Number of simulations : 10 >> H5 file creation time : 0.879s >> Saving results for table: 6.413s >> H5 file size (MB) : 193 >> >> >> C:\MyProjects\Phaser\tests>python pytables_test.py 100 >> Number of simulations : 100 >> H5 file creation time : 4.155s >> Saving results for table: 86.326s >> H5 file size (MB) : 1935 >> >> >> I dont think I will try the 1,000 simulations case :-) . I believe I >> still don't understand what the best strategy would be for my problem. >> I basically need to save all the simulation results for all the 1,200 >> "objects", each of which has a timeseries matrix of 600x7 size. In the >> GUI I have, these 1,200 "objects" are grouped into multiple >> categories, and multiple categories can reference the same "object", >> i.e.: >> >> Category_1: object_1, object_23, object_543, etc... >> Category_2: object_23, object_100, object_543, etc... >> >> So my idea was to save all the "objects" results to disk and, upon the >> user's choice, build the categories results "on the fly", i.e. by >> seeking the H5 file on disk for the "objects" belonging to that >> specific category and summing up all their results (over time, i.e. >> the 600 time-steps). Maybe I would be better off with a 4D array >> (NUM_OBJECTS, NUM_SIM, TSTEPS, 7) as a table, but then I will lose the >> ability to reference the "objects" by their names... > > You should keep trying experimenting with different approaches and > discover the one that works for you the best. Regarding using the 4D > array as a table, I might be misunderstanding your problem, but you > can still reference objects by name by using: > > row = table.where("name == %s" % my_name) > table[row.nrow] = ... Uh, I rather meant: row = table.readWhere("name == %s" % my_name) table[row.nrow] = ... but you probably got the idea already. 
-- Francesc Alted |