From: Josh A. <jos...@gm...> - 2013-01-03 18:29:40

David,

The change in issue 27 was only for iteration over a tables.Column
instance. To use it, tweak Anthony's code as follows. This will iterate
over the "element" column, as in your original example. Note also that
this will only work with the development version of PyTables available on
github. It will be very slow using the released v2.4.0.

    from itertools import izip

    with tb.openFile(...) as f:
        data = f.root.data.cols.element
        data_i = iter(data)
        data_j = iter(data)
        data_i.next()  # throw the first value away
        for i, j in izip(data_i, data_j):
            compare(i, j)

Hope that helps,
Josh
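For reference, the two-iterators-offset-by-one trick used above is the
standard "pairwise" recipe from the Python 2 itertools documentation; it
works for any iterable, including a PyTables table or column (the usage
comment assumes the file handle f from Josh's snippet):

    from itertools import izip, tee

    def pairwise(iterable):
        "s -> (s0, s1), (s1, s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)  # advance the second iterator by one element
        return izip(a, b)

    # e.g.:
    # for i, j in pairwise(f.root.data.cols.element):
    #     compare(i, j)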
From: Anthony S. <sc...@gm...> - 2013-01-03 17:12:15

Hi David,

Tables and table column iteration have been overhauled fairly recently
[1]. So you might try creating two iterators, offset by one, and then
doing the comparison. I am hacking this out super quick so please forgive
me:

    from itertools import izip

    with tb.openFile(...) as f:
        data = f.root.data
        data_i = iter(data)
        data_j = iter(data)
        data_i.next()  # throw the first value away
        for i, j in izip(data_i, data_j):
            compare(i, j)

You get the idea ;)

Be Well
Anthony

1. https://github.com/PyTables/PyTables/issues/27
From: David R. <dav...@gm...> - 2013-01-03 15:26:06

I was hoping someone could help me out here.

This is from a post I put up on StackOverflow.

I have a fairly large dataset that I store in HDF5 and access using
PyTables. One operation I need to do on this dataset is pairwise
comparisons between each of the elements. This requires 2 loops, one to
iterate over each element, and an inner loop to iterate over every other
element. This operation thus looks at N(N-1)/2 comparisons.

For fairly small sets I found it to be faster to dump the contents into a
multidimensional numpy array and then do my iteration. I run into problems
with large sets because of memory issues and need to access each element
of the dataset at run time.

Putting the elements into an array gives me about 600 comparisons per
second, while operating on the hdf5 data itself gives me about 300
comparisons per second.

Is there a way to speed this process up?

Example follows (this is not my real code, just an example):

*Small Set*:

    with tb.openFile(h5_file, 'r') as f:
        data = f.root.data

        N_elements = len(data)
        elements = np.empty((N_elements, int(1e5)))

        for ii, d in enumerate(data):
            elements[ii] = d['element']

        D = np.empty((N_elements, N_elements))
        for ii in xrange(N_elements):
            for jj in xrange(ii+1, N_elements):
                D[ii, jj] = compare(elements[ii], elements[jj])

*Large Set*:

    with tb.openFile(h5_file, 'r') as f:
        data = f.root.data

        N_elements = len(data)

        D = np.empty((N_elements, N_elements))
        for ii in xrange(N_elements):
            for jj in xrange(ii+1, N_elements):
                D[ii, jj] = compare(data[ii]['element'], data[jj]['element'])
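One way to split the difference between the two versions above (a sketch
only, not from the thread: compare and h5_file are assumed from David's
example, the "element" column is as in the replies above, and CHUNK is an
assumed tuning parameter) is to read the column in blocks, so each pair of
blocks is compared in memory while only two blocks are resident at a time:

    import numpy as np
    import tables as tb

    CHUNK = 1000  # rows per block; tune to available memory

    with tb.openFile(h5_file, 'r') as f:
        col = f.root.data.cols.element
        n = len(col)
        D = np.empty((n, n))
        for i0 in xrange(0, n, CHUNK):
            block_i = col[i0:i0 + CHUNK]  # one contiguous read
            for j0 in xrange(i0, n, CHUNK):
                block_j = col[j0:j0 + CHUNK]
                for ii, x in enumerate(block_i):
                    for jj, y in enumerate(block_j):
                        if i0 + ii < j0 + jj:  # upper triangle only
                            D[i0 + ii, j0 + jj] = compare(x, y)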
From: Aquil H. A. <aqu...@gm...> - 2012-12-14 06:54:17

Hello All,

I currently use PyTables to generate a dataset that is indexed by a
timestamp and a symbol. The problem that I have is that the data is stored
at irregular intervals. For example:

    # See below for method ts_from_str
    data = [{'text_ts': '2012-01-04T15:00:00Z', 'symbol': 'APPL', 'price': 689.00,
             'timestamp': ts_from_str('2012-01-04T15:00:00Z')},
            {'text_ts': '2012-01-04T15:11:00Z', 'symbol': 'APPL', 'price': 687.24,
             'timestamp': ts_from_str('2012-01-04T15:11:00Z')},
            {'text_ts': '2012-01-05T15:33:00Z', 'symbol': 'APPL', 'price': 688.32,
             'timestamp': ts_from_str('2012-01-05T15:33:00Z')},
            {'text_ts': '2012-01-04T15:01:00Z', 'symbol': 'MSFT', 'price': 32.30,
             'timestamp': ts_from_str('2012-01-04T15:01:00Z')},
            {'text_ts': '2012-01-04T16:00:00Z', 'symbol': 'MSFT', 'price': 36.44,
             'timestamp': ts_from_str('2012-01-04T16:00:00Z')},
            {'text_ts': '2012-01-05T15:19:00Z', 'symbol': 'MSFT', 'price': 35.89,
             'timestamp': ts_from_str('2012-01-05T15:19:00Z')}]

If I want to look up the price for Apple for January 4, 2012 at 15:01:00,
I will get an empty ndarray. *Is there a way to optimize the search for
data "asof" a specific time other than iterating until you find data?*

I've written my own price_asof method (see code below) that produces the
following output:

    In [63]: price_asof(dt, 'APPL')
    QUERY: (timestamp == 1325707380) & (symbol == "APPL") -- text_ts: 2012-01-04T15:03:00Z
    QUERY: (timestamp == 1325707320) & (symbol == "APPL") -- text_ts: 2012-01-04T15:02:00Z
    QUERY: (timestamp == 1325707260) & (symbol == "APPL") -- text_ts: 2012-01-04T15:01:00Z
    QUERY: (timestamp == 1325707200) & (symbol == "APPL") -- text_ts: 2012-01-04T15:00:00Z
    Out[63]:
    array([(689.0, 'APPL', '2012-01-04T15:00:00Z', 1325707200)],
          dtype=[('price', '<f8'), ('symbol', 'S16'), ('text_ts', 'S26'), ('timestamp', '<i4')])

    # Code to generate data
    import tables
    from datetime import datetime, timedelta
    from time import mktime
    import numpy as np

    def ts_from_str(ts_str, ts_format='%Y-%m-%dT%H:%M:%SZ'):
        """Create a Unix Timestamp from an ISO 8601 timestamp string"""
        dt = datetime.strptime(ts_str, ts_format)
        return mktime(dt.timetuple())

    class PriceData(tables.IsDescription):
        text_ts = tables.StringCol(len('2012-01-01T00:00:00+00:00 '))
        symbol = tables.StringCol(16)
        price = tables.Float64Col()
        timestamp = tables.Time32Col()

    h5f = tables.openFile('test.h5', 'w', title='Price Data For Apple and Microsoft')
    group = h5f.createGroup('/', 'January', 'January Price Data')
    tbl = h5f.createTable(group, 'Prices', PriceData, 'Apple and Microsoft Prices')

    data = [...]  # the same list shown above

    price_data = tbl.row
    for d in data:
        price_data['text_ts'] = d['text_ts']
        price_data['symbol'] = d['symbol']
        price_data['price'] = d['price']
        price_data['timestamp'] = d['timestamp']
        price_data.append()
    tbl.flush()

    # This is my price_asof function
    def price_asof(dt, symbol, max_rec=1000):
        """Return the price as of the time dt"""
        ts = mktime(dt.timetuple())
        query = '(timestamp == %d)' % ts
        if symbol:
            query += ' & (symbol == "%s")' % symbol
        data = np.ndarray(0)
        count = 0
        while len(data) == 0 and count <= max_rec:
            # print "QUERY: %s -- text_ts: %s" % (query, dt.strftime('%Y-%m-%dT%H:%M:%SZ'))
            data = tbl.readWhere(query)
            dt = dt - timedelta(seconds=60)
            ts = mktime(dt.timetuple())
            query = '(timestamp == %d)' % ts
            if symbol:
                query += ' & (symbol == "%s")' % symbol
            count += 1
        return data

    h5f.close()

--
Aquil H. Abdullah
aqu...@gm...
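A possible alternative to probing one minute at a time (a sketch only, not
from the thread: it reuses tbl, mktime, and np from the code above, and
the 60-minute window is an assumed tuning parameter) is to pull every
candidate row in a bounded window with a single in-kernel query and keep
the most recent one:

    def price_asof_window(tbl, dt, symbol, window_minutes=60):
        # Fetch all rows for `symbol` in the window (dt - window, dt]
        # with one query, then keep the latest row at or before dt.
        ts = int(mktime(dt.timetuple()))
        lo = ts - window_minutes * 60
        query = ('(timestamp <= %d) & (timestamp > %d) & (symbol == "%s")'
                 % (ts, lo, symbol))
        rows = tbl.readWhere(query)
        if len(rows) == 0:
            return rows  # nothing in the window; widen it or fall back
        return rows[np.argmax(rows['timestamp'])]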
From: Josh A. <jos...@gm...> - 2012-12-12 17:53:40

Jennifer,

When adding a Python object to a VLArray, PyTables first pickles the
object. It looks like you're trying to add something that can't be
pickled. Check the type of the 'state' variable in the first line of the
stack trace and make sure it's something that can be pickled. See [1] for
more details.

Hope that helps,
Josh

[1]: http://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled
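A quick way to confirm that diagnosis (a sketch; state is the variable
from the traceback in Jennifer's message below) is to attempt the same
pickling call that PyTables makes internally:

    import cPickle

    try:
        cPickle.dumps(state, cPickle.HIGHEST_PROTOCOL)
    except cPickle.PicklingError, e:
        print 'state cannot be pickled:', e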
From: Jennifer F. <jen...@ww...> - 2012-12-12 12:59:24

Hi All,

I'm getting errors of this sort when I use pytables to store data in hdf5
format. Has anyone come across this before? Is there a fix?

Thanks,
Jennifer

    File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/database/hdf5.py", line 474, in savestate
      s.append(state)
    File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/vlarray.py", line 462, in append
      sequence = atom.toarray(sequence)
    File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/atom.py", line 1000, in toarray
      buffer_ = self._tobuffer(object_)
    File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/atom.py", line 1112, in _tobuffer
      return cPickle.dumps(object_, cPickle.HIGHEST_PROTOCOL)
    PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
From: Jennifer F. <jen...@ww...> - 2012-12-11 12:34:10

Thanks Anthony. I will check it out.

Cheers,
Jennifer
From: Josh A. <jos...@gm...> - 2012-12-11 10:29:30

Alan,

I haven't found the exact problem, but it seems to have something to do
with the node cache. Changing the 'NODE_CACHE_SLOTS' parameter to zero
(which disables the node cache) or to a negative number (which allows the
cache to grow without limit) also eliminates the problem, at least in the
unthreaded version of your code. It can be set permanently in the
parameters.py file, or passed as a parameter to the tables.openFile
function.

I'll open an issue on github about this problem.

Thanks,
Josh
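A minimal sketch of the second option Josh describes, passing the
parameter directly to tables.openFile (the file name is taken from Alan's
test program later in the thread):

    import tables

    # NODE_CACHE_SLOTS=0 disables the node cache entirely;
    # a negative value lets the cache grow without limit.
    h5 = tables.openFile('/data/test.h5', mode='w', NODE_CACHE_SLOTS=0)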
From: Anthony S. <sc...@gm...> - 2012-12-10 22:12:55

Hi Jennifer,

Yeah, that is right, they are not in EPD Free. However, they are in
Anaconda CE (http://continuum.io/downloads.html). Note the CE rather than
the full version.

Be Well
Anthony
From: Jennifer F. <jen...@ww...> - 2012-12-10 22:08:06

Hi Anthony,

Thanks for your reply. I installed HDF5 also from source. The reason I'm
building hdf5 and pytables myself is that they don't seem to be available
through EPD any more (at least in the free version:
http://www.enthought.com/products/epdlibraries.php). They used to both
come bundled in EPD, but not anymore, which is a pain.

Many thanks,
Jennifer
From: Anthony S. <sc...@gm...> - 2012-12-10 19:30:03

Hi Jennifer,

Oh, right, I am sorry. Your end error message looks very similar to
another, more common issue.

How did you install HDF5? On Mac I typically use MacPorts or have to
install it from source. IIRC the MacPorts build fails to make the shared
libraries, so you typically have to configure & compile manually.

Is there a reason you are building PyTables yourself? On Mac, I typically
use EPD or Anaconda. Even when I am making edits to the PyTables (or other
projects') source, I use these distributions as a base and link against
the HDF5 provided in them.

Be Well
Anthony
From: Jennifer F. <jen...@ww...> - 2012-12-10 19:23:48

Hi Anthony,

I'm not in the pytables source dir when I'm running IPython, so I don't
think this is the problem.

Thanks,
Jennifer
From: Alan M. <al...@al...> - 2012-12-10 18:05:41

I think I have found a viable work around.

Previously, I was flushing the whole HDF5 file:

    self.h5.flush()

By replacing this with a flush on just the table of interest:

    table = self.tables[random.randint(0, self.num_groups-1)][random.randint(0, self.num_tables-1)]
    table.flush()

the RuntimeError seems to be gone in all versions of my test program (both
single threaded and threaded with locks). Hope this helps someone else,
and eventually maybe someone will figure out what is wrong with
File.flush().
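For concreteness, a sketch of that workaround applied to the read() method
of the test program quoted in the next message (same names as in Alan's
code; the table-level flush replaces the file-level one):

    def read(self):
        # select a random table, flush just that table, then query it
        table = self.tables[random.randint(0, self.num_groups-1)][random.randint(0, self.num_tables-1)]
        table.flush()
        table.readWhere('a > %d' % (random.randint(0, 100)))
        self.stats['read'] += 1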
From: Alan M. <al...@al...> - 2012-12-10 16:18:19

I'm continuing to fight this error. As a sanity check I rewrote my sample
app as a single thread only. With interleaved read/writes to multiple
tables I still get "RuntimeError: dictionary changed size during
iteration" in flush. I still think there is some underlying problem or
something I don't understand about pytables/hdf5. I'm far from an expert
on either of these so I appreciate any suggestions or even confirmation
that I'm not completely crazy? The following code should work, right?

    import tables
    import random
    import datetime

    # a simple table
    class TableValue(tables.IsDescription):
        a = tables.Int64Col(pos=1)
        b = tables.UInt32Col(pos=2)

    class Test():
        def __init__(self):
            self.stats = {'read': 0,
                          'write': 0,
                          'read_error': 0,
                          'write_error': 0}
            self.h5 = None
            self.h5 = tables.openFile('/data/test.h5', mode='w')
            self.num_groups = 5
            self.num_tables = 5
            # create num_groups
            self.groups = [self.h5.createGroup('/', "group%d" % i) for i in range(self.num_groups)]
            self.tables = []
            # create num_tables in each group we just created
            for group in self.groups:
                tbls = [self.h5.createTable(group, 'table%d' % i, TableValue) for i in range(self.num_tables)]
                self.tables.append(tbls)
                for table in tbls:
                    # add an index for good measure
                    table.cols.a.createIndex()

        def write(self):
            # select a random table and write to it
            x = self.tables[random.randint(0, self.num_groups-1)][random.randint(0, self.num_tables-1)].row
            x['a'] = random.randint(0, 100)
            x['b'] = random.randint(0, 100)
            x.append()
            self.stats['write'] += 1

        def read(self):
            # first flush any cached data
            self.h5.flush()
            # then select a random table
            table = self.tables[random.randint(0, self.num_groups-1)][random.randint(0, self.num_tables-1)]
            # and do some random query
            table.readWhere('a > %d' % (random.randint(0, 100)))
            self.stats['read'] += 1

        def close(self):
            self.h5.close()

    def main():
        t = Test()

        start = datetime.datetime.now()

        # run for 10 seconds
        while (datetime.datetime.now() - start < datetime.timedelta(seconds=10)):
            # randomly do a read or a write
            if random.random() > 0.5:
                t.write()
            else:
                t.read()

        print t.stats
        print "Done"
        t.close()

    if __name__ == "__main__":
        main()

On Thu, Dec 6, 2012 at 9:55 AM, Alan Marchiori <al...@al...> wrote:

Josh,

Thanks for the detailed response. I would like to avoid going through a
separate process if at all possible due to the performance penalty. I have
also tried your last suggestion to create a dedicated pytables thread and
send everything through that but still see the same problem (Runtime error
in flush). This leads me to believe something strange is going on behind
the scenes. ??

Updated test program with dedicated pytables thread reading an input
Queue.Queue:

    import tables
    import threading
    import random
    import time
    import Queue

    # a simple table
    class TableValue(tables.IsDescription):
        a = tables.Int64Col(pos=1)
        b = tables.UInt32Col(pos=2)

    class TablesThread(threading.Thread):
        def __init__(self):
            threading.Thread.__init__(self)
            self.name = 'HDF5 io thread'
            # create the dummy HDF5 file
            self.h5 = None
            self.h5 = tables.openFile('/data/test.h5', mode='w')
            self.num_groups = 5
            self.num_tables = 5
            self.groups = [self.h5.createGroup('/', "group%d" % i) for i in range(self.num_groups)]
            self.tables = []
            for group in self.groups:
                tbls = [self.h5.createTable(group, 'table%d' % i, TableValue) for i in range(self.num_tables)]
                self.tables.append(tbls)
                for table in tbls:
                    # add an index for good measure
                    table.cols.a.createIndex()
            self.stopEvt = threading.Event()
            self.stoppedEvt = threading.Event()
            self.inputQ = Queue.Queue()

        def run(self):
            try:
                while not self.stopEvt.is_set():
                    # get a command
                    try:
                        cmd, args, result = self.inputQ.get(timeout=0.5)
                    except Queue.Empty:
                        # poll stopEvt so we can shutdown
                        continue

                    # do the command
                    if cmd == 'write':
                        x = self.tables[args[0]][args[1]].row
                        x['a'] = args[2]
                        x['b'] = args[3]
                        x.append()
                    elif cmd == 'read':
                        self.h5.flush()
                        table = self.tables[args[0]][args[1]]
                        result.value = table.readWhere('a > %d' % (args[2]))
                    else:
                        raise Exception("Command not supported: %s" % (cmd,))

                    # signal that the result is ready
                    result.event.set()

            finally:
                # shutdown
                self.h5.close()
                self.stoppedEvt.set()

        def stop(self):
            if not self.stoppedEvt.is_set():
                self.stopEvt.set()
                self.stoppedEvt.wait()

    class ResultEvent():
        def __init__(self):
            self.event = threading.Event()
            self.value = None

    class Test():
        def __init__(self):
            self.tables = TablesThread()
            self.tables.start()
            self.timeout = 5
            self.stats = {'read': 0,
                          'write': 0,
                          'read_error': 0,
                          'write_error': 0}

        def write(self):
            r = ResultEvent()
            self.tables.inputQ.put(('write',
                                    (random.randint(0, self.tables.num_groups-1),
                                     random.randint(0, self.tables.num_tables-1),
                                     random.randint(0, 100),
                                     random.randint(0, 100)),
                                    r))
            r.event.wait(timeout=self.timeout)
            if r.event.is_set():
                self.stats['write'] += 1
            else:
                self.stats['write_error'] += 1

        def read(self):
            r = ResultEvent()
            self.tables.inputQ.put(('read',
                                    (random.randint(0, self.tables.num_groups-1),
                                     random.randint(0, self.tables.num_tables-1),
                                     random.randint(0, 100)),
                                    r))
            r.event.wait(timeout=self.timeout)
            if r.event.is_set():
                self.stats['read'] += 1
                # print 'Query got %d hits' % (len(r.value))
            else:
                self.stats['read_error'] += 1

        def close(self):
            self.tables.stop()

        def __del__(self):
            self.close()

    class Worker(threading.Thread):
        def __init__(self, method):
            threading.Thread.__init__(self)
            self.method = method
            self.stopEvt = threading.Event()

        def run(self):
            while not self.stopEvt.is_set():
                try:
                    self.method()
                except Exception, x:
                    print 'Worker thread failed with: %s' % (x,)
                time.sleep(random.random()/100.0)

        def stop(self):
            self.stopEvt.set()

    def main():
        t = Test()

        threads = [Worker(t.write) for _i in range(10)]
        threads.extend([Worker(t.read) for _i in range(10)])

        for thread in threads:
            thread.start()

        time.sleep(5)

        for thread in threads:
            thread.stop()

        for thread in threads:
            thread.join()

        t.close()

        print t.stats

    if __name__ == "__main__":
        main()

On Wed, Dec 5, 2012 at 10:52 PM, Josh Ayers <jos...@gm...> wrote:

Alan,

Unfortunately, the underlying HDF5 library isn't thread-safe by default.
It can be built in a thread-safe mode that serializes all API calls, but
still doesn't allow actual parallel access to the disk. See [1] for more
details. Here's [2] another interesting discussion concerning whether
multithreaded access is actually beneficial for an I/O limited library
like HDF5. Ultimately, if one thread can read at the disk's maximum
transfer rate, then multiple threads don't provide any benefit.

Beyond the limitations of HDF5, PyTables also maintains global state in
various module-level variables. One example is the _open_file cache in the
file.py module. I made an attempt in the past to work around this to allow
read-only access from multiple threads, but didn't make much progress.

In general, I think your best bet is to serialize all access through a
single process. There is another example in the PyTables/examples
directory that benchmarks different methods of transferring data from
PyTables to another process [3]. It compares Python's
multiprocessing.Queue, sockets, and memory-mapped files. In my testing,
the latter two are 5-10x faster than using a queue.

Another option would be to use multiple threads, but handle all access to
the HDF5 file in one thread. PyTables will release the GIL when making
HDF5 library calls, so the other threads will be able to run. You could
use a Queue.Queue or some other mechanism to transfer data between
threads. No actual copying would be needed since their memory is shared,
which should make it faster than the multi-process techniques.

Hope that helps.

Josh Ayers

[1]: http://www.hdfgroup.org/hdf5-quest.html#mthread
[2]: https://visitbugs.ornl.gov/projects/8/wiki/Multi-threaded_cores_and_HPC-HDF5
[3]: https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_benchmarks.py

On Wed, Dec 5, 2012 at 2:24 PM, Alan Marchiori <al...@al...> wrote:

I am trying to allow multiple threads read/write access to pytables data
and found it is necessary to call flush() before any read. If not, the
latest data is not returned. However, this can cause a RuntimeError. I
have tried protecting pytables access with both locks and queues as done
by joshayers
(https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py).
In either case I still get RuntimeError: dictionary changed size during
iteration when doing the flush. (Incidentally, using the locks appears to
be much faster than using queues in my unscientific tests...)

I have tried versions 2.4 and 2.3.1 with the same results. Interestingly
this only appears to happen if there are multiple tables/groups in the H5
file. To investigate this behavior further I created a test program to
illustrate (below). When run with num_groups = 5, num_tables = 5 (or
greater) I see the runtime error every time. When these values are smaller
than this it doesn't (at least in a short test period).

I might be doing something unexpected with pytables, but this seems pretty
straightforward to me. Any help is appreciated.
From: Anthony S. <sc...@gm...> - 2012-12-10 15:42:26

Try leaving the pytables source dir and then running IPython.
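A generic sanity check for this class of problem (a sketch, not from the
thread): when the import partially succeeds, print where Python found the
package, to confirm the installed copy rather than the source checkout is
being used:

    import tables
    print tables.__file__  # should point into site-packages, not the source tree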
From: Jennifer F. <jen...@ww...> - 2012-12-10 15:25:03

Hi,

I'm trying to install pytables and it's proving difficult (using Mac OS
10.6.4). I have installed in "/usr/local/hdf5" and set the environment
variable $HDF5_DIR to /usr/local/hdf5. When I run setup, I get a warning
about not being able to find the HDF5 runtime.

    ndmmac149:tables-2.4.0 jflegg$ sudo python setup.py install --hdf5="/usr/local/hdf5"
    * Found numpy 1.6.1 package installed.
    * Found numexpr 2.0.1 package installed.
    * Found Cython 0.17.2 package installed.
    * Found HDF5 headers at ``/usr/local/hdf5/include``, library at ``/usr/local/hdf5/lib``.
    .. WARNING:: Could not find the HDF5 runtime.
       The HDF5 shared library was *not* found in the default library
       paths. In case of runtime problems, please remember to install it.
    ld: library not found for -llzo2
    collect2: ld returned 1 exit status
    ld: library not found for -llzo2
    collect2: ld returned 1 exit status
    * Could not find LZO 2 headers and library; disabling support for it.
    ld: library not found for -llzo
    collect2: ld returned 1 exit status
    ld: library not found for -llzo
    collect2: ld returned 1 exit status
    * Could not find LZO 1 headers and library; disabling support for it.
    * Found bzip2 headers at ``/usr/include``, library at ``/usr/lib``.
    running install
    running build
    running build_py
    creating build
    creating build/lib.macosx-10.5-i386-2.7
    creating build/lib.macosx-10.5-i386-2.7/tables
    copying tables/__init__.py -> build/lib.macosx-10.5-i386-2.7/tables
    copying tables/array.py -> build/lib.macosx-10.5-i386-2.7/tables

When I import pytables in python, I get the following error message:

    In [1]: import tables
    ---------------------------------------------------------
    ImportError               Traceback (most recent call last)
    /Users/jflegg/<ipython-input-1-389ecae14f10> in <module>()
    ----> 1 import tables

    /Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/tables/__init__.py in <module>()
         28
         29 # Necessary imports to get versions stored on the Pyrex extension
    ---> 30 from tables.utilsExtension import getPyTablesVersion, getHDF5Version
         31
         32

    ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/tables/utilsExtension.so, 2):
    Symbol not found: _H5E_CALLBACK_g
    Referenced from: /Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/tables/utilsExtension.so
    Expected in: flat namespace
    in /Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/tables/utilsExtension.so

Any help would be greatly appreciated.
Jennifer
From: Francesc A. <fa...@gm...> - 2012-12-07 19:37:07
|
Please, stop reporting carray problems here. Let's communicate privately if you want. Thanks, Francesc On 12/7/12 8:22 PM, Alvaro Tejero Cantero wrote: > Thanks Francesc, that solved it. Having the disk datastructures load > compressed in memory can be a deal-breaker when you got daily 50Gb+ > datasets to process! > > The carray google group (I had not noticed it) seems unreachable at > the moment. That's why I am going to report a problem here for the > moment. With the following code > > ct0 = ca.ctable((h5f.root.c_000[:],), names=('c_000',), > rootdir=u'/lfpd1/tmp/ctable-1', mode='w', cparams=ca.cparams(5), > dtype='u2', expectedlen=len(h5f.root.c_000)) > > for k in h5f.root._v_children.keys()[:3]: #just some of the HDF5 datasets > try: > col = getattr(h5f.root, k) > ct0.addcol(col[:], name=k, expectedlen=len(col), dtype='u2') > except ValueError: > pass #exists > ct0.flush() > > >>> ct0 > ctable((303390000,), [('c_000', '<u2'), ('c_007', '<u2'), ('c_006', '<u2'), ('c_005', '<u2')]) > nbytes: 2.26 GB; cbytes: 1.30 GB; ratio: 1.73 > cparams := cparams(clevel=5, shuffle=True) > rootdir := '/lfpd1/tmp/ctable-1' > [(312, 37, 65432, 91) (313, 32, 65439, 65) (320, 24, 65433, 66) ..., > (283, 597, 677, 647) (276, 600, 649, 635) (298, 607, 635, 620)] > > The newly-added datasets/columns exist in memory > > >>> ct0['c_007'] > carray((303390000,), uint16) > nbytes: 578.67 MB; cbytes: 333.50 MB; ratio: 1.74 > cparams := cparams(clevel=5, shuffle=True) > [ 37 32 24 ..., 597 600 607] > > but they do not appear in the rootdir, not even after .flush() > > /lfpd1/tmp/ctable-1]$ ls > __attrs__ c_000 __rootdirs__ > > and something seems amiss with __rootdirs__: > /lfpd1/tmp/ctable-1]$ cat __rootdirs__ > {"dirs": {"c_007": null, "c_006": null, "c_005": null, "c_000": > "/lfpd1/tmp/ctable-1/c_000"}, "names": ["c_000", "c_007", "c_006", > "c_005"]} > > >>> ct0.cbytes//1024**2 > 1334 > > vs > /lfpd1/tmp]$ du -h ctable-1 > 12K ctable-1/c_000/meta > 340M ctable-1/c_000/data > 340M ctable-1/c_000 > 340M ctable-1 > > > and, finally, no 'open' > > ct0_disk = ca.open(rootdir='/lfpd1/tmp/ctable-1', mode='r') > > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > /home/tejero/Dropbox/O/nb/nonridge/<ipython-input-26-41e1cb01ffe6> in<module>() > ----> 1 ct0_disk= ca.open(rootdir='/lfpd1/tmp/ctable-1', mode='r') > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/toplevel.pyc inopen(rootdir, mode) > 104 # Not a carray. 
Now with a ctable > > 105 try: > --> 106 obj= ca.ctable(rootdir=rootdir, mode=mode) > 107 except IOError: > 108 # Not a ctable > > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in__init__(self, columns, names, **kwargs) > 193 _new= True > 194 else: > --> 195 self.open_ctable() > 196 _new= False > 197 > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc inopen_ctable(self) > 282 > 283 # Open the ctable by reading the metadata > > --> 284 self.cols.read_meta_and_open() > 285 > 286 # Get the length out of the first column > > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc inread_meta_and_open(self) > 40 # Initialize the cols by instatiating the carrays > > 41 for name, dir_in data['dirs'].items(): > ---> 42 self._cols[str(name)] = ca.carray(rootdir=dir_, mode=self.mode) > 43 > 44 def update_meta(self): > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so incarray.carrayExtension.carray.__cinit__ (carray/carrayExtension.c:8637)() > > ValueError: You need at least to pass an array or/and a rootdir > > -á. > > > > On 7 December 2012 17:04, Francesc Alted <fa...@gm... > <mailto:fa...@gm...>> wrote: > > Hmm, perhaps cythonizing by hand is your best bet: > > $ cython carray/carrayExtension.pyx > > If you continue having problems, please write to the carray > mailing list. > > Francesc > > On 12/7/12 5:29 PM, Alvaro Tejero Cantero wrote: > > I have now similar dependencies as you, except for Numpy 1.7 beta 2. > > > > I wish I could help with the carray flavor. > > > > -- > > Running setup.py install for carray > > * Found Cython 0.17.2 package installed. > > * Found numpy 1.6.2 package installed. > > * Found numexpr 2.0.1 package installed. > > building 'carray.carrayExtension' extension > > C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall > > -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > > --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC > > -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 > > -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC > > compile options: '-Iblosc > > > -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include > > -I/usr/include/python2.7 -c' > > extra options: '-msse2' > > gcc: blosc/blosclz.c > > gcc: carray/carrayExtension.c > > gcc: error: carray/carrayExtension.c: No such file or directory > > gcc: fatal error: no input files > > compilation terminated. > > gcc: error: carray/carrayExtension.c: No such file or directory > > gcc: fatal error: no input files > > compilation terminated. > > error: Command "gcc -pthread -fno-strict-aliasing -O2 -g -pipe > > -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > > --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC > > -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 > > -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Iblosc > > > -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include > > -I/usr/include/python2.7 -c carray/carrayExtension.c -o > > build/temp.linux-x86_64-2.7/carray/carrayExtension.o -msse2" failed > > with exit status 4 > > > > > > > > -á. > > > > > > > > On 7 December 2012 12:47, Francesc Alted <fa...@gm... > <mailto:fa...@gm...> > > <mailto:fa...@gm... 
<mailto:fa...@gm...>>> wrote: > > > > On 12/6/12 1:42 PM, Alvaro Tejero Cantero wrote: > > > Thank you for the comprehensive round-up. I have some > ideas and > > > reports below. > > > > > > What about ctables? The documentation says that it is > specificly > > > column-access optimized, which is what I need in this scenario > > > (sometimes sequential, sometimes random). > > > > Yes, ctables is optimized for column access. > > > > > > > > Unfortunately I could not get the rootdir parameter for > ctables > > > __init__ to work in carray 0.4 and pip-installing 0.5 or > 0.5.1 leads > > > to compilation errors. > > > > Yep, persistence for carray/ctables objects was added in 0.5. > > > > > > > > This is the ctables-to-disk error: > > > > > > ct2 = ca.ctable((np.arange(30000000),), names=('range2',), > > > rootdir='/tmp/ctable2.ctable') > > > > > > --------------------------------------------------------------------------- > > > TypeError Traceback (most > > recent call last) > > > > > /home/tejero/Dropbox/O/nb/nonridge/<ipython-input-29-255842877a0b> > > in<module>() > > > ----> 1 ct2= ca.ctable((np.arange(30000000),), > > names=('range2',), rootdir='/tmp/ctable2.ctable') > > > > > > > > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc > > in__init__(self, cols, names, **kwargs) > > > 158 if column.dtype== np.void: > > > 159 raise ValueError, "`cols` > > elements cannot be of type void" > > > --> 160 column= ca.carray(column, **kwargs) > > > 161 elif ratype: > > > 162 column= ca.carray(cols[name], > **kwargs) > > > > > > > > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so > > incarray.carrayExtension.carray.__cinit__ > > (carray/carrayExtension.c:3917)() > > > > > > TypeError: __cinit__() got an unexpected keyword argument > 'rootdir' > > > > > > > > > And this is cut from the pip output when trying to upgrade > carray. > > > > > > gcc: carray/carrayExtension.c > > > > > > gcc: error: carray/carrayExtension.c: No such file or > directory > > > > Hmm, that's strange, because the carrayExtension should have > been > > cythonized automatically. Here it is part of my install process > > with pip: > > > > Running setup.py install for carray > > * Found Cython 0.17.1 package installed. > > * Found numpy 1.7.0b2 package installed. > > * Found numexpr 2.0.1 package installed. > > cythoning carray/carrayExtension.pyx to > carray/carrayExtension.c > > building 'carray.carrayExtension' extension > > C compiler: gcc -fno-strict-aliasing > > -I/Users/faltet/anaconda/include -arch x86_64 -DNDEBUG -g > -fwrapv -O3 > > -Wall -Wstrict-prototypes > > > > Hmm, perhaps you need a newer version of Cython? > > > > > > > > > > > Two more notes: > > > > > > * a way was added to check in-disk (compressed) vs in-memory > > > (uncompressed) node sizes. I was unable to find the way to > use it > > > either from the 2.4.0 release notes or from the git issue > > > > https://github.com/PyTables/PyTables/issues/141#issuecomment-5018763 > > > > You already found the answer. > > > > > > > > * is/will it be possible to load PyTables carrays as in-memory > > carrays > > > without decompression? > > > > Actually, that has been my idea from the very beginning. The > > concept of > > 'flavor' for the returned objects when reading is already > there, so it > > should be relatively easy to add a new 'carray' flavor. > Maybe you can > > contribute this? 
> > > > -- > > Francesc Alted > > > > > > > ------------------------------------------------------------------------------ > > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. > Free Trial > > Remotely access PCs and mobile devices and provide instant > support > > Improve your efficiency, and focus on delivering more value-add > > services > > Discover what IT Professionals Know. Rescue delivers > > http://p.sf.net/sfu/logmein_12329d2d > > _______________________________________________ > > Pytables-users mailing list > > Pyt...@li... > <mailto:Pyt...@li...> > > <mailto:Pyt...@li... > <mailto:Pyt...@li...>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > > > > > > > ------------------------------------------------------------------------------ > > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > > Remotely access PCs and mobile devices and provide instant support > > Improve your efficiency, and focus on delivering more value-add > services > > Discover what IT Professionals Know. Rescue delivers > > http://p.sf.net/sfu/logmein_12329d2d > > > > > > _______________________________________________ > > Pytables-users mailing list > > Pyt...@li... > <mailto:Pyt...@li...> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > -- > Francesc Alted > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add > services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > <mailto:Pyt...@li...> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > > > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users -- Francesc Alted |
From: Alvaro T. C. <al...@mi...> - 2012-12-07 19:22:56
|
Thanks Francesc, that solved it. Having the disk datastructures load compressed in memory can be a deal-breaker when you got daily 50Gb+ datasets to process! The carray google group (I had not noticed it) seems unreachable at the moment. That's why I am going to report a problem here for the moment. With the following code ct0 = ca.ctable((h5f.root.c_000[:],), names=('c_000',), rootdir= u'/lfpd1/tmp/ctable-1', mode='w', cparams=ca.cparams(5), dtype='u2', expectedlen=len(h5f.root.c_000)) for k in h5f.root._v_children.keys()[:3]: #just some of the HDF5 datasets try: col = getattr(h5f.root, k) ct0.addcol(col[:], name=k, expectedlen=len(col), dtype='u2') except ValueError: pass #exists ct0.flush() >>> ct0 ctable((303390000,), [('c_000', '<u2'), ('c_007', '<u2'), ('c_006', '<u2'), ('c_005', '<u2')]) nbytes: 2.26 GB; cbytes: 1.30 GB; ratio: 1.73 cparams := cparams(clevel=5, shuffle=True) rootdir := '/lfpd1/tmp/ctable-1' [(312, 37, 65432, 91) (313, 32, 65439, 65) (320, 24, 65433, 66) ..., (283, 597, 677, 647) (276, 600, 649, 635) (298, 607, 635, 620)] The newly-added datasets/columns exist in memory >>> ct0['c_007'] carray((303390000,), uint16) nbytes: 578.67 MB; cbytes: 333.50 MB; ratio: 1.74 cparams := cparams(clevel=5, shuffle=True) [ 37 32 24 ..., 597 600 607] but they do not appear in the rootdir, not even after .flush() /lfpd1/tmp/ctable-1]$ ls __attrs__ c_000 __rootdirs__ and something seems amiss with __rootdirs__: /lfpd1/tmp/ctable-1]$ cat __rootdirs__ {"dirs": {"c_007": null, "c_006": null, "c_005": null, "c_000": "/lfpd1/tmp/ctable-1/c_000"}, "names": ["c_000", "c_007", "c_006", "c_005"]} >>> ct0.cbytes//1024**2 1334 vs /lfpd1/tmp]$ du -h ctable-1 12K ctable-1/c_000/meta 340M ctable-1/c_000/data 340M ctable-1/c_000 340M ctable-1 and, finally, no 'open' ct0_disk = ca.open(rootdir='/lfpd1/tmp/ctable-1', mode='r') ---------------------------------------------------------------------------ValueError Traceback (most recent call last)/home/tejero/Dropbox/O/nb/nonridge/<ipython-input-26-41e1cb01ffe6> in <module>()----> 1 ct0_disk = ca.open(rootdir='/lfpd1/tmp/ctable-1', mode='r') /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/toplevel.pyc in open(rootdir, mode) 104 # Not a carray. Now with a ctable 105 try:--> 106 obj = ca.ctable(rootdir=rootdir, mode=mode) 107 except IOError: 108 # Not a ctable /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in __init__(self, columns, names, **kwargs) 193 _new = True 194 else:--> 195 self.open_ctable() 196 _new = False 197 /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in open_ctable(self) 282 283 # Open the ctable by reading the metadata--> 284 self.cols.read_meta_and_open() 285 286 # Get the length out of the first column /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in read_meta_and_open(self) 40 # Initialize the cols by instatiating the carrays 41 for name, dir_ in data['dirs'].items():---> 42 self._cols[str(name)] = ca.carray(rootdir=dir_, mode=self.mode) 43 44 def update_meta(self): /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so in carray.carrayExtension.carray.__cinit__ (carray/carrayExtension.c:8637)() ValueError: You need at least to pass an array or/and a rootdir -á. On 7 December 2012 17:04, Francesc Alted <fa...@gm...> wrote: > Hmm, perhaps cythonizing by hand is your best bet: > > $ cython carray/carrayExtension.pyx > > If you continue having problems, please write to the carray mailing list. 
> > Francesc > > On 12/7/12 5:29 PM, Alvaro Tejero Cantero wrote: > > I have now similar dependencies as you, except for Numpy 1.7 beta 2. > > > > I wish I could help with the carray flavor. > > > > -- > > Running setup.py install for carray > > * Found Cython 0.17.2 package installed. > > * Found numpy 1.6.2 package installed. > > * Found numexpr 2.0.1 package installed. > > building 'carray.carrayExtension' extension > > C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall > > -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > > --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC > > -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 > > -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC > > compile options: '-Iblosc > > > -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include > > -I/usr/include/python2.7 -c' > > extra options: '-msse2' > > gcc: blosc/blosclz.c > > gcc: carray/carrayExtension.c > > gcc: error: carray/carrayExtension.c: No such file or directory > > gcc: fatal error: no input files > > compilation terminated. > > gcc: error: carray/carrayExtension.c: No such file or directory > > gcc: fatal error: no input files > > compilation terminated. > > error: Command "gcc -pthread -fno-strict-aliasing -O2 -g -pipe > > -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > > --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC > > -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 > > -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Iblosc > > > -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include > > -I/usr/include/python2.7 -c carray/carrayExtension.c -o > > build/temp.linux-x86_64-2.7/carray/carrayExtension.o -msse2" failed > > with exit status 4 > > > > > > > > -á. > > > > > > > > On 7 December 2012 12:47, Francesc Alted <fa...@gm... > > <mailto:fa...@gm...>> wrote: > > > > On 12/6/12 1:42 PM, Alvaro Tejero Cantero wrote: > > > Thank you for the comprehensive round-up. I have some ideas and > > > reports below. > > > > > > What about ctables? The documentation says that it is specificly > > > column-access optimized, which is what I need in this scenario > > > (sometimes sequential, sometimes random). > > > > Yes, ctables is optimized for column access. > > > > > > > > Unfortunately I could not get the rootdir parameter for ctables > > > __init__ to work in carray 0.4 and pip-installing 0.5 or 0.5.1 > leads > > > to compilation errors. > > > > Yep, persistence for carray/ctables objects was added in 0.5. 
> > > > > > > > This is the ctables-to-disk error: > > > > > > ct2 = ca.ctable((np.arange(30000000),), names=('range2',), > > > rootdir='/tmp/ctable2.ctable') > > > > > > --------------------------------------------------------------------------- > > > TypeError Traceback (most > > recent call last) > > > > > /home/tejero/Dropbox/O/nb/nonridge/<ipython-input-29-255842877a0b> > > in<module>() > > > ----> 1 ct2= ca.ctable((np.arange(30000000),), > > names=('range2',), rootdir='/tmp/ctable2.ctable') > > > > > > > > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc > > in__init__(self, cols, names, **kwargs) > > > 158 if column.dtype== np.void: > > > 159 raise ValueError, "`cols` > > elements cannot be of type void" > > > --> 160 column= ca.carray(column, **kwargs) > > > 161 elif ratype: > > > 162 column= ca.carray(cols[name], **kwargs) > > > > > > > > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so > > incarray.carrayExtension.carray.__cinit__ > > (carray/carrayExtension.c:3917)() > > > > > > TypeError: __cinit__() got an unexpected keyword argument 'rootdir' > > > > > > > > > And this is cut from the pip output when trying to upgrade carray. > > > > > > gcc: carray/carrayExtension.c > > > > > > gcc: error: carray/carrayExtension.c: No such file or directory > > > > Hmm, that's strange, because the carrayExtension should have been > > cythonized automatically. Here it is part of my install process > > with pip: > > > > Running setup.py install for carray > > * Found Cython 0.17.1 package installed. > > * Found numpy 1.7.0b2 package installed. > > * Found numexpr 2.0.1 package installed. > > cythoning carray/carrayExtension.pyx to carray/carrayExtension.c > > building 'carray.carrayExtension' extension > > C compiler: gcc -fno-strict-aliasing > > -I/Users/faltet/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 > > -Wall -Wstrict-prototypes > > > > Hmm, perhaps you need a newer version of Cython? > > > > > > > > > > > Two more notes: > > > > > > * a way was added to check in-disk (compressed) vs in-memory > > > (uncompressed) node sizes. I was unable to find the way to use it > > > either from the 2.4.0 release notes or from the git issue > > > > https://github.com/PyTables/PyTables/issues/141#issuecomment-5018763 > > > > You already found the answer. > > > > > > > > * is/will it be possible to load PyTables carrays as in-memory > > carrays > > > without decompression? > > > > Actually, that has been my idea from the very beginning. The > > concept of > > 'flavor' for the returned objects when reading is already there, so > it > > should be relatively easy to add a new 'carray' flavor. Maybe you > can > > contribute this? > > > > -- > > Francesc Alted > > > > > > > ------------------------------------------------------------------------------ > > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > > Remotely access PCs and mobile devices and provide instant support > > Improve your efficiency, and focus on delivering more value-add > > services > > Discover what IT Professionals Know. Rescue delivers > > http://p.sf.net/sfu/logmein_12329d2d > > _______________________________________________ > > Pytables-users mailing list > > Pyt...@li... > > <mailto:Pyt...@li...> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > > > > > > > ------------------------------------------------------------------------------ > > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. 
Free Trial > > Remotely access PCs and mobile devices and provide instant support > > Improve your efficiency, and focus on delivering more value-add services > > Discover what IT Professionals Know. Rescue delivers > > http://p.sf.net/sfu/logmein_12329d2d > > > > > > _______________________________________________ > > Pytables-users mailing list > > Pyt...@li... > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > -- > Francesc Alted > > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
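The null entries in __rootdirs__ above say that the added columns were created as purely in-memory carrays, which is why flush() has nothing to write for them. A hypothetical workaround, assuming ctable.addcol forwards extra keyword arguments such as rootdir to the underlying carray constructor, would be to pass rootdir when the columns are first added (reusing h5f and ct0 from the report; the rootdir keyword here is the assumption):

    import os

    base = '/lfpd1/tmp/ctable-1'
    for k in ('c_007', 'c_006', 'c_005'):
        col = getattr(h5f.root, k)[:]
        # give each column its own directory next to c_000 so it is
        # persisted rather than kept in memory
        ct0.addcol(col, name=k, rootdir=os.path.join(base, k))
    ct0.flush()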
From: Francesc A. <fa...@gm...> - 2012-12-07 17:04:25
|
Hmm, perhaps cythonizing by hand is your best bet: $ cython carray/carrayExtension.pyx If you continue having problems, please write to the carray mailing list. Francesc On 12/7/12 5:29 PM, Alvaro Tejero Cantero wrote: > I have now similar dependencies as you, except for Numpy 1.7 beta 2. > > I wish I could help with the carray flavor. > > -- > Running setup.py install for carray > * Found Cython 0.17.2 package installed. > * Found numpy 1.6.2 package installed. > * Found numexpr 2.0.1 package installed. > building 'carray.carrayExtension' extension > C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall > -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC > -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 > -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC > compile options: '-Iblosc > -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include > -I/usr/include/python2.7 -c' > extra options: '-msse2' > gcc: blosc/blosclz.c > gcc: carray/carrayExtension.c > gcc: error: carray/carrayExtension.c: No such file or directory > gcc: fatal error: no input files > compilation terminated. > gcc: error: carray/carrayExtension.c: No such file or directory > gcc: fatal error: no input files > compilation terminated. > error: Command "gcc -pthread -fno-strict-aliasing -O2 -g -pipe > -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC > -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 > -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Iblosc > -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include > -I/usr/include/python2.7 -c carray/carrayExtension.c -o > build/temp.linux-x86_64-2.7/carray/carrayExtension.o -msse2" failed > with exit status 4 > > > > -á. > > > > On 7 December 2012 12:47, Francesc Alted <fa...@gm... > <mailto:fa...@gm...>> wrote: > > On 12/6/12 1:42 PM, Alvaro Tejero Cantero wrote: > > Thank you for the comprehensive round-up. I have some ideas and > > reports below. > > > > What about ctables? The documentation says that it is specificly > > column-access optimized, which is what I need in this scenario > > (sometimes sequential, sometimes random). > > Yes, ctables is optimized for column access. > > > > > Unfortunately I could not get the rootdir parameter for ctables > > __init__ to work in carray 0.4 and pip-installing 0.5 or 0.5.1 leads > > to compilation errors. > > Yep, persistence for carray/ctables objects was added in 0.5. 
> > > > > This is the ctables-to-disk error: > > > > ct2 = ca.ctable((np.arange(30000000),), names=('range2',), > > rootdir='/tmp/ctable2.ctable') > > > --------------------------------------------------------------------------- > > TypeError Traceback (most > recent call last) > > > /home/tejero/Dropbox/O/nb/nonridge/<ipython-input-29-255842877a0b> > in<module>() > > ----> 1 ct2= ca.ctable((np.arange(30000000),), > names=('range2',), rootdir='/tmp/ctable2.ctable') > > > > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc > in__init__(self, cols, names, **kwargs) > > 158 if column.dtype== np.void: > > 159 raise ValueError, "`cols` > elements cannot be of type void" > > --> 160 column= ca.carray(column, **kwargs) > > 161 elif ratype: > > 162 column= ca.carray(cols[name], **kwargs) > > > > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so > incarray.carrayExtension.carray.__cinit__ > (carray/carrayExtension.c:3917)() > > > > TypeError: __cinit__() got an unexpected keyword argument 'rootdir' > > > > > > And this is cut from the pip output when trying to upgrade carray. > > > > gcc: carray/carrayExtension.c > > > > gcc: error: carray/carrayExtension.c: No such file or directory > > Hmm, that's strange, because the carrayExtension should have been > cythonized automatically. Here it is part of my install process > with pip: > > Running setup.py install for carray > * Found Cython 0.17.1 package installed. > * Found numpy 1.7.0b2 package installed. > * Found numexpr 2.0.1 package installed. > cythoning carray/carrayExtension.pyx to carray/carrayExtension.c > building 'carray.carrayExtension' extension > C compiler: gcc -fno-strict-aliasing > -I/Users/faltet/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 > -Wall -Wstrict-prototypes > > Hmm, perhaps you need a newer version of Cython? > > > > > > > Two more notes: > > > > * a way was added to check in-disk (compressed) vs in-memory > > (uncompressed) node sizes. I was unable to find the way to use it > > either from the 2.4.0 release notes or from the git issue > > https://github.com/PyTables/PyTables/issues/141#issuecomment-5018763 > > You already found the answer. > > > > > * is/will it be possible to load PyTables carrays as in-memory > carrays > > without decompression? > > Actually, that has been my idea from the very beginning. The > concept of > 'flavor' for the returned objects when reading is already there, so it > should be relatively easy to add a new 'carray' flavor. Maybe you can > contribute this? > > -- > Francesc Alted > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add > services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > <mailto:Pyt...@li...> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. 
Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > > > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users -- Francesc Alted |
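Spelled out, that hand-cythonizing route, run from an unpacked carray source tree, looks like the following (standard Cython and distutils commands; nothing carray-specific is assumed beyond the file name already quoted above):

    $ cython carray/carrayExtension.pyx     # regenerates carray/carrayExtension.c
    $ python setup.py build_ext --inplace
    $ python setup.py install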
From: Alvaro T. C. <al...@mi...> - 2012-12-07 16:30:31
|
I have now similar dependencies as you, except for Numpy 1.7 beta 2. I wish I could help with the carray flavor. -- Running setup.py install for carray * Found Cython 0.17.2 package installed. * Found numpy 1.6.2 package installed. * Found numexpr 2.0.1 package installed. building 'carray.carrayExtension' extension C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC compile options: '-Iblosc -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c' extra options: '-msse2' gcc: blosc/blosclz.c gcc: carray/carrayExtension.c gcc: error: carray/carrayExtension.c: No such file or directory gcc: fatal error: no input files compilation terminated. gcc: error: carray/carrayExtension.c: No such file or directory gcc: fatal error: no input files compilation terminated. error: Command "gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Iblosc -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c carray/carrayExtension.c -o build/temp.linux-x86_64-2.7/carray/carrayExtension.o -msse2" failed with exit status 4 -á. On 7 December 2012 12:47, Francesc Alted <fa...@gm...> wrote: > On 12/6/12 1:42 PM, Alvaro Tejero Cantero wrote: > > Thank you for the comprehensive round-up. I have some ideas and > > reports below. > > > > What about ctables? The documentation says that it is specificly > > column-access optimized, which is what I need in this scenario > > (sometimes sequential, sometimes random). > > Yes, ctables is optimized for column access. > > > > > Unfortunately I could not get the rootdir parameter for ctables > > __init__ to work in carray 0.4 and pip-installing 0.5 or 0.5.1 leads > > to compilation errors. > > Yep, persistence for carray/ctables objects was added in 0.5. > > > > > This is the ctables-to-disk error: > > > > ct2 = ca.ctable((np.arange(30000000),), names=('range2',), > > rootdir='/tmp/ctable2.ctable') > > > --------------------------------------------------------------------------- > > TypeError Traceback (most recent call > last) > > /home/tejero/Dropbox/O/nb/nonridge/<ipython-input-29-255842877a0b> > in<module>() > > ----> 1 ct2= ca.ctable((np.arange(30000000),), names=('range2',), > rootdir='/tmp/ctable2.ctable') > > > > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc > in__init__(self, cols, names, **kwargs) > > 158 if column.dtype== np.void: > > 159 raise ValueError, "`cols` elements > cannot be of type void" > > --> 160 column= ca.carray(column, **kwargs) > > 161 elif ratype: > > 162 column= ca.carray(cols[name], **kwargs) > > > > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so > incarray.carrayExtension.carray.__cinit__ (carray/carrayExtension.c:3917)() > > > > TypeError: __cinit__() got an unexpected keyword argument 'rootdir' > > > > > > And this is cut from the pip output when trying to upgrade carray. 
> > > > gcc: carray/carrayExtension.c > > > > gcc: error: carray/carrayExtension.c: No such file or directory > > Hmm, that's strange, because the carrayExtension should have been > cythonized automatically. Here it is part of my install process with pip: > > Running setup.py install for carray > * Found Cython 0.17.1 package installed. > * Found numpy 1.7.0b2 package installed. > * Found numexpr 2.0.1 package installed. > cythoning carray/carrayExtension.pyx to carray/carrayExtension.c > building 'carray.carrayExtension' extension > C compiler: gcc -fno-strict-aliasing > -I/Users/faltet/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 > -Wall -Wstrict-prototypes > > Hmm, perhaps you need a newer version of Cython? > > > > > > > Two more notes: > > > > * a way was added to check in-disk (compressed) vs in-memory > > (uncompressed) node sizes. I was unable to find the way to use it > > either from the 2.4.0 release notes or from the git issue > > https://github.com/PyTables/PyTables/issues/141#issuecomment-5018763 > > You already found the answer. > > > > > * is/will it be possible to load PyTables carrays as in-memory carrays > > without decompression? > > Actually, that has been my idea from the very beginning. The concept of > 'flavor' for the returned objects when reading is already there, so it > should be relatively easy to add a new 'carray' flavor. Maybe you can > contribute this? > > -- > Francesc Alted > > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
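Another route that sidesteps whatever is losing the generated .c file during the pip build is installing from a source checkout, so the cythoning step runs in a tree you control. A sketch, with the repository location assumed:

    $ git clone git://github.com/FrancescAlted/carray.git
    $ cd carray
    $ python setup.py build_ext --inplace   # should print "cythoning carray/carrayExtension.pyx ..."
    $ python setup.py install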
From: Francesc A. <fa...@gm...> - 2012-12-07 12:47:12
|
On 12/6/12 1:42 PM, Alvaro Tejero Cantero wrote: > Thank you for the comprehensive round-up. I have some ideas and > reports below. > > What about ctables? The documentation says that it is specificly > column-access optimized, which is what I need in this scenario > (sometimes sequential, sometimes random). Yes, ctables is optimized for column access. > > Unfortunately I could not get the rootdir parameter for ctables > __init__ to work in carray 0.4 and pip-installing 0.5 or 0.5.1 leads > to compilation errors. Yep, persistence for carray/ctables objects was added in 0.5. > > This is the ctables-to-disk error: > > ct2 = ca.ctable((np.arange(30000000),), names=('range2',), > rootdir='/tmp/ctable2.ctable') > --------------------------------------------------------------------------- > TypeError Traceback (most recent call last) > /home/tejero/Dropbox/O/nb/nonridge/<ipython-input-29-255842877a0b> in<module>() > ----> 1 ct2= ca.ctable((np.arange(30000000),), names=('range2',), rootdir='/tmp/ctable2.ctable') > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in__init__(self, cols, names, **kwargs) > 158 if column.dtype== np.void: > 159 raise ValueError, "`cols` elements cannot be of type void" > --> 160 column= ca.carray(column, **kwargs) > 161 elif ratype: > 162 column= ca.carray(cols[name], **kwargs) > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so incarray.carrayExtension.carray.__cinit__ (carray/carrayExtension.c:3917)() > > TypeError: __cinit__() got an unexpected keyword argument 'rootdir' > > > And this is cut from the pip output when trying to upgrade carray. > > gcc: carray/carrayExtension.c > > gcc: error: carray/carrayExtension.c: No such file or directory Hmm, that's strange, because the carrayExtension should have been cythonized automatically. Here it is part of my install process with pip: Running setup.py install for carray * Found Cython 0.17.1 package installed. * Found numpy 1.7.0b2 package installed. * Found numexpr 2.0.1 package installed. cythoning carray/carrayExtension.pyx to carray/carrayExtension.c building 'carray.carrayExtension' extension C compiler: gcc -fno-strict-aliasing -I/Users/faltet/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes Hmm, perhaps you need a newer version of Cython? > > > Two more notes: > > * a way was added to check in-disk (compressed) vs in-memory > (uncompressed) node sizes. I was unable to find the way to use it > either from the 2.4.0 release notes or from the git issue > https://github.com/PyTables/PyTables/issues/141#issuecomment-5018763 You already found the answer. > > * is/will it be possible to load PyTables carrays as in-memory carrays > without decompression? Actually, that has been my idea from the very beginning. The concept of 'flavor' for the returned objects when reading is already there, so it should be relatively easy to add a new 'carray' flavor. Maybe you can contribute this? -- Francesc Alted |
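For readers following along: with carray 0.5 or later, the failing call quoted above is expected to work. A minimal round-trip sketch (path hypothetical; the names attribute is assumed from the ctable metadata shown earlier in the thread):

    import numpy as np
    import carray as ca

    # create a persistent ctable on disk ...
    ct = ca.ctable((np.arange(30000000),), names=('range2',),
                   rootdir='/tmp/ctable2.ctable', mode='w')
    ct.flush()

    # ... and reopen it later without loading everything into memory
    ct2 = ca.open(rootdir='/tmp/ctable2.ctable', mode='r')
    print len(ct2), ct2.names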
From: Alvaro T. C. <al...@mi...> - 2012-12-06 18:30:02
|
I'll answer myself on the size-checking: the right attributes are Leaf.size_in_memory and Leaf.size_on_disk (per http://pytables.github.com/usersguide/libref/hierarchy_classes.html) -á. On 6 December 2012 12:42, Alvaro Tejero Cantero <al...@mi...> wrote: > Thank you for the comprehensive round-up. I have some ideas and reports > below. > > What about ctables? The documentation says that it is specificly > column-access optimized, which is what I need in this scenario (sometimes > sequential, sometimes random). > > Unfortunately I could not get the rootdir parameter for ctables __init__ > to work in carray 0.4 and pip-installing 0.5 or 0.5.1 leads to compilation > errors. > > This is the ctables-to-disk error: > > ct2 = ca.ctable((np.arange(30000000),), names=('range2',), > rootdir='/tmp/ctable2.ctable') > > ---------------------------------------------------------------------------TypeError Traceback (most recent call last)/home/tejero/Dropbox/O/nb/nonridge/<ipython-input-29-255842877a0b> in <module>()----> 1 ct2 = ca.ctable((np.arange(30000000),), names=('range2',), rootdir='/tmp/ctable2.ctable') > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in __init__(self, cols, names, **kwargs) 158 if column.dtype == np.void: 159 raise ValueError, "`cols` elements cannot be of type void"--> 160 column = ca.carray(column, **kwargs) 161 elif ratype: 162 column = ca.carray(cols[name], **kwargs) > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so in carray.carrayExtension.carray.__cinit__ (carray/carrayExtension.c:3917)() > TypeError: __cinit__() got an unexpected keyword argument 'rootdir' > > > > And this is cut from the pip output when trying to upgrade carray. > > gcc: carray/carrayExtension.c > > gcc: error: carray/carrayExtension.c: No such file or directory > > > > Two more notes: > > * a way was added to check in-disk (compressed) vs in-memory > (uncompressed) node sizes. I was unable to find the way to use it either > from the 2.4.0 release notes or from the git issue > https://github.com/PyTables/PyTables/issues/141#issuecomment-5018763 > > * is/will it be possible to load PyTables carrays as in-memory carrays > without decompression? > > Best, > > Álvaro > > > > On 6 December 2012 11:49, Francesc Alted <fa...@gm...> wrote: > >> completeness, let's see how fast can perform >> carray (the package, n >> > > |
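Put to use, the comparison is a one-liner per leaf once the file is open; a quick sketch in Python 2 syntax (file and node names hypothetical):

    import tables as tb

    with tb.openFile('test.h5', 'r') as f:
        leaf = f.root.act                     # any compressed Leaf node
        print 'on disk  : %d bytes' % leaf.size_on_disk
        print 'in memory: %d bytes' % leaf.size_in_memory
        print 'ratio    : %.2f' % (leaf.size_in_memory / float(leaf.size_on_disk))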
From: Alan M. <al...@al...> - 2012-12-06 14:55:34
|
Josh,

Thanks for the detailed response. I would like to avoid going through a
separate process if at all possible due to the performance penalty. I have
also tried your last suggestion to create a dedicated pytables thread and
send everything through that, but I still see the same problem (RuntimeError
in flush). This leads me to believe something strange is going on behind the
scenes.

Updated test program with a dedicated pytables thread reading an input
Queue.Queue:

import tables
import threading
import random
import time
import Queue

# a simple table
class TableValue(tables.IsDescription):
    a = tables.Int64Col(pos=1)
    b = tables.UInt32Col(pos=2)

class TablesThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.name = 'HDF5 io thread'
        # create the dummy HDF5 file
        self.h5 = None
        self.h5 = tables.openFile('/data/test.h5', mode='w')
        self.num_groups = 5
        self.num_tables = 5
        self.groups = [self.h5.createGroup('/', "group%d" % i)
                       for i in range(self.num_groups)]
        self.tables = []
        for group in self.groups:
            tbls = [self.h5.createTable(group, 'table%d' % i, TableValue)
                    for i in range(self.num_tables)]
            self.tables.append(tbls)
            for table in tbls:
                # add an index for good measure
                table.cols.a.createIndex()
        self.stopEvt = threading.Event()
        self.stoppedEvt = threading.Event()
        self.inputQ = Queue.Queue()

    def run(self):
        try:
            while not self.stopEvt.is_set():
                # get a command
                try:
                    cmd, args, result = self.inputQ.get(timeout=0.5)
                except Queue.Empty:
                    # poll stopEvt so we can shutdown
                    continue
                # do the command
                if cmd == 'write':
                    x = self.tables[args[0]][args[1]].row
                    x['a'] = args[2]
                    x['b'] = args[3]
                    x.append()
                elif cmd == 'read':
                    self.h5.flush()
                    table = self.tables[args[0]][args[1]]
                    result.value = table.readWhere('a > %d' % (args[2]))
                else:
                    raise Exception("Command not supported: %s" % (cmd,))
                # signal that the result is ready
                result.event.set()
        finally:
            # shutdown
            self.h5.close()
            self.stoppedEvt.set()

    def stop(self):
        if not self.stoppedEvt.is_set():
            self.stopEvt.set()
            self.stoppedEvt.wait()

class ResultEvent():
    def __init__(self):
        self.event = threading.Event()
        self.value = None

class Test():
    def __init__(self):
        self.tables = TablesThread()
        self.tables.start()
        self.timeout = 5
        self.stats = {'read': 0, 'write': 0, 'read_error': 0, 'write_error': 0}

    def write(self):
        r = ResultEvent()
        self.tables.inputQ.put(('write',
                                (random.randint(0, self.tables.num_groups - 1),
                                 random.randint(0, self.tables.num_tables - 1),
                                 random.randint(0, 100),
                                 random.randint(0, 100)), r))
        r.event.wait(timeout=self.timeout)
        if r.event.is_set():
            self.stats['write'] += 1
        else:
            self.stats['write_error'] += 1

    def read(self):
        r = ResultEvent()
        self.tables.inputQ.put(('read',
                                (random.randint(0, self.tables.num_groups - 1),
                                 random.randint(0, self.tables.num_tables - 1),
                                 random.randint(0, 100)), r))
        r.event.wait(timeout=self.timeout)
        if r.event.is_set():
            self.stats['read'] += 1
            #print 'Query got %d hits' % (len(r.value))
        else:
            self.stats['read_error'] += 1

    def close(self):
        self.tables.stop()

    def __del__(self):
        self.close()

class Worker(threading.Thread):
    def __init__(self, method):
        threading.Thread.__init__(self)
        self.method = method
        self.stopEvt = threading.Event()

    def run(self):
        while not self.stopEvt.is_set():
            try:
                self.method()
            except Exception, x:
                print 'Worker thread failed with: %s' % (x,)
            time.sleep(random.random() / 100.0)

    def stop(self):
        self.stopEvt.set()

def main():
    t = Test()
    threads = [Worker(t.write) for _i in range(10)]
    threads.extend([Worker(t.read) for _i in range(10)])
    for thread in threads:
        thread.start()
    time.sleep(5)
    for thread in threads:
        thread.stop()
    for thread in threads:
        thread.join()
    t.close()
    print t.stats

if __name__ == "__main__":
    main()

On Wed, Dec 5, 2012 at 10:52 PM, Josh Ayers <jos...@gm...> wrote:
> Alan,
>
> Unfortunately, the underlying HDF5 library isn't thread-safe by default.
> It can be built in a thread-safe mode that serializes all API calls, but
> still doesn't allow actual parallel access to the disk. See [1] for more
> details. Here's [2] another interesting discussion concerning whether
> multithreaded access is actually beneficial for an I/O limited library
> like HDF5. Ultimately, if one thread can read at the disk's maximum
> transfer rate, then multiple threads don't provide any benefit.
>
> Beyond the limitations of HDF5, PyTables also maintains global state in
> various module-level variables. One example is the _open_file cache in
> the file.py module. I made an attempt in the past to work around this to
> allow read-only access from multiple threads, but didn't make much
> progress.
>
> In general, I think your best bet is to serialize all access through a
> single process. There is another example in the PyTables/examples
> directory that benchmarks different methods of transferring data from
> PyTables to another process [3]. It compares Python's
> multiprocessing.Queue, sockets, and memory-mapped files. In my testing,
> the latter two are 5-10x faster than using a queue.
>
> Another option would be to use multiple threads, but handle all access to
> the HDF5 file in one thread. PyTables will release the GIL when making
> HDF5 library calls, so the other threads will be able to run. You could
> use a Queue.Queue or some other mechanism to transfer data between
> threads. No actual copying would be needed since their memory is shared,
> which should make it faster than the multi-process techniques.
>
> Hope that helps.
>
> Josh Ayers
>
> [1]: http://www.hdfgroup.org/hdf5-quest.html#mthread
>
> [2]: https://visitbugs.ornl.gov/projects/8/wiki/Multi-threaded_cores_and_HPC-HDF5
>
> [3]: https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_benchmarks.py
>
> On Wed, Dec 5, 2012 at 2:24 PM, Alan Marchiori <al...@al...> wrote:
>> I am trying to allow multiple threads read/write access to pytables data
>> and found it is necessary to call flush() before any read. If not, the
>> latest data is not returned. However, this can cause a RuntimeError. I
>> have tried protecting pytables access with both locks and queues as done
>> by joshayers
>> (https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py).
>> In either case I still get RuntimeError: dictionary changed size during
>> iteration when doing the flush. (Incidentally, using the locks appears
>> to be much faster than using queues in my unscientific tests...)
>>
>> I have tried versions 2.4 and 2.3.1 with the same results. Interestingly
>> this only appears to happen if there are multiple tables/groups in the
>> H5 file. To investigate this behavior further I created a test program
>> to illustrate (below). When run with num_groups = 5 and num_tables = 5
>> (or greater) I see the runtime error every time. When these values are
>> smaller than this it doesn't (at least in a short test period).
>>
>> I might be doing something unexpected with pytables, but this seems
>> pretty straightforward to me. Any help is appreciated.
|
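For comparison with the queue design above, a minimal sketch of the lock-based variant Alan mentions trying: one module-level lock guards every PyTables call, and the flush happens in the same critical section as the read so no append can interleave with it (all names here are illustrative, reusing the table layout from the test program):

    import threading

    h5_lock = threading.Lock()   # single lock for *all* PyTables access

    def locked_write(table, a, b):
        with h5_lock:
            row = table.row
            row['a'] = a
            row['b'] = b
            row.append()

    def locked_read(h5, table, threshold):
        with h5_lock:
            h5.flush()   # make pending appends visible to the query
            return table.readWhere('a > %d' % threshold)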
From: Alvaro T. C. <al...@mi...> - 2012-12-06 12:42:57
|
Thank you for the comprehensive round-up. I have some ideas and reports below.

What about ctables? The documentation says that it is specifically
column-access optimized, which is what I need in this scenario (sometimes
sequential, sometimes random).

Unfortunately I could not get the rootdir parameter for ctables __init__ to
work in carray 0.4, and pip-installing 0.5 or 0.5.1 leads to compilation
errors.

This is the ctables-to-disk error:

ct2 = ca.ctable((np.arange(30000000),), names=('range2',),
                rootdir='/tmp/ctable2.ctable')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/tejero/Dropbox/O/nb/nonridge/<ipython-input-29-255842877a0b> in <module>()
----> 1 ct2 = ca.ctable((np.arange(30000000),), names=('range2',), rootdir='/tmp/ctable2.ctable')

/home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in __init__(self, cols, names, **kwargs)
    158             if column.dtype == np.void:
    159                 raise ValueError, "`cols` elements cannot be of type void"
--> 160             column = ca.carray(column, **kwargs)
    161         elif ratype:
    162             column = ca.carray(cols[name], **kwargs)

/home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so in carray.carrayExtension.carray.__cinit__ (carray/carrayExtension.c:3917)()

TypeError: __cinit__() got an unexpected keyword argument 'rootdir'

And this is cut from the pip output when trying to upgrade carray:

gcc: carray/carrayExtension.c
gcc: error: carray/carrayExtension.c: No such file or directory

Two more notes:

* a way was added to check on-disk (compressed) vs in-memory (uncompressed)
node sizes. I was unable to find the way to use it either from the 2.4.0
release notes or from the git issue
https://github.com/PyTables/PyTables/issues/141#issuecomment-5018763

* is/will it be possible to load PyTables carrays as in-memory carrays
without decompression?

Best,

Álvaro

On 6 December 2012 11:49, Francesc Alted <fa...@gm...> wrote:
> completeness, let's see how fast can perform
> carray (the package, n
|
From: Francesc A. <fa...@gm...> - 2012-12-06 11:49:26
|
On 12/5/12 7:55 PM, Alvaro Tejero Cantero wrote: > My system was benched for reads and writes with Blosc[1]: > > with pt.openFile(paths.braw(block), 'r') as handle: > pt.setBloscMaxThreads(1) > %timeit a = handle.root.raw.c042[:] > pt.setBloscMaxThreads(6) > %timeit a = handle.root.raw.c042[:] > pt.setBloscMaxThreads(11) > %timeit a = handle.root.raw.c042[:] > print handle.root.raw._v_attrs.FILTERS > print handle.root.raw.c042.__sizeof__() > print handle.root.raw.c042 > > gives > > 1 loops, best of 3: 483 ms per loop > 1 loops, best of 3: 782 ms per loop > 1 loops, best of 3: 663 ms per loop > Filters(complevel=5, complib='blosc', shuffle=True, fletcher32=False) > 104 > /raw/c042 (CArray(303390000,), shuffle, blosc(5)) '' > > I can't understand what is going on, for the life of me. These > datasets use int16 atoms and at Blosc complevel=5 used to compress by > a factor of about 2. Even for such low compression ratios there should > be huge differences between single- and multi-threaded reads. > > Do you have any clue? Yeah, welcome to the wonderful art of fine tuning. Fortunately we have a machine which is pretty identical to yours (hey, your computer was too good in Blosc benchmarks so as to ignore it :), so I can reproduce your issue: In [3]: a = ((np.random.rand(3e8))*100).astype('i2') In [4]: f = tb.openFile("test.h5", "w") In [5]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape, filters=tb.Filters(5, complib="blosc")) In [6]: act[:] = a In [7]: f.flush() In [8]: ll test.h5 -rw-rw-r-- 1 faltet 301719914 Dec 6 04:55 test.h5 This random set of numbers is close to your array in size (~3e8 elements), and also has a similar compression factor (~2x). Now the timings (using 6 cores by default): In [9]: timeit act[:] 1 loops, best of 3: 441 ms per loop In [11]: tb.setBloscMaxThreads(1) Out[11]: 6 In [12]: timeit act[:] 1 loops, best of 3: 347 ms per loop So yeah, that might seem a bit disappointing. It turns out that the default chunksize for PyTables is tuned so as to balance among sequential and random reads. If what you want is to optimize only for sequential reads (apparently this is what you are after, right?), then it normally helps to increase the chunksize. For example, by doing some quick trials, I determined that a chunksize of 2 MB is pretty optimal for sequential access: In [44]: f.removeNode(f.root.act) In [45]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape, filters=tb.Filters(5, complib="blosc"), chunkshape=(2**20,)) In [46]: act[:] = a In [47]: tb.setBloscMaxThreads(1) Out[47]: 6 In [48]: timeit act[:] 1 loops, best of 3: 334 ms per loop In [49]: tb.setBloscMaxThreads(3) Out[49]: 1 In [50]: timeit act[:] 1 loops, best of 3: 298 ms per loop In [51]: tb.setBloscMaxThreads(6) Out[51]: 3 In [52]: timeit act[:] 1 loops, best of 3: 303 ms per loop Also, we see here that the sweet point is using 3 threads, not more (don't ask why). 
However, that does not mean that Blosc is not able to work faster on this machine, and in fact it does: In [59]: import blosc In [60]: sa = a.tostring() In [61]: ac2 = blosc.compress(sa, 2, clevel=5) In [62]: blosc.set_nthreads(6) Out[62]: 6 In [64]: timeit a2 = blosc.decompress(ac2) 10 loops, best of 3: 80.7 ms per loop In [65]: blosc.set_nthreads(1) Out[65]: 6 In [66]: timeit a2 = blosc.decompress(ac2) 1 loops, best of 3: 249 ms per loop So that means that a pure Blosc compression in-memory can only go 4x faster than PyTables + Blosc, and in this is case the latter is reaching an excellent mark of 2 GB/s, which is really good for a read from disk operation. Note how a memcpy() operation in this machine is just about as good as this: In [36]: timeit a.copy() 1 loops, best of 3: 294 ms per loop Now that I'm on this, I'm curious on how other compressors would perform for this scenario: In [6]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape, filters=tb.Filters(5, complib="lzo"), chunkshape=(2**20,)) In [7]: act[:] = a In [8]: f.flush() In [9]: ll test.h5 # compression ratio very close to Blosc -rw-rw-r-- 1 faltet 302769510 Dec 6 05:23 test.h5 In [10]: timeit act[:] 1 loops, best of 3: 1.13 s per loop so, the time for LZO is more than 3x slower than Blosc. And a similar thing with zlib: In [12]: f.close() In [13]: f = tb.openFile("test.h5", "w") In [14]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape, filters=tb.Filters(1, complib="zlib"), chunkshape=(2**20,)) In [15]: act[:] = a In [16]: f.flush() In [17]: ll test.h5 # the compression rate is somewhat better -rw-rw-r-- 1 faltet 254821296 Dec 6 05:26 test.h5 In [18]: timeit act[:] 1 loops, best of 3: 2.24 s per loop which is 6x slower than Blosc (although the compression ratio is a bit better). And just for matter of completeness, let's see how fast can perform carray (the package, not the CArray object in PyTables) for a chunked array in-memory: In [19]: import carray as ca In [20]: ac3 = ca.carray(a, chunklen=2**20, cparams=ca.cparams(5)) In [21]: ac3 Out[21]: carray((300000000,), int16) nbytes: 572.20 MB; cbytes: 289.56 MB; ratio: 1.98 cparams := cparams(clevel=5, shuffle=True) [59 34 36 ..., 21 58 50] In [22]: timeit ac3[:] 1 loops, best of 3: 254 ms per loop In [23]: ca.set_nthreads(1) Out[23]: 6 In [24]: timeit ac3[:] 1 loops, best of 3: 282 ms per loop So, with 254 ms, it is only marginally faster than PyTables (~298 ms). Now with a carray object on-disk: In [27]: acd = ca.carray(a, chunklen=2**20, cparams=ca.cparams(5), rootdir="test") In [28]: acd Out[28]: carray((300000000,), int16) nbytes: 572.20 MB; cbytes: 289.56 MB; ratio: 1.98 cparams := cparams(clevel=5, shuffle=True) rootdir := 'test' [59 34 36 ..., 21 58 50] In [30]: ca.set_nthreads(6) Out[30]: 1 In [31]: timeit acd[:] 1 loops, best of 3: 317 ms per loop In [32]: ca.set_nthreads(1) Out[32]: 6 In [33]: timeit acd[:] 1 loops, best of 3: 361 ms per loop The times in this case are a bit larger than with PyTables (317ms vs 298ms), which speaks a lot how efficiently is implemented I/O in HDF5/PyTables stack. -- Francesc Alted |
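To reproduce this kind of tuning on other hardware, the session above can be condensed into a small script; a sketch along the same lines (array size and candidate chunk lengths are illustrative):

    import time
    import numpy as np
    import tables as tb

    a = (np.random.rand(3e7) * 100).astype('i2')
    for chunklen in (2**16, 2**18, 2**20, 2**22):
        f = tb.openFile('bench.h5', 'w')
        act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
                             filters=tb.Filters(5, complib='blosc'),
                             chunkshape=(chunklen,))
        act[:] = a
        f.flush()
        t0 = time.time()
        act[:]                                # sequential read of the whole array
        print chunklen, round(time.time() - t0, 3), 's'
        f.close()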