Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4

The change was in pure Python code, so you should be able to just paste in
the changes to your local copy.  Start with the table.Column.__iter__
method (lines 3296-3310) here.

https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py

It needs to be modified slightly because it uses some additional features
that aren't available in the released version (the out=buf_slice argument
to table.read).  The following should work.

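    def __iter__(self):
        """Iterate over the column, reading one I/O buffer of rows at a time."""
        table = self.table
        itemsize = self.dtype.itemsize
        # Rows per read; at least 1 so the xrange step is never zero.
        nrowsinbuf = max(1, table._v_file.params['IO_BUFFER_SIZE'] // itemsize)
        max_row = len(self)
        for start_row in xrange(0, max_row, nrowsinbuf):
            end_row = min(start_row + nrowsinbuf, max_row)
            # Read only this column for rows [start_row, end_row).
            buf = table.read(start_row, end_row, 1, field=self.pathname)
            for row in buf:
                yield row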

I haven't tested this, but I think it will work.
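
To see the buffering logic on its own, here is a minimal sketch of the same
chunked loop without PyTables.  It is written for Python 3 (range instead of
xrange).  The names iter_in_chunks and read(start, stop) are hypothetical and
not part of PyTables; read stands in for table.read restricted to one column.

```python
def iter_in_chunks(read, nrows, chunk_rows):
    """Yield items one at a time, fetching them chunk_rows at a time.

    read(start, stop) is a hypothetical callable returning the items in
    the half-open range [start, stop).
    """
    chunk_rows = max(1, chunk_rows)  # guard against a zero step
    for start in range(0, nrows, chunk_rows):
        stop = min(start + chunk_rows, nrows)
        for item in read(start, stop):
            yield item


# Stand-in data source: a plain list, plus a log of the reads issued.
values = list(range(10))
calls = []


def read(start, stop):
    calls.append((start, stop))
    return values[start:stop]


result = list(iter_in_chunks(read, len(values), 4))
print(result)  # the ten values, in order
print(calls)   # one entry per chunked read
```

The point of the design is that the number of reads scales with the number
of chunks rather than the number of rows.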

Josh

On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:

> I apologize if I'm starting to sound helpless, but I'm forced to work on
> Windows 7 at work and have never had luck compiling Python source
> successfully.  I have had to rely on precompiled binaries and now it's
> biting me in the butt.
>
> Is there any quick fix I can do to improve this iteration using v2.4.0?
>
>
> On Thu, Jan 3, 2013 at 3:17 PM, <
> pyt...@li...> wrote:
>
>> Send Pytables-users mailing list submissions to
>>         pyt...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>> or, via email, send a message with subject or body 'help' to
>>         pyt...@li...
>>
>> You can reach the person managing the list at
>>         pyt...@li...
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Pytables-users digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Thu, 3 Jan 2013 13:44:29 -0500
>> From: David Reed <dav...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>> To: pyt...@li...
>> Message-ID:
>>         <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=
>> ev...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Thanks Anthony, but unless I'm missing something I don't think that method
>> will work, since it only compares the ith element with the (i+1)th
>> element.  I still need 2 for loops, right?
>>
>> Using itertools might speed things up though, I've never used them so I
>> will give it a shot and let you know how it goes.  Looks like I need to
>> download the latest release before I do that too.  Thanks for the help.
>>
>> -Dave
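
The difference between the two approaches can be checked on a small list.
This is only a sketch in Python 3 (zip instead of izip), with a made-up
four-item list in place of the table: two iterators offset by one meet only
neighbouring elements, while itertools.combinations visits every unordered
pair once.

```python
from itertools import combinations

items = ['a', 'b', 'c', 'd']  # made-up stand-in for the table rows

# Two iterators over the same data, the first advanced by one element.
it_i = iter(items)
it_j = iter(items)
next(it_i)  # throw the first value away
adjacent = list(zip(it_i, it_j))

# Every unordered pair of distinct elements, each exactly once.
all_pairs = list(combinations(items, 2))

n = len(items)
print(len(adjacent))   # N - 1 pairs
print(len(all_pairs))  # N(N-1)/2 pairs
```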
>>
>>
>>
>> On Thu, Jan 3, 2013 at 12:12 PM, <
>> pyt...@li...> wrote:
>>
>> > Today's Topics:
>> >
>> >    1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)
>> >
>> >
>> > ----------------------------------------------------------------------
>> >
>> > Message: 1
>> > Date: Thu, 3 Jan 2013 11:11:47 -0600
>> > From: Anthony Scopatz <sc...@gm...>
>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> > To: Discussion list for PyTables
>> >         <pyt...@li...>
>> > Message-ID:
>> >         <CAPk-6T5b=
>> > 1EG...@ma...>
>> > Content-Type: text/plain; charset="iso-8859-1"
>> >
>> > Hi David,
>> >
>> > Tables and table column iteration have been overhauled fairly recently
>> > [1].  So you might try creating two iterators, offset by one, and then
>> > doing the comparison.  I am hacking this out super quick so please
>> > forgive me:
>> >
>> > import tables as tb
>> > from itertools import izip
>> >
>> > with tb.openFile(...) as f:
>> >     data = f.root.data
>> >     data_i = iter(data)
>> >     data_j = iter(data)
>> >     data_i.next() # throw the first value away
>> >     for i, j in izip(data_i, data_j):
>> >         compare(i, j)
>> >
>> > You get the idea ;)
>> >
>> > Be Well
>> > Anthony
>> >
>> > 1. https://github.com/PyTables/PyTables/issues/27
>> >
>> >
>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>> >
>> > > I was hoping someone could help me out here.
>> > >
>> > > This is from a post I put up on StackOverflow,
>> > >
>> > > I have a fairly large dataset that I store in HDF5 and access using
>> > > PyTables. One operation I need to do on this dataset is a pairwise
>> > > comparison between each of the elements. This requires 2 loops, one to
>> > > iterate over each element, and an inner loop to iterate over every
>> > > other element. This operation thus looks at N(N-1)/2 comparisons.
>> > >
>> > > For fairly small sets I found it to be faster to dump the contents into
>> > > a multidimensional numpy array and then do my iteration. I run into
>> > > problems with large sets because of memory issues and need to access
>> > > each element of the dataset at run time.
>> > >
>> > > Putting the elements into an array gives me about 600 comparisons per
>> > > second, while operating on hdf5 data itself gives me about 300
>> > > comparisons per second.
>> > >
>> > > Is there a way to speed this process up?
>> > >
>> > > Example follows (this is not my real code, just an example):
>> > >
>> > > *Small Set*:
>> > >
>> > >
>> > > with tb.openFile(h5_file, 'r') as f:
>> > >     data = f.root.data
>> > >
>> > >     N_elements = len(data)
>> > >     elements = np.empty((N_elements, int(1e5)))
>> > >
>> > >     # Copy each row's 'element' field into memory.
>> > >     for ii, d in enumerate(data):
>> > >         elements[ii] = d['element']
>> > >
>> > > D = np.empty((N_elements, N_elements))
>> > > for ii in xrange(N_elements):
>> > >     for jj in xrange(ii+1, N_elements):
>> > >         D[ii, jj] = compare(elements[ii], elements[jj])
>> > >
>> > >  *Large Set*:
>> > >
>> > >
>> > > with tb.openFile(h5_file, 'r') as f:
>> > >     data = f.root.data
>> > >
>> > >     N_elements = len(data)
>> > >
>> > >     D = np.empty((N_elements, N_elements))
>> > >     for ii in xrange(N_elements):
>> > >         for jj in xrange(ii+1, N_elements):
>> > >             D[ii, jj] = compare(data['element'][ii],
>> > >                                 data['element'][jj])
>> > >
>> > >
>> > >
>> > >
>> >
>> ------------------------------------------------------------------------------
>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills
>> current
>> > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
>> > > MVPs and experts. ON SALE this month only -- learn more at:
>> > > http://p.sf.net/sfu/learnmore_122712
>> > > _______________________________________________
>> > > Pytables-users mailing list
>> > > Pyt...@li...
>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users
>> > >
>> > >
>> Message: 2
>> Date: Thu, 3 Jan 2013 15:17:01 -0500
>> From: David Reed <dav...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>> To: pyt...@li...
>> Message-ID:
>>         <
>> CAM...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Thanks a lot for the help so far guys!
>>
>> Looking at itertools, I found what I believe to be the perfect function
>> for what I need, itertools.combinations. This appears to be a valid
>> replacement for the method proposed.
>>
>> One small problem I didn't mention is that my compare function actually
>> takes as inputs 2 columns from the table. Like so:
>>
>> D = np.empty((N_elements, N_elements))
>> for ii in xrange(N_elements):
>>     for jj in xrange(ii+1, N_elements):
>>         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>>                             data['element2'][ii], data['element2'][jj])
>>
>> Is there an efficient way of using itertools with this structure?
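
One possible way to do this, as a sketch rather than a tested recipe: zip the
two columns together, number the rows with enumerate, and feed the result to
itertools.combinations.  Assumptions: Python 3; compare() below is a
hypothetical stand-in for the real function; plain lists stand in for the
two table columns and for the NumPy array D.  Note that
itertools.combinations stores its whole input in memory before yielding
pairs, so this only suits data that fits in RAM.

```python
from itertools import combinations


def compare(a1, b1, a2, b2):
    # Hypothetical stand-in for the real comparison function.
    return abs(a1 - b1) + abs(a2 - b2)


def pairwise_matrix(col1, col2):
    """Fill the upper triangle of D with compare() over all row pairs."""
    # Each entry is (row index, (value from col1, value from col2)).
    rows = list(enumerate(zip(col1, col2)))
    n = len(rows)
    D = [[0.0] * n for _ in range(n)]
    for (ii, (a1, a2)), (jj, (b1, b2)) in combinations(rows, 2):
        # combinations preserves input order, so ii < jj always holds.
        D[ii][jj] = compare(a1, b1, a2, b2)
    return D


element1 = [1.0, 4.0, 6.0]  # made-up column values
element2 = [0.0, 2.0, 5.0]
D = pairwise_matrix(element1, element2)
print(D)
```

The argument order matches the nested-loop version above: both values of the
first column, then both values of the second.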
>>
>>
>> On Thu, Jan 3, 2013 at 1:29 PM, <
>> pyt...@li...> wrote:
>>
>> > Today's Topics:
>> >
>> >    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
>> >
>> >
>> > ----------------------------------------------------------------------
>> >
>> > Message: 1
>> > Date: Thu, 3 Jan 2013 10:29:33 -0800
>> > From: Josh Ayers <jos...@gm...>
>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> > To: Discussion list for PyTables
>> >         <pyt...@li...>
>> > Message-ID:
>> >         <
>> > CAC...@ma...>
>> > Content-Type: text/plain; charset="iso-8859-1"
>> >
>> > David,
>> >
>> > The change in issue 27 was only for iteration over a tables.Column
>> > instance.  To use it, tweak Anthony's code as follows.  This will
>> > iterate over the "element" column, as in your original example.
>> >
>> > Note also that this will only work with the development version of
>> > PyTables available on github.  It will be very slow using the released
>> > v2.4.0.
>> >
>> >
>> > import tables as tb
>> > from itertools import izip
>> >
>> > with tb.openFile(...) as f:
>> >     data = f.root.data.cols.element
>> >     data_i = iter(data)
>> >     data_j = iter(data)
>> >     data_i.next() # throw the first value away
>> >     for i, j in izip(data_i, data_j):
>> >         compare(i, j)
>> >
>> >
>> > Hope that helps,
>> > Josh
>> >
>> >
>> >