From: Josh A. <jos...@gm...> - 2013-01-03 21:46:22
The change was in pure Python code, so you should be able to just paste the
changes into your local copy. Start with the table.Column.__iter__ method
(lines 3296-3310) here:

https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py

It needs to be modified slightly because it uses some additional features
that aren't available in the released version (the out=buf_slice argument to
table.read). The following should work.

    def __iter__(self):
        table = self.table
        itemsize = self.dtype.itemsize
        # Number of rows that fit in one I/O buffer.
        nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
        max_row = len(self)
        for start_row in xrange(0, len(self), nrowsinbuf):
            end_row = min([start_row + nrowsinbuf, max_row])
            # Read one buffer's worth of this column, then yield row by row.
            buf = table.read(start_row, end_row, 1, field=self.pathname)
            for row in buf:
                yield row

I haven't tested this, but I think it will work.

Josh

On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:

> I apologize if I'm starting to sound helpless, but I'm forced to work on
> Windows 7 at work and have never had luck compiling Python source
> successfully. I have had to rely on precompiled binaries and now it's
> biting me in the butt.
>
> Is there any quick fix I can do to improve this iteration using v2.4.0?
>
> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
>
>> Send Pytables-users mailing list submissions to
>>     pyt...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>     https://lists.sourceforge.net/lists/listinfo/pytables-users
>> or, via email, send a message with subject or body 'help' to
>>     pyt...@li...
>>
>> You can reach the person managing the list at
>>     pyt...@li...
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Pytables-users digest..."
>>
>> Today's Topics:
>>
>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Thu, 3 Jan 2013 13:44:29 -0500
>> From: David Reed <dav...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>> To: pyt...@li...
>> Message-ID:
>>     <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=ev...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Thanks Anthony, but unless I'm missing something I don't think that method
>> will work, since it will only compare the ith element with the (i+1)th
>> element. I still need 2 for loops, right?
>>
>> Using itertools might speed things up though. I've never used them, so I
>> will give it a shot and let you know how it goes. Looks like I need to
>> download the latest release before I do that too. Thanks for the help.
>>
>> -Dave
>>
>> On Thu, Jan 3, 2013 at 12:12 PM, <pyt...@li...> wrote:
>>
>> > Today's Topics:
>> >
>> >    1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)
>> >
>> > ----------------------------------------------------------------------
>> >
>> > Message: 1
>> > Date: Thu, 3 Jan 2013 11:11:47 -0600
>> > From: Anthony Scopatz <sc...@gm...>
>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> > To: Discussion list for PyTables <pyt...@li...>
>> > Message-ID: <CAPk-6T5b=1EG...@ma...>
>> > Content-Type: text/plain; charset="iso-8859-1"
>> >
>> > Hi David,
>> >
>> > Tables and table column iteration have been overhauled fairly recently [1].
>> > So you might try creating two iterators, offset by one, and then doing the
>> > comparison. I am hacking this out super quick so please forgive me:
>> >
>> >     from itertools import izip
>> >
>> >     with tb.openFile(...) as f:
>> >         data = f.root.data
>> >         data_i = iter(data)
>> >         data_j = iter(data)
>> >         data_i.next()  # throw the first value away
>> >         for i, j in izip(data_i, data_j):
>> >             compare(i, j)
>> >
>> > You get the idea ;)
>> >
>> > Be Well
>> > Anthony
>> >
>> > 1. https://github.com/PyTables/PyTables/issues/27
>> >
>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>> >
>> > > I was hoping someone could help me out here.
>> > >
>> > > This is from a post I put up on StackOverflow:
>> > >
>> > > I have a fairly large dataset that I store in HDF5 and access using
>> > > PyTables. One operation I need to do on this dataset is a pairwise
>> > > comparison between each of the elements. This requires 2 loops, one to
>> > > iterate over each element, and an inner loop to iterate over every other
>> > > element. This operation thus looks at N(N-1)/2 comparisons.
>> > >
>> > > For fairly small sets I found it to be faster to dump the contents into a
>> > > multidimensional numpy array and then do my iteration. I run into problems
>> > > with large sets because of memory issues and need to access each element
>> > > of the dataset at run time.
>> > >
>> > > Putting the elements into an array gives me about 600 comparisons per
>> > > second, while operating on the HDF5 data itself gives me about 300
>> > > comparisons per second.
>> > >
>> > > Is there a way to speed this process up?
>> > >
>> > > Example follows (this is not my real code, just an example):
>> > >
>> > > *Small Set*:
>> > >
>> > >     with tb.openFile(h5_file, 'r') as f:
>> > >         data = f.root.data
>> > >
>> > >         N_elements = len(data)
>> > >         elements = np.empty((N_elements, 1e5))
>> > >
>> > >         for ii, d in enumerate(data):
>> > >             elements[ii] = d['element']
>> > >
>> > >     D = np.empty((N_elements, N_elements))
>> > >     for ii in xrange(N_elements):
>> > >         for jj in xrange(ii+1, N_elements):
>> > >             D[ii, jj] = compare(elements[ii], elements[jj])
>> > >
>> > > *Large Set*:
>> > >
>> > >     with tb.openFile(h5_file, 'r') as f:
>> > >         data = f.root.data
>> > >
>> > >         N_elements = len(data)
>> > >
>> > >         D = np.empty((N_elements, N_elements))
>> > >         for ii in xrange(N_elements):
>> > >             for jj in xrange(ii+1, N_elements):
>> > >                 D[ii, jj] = compare(data['element'][ii],
>> > >                                     data['element'][jj])
>> > >
>> > > _______________________________________________
>> > > Pytables-users mailing list
>> > > Pyt...@li...
>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users
>> >
>> > End of Pytables-users Digest, Vol 80, Issue 2
>> > *********************************************
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Thu, 3 Jan 2013 15:17:01 -0500
>> From: David Reed <dav...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>> To: pyt...@li...
>> Message-ID: <CAM...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Thanks a lot for the help so far guys!
>>
>> Looking at itertools, I found what I believe to be the perfect function for
>> what I need, itertools.combinations. This appears to be a valid replacement
>> for the method proposed.
>>
>> There is a small problem that I didn't mention: my compare function
>> actually takes as inputs 2 columns from the table.
>> Like so:
>>
>>     D = np.empty((N_elements, N_elements))
>>     for ii in xrange(N_elements):
>>         for jj in xrange(ii+1, N_elements):
>>             D[ii, jj] = compare(data['element1'][ii],
>>                                 data['element1'][jj],
>>                                 data['element2'][ii],
>>                                 data['element2'][jj])
>>
>> Is there an efficient way of using itertools with this structure?
>>
>> On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>>
>> > Today's Topics:
>> >
>> >    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
>> >
>> > ----------------------------------------------------------------------
>> >
>> > Message: 1
>> > Date: Thu, 3 Jan 2013 10:29:33 -0800
>> > From: Josh Ayers <jos...@gm...>
>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> > To: Discussion list for PyTables <pyt...@li...>
>> > Message-ID: <CAC...@ma...>
>> > Content-Type: text/plain; charset="iso-8859-1"
>> >
>> > David,
>> >
>> > The change in issue 27 was only for iteration over a tables.Column
>> > instance. To use it, tweak Anthony's code as follows. This will iterate
>> > over the "element" column, as in your original example.
>> >
>> > Note also that this will only work with the development version of
>> > PyTables available on github. It will be very slow using the released
>> > v2.4.0.
>> >
>> >     from itertools import izip
>> >
>> >     with tb.openFile(...) as f:
>> >         data = f.root.data.cols.element
>> >         data_i = iter(data)
>> >         data_j = iter(data)
>> >         data_i.next()  # throw the first value away
>> >         for i, j in izip(data_i, data_j):
>> >             compare(i, j)
>> >
>> > Hope that helps,
>> > Josh
>> >
>> > End of Pytables-users Digest, Vol 80, Issue 3
>> > *********************************************
>>
>> End of Pytables-users Digest, Vol 80, Issue 4
>> *********************************************
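For the all-pairs case discussed in the quoted messages, here is a rough, untested-against-PyTables sketch of the same buffered-read idea applied to both loops. It assumes a hypothetical read(start, stop) callable standing in for something like the table.read(start_row, end_row, 1, field=...) call in the __iter__ above; the names iter_blocks and pairwise_blocked are made up for illustration, and the code is written for Python 3 (range rather than xrange). One caveat worth knowing: itertools.combinations copies its whole input into a tuple first, so calling it directly on a full column iterator would not save memory, which is why it is only applied to one block at a time here.

```python
from itertools import combinations, product


def iter_blocks(read, n_rows, block_size):
    """Yield (start_row, block) pairs, reading block_size rows at a time."""
    for start in range(0, n_rows, block_size):
        stop = min(start + block_size, n_rows)
        yield start, read(start, stop)


def pairwise_blocked(read, n_rows, block_size):
    """Yield (ii, jj, row_ii, row_jj) for every ii < jj.

    Only two blocks are held in memory at any time. `read` is a
    hypothetical stand-in for a buffered column read.
    """
    for start_i, block_i in iter_blocks(read, n_rows, block_size):
        # Pairs that fall inside the same block.
        for (a, row_a), (b, row_b) in combinations(enumerate(block_i), 2):
            yield start_i + a, start_i + b, row_a, row_b
        # Pairs between this block and every later block.
        for start_j in range(start_i + block_size, n_rows, block_size):
            block_j = read(start_j, min(start_j + block_size, n_rows))
            for (a, row_a), (b, row_b) in product(enumerate(block_i),
                                                  enumerate(block_j)):
                yield start_i + a, start_j + b, row_a, row_b
```

With a table, one would presumably pass something like lambda start, stop: table.read(start, stop, 1, field='element') as read and fill D[ii, jj] = compare(row_ii, row_jj) inside the loop. For a compare function that takes two columns, read could return rows containing both fields instead. Larger blocks mean fewer reads at the cost of more memory.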