From: David R. <dav...@gm...> - 2013-01-04 13:56:58
I can't thank you guys enough for the help. I was able to add the __iter__ method to the table.py file, and everything seems to be working great! It's not quite as fast as iterating right off a matrix, but it's pretty close: I was at 555 comparisons per second, and now I'm at 420. I handled the problem I mentioned earlier (compare() taking two columns) like this, and it seems to work great:

    import numpy as np
    from itertools import combinations, izip

    # f is the open PyTables file; compare() is my own comparison function
    A = f.root.data.cols.A
    B = f.root.data.cols.B
    D = np.empty((len(A), len(A)))
    # Each item is (a, b, row index); combinations() yields every pair once, with ii < jj
    for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
        D[ii, jj] = compare(a1, a2, b1, b2)

Again, thanks a lot.

-Dave

On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:

> Message: 1
> Date: Thu, 3 Jan 2013 17:26:55 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>
> On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
>
> > Thanks a lot for the help so far guys!
> >
> > Looking at itertools, I found what I believe to be the perfect function
> > for what I need, itertools.combinations. This appears to be a valid
> > replacement to the method proposed.
>
> Yes, combinations is awesome!
>
> > There is a small problem I didn't mention: my compare function
> > actually takes as inputs 2 columns from the table. Like so:
> >
> >     D = np.empty((N_elements, N_elements))
> >     for ii in xrange(N_elements):
> >         for jj in xrange(ii+1, N_elements):
> >             D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> >                                 data['element2'][ii], data['element2'][jj])
> >
> > Is there an efficient way of using itertools with this structure?
>
> You can always make two other iterators for each column. Since you have
> two columns, you would have 4 iterators. I am not sure how fast this is
> going to be, but I am confident that there is definitely a way to do this
> in one for-loop, which is going to be way faster than nested loops.
>
> Be Well
> Anthony
>
> > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> >
> >> Message: 1
> >> Date: Thu, 3 Jan 2013 10:29:33 -0800
> >> From: Josh Ayers <jos...@gm...>
> >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >>
> >> David,
> >>
> >> The change in issue 27 was only for iteration over a tables.Column
> >> instance. To use it, tweak Anthony's code as follows. This will iterate
> >> over the "element" column, as in your original example.
> >>
> >> Note also that this will only work with the development version of
> >> PyTables available on GitHub. It will be very slow using the released
> >> v2.4.0.
> >>
> >>     from itertools import izip
> >>
> >>     with tb.openFile(...) as f:
> >>         data = f.root.data.cols.element
> >>         data_i = iter(data)
> >>         data_j = iter(data)
> >>         data_i.next()  # throw the first value away
> >>         for i, j in izip(data_i, data_j):
> >>             compare(i, j)
> >>
> >> Hope that helps,
> >> Josh
> >>
> >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
> >>
> >> > Hi David,
> >> >
> >> > Tables and table column iteration have been overhauled fairly recently
> >> > [1]. So you might try creating two iterators, offset by one, and then
> >> > doing the comparison. I am hacking this out super quick, so please
> >> > forgive me:
> >> >
> >> >     from itertools import izip
> >> >
> >> >     with tb.openFile(...) as f:
> >> >         data = f.root.data
> >> >         data_i = iter(data)
> >> >         data_j = iter(data)
> >> >         data_i.next()  # throw the first value away
> >> >         for i, j in izip(data_i, data_j):
> >> >             compare(i, j)
> >> >
> >> > You get the idea ;)
> >> >
> >> > Be Well
> >> > Anthony
> >> >
> >> > 1. https://github.com/PyTables/PyTables/issues/27
> >> >
> >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >> >
> >> >> I was hoping someone could help me out here.
> >> >>
> >> >> This is from a post I put up on StackOverflow:
> >> >>
> >> >> I have a fairly large dataset that I store in HDF5 and access using
> >> >> PyTables. One operation I need to perform on this dataset is a pairwise
> >> >> comparison between each of the elements. This requires 2 loops: one to
> >> >> iterate over each element, and an inner loop to iterate over every
> >> >> other element. This operation thus performs N(N-1)/2 comparisons.
> >> >>
> >> >> For fairly small sets I found it to be faster to dump the contents into
> >> >> a multidimensional numpy array and then do my iteration. With large
> >> >> sets I run into memory problems, so I need to access each element of
> >> >> the dataset at run time.
> >> >>
> >> >> Putting the elements into an array gives me about 600 comparisons per
> >> >> second, while operating on the HDF5 data itself gives me about 300
> >> >> comparisons per second.
> >> >>
> >> >> Is there a way to speed this process up?
> >> >>
> >> >> Example follows (this is not my real code, just an example):
> >> >>
> >> >> *Small Set*:
> >> >>
> >> >>     with tb.openFile(h5_file, 'r') as f:
> >> >>         data = f.root.data
> >> >>
> >> >>         N_elements = len(data)
> >> >>         elements = np.empty((N_elements, 100000))
> >> >>
> >> >>         # Copy each row's 'element' field into the in-memory array first
> >> >>         for ii, d in enumerate(data):
> >> >>             elements[ii] = d['element']
> >> >>
> >> >>         D = np.empty((N_elements, N_elements))
> >> >>         for ii in xrange(N_elements):
> >> >>             for jj in xrange(ii+1, N_elements):
> >> >>                 D[ii, jj] = compare(elements[ii], elements[jj])
> >> >>
> >> >> *Large Set*:
> >> >>
> >> >>     with tb.openFile(h5_file, 'r') as f:
> >> >>         data = f.root.data
> >> >>
> >> >>         N_elements = len(data)
> >> >>
> >> >>         D = np.empty((N_elements, N_elements))
> >> >>         for ii in xrange(N_elements):
> >> >>             for jj in xrange(ii+1, N_elements):
> >> >>                 D[ii, jj] = compare(data['element'][ii], data['element'][jj])
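The offset-by-one iterators in Anthony's and Josh's snippets pair each row only with its neighbor (a limitation David also raises in this thread), whereas itertools.combinations visits every pair. Below is a minimal sketch of the difference that runs without PyTables. It is written for Python 3 (built-in zip and next() replace izip and .next()), plain lists stand in for the columns A and B, and compare() is a hypothetical placeholder for the real comparison function.

```python
from itertools import combinations


def compare(a1, a2, b1, b2):
    # Hypothetical stand-in for the real compare(); any function of the two rows works.
    return abs(a1 - a2) + abs(b1 - b2)


def offset_by_one_pairs(col):
    # Two iterators over the same column, the first advanced by one element.
    # Zipping them pairs each element only with its predecessor: N - 1 pairs.
    data_i = iter(col)
    data_j = iter(col)
    next(data_i)  # throw the first value away
    return list(zip(data_i, data_j))


def pairwise_upper_triangle(col_a, col_b):
    # Tag each row with its index, then let combinations() yield every
    # unordered pair once with ii < jj: N(N-1)/2 pairs in a single loop.
    # Entries on and below the diagonal stay None (never filled).
    n = len(col_a)
    D = [[None] * n for _ in range(n)]
    rows = zip(col_a, col_b, range(n))
    for (a1, b1, ii), (a2, b2, jj) in combinations(rows, 2):
        D[ii][jj] = compare(a1, a2, b1, b2)
    return D


if __name__ == "__main__":
    A = [1.0, 4.0, 9.0, 16.0]
    B = [0.0, 1.0, 0.0, 1.0]
    print(offset_by_one_pairs(A))
    D = pairwise_upper_triangle(A, B)
    print(sum(v is not None for row in D for v in row))
```

One caveat, from the itertools documentation rather than this thread: combinations() reads its entire input into a tuple before yielding anything, so the zipped column values are held in memory for the whole pass. It removes the nested Python loops, not the memory cost.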
>
> Message: 2
> Date: Thu, 3 Jan 2013 17:30:59 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
>
> Josh is right that you can just edit the code by hand (which works but
> sucks).
>
> However, on Windows -- on the rare occasion when I also have to develop on
> it -- I typically use a distribution that already includes a compiler,
> Cython, HDF5, and PyTables, and then I install my development version from
> GitHub OVER it. I recommend either EPD or Anaconda, though other
> distributions listed here [1] might also work.
>
> Be well
> Anthony
>
> 1. http://numfocus.org/projects-2/software-distributions/
>
> On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
>
> > The change was in pure Python code, so you should be able to just paste
> > the changes into your local copy. Start with the table.Column.__iter__
> > method (lines 3296-3310) here:
> >
> > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
> >
> > It needs to be modified slightly because it uses some additional features
> > that aren't available in the released version (the out=buf_slice argument
> > to table.read). The following should work:
> >
> >     def __iter__(self):
> >         table = self.table
> >         itemsize = self.dtype.itemsize
> >         # Number of rows of this column that fit in one I/O buffer
> >         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> >         max_row = len(self)
> >         for start_row in xrange(0, max_row, nrowsinbuf):
> >             end_row = min(start_row + nrowsinbuf, max_row)
> >             # Read one buffer's worth of this column, then yield it row by row
> >             buf = table.read(start_row, end_row, 1, field=self.pathname)
> >             for row in buf:
> >                 yield row
> >
> > I haven't tested this, but I think it will work.
> >
> > Josh
> >
> > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
> >
> >> I apologize if I'm starting to sound helpless, but I'm forced to work on
> >> Windows 7 at work and have never had luck compiling Python source
> >> successfully. I have had to rely on precompiled binaries, and now it's
> >> biting me in the butt.
> >>
> >> Is there any quick fix I can do to improve this iteration using v2.4.0?
> >>
> >> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
> >>
> >>> Message: 1
> >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
> >>> From: David Reed <dav...@gm...>
> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
> >>>
> >>> Thanks Anthony, but unless I'm missing something, I don't think that
> >>> method will work, since it will only compare the ith element with the
> >>> (i+1)th element. I still need 2 for loops, right?
> >>>
> >>> Using itertools might speed things up, though. I've never used them, so
> >>> I will give it a shot and let you know how it goes.
Looks like I need to > >>> download the latest release before I do that too. Thanks for the help. > >>> > >>> -Dave > >>> > >>> > >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, < > >>> pyt...@li...> wrote: > >>> > >>> > Send Pytables-users mailing list submissions to > >>> > pyt...@li... > >>> > > >>> > To subscribe or unsubscribe via the World Wide Web, visit > >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users > >>> > or, via email, send a message with subject or body 'help' to > >>> > pyt...@li... > >>> > > >>> > You can reach the person managing the list at > >>> > pyt...@li... > >>> > > >>> > When replying, please edit your Subject line so it is more specific > >>> > than "Re: Contents of Pytables-users digest..." > >>> > > >>> > > >>> > Today's Topics: > >>> > > >>> > 1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz) > >>> > > >>> > > >>> > > ---------------------------------------------------------------------- > >>> > > >>> > Message: 1 > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 > >>> > From: Anthony Scopatz <sc...@gm...> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables > >>> > To: Discussion list for PyTables > >>> > <pyt...@li...> > >>> > Message-ID: > >>> > <CAPk-6T5b= > >>> > 1EG...@ma...> > >>> > Content-Type: text/plain; charset="iso-8859-1" > >>> > > >>> > HI David, > >>> > > >>> > Tables and table column iteration have been overhauled fairly > recently > >>> [1]. > >>> > So you might try creating two iterators, offset by one, and then > >>> doing the > >>> > comparison. I am hacking this out super quick so please forgive me: > >>> > > >>> > from itertools import izip > >>> > > >>> > with tb.openFile(...) 
as f: > >>> > data = f.root.data > >>> > data_i = iter(data) > >>> > data_j = iter(data) > >>> > data_i.next() # throw the first value away > >>> > for i, j in izip(data_i, data_j): > >>> > compare(i, j) > >>> > > >>> > You get the idea ;) > >>> > > >>> > Be Well > >>> > Anthony > >>> > > >>> > 1. https://github.com/PyTables/PyTables/issues/27 > >>> > > >>> > > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> > >>> wrote: > >>> > > >>> > > I was hoping someone could help me out here. > >>> > > > >>> > > This is from a post I put up on StackOverflow, > >>> > > > >>> > > I am have a fairly large dataset that I store in HDF5 and access > >>> using > >>> > > PyTables. One operation I need to do on this dataset are pairwise > >>> > > comparisons between each of the elements. This requires 2 loops, > one > >>> to > >>> > > iterate over each element, and an inner loop to iterate over every > >>> other > >>> > > element. This operation thus looks at N(N-1)/2 comparisons. > >>> > > > >>> > > For fairly small sets I found it to be faster to dump the contents > >>> into a > >>> > > multdimensional numpy array and then do my iteration. I run into > >>> problems > >>> > > with large sets because of memory issues and need to access each > >>> element > >>> > of > >>> > > the dataset at run time. > >>> > > > >>> > > Putting the elements into an array gives me about 600 comparisons > per > >>> > > second, while operating on hdf5 data itself gives me about 300 > >>> > comparisons > >>> > > per second. > >>> > > > >>> > > Is there a way to speed this process up? 
> >>> > > > >>> > > Example follows (this is not my real code, just an example): > >>> > > > >>> > > *Small Set*: > >>> > > > >>> > > > >>> > > with tb.openFile(h5_file, 'r') as f: > >>> > > data = f.root.data > >>> > > > >>> > > N_elements = len(data) > >>> > > elements = np.empty((N_irises, 1e5)) > >>> > > > >>> > > for ii, d in enumerate(data): > >>> > > elements[ii] = data['element'] > >>> > > > >>> > > D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): > >>> > > for jj in xrange(ii+1, N_elements): > >>> > > D[ii, jj] = compare(elements[ii], elements[jj]) > >>> > > > >>> > > *Large Set*: > >>> > > > >>> > > > >>> > > with tb.openFile(h5_file, 'r') as f: > >>> > > data = f.root.data > >>> > > > >>> > > N_elements = len(data) > >>> > > > >>> > > D = np.empty((N_irises, N_irises)) > >>> > > for ii in xrange(N_elements): > >>> > > for jj in xrange(ii+1, N_elements): > >>> > > D[ii, jj] = compare(data['element'][ii], > >>> > data['element'][jj]) > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > >>> > ------------------------------------------------------------------------------ > >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > CSS, > >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > >>> current > >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > >>> > > MVPs and experts. ON SALE this month only -- learn more at: > >>> > > http://p.sf.net/sfu/learnmore_122712 > >>> > > _______________________________________________ > >>> > > Pytables-users mailing list > >>> > > Pyt...@li... > >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > >>> > > > >>> > > > >>> > -------------- next part -------------- > >>> > An HTML attachment was scrubbed... 
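Josh's patched Column.__iter__ is a generator that reads the column one buffer at a time and yields the values individually, so memory use is bounded by the buffer size rather than by the column length. The same chunked-read pattern can be sketched in Python 3 without PyTables. Here read_rows is a hypothetical callable standing in for table.read(start, stop, 1, field=...), and rows_per_buf is passed in directly instead of being computed from IO_BUFFER_SIZE // itemsize.

```python
def iter_buffered(read_rows, nrows, rows_per_buf):
    """Yield rows 0..nrows-1 one at a time, fetching them in blocks.

    read_rows(start, stop) is a hypothetical stand-in for a block read such as
    table.read(start, stop, 1, field=...); it must return rows start..stop-1.
    """
    for start in range(0, nrows, rows_per_buf):
        stop = min(start + rows_per_buf, nrows)
        buf = read_rows(start, stop)  # one block read per buffer
        for row in buf:
            yield row


if __name__ == "__main__":
    column = list(range(10))
    reads = []

    def read_rows(start, stop):
        # Record each block request so the number of reads is visible.
        reads.append((start, stop))
        return column[start:stop]

    print(list(iter_buffered(read_rows, len(column), 4)))
    print(reads)
```

Each call to iter_buffered() returns a fresh generator with its own position, which is why two iterators over the same column, as in the offset-by-one snippets, can advance independently.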
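The throughput figures in this thread translate directly into run time, because a full pass performs N(N-1)/2 comparisons. Below is a minimal sketch of that arithmetic, assuming a constant comparison rate. N = 1000 is a hypothetical size chosen for illustration, not one given in the thread. At that size there are 499,500 pairs, so 555 comparisons per second takes 900 s and 420 per second takes about 1,189 s.

```python
def n_pairs(n):
    """Number of unordered pairs among n elements: N(N-1)/2."""
    return n * (n - 1) // 2


def estimated_seconds(n, comparisons_per_second):
    """Time for one full pairwise pass, assuming a constant comparison rate."""
    return n_pairs(n) / comparisons_per_second


if __name__ == "__main__":
    n = 1000  # hypothetical dataset size, for illustration only
    for rate in (600, 555, 420, 300):  # comparison rates quoted in the thread
        print(rate, estimated_seconds(n, rate))
```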