From: Anthony S. <sc...@gm...> - 2013-01-04 23:15:50
Glad that this worked for you David!

On Fri, Jan 4, 2013 at 7:56 AM, David Reed <dav...@gm...> wrote:

> I can't thank you guys enough for the help. I was able to add the
> __iter__ function to the table.py file and everything seems to be working
> great! I'm not quite as fast as I was when iterating right off a matrix, but
> pretty close. I was at 555 comparisons per second, and now I'm at 420.
>
> I handled the problem I mentioned earlier by doing this, and it seems to
> work great:
>
>     A = f.root.data.cols.A
>     B = f.root.data.cols.B
>
>     D = np.empty((len(A), len(A)))
>     for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
>         D[ii, jj] = compare(a1, a2, b1, b2)
>
> Again, thanks a lot.
>
> -Dave
>
>
> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
>
>> Send Pytables-users mailing list submissions to
>>         pyt...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>> or, via email, send a message with subject or body 'help' to
>>         pyt...@li...
>>
>> You can reach the person managing the list at
>>         pyt...@li...
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Pytables-users digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
>>    2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Thu, 3 Jan 2013 17:26:55 -0600
>> From: Anthony Scopatz <sc...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>> To: Discussion list for PyTables
>>         <pyt...@li...>
>> Message-ID:
>>         <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=
>> Gz...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...>
>> wrote:
>>
>> > Thanks a lot for the help so far guys!
>> >
>> > Looking at itertools, I found what I believe to be the perfect function
>> > for what I need, itertools.combinations. This appears to be a valid
>> > replacement for the method proposed.
>> >
>>
>> Yes, combinations is awesome!
>>
>> >
>> > There is a small problem that I didn't mention: my compare function
>> > actually takes as inputs 2 columns from the table. Like so:
>> >
>> >     D = np.empty((N_elements, N_elements))
>> >     for ii in xrange(N_elements):
>> >         for jj in xrange(ii+1, N_elements):
>> >             D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>> >                                 data['element2'][ii], data['element2'][jj])
>> >
>> > Is there an efficient way of using itertools with this structure?
>> >
>>
>> You can always make two other iterators for each column. Since you have
>> two columns you would have 4 iterators. I am not sure how fast this is
>> going to be, but I am confident that there is a way to do this in
>> one for-loop, which is going to be way faster than nested loops.
>>
>> Be Well
>> Anthony
>>
>> >
>> > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>> >
>> >> Today's Topics:
>> >>
>> >>    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
>> >>
>> >>
>> >> ----------------------------------------------------------------------
>> >>
>> >> Message: 1
>> >> Date: Thu, 3 Jan 2013 10:29:33 -0800
>> >> From: Josh Ayers <jos...@gm...>
>> >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> >> To: Discussion list for PyTables
>> >>         <pyt...@li...>
>> >> Message-ID:
>> >>         <CAC...@ma...>
>> >> Content-Type: text/plain; charset="iso-8859-1"
>> >>
>> >> David,
>> >>
>> >> The change in issue 27 was only for iteration over a tables.Column
>> >> instance. To use it, tweak Anthony's code as follows. This will iterate
>> >> over the "element" column, as in your original example.
>> >>
>> >> Note also that this will only work with the development version of
>> >> PyTables available on github. It will be very slow using the released
>> >> v2.4.0.
>> >>
>> >>     from itertools import izip
>> >>
>> >>     with tb.openFile(...) as f:
>> >>         data = f.root.data.cols.element
>> >>         data_i = iter(data)
>> >>         data_j = iter(data)
>> >>         data_i.next()  # throw the first value away
>> >>         for i, j in izip(data_i, data_j):
>> >>             compare(i, j)
>> >>
>> >> Hope that helps,
>> >> Josh
>> >>
>> >>
>> >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
>> >>
>> >> > Hi David,
>> >> >
>> >> > Tables and table column iteration have been overhauled fairly recently
>> >> > [1]. So you might try creating two iterators, offset by one, and then
>> >> > doing the comparison. I am hacking this out super quick so please
>> >> > forgive me:
>> >> >
>> >> >     from itertools import izip
>> >> >
>> >> >     with tb.openFile(...) as f:
>> >> >         data = f.root.data
>> >> >         data_i = iter(data)
>> >> >         data_j = iter(data)
>> >> >         data_i.next()  # throw the first value away
>> >> >         for i, j in izip(data_i, data_j):
>> >> >             compare(i, j)
>> >> >
>> >> > You get the idea ;)
>> >> >
>> >> > Be Well
>> >> > Anthony
>> >> >
>> >> > 1. https://github.com/PyTables/PyTables/issues/27
>> >> >
>> >> >
>> >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>> >> >
>> >> >> I was hoping someone could help me out here.
>> >> >>
>> >> >> This is from a post I put up on StackOverflow,
>> >> >>
>> >> >> I have a fairly large dataset that I store in HDF5 and access using
>> >> >> PyTables. One operation I need to do on this dataset is pairwise
>> >> >> comparison between each of the elements. This requires 2 loops, one to
>> >> >> iterate over each element, and an inner loop to iterate over every
>> >> >> other element. This operation thus looks at N(N-1)/2 comparisons.
>> >> >>
>> >> >> For fairly small sets I found it to be faster to dump the contents
>> >> >> into a multidimensional numpy array and then do my iteration. I run
>> >> >> into problems with large sets because of memory issues and need to
>> >> >> access each element of the dataset at run time.
>> >> >>
>> >> >> Putting the elements into an array gives me about 600 comparisons per
>> >> >> second, while operating on hdf5 data itself gives me about 300
>> >> >> comparisons per second.
>> >> >>
>> >> >> Is there a way to speed this process up?
>> >> >>
>> >> >> Example follows (this is not my real code, just an example):
>> >> >>
>> >> >> *Small Set*:
>> >> >>
>> >> >>     with tb.openFile(h5_file, 'r') as f:
>> >> >>         data = f.root.data
>> >> >>
>> >> >>         N_elements = len(data)
>> >> >>         elements = np.empty((N_elements, 100000))
>> >> >>
>> >> >>         for ii, d in enumerate(data):
>> >> >>             elements[ii] = d['element']
>> >> >>
>> >> >>     D = np.empty((N_elements, N_elements))
>> >> >>     for ii in xrange(N_elements):
>> >> >>         for jj in xrange(ii+1, N_elements):
>> >> >>             D[ii, jj] = compare(elements[ii], elements[jj])
>> >> >>
>> >> >> *Large Set*:
>> >> >>
>> >> >>     with tb.openFile(h5_file, 'r') as f:
>> >> >>         data = f.root.data
>> >> >>
>> >> >>         N_elements = len(data)
>> >> >>
>> >> >>         D = np.empty((N_elements, N_elements))
>> >> >>         for ii in xrange(N_elements):
>> >> >>             for jj in xrange(ii+1, N_elements):
>> >> >>                 D[ii, jj] = compare(data['element'][ii],
>> >> >>                                     data['element'][jj])
>> >> >>
>> >> >> _______________________________________________
>> >> >> Pytables-users mailing list
>> >> >> Pyt...@li...
>> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users
>> >> >>
>> >>
>> >> End of Pytables-users Digest, Vol 80, Issue 3
>> >> *********************************************
>> >>
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
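
The single-loop pattern discussed above can be sketched as follows. This is a minimal illustration for Python 3, where the built-in zip replaces itertools.izip. Plain lists stand in for the two PyTables columns and for the NumPy result array, and the names col_a, col_b, compare and pairwise_matrix are hypothetical placeholders, not part of PyTables. One caveat: itertools.combinations copies its input into a tuple before yielding pairs, so all zipped rows are held in memory at once.

```python
from itertools import combinations


def compare(a1, a2, b1, b2):
    """Hypothetical stand-in for the real comparison function."""
    return abs(a1 - a2) + abs(b1 - b2)


def pairwise_matrix(col_a, col_b):
    """Fill the upper triangle of an N x N matrix using one flat loop."""
    n = len(col_a)
    # Nested lists stand in for np.empty((n, n)); unvisited cells stay None.
    result = [[None] * n for _ in range(n)]
    # Attach the row index to each (a, b) pair so the result can be placed.
    rows = ((i, a, b) for i, (a, b) in enumerate(zip(col_a, col_b)))
    for (ii, a1, b1), (jj, a2, b2) in combinations(rows, 2):
        result[ii][jj] = compare(a1, a2, b1, b2)
    return result


D = pairwise_matrix([1, 4, 9], [10, 20, 40])
```

Only cells with ii < jj are written, which matches the nested xrange loops in the original question.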
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Thu, 3 Jan 2013 17:30:59 -0600
>> From: Anthony Scopatz <sc...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
>> To: Discussion list for PyTables
>>         <pyt...@li...>
>> Message-ID:
>>         <CAP...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Josh is right that you can just edit the code by hand (which works but
>> sucks).
>>
>> However, on Windows -- on the rare occasion when I also have to develop on
>> it -- I typically use a distribution that includes a compiler, cython,
>> hdf5, and pytables already and then I install my development version from
>> github OVER this. I recommend either EPD or Anaconda, though other
>> distributions listed here [1] might also work.
>>
>> Be well
>> Anthony
>>
>> 1. http://numfocus.org/projects-2/software-distributions/
>>
>>
>> On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
>>
>> > The change was in pure Python code, so you should be able to just paste
>> > in the changes to your local copy. Start with the table.Column.__iter__
>> > method (lines 3296-3310) here.
>> >
>> > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
>> >
>> > It needs to be modified slightly because it uses some additional
>> > features that aren't available in the released version (the
>> > out=buf_slice argument to table.read). The following should work.
>> >
>> >     def __iter__(self):
>> >         table = self.table
>> >         itemsize = self.dtype.itemsize
>> >         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>> >         max_row = len(self)
>> >         for start_row in xrange(0, len(self), nrowsinbuf):
>> >             end_row = min([start_row + nrowsinbuf, max_row])
>> >             buf = table.read(start_row, end_row, 1, field=self.pathname)
>> >             for row in buf:
>> >                 yield row
>> >
>> >
>> > I haven't tested this, but I think it will work.
>> >
>> > Josh
>> >
>> >
>> > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
>> >
>> >> I apologize if I'm starting to sound helpless, but I'm forced to work on
>> >> Windows 7 at work and have never had luck compiling python source
>> >> successfully. I have had to rely on precompiled binaries and now it's
>> >> biting me in the butt.
>> >>
>> >> Is there any quick fix I can do to improve this iteration using v2.4.0?
>> >>
>> >>
>> >> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
>> >>
>> >>> Today's Topics:
>> >>>
>> >>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
>> >>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
>> >>>
>> >>>
>> >>> ----------------------------------------------------------------------
>> >>>
>> >>> Message: 1
>> >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
>> >>> From: David Reed <dav...@gm...>
>> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>> >>> To: pyt...@li...
>> >>> Message-ID:
>> >>>         <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=
>> >>> ev...@ma...>
>> >>> Content-Type: text/plain; charset="iso-8859-1"
>> >>>
>> >>> Thanks Anthony, but unless I'm missing something I don't think that
>> >>> method will work, since this will only be comparing the ith element
>> >>> with the (i+1)th element. I still need 2 for loops, right?
>> >>>
>> >>> Using itertools might speed things up though, I've never used them so I
>> >>> will give it a shot and let you know how it goes. Looks like I need to
>> >>> download the latest release before I do that too. Thanks for the help.
>> >>>
>> >>> -Dave
>> >>>
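
To make the difference between the two suggestions in this thread concrete: two iterators offset by one visit only neighbouring elements, which is N-1 pairs, while itertools.combinations visits all N(N-1)/2 pairs that the nested loops cover. A small Python 3 sketch, with a made-up list standing in for a table column:

```python
from itertools import combinations

values = [10, 20, 30, 40]  # made-up data standing in for a table column

# Two iterators over the same data, the first advanced by one element.
ahead = iter(values)
behind = iter(values)
next(ahead)  # throw the first value away
adjacent_pairs = list(zip(ahead, behind))

# Every unordered pair, in the same order as the nested ii/jj loops.
all_pairs = list(combinations(values, 2))

n = len(values)
print(len(adjacent_pairs), len(all_pairs), n * (n - 1) // 2)
```

For four elements the offset iterators give three pairs and combinations gives six, which is why the offset approach alone cannot replace the nested loops.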
>> >>> End of Pytables-users Digest, Vol 80, Issue 4
>> >>> *********************************************
>>
>> End of Pytables-users Digest, Vol 80, Issue 8
>> *********************************************
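
The buffered column iterator quoted earlier in this thread reads a block of rows per call and yields them one at a time, so the per-row work is a Python loop over an in-memory buffer rather than a separate read. The same idea can be sketched in Python 3 without PyTables. The names iter_buffered, read_block and rows_per_buffer are hypothetical, and the list slice is only a stand-in for a call such as table.read(start, stop).

```python
def iter_buffered(read_block, n_rows, rows_per_buffer):
    """Yield rows one at a time while fetching them in blocks.

    read_block(start, stop) is a hypothetical callable that returns the
    rows in the half-open range [start, stop).
    """
    for start in range(0, n_rows, rows_per_buffer):
        stop = min(start + rows_per_buffer, n_rows)
        for row in read_block(start, stop):
            yield row


# Made-up data and a reader that records each block request.
data = list(range(10))
requests = []


def read_block(start, stop):
    requests.append((start, stop))
    return data[start:stop]


rows = list(iter_buffered(read_block, len(data), rows_per_buffer=4))
```

Here ten rows are delivered with three block requests; the last block is shorter because the range is clipped to the number of rows.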