From: David R. <dav...@gm...> - 2013-01-30 22:00:53
I think I have to reopen this issue. I have been running fine for a while
using the combinations method from itertools, but have recently run into a
memory error, since I have quadrupled the size of the hdf file.

Here is my code again:

    from itertools import combinations, izip

    with tb.openFile(h5_all, 'r') as f:
        irises = f.root.irises
        templates = f.root.irises.cols.templates
        masks = f.root.irises.cols.masks1
        N_irises = len(irises)
        index = np.ones((20 * 480), np.bool)
        print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
        D = np.empty((N_irises, N_irises))
        for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
            # print ii
            D[ii, jj] = ham_dist(
                t1[8, index],
                t2[:, index],
                m1[8, index],
                m2[:, index],
            )

And here is the error:

    In [10]: get_hd3()
    10669890 Comparisons
    ---------------------------------------------------------------------------
    MemoryError                               Traceback (most recent call last)
    <ipython-input-10-cfb255ce7bd1> in <module>()
    ----> 1 get_hd3()

        118     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
        119     D = np.empty((N_irises, N_irises))
    --> 120     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
        121         # print ii
        122         D[ii, jj] = ham_dist(

    c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
       3274         for start_row in xrange(0, len(self), nrowsinbuf):
       3275             end_row = min([start_row + nrowsinbuf, max_row])
    -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
       3277             for row in buf:
       3278                 yield row

    c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
       1772         (start, stop, step) = self._processRangeRead(start, stop, step)
       1773
    -> 1774         arr = self._read(start, stop, step, field)
       1775         return internal_to_flavor(arr, self.flavor)
       1776

    c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
       1719         if field:
       1720             # Create a container for the results
    -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
       1722         else:
       1723             # Recarray case

    MemoryError:
    > c:\python27\lib\site-packages\tables\table.py(1721)_read()
       1720             # Create a container for the results
    -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
       1722         else:

Also, if you guys see any performance problems in my code, please let me
know.

Thank you so much for the help.

-Dave

On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:

> Send Pytables-users mailing list submissions to
>         pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>         pyt...@li...
>
> You can reach the person managing the list at
>         pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 4 Jan 2013 08:56:28 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
> To: pyt...@li...
> Message-ID:
>         <CAM...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I can't thank you guys enough for the help. I was able to add the __iter__
> function to the table.py file and everything seems to be working great!
> I'm not quite as fast as I was when iterating right off a matrix, but
> pretty close. I was at 555 comparisons per second, and now I'm at 420.
>
> I handled the problem I mentioned earlier by doing this, and it seems to
> work great:
>
>     A = f.root.data.cols.A
>     B = f.root.data.cols.B
>
>     D = np.empty((len(A), len(A)))
>     for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
>         D[ii, jj] = compare(a1, a2, b1, b2)
>
> Again, thanks a lot.
> > -Dave > > > On Thu, Jan 3, 2013 at 6:31 PM, < > pyt...@li...> wrote: > > > Send Pytables-users mailing list submissions to > > pyt...@li... > > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > or, via email, send a message with subject or body 'help' to > > pyt...@li... > > > > You can reach the person managing the list at > > pyt...@li... > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of Pytables-users digest..." > > > > > > Today's Topics: > > > > 1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz) > > 2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 3 Jan 2013 17:26:55 -0600 > > From: Anthony Scopatz <sc...@gm...> > > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3 > > To: Discussion list for PyTables > > <pyt...@li...> > > Message-ID: > > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ= > > Gz...@ma...> > > Content-Type: text/plain; charset="iso-8859-1" > > > > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> > wrote: > > > > > Thanks a lot for the help so far guys! > > > > > > Looking at itertools, I found what I believe to be the perfect function > > > for what I need, itertools.combinations. This appears to be a valid > > > replacement to the method proposed. > > > > > > > Yes, combinations is awesome! > > > > > > > > > > There is a small problem that I didn't mention is that my compare > > function > > > actually takes as inputs 2 columns from the table. 
Like so: > > > > > > D = np.empty((N_irises, N_irises)) > > > for ii in xrange(N_elements): > > > for jj in xrange(ii+1, N_elements): > > > D[ii, jj] = compare(data['element1'][ii], > > data['element1'][jj],data['element2'][ii], > > > data['element2'][jj]) > > > > > > Is there an efficient way of using itertools with this structure? > > > > > > > You can always make two other iterators for each column. Since you have > > two columns you would have 4 iterators. I am not sure how fast this is > > going to be but I am confident that there is definitely a way to do this > in > > one for-loop, which is going to be way faster than nested loops. > > > > Be Well > > Anthony > > > > > > > > > > > > > On Thu, Jan 3, 2013 at 1:29 PM, < > > > pyt...@li...> wrote: > > > > > >> Send Pytables-users mailing list submissions to > > >> pyt...@li... > > >> > > >> To subscribe or unsubscribe via the World Wide Web, visit > > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> or, via email, send a message with subject or body 'help' to > > >> pyt...@li... > > >> > > >> You can reach the person managing the list at > > >> pyt...@li... > > >> > > >> When replying, please edit your Subject line so it is more specific > > >> than "Re: Contents of Pytables-users digest..." > > >> > > >> > > >> Today's Topics: > > >> > > >> 1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers) > > >> > > >> > > >> ---------------------------------------------------------------------- > > >> > > >> Message: 1 > > >> Date: Thu, 3 Jan 2013 10:29:33 -0800 > > >> From: Josh Ayers <jos...@gm...> > > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables > > >> To: Discussion list for PyTables > > >> <pyt...@li...> > > >> Message-ID: > > >> < > > >> CAC...@ma...> > > >> Content-Type: text/plain; charset="iso-8859-1" > > >> > > >> David, > > >> > > >> The change in issue 27 was only for iteration over a tables.Column > > >> instance. 
To use it, tweak Anthony's code as follows. This will > > iterate > > >> over the "element" column, as in your original example. > > >> > > >> Note also that this will only work with the development version of > > >> PyTables > > >> available on github. It will be very slow using the released v2.4.0. > > >> > > >> > > >> from itertools import izip > > >> > > >> with tb.openFile(...) as f: > > >> data = f.root.data.cols.element > > >> data_i = iter(data) > > >> data_j = iter(data) > > >> data_i.next() # throw the first value away > > >> for i, j in izip(data_i, data_j): > > >> compare(i, j) > > >> > > >> > > >> Hope that helps, > > >> Josh > > >> > > >> > > >> > > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> > > >> wrote: > > >> > > >> > HI David, > > >> > > > >> > Tables and table column iteration have been overhauled fairly > recently > > >> > [1]. So you might try creating two iterators, offset by one, and > then > > >> > doing the comparison. I am hacking this out super quick so please > > >> forgive > > >> > me: > > >> > > > >> > from itertools import izip > > >> > > > >> > with tb.openFile(...) as f: > > >> > data = f.root.data > > >> > data_i = iter(data) > > >> > data_j = iter(data) > > >> > data_i.next() # throw the first value away > > >> > for i, j in izip(data_i, data_j): > > >> > compare(i, j) > > >> > > > >> > You get the idea ;) > > >> > > > >> > Be Well > > >> > Anthony > > >> > > > >> > 1. https://github.com/PyTables/PyTables/issues/27 > > >> > > > >> > > > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> > > >> wrote: > > >> > > > >> >> I was hoping someone could help me out here. > > >> >> > > >> >> This is from a post I put up on StackOverflow, > > >> >> > > >> >> I am have a fairly large dataset that I store in HDF5 and access > > using > > >> >> PyTables. One operation I need to do on this dataset are pairwise > > >> >> comparisons between each of the elements. 
This requires 2 loops, > one > > to > > >> >> iterate over each element, and an inner loop to iterate over every > > >> other > > >> >> element. This operation thus looks at N(N-1)/2 comparisons. > > >> >> > > >> >> For fairly small sets I found it to be faster to dump the contents > > >> into a > > >> >> multdimensional numpy array and then do my iteration. I run into > > >> problems > > >> >> with large sets because of memory issues and need to access each > > >> element of > > >> >> the dataset at run time. > > >> >> > > >> >> Putting the elements into an array gives me about 600 comparisons > per > > >> >> second, while operating on hdf5 data itself gives me about 300 > > >> comparisons > > >> >> per second. > > >> >> > > >> >> Is there a way to speed this process up? > > >> >> > > >> >> Example follows (this is not my real code, just an example): > > >> >> > > >> >> *Small Set*: > > >> >> > > >> >> > > >> >> with tb.openFile(h5_file, 'r') as f: > > >> >> data = f.root.data > > >> >> > > >> >> N_elements = len(data) > > >> >> elements = np.empty((N_irises, 1e5)) > > >> >> > > >> >> for ii, d in enumerate(data): > > >> >> elements[ii] = data['element'] > > >> >> > > >> >> D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): > > >> >> for jj in xrange(ii+1, N_elements): > > >> >> D[ii, jj] = compare(elements[ii], elements[jj]) > > >> >> > > >> >> *Large Set*: > > >> >> > > >> >> > > >> >> with tb.openFile(h5_file, 'r') as f: > > >> >> data = f.root.data > > >> >> > > >> >> N_elements = len(data) > > >> >> > > >> >> D = np.empty((N_irises, N_irises)) > > >> >> for ii in xrange(N_elements): > > >> >> for jj in xrange(ii+1, N_elements): > > >> >> D[ii, jj] = compare(data['element'][ii], > > >> data['element'][jj]) > > >> >> > > >> >> > > >> >> > > >> >> > > >> > > > ------------------------------------------------------------------------------ > > >> >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > CSS, > > >> >> MVC, Windows 8 Apps, 
JavaScript and much more. Keep your skills > > current > > >> >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > > >> >> MVPs and experts. ON SALE this month only -- learn more at: > > >> >> http://p.sf.net/sfu/learnmore_122712 > > >> >> _______________________________________________ > > >> >> Pytables-users mailing list > > >> >> Pyt...@li... > > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> >> > > >> >> > > >> > > > >> > > > >> > > > >> > > > ------------------------------------------------------------------------------ > > >> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > CSS, > > >> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > > current > > >> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > > >> > MVPs and experts. ON SALE this month only -- learn more at: > > >> > http://p.sf.net/sfu/learnmore_122712 > > >> > _______________________________________________ > > >> > Pytables-users mailing list > > >> > Pyt...@li... > > >> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> > > > >> > > > >> -------------- next part -------------- > > >> An HTML attachment was scrubbed... > > >> > > >> ------------------------------ > > >> > > >> > > >> > > > ------------------------------------------------------------------------------ > > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, > > >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > current > > >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > > >> MVPs and experts. ON SALE this month only -- learn more at: > > >> http://p.sf.net/sfu/learnmore_122712 > > >> > > >> ------------------------------ > > >> > > >> _______________________________________________ > > >> Pytables-users mailing list > > >> Pyt...@li... 
> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> > > >> > > >> End of Pytables-users Digest, Vol 80, Issue 3 > > >> ********************************************* > > >> > > > > > > > > > > > > > > > ------------------------------------------------------------------------------ > > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, > > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current > > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > > > MVPs and experts. ON SALE this month only -- learn more at: > > > http://p.sf.net/sfu/learnmore_122712 > > > _______________________________________________ > > > Pytables-users mailing list > > > Pyt...@li... > > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > > > > -------------- next part -------------- > > An HTML attachment was scrubbed... > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 3 Jan 2013 17:30:59 -0600 > > From: Anthony Scopatz <sc...@gm...> > > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4 > > To: Discussion list for PyTables > > <pyt...@li...> > > Message-ID: > > < > > CAP...@ma...> > > Content-Type: text/plain; charset="iso-8859-1" > > > > Josh is right that you can just edit the code by hand (which works but > > sucks). > > > > However, on Windows -- on the rare occasion when I also have to develop > on > > it -- I typically use a distribution that includes a compiler, cython, > > hdf5, and pytables already and then I install my development version from > > github OVER this. I recommend either EPD or Anaconda, though other > > distributions listed here [1] might also work. > > > > Be well > > Anthony > > > > 1. 
http://numfocus.org/projects-2/software-distributions/ > > > > > > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote: > > > > > The change was in pure Python code, so you should be able to just paste > > in > > > the changes to your local copy. Start with the table.Column.__iter__ > > > method (lines 3296-3310) here. > > > > > > > > > > > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py > > > > > > It needs to be modified slightly because it uses some additional > features > > > that aren't available in the released version (the out=buf_slice > argument > > > to table.read). The following should work. > > > > > > def __iter__(self): > > > table = self.table > > > itemsize = self.dtype.itemsize > > > nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize > > > max_row = len(self) > > > for start_row in xrange(0, len(self), nrowsinbuf): > > > end_row = min([start_row + nrowsinbuf, max_row]) > > > buf = table.read(start_row, end_row, 1, > field=self.pathname) > > > for row in buf: > > > yield row > > > > > > > > > I haven't tested this, but I think it will work. > > > > > > Josh > > > > > > > > > > > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> > > wrote: > > > > > >> I apologize if I'm starting to sound helpless, but I'm forced to work > on > > >> Windows 7 at work and have never had luck compiling python source > > >> successfully. I have had to rely on precompiled binaries and now its > > >> biting me in the butt. > > >> > > >> Is there any quick fix I can do to improve this iteration using > v2.4.0? > > >> > > >> > > >> On Thu, Jan 3, 2013 at 3:17 PM, < > > >> pyt...@li...> wrote: > > >> > > >>> Send Pytables-users mailing list submissions to > > >>> pyt...@li... 
> > >>> > > >>> To subscribe or unsubscribe via the World Wide Web, visit > > >>> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >>> or, via email, send a message with subject or body 'help' to > > >>> pyt...@li... > > >>> > > >>> You can reach the person managing the list at > > >>> pyt...@li... > > >>> > > >>> When replying, please edit your Subject line so it is more specific > > >>> than "Re: Contents of Pytables-users digest..." > > >>> > > >>> > > >>> Today's Topics: > > >>> > > >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed) > > >>> 2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed) > > >>> > > >>> > > >>> > ---------------------------------------------------------------------- > > >>> > > >>> Message: 1 > > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 > > >>> From: David Reed <dav...@gm...> > > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2 > > >>> To: pyt...@li... > > >>> Message-ID: > > >>> <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha= > > >>> ev...@ma...> > > >>> Content-Type: text/plain; charset="iso-8859-1" > > >>> > > >>> Thanks Anthony, but unless Im missing something I don't think that > > method > > >>> will work since this will only be comparing the ith element with > ith+1 > > >>> element. I still need 2 for loops right? > > >>> > > >>> Using itertools might speed things up though, I've never used them > so I > > >>> will give it a shot and let you know how it goes. Looks like I need > to > > >>> download the latest release before I do that too. Thanks for the > help. > > >>> > > >>> -Dave > > >>> > > >>> > > >>> > > >>> On Thu, Jan 3, 2013 at 12:12 PM, < > > >>> pyt...@li...> wrote: > > >>> > > >>> > Send Pytables-users mailing list submissions to > > >>> > pyt...@li... 
> > >>> > > > >>> > To subscribe or unsubscribe via the World Wide Web, visit > > >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > >>> > or, via email, send a message with subject or body 'help' to > > >>> > pyt...@li... > > >>> > > > >>> > You can reach the person managing the list at > > >>> > pyt...@li... > > >>> > > > >>> > When replying, please edit your Subject line so it is more specific > > >>> > than "Re: Contents of Pytables-users digest..." > > >>> > > > >>> > > > >>> > Today's Topics: > > >>> > > > >>> > 1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz) > > >>> > > > >>> > > > >>> > > > ---------------------------------------------------------------------- > > >>> > > > >>> > Message: 1 > > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 > > >>> > From: Anthony Scopatz <sc...@gm...> > > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using > PyTables > > >>> > To: Discussion list for PyTables > > >>> > <pyt...@li...> > > >>> > Message-ID: > > >>> > <CAPk-6T5b= > > >>> > 1EG...@ma...> > > >>> > Content-Type: text/plain; charset="iso-8859-1" > > >>> > > > >>> > HI David, > > >>> > > > >>> > Tables and table column iteration have been overhauled fairly > > recently > > >>> [1]. > > >>> > So you might try creating two iterators, offset by one, and then > > >>> doing the > > >>> > comparison. I am hacking this out super quick so please forgive > me: > > >>> > > > >>> > from itertools import izip > > >>> > > > >>> > with tb.openFile(...) as f: > > >>> > data = f.root.data > > >>> > data_i = iter(data) > > >>> > data_j = iter(data) > > >>> > data_i.next() # throw the first value away > > >>> > for i, j in izip(data_i, data_j): > > >>> > compare(i, j) > > >>> > > > >>> > You get the idea ;) > > >>> > > > >>> > Be Well > > >>> > Anthony > > >>> > > > >>> > 1. https://github.com/PyTables/PyTables/issues/27 > > >>> > > > >>> > > > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm... 
> > > > >>> wrote: > > >>> > > > >>> > > I was hoping someone could help me out here. > > >>> > > > > >>> > > This is from a post I put up on StackOverflow, > > >>> > > > > >>> > > I am have a fairly large dataset that I store in HDF5 and access > > >>> using > > >>> > > PyTables. One operation I need to do on this dataset are pairwise > > >>> > > comparisons between each of the elements. This requires 2 loops, > > one > > >>> to > > >>> > > iterate over each element, and an inner loop to iterate over > every > > >>> other > > >>> > > element. This operation thus looks at N(N-1)/2 comparisons. > > >>> > > > > >>> > > For fairly small sets I found it to be faster to dump the > contents > > >>> into a > > >>> > > multdimensional numpy array and then do my iteration. I run into > > >>> problems > > >>> > > with large sets because of memory issues and need to access each > > >>> element > > >>> > of > > >>> > > the dataset at run time. > > >>> > > > > >>> > > Putting the elements into an array gives me about 600 comparisons > > per > > >>> > > second, while operating on hdf5 data itself gives me about 300 > > >>> > comparisons > > >>> > > per second. > > >>> > > > > >>> > > Is there a way to speed this process up? 
> > >>> > > > > >>> > > Example follows (this is not my real code, just an example): > > >>> > > > > >>> > > *Small Set*: > > >>> > > > > >>> > > > > >>> > > with tb.openFile(h5_file, 'r') as f: > > >>> > > data = f.root.data > > >>> > > > > >>> > > N_elements = len(data) > > >>> > > elements = np.empty((N_irises, 1e5)) > > >>> > > > > >>> > > for ii, d in enumerate(data): > > >>> > > elements[ii] = data['element'] > > >>> > > > > >>> > > D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): > > >>> > > for jj in xrange(ii+1, N_elements): > > >>> > > D[ii, jj] = compare(elements[ii], elements[jj]) > > >>> > > > > >>> > > *Large Set*: > > >>> > > > > >>> > > > > >>> > > with tb.openFile(h5_file, 'r') as f: > > >>> > > data = f.root.data > > >>> > > > > >>> > > N_elements = len(data) > > >>> > > > > >>> > > D = np.empty((N_irises, N_irises)) > > >>> > > for ii in xrange(N_elements): > > >>> > > for jj in xrange(ii+1, N_elements): > > >>> > > D[ii, jj] = compare(data['element'][ii], > > >>> > data['element'][jj]) > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > >>> > > > ------------------------------------------------------------------------------ > > >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > > CSS, > > >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > > >>> current > > >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by > Microsoft > > >>> > > MVPs and experts. ON SALE this month only -- learn more at: > > >>> > > http://p.sf.net/sfu/learnmore_122712 > > >>> > > _______________________________________________ > > >>> > > Pytables-users mailing list > > >>> > > Pyt...@li... > > >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > >>> > > > > >>> > > > > >>> > -------------- next part -------------- > > >>> > An HTML attachment was scrubbed... 
> > >>> > > > >>> > ------------------------------ > > >>> > > > >>> > > > >>> > > > >>> > > > ------------------------------------------------------------------------------ > > >>> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > CSS, > > >>> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > > current > > >>> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > > >>> > MVPs and experts. ON SALE this month only -- learn more at: > > >>> > http://p.sf.net/sfu/learnmore_122712 > > >>> > > > >>> > ------------------------------ > > >>> > > > >>> > _______________________________________________ > > >>> > Pytables-users mailing list > > >>> > Pyt...@li... > > >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > >>> > > > >>> > > > >>> > End of Pytables-users Digest, Vol 80, Issue 2 > > >>> > ********************************************* > > >>> > > > >>> -------------- next part -------------- > > >>> An HTML attachment was scrubbed... > > >>> > > >>> ------------------------------ > > >>> > > >>> Message: 2 > > >>> Date: Thu, 3 Jan 2013 15:17:01 -0500 > > >>> From: David Reed <dav...@gm...> > > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3 > > >>> To: pyt...@li... > > >>> Message-ID: > > >>> < > > >>> CAM...@ma...> > > >>> Content-Type: text/plain; charset="iso-8859-1" > > >>> > > >>> Thanks a lot for the help so far guys! > > >>> > > >>> Looking at itertools, I found what I believe to be the perfect > function > > >>> for > > >>> what I need, itertools.combinations. This appears to be a valid > > >>> replacement > > >>> to the method proposed. > > >>> > > >>> There is a small problem that I didn't mention is that my compare > > >>> function > > >>> actually takes as inputs 2 columns from the table. 
Like so: > > >>> > > >>> D = np.empty((N_irises, N_irises)) > > >>> for ii in xrange(N_elements): > > >>> for jj in xrange(ii+1, N_elements): > > >>> D[ii, jj] = compare(data['element1'][ii], > > >>> data['element1'][jj],data['element2'][ii], > > >>> data['element2'][jj]) > > >>> > > >>> Is there an efficient way of using itertools with this structure? > > >>> > > >>> > > >>> On Thu, Jan 3, 2013 at 1:29 PM, < > > >>> pyt...@li...> wrote: > > >>> > > >>> > Send Pytables-users mailing list submissions to > > >>> > pyt...@li... > > >>> > > > >>> > To subscribe or unsubscribe via the World Wide Web, visit > > >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > >>> > or, via email, send a message with subject or body 'help' to > > >>> > pyt...@li... > > >>> > > > >>> > You can reach the person managing the list at > > >>> > pyt...@li... > > >>> > > > >>> > When replying, please edit your Subject line so it is more specific > > >>> > than "Re: Contents of Pytables-users digest..." > > >>> > > > >>> > > > >>> > Today's Topics: > > >>> > > > >>> > 1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers) > > >>> > > > >>> > > > >>> > > > ---------------------------------------------------------------------- > > >>> > > > >>> > Message: 1 > > >>> > Date: Thu, 3 Jan 2013 10:29:33 -0800 > > >>> > From: Josh Ayers <jos...@gm...> > > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using > PyTables > > >>> > To: Discussion list for PyTables > > >>> > <pyt...@li...> > > >>> > Message-ID: > > >>> > < > > >>> > CAC...@ma... > > > > >>> > Content-Type: text/plain; charset="iso-8859-1" > > >>> > > > >>> > David, > > >>> > > > >>> > The change in issue 27 was only for iteration over a tables.Column > > >>> > instance. To use it, tweak Anthony's code as follows. This will > > >>> iterate > > >>> > over the "element" column, as in your original example. 
> > >>> > > > >>> > Note also that this will only work with the development version of > > >>> PyTables > > >>> > available on github. It will be very slow using the released > v2.4.0. > > >>> > > > >>> > > > >>> > from itertools import izip > > >>> > > > >>> > with tb.openFile(...) as f: > > >>> > data = f.root.data.cols.element > > >>> > data_i = iter(data) > > >>> > data_j = iter(data) > > >>> > data_i.next() # throw the first value away > > >>> > for i, j in izip(data_i, data_j): > > >>> > compare(i, j) > > >>> > > > >>> > > > >>> > Hope that helps, > > >>> > Josh > > >>> > > > >>> > > > >>> > > > >>> > On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm... > > > > >>> wrote: > > >>> > > > >>> > > HI David, > > >>> > > > > >>> > > Tables and table column iteration have been overhauled fairly > > >>> recently > > >>> > > [1]. So you might try creating two iterators, offset by one, and > > >>> then > > >>> > > doing the comparison. I am hacking this out super quick so > please > > >>> > forgive > > >>> > > me: > > >>> > > > > >>> > > from itertools import izip > > >>> > > > > >>> > > with tb.openFile(...) as f: > > >>> > > data = f.root.data > > >>> > > data_i = iter(data) > > >>> > > data_j = iter(data) > > >>> > > data_i.next() # throw the first value away > > >>> > > for i, j in izip(data_i, data_j): > > >>> > > compare(i, j) > > >>> > > > > >>> > > You get the idea ;) > > >>> > > > > >>> > > Be Well > > >>> > > Anthony > > >>> > > > > >>> > > 1. https://github.com/PyTables/PyTables/issues/27 > > >>> > > > > >>> > > > > >>> > > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < > dav...@gm... > > > > > >>> > wrote: > > >>> > > > > >>> > >> I was hoping someone could help me out here. > > >>> > >> > > >>> > >> This is from a post I put up on StackOverflow, > > >>> > >> > > >>> > >> I am have a fairly large dataset that I store in HDF5 and access > > >>> using > > >>> > >> PyTables. 
One operation I need to do on this dataset are > pairwise > > >>> > >> comparisons between each of the elements. This requires 2 loops, > > >>> one to > > >>> > >> iterate over each element, and an inner loop to iterate over > every > > >>> other > > >>> > >> element. This operation thus looks at N(N-1)/2 comparisons. > > >>> > >> > > >>> > >> For fairly small sets I found it to be faster to dump the > contents > > >>> into > > >>> > a > > >>> > >> multdimensional numpy array and then do my iteration. I run into > > >>> > problems > > >>> > >> with large sets because of memory issues and need to access each > > >>> > element of > > >>> > >> the dataset at run time. > > >>> > >> > > >>> > >> Putting the elements into an array gives me about 600 > comparisons > > >>> per > > >>> > >> second, while operating on hdf5 data itself gives me about 300 > > >>> > comparisons > > >>> > >> per second. > > >>> > >> > > >>> > >> Is there a way to speed this process up? > > >>> > >> > > >>> > >> Example follows (this is not my real code, just an example): > > >>> > >> > > >>> > >> *Small Set*: > > >>> > >> > > >>> > >> > > >>> > >> with tb.openFile(h5_file, 'r') as f: > > >>> > >> data = f.root.data > > >>> > >> > > >>> > >> N_elements = len(data) > > >>> > >> elements = np.empty((N_irises, 1e5)) > > >>> > >> > > >>> > >> for ii, d in enumerate(data): > > >>> > >> elements[ii] = data['element'] > > >>> > >> > > >>> > >> D = np.empty((N_irises, N_irises)) for ii in > xrange(N_elements): > > >>> > >> for jj in xrange(ii+1, N_elements): > > >>> > >> D[ii, jj] = compare(elements[ii], elements[jj]) > > >>> > >> > > >>> > >> *Large Set*: > > >>> > >> > > >>> > >> > > >>> > >> with tb.openFile(h5_file, 'r') as f: > > >>> > >> data = f.root.data > > >>> > >> > > >>> > >> N_elements = len(data) > > >>> > >> > > >>> > >> D = np.empty((N_irises, N_irises)) > > >>> > >> for ii in xrange(N_elements): > > >>> > >> for jj in xrange(ii+1, N_elements): > > >>> > >> D[ii, jj] = 
compare(data['element'][ii], > > >>> > data['element'][jj]) > > >>> > >> > > >>> > >> > > >>> > >> > > >>> > >> > > >>> > > > >>> > > > ------------------------------------------------------------------------------ > > >>> > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > > >>> CSS, > > >>> > >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > > >>> current > > >>> > >> with LearnDevNow - 3,200 step-by-step video tutorials by > Microsoft > > >>> > >> MVPs and experts. ON SALE this month only -- learn more at: > > >>> > >> http://p.sf.net/sfu/learnmore_122712 > > >>> > >> _______________________________________________ > > >>> > >> Pytables-users mailing list > > >>> > >> Pyt...@li... > > >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >>> > >> > > >>> > >> > > >>> > > > > >>> > > > > >>> > > > > >>> > > > >>> > > > ------------------------------------------------------------------------------ > > >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > > CSS, > > >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > > >>> current > > >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by > Microsoft > > >>> > > MVPs and experts. ON SALE this month only -- learn more at: > > >>> > > http://p.sf.net/sfu/learnmore_122712 > > >>> > > _______________________________________________ > > >>> > > Pytables-users mailing list > > >>> > > Pyt...@li... > > >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > >>> > > > > >>> > > > > >>> > -------------- next part -------------- > > >>> > An HTML attachment was scrubbed... > > >>> > > > >>> > ------------------------------ > > >>> > > > >>> > > > >>> > > > >>> > > > ------------------------------------------------------------------------------ > > >>> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > CSS, > > >>> > MVC, Windows 8 Apps, JavaScript and much more. 
|
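For the pairwise loop quoted above, one possible middle ground between "everything in one array" and "one element per read" is to compare block against block, so that at most two blocks are in memory at a time. The following is only a minimal sketch under stated assumptions, not code from this thread: `block_ranges`, `pairwise_blocked`, `read_block` and `hamming` are hypothetical names, and the small list `data` stands in for the HDF5 table.

```python
def block_ranges(n, block):
    """Yield (start, stop) index ranges covering 0..n in steps of `block`."""
    for start in range(0, n, block):
        yield start, min(start + block, n)


def pairwise_blocked(read_block, n, block, compare):
    """Compute compare(x_i, x_j) for all i < j, holding at most two blocks.

    read_block(start, stop) is a hypothetical callable that returns the
    elements start..stop-1 as an in-memory sequence.
    The result is a dict {(i, j): value}, used here for illustration only.
    """
    result = {}
    for a0, a1 in block_ranges(n, block):
        rows_a = read_block(a0, a1)
        for b0, b1 in block_ranges(n, block):
            if b0 < a0:
                continue  # this block pair was handled with the roles swapped
            # Reuse the block already in memory on the diagonal.
            rows_b = rows_a if b0 == a0 else read_block(b0, b1)
            for i in range(a0, a1):
                # Start at i + 1 so only the upper triangle is computed.
                for j in range(max(i + 1, b0), b1):
                    result[(i, j)] = compare(rows_a[i - a0], rows_b[j - b0])
    return result


def hamming(x, y):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(p != q for p, q in zip(x, y)) / float(len(x))


# Stand-in data; a real run would slice the HDF5 column instead.
data = [[0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 0], [1, 0, 1, 0]]
D = pairwise_blocked(lambda a, b: data[a:b], len(data), 2, hamming)
```

With PyTables, `read_block` could presumably be a column slice such as `lambda a, b: data.cols.element[a:b]`, and the results would go into the `D` array rather than a dict. The block size trades memory for extra reads: smaller blocks use less memory but each block is read from the file more often.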