From: David R. <dav...@gm...> - 2013-02-01 12:16:04
I'm still having problems with this one. I can't tell if this is something dumb I'm doing with itertools, or if it's something in PyTables. I would appreciate any help.

Thanks

On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:

> I think I have to reopen this issue. I have been running fine for a while
> using the combinations method from itertools, but have recently run into a
> memory error since I have recently quadrupled the size of the HDF file.
>
> Here is my code again:
>
> from itertools import combinations, izip
> with tb.openFile(h5_all, 'r') as f:
>     irises = f.root.irises
>
>     templates = f.root.irises.cols.templates
>     masks = f.root.irises.cols.masks1
>
>     N_irises = len(irises)
>     index = np.ones((20 * 480), np.bool)
>
>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>     D = np.empty((N_irises, N_irises))
>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks,
>             range(N_irises)), 2):
>         # print ii
>         D[ii, jj] = ham_dist(
>             t1[8, index],
>             t2[:, index],
>             m1[8, index],
>             m2[:, index],
>         )
>
> And here is the error:
>
> In [10]: get_hd3()
> 10669890 Comparisons
> ---------------------------------------------------------------------------
> MemoryError                               Traceback (most recent call last)
> <ipython-input-10-cfb255ce7bd1> in <module>()
> ----> 1 get_hd3()
>
>
>     118     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>     119     D = np.empty((N_irises, N_irises))
> --> 120     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>     121         # print ii
>     122         D[ii, jj] = ham_dist(
>
> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
>    3275             end_row = min([start_row + nrowsinbuf, max_row])
> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
>    3277             for row in buf:
>    3278                 yield row
>
> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
>    1773
> -> 1774         arr = self._read(start, stop, step, field)
>    1775         return internal_to_flavor(arr, self.flavor)
>    1776
>
> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
>    1719         if field:
>    1720             # Create a container for the results
> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>    1722         else:
>    1723             # Recarray case
>
> MemoryError:
>
> c:\python27\lib\site-packages\tables\table.py(1721)_read()
>    1720             # Create a container for the results
> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>    1722         else:
>
> Also, if you guys see any performance problems in my code, please let me
> know.
>
> Thank you so much for the help.
>
> -Dave
>
>
> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
>
>> Send Pytables-users mailing list submissions to
>>         pyt...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>> or, via email, send a message with subject or body 'help' to
>>         pyt...@li...
>>
>> You can reach the person managing the list at
>>         pyt...@li...
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Pytables-users digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 4 Jan 2013 08:56:28 -0500
>> From: David Reed <dav...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
>> To: pyt...@li...
>> Message-ID: <CAM...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> I can't thank you guys enough for the help. I was able to add the __iter__
>> function to the table.py file and everything seems to be working great!
>> I'm not quite as fast as I was with iterating right off a matrix but
>> pretty close.
I was at 555 comparisons per second, and now im at 420. >> >> I handled the problem I mentioned earlier by doing this, and it seems to >> work great: >> >> A = f.root.data.cols.A >> B = f.root.data.cols.B >> >> D = np.empty((len(A), len(A)) >> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), >> 2): >> D[ii, jj] = compare(a1, a2, b1, b2) >> >> Again, thanks a lot. >> >> -Dave >> >> >> On Thu, Jan 3, 2013 at 6:31 PM, < >> pyt...@li...> wrote: >> >> > Send Pytables-users mailing list submissions to >> > pyt...@li... >> > >> > To subscribe or unsubscribe via the World Wide Web, visit >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > or, via email, send a message with subject or body 'help' to >> > pyt...@li... >> > >> > You can reach the person managing the list at >> > pyt...@li... >> > >> > When replying, please edit your Subject line so it is more specific >> > than "Re: Contents of Pytables-users digest..." >> > >> > >> > Today's Topics: >> > >> > 1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz) >> > 2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz) >> > >> > >> > ---------------------------------------------------------------------- >> > >> > Message: 1 >> > Date: Thu, 3 Jan 2013 17:26:55 -0600 >> > From: Anthony Scopatz <sc...@gm...> >> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3 >> > To: Discussion list for PyTables >> > <pyt...@li...> >> > Message-ID: >> > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ= >> > Gz...@ma...> >> > Content-Type: text/plain; charset="iso-8859-1" >> > >> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> >> wrote: >> > >> > > Thanks a lot for the help so far guys! >> > > >> > > Looking at itertools, I found what I believe to be the perfect >> function >> > > for what I need, itertools.combinations. This appears to be a valid >> > > replacement to the method proposed. >> > > >> > >> > Yes, combinations is awesome! 
>> > >> > >> > > >> > > There is a small problem that I didn't mention is that my compare >> > function >> > > actually takes as inputs 2 columns from the table. Like so: >> > > >> > > D = np.empty((N_irises, N_irises)) >> > > for ii in xrange(N_elements): >> > > for jj in xrange(ii+1, N_elements): >> > > D[ii, jj] = compare(data['element1'][ii], >> > data['element1'][jj],data['element2'][ii], >> > > data['element2'][jj]) >> > > >> > > Is there an efficient way of using itertools with this structure? >> > > >> > >> > You can always make two other iterators for each column. Since you have >> > two columns you would have 4 iterators. I am not sure how fast this is >> > going to be but I am confident that there is definitely a way to do >> this in >> > one for-loop, which is going to be way faster than nested loops. >> > >> > Be Well >> > Anthony >> > >> > >> > > >> > > >> > > On Thu, Jan 3, 2013 at 1:29 PM, < >> > > pyt...@li...> wrote: >> > > >> > >> Send Pytables-users mailing list submissions to >> > >> pyt...@li... >> > >> >> > >> To subscribe or unsubscribe via the World Wide Web, visit >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> or, via email, send a message with subject or body 'help' to >> > >> pyt...@li... >> > >> >> > >> You can reach the person managing the list at >> > >> pyt...@li... >> > >> >> > >> When replying, please edit your Subject line so it is more specific >> > >> than "Re: Contents of Pytables-users digest..." >> > >> >> > >> >> > >> Today's Topics: >> > >> >> > >> 1. 
Re: Nested Iteration of HDF5 using PyTables (Josh Ayers) >> > >> >> > >> >> > >> >> ---------------------------------------------------------------------- >> > >> >> > >> Message: 1 >> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800 >> > >> From: Josh Ayers <jos...@gm...> >> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables >> > >> To: Discussion list for PyTables >> > >> <pyt...@li...> >> > >> Message-ID: >> > >> < >> > >> CAC...@ma...> >> > >> Content-Type: text/plain; charset="iso-8859-1" >> > >> >> > >> David, >> > >> >> > >> The change in issue 27 was only for iteration over a tables.Column >> > >> instance. To use it, tweak Anthony's code as follows. This will >> > iterate >> > >> over the "element" column, as in your original example. >> > >> >> > >> Note also that this will only work with the development version of >> > >> PyTables >> > >> available on github. It will be very slow using the released v2.4.0. >> > >> >> > >> >> > >> from itertools import izip >> > >> >> > >> with tb.openFile(...) as f: >> > >> data = f.root.data.cols.element >> > >> data_i = iter(data) >> > >> data_j = iter(data) >> > >> data_i.next() # throw the first value away >> > >> for i, j in izip(data_i, data_j): >> > >> compare(i, j) >> > >> >> > >> >> > >> Hope that helps, >> > >> Josh >> > >> >> > >> >> > >> >> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> >> > >> wrote: >> > >> >> > >> > HI David, >> > >> > >> > >> > Tables and table column iteration have been overhauled fairly >> recently >> > >> > [1]. So you might try creating two iterators, offset by one, and >> then >> > >> > doing the comparison. I am hacking this out super quick so please >> > >> forgive >> > >> > me: >> > >> > >> > >> > from itertools import izip >> > >> > >> > >> > with tb.openFile(...) 
as f: >> > >> > data = f.root.data >> > >> > data_i = iter(data) >> > >> > data_j = iter(data) >> > >> > data_i.next() # throw the first value away >> > >> > for i, j in izip(data_i, data_j): >> > >> > compare(i, j) >> > >> > >> > >> > You get the idea ;) >> > >> > >> > >> > Be Well >> > >> > Anthony >> > >> > >> > >> > 1. https://github.com/PyTables/PyTables/issues/27 >> > >> > >> > >> > >> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm... >> > >> > >> wrote: >> > >> > >> > >> >> I was hoping someone could help me out here. >> > >> >> >> > >> >> This is from a post I put up on StackOverflow, >> > >> >> >> > >> >> I am have a fairly large dataset that I store in HDF5 and access >> > using >> > >> >> PyTables. One operation I need to do on this dataset are pairwise >> > >> >> comparisons between each of the elements. This requires 2 loops, >> one >> > to >> > >> >> iterate over each element, and an inner loop to iterate over every >> > >> other >> > >> >> element. This operation thus looks at N(N-1)/2 comparisons. >> > >> >> >> > >> >> For fairly small sets I found it to be faster to dump the contents >> > >> into a >> > >> >> multdimensional numpy array and then do my iteration. I run into >> > >> problems >> > >> >> with large sets because of memory issues and need to access each >> > >> element of >> > >> >> the dataset at run time. >> > >> >> >> > >> >> Putting the elements into an array gives me about 600 comparisons >> per >> > >> >> second, while operating on hdf5 data itself gives me about 300 >> > >> comparisons >> > >> >> per second. >> > >> >> >> > >> >> Is there a way to speed this process up? 
>> > >> >> >> > >> >> Example follows (this is not my real code, just an example): >> > >> >> >> > >> >> *Small Set*: >> > >> >> >> > >> >> >> > >> >> with tb.openFile(h5_file, 'r') as f: >> > >> >> data = f.root.data >> > >> >> >> > >> >> N_elements = len(data) >> > >> >> elements = np.empty((N_irises, 1e5)) >> > >> >> >> > >> >> for ii, d in enumerate(data): >> > >> >> elements[ii] = data['element'] >> > >> >> >> > >> >> D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): >> > >> >> for jj in xrange(ii+1, N_elements): >> > >> >> D[ii, jj] = compare(elements[ii], elements[jj]) >> > >> >> >> > >> >> *Large Set*: >> > >> >> >> > >> >> >> > >> >> with tb.openFile(h5_file, 'r') as f: >> > >> >> data = f.root.data >> > >> >> >> > >> >> N_elements = len(data) >> > >> >> >> > >> >> D = np.empty((N_irises, N_irises)) >> > >> >> for ii in xrange(N_elements): >> > >> >> for jj in xrange(ii+1, N_elements): >> > >> >> D[ii, jj] = compare(data['element'][ii], >> > >> data['element'][jj]) >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> > >> ------------------------------------------------------------------------------ >> > >> >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >> CSS, >> > >> >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> > current >> > >> >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> > >> >> MVPs and experts. ON SALE this month only -- learn more at: >> > >> >> http://p.sf.net/sfu/learnmore_122712 >> > >> >> _______________________________________________ >> > >> >> Pytables-users mailing list >> > >> >> Pyt...@li... >> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> >> >> > >> >> >> > >> > >> > >> > >> > >> > >> > >> >> > >> ------------------------------------------------------------------------------ >> > >> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >> CSS, >> > >> > MVC, Windows 8 Apps, JavaScript and much more. 
Keep your skills >> > current >> > >> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> > >> > MVPs and experts. ON SALE this month only -- learn more at: >> > >> > http://p.sf.net/sfu/learnmore_122712 >> > >> > _______________________________________________ >> > >> > Pytables-users mailing list >> > >> > Pyt...@li... >> > >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> > >> > >> > >> > >> -------------- next part -------------- >> > >> An HTML attachment was scrubbed... >> > >> >> > >> ------------------------------ >> > >> >> > >> >> > >> >> > >> ------------------------------------------------------------------------------ >> > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> > >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> current >> > >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> > >> MVPs and experts. ON SALE this month only -- learn more at: >> > >> http://p.sf.net/sfu/learnmore_122712 >> > >> >> > >> ------------------------------ >> > >> >> > >> _______________________________________________ >> > >> Pytables-users mailing list >> > >> Pyt...@li... >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> >> > >> >> > >> End of Pytables-users Digest, Vol 80, Issue 3 >> > >> ********************************************* >> > >> >> > > >> > > >> > > >> > > >> > >> ------------------------------------------------------------------------------ >> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> current >> > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> > > MVPs and experts. ON SALE this month only -- learn more at: >> > > http://p.sf.net/sfu/learnmore_122712 >> > > _______________________________________________ >> > > Pytables-users mailing list >> > > Pyt...@li... 
>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > > >> > > >> > -------------- next part -------------- >> > An HTML attachment was scrubbed... >> > >> > ------------------------------ >> > >> > Message: 2 >> > Date: Thu, 3 Jan 2013 17:30:59 -0600 >> > From: Anthony Scopatz <sc...@gm...> >> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4 >> > To: Discussion list for PyTables >> > <pyt...@li...> >> > Message-ID: >> > < >> > CAP...@ma...> >> > Content-Type: text/plain; charset="iso-8859-1" >> > >> > Josh is right that you can just edit the code by hand (which works but >> > sucks). >> > >> > However, on Windows -- on the rare occasion when I also have to develop >> on >> > it -- I typically use a distribution that includes a compiler, cython, >> > hdf5, and pytables already and then I install my development version >> from >> > github OVER this. I recommend either EPD or Anaconda, though other >> > distributions listed here [1] might also work. >> > >> > Be well >> > Anthony >> > >> > 1. http://numfocus.org/projects-2/software-distributions/ >> > >> > >> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> >> wrote: >> > >> > > The change was in pure Python code, so you should be able to just >> paste >> > in >> > > the changes to your local copy. Start with the table.Column.__iter__ >> > > method (lines 3296-3310) here. >> > > >> > > >> > > >> > >> https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py >> > > >> > > It needs to be modified slightly because it uses some additional >> features >> > > that aren't available in the released version (the out=buf_slice >> argument >> > > to table.read). The following should work. 
>> > > >> > > def __iter__(self): >> > > table = self.table >> > > itemsize = self.dtype.itemsize >> > > nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // >> itemsize >> > > max_row = len(self) >> > > for start_row in xrange(0, len(self), nrowsinbuf): >> > > end_row = min([start_row + nrowsinbuf, max_row]) >> > > buf = table.read(start_row, end_row, 1, >> field=self.pathname) >> > > for row in buf: >> > > yield row >> > > >> > > >> > > I haven't tested this, but I think it will work. >> > > >> > > Josh >> > > >> > > >> > > >> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> >> > wrote: >> > > >> > >> I apologize if I'm starting to sound helpless, but I'm forced to >> work on >> > >> Windows 7 at work and have never had luck compiling python source >> > >> successfully. I have had to rely on precompiled binaries and now its >> > >> biting me in the butt. >> > >> >> > >> Is there any quick fix I can do to improve this iteration using >> v2.4.0? >> > >> >> > >> >> > >> On Thu, Jan 3, 2013 at 3:17 PM, < >> > >> pyt...@li...> wrote: >> > >> >> > >>> Send Pytables-users mailing list submissions to >> > >>> pyt...@li... >> > >>> >> > >>> To subscribe or unsubscribe via the World Wide Web, visit >> > >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >>> or, via email, send a message with subject or body 'help' to >> > >>> pyt...@li... >> > >>> >> > >>> You can reach the person managing the list at >> > >>> pyt...@li... >> > >>> >> > >>> When replying, please edit your Subject line so it is more specific >> > >>> than "Re: Contents of Pytables-users digest..." >> > >>> >> > >>> >> > >>> Today's Topics: >> > >>> >> > >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed) >> > >>> 2. 
Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed) >> > >>> >> > >>> >> > >>> >> ---------------------------------------------------------------------- >> > >>> >> > >>> Message: 1 >> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 >> > >>> From: David Reed <dav...@gm...> >> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2 >> > >>> To: pyt...@li... >> > >>> Message-ID: >> > >>> <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha= >> > >>> ev...@ma...> >> > >>> Content-Type: text/plain; charset="iso-8859-1" >> > >>> >> > >>> Thanks Anthony, but unless Im missing something I don't think that >> > method >> > >>> will work since this will only be comparing the ith element with >> ith+1 >> > >>> element. I still need 2 for loops right? >> > >>> >> > >>> Using itertools might speed things up though, I've never used them >> so I >> > >>> will give it a shot and let you know how it goes. Looks like I >> need to >> > >>> download the latest release before I do that too. Thanks for the >> help. >> > >>> >> > >>> -Dave >> > >>> >> > >>> >> > >>> >> > >>> On Thu, Jan 3, 2013 at 12:12 PM, < >> > >>> pyt...@li...> wrote: >> > >>> >> > >>> > Send Pytables-users mailing list submissions to >> > >>> > pyt...@li... >> > >>> > >> > >>> > To subscribe or unsubscribe via the World Wide Web, visit >> > >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >>> > or, via email, send a message with subject or body 'help' to >> > >>> > pyt...@li... >> > >>> > >> > >>> > You can reach the person managing the list at >> > >>> > pyt...@li... >> > >>> > >> > >>> > When replying, please edit your Subject line so it is more >> specific >> > >>> > than "Re: Contents of Pytables-users digest..." >> > >>> > >> > >>> > >> > >>> > Today's Topics: >> > >>> > >> > >>> > 1. 
Re: Nested Iteration of HDF5 using PyTables (Anthony >> Scopatz) >> > >>> > >> > >>> > >> > >>> > >> > ---------------------------------------------------------------------- >> > >>> > >> > >>> > Message: 1 >> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 >> > >>> > From: Anthony Scopatz <sc...@gm...> >> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using >> PyTables >> > >>> > To: Discussion list for PyTables >> > >>> > <pyt...@li...> >> > >>> > Message-ID: >> > >>> > <CAPk-6T5b= >> > >>> > 1EG...@ma...> >> > >>> > Content-Type: text/plain; charset="iso-8859-1" >> > >>> > >> > >>> > HI David, >> > >>> > >> > >>> > Tables and table column iteration have been overhauled fairly >> > recently >> > >>> [1]. >> > >>> > So you might try creating two iterators, offset by one, and then >> > >>> doing the >> > >>> > comparison. I am hacking this out super quick so please forgive >> me: >> > >>> > >> > >>> > from itertools import izip >> > >>> > >> > >>> > with tb.openFile(...) as f: >> > >>> > data = f.root.data >> > >>> > data_i = iter(data) >> > >>> > data_j = iter(data) >> > >>> > data_i.next() # throw the first value away >> > >>> > for i, j in izip(data_i, data_j): >> > >>> > compare(i, j) >> > >>> > >> > >>> > You get the idea ;) >> > >>> > >> > >>> > Be Well >> > >>> > Anthony >> > >>> > >> > >>> > 1. https://github.com/PyTables/PyTables/issues/27 >> > >>> > >> > >>> > >> > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < >> dav...@gm...> >> > >>> wrote: >> > >>> > >> > >>> > > I was hoping someone could help me out here. >> > >>> > > >> > >>> > > This is from a post I put up on StackOverflow, >> > >>> > > >> > >>> > > I am have a fairly large dataset that I store in HDF5 and access >> > >>> using >> > >>> > > PyTables. One operation I need to do on this dataset are >> pairwise >> > >>> > > comparisons between each of the elements. 
This requires 2 loops, >> > one >> > >>> to >> > >>> > > iterate over each element, and an inner loop to iterate over >> every >> > >>> other >> > >>> > > element. This operation thus looks at N(N-1)/2 comparisons. >> > >>> > > >> > >>> > > For fairly small sets I found it to be faster to dump the >> contents >> > >>> into a >> > >>> > > multdimensional numpy array and then do my iteration. I run into >> > >>> problems >> > >>> > > with large sets because of memory issues and need to access each >> > >>> element >> > >>> > of >> > >>> > > the dataset at run time. >> > >>> > > >> > >>> > > Putting the elements into an array gives me about 600 >> comparisons >> > per >> > >>> > > second, while operating on hdf5 data itself gives me about 300 >> > >>> > comparisons >> > >>> > > per second. >> > >>> > > >> > >>> > > Is there a way to speed this process up? >> > >>> > > >> > >>> > > Example follows (this is not my real code, just an example): >> > >>> > > >> > >>> > > *Small Set*: >> > >>> > > >> > >>> > > >> > >>> > > with tb.openFile(h5_file, 'r') as f: >> > >>> > > data = f.root.data >> > >>> > > >> > >>> > > N_elements = len(data) >> > >>> > > elements = np.empty((N_irises, 1e5)) >> > >>> > > >> > >>> > > for ii, d in enumerate(data): >> > >>> > > elements[ii] = data['element'] >> > >>> > > >> > >>> > > D = np.empty((N_irises, N_irises)) for ii in >> xrange(N_elements): >> > >>> > > for jj in xrange(ii+1, N_elements): >> > >>> > > D[ii, jj] = compare(elements[ii], elements[jj]) >> > >>> > > >> > >>> > > *Large Set*: >> > >>> > > >> > >>> > > >> > >>> > > with tb.openFile(h5_file, 'r') as f: >> > >>> > > data = f.root.data >> > >>> > > >> > >>> > > N_elements = len(data) >> > >>> > > >> > >>> > > D = np.empty((N_irises, N_irises)) >> > >>> > > for ii in xrange(N_elements): >> > >>> > > for jj in xrange(ii+1, N_elements): >> > >>> > > D[ii, jj] = compare(data['element'][ii], >> > >>> > data['element'][jj]) >> > >>> > > >> > >>> > > >> > >>> > > >> > >>> > > >> > >>> > 
>> > >>> >> > >> ------------------------------------------------------------------------------ >> > >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >> > CSS, >> > >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> > >>> current >> > >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by >> Microsoft >> > >>> > > MVPs and experts. ON SALE this month only -- learn more at: >> > >>> > > http://p.sf.net/sfu/learnmore_122712 >> > >>> > > _______________________________________________ >> > >>> > > Pytables-users mailing list >> > >>> > > Pyt...@li... >> > >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >>> > > >> > >>> > > >> > >>> > -------------- next part -------------- >> > >>> > An HTML attachment was scrubbed... >> > >>> > >> > >>> > ------------------------------ >> > >>> > >> > >>> > >> > >>> > >> > >>> >> > >> ------------------------------------------------------------------------------ >> > >>> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >> CSS, >> > >>> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> > current >> > >>> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> > >>> > MVPs and experts. ON SALE this month only -- learn more at: >> > >>> > http://p.sf.net/sfu/learnmore_122712 >> > >>> > >> > >>> > ------------------------------ >> > >>> > >> > >>> > _______________________________________________ >> > >>> > Pytables-users mailing list >> > >>> > Pyt...@li... >> > >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >>> > >> > >>> > >> > >>> > End of Pytables-users Digest, Vol 80, Issue 2 >> > >>> > ********************************************* >> > >>> > >> > >>> -------------- next part -------------- >> > >>> An HTML attachment was scrubbed... 
>> > >>> >> > >>> ------------------------------ >> > >>> >> > >>> Message: 2 >> > >>> Date: Thu, 3 Jan 2013 15:17:01 -0500 >> > >>> From: David Reed <dav...@gm...> >> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3 >> > >>> To: pyt...@li... >> > >>> Message-ID: >> > >>> < >> > >>> CAM...@ma...> >> > >>> Content-Type: text/plain; charset="iso-8859-1" >> > >>> >> > >>> Thanks a lot for the help so far guys! >> > >>> >> > >>> Looking at itertools, I found what I believe to be the perfect >> function >> > >>> for >> > >>> what I need, itertools.combinations. This appears to be a valid >> > >>> replacement >> > >>> to the method proposed. >> > >>> >> > >>> There is a small problem that I didn't mention is that my compare >> > >>> function >> > >>> actually takes as inputs 2 columns from the table. Like so: >> > >>> >> > >>> D = np.empty((N_irises, N_irises)) >> > >>> for ii in xrange(N_elements): >> > >>> for jj in xrange(ii+1, N_elements): >> > >>> D[ii, jj] = compare(data['element1'][ii], >> > >>> data['element1'][jj],data['element2'][ii], >> > >>> data['element2'][jj]) >> > >>> >> > >>> Is there an efficient way of using itertools with this structure? >> > >>> >> > >>> >> > >>> On Thu, Jan 3, 2013 at 1:29 PM, < >> > >>> pyt...@li...> wrote: >> > >>> >> > >>> > Send Pytables-users mailing list submissions to >> > >>> > pyt...@li... >> > >>> > >> > >>> > To subscribe or unsubscribe via the World Wide Web, visit >> > >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >>> > or, via email, send a message with subject or body 'help' to >> > >>> > pyt...@li... >> > >>> > >> > >>> > You can reach the person managing the list at >> > >>> > pyt...@li... >> > >>> > >> > >>> > When replying, please edit your Subject line so it is more >> specific >> > >>> > than "Re: Contents of Pytables-users digest..." >> > >>> > >> > >>> > >> > >>> > Today's Topics: >> > >>> > >> > >>> > 1. 
Re: Nested Iteration of HDF5 using PyTables (Josh Ayers) >> > >>> > >> > >>> > >> > >>> > >> > ---------------------------------------------------------------------- >> > >>> > >> > >>> > Message: 1 >> > >>> > Date: Thu, 3 Jan 2013 10:29:33 -0800 >> > >>> > From: Josh Ayers <jos...@gm...> >> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using >> PyTables >> > >>> > To: Discussion list for PyTables >> > >>> > <pyt...@li...> >> > >>> > Message-ID: >> > >>> > < >> > >>> > >> CAC...@ma...> >> > >>> > Content-Type: text/plain; charset="iso-8859-1" >> > >>> > >> > >>> > David, >> > >>> > >> > >>> > The change in issue 27 was only for iteration over a tables.Column >> > >>> > instance. To use it, tweak Anthony's code as follows. This will >> > >>> iterate >> > >>> > over the "element" column, as in your original example. >> > >>> > >> > >>> > Note also that this will only work with the development version of >> > >>> PyTables >> > >>> > available on github. It will be very slow using the released >> v2.4.0. >> > >>> > >> > >>> > >> > >>> > from itertools import izip >> > >>> > >> > >>> > with tb.openFile(...) as f: >> > >>> > data = f.root.data.cols.element >> > >>> > data_i = iter(data) >> > >>> > data_j = iter(data) >> > >>> > data_i.next() # throw the first value away >> > >>> > for i, j in izip(data_i, data_j): >> > >>> > compare(i, j) >> > >>> > >> > >>> > >> > >>> > Hope that helps, >> > >>> > Josh >> > >>> > >> > >>> > >> > >>> > >> > >>> > On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz < >> sc...@gm...> >> > >>> wrote: >> > >>> > >> > >>> > > HI David, >> > >>> > > >> > >>> > > Tables and table column iteration have been overhauled fairly >> > >>> recently >> > >>> > > [1]. So you might try creating two iterators, offset by one, >> and >> > >>> then >> > >>> > > doing the comparison. 
I am hacking this out super quick so >> please >> > >>> > forgive >> > >>> > > me: >> > >>> > > >> > >>> > > from itertools import izip >> > >>> > > >> > >>> > > with tb.openFile(...) as f: >> > >>> > > data = f.root.data >> > >>> > > data_i = iter(data) >> > >>> > > data_j = iter(data) >> > >>> > > data_i.next() # throw the first value away >> > >>> > > for i, j in izip(data_i, data_j): >> > >>> > > compare(i, j) >> > >>> > > >> > >>> > > You get the idea ;) >> > >>> > > >> > >>> > > Be Well >> > >>> > > Anthony >> > >>> > > >> > >>> > > 1. https://github.com/PyTables/PyTables/issues/27 >> > >>> > > >> > >>> > > >> > >>> > > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < >> dav...@gm... >> > > >> > >>> > wrote: >> > >>> > > >> > >>> > >> I was hoping someone could help me out here. >> > >>> > >> >> > >>> > >> This is from a post I put up on StackOverflow, >> > >>> > >> >> > >>> > >> I am have a fairly large dataset that I store in HDF5 and >> access >> > >>> using >> > >>> > >> PyTables. One operation I need to do on this dataset are >> pairwise >> > >>> > >> comparisons between each of the elements. This requires 2 >> loops, >> > >>> one to >> > >>> > >> iterate over each element, and an inner loop to iterate over >> every >> > >>> other >> > >>> > >> element. This operation thus looks at N(N-1)/2 comparisons. >> > >>> > >> >> > >>> > >> For fairly small sets I found it to be faster to dump the >> contents >> > >>> into >> > >>> > a >> > >>> > >> multdimensional numpy array and then do my iteration. I run >> into >> > >>> > problems >> > >>> > >> with large sets because of memory issues and need to access >> each >> > >>> > element of >> > >>> > >> the dataset at run time. >> > >>> > >> >> > >>> > >> Putting the elements into an array gives me about 600 >> comparisons >> > >>> per >> > >>> > >> second, while operating on hdf5 data itself gives me about 300 >> > >>> > comparisons >> > >>> > >> per second. 
>> > >>> > >> >> > >>> > >> Is there a way to speed this process up? >> > >>> > >> >> > >>> > >> Example follows (this is not my real code, just an example): >> > >>> > >> >> > >>> > >> *Small Set*: >> > >>> > >> >> > >>> > >> >> > >>> > >> with tb.openFile(h5_file, 'r') as f: >> > >>> > >> data = f.root.data >> > >>> > >> >> > >>> > >> N_elements = len(data) >> > >>> > >> elements = np.empty((N_irises, 1e5)) >> > >>> > >> >> > >>> > >> for ii, d in enumerate(data): >> > >>> > >> elements[ii] = data['element'] >> > >>> > >> >> > >>> > >> D = np.empty((N_irises, N_irises)) for ii in >> xrange(N_elements): >> > >>> > >> for jj in xrange(ii+1, N_elements): >> > >>> > >> D[ii, jj] = compare(elements[ii], elements[jj]) >> > >>> > >> >> > >>> > >> *Large Set*: >> > >>> > >> >> > >>> > >> >> > >>> > >> with tb.openFile(h5_file, 'r') as f: >> > >>> > >> data = f.root.data >> > >>> > >> >> > >>> > >> N_elements = len(data) >> > >>> > >> >> > >>> > >> D = np.empty((N_irises, N_irises)) >> > >>> > >> for ii in xrange(N_elements): >> > >>> > >> for jj in xrange(ii+1, N_elements): >> > >>> > >> D[ii, jj] = compare(data['element'][ii], >> > >>> > data['element'][jj]) >> > >>> > >> >> > >>> > >> >> > >>> > >> >> > >>> > >> >> > >>> > >> > >>> >> > >> ------------------------------------------------------------------------------ >> > >>> > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, >> HTML5, >> > >>> CSS, >> > >>> > >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> > >>> current >> > >>> > >> with LearnDevNow - 3,200 step-by-step video tutorials by >> Microsoft >> > >>> > >> MVPs and experts. ON SALE this month only -- learn more at: >> > >>> > >> http://p.sf.net/sfu/learnmore_122712 >> > >>> > >> _______________________________________________ >> > >>> > >> Pytables-users mailing list >> > >>> > >> Pyt...@li... 
>> > >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users
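One way to keep memory bounded in the pairwise loop discussed in this thread is to read the table in fixed-size blocks and compare blocks against each other, so that at most two blocks are held in memory at once. It is worth knowing that `itertools.combinations` first copies its entire input into a tuple, so handing it an iterator over a whole table reads every row into memory before the first pair is produced. The sketch below is only an illustration under stated assumptions, not code from the thread: `pairwise_blocked`, `read_block`, and `block_size` are hypothetical names, and it is written for Python 3 (`range` instead of `xrange`).

```python
from itertools import combinations


def pairwise_blocked(read_block, n_rows, compare, block_size=1024):
    """Yield (i, j, compare(row_i, row_j)) for every pair with i < j.

    read_block(start, stop) is a hypothetical callable that returns rows
    start..stop-1 as an in-memory sequence. Only two blocks are alive at
    any time, so peak memory is about 2 * block_size rows.
    """
    for a_start in range(0, n_rows, block_size):
        a_stop = min(a_start + block_size, n_rows)
        block_a = read_block(a_start, a_stop)

        # Pairs that fall inside block A. combinations() copies its input
        # into a tuple, which is fine here because the block is small.
        for (i, row_i), (j, row_j) in combinations(enumerate(block_a, a_start), 2):
            yield i, j, compare(row_i, row_j)

        # Pairs between block A and every later block B.
        for b_start in range(a_stop, n_rows, block_size):
            b_stop = min(b_start + block_size, n_rows)
            block_b = read_block(b_start, b_stop)
            for i, row_i in enumerate(block_a, a_start):
                for j, row_j in enumerate(block_b, b_start):
                    yield i, j, compare(row_i, row_j)
```

With PyTables, `read_block` could plausibly be something like `lambda s, e: table.read(s, e, field='templates')`, assuming a column with that name as in the thread, and each yielded triple would be stored as `D[i, j] = value`. Later blocks are re-read once per earlier block, so a larger `block_size` trades memory for fewer reads.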