From: David R. <dav...@gm...> - 2013-02-04 15:54:22
Hi Josh,

Here is my __iter__ code:

    def __iter__(self):
        table = self.table
        itemsize = self.dtype.itemsize
        nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
        max_row = len(self)
        for start_row in xrange(0, len(self), nrowsinbuf):
            end_row = min([start_row + nrowsinbuf, max_row])
            buf = table.read(start_row, end_row, 1, field=self.pathname)
            for row in buf:
                yield row

It does look different. I will try swapping in the code from GitHub and see what happens.

On Mon, Feb 4, 2013 at 9:59 AM, <pyt...@li...> wrote:

Send Pytables-users mailing list submissions to
pyt...@li...

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/pytables-users
or, via email, send a message with subject or body 'help' to
pyt...@li...

You can reach the person managing the list at
pyt...@li...

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Pytables-users digest..."

Today's Topics:

1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers)
2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed)

----------------------------------------------------------------------

Date: Fri, 1 Feb 2013 14:08:47 -0800
From: Josh Ayers <jos...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4

David,

You added a custom version of table.Column.__iter__, correct? Could you also include that along with the script to reproduce the error?

It seems like the problem may be in the 'nrowsinbuf' calculation (see [1]). Each of your rows is 17 x 9600 = 163200 bytes. If you're using the default 1 MB value for IO_BUFFER_SIZE, it should be reading in chunks of 6 rows. Instead, it's reading the entire table.

[1] https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296

On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:

On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:

> At the error:
>
>     result = numpy.empty(shape=nrows, dtype=dtypeField)
>
> nrows = 4620 and dtypeField is ('bool', (17, 9600)).
>
> I'm not sure what that means as a dtype, but that's what it is.
>
> Forgive me if I'm being totally naive, but I thought the whole point of __iter__ with PyTables was to do iteration on the fly, so there is no preallocation.

Nope, you are not being naive at all. That is the point.

> If you have any ideas on this, I'm all ears.

If you could send a minimal script that reproduces this error, that would help a lot.

Be Well
Anthony

> Thanks again.
>
> Dave
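Josh's estimate can be checked with a few lines of arithmetic. This is a minimal sketch that mirrors the `IO_BUFFER_SIZE // itemsize` expression from the __iter__ code in this thread. It assumes the "1 MB" default means 1024 * 1024 bytes and that NumPy's bool uses one byte per element. It also computes the largest per-row itemsize for which a single buffer would already hold all 4620 rows, which is what a whole-table read would imply.

```python
# Sketch: rows per read buffer, following nrowsinbuf = IO_BUFFER_SIZE // itemsize.
# Assumptions: the default buffer is 1024 * 1024 bytes; bool is 1 byte/element.

IO_BUFFER_SIZE = 1024 * 1024   # assumed default buffer size in bytes
ROW_SHAPE = (17, 9600)         # shape of one cell of the boolean column
BOOL_ITEMSIZE = 1              # bytes per element for NumPy's bool dtype
N_ROWS = 4620                  # rows in the table


def rows_per_buffer(buffer_size, itemsize):
    """Rows fetched per table.read() call, as nrowsinbuf computes it."""
    return buffer_size // itemsize


row_bytes = ROW_SHAPE[0] * ROW_SHAPE[1] * BOOL_ITEMSIZE
nrowsinbuf = rows_per_buffer(IO_BUFFER_SIZE, row_bytes)

# Largest itemsize at which one buffer would cover every row of the table.
max_itemsize_for_single_read = IO_BUFFER_SIZE // N_ROWS

print(row_bytes)                     # 163200
print(nrowsinbuf)                    # 6
print(max_itemsize_for_single_read)  # 226
```

With the expected 163,200-byte rows, each read should cover 6 rows. A read of all `nrows = 4620` rows would only follow from this formula if the effective itemsize were 226 bytes or less. So one thing worth checking is what `self.dtype.itemsize` actually returns for this column.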

------------------------------

Date: Fri, 1 Feb 2013 14:44:40 -0600
From: Anthony Scopatz <sc...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2

On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:

> Hi Anthony,
>
> Thanks for the reply.
>
> I honestly don't know how to monitor my Python memory usage, but I'm sure that it's caused by running out of memory.

Well, I would just run top or a process monitor or something while running the Python script to see what happens to memory usage as the script chugs along...

> I'm just trying to find out how to fix it. My HDF5 table has 4620 rows, and the column I'm iterating over is a 17x9600 boolean matrix. The __iter__ method is preallocating an array of this size, which appears to be the root of the error. I was hoping there is a fix somewhere in here so I don't have to do this preallocation.

So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of these is ~760 MB. If you have 2 GB of memory and you are iterating over 2 of these (templates & masks), it is conceivable that you are just running out of memory. Maybe there is a way that __iter__ could avoid preallocating something that is basically a temporary. What is the dtype of the templates array?

Be Well
Anthony

> Thanks again.
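Anthony's memory estimate can be made exact with a short calculation. This sketch computes the size of one 17x9600 boolean cell, one fully materialized column of 4620 cells, and two such columns (templates and masks). It assumes one byte per boolean, which is how NumPy stores its bool dtype, and is written for Python 3, where `/` is true division.

```python
# Sketch: memory needed to hold whole columns of 17x9600 booleans in RAM.
# Assumption: NumPy's bool dtype uses one byte per element.

ROWS, COLS = 17, 9600   # shape of one boolean cell
BOOL_ITEMSIZE = 1       # bytes per element
N_ROWS = 4620           # rows in the table
N_COLUMNS = 2           # templates and masks

cell_bytes = ROWS * COLS * BOOL_ITEMSIZE
column_bytes = cell_bytes * N_ROWS
total_bytes = column_bytes * N_COLUMNS

MB, MIB = 10 ** 6, 2 ** 20

print(cell_bytes)                       # 163200
print(round(cell_bytes / MIB, 3))       # 0.156
print(round(column_bytes / MB, 1))      # 754.0
print(round(column_bytes / MIB, 1))     # 719.1
print(round(total_bytes / 10 ** 9, 2))  # 1.51
```

The 0.155 figure matches the cell size in MiB. A whole column is about 754 MB (719 MiB), and two fully materialized columns come to roughly 1.5 GB, which is most of a 2 GB machine.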

------------------------------

Date: Mon, 4 Feb 2013 09:58:53 -0500
From: David Reed <dav...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6

Hi Anthony,

Sorry to just get back to you. I can send a script. Should I send one that creates some fake data as well?

-Dave

------------------------------

Date: Fri, 1 Feb 2013 10:11:47 -0600
From: Anthony Scopatz <sc...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9

Hi David,

Sorry, I haven't had a ton of time recently. You seem to be getting a memory error on creating a NumPy array. This kind of thing typically happens when you are out of memory. Does this seem to be the case with you? When this dies, is your memory usage at 100%? If so, this algorithm might require a little tweaking...

Be Well
Anthony

On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:

I'm still having problems with this one. I can't tell if this is something dumb I'm doing with itertools, or if it's something in PyTables.

Would appreciate any help.

Thanks

On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:

I think I have to reopen this issue. I have been running fine for a while using the combinations method from itertools, but I have recently run into a memory error since I quadrupled the size of the HDF file.
Here is my code again:

    from itertools import combinations, izip

    with tb.openFile(h5_all, 'r') as f:
        irises = f.root.irises

        templates = f.root.irises.cols.templates
        masks = f.root.irises.cols.masks1

        N_irises = len(irises)
        index = np.ones((20 * 480), np.bool)

        print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
        D = np.empty((N_irises, N_irises))
        for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
            # print ii
            D[ii, jj] = ham_dist(
                t1[8, index],
                t2[:, index],
                m1[8, index],
                m2[:, index],
            )

And here is the error:

    In [10]: get_hd3()
    10669890 Comparisons
    ---------------------------------------------------------------------------
    MemoryError                               Traceback (most recent call last)
    <ipython-input-10-cfb255ce7bd1> in <module>()
    ----> 1 get_hd3()

        118         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
        119         D = np.empty((N_irises, N_irises))
    --> 120         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
        121             # print ii
        122             D[ii, jj] = ham_dist(

    c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
       3274         for start_row in xrange(0, len(self), nrowsinbuf):
       3275             end_row = min([start_row + nrowsinbuf, max_row])
    -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
       3277             for row in buf:
       3278                 yield row

    c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
       1772         (start, stop, step) = self._processRangeRead(start, stop, step)
       1773
    -> 1774         arr = self._read(start, stop, step, field)
       1775         return internal_to_flavor(arr, self.flavor)
       1776

    c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
       1719         if field:
       1720             # Create a container for the results
    -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
       1722         else:
       1723             # Recarray case

    MemoryError:
    > c:\python27\lib\site-packages\tables\table.py(1721)_read()
       1720             # Create a container for the results
    -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
       1722         else:

Also, if you guys see any performance problems in my code, please let me know.

Thank you so much for the help.

-Dave
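One detail worth knowing when reading this traceback: itertools.combinations does not stream its input. Its documented pure-Python equivalent begins with `pool = tuple(iterable)`, and CPython behaves the same way. So wrapping lazy column iterators in izip(...) and passing the result to combinations still pulls every row of both columns into memory before the first pair is produced. The sketch below shows this with a generator that records how many items have been consumed. `counting_rows` is a hypothetical stand-in for a column iterator, and the code is written for Python 3.

```python
# Sketch: itertools.combinations reads its whole input before yielding.
# counting_rows is a hypothetical stand-in for a lazy column iterator;
# it appends each row index to `consumed` at the moment the row is read.
from itertools import combinations


def counting_rows(n, consumed):
    """Yield n fake rows, recording each one as it is pulled."""
    for i in range(n):
        consumed.append(i)
        yield i


consumed = []
pairs = combinations(counting_rows(1000, consumed), 2)

first = next(pairs)   # ask for a single pair only
print(first)          # (0, 1)
print(len(consumed))  # 1000: every row was read to produce that one pair
```

Applied to the code above, this means the whole (template, mask, index) sequence ends up held in one tuple, whatever buffer size the column iterator uses.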

------------------------------

Date: Fri, 4 Jan 2013 08:56:28 -0500
From: David Reed <dav...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8

I can't thank you guys enough for the help. I was able to add the __iter__ function to the table.py file, and everything seems to be working great! I'm not quite as fast as I was when iterating right off a matrix, but pretty close: I was at 555 comparisons per second, and now I'm at 420.

I handled the problem I mentioned earlier by doing this, and it seems to work great:

    A = f.root.data.cols.A
    B = f.root.data.cols.B

    D = np.empty((len(A), len(A)))
    for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
        D[ii, jj] = compare(a1, a2, b1, b2)

Again, thanks a lot.

-Dave
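If holding every row at once becomes the bottleneck, one alternative to combinations() is to walk the upper triangle directly. Unlike combinations(), which keeps everything it has read, this approach uses one outer pass plus a fresh inner pass per outer row, so only the current pair of rows needs to be alive. This is a minimal sketch, not a PyTables API. It assumes a hypothetical `read_rows(start)` callable that returns an iterator over the rows from index `start` onward; for two columns, each "row" could be an (a, b) tuple.

```python
# Sketch: yield every (ii, row_ii, jj, row_jj) with ii < jj without
# materializing the dataset. read_rows(start) is a hypothetical callable
# returning an iterator over rows start, start + 1, ..., n - 1.
from itertools import combinations


def upper_triangle_pairs(read_rows):
    """Lazily produce all pairs ii < jj, re-reading later rows per outer row."""
    for ii, row_i in enumerate(read_rows(0)):
        # A fresh inner iterator that starts just after the outer row.
        for jj, row_j in enumerate(read_rows(ii + 1), start=ii + 1):
            yield ii, row_i, jj, row_j


# Usage: an in-memory list of (a, b) rows stands in for two table columns.
rows = [("a0", "b0"), ("a1", "b1"), ("a2", "b2"), ("a3", "b3")]


def read_rows(start):
    return iter(rows[start:])


lazy = [(ii, jj) for ii, _, jj, _ in upper_triangle_pairs(read_rows)]
eager = list(combinations(range(len(rows)), 2))
print(lazy == eager)  # True: same pairs, same order
```

The trade-off is I/O rather than memory: row jj is read once for every ii < jj. Whether that beats holding everything in RAM depends on the dataset and the disk.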

------------------------------

Date: Thu, 3 Jan 2013 17:26:55 -0600
From: Anthony Scopatz <sc...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3

On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:

> Thanks a lot for the help so far, guys!
>
> Looking at itertools, I found what I believe to be the perfect function for what I need: itertools.combinations. This appears to be a valid replacement for the method proposed.

Yes, combinations is awesome!
> There is a small problem that I didn't mention: my compare function actually takes as inputs 2 columns from the table, like so:
>
>     D = np.empty((N_elements, N_elements))
>     for ii in xrange(N_elements):
>         for jj in xrange(ii+1, N_elements):
>             D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>                                 data['element2'][ii], data['element2'][jj])
>
> Is there an efficient way of using itertools with this structure?

You can always make two other iterators for each column. Since you have two columns, you would have 4 iterators. I am not sure how fast this is going to be, but I am confident that there is definitely a way to do this in one for-loop, which is going to be way faster than nested loops.

Be Well
Anthony
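Anthony's point that the nested loop can become a single loop can be sketched with zip(), enumerate() and combinations(). Zipping the two columns gives one (element1, element2) item per row, enumerate keeps the row index, and combinations(..., 2) then yields exactly the pairs that the nested ii < jj loop visits, in the same order. The sketch uses plain lists in place of table columns and a hypothetical `compare` placeholder. Note that, unlike the nested loop, combinations keeps every item it has read in memory.

```python
# Sketch: the nested ii < jj loop over two columns rewritten as one loop.
# element1/element2 are plain lists standing in for table columns, and
# compare() is a hypothetical placeholder for the real comparison.
from itertools import combinations


def compare(a1, a2, b1, b2):
    return (a1, a2, b1, b2)


element1 = [10, 11, 12, 13]
element2 = [20, 21, 22, 23]
n = len(element1)

# Original structure: two nested loops.
nested = {}
for ii in range(n):
    for jj in range(ii + 1, n):
        nested[ii, jj] = compare(element1[ii], element1[jj],
                                 element2[ii], element2[jj])

# Single loop: enumerate keeps the row index, zip pairs the two columns.
single = {}
for (ii, (a1, b1)), (jj, (a2, b2)) in combinations(
        enumerate(zip(element1, element2)), 2):
    single[ii, jj] = compare(a1, a2, b1, b2)

print(nested == single)  # True
print(len(single))       # 6, i.e. n * (n - 1) // 2
```

For n = 4620 rows, this is 4620 * 4619 / 2 = 10,669,890 pairs, the same figure the script in this thread prints as "10669890 Comparisons".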

------------------------------

Date: Thu, 3 Jan 2013 10:29:33 -0800
From: Josh Ayers <jos...@gm...>
Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables

David,

The change in issue 27 was only for iteration over a tables.Column instance. To use it, tweak Anthony's code as follows. This will iterate over the "element" column, as in your original example.

Note also that this will only work with the development version of PyTables available on GitHub. It will be very slow using the released v2.4.0.

    from itertools import izip

    with tb.openFile(...) as f:
        data = f.root.data.cols.element
        data_i = iter(data)
        data_j = iter(data)
        data_i.next()  # throw the first value away
        for i, j in izip(data_i, data_j):
            compare(i, j)

Hope that helps,
Josh

On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:

Hi David,

Tables and table column iteration have been overhauled fairly recently [1]. So you might try creating two iterators, offset by one, and then doing the comparison. I am hacking this out super quick, so please forgive me:

    from itertools import izip

    with tb.openFile(...) as f:
        data = f.root.data
        data_i = iter(data)
        data_j = iter(data)
        data_i.next()  # throw the first value away
        for i, j in izip(data_i, data_j):
            compare(i, j)

You get the idea ;)

Be Well
Anthony

1. https://github.com/PyTables/PyTables/issues/27

On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:

I was hoping someone could help me out here.

This is from a post I put up on Stack Overflow:

I have a fairly large dataset that I store in HDF5 and access using PyTables. One operation I need to do on this dataset is pairwise comparisons between each of the elements. This requires 2 loops: one to iterate over each element, and an inner loop to iterate over every other element. This operation thus performs N(N-1)/2 comparisons.

For fairly small sets, I found it to be faster to dump the contents into a multidimensional NumPy array and then do my iteration. I run into problems with large sets because of memory issues and need to access each element of the dataset at run time.

Putting the elements into an array gives me about 600 comparisons per second, while operating on the HDF5 data itself gives me about 300 comparisons per second.

Is there a way to speed this process up?

Example follows (this is not my real code, just an example):

Small Set:

    with tb.openFile(h5_file, 'r') as f:
        data = f.root.data

        N_elements = len(data)
        elements = np.empty((N_elements, 100000))

        # copy each row's 'element' field into memory
        for ii, d in enumerate(data):
            elements[ii] = d['element']

    D = np.empty((N_elements, N_elements))
    for ii in xrange(N_elements):
        for jj in xrange(ii+1, N_elements):
            D[ii, jj] = compare(elements[ii], elements[jj])

Large Set:

    with tb.openFile(h5_file, 'r') as f:
        data = f.root.data

        N_elements = len(data)

        D = np.empty((N_elements, N_elements))
        for ii in xrange(N_elements):
            for jj in xrange(ii+1, N_elements):
                # read single cells of the 'element' column from disk
                D[ii, jj] = compare(data.cols.element[ii], data.cols.element[jj])
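It may help to spell out what the offset-by-one izip pattern in this thread produces. Advancing one iterator by a single row and zipping it with a second, fresh iterator yields (row k+1, row k) for consecutive rows only, N-1 pairs in total. The nested loops in the original question visit all N(N-1)/2 pairs. The sketch below uses Python 3's zip and next() (izip and .next() in Python 2) on a plain list standing in for a table or column.

```python
# Sketch: the offset-by-one pattern only pairs neighbouring rows.
# `data` is a plain list standing in for a table or column.
from itertools import combinations

data = ["r0", "r1", "r2", "r3", "r4"]

data_i = iter(data)
data_j = iter(data)
next(data_i)  # throw the first value away, as in the thread
offset_pairs = list(zip(data_i, data_j))

all_pairs = list(combinations(data, 2))

print(offset_pairs)  # [('r1', 'r0'), ('r2', 'r1'), ('r3', 'r2'), ('r4', 'r3')]
print(len(offset_pairs), len(all_pairs))  # 4 10
```

So the two-iterator version is a starting point for streaming. Covering every pair still needs either one inner pass per outer row or something like itertools.combinations.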

_______________________________________________
Pytables-users mailing list
Pyt...@li...
https://lists.sourceforge.net/lists/listinfo/pytables-users
> Message: 2
> Date: Thu, 3 Jan 2013 17:30:59 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
>
> Josh is right that you can just edit the code by hand (which works but
> sucks).
>
> However, on Windows -- on the rare occasion when I also have to develop on
> it -- I typically use a distribution that includes a compiler, cython,
> hdf5, and pytables already, and then I install my development version from
> github OVER this. I recommend either EPD or Anaconda, though other
> distributions listed here [1] might also work.
>
> Be well
> Anthony
>
> 1. http://numfocus.org/projects-2/software-distributions/
>
> On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
>
>> The change was in pure Python code, so you should be able to just paste in
>> the changes to your local copy. Start with the table.Column.__iter__
>> method (lines 3296-3310) here.
>>
>> https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
>>
>> It needs to be modified slightly because it uses some additional features
>> that aren't available in the released version (the out=buf_slice argument
>> to table.read). The following should work.
>>
>>     def __iter__(self):
>>         table = self.table
>>         itemsize = self.dtype.itemsize
>>         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>>         max_row = len(self)
>>         for start_row in xrange(0, len(self), nrowsinbuf):
>>             end_row = min([start_row + nrowsinbuf, max_row])
>>             buf = table.read(start_row, end_row, 1, field=self.pathname)
>>             for row in buf:
>>                 yield row
>>
>> I haven't tested this, but I think it will work.
>>
>> Josh
>>
>> On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
>>
>>> I apologize if I'm starting to sound helpless, but I'm forced to work on
>>> Windows 7 at work and have never had luck compiling Python source
>>> successfully. I have had to rely on precompiled binaries and now it's
>>> biting me in the butt.
>>>
>>> Is there any quick fix I can do to improve this iteration using v2.4.0?
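Josh's `__iter__` reads the column in slices of `IO_BUFFER_SIZE // itemsize` rows and yields them one at a time, so only one slice should need to be in memory at once. Below is a minimal pure-Python sketch of that pattern, not PyTables code: `iter_in_chunks` and `read(start, stop)` are hypothetical names, with `read` standing in for a call like `table.read(start_row, end_row, 1, field=...)`, and the 1 MB default is the figure Josh mentions in his Feb 1 message. One deliberate difference from the quoted code: the rows-per-read value is clamped to at least 1, because a row larger than the buffer would otherwise give `range` a step of zero.

```python
def iter_in_chunks(read, nrows, itemsize, buffer_size=1024 * 1024):
    """Yield rows one at a time while reading about buffer_size bytes per call.

    read(start, stop) is a hypothetical stand-in for a slice read such as
    table.read(start, stop, 1, field=...); it must return a sequence of rows.
    """
    # Rows per read. Clamp to 1 so that a row bigger than the buffer does not
    # produce a zero step, which range() rejects with ValueError.
    rows_per_read = max(1, buffer_size // itemsize)
    for start in range(0, nrows, rows_per_read):
        stop = min(start + rows_per_read, nrows)
        for row in read(start, stop):
            yield row


# Usage: a list plays the role of the on-disk column, and each read is logged.
data = list(range(20))
calls = []


def read(start, stop):
    calls.append((start, stop))
    return data[start:stop]


rows = list(iter_in_chunks(read, len(data), itemsize=100, buffer_size=700))
print(rows == data)  # True
print(calls)         # [(0, 7), (7, 14), (14, 20)]
```

Only the slice returned by the current `read` call is held at once, which is the property the thread expects from `Column.__iter__`.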
> Message: 1
> Date: Thu, 3 Jan 2013 13:44:29 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>
> Thanks Anthony, but unless I'm missing something I don't think that method
> will work, since it will only compare the ith element with the (i+1)th
> element. I still need 2 for loops, right?
>
> Using itertools might speed things up though. I've never used them, so I
> will give it a shot and let you know how it goes. Looks like I need to
> download the latest release before I do that too. Thanks for the help.
>
> -Dave
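David's point can be checked in a few lines. Two iterators offset by one, as in Anthony's suggestion, pair each element only with its successor, giving N - 1 pairs; the nested loops need every unordered pair, N(N-1)/2 of them, which is what `itertools.combinations(items, 2)` yields. This sketch is Python 3, so the built-in `zip` stands in for Python 2's `itertools.izip` and `next(it)` for `it.next()`; the five-letter list is made-up sample data.

```python
from itertools import combinations

items = ["a", "b", "c", "d", "e"]
n = len(items)

# Offset-by-one iterators: it_i runs one element ahead of it_j.
it_i = iter(items)
it_j = iter(items)
next(it_i)  # throw the first value away
adjacent = list(zip(it_i, it_j))

# Every unordered pair, as in the nested ii < jj loops.
all_pairs = list(combinations(items, 2))

print(adjacent)                          # [('b', 'a'), ('c', 'b'), ('d', 'c'), ('e', 'd')]
print(len(adjacent), n - 1)              # 4 4
print(len(all_pairs), n * (n - 1) // 2)  # 10 10
```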
> Message: 1
> Date: Thu, 3 Jan 2013 11:11:47 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>
> Hi David,
>
> Tables and table column iteration have been overhauled fairly recently [1].
> So you might try creating two iterators, offset by one, and then doing the
> comparison. I am hacking this out super quick so please forgive me:
>
>     from itertools import izip
>
>     with tb.openFile(...) as f:
>         data = f.root.data
>         data_i = iter(data)
>         data_j = iter(data)
>         data_i.next()  # throw the first value away
>         for i, j in izip(data_i, data_j):
>             compare(i, j)
>
> You get the idea ;)
>
> Be Well
> Anthony
>
> 1. https://github.com/PyTables/PyTables/issues/27
>
> On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>
>> I was hoping someone could help me out here.
>>
>> This is from a post I put up on StackOverflow:
>>
>> I have a fairly large dataset that I store in HDF5 and access using
>> PyTables. One operation I need to do on this dataset is pairwise
>> comparison between each of the elements. This requires 2 loops, one to
>> iterate over each element, and an inner loop to iterate over every other
>> element. This operation thus performs N(N-1)/2 comparisons.
>>
>> For fairly small sets I found it to be faster to dump the contents into a
>> multidimensional numpy array and then do my iteration. I run into problems
>> with large sets because of memory issues and need to access each element
>> of the dataset at run time.
>>
>> Putting the elements into an array gives me about 600 comparisons per
>> second, while operating on the hdf5 data itself gives me about 300
>> comparisons per second.
>>
>> Is there a way to speed this process up?
>>
>> Example follows (this is not my real code, just an example):
>>
>> *Small Set*:
>>
>>     import numpy as np
>>     import tables as tb
>>
>>     # h5_file and compare() are defined elsewhere in the real script.
>>     with tb.openFile(h5_file, 'r') as f:
>>         data = f.root.data
>>
>>         N_elements = len(data)
>>         elements = np.empty((N_elements, 100000))
>>
>>         for ii, d in enumerate(data):
>>             elements[ii] = d['element']
>>
>>         # Only the upper triangle (jj > ii) is filled in.
>>         D = np.zeros((N_elements, N_elements))
>>         for ii in xrange(N_elements):
>>             for jj in xrange(ii + 1, N_elements):
>>                 D[ii, jj] = compare(elements[ii], elements[jj])
>>
>> *Large Set*:
>>
>>     with tb.openFile(h5_file, 'r') as f:
>>         data = f.root.data
>>
>>         N_elements = len(data)
>>
>>         D = np.zeros((N_elements, N_elements))
>>         for ii in xrange(N_elements):
>>             for jj in xrange(ii + 1, N_elements):
>>                 # Each access reads a single row from the file.
>>                 D[ii, jj] = compare(data.cols.element[ii],
>>                                     data.cols.element[jj])
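Between the two extremes in the example above (load everything, or read two rows per comparison), a common middle ground, not proposed in the thread, is to process the pairs block by block so that roughly two blocks are in memory at a time. In this sketch `blocked_pairwise` and `read(start, stop)` are hypothetical: with PyTables, `read` might wrap something like `data.cols.element[start:stop]`, and `compare` stands in for David's function. Results go into a dict rather than the NumPy matrix `D` so the example needs only the standard library.

```python
def blocked_pairwise(read, nrows, block_rows, compare):
    """Return {(ii, jj): compare(row_ii, row_jj)} for every pair ii < jj.

    Rows are fetched with read(start, stop), a hypothetical slice reader;
    only block A and the current block B are needed at any one time.
    """
    results = {}
    for a_start in range(0, nrows, block_rows):
        a_stop = min(a_start + block_rows, nrows)
        block_a = read(a_start, a_stop)

        # Pairs with both rows inside block A.
        for ii in range(a_start, a_stop):
            for jj in range(ii + 1, a_stop):
                results[(ii, jj)] = compare(block_a[ii - a_start],
                                            block_a[jj - a_start])

        # Pairs between block A and each later block B.
        for b_start in range(a_stop, nrows, block_rows):
            b_stop = min(b_start + block_rows, nrows)
            block_b = read(b_start, b_stop)
            for ii in range(a_start, a_stop):
                row_a = block_a[ii - a_start]
                for jj in range(b_start, b_stop):
                    results[(ii, jj)] = compare(row_a, block_b[jj - b_start])
    return results


# Usage with made-up values and an absolute-difference compare().
values = [3, 1, 4, 1, 5, 9, 2]
result = blocked_pairwise(lambda s, e: values[s:e], len(values), 3,
                          lambda a, b: abs(a - b))
print(len(result))     # 21, i.e. 7 * 6 / 2
print(result[(0, 5)])  # 6, i.e. abs(3 - 9)
```

The trade-off is re-reading: with k blocks, each later block is read once per earlier block, about k**2 / 2 reads in total, so larger blocks mean fewer reads but more memory per step.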
> Message: 2
> Date: Thu, 3 Jan 2013 15:17:01 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>
> Thanks a lot for the help so far guys!
>
> Looking at itertools, I found what I believe to be the perfect function
> for what I need, itertools.combinations. This appears to be a valid
> replacement for the method proposed.
>
> There is a small problem that I didn't mention: my compare function
> actually takes as inputs 2 columns from the table, like so:
>
>     D = np.zeros((N_elements, N_elements))
>     for ii in xrange(N_elements):
>         for jj in xrange(ii + 1, N_elements):
>             D[ii, jj] = compare(data.cols.element1[ii], data.cols.element1[jj],
>                                 data.cols.element2[ii], data.cols.element2[jj])
>
> Is there an efficient way of using itertools with this structure?
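One way to use `itertools.combinations` with a two-column `compare` is to zip the columns together and enumerate them, so each pair carries both row indices and both fields. A sketch under these assumptions: `pairwise_two_columns` is a hypothetical helper, `col1` and `col2` are in-memory sequences (for example arrays read with something like `data.cols.element1[:]`), and `compare` keeps the argument order of the loop above. Note that `combinations` first copies its whole input into a tuple (the Python documentation describes it in terms of `pool = tuple(iterable)`), so passing it a column iterator still pulls every row into memory; it changes the loop syntax, not the memory footprint.

```python
from itertools import combinations


def pairwise_two_columns(col1, col2, compare):
    """Return an n x n list of lists with compare() results above the diagonal.

    compare is called as compare(col1[ii], col1[jj], col2[ii], col2[jj]) for
    every ii < jj, matching the argument order of the nested loops.
    Entries on and below the diagonal are left as 0.
    """
    n = len(col1)
    D = [[0] * n for _ in range(n)]
    # enumerate(zip(...)) yields (index, (value1, value2)); combinations keeps
    # input order, so the first item of every pair has the smaller index.
    rows = enumerate(zip(col1, col2))
    for (ii, (a1, a2)), (jj, (b1, b2)) in combinations(rows, 2):
        D[ii][jj] = compare(a1, b1, a2, b2)
    return D


# Usage with a compare() that just returns its arguments.
D = pairwise_two_columns([1, 2, 3], [10, 20, 30], lambda *args: args)
print(D[0][2])  # (1, 3, 10, 30)
print(D[2][0])  # 0
```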
> Message: 1
> Date: Thu, 3 Jan 2013 10:29:33 -0800
> From: Josh Ayers <jos...@gm...>
> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>
> David,
>
> The change in issue 27 was only for iteration over a tables.Column
> instance. To use it, tweak Anthony's code as follows. This will iterate
> over the "element" column, as in your original example.
>
> Note also that this will only work with the development version of
> PyTables available on github. It will be very slow using the released
> v2.4.0.
>
>     from itertools import izip
>
>     with tb.openFile(...) as f:
>         data = f.root.data.cols.element
>         data_i = iter(data)
>         data_j = iter(data)
>         data_i.next()  # throw the first value away
>         for i, j in izip(data_i, data_j):
>             compare(i, j)
>
> Hope that helps,
> Josh
[truncated message content] |
From: Anthony S. <sc...@gm...> - 2013-02-04 16:16:55
|
On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> wrote:

> Hi Josh,
>
> Here is my __iter__ code:
>
>     def __iter__(self):
>         table = self.table
>         itemsize = self.dtype.itemsize
>         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>         max_row = len(self)
>         for start_row in xrange(0, len(self), nrowsinbuf):
>             end_row = min([start_row + nrowsinbuf, max_row])
>             buf = table.read(start_row, end_row, 1, field=self.pathname)
>             for row in buf:
>                 yield row
>
> It does look different, I will try swapping in the code from github and
> see what happens.

Yes, please let us know how that goes! Otherwise send the list both the test
data generator script and the script that fails.

Be Well
Anthony

> Message: 1
> Date: Fri, 1 Feb 2013 14:08:47 -0800
> From: Josh Ayers <jos...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
>
> David,
>
> You added a custom version of table.Column.__iter__, correct? Could you
> also include that along with the script to reproduce the error?
>
> It seems like the problem may be in the 'nrowsinbuf' calculation - see
> [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're using the
> default 1MB value for IO_BUFFER_SIZE, it should be reading in chunks of 6
> rows. Instead, it's reading the entire table.
>
> [1]: https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296
>
> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:
>
>> On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
>>
>>> At the error:
>>>
>>>     result = numpy.empty(shape=nrows, dtype=dtypeField)
>>>
>>> nrows = 4620 and dtypeField is ('bool', (17, 9600))
>>>
>>> I'm not sure what that means as a dtype, but that's what it is.
>>>
>>> Forgive me if I'm being totally naive, but I thought the whole point of
>>> __iter__ with PyTables was to do iteration on the fly, so there is no
>>> preallocation.
>>
>> Nope, you are not being naive at all. That is the point.
>>
>>> If you have any ideas on this I'm all ears.
>>
>> If you could send a minimal script which reproduces this error, that would
>> help a lot.
>>
>> Be Well
>> Anthony
>>
>>> Thanks again.
>>>
>>> Dave
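Josh's numbers can be reproduced with plain arithmetic. NumPy stores a `bool` in one byte, so a `('bool', (17, 9600))` row is 163200 bytes, and a 1 MB buffer (taken here as 1024 * 1024 bytes) holds 6 such rows. The second half of the sketch is a hypothesis, not something the thread confirms: if `itemsize` came from the one-byte base type instead of the whole row, `nrowsinbuf` would exceed the table's 4620 rows and the first `table.read` would cover the entire table, which would match what Josh describes.

```python
# Figures from the thread; IO_BUFFER_SIZE is assumed to be 1 MB = 1024 * 1024 bytes.
IO_BUFFER_SIZE = 1024 * 1024
NROWS = 4620
ROW_SHAPE = (17, 9600)
BOOL_BYTES = 1  # NumPy stores each bool in one byte

# itemsize of one whole ('bool', (17, 9600)) row
row_itemsize = BOOL_BYTES * ROW_SHAPE[0] * ROW_SHAPE[1]
nrowsinbuf = IO_BUFFER_SIZE // row_itemsize
print(row_itemsize, nrowsinbuf)  # 163200 6

# Hypothetical: itemsize taken from the bool base type only.
nrowsinbuf_base = IO_BUFFER_SIZE // BOOL_BYTES
first_read_rows = min(nrowsinbuf_base, NROWS)
print(nrowsinbuf_base, first_read_rows)  # 1048576 4620
```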
> Message: 1
> Date: Fri, 1 Feb 2013 14:44:40 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>
> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
>
>> Hi Anthony,
>>
>> Thanks for the reply.
>>
>> I honestly don't know how to monitor my Python memory usage, but I'm sure
>> that it's caused by running out of memory.
>
> Well, I would just run top or process monitor or something while running
> the Python script to see what happens to memory usage as the script chugs
> along...
>
>> I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
>> and the column I'm iterating over is a 17x9600 boolean matrix. The
>> __iter__ method is preallocating an array that is this size, which appears
>> to be the root of the error. I was hoping there is a fix somewhere in here
>> to not have to do this preallocation.
>
> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
> these is ~760 MB. If you have 2 GB of memory and you are iterating over 2
> of these (templates & masks) it is conceivable that you are just running
> out of memory. Maybe there is a way that __iter__ could not preallocate
> something that is basically a temporary. What is the dtype of the
> templates array?
>
> Be Well
> Anthony
>
>> Thanks again.
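The preallocation can be sized exactly from the same figures: 4620 rows of 163200 bytes is 753,984,000 bytes, about 754 MB (719 MiB), slightly under the ~760 MB estimate, and two such columns (templates and masks) come to roughly 1.5 GB, which makes running out of memory on a 2 GB machine plausible. The arithmetic:

```python
NROWS = 4620
ROW_BYTES = 17 * 9600  # one ('bool', (17, 9600)) row, one byte per element

per_row_mib = ROW_BYTES / 2**20
one_column = NROWS * ROW_BYTES
two_columns = 2 * one_column  # templates and masks

print(round(per_row_mib, 3))        # 0.156
print(one_column)                   # 753984000
print(round(one_column / 1e6))      # 754 (MB)
print(round(one_column / 2**20))    # 719 (MiB)
print(round(two_columns / 1e9, 2))  # 1.51 (GB)
```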
> Message: 2
> Date: Mon, 4 Feb 2013 09:58:53 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6
>
> Hi Anthony,
>
> Sorry to just get back to you. I can send a script; should I send a script
> that creates some fake data as well?
>
> -Dave
>> > > >> > > Forgive me if I'm being totally naive, but I thought the whole point >> of >> > > __iter__ with pyttables was to do iteration on the fly, so there is no >> > > preallocation. >> > > >> > >> > Nope you are not being naive at all. That is the point. >> > >> > >> > > If you have any ideas on this I'm all ears. >> > > >> > >> > If you could send a minimal script which reproduces this error, that >> would >> > help a lot. >> > >> > Be Well >> > Anthony >> > >> > >> > > >> > > >> > > Thanks again. >> > > >> > > Dave >> > > >> > > >> > > On Fri, Feb 1, 2013 at 3:45 PM, < >> > > pyt...@li...> wrote: >> > > >> > >> Send Pytables-users mailing list submissions to >> > >> pyt...@li... >> > >> >> > >> To subscribe or unsubscribe via the World Wide Web, visit >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> or, via email, send a message with subject or body 'help' to >> > >> pyt...@li... >> > >> >> > >> You can reach the person managing the list at >> > >> pyt...@li... >> > >> >> > >> When replying, please edit your Subject line so it is more specific >> > >> than "Re: Contents of Pytables-users digest..." >> > >> >> > >> >> > >> Today's Topics: >> > >> >> > >> 1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz) >> > >> >> > >> >> > >> >> ---------------------------------------------------------------------- >> > >> >> > >> Message: 1 >> > >> Date: Fri, 1 Feb 2013 14:44:40 -0600 >> > >> From: Anthony Scopatz <sc...@gm...> >> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2 >> > >> To: Discussion list for PyTables >> > >> <pyt...@li...> >> > >> Message-ID: >> > >> < >> > >> CAP...@ma...> >> > >> Content-Type: text/plain; charset="iso-8859-1" >> > >> >> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> >> > >> wrote: >> > >> >> > >> > Hi Anthony, >> > >> > >> > >> > Thanks for the reply. 
>
> I honestly don't know how to monitor my Python memory usage, but I'm sure that it's caused by running out of memory.

Well, I would just run top or process monitor or something while running the Python script to see what happens to memory usage as the script chugs along...

> I'm just trying to find out how to fix it. My HDF5 table has 4620 rows and the column I'm iterating over is a 17x9600 boolean matrix. The __iter__ method is preallocating an array that is this size, which appears to be the root of the error. I was hoping there is a fix somewhere in here to avoid this preallocation.

So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of these is ~760 MB. If you have 2 GB of memory and you are iterating over 2 of these (templates & masks) it is conceivable that you are just running out of memory. Maybe there is a way that __iter__ could avoid preallocating something that is basically a temporary. What is the dtype of the templates array?

Be Well
Anthony

> Thanks again.
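To put numbers on Anthony's estimate, and on the `('bool', (17, 9600))` dtype David asked about: that is a NumPy subarray dtype, meaning each element of the field is a whole 17 x 9600 boolean matrix. The sketch below only uses figures from this thread (4620 rows, two such columns: templates and masks); it is back-of-the-envelope arithmetic, not a measurement of what PyTables actually allocates.

```python
import numpy as np

# The field dtype reported in the traceback: a NumPy "subarray" dtype.
# One element of this dtype is a complete 17 x 9600 bool matrix.
row_dtype = np.dtype(('bool', (17, 9600)))

row_nbytes = row_dtype.itemsize            # bytes per row of this column
nrows = 4620                               # row count reported in the thread
one_column_bytes = nrows * row_nbytes
two_columns_bytes = 2 * one_column_bytes   # templates + masks

print(row_dtype.base, row_dtype.shape)     # bool (17, 9600)
print(row_nbytes)                          # 163200
print(one_column_bytes)                    # 753984000  (~754 MB, ~719 MiB)
print(two_columns_bytes)                   # 1507968000 (~1.5 GB)

# NumPy expands a subarray dtype into trailing dimensions, so
# numpy.empty(shape=nrows, dtype=row_dtype) yields an (nrows, 17, 9600)
# bool array. Three rows keep this demo small.
demo = np.empty(shape=3, dtype=row_dtype)
print(demo.shape, demo.dtype)              # (3, 17, 9600) bool
```

Holding both columns in full would therefore take roughly 1.5 GB before anything else is counted, which fits Anthony's suspicion for a 2 GB machine.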
------------------------------

Message: 1
Date: Fri, 1 Feb 2013 10:11:47 -0600
From: Anthony Scopatz <sc...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9

Hi David,

Sorry, I haven't had a ton of time recently. You seem to be getting a memory error on creating a numpy array. This kind of thing typically happens when you are out of memory. Does this seem to be the case with you? When this dies, is your memory usage at 100%? If so, this algorithm might require a little tweaking...

Be Well
Anthony

On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:

I'm still having problems with this one. I can't tell if this is something dumb I'm doing with itertools, or if it's something in PyTables.

Would appreciate any help.

Thanks

On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:

I think I have to reopen this issue.
I have been running fine for a while using the combinations method from itertools, but I have recently run into a memory error since I quadrupled the size of the HDF5 file.

Here is my code again:

    import numpy as np
    import tables as tb
    from itertools import combinations, izip

    # h5_all (file path) and ham_dist (comparison function) are defined
    # elsewhere (not shown).
    with tb.openFile(h5_all, 'r') as f:
        irises = f.root.irises

        templates = f.root.irises.cols.templates
        masks = f.root.irises.cols.masks1

        N_irises = len(irises)
        index = np.ones((20 * 480), np.bool)

        print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
        D = np.empty((N_irises, N_irises))
        for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
            # print ii
            D[ii, jj] = ham_dist(
                t1[8, index],
                t2[:, index],
                m1[8, index],
                m2[:, index],
            )

And here is the error:

    In [10]: get_hd3()
    10669890 Comparisons
    ---------------------------------------------------------------------------
    MemoryError                               Traceback (most recent call last)
    <ipython-input-10-cfb255ce7bd1> in <module>()
    ----> 1 get_hd3()

        118     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
        119     D = np.empty((N_irises, N_irises))
    --> 120     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
        121         # print ii
        122         D[ii, jj] = ham_dist(

    c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
       3274         for start_row in xrange(0, len(self), nrowsinbuf):
       3275             end_row = min([start_row + nrowsinbuf, max_row])
    -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
       3277             for row in buf:
       3278                 yield row

    c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
       1772         (start, stop, step) = self._processRangeRead(start, stop, step)
       1773
    -> 1774         arr = self._read(start, stop, step, field)
       1775         return internal_to_flavor(arr, self.flavor)
       1776

    c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
       1719         if field:
       1720             # Create a container for the results
    -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
       1722         else:
       1723             # Recarray case

    MemoryError:
    > c:\python27\lib\site-packages\tables\table.py(1721)_read()
       1720             # Create a container for the results
    -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
       1722         else:

Also, if you guys see any performance problems in my code, please let me know.

Thank you so much for the help.

-Dave
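One more place where whole columns can end up in memory, worth checking alongside the `__iter__` buffer size: the Python documentation describes `itertools.combinations` as first building a tuple from its entire input (`pool = tuple(iterable)`) before yielding pairs. Assuming the standard `itertools`, `combinations(izip(templates, masks, range(N_irises)), 2)` would then hold every template and mask row at once, roughly 4620 x 163,200 bytes x 2, about 1.5 GB for the sizes in this thread. The toy generator below (a hypothetical stand-in for a column iterator, Python 3 syntax) just records how many rows have been pulled.

```python
from itertools import combinations


def counting_rows(n, pulled):
    """Stand-in for a column iterator: records every row it hands out."""
    for i in range(n):
        pulled.append(i)
        yield i


pulled = []
pairs = combinations(counting_rows(5, pulled), 2)

first = next(pairs)
print(first)        # (0, 1)
# Producing only the first pair has already drained the whole input,
# because combinations() keeps its own copy of every item.
print(len(pulled))  # 5
```

The copy is inherent to what `combinations()` does: it must revisit earlier items for every later one, and a one-pass iterator cannot supply them again.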
------------------------------

Message: 1
Date: Fri, 4 Jan 2013 08:56:28 -0500
From: David Reed <dav...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8

I can't thank you guys enough for the help. I was able to add the __iter__ function to the table.py file and everything seems to be working great! I'm not quite as fast as I was when iterating right off a matrix, but pretty close: I was at 555 comparisons per second, and now I'm at 420.

I handled the problem I mentioned earlier by doing this, and it seems to work great:

    A = f.root.data.cols.A
    B = f.root.data.cols.B

    D = np.empty((len(A), len(A)))
    for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
        D[ii, jj] = compare(a1, a2, b1, b2)

Again, thanks a lot.

-Dave
------------------------------

Message: 1
Date: Thu, 3 Jan 2013 17:26:55 -0600
From: Anthony Scopatz <sc...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3

On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:

> Thanks a lot for the help so far, guys!
>
> Looking at itertools, I found what I believe to be the perfect function for what I need, itertools.combinations. This appears to be a valid replacement for the method proposed.

Yes, combinations is awesome!

> There is a small problem that I didn't mention: my compare function actually takes as inputs 2 columns from the table. Like so:
>
>     D = np.empty((N_elements, N_elements))
>     for ii in xrange(N_elements):
>         for jj in xrange(ii+1, N_elements):
>             D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>                                 data['element2'][ii], data['element2'][jj])
>
> Is there an efficient way of using itertools with this structure?
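A minimal sketch of one way to approach this question, assuming both columns can be indexed by row number: apply `combinations()` to row indices rather than to rows, so the pool it keeps is just N integers, and fetch both columns' values inside the loop. `pairwise_matrix` and `toy_compare` are hypothetical names for illustration; `toy_compare` stands in for David's `compare(a1, a2, b1, b2)`, and NumPy arrays stand in for the PyTables columns (Python 3 syntax).

```python
from itertools import combinations

import numpy as np


def pairwise_matrix(col_a, col_b, compare):
    """Upper-triangle matrix of compare(a_i, a_j, b_i, b_j) over all pairs.

    combinations() only sees row indices, so its internal pool is N ints
    rather than N large rows; the rows themselves are fetched per pair.
    """
    n = len(col_a)
    D = np.zeros((n, n))  # zeros, so the unused lower triangle is defined
    for ii, jj in combinations(range(n), 2):
        D[ii, jj] = compare(col_a[ii], col_a[jj], col_b[ii], col_b[jj])
    return D


def toy_compare(a1, a2, b1, b2):
    """Hypothetical stand-in for compare(): count differing positions
    where both masks are set."""
    return np.count_nonzero((a1 != a2) & b1 & b2)
```

The trade-off is I/O: with on-disk columns each `col[ii]` is likely a separate read, four per pair here, so across ~10.7 million pairs this saves memory but may well be slower than reading rows in larger blocks.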

You can always make two other iterators for each column. Since you have two columns you would have 4 iterators. I am not sure how fast this is going to be, but I am confident that there is definitely a way to do this in one for-loop, which is going to be way faster than nested loops.

Be Well
Anthony
------------------------------

Message: 1
Date: Thu, 3 Jan 2013 10:29:33 -0800
From: Josh Ayers <jos...@gm...>
Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables

David,

The change in issue 27 was only for iteration over a tables.Column instance. To use it, tweak Anthony's code as follows. This will iterate over the "element" column, as in your original example.

Note also that this will only work with the development version of PyTables available on GitHub. It will be very slow using the released v2.4.0.

    from itertools import izip

    with tb.openFile(...) as f:
        data = f.root.data.cols.element
        data_i = iter(data)
        data_j = iter(data)
        data_i.next()  # throw the first value away
        for i, j in izip(data_i, data_j):
            compare(i, j)

Hope that helps,
Josh

On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:

Hi David,

Tables and table column iteration have been overhauled fairly recently [1]. So you might try creating two iterators, offset by one, and then doing the comparison. I am hacking this out super quick so please forgive me:

    from itertools import izip

    with tb.openFile(...) as f:
        data = f.root.data
        data_i = iter(data)
        data_j = iter(data)
        data_i.next()  # throw the first value away
        for i, j in izip(data_i, data_j):
            compare(i, j)

You get the idea ;)

Be Well
Anthony

1. https://github.com/PyTables/PyTables/issues/27

On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:

I was hoping someone could help me out here.

This is from a post I put up on StackOverflow:

I have a fairly large dataset that I store in HDF5 and access using PyTables. One operation I need to do on this dataset is pairwise comparisons between each of the elements. This requires 2 loops, one to iterate over each element, and an inner loop to iterate over every other element. This operation thus looks at N(N-1)/2 comparisons.

For fairly small sets I found it to be faster to dump the contents into a multidimensional numpy array and then do my iteration. I run into problems with large sets because of memory issues and need to access each element of the dataset at run time.

Putting the elements into an array gives me about 600 comparisons per second, while operating on HDF5 data itself gives me about 300 comparisons per second.

Is there a way to speed this process up?

Example follows (this is not my real code, just an example):

*Small Set*:

    with tb.openFile(h5_file, 'r') as f:
        data = f.root.data

        N_elements = len(data)
        elements = np.empty((N_elements, 100000))

        for ii, d in enumerate(data):
            elements[ii] = d['element']

        D = np.empty((N_elements, N_elements))
        for ii in xrange(N_elements):
            for jj in xrange(ii+1, N_elements):
                D[ii, jj] = compare(elements[ii], elements[jj])

*Large Set*:

    with tb.openFile(h5_file, 'r') as f:
        data = f.root.data

        N_elements = len(data)

        D = np.empty((N_elements, N_elements))
        for ii in xrange(N_elements):
            for jj in xrange(ii+1, N_elements):
                D[ii, jj] = compare(data.cols.element[ii], data.cols.element[jj])
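On the speed-up question, one common pattern for on-disk pairwise comparisons (not something proposed in this thread) is to read the data in blocks and compare block against block. Memory then stays bounded at two blocks, and each block costs one slice read instead of one read per element. A minimal sketch, assuming `column` supports `len()` and slicing that returns NumPy arrays; a NumPy array stands in for the PyTables column, and `blocked_pairwise` is a hypothetical name (Python 3 syntax).

```python
import numpy as np


def blocked_pairwise(column, compare, block_rows):
    """Fill the upper triangle of an N x N matrix with compare(row_i, row_j).

    The column is read one block (block_rows rows) at a time, so at most two
    blocks are in memory and each block costs a single slice read.
    """
    n = len(column)
    D = np.zeros((n, n))  # zeros, so the unused lower triangle is defined
    for i0 in range(0, n, block_rows):
        block_i = column[i0:min(i0 + block_rows, n)]
        for j0 in range(i0, n, block_rows):
            # On the diagonal, reuse the block that is already loaded.
            if j0 == i0:
                block_j = block_i
            else:
                block_j = column[j0:min(j0 + block_rows, n)]
            for di, row_i in enumerate(block_i):
                for dj, row_j in enumerate(block_j):
                    ii, jj = i0 + di, j0 + dj
                    if jj > ii:  # each unordered pair exactly once
                        D[ii, jj] = compare(row_i, row_j)
    return D
```

`compare` is still called once per pair in Python, so the gain is mainly in I/O and memory. With the 163,200-byte rows discussed in this thread, a block of a few hundred rows is tens of MB.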
------------------------------

Message: 2
Date: Thu, 3 Jan 2013 17:30:59 -0600
From: Anthony Scopatz <sc...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4

Josh is right that you can just edit the code by hand (which works but sucks).

However, on Windows (on the rare occasion when I also have to develop on it), I typically use a distribution that includes a compiler, Cython, HDF5, and PyTables already, and then I install my development version from GitHub OVER this. I recommend either EPD or Anaconda, though other distributions listed here [1] might also work.

Be well
Anthony

1. http://numfocus.org/projects-2/software-distributions/

On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:

The change was in pure Python code, so you should be able to just paste in the changes to your local copy. Start with the table.Column.__iter__ method (lines 3296-3310) here:

https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py

It needs to be modified slightly because it uses some additional features that aren't available in the released version (the out=buf_slice argument to table.read). The following should work.

    def __iter__(self):
        table = self.table
        itemsize = self.dtype.itemsize
        nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
        max_row = len(self)
        for start_row in xrange(0, len(self), nrowsinbuf):
            end_row = min([start_row + nrowsinbuf, max_row])
            buf = table.read(start_row, end_row, 1, field=self.pathname)
            for row in buf:
                yield row

I haven't tested this, but I think it will work.

Josh

On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:

I apologize if I'm starting to sound helpless, but I'm forced to work on Windows 7 at work and have never had luck compiling Python source successfully. I have had to rely on precompiled binaries and now it's biting me in the butt.

Is there any quick fix I can do to improve this iteration using v2.4.0?
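On the v2.4.0 question, a user-level alternative to patching table.py is to read the column in slices yourself. The sketch below does that and sizes each slice from the full per-row byte count (element itemsize times the number of elements per row). The 1 MiB buffer is an assumption matching the default IO_BUFFER_SIZE mentioned elsewhere in this thread. As arithmetic only, it also shows that if the per-row size were taken as a single bool element (1 byte), one "buffer" would cover 1,048,576 rows, more than the whole 4620-row table. That is one possible way a single read could end up allocating every row, as in the traceback, but it is not confirmed here. `rows_per_buffer` and `iter_rows_chunked` are hypothetical helper names; NumPy arrays stand in for a PyTables Column, which is assumed to support `len()` and slicing the same way (Python 3 syntax).

```python
import numpy as np

# Assumed read-buffer size: 1 MiB, the default IO_BUFFER_SIZE discussed in the thread.
IO_BUFFER_SIZE = 1024 * 1024


def rows_per_buffer(base_dtype, row_shape, buffer_size=IO_BUFFER_SIZE):
    """Number of rows that fit in buffer_size, counting every element of a row."""
    row_nbytes = np.dtype(base_dtype).itemsize * int(np.prod(row_shape))
    return max(1, buffer_size // row_nbytes)


def iter_rows_chunked(column, nrows_per_chunk):
    """Yield rows one by one while reading `column` one slice at a time.

    `column` only needs len() and slicing; at most nrows_per_chunk rows
    are held in memory at once.
    """
    n = len(column)
    for start in range(0, n, nrows_per_chunk):
        block = column[start:min(start + nrows_per_chunk, n)]
        for row in block:
            yield row


print(rows_per_buffer(np.bool_, (17, 9600)))  # 6
print(rows_per_buffer(np.bool_, ()))          # 1048576
```

Something like `iter_rows_chunked(templates, rows_per_buffer(np.bool_, (17, 9600)))` could then stand in for `iter(templates)`; its speed relative to the patched `__iter__` is untested.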
------------------------------

Message: 1
Date: Thu, 3 Jan 2013 13:44:29 -0500
From: David Reed <dav...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2

Thanks Anthony, but unless I'm missing something I don't think that method will work, since it will only be comparing the ith element with the (i+1)th element. I still need 2 for loops, right?

Using itertools might speed things up though. I've never used them, so I will give it a shot and let you know how it goes. Looks like I need to download the latest release before I do that too. Thanks for the help.

-Dave
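David's reading is right, and it is easy to check with plain lists (Python 3, where `zip` plays the role of the thread's `izip`). Two iterators offset by one give only the N - 1 neighbouring pairs, while `itertools.combinations(..., 2)` gives all N(N - 1)/2 unordered pairs.

```python
from itertools import combinations

rows = ['r0', 'r1', 'r2', 'r3']

# Offset-by-one pattern from the thread: each row meets only its neighbour.
it_i = iter(rows)
it_j = iter(rows)
next(it_i)  # skip the first value, like data_i.next() in the thread
adjacent = list(zip(it_i, it_j))

# Every unordered pair: N * (N - 1) / 2 of them.
all_pairs = list(combinations(rows, 2))

print(adjacent)        # [('r1', 'r0'), ('r2', 'r1'), ('r3', 'r2')]
print(len(all_pairs))  # 6
```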
>> > >> >> >>> > >>> > >> > >> >> >>> > >>> > To subscribe or unsubscribe via the World Wide Web, >> visit >> > >> >> >>> > >>> > >> > >> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> >> >>> > >>> > or, via email, send a message with subject or body >> 'help' >> > >> to >> > >> >> >>> > >>> > pyt...@li... >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > You can reach the person managing the list at >> > >> >> >>> > >>> > pyt...@li... >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > When replying, please edit your Subject line so it is >> > more >> > >> >> >>> specific >> > >> >> >>> > >>> > than "Re: Contents of Pytables-users digest..." >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > Today's Topics: >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > 1. Re: Nested Iteration of HDF5 using PyTables >> > (Anthony >> > >> >> >>> Scopatz) >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > >> > >> >> >>> > >> > >> >> >> > ---------------------------------------------------------------------- >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > Message: 1 >> > >> >> >>> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 >> > >> >> >>> > >>> > From: Anthony Scopatz <sc...@gm...> >> > >> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 >> > >> using >> > >> >> >>> PyTables >> > >> >> >>> > >>> > To: Discussion list for PyTables >> > >> >> >>> > >>> > <pyt...@li...> >> > >> >> >>> > >>> > Message-ID: >> > >> >> >>> > >>> > <CAPk-6T5b= >> > >> >> >>> > >>> > >> 1EG...@ma... >> > > >> > >> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1" >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > HI David, >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > Tables and table column iteration have been overhauled >> > >> fairly >> > >> >> >>> > recently >> > >> >> >>> > >>> [1]. 
So you might try creating two iterators, offset by one, and then doing the
comparison. I am hacking this out super quick, so please forgive me:

    from itertools import izip

    import tables as tb

    with tb.openFile(...) as f:
        data = f.root.data
        data_i = iter(data)
        data_j = iter(data)
        data_i.next()  # throw the first value away
        for i, j in izip(data_i, data_j):
            compare(i, j)  # compare() stands for your own comparison function

You get the idea ;)

Be Well
Anthony

1. https://github.com/PyTables/PyTables/issues/27

On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:

> I was hoping someone could help me out here.
>
> This is from a post I put up on StackOverflow:
>
> I have a fairly large dataset that I store in HDF5 and access using
> PyTables. One operation I need to perform on this dataset is a pairwise
> comparison between every pair of elements. This requires two loops: an
> outer loop over each element and an inner loop over the elements that
> follow it. The operation therefore performs N(N-1)/2 comparisons.
>
> For fairly small sets, I found it faster to dump the contents into a
> multidimensional numpy array and then do my iteration. With large sets I
> run into memory problems and need to access each element of the dataset
> at run time.
>
> Putting the elements into an array gives me about 600 comparisons per
> second, while operating on the HDF5 data itself gives me about 300
> comparisons per second.
>
> Is there a way to speed this process up?
>
> Example follows (this is not my real code, just an example):
>
> *Small Set*:
>
>     import numpy as np
>     import tables as tb
>
>     with tb.openFile(h5_file, 'r') as f:
>         data = f.root.data
>
>         N_elements = len(data)
>         elements = np.empty((N_elements, int(1e5)))
>
>         # copy each row's 'element' field into memory
>         for ii, d in enumerate(data):
>             elements[ii] = d['element']
>
>         D = np.empty((N_elements, N_elements))
>         for ii in xrange(N_elements):
>             for jj in xrange(ii+1, N_elements):
>                 D[ii, jj] = compare(elements[ii], elements[jj])
>
> *Large Set*:
>
>     with tb.openFile(h5_file, 'r') as f:
>         data = f.root.data
>
>         N_elements = len(data)
>
>         D = np.empty((N_elements, N_elements))
>         for ii in xrange(N_elements):
>             for jj in xrange(ii+1, N_elements):
>                 # read single values of the 'element' column from the file
>                 D[ii, jj] = compare(data.cols.element[ii],
>                                     data.cols.element[jj])
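David's large-set loop goes back to the file on every comparison, while Anthony's offset iterators only pair neighbouring rows. One common middle ground is to work in blocks: load one block of rows, compare it against itself and against every later block, so only two blocks are in memory at a time and each unordered pair (i, j) with i < j is still visited exactly once, N(N-1)/2 comparisons in total. This is a minimal sketch under stated assumptions, in Python 3 syntax: `blocked_pairwise` and `read_block` are hypothetical names, a plain list stands in for the HDF5 column, and with PyTables `read_block(start, stop)` could presumably wrap `table.read(start, stop, field='element')` as in the `__iter__` code quoted earlier in this thread.

```python
def blocked_pairwise(n_rows, read_block, compare, block_size):
    """Yield (i, j, compare(x_i, x_j)) for every pair of rows with i < j.

    read_block(start, stop) is a hypothetical callable returning rows
    start..stop-1 as a sequence; at most two blocks are held at a time.
    """
    for a_start in range(0, n_rows, block_size):
        a_stop = min(a_start + block_size, n_rows)
        block_a = read_block(a_start, a_stop)
        # Pair block A with itself and with every block that comes after it.
        for b_start in range(a_start, n_rows, block_size):
            b_stop = min(b_start + block_size, n_rows)
            # On the diagonal, reuse block A instead of reading it again.
            block_b = block_a if b_start == a_start else read_block(b_start, b_stop)
            for ii, x in enumerate(block_a):
                i = a_start + ii
                for jj, y in enumerate(block_b):
                    j = b_start + jj
                    if j > i:  # keep only the upper triangle: each pair once
                        yield i, j, compare(x, y)


# Usage: an in-memory list stands in for an HDF5 column (assumption).
rows = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
reads = []


def read_block(start, stop):
    reads.append((start, stop))  # record each block read
    return rows[start:stop]


D = {}
for i, j, value in blocked_pairwise(len(rows), read_block,
                                    lambda x, y: abs(x - y), block_size=4):
    D[i, j] = value
```

The number of `compare` calls is unchanged; what changes is that rows are fetched in a few large contiguous reads (six here) rather than one value at a time. How much this helps on the real data depends on the block size and on the cost of `compare`, and has not been measured.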
End of Pytables-users Digest, Vol 80, Issue 2
*********************************************

------------------------------

Message: 2
Date: Thu, 3 Jan 2013 15:17:01 -0500
From: David Reed <dav...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
To: pyt...@li...
Thanks a lot for the help so far, guys!

Looking at itertools, I found what I believe to be the perfect function for
what I need: itertools.combinations. It appears to be a valid replacement
for the method proposed.

One small problem I didn't mention is that my compare function actually
takes values from two columns of the table as inputs, like so:

    D = np.empty((N_elements, N_elements))
    for ii in xrange(N_elements):
        for jj in xrange(ii+1, N_elements):
            D[ii, jj] = compare(data.cols.element1[ii], data.cols.element1[jj],
                                data.cols.element2[ii], data.cols.element2[jj])

Is there an efficient way of using itertools with this structure?

On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> ...

[Message clipped]

_______________________________________________
Pytables-users mailing list
Pyt...@li...
https://lists.sourceforge.net/lists/listinfo/pytables-users
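For David's last question, one way to use itertools.combinations when compare takes values from two columns is to zip the columns row by row and enumerate the result, so each combination carries both row indices and both column values. This is a minimal sketch under assumptions: `pairwise_two_columns` is a hypothetical helper, plain lists stand in for `data.cols.element1` and `data.cols.element2`, and a list of lists stands in for `np.empty((N, N))`.

```python
from itertools import combinations


def pairwise_two_columns(col1, col2, compare):
    """Return an N x N matrix whose upper triangle holds
    compare(col1[i], col1[j], col2[i], col2[j]) for every i < j."""
    n = len(col1)
    D = [[None] * n for _ in range(n)]
    # enumerate(zip(...)) yields (row_index, (element1, element2)) per row.
    rows = enumerate(zip(col1, col2))
    # combinations(..., 2) yields each unordered pair of rows once, in index order.
    for (ii, (e1_i, e2_i)), (jj, (e1_j, e2_j)) in combinations(rows, 2):
        D[ii][jj] = compare(e1_i, e1_j, e2_i, e2_j)
    return D


element1 = [1, 2, 3]
element2 = [10, 20, 30]
D = pairwise_two_columns(element1, element2, lambda a, b, c, d: (a, b, c, d))
```

Note that itertools.combinations first copies its whole input into a tuple, so this keeps every row of both columns in memory. It tidies up the double loop but does not by itself solve the large-set memory problem; that still needs reading the columns in pieces.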