From: David R. <dav...@gm...> - 2013-02-27 20:25:47
|
Thanks for getting back to me, Anthony.

I originally posted to this list looking for an efficient way of doing pairwise comparisons on an HDF5 table that had only about 700 elements. You guys guided me toward itertools and also pointed me to a recently pushed bug fix that made iteration more efficient. This worked great and really sped up my comparisons, and I was flying high for quite a while.

Things started breaking, though, when I upped the number of elements to about 5000. I gave some code that created some sim data, and you were getting the same error on your machine. I put this code up as a Gist here:

https://gist.github.com/dvreed77/fa3060b18257008df383

Again, if you can think of anything, I'll try to do the leg work as best I can.

On Wed, Feb 27, 2013 at 3:06 PM, <pyt...@li...> wrote:

> Send Pytables-users mailing list submissions to
>     pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>     pyt...@li...
>
> You can reach the person managing the list at
>     pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 81, Issue 11 (Anthony Scopatz)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 27 Feb 2013 14:05:38 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 11
> To: Discussion list for PyTables <pyt...@li...>
> Message-ID: <CAP...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi David,
>
> Sorry about the delay. I have mostly forgotten what exactly this issue
> was. I am pretty swamped this week so I could throw out some WAGs but I
> don't think I'll be able to do any real work myself on it.
>
> Be Well
> Anthony
>
>
> On Mon, Feb 25, 2013 at 2:15 PM, David Reed <dav...@gm...> wrote:
>
> > Anthony,
> >
> > I've had a chance recently to revisit this problem and am not getting
> > anywhere. I was hoping I might be able to get more support in getting this
> > working. If you have some ideas, throw them out and I can do the leg
> > work and see what I can come up with.
> >
> > -David
> >
> >
> > On Mon, Feb 4, 2013 at 3:44 PM, <pyt...@li...> wrote:
> >
> >> Today's Topics:
> >>
> >>    1. Re: Pytables-users Digest, Vol 81, Issue 9 (Anthony Scopatz)
> >>
> >> ----------------------------------------------------------------------
> >>
> >> Message: 1
> >> Date: Mon, 4 Feb 2013 14:43:37 -0600
> >> From: Anthony Scopatz <sc...@gm...>
> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 9
> >>
> >> Hey David,
> >>
> >> I am getting the following error now:
> >>
> >> scopatz@ares ~ $ python t.py
> >> 10669890 Comparisons
> >> Traceback (most recent call last):
> >>   File "t.py", line 61, in <module>
> >>     get_hd()
> >>   File "t.py", line 54, in get_hd
> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 3308, in __iter__
> >>     out=buf_slice)
> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1807, in read
> >>     arr = self._read(start, stop, step, field, out)
> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1732, in _read
> >>     bytes_required))
> >> ValueError: output array size invalid, got 4620 bytes, need 753984000 bytes
> >>
> >> And I had to change the phasors line to the following:
> >>
> >>     r['phasors'] = np.empty((17, 20*240), complex)
> >>
> >> Thanks.
> >> Be Well
> >> Anthony
> >>
> >>
> >> On Mon, Feb 4, 2013 at 1:41 PM, David Reed <dav...@gm...> wrote:
> >>
> >> > I didn't have any luck. I replaced that __iter__ function, which led to me
> >> > replacing the read function, which led to me replacing the _read function,
> >> > and I eventually got another error.
> >> >
> >> > Below are 2 functions and my HDF5 Table class declaration. They should be
> >> > self-explanatory. I wasn't sure if attachments would go through and this
> >> > is pretty small, so I figured it would be ok just to post. I apologize if
> >> > this is a bit cluttered. I would also appreciate any comments on how I
> >> > assign the results to the matrix D; this does not seem very pythonic at all
> >> > and I could use some advice there if it's easy. (The ii*jj is just a
> >> > placeholder for a more sophisticated measure.) Thanks again!
> >> >
> >> > import numpy as np
> >> > import tables as tb
> >> >
> >> > class Iris(tb.IsDescription):
> >> >     subject_id = tb.IntCol()
> >> >     iris_id = tb.IntCol()
> >> >     database = tb.StringCol(5)
> >> >     is_left = tb.BoolCol()
> >> >     is_flipped = tb.BoolCol()
> >> >     templates = tb.BoolCol(shape=(17, 20*480))
> >> >     masks1 = tb.BoolCol(shape=(17, 20*480))
> >> >     phasors = tb.ComplexCol(itemsize=8, shape=(17, 20*240))
> >> >     masks2 = tb.BoolCol(shape=(17, 20*240))
> >> >
> >> >
> >> > def create_hdf5():
> >> >     """
> >> >     """
> >> >     with tb.openFile('test.h5', 'w') as f:
> >> >
> >> >         # Create and fill the table of irises
> >> >         irises = f.createTable(f.root, 'irises', Iris, 'Irises',
> >> >                                filters=tb.Filters(1))
> >> >         for ii in range(4620):
> >> >
> >> >             r = irises.row
> >> >             r['subject_id'] = ii
> >> >             r['iris_id'] = 0
> >> >             r['database'] = 'test'
> >> >             r['is_left'] = True
> >> >             r['is_flipped'] = False
> >> >             r['templates'] = np.empty((17, 20*480), np.bool8)
> >> >             r['masks1'] = np.empty((17, 20*480), np.bool8)
> >> >             r['phasors'] = np.empty((17, 20*240)) + 1j*np.empty((17, 20*240))
> >> >             r['masks2'] = np.empty((17, 20*240), np.bool8)
> >> >             r.append()
> >> >
> >> >         irises.flush()
> >> >
> >> > def get_hd():
> >> >     """
> >> >     """
> >> >     from itertools import combinations, izip
> >> >     with tb.openFile('test.h5') as f:
> >> >         irises = f.root.irises
> >> >
> >> >         templates = f.root.irises.cols.templates
> >> >         masks = f.root.irises.cols.masks1
> >> >
> >> >         N_irises = len(irises)
> >> >
> >> >         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >> >         D = np.empty((N_irises, N_irises))
> >> >         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks,
> >> >                 range(N_irises)), 2):
> >> >             D[ii, jj] = ii*jj
> >> >
> >> >     np.save('test', D)
> >> >
> >> >
> >> > On Mon, Feb 4, 2013 at 11:16 AM, <pyt...@li...> wrote:
> >> >
> >> >> Today's Topics:
> >> >>
> >> >>    1. Re: Pytables-users Digest, Vol 81, Issue 7 (Anthony Scopatz)
> >> >>
> >> >> ----------------------------------------------------------------------
> >> >>
> >> >> Message: 1
> >> >> Date: Mon, 4 Feb 2013 10:16:24 -0600
> >> >> From: Anthony Scopatz <sc...@gm...>
> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 7
> >> >>
> >> >> On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> wrote:
> >> >>
> >> >> > Hi Josh,
> >> >> >
> >> >> > Here is my __iter__ code:
> >> >> >
> >> >> >     def __iter__(self):
> >> >> >         table = self.table
> >> >> >         itemsize = self.dtype.itemsize
> >> >> >         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> >> >> >         max_row = len(self)
> >> >> >         for start_row in xrange(0, len(self), nrowsinbuf):
> >> >> >             end_row = min([start_row + nrowsinbuf, max_row])
> >> >> >             buf = table.read(start_row, end_row, 1, field=self.pathname)
> >> >> >             for row in buf:
> >> >> >                 yield row
> >> >> >
> >> >> > It does look different. I will try swapping in the code from GitHub and
> >> >> > see what happens.
> >> >>
> >> >> Yes, please let us know how that goes! Otherwise send the list both the
> >> >> test data generator script and the script that fails.
> >> >>
> >> >> Be Well
> >> >> Anthony
> >> >>
> >> >>
> >> >> > On Mon, Feb 4, 2013 at 9:59 AM, <pyt...@li...> wrote:
> >> >> >
> >> >> >> Today's Topics:
> >> >> >>
> >> >> >>    1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers)
> >> >> >>    2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed)
> >> >> >>
> >> >> >> ----------------------------------------------------------------------
> >> >> >>
> >> >> >> Message: 1
> >> >> >> Date: Fri, 1 Feb 2013 14:08:47 -0800
> >> >> >> From: Josh Ayers <jos...@gm...>
> >> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
> >> >> >>
> >> >> >> David,
> >> >> >>
> >> >> >> You added a custom version of table.Column.__iter__, correct?
> >> >> >> Could you also include that along with the script to reproduce the error?
> >> >> >>
> >> >> >> It seems like the problem may be in the 'nrowsinbuf' calculation - see
> >> >> >> [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're using the
> >> >> >> default 1MB value for IO_BUFFER_SIZE, it should be reading in chunks of
> >> >> >> 6 rows. Instead, it's reading the entire table.
> >> >> >>
> >> >> >> [1]: https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296
> >> >> >>
> >> >> >>
> >> >> >> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:
> >> >> >>
> >> >> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
> >> >> >> >
> >> >> >> >> at the error:
> >> >> >> >>
> >> >> >> >> result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> >> >> >>
> >> >> >> >> nrows = 4620 and dtypeField is ('bool', (17, 9600))
> >> >> >> >>
> >> >> >> >> I'm not sure what that means as a dtype, but that's what it is.
> >> >> >> >>
> >> >> >> >> Forgive me if I'm being totally naive, but I thought the whole point of
> >> >> >> >> __iter__ with PyTables was to do iteration on the fly, so there is no
> >> >> >> >> preallocation.
> >> >> >> >
> >> >> >> > Nope you are not being naive at all. That is the point.
> >> >> >> >
> >> >> >> >> If you have any ideas on this I'm all ears.
> >> >> >> >
> >> >> >> > If you could send a minimal script which reproduces this error, that
> >> >> >> > would help a lot.
> >> >> >> >
> >> >> >> > Be Well
> >> >> >> > Anthony
> >> >> >> >
> >> >> >> >> Thanks again.
> >> >> >> >>
> >> >> >> >> Dave
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
> >> >> >> >>
> >> >> >> >>> Today's Topics:
> >> >> >> >>>
> >> >> >> >>>    1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
> >> >> >> >>>
> >> >> >> >>> ----------------------------------------------------------------------
> >> >> >> >>>
> >> >> >> >>> Message: 1
> >> >> >> >>> Date: Fri, 1 Feb 2013 14:44:40 -0600
> >> >> >> >>> From: Anthony Scopatz <sc...@gm...>
> >> >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
> >> >> >> >>>
> >> >> >> >>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
> >> >> >> >>>
> >> >> >> >>> > Hi Anthony,
> >> >> >> >>> >
> >> >> >> >>> > Thanks for the reply.
> >> >> >> >>> >
> >> >> >> >>> > I honestly don't know how to monitor my Python memory usage, but I'm
> >> >> >> >>> > sure that it's caused by running out of memory.
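
The sizes quoted in this thread can be checked with a few lines of arithmetic. The following is only a sketch, written for Python 3 rather than the Python 2 used in the thread, and it assumes the 1 MB IO_BUFFER_SIZE default mentioned above. The helper names (row_bytes, rows_per_buffer, peak_bytes) are made up for illustration and are not part of PyTables. The standard-library tracemalloc module is used to show one way of watching peak allocation from inside a script.

```python
import tracemalloc

ROW_SHAPE = (17, 9600)         # one boolean 'templates' cell, 1 byte per element
N_ROWS = 4620                  # rows in the test table from the thread
IO_BUFFER_SIZE = 1024 * 1024   # assumption: the 1 MB default mentioned in the thread


def row_bytes(shape, itemsize=1):
    """Bytes needed for one cell of the given shape."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total


def rows_per_buffer(buffer_bytes, shape, itemsize=1):
    """How many whole rows fit in one I/O buffer (never fewer than one)."""
    return max(1, buffer_bytes // row_bytes(shape, itemsize))


def peak_bytes(func):
    """Run func() and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


per_row = row_bytes(ROW_SHAPE)                            # bytes in one row
chunk_rows = rows_per_buffer(IO_BUFFER_SIZE, ROW_SHAPE)   # rows per buffered read
whole_column = per_row * N_ROWS                           # bytes if read all at once
chunk_bytes = per_row * chunk_rows                        # bytes for one buffered read

# Allocate one chunk-sized buffer and record the peak while doing so.
measured = peak_bytes(lambda: bytearray(chunk_bytes))

print(per_row, chunk_rows, whole_column, chunk_bytes, measured >= chunk_bytes)
```

The whole-column figure works out to 753984000 bytes, the same number that appears in the "need 753984000 bytes" ValueError elsewhere in this thread, while a single buffered read of 6 rows would need under 1 MB.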
> >> >> >> >>>
> >> >> >> >>> Well, I would just run top or process monitor or something while running
> >> >> >> >>> the python script to see what happens to memory usage as the script chugs
> >> >> >> >>> along...
> >> >> >> >>>
> >> >> >> >>> > I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
> >> >> >> >>> > and the column I'm iterating over is a 17x9600 boolean matrix. The
> >> >> >> >>> > __iter__ method is preallocating an array that is this size which appears
> >> >> >> >>> > to be root of the error. I was hoping there is a fix somewhere in here to
> >> >> >> >>> > not have to do this preallocation.
> >> >> >> >>>
> >> >> >> >>> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
> >> >> >> >>> these is ~760 MB. If you have 2 GB of memory and you are iterating over 2
> >> >> >> >>> of these (templates & masks) it is conceivable that you are just running
> >> >> >> >>> out of memory. Maybe there is a way that __iter__ could not preallocate
> >> >> >> >>> something that is basically a temporary. What is the dtype of the
> >> >> >> >>> templates array?
> >> >> >> >>>
> >> >> >> >>> Be Well
> >> >> >> >>> Anthony
> >> >> >> >>>
> >> >> >> >>> > Thanks again.
> >> >> >>
> >> >> >> ------------------------------
> >> >> >>
> >> >> >> Message: 2
> >> >> >> Date: Mon, 4 Feb 2013 09:58:53 -0500
> >> >> >> From: David Reed <dav...@gm...>
> >> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6
> >> >> >>
> >> >> >> Hi Anthony,
> >> >> >>
> >> >> >> Sorry to just get back to you. I can send a script, should I send a
> >> >> >> script that creates some fake data as well?
> >> >> >>
> >> >> >> -Dave
> >> >> >>
> >> >> >>
> >> >> >> On Fri, Feb 1, 2013 at 4:50 PM, <pyt...@li...> wrote:
> >> >> >>
> >> >> >> > Today's Topics:
> >> >> >> >
> >> >> >> >    1. Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony Scopatz)
> >> >> >> >
> >> >> >> > ----------------------------------------------------------------------
> >> >> >> >
> >> >> >> > Message: 1
> >> >> >> > Date: Fri, 1 Feb 2013 15:50:11 -0600
> >> >> >> > From: Anthony Scopatz <sc...@gm...>
> >> >> >> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
> >> >> >> >
> >> >> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
> >> >> >> >
> >> >> >> > > at the error:
> >> >> >> > >
> >> >> >> > > result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> >> >> > >
> >> >> >> > > nrows = 4620 and dtypeField is ('bool', (17, 9600))
> >> >> >> > >
> >> >> >> > > I'm not sure what that means as a dtype, but thats what it is.
> >> >> >> > >
> >> >> >> > > Forgive me if I'm being totally naive, but I thought the whole point of
> >> >> >> > > __iter__ with pyttables was to do iteration on the fly, so there is no
> >> >> >> > > preallocation.
> >> >> >> >
> >> >> >> > Nope you are not being naive at all. That is the point.
> >> >> >> >
> >> >> >> > > If you have any ideas on this I'm all ears.
> >> >> >> >
> >> >> >> > If you could send a minimal script which reproduces this error, that
> >> >> >> > would help a lot.
> >> >> >> >
> >> >> >> > Be Well
> >> >> >> > Anthony
> >> >> >> >
> >> >> >> > > Thanks again.
> >> >> >> > >
> >> >> >> > > Dave
> >> >> >> > >
> >> >> >> > >
> >> >> >> > > On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
> >> >> >> > >
> >> >> >> > >> Today's Topics:
> >> >> >> > >>
> >> >> >> > >>    1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
> >> >> >> > >>
> >> >> >> > >> ----------------------------------------------------------------------
> >> >> >> > >>
> >> >> >> > >> Message: 1
> >> >> >> > >> Date: Fri, 1 Feb 2013 14:44:40 -0600
> >> >> >> > >> From: Anthony Scopatz <sc...@gm...>
> >> >> >> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
> >> >> >> > >>
> >> >> >> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
> >> >> >> > >>
> >> >> >> > >> > Hi Anthony,
> >> >> >> > >> >
> >> >> >> > >> > Thanks for the reply.
> >> >> >> > >> >
> >> >> >> > >> > I honestly don't know how to monitor my Python memory usage, but I'm
> >> >> >> > >> > sure that its caused by out of memory.
> >> >> >> > >>
> >> >> >> > >> Well, I would just run top or process monitor or something while running
> >> >> >> > >> the python script to see what happens to memory usage as the script chugs
> >> >> >> > >> along...
> >> >> >> > >>
> >> >> >> > >> > I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
> >> >> >> > >> > and the column I'm iterating over is a 17x9600 boolean matrix. The
> >> >> >> > >> > __iter__ method is preallocating an array that is this size which appears
> >> >> >> > >> > to be root of the error. I was hoping there is a fix somewhere in here to
> >> >> >> > >> > not have to do this preallocation.
> >> >> >> > >>
> >> >> >> > >> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
> >> >> >> > >> these is ~760 MB. If you have 2 GB of memory and you are iterating over 2
> >> >> >> > >> of these (templates & masks) it is conceivable that you are just running
> >> >> >> > >> out of memory. Maybe there is a way that __iter__ could not preallocate
> >> >> >> > >> something that is basically a temporary. What is the dtype of the
> >> >> >> > >> templates array?
> >> >> >> > >>
> >> >> >> > >> Be Well
> >> >> >> > >> Anthony
> >> >> >> > >>
> >> >> >> > >> > Thanks again.
> >> >> >> > >> > > >> >> >> > >> > > >> >> >> > >> > > >> >> >> > >> > > >> >> >> > >> > On Fri, Feb 1, 2013 at 11:12 AM, < > >> >> >> > >> > pyt...@li...> wrote: > >> >> >> > >> > > >> >> >> > >> >> Send Pytables-users mailing list submissions to > >> >> >> > >> >> pyt...@li... > >> >> >> > >> >> > >> >> >> > >> >> To subscribe or unsubscribe via the World Wide Web, visit > >> >> >> > >> >> > >> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >> > >> >> or, via email, send a message with subject or body 'help' > to > >> >> >> > >> >> pyt...@li... > >> >> >> > >> >> > >> >> >> > >> >> You can reach the person managing the list at > >> >> >> > >> >> pyt...@li... > >> >> >> > >> >> > >> >> >> > >> >> When replying, please edit your Subject line so it is more > >> >> >> specific > >> >> >> > >> >> than "Re: Contents of Pytables-users digest..." > >> >> >> > >> >> > >> >> >> > >> >> > >> >> >> > >> >> Today's Topics: > >> >> >> > >> >> > >> >> >> > >> >> 1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony > >> >> Scopatz) > >> >> >> > >> >> > >> >> >> > >> >> > >> >> >> > >> >> > >> >> >> > > >> >> > ---------------------------------------------------------------------- > >> >> >> > >> >> > >> >> >> > >> >> Message: 1 > >> >> >> > >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600 > >> >> >> > >> >> From: Anthony Scopatz <sc...@gm...> > >> >> >> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol > 80, > >> >> >> Issue 9 > >> >> >> > >> >> To: Discussion list for PyTables > >> >> >> > >> >> <pyt...@li...> > >> >> >> > >> >> Message-ID: > >> >> >> > >> >> < > >> >> >> > >> >> > >> >> >> > CAP...@ma...> > >> >> >> > >> >> Content-Type: text/plain; charset="iso-8859-1" > >> >> >> > >> >> > >> >> >> > >> >> Hi David, > >> >> >> > >> >> > >> >> >> > >> >> Sorry, I haven't had a ton of time recently. You seem to > be > >> >> >> getting > >> >> >> > a > >> >> >> > >> >> memory error on creating a numpy array. 
This kind of > thing > >> >> >> typically > >> >> >> > >> >> happens when you are out of memory. Does this seem to be > >> the > >> >> case > >> >> >> > with > >> >> >> > >> >> you? When this dies, is your memory usage at 100%? If > so, > >> >> this > >> >> >> > >> algorithm > >> >> >> > >> >> might require a little tweaking... > >> >> >> > >> >> > >> >> >> > >> >> Be Well > >> >> >> > >> >> Anthony > >> >> >> > >> >> > >> >> >> > >> >> > >> >> >> > >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed < > >> >> >> dav...@gm...> > >> >> >> > >> >> wrote: > >> >> >> > >> >> > >> >> >> > >> >> > I'm still having problems with this one. I can't tell > if > >> >> this > >> >> >> > >> something > >> >> >> > >> >> > dumb Im doing with itertools, or if its something in > >> >> pytables. > >> >> >> > >> >> > > >> >> >> > >> >> > Would appreciate any help. > >> >> >> > >> >> > > >> >> >> > >> >> > Thanks > >> >> >> > >> >> > > >> >> >> > >> >> > > >> >> >> > >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed < > >> >> >> > dav...@gm... > >> >> >> > >> >> >wrote: > >> >> >> > >> >> > > >> >> >> > >> >> >> I think I have to reopen this issue. I have been > running > >> >> fine > >> >> >> for > >> >> >> > >> >> awhile > >> >> >> > >> >> >> using the combinations method from itertools, but have > >> >> recently > >> >> >> > run > >> >> >> > >> >> into a > >> >> >> > >> >> >> memory since I have recently quadrupled the size of the > >> hdf > >> >> >> file. 
> >> >> >> > >> >> >> > >> >> >> > >> >> >> Here is my code again: > >> >> >> > >> >> >> > >> >> >> > >> >> >> from itertools import combinations, izip > >> >> >> > >> >> >> with tb.openFile(h5_all, 'r') as f: > >> >> >> > >> >> >> irises = f.root.irises > >> >> >> > >> >> >> > >> >> >> > >> >> >> templates = f.root.irises.cols.templates > >> >> >> > >> >> >> masks = f.root.irises.cols.masks1 > >> >> >> > >> >> >> > >> >> >> > >> >> >> N_irises = len(irises) > >> >> >> > >> >> >> index = np.ones((20 * 480), np.bool) > >> >> >> > >> >> >> > >> >> >> > >> >> >> print '%i Comparisons' % (N_irises*(N_irises - 1)/2) > >> >> >> > >> >> >> D = np.empty((N_irises, N_irises)) > >> >> >> > >> >> >> for (t1, m1, ii), (t2, m2, jj) in > >> >> combinations(izip(templates, > >> >> >> > >> masks, > >> >> >> > >> >> >> range(N_irises)), 2): > >> >> >> > >> >> >> # print ii > >> >> >> > >> >> >> D[ii, jj] = ham_dist( > >> >> >> > >> >> >> t1[8, index], > >> >> >> > >> >> >> t2[:, index], > >> >> >> > >> >> >> m1[8, index], > >> >> >> > >> >> >> m2[:, index], > >> >> >> > >> >> >> ) > >> >> >> > >> >> >> > >> >> >> > >> >> >> And here is the error: > >> >> >> > >> >> >> > >> >> >> > >> >> >> In [10]: get_hd3() > >> >> >> > >> >> >> 10669890 Comparisons > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> > >> >> >> > >> > >> >> >> > > >> >> >> > >> >> > >> > --------------------------------------------------------------------------- > >> >> >> > >> >> >> MemoryError Traceback > (most > >> >> >> recent > >> >> >> > >> call > >> >> >> > >> >> >> last) > >> >> >> > >> >> >> <ipython-input-10-cfb255ce7bd1> in <module>() > >> >> >> > >> >> >> ----> 1 get_hd3() > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> 118 print '%i Comparisons' % > >> >> >> > >> (N_irises*(N_irises - > >> >> >> > >> >> >> 1)/2) > >> >> >> > >> >> >> 119 D = np.empty((N_irises, > >> N_irises)) > >> >> >> > >> >> >> --> 120 for (t1, m1, ii), (t2, m2, jj) > in > >> >> >> > >> >> >> 
combinations(izip(temp > >> >> >> > >> >> >> lates, masks, range(N_irises)), 2): > >> >> >> > >> >> >> 121 # print ii > >> >> >> > >> >> >> 122 D[ii, jj] = ham_dist( > >> >> >> > >> >> >> > >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in > >> >> >> __iter__(self) > >> >> >> > >> >> >> 3274 for start_row in xrange(0, len(self), > >> >> >> nrowsinbuf): > >> >> >> > >> >> >> 3275 end_row = min([start_row + > >> nrowsinbuf, > >> >> >> > max_row]) > >> >> >> > >> >> >> -> 3276 buf = table.read(start_row, > end_row, > >> 1, > >> >> >> > >> >> >> field=self.pathname) > >> >> >> > >> >> >> > >> >> >> > >> >> >> 3277 for row in buf: > >> >> >> > >> >> >> 3278 yield row > >> >> >> > >> >> >> > >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in > >> read(self, > >> >> >> > start, > >> >> >> > >> >> stop, > >> >> >> > >> >> >> step, > >> >> >> > >> >> >> field) > >> >> >> > >> >> >> 1772 (start, stop, step) = > >> >> >> > self._processRangeRead(start, > >> >> >> > >> >> stop, > >> >> >> > >> >> >> step) > >> >> >> > >> >> >> 1773 > >> >> >> > >> >> >> -> 1774 arr = self._read(start, stop, step, > >> field) > >> >> >> > >> >> >> 1775 return internal_to_flavor(arr, > >> self.flavor) > >> >> >> > >> >> >> 1776 > >> >> >> > >> >> >> > >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in > >> >> _read(self, > >> >> >> > start, > >> >> >> > >> >> >> stop, step, > >> >> >> > >> >> >> field) > >> >> >> > >> >> >> 1719 if field: > >> >> >> > >> >> >> 1720 # Create a container for the > results > >> >> >> > >> >> >> -> 1721 result = numpy.empty(shape=nrows, > >> >> >> > >> dtype=dtypeField) > >> >> >> > >> >> >> 1722 else: > >> >> >> > >> >> >> 1723 # Recarray case > >> >> >> > >> >> >> > >> >> >> > >> >> >> MemoryError: > >> >> >> > >> >> >> > > >> c:\python27\lib\site-packages\tables\table.py(1721)_read() > >> >> >> > >> >> >> 1720 # Create a container for the > results > >> >> >> > >> >> >> -> 1721 result = 
numpy.empty(shape=nrows, > >> >> >> > >> dtype=dtypeField) > >> >> >> > >> >> >> 1722 else: > >> >> >> > >> >> >> > >> >> >> > >> >> >> Also, if you guys see any performance problems in my > >> code, > >> >> >> please > >> >> >> > >> let > >> >> >> > >> >> me > >> >> >> > >> >> >> know. > >> >> >> > >> >> >> > >> >> >> > >> >> >> Thank you so much for the help. > >> >> >> > >> >> >> > >> >> >> > >> >> >> -Dave > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> On Fri, Jan 4, 2013 at 8:57 AM, < > >> >> >> > >> >> >> pyt...@li...> wrote: > >> >> >> > >> >> >> > >> >> >> > >> >> >>> Send Pytables-users mailing list submissions to > >> >> >> > >> >> >>> pyt...@li... > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> To subscribe or unsubscribe via the World Wide Web, > >> visit > >> >> >> > >> >> >>> > >> >> >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >> > >> >> >>> or, via email, send a message with subject or body > >> 'help' > >> >> to > >> >> >> > >> >> >>> pyt...@li... > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> You can reach the person managing the list at > >> >> >> > >> >> >>> pyt...@li... > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> When replying, please edit your Subject line so it is > >> more > >> >> >> > specific > >> >> >> > >> >> >>> than "Re: Contents of Pytables-users digest..." > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> Today's Topics: > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> 1. 
Re: Pytables-users Digest, Vol 80, Issue 8 > (David > >> >> Reed) > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> > >> >> >> > >> ---------------------------------------------------------------------- > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> Message: 1 > >> >> >> > >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500 > >> >> >> > >> >> >>> From: David Reed <dav...@gm...> > >> >> >> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, > Vol > >> >> 80, > >> >> >> > Issue > >> >> >> > >> 8 > >> >> >> > >> >> >>> To: pyt...@li... > >> >> >> > >> >> >>> Message-ID: > >> >> >> > >> >> >>> < > >> >> >> > >> >> >>> > >> >> >> > > >> CAM...@ma... > >> >> >> > >> > > >> >> >> > >> >> >>> Content-Type: text/plain; charset="iso-8859-1" > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> I can't thank you guys enough for the help. I was > able > >> to > >> >> add > >> >> >> > the > >> >> >> > >> >> >>> __iter__ > >> >> >> > >> >> >>> function to the table.py file and everything seems to > be > >> >> >> working > >> >> >> > >> >> great! > >> >> >> > >> >> >>> I'm not quite as fast as I was with iterating right > of > >> a > >> >> >> matrix > >> >> >> > >> but > >> >> >> > >> >> >>> pretty > >> >> >> > >> >> >>> close. I was at 555 comparisons per second, and now > im > >> at > >> >> >> 420. 
> >> >> >> > >> >> >>>
> >> >> >> > >> >> >>> I handled the problem I mentioned earlier by doing this, and it
> >> >> >> > >> >> >>> seems to work great:
> >> >> >> > >> >> >>>
> >> >> >> > >> >> >>> from itertools import combinations, izip
> >> >> >> > >> >> >>>
> >> >> >> > >> >> >>> A = f.root.data.cols.A
> >> >> >> > >> >> >>> B = f.root.data.cols.B
> >> >> >> > >> >> >>>
> >> >> >> > >> >> >>> D = np.empty((len(A), len(A)))
> >> >> >> > >> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
> >> >> >> > >> >> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
> >> >> >> > >> >> >>>
> >> >> >> > >> >> >>> Again, thanks a lot.
> >> >> >> > >> >> >>>
> >> >> >> > >> >> >>> -Dave
> >> >> >> > >> >> >>>
> >> >> >> > >> >> >>>
> >> >> >> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
> >> >> >> > >> >> >>>
> >> >> >> > >> >> >>> > Today's Topics:
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> >    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
> >> >> >> > >> >> >>> >    2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz)
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > ----------------------------------------------------------------------
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > Message: 1
> >> >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
> >> >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...>
> >> >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> >> >> >> > >> >> >>> > To: Discussion list for PyTables <pyt...@li...>
> >> >> >> > >> >> >>> > Message-ID: <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=Gz...@ma...>
> >> >> >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > > Thanks a lot for the help so far guys!
> >> >> >> > >> >> >>> > >
> >> >> >> > >> >> >>> > > Looking at itertools, I found what I believe to be the perfect
> >> >> >> > >> >> >>> > > function for what I need, itertools.combinations. This appears to
> >> >> >> > >> >> >>> > > be a valid replacement for the method proposed.
> >> >> >> > >> >> >>> > >
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > Yes, combinations is awesome!
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > >
> >> >> >> > >> >> >>> > > There is a small problem that I didn't mention: my compare
> >> >> >> > >> >> >>> > > function actually takes as inputs 2 columns from the table.
> >> >> >> > >> >> >>> > > Like so:
> >> >> >> > >> >> >>> > >
> >> >> >> > >> >> >>> > > D = np.empty((N_elements, N_elements))
> >> >> >> > >> >> >>> > > for ii in xrange(N_elements):
> >> >> >> > >> >> >>> > >     for jj in xrange(ii+1, N_elements):
> >> >> >> > >> >> >>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> >> >> >> > >> >> >>> > >                             data['element2'][ii], data['element2'][jj])
> >> >> >> > >> >> >>> > >
> >> >> >> > >> >> >>> > > Is there an efficient way of using itertools with this structure?
> >> >> >> > >> >> >>> > >
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > You can always make two other iterators for each column. Since you
> >> >> >> > >> >> >>> > have two columns you would have 4 iterators. I am not sure how fast
> >> >> >> > >> >> >>> > this is going to be, but I am confident that there is definitely a
> >> >> >> > >> >> >>> > way to do this in one for-loop, which is going to be way faster than
> >> >> >> > >> >> >>> > nested loops.
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > Be Well
> >> >> >> > >> >> >>> > Anthony
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > >
> >> >> >> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> >> >> >> > >> >> >>> > >
> >> >> >> > >> >> >>> > >> Today's Topics:
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >>    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >> ----------------------------------------------------------------------
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >> Message: 1
> >> >> >> > >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
> >> >> >> > >> >> >>> > >> From: Josh Ayers <jos...@gm...>
> >> >> >> > >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >> >> >> > >> >> >>> > >> To: Discussion list for PyTables <pyt...@li...>
> >> >> >> > >> >> >>> > >> Message-ID: <CAC...@ma...>
> >> >> >> > >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1"
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >> David,
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >> The change in issue 27 was only for iteration over a tables.Column
> >> >> >> > >> >> >>> > >> instance. To use it, tweak Anthony's code as follows. This will
> >> >> >> > >> >> >>> > >> iterate over the "element" column, as in your original example.
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >> Note also that this will only work with the development version of
> >> >> >> > >> >> >>> > >> PyTables available on github. It will be very slow using the
> >> >> >> > >> >> >>> > >> released v2.4.0.
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >> from itertools import izip
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >> with tb.openFile(...) as f:
> >> >> >> > >> >> >>> > >>     data = f.root.data.cols.element
> >> >> >> > >> >> >>> > >>     data_i = iter(data)
> >> >> >> > >> >> >>> > >>     data_j = iter(data)
> >> >> >> > >> >> >>> > >>     data_i.next()  # throw the first value away
> >> >> >> > >> >> >>> > >>     for i, j in izip(data_i, data_j):
> >> >> >> > >> >> >>> > >>         compare(i, j)
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >> Hope that helps,
> >> >> >> > >> >> >>> > >> Josh
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
> >> >> >> > >> >> >>> > >>
> >> >> >> > >> >> >>> > >> > Hi David,
> >> >> >> > >> >> >>> > >> >
> >> >> >> > >> >> >>> > >> > Tables and table column iteration have been overhauled fairly
> >> >> >> > >> >> >>> > >> > recently [1].
> >> >> >> > >> >> >>> > >> > So you might try creating two iterators, offset by one, and then
> >> >> >> > >> >> >>> > >> > doing the comparison. I am hacking this out super quick so please
> >> >> >> > >> >> >>> > >> > forgive me:
> >> >> >> > >> >> >>> > >> >
> >> >> >> > >> >> >>> > >> > from itertools import izip
> >> >> >> > >> >> >>> > >> >
> >> >> >> > >> >> >>> > >> > with tb.openFile(...) as f:
> >> >> >> > >> >> >>> > >> >     data = f.root.data
> >> >> >> > >> >> >>> > >> >     data_i = iter(data)
> >> >> >> > >> >> >>> > >> >     data_j = iter(data)
> >> >> >> > >> >> >>> > >> >     data_i.next()  # throw the first value away
> >> >> >> > >> >> >>> > >> >     for i, j in izip(data_i, data_j):
> >> >> >> > >> >> >>> > >> >         compare(i, j)
> >> >> >> > >> >> >>> > >> >
> >> >> >> > >> >> >>> > >> > You get the idea ;)
> >> >> >> > >> >> >>> > >> >
> >> >> >> > >> >> >>> > >> > Be Well
> >> >> >> > >> >> >>> > >> > Anthony
> >> >> >> > >> >> >>> > >> >
> >> >> >> > >> >> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27
> >> >> >> > >> >> >>> > >> >
> >> >> >> > >> >> >>> > >> >
> >> >> >> > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >> >> >> > >> >> >>> > >> >
> >> >> >> > >> >> >>> > >> >> I was hoping someone could help me out here.
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >> This is from a post I put up on StackOverflow:
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >> I have a fairly large dataset that I store in HDF5 and access
> >> >> >> > >> >> >>> > >> >> using PyTables. One operation I need to do on this dataset is
> >> >> >> > >> >> >>> > >> >> pairwise comparisons between each of the elements.
> >> >> >> > >> >> >>> > >> >> This requires 2 loops, one to iterate over each element, and an
> >> >> >> > >> >> >>> > >> >> inner loop to iterate over every other element. This operation
> >> >> >> > >> >> >>> > >> >> thus looks at N(N-1)/2 comparisons.
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >> For fairly small sets I found it to be faster to dump the
> >> >> >> > >> >> >>> > >> >> contents into a multidimensional numpy array and then do my
> >> >> >> > >> >> >>> > >> >> iteration. I run into problems with large sets because of memory
> >> >> >> > >> >> >>> > >> >> issues and need to access each element of the dataset at run
> >> >> >> > >> >> >>> > >> >> time.
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >> Putting the elements into an array gives me about 600
> >> >> >> > >> >> >>> > >> >> comparisons per second, while operating on hdf5 data itself
> >> >> >> > >> >> >>> > >> >> gives me about 300 comparisons per second.
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >> Is there a way to speed this process up?
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >> Example follows (this is not my real code, just an example):
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >> *Small Set*:
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> >> >> >> > >> >> >>> > >> >>     data = f.root.data
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >>     N_elements = len(data)
> >> >> >> > >> >> >>> > >> >>     elements = np.empty((N_elements, 100000))
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >>     for ii, d in enumerate(data):
> >> >> >> > >> >> >>> > >> >>         elements[ii] = d['element']
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >> D = np.empty((N_elements, N_elements))
> >> >> >> > >> >> >>> > >> >> for ii in xrange(N_elements):
> >> >> >> > >> >> >>> > >> >>     for jj in xrange(ii+1, N_elements):
> >> >> >> > >> >> >>> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >> *Large Set*:
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> >> >> >> > >> >> >>> > >> >>     data = f.root.data
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >>     N_elements = len(data)
> >> >> >> > >> >> >>> > >> >>
> >> >> >> > >> >> >>> > >> >>     D = np.empty((N_elements, N_elements))
> >> >> >> > >> >> >>> > >> >>     for ii in xrange(N_elements):
> >> >> >> > >> >> >>> > >> >>         for jj in xrange(ii+1, N_elements):
> >> >> >> > >> >> >>> > >> >>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
> >> >> >> > >> >> >>> > >> >>
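The single-loop idea discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions, not the code anyone in the thread ran: it is written for Python 3 (so `zip` stands in for `izip`), plain NumPy arrays stand in for the PyTables columns, and `pairwise_matrix` and `toy_compare` are hypothetical names. One point worth keeping in mind is that `itertools.combinations` copies its whole input into a tuple before yielding the first pair, so every row is held in memory at once.

```python
from itertools import combinations

import numpy as np


def pairwise_matrix(col_a, col_b, compare):
    """Fill the upper triangle of an N x N matrix with compare() results.

    col_a and col_b are stand-ins for two table columns of equal length.
    """
    n = len(col_a)
    result = np.zeros((n, n))
    # Tag every row with its index so each pair knows where to store its result.
    rows = zip(col_a, col_b, range(n))
    # combinations() materializes `rows` as a tuple, then yields each i < j pair once.
    for (a1, b1, ii), (a2, b2, jj) in combinations(rows, 2):
        result[ii, jj] = compare(a1, a2, b1, b2)
    return result


def toy_compare(a1, a2, b1, b2):
    # Hypothetical comparison: sum of absolute differences of both fields.
    return abs(a1 - a2) + abs(b1 - b2)


col_a = np.array([1.0, 2.0, 4.0])
col_b = np.array([0.0, 1.0, 3.0])
D = pairwise_matrix(col_a, col_b, toy_compare)
```

With N rows this performs N(N-1)/2 comparisons, the same count as the nested loops, and leaves the lower triangle and the diagonal at zero.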
> >> >> >> > >> >> >>> > ------------------------------
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > Message: 2
> >> >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600
> >> >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...>
> >> >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
> >> >> >> > >> >> >>> > To: Discussion list for PyTables <pyt...@li...>
> >> >> >> > >> >> >>> > Message-ID: <CAP...@ma...>
> >> >> >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > Josh is right that you can just edit the code by hand (which works
> >> >> >> > >> >> >>> > but sucks).
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > However, on Windows -- on the rare occasion when I also have to
> >> >> >> > >> >> >>> > develop on it -- I typically use a distribution that includes a
> >> >> >> > >> >> >>> > compiler, cython, hdf5, and pytables already and then I install my
> >> >> >> > >> >> >>> > development version from github OVER this. I recommend either EPD or
> >> >> >> > >> >> >>> > Anaconda, though other distributions listed here [1] might also work.
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > Be well
> >> >> >> > >> >> >>> > Anthony
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > 1. http://numfocus.org/projects-2/software-distributions/
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
> >> >> >> > >> >> >>> >
> >> >> >> > >> >> >>> > > The change was in pure Python code, so you should be able to just
> >> >> >> > >> >> >>> > > paste in the changes to your local copy. Start with the
> >> >> >> > >> >> >>> > > table.Column.__iter__ method (lines 3296-3310) here.
> >> >> >> > >> >> >>> > >
> >> >> >> > >> >> >>> > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
> >> >> >> > >> >> >>> > >
> >> >> >> > >> >> >>> > > It needs to be modified slightly because it uses some additional
> >> >> >> > >> >> >>> > > features that aren't available in the released version (the
> >> >> >> > >> >> >>> > > out=buf_slice argument to table.read). The following should work.
> >> >> >> > >> >> >>> > >
> >> >> >> > >> >> >>> > > def __iter__(self):
> >> >> >> > >> >> >>> > >     table = self.table
> >> >> >> > >> >> >>> > >     itemsize = self.dtype.itemsize
> >> >> >> > >> >> >>> > >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> >> >> >> > >> >> >>> > >     max_row = len(self)
> >> >> >> > >> >> >>> > >     for start_row in xrange(0, len(self), nrow... [truncated message content] |
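The quoted method is cut off in the archive, and nothing here attempts to reconstruct it. As a general illustration of the buffering idea it starts to describe (work out how many rows fit in a buffer from the item size, then read the column one slice at a time), here is a minimal Python 3 sketch. It assumes a plain NumPy array in place of a PyTables column; `iter_buffered` and `buffer_bytes` are hypothetical names, not PyTables API.

```python
import numpy as np


def iter_buffered(column, buffer_bytes=1024 * 1024):
    """Yield items one at a time while reading `column` in fixed-size slices."""
    itemsize = column.dtype.itemsize
    # Number of rows that fit in one buffer; always read at least one row.
    rows_per_buffer = max(1, buffer_bytes // itemsize)
    total = len(column)
    for start in range(0, total, rows_per_buffer):
        stop = min(start + rows_per_buffer, total)
        chunk = column[start:stop]  # one bulk read per buffer
        for item in chunk:
            yield item


values = np.arange(10, dtype=np.int64)
# A 32-byte buffer holds four 8-byte items, so this reads slices of 4, 4 and 2 rows.
collected = list(iter_buffered(values, buffer_bytes=32))
```

The point of the pattern is that the expensive read happens once per slice rather than once per element, while the caller still sees a simple element-by-element iterator.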