From: Anthony S. <sc...@gm...> - 2013-02-28 01:38:09
On Wed, Feb 27, 2013 at 2:24 PM, David Reed <dav...@gm...> wrote:

> Thanks for getting back Anthony,
>
> So I originally published to this list looking for an efficient way of
> doing pairwise comparisons on an HDF5 table that had only about 700
> elements. You guys sorta guided me in the direction of itertools while
> also notifying me of a bug fix that was recently pushed which had more
> efficient iteration.
>
> This worked great! and really sped up my comparisons, and I was flying
> high for quite awhile. Things started breaking though when I upped the #
> of elements to about 5000. I gave some code that created some sim data and
> you were getting the same error on your machine. I put this code up as
> Gist here: https://gist.github.com/dvreed77/fa3060b18257008df383
>
> Again, if you can think of anything, I'll try to do the leg work as best
> as I can.
>

Ahh. Thanks for the reminder David.

One thing I thought of was to maybe change the table chunkshape. I tried
setting this to (50,) in the createTable() call, but that was clearly too
low of a value. The problem, for me, seems to be that the byte sizes are
far too low. I am seeing the following traceback:

scopatz@ares ~/Downloads $ python tbl_error.py
10669890 Comparisons
Traceback (most recent call last):
  File "tbl_error.py", line 63, in <module>
    get_hd()
  File "tbl_error.py", line 55, in get_hd
    print c.next()
  File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 3308, in __iter__
    out=buf_slice)
  File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1807, in read
    arr = self._read(start, stop, step, field, out)
  File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1732, in _read
    bytes_required))
ValueError: output array size invalid, got 4620 bytes, need 753984000 bytes

This problem is being caused by the fact that the dtype in the __iter__()
method on line 3308 of table.py is NOT reading in the shape properly for
some reason. Instead of interpreting masks1 as a 17x(20*480) column of
bools, it is interpreting it as a scalar column of bools. Unfortunately, I
don't have time to look into how to fix it. Hopefully, you can!

Be Well
Anthony

>
> On Wed, Feb 27, 2013 at 3:06 PM, <pyt...@li...> wrote:
>
>> Send Pytables-users mailing list submissions to
>> pyt...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>> or, via email, send a message with subject or body 'help' to
>> pyt...@li...
>>
>> You can reach the person managing the list at
>> pyt...@li...
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Pytables-users digest..."
>>
>>
>> Today's Topics:
>>
>> 1. Re: Pytables-users Digest, Vol 81, Issue 11 (Anthony Scopatz)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 27 Feb 2013 14:05:38 -0600
>> From: Anthony Scopatz <sc...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 11
>> To: Discussion list for PyTables <pyt...@li...>
>> Message-ID: <CAP...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi David,
>>
>> Sorry about the delay. I have mostly forgotten what exactly this issue
>> was. I am pretty swamped this week so I could throw out some WAGs but I
>> don't think I'll be able to do any real work myself on it.
>>
>> Be Well
>> Anthony
>>
>>
>> On Mon, Feb 25, 2013 at 2:15 PM, David Reed <dav...@gm...> wrote:
>>
>> > Anthony,
>> >
>> > I've had a chance recently to revisit this problem and am not getting
>> > anywhere. I was hoping I might be able to get more support in getting
>> > this working. If you have some ideas, throw them out and I can do the
>> > leg work and see what I can come up with.
>> >
>> > -David
>> >
>> >
>> > On Mon, Feb 4, 2013 at 3:44 PM, <pyt...@li...> wrote:
>> >
>> >> Today's Topics:
>> >>
>> >> 1. Re: Pytables-users Digest, Vol 81, Issue 9 (Anthony Scopatz)
>> >>
>> >>
>> >> ----------------------------------------------------------------------
>> >>
>> >> Message: 1
>> >> Date: Mon, 4 Feb 2013 14:43:37 -0600
>> >> From: Anthony Scopatz <sc...@gm...>
>> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 9
>> >> To: Discussion list for PyTables <pyt...@li...>
>> >> Message-ID: <CAP...@ma...>
>> >> Content-Type: text/plain; charset="iso-8859-1"
>> >>
>> >> Hey David,
>> >>
>> >> I am getting the following error now:
>> >>
>> >> scopatz@ares ~ $ python t.py
>> >> 10669890 Comparisons
>> >> Traceback (most recent call last):
>> >>   File "t.py", line 61, in <module>
>> >>     get_hd()
>> >>   File "t.py", line 54, in get_hd
>> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 3308, in __iter__
>> >>     out=buf_slice)
>> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1807, in read
>> >>     arr = self._read(start, stop, step, field, out)
>> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1732, in _read
>> >>     bytes_required))
>> >> ValueError: output array size invalid, got 4620 bytes, need 753984000 bytes
>> >>
>> >> And I had to change the phasors line to the following:
>> >>
>> >> r['phasors'] = np.empty((17, 20*240), complex)
>> >>
>> >> Thanks.
>> >> Be Well
>> >> Anthony
>> >>
>> >>
>> >> On Mon, Feb 4, 2013 at 1:41 PM, David Reed <dav...@gm...> wrote:
>> >>
>> >> > I didn't have any luck. I replaced that __iter__ function, which led
>> >> > to me replacing the read function, which led to me replacing the
>> >> > _read function, and I eventually got another error.
>> >> >
>> >> > Below are 2 functions and my HDF5 Table class declaration. They
>> >> > should be self explanatory. I wasn't sure if attachments would go
>> >> > through and this is pretty small, so I figured it would be ok just to
>> >> > post. I apologize if this is a bit cluttered. I would also appreciate
>> >> > any comments on how I assign the results to the matrix D; this does
>> >> > not seem very pythonic at all and I could use some advice there if
>> >> > it's easy. (The ii*jj is just a placeholder for a more sophisticated
>> >> > measure.) Thanks again!
>> >> >
>> >> > import numpy as np
>> >> > import tables as tb
>> >> >
>> >> > class Iris(tb.IsDescription):
>> >> >     subject_id = tb.IntCol()
>> >> >     iris_id = tb.IntCol()
>> >> >     database = tb.StringCol(5)
>> >> >     is_left = tb.BoolCol()
>> >> >     is_flipped = tb.BoolCol()
>> >> >     templates = tb.BoolCol(shape=(17, 20*480))
>> >> >     masks1 = tb.BoolCol(shape=(17, 20*480))
>> >> >     phasors = tb.ComplexCol(itemsize=8, shape=(17, 20*240))
>> >> >     masks2 = tb.BoolCol(shape=(17, 20*240))
>> >> >
>> >> >
>> >> > def create_hdf5():
>> >> >     """Create test.h5 and fill the irises table with placeholder data."""
>> >> >     with tb.openFile('test.h5', 'w') as f:
>> >> >
>> >> >         # Create and fill the table of irises
>> >> >         irises = f.createTable(f.root, 'irises', Iris, 'Irises',
>> >> >                                filters=tb.Filters(1))
>> >> >         for ii in range(4620):
>> >> >
>> >> >             r = irises.row
>> >> >             r['subject_id'] = ii
>> >> >             r['iris_id'] = 0
>> >> >             r['database'] = 'test'
>> >> >             r['is_left'] = True
>> >> >             r['is_flipped'] = False
>> >> >             r['templates'] = np.empty((17, 20*480), np.bool8)
>> >> >             r['masks1'] = np.empty((17, 20*480), np.bool8)
>> >> >             r['phasors'] = np.empty((17, 20*240)) + 1j*np.empty((17, 20*240))
>> >> >             r['masks2'] = np.empty((17, 20*240), np.bool8)
>> >> >             r.append()
>> >> >
>> >> >         irises.flush()
>> >> >
>> >> > def get_hd():
>> >> >     """Iterate over all pairs of rows and fill the matrix D."""
>> >> >     from itertools import combinations, izip
>> >> >     with tb.openFile('test.h5') as f:
>> >> >         irises = f.root.irises
>> >> >
>> >> >         templates = f.root.irises.cols.templates
>> >> >         masks = f.root.irises.cols.masks1
>> >> >
>> >> >         N_irises = len(irises)
>> >> >
>> >> >         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> >> >         D = np.empty((N_irises, N_irises))
>> >> >         for (t1, m1, ii), (t2, m2, jj) in combinations(
>> >> >                 izip(templates, masks, range(N_irises)), 2):
>> >> >             D[ii, jj] = ii*jj
>> >> >
>> >> >     np.save('test', D)
>> >> >
>> >> >
>> >> > On Mon, Feb 4, 2013 at 11:16 AM, <pyt...@li...> wrote:
>> >> >
>> >> >> Today's Topics:
>> >> >>
>> >> >> 1. Re: Pytables-users Digest, Vol 81, Issue 7 (Anthony Scopatz)
>> >> >>
>> >> >>
>> >> >> ----------------------------------------------------------------------
>> >> >>
>> >> >> Message: 1
>> >> >> Date: Mon, 4 Feb 2013 10:16:24 -0600
>> >> >> From: Anthony Scopatz <sc...@gm...>
>> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 7
>> >> >> To: Discussion list for PyTables <pyt...@li...>
>> >> >> Message-ID: <CAP...@ma...>
>> >> >> Content-Type: text/plain; charset="iso-8859-1"
>> >> >>
>> >> >> On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> wrote:
>> >> >>
>> >> >> > Hi Josh,
>> >> >> >
>> >> >> > Here is my __iter__ code:
>> >> >> >
>> >> >> > def __iter__(self):
>> >> >> >     table = self.table
>> >> >> >     itemsize = self.dtype.itemsize
>> >> >> >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>> >> >> >     max_row = len(self)
>> >> >> >     for start_row in xrange(0, len(self), nrowsinbuf):
>> >> >> >         end_row = min([start_row + nrowsinbuf, max_row])
>> >> >> >         buf = table.read(start_row, end_row, 1, field=self.pathname)
>> >> >> >         for row in buf:
>> >> >> >             yield row
>> >> >> >
>> >> >> > It does look different, I will try swapping in the code from
>> >> >> > github and see what happens.
>> >> >> >
>> >> >>
>> >> >> Yes, please let us know how that goes!
>> >> >> Otherwise send the list both the test data generator script and the
>> >> >> script that fails.
>> >> >>
>> >> >> Be Well
>> >> >> Anthony
>> >> >>
>> >> >> >
>> >> >> > On Mon, Feb 4, 2013 at 9:59 AM, <pyt...@li...> wrote:
>> >> >> >
>> >> >> >> Today's Topics:
>> >> >> >>
>> >> >> >> 1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers)
>> >> >> >> 2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed)
>> >> >> >>
>> >> >> >>
>> >> >> >> ----------------------------------------------------------------------
>> >> >> >>
>> >> >> >> Message: 1
>> >> >> >> Date: Fri, 1 Feb 2013 14:08:47 -0800
>> >> >> >> From: Josh Ayers <jos...@gm...>
>> >> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
>> >> >> >> To: Discussion list for PyTables <pyt...@li...>
>> >> >> >> Message-ID: <CACOB4aPG4NZ6b2a3v=1Ue...@ma...>
>> >> >> >> Content-Type: text/plain; charset="iso-8859-1"
>> >> >> >>
>> >> >> >> David,
>> >> >> >>
>> >> >> >> You added a custom version of table.Column.__iter__, correct? Could
>> >> >> >> you also include that along with the script to reproduce the error?
>> >> >> >>
>> >> >> >> It seems like the problem may be in the 'nrowsinbuf' calculation -
>> >> >> >> see [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're
>> >> >> >> using the default 1MB value for IO_BUFFER_SIZE, it should be reading
>> >> >> >> in chunks of 6 rows. Instead, it's reading the entire table.
>> >> >> >>
>> >> >> >> [1]: https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296
>> >> >> >>
>> >> >> >>
>> >> >> >> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:
>> >> >> >>
>> >> >> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
>> >> >> >> >
>> >> >> >> >> at the error:
>> >> >> >> >>
>> >> >> >> >> result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >> >> >> >>
>> >> >> >> >> nrows = 4620 and dtypeField is ('bool', (17, 9600))
>> >> >> >> >>
>> >> >> >> >> I'm not sure what that means as a dtype, but that's what it is.
>> >> >> >> >>
>> >> >> >> >> Forgive me if I'm being totally naive, but I thought the whole
>> >> >> >> >> point of __iter__ with PyTables was to do iteration on the fly,
>> >> >> >> >> so there is no preallocation.
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > Nope you are not being naive at all. That is the point.
>> >> >> >> >
>> >> >> >> >> If you have any ideas on this I'm all ears.
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > If you could send a minimal script which reproduces this error,
>> >> >> >> > that would help a lot.
>> >> >> >> >
>> >> >> >> > Be Well
>> >> >> >> > Anthony
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> Thanks again.
>> >> >> >> >>
>> >> >> >> >> Dave
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
>> >> >> >> >>
>> >> >> >> >>> Today's Topics:
>> >> >> >> >>>
>> >> >> >> >>> 1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>> ----------------------------------------------------------------------
>> >> >> >> >>>
>> >> >> >> >>> Message: 1
>> >> >> >> >>> Date: Fri, 1 Feb 2013 14:44:40 -0600
>> >> >> >> >>> From: Anthony Scopatz <sc...@gm...>
>> >> >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>> >> >> >> >>> To: Discussion list for PyTables <pyt...@li...>
>> >> >> >> >>> Message-ID: <CAP...@ma...>
>> >> >> >> >>> Content-Type: text/plain; charset="iso-8859-1"
>> >> >> >> >>>
>> >> >> >> >>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
>> >> >> >> >>>
>> >> >> >> >>> > Hi Anthony,
>> >> >> >> >>> >
>> >> >> >> >>> > Thanks for the reply.
>> >> >> >> >>> >
>> >> >> >> >>> > I honestly don't know how to monitor my Python memory usage,
>> >> >> >> >>> > but I'm sure that it's caused by running out of memory.
>> >> >> >> >>> >
>> >> >> >> >>>
>> >> >> >> >>> Well, I would just run top or process monitor or something while
>> >> >> >> >>> running the python script to see what happens to memory usage as
>> >> >> >> >>> the script chugs along...
>> >> >> >> >>>
>> >> >> >> >>> > I'm just trying to find out how to fix it. My HDF5 table has
>> >> >> >> >>> > 4620 rows and the column I'm iterating over is a 17x9600
>> >> >> >> >>> > boolean matrix. The __iter__ method is preallocating an array
>> >> >> >> >>> > that is this size, which appears to be the root of the error.
>> >> >> >> >>> > I was hoping there is a fix somewhere in here to not have to
>> >> >> >> >>> > do this preallocation.
>> >> >> >> >>> >
>> >> >> >> >>>
>> >> >> >> >>> So a 17x9600 boolean matrix should only be 0.155 MB in space.
>> >> >> >> >>> 4620 of these is ~760 MB. If you have 2 GB of memory and you are
>> >> >> >> >>> iterating over 2 of these (templates & masks) it is conceivable
>> >> >> >> >>> that you are just running out of memory. Maybe there is a way
>> >> >> >> >>> that __iter__ could not preallocate something that is basically
>> >> >> >> >>> a temporary. What is the dtype of the templates array?
>> >> >> >> >>>
>> >> >> >> >>> Be Well
>> >> >> >> >>> Anthony
>> >> >> >> >>>
>> >> >> >> >>> >
>> >> >> >> >>> > Thanks again.
>> >> >> >> >>>
>> >> >> >>
>> >> >> >> ------------------------------
>> >> >> >>
>> >> >> >> Message: 2
>> >> >> >> Date: Mon, 4 Feb 2013 09:58:53 -0500
>> >> >> >> From: David Reed <dav...@gm...>
>> >> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6
>> >> >> >> To: pyt...@li...
>> >> >> >> Message-ID: <CAM6XA7=h50...@ma...>
>> >> >> >> Content-Type: text/plain; charset="iso-8859-1"
>> >> >> >>
>> >> >> >> Hi Anthony,
>> >> >> >>
>> >> >> >> Sorry to just get back to you. I can send a script; should I send a
>> >> >> >> script that creates some fake data as well?
>> >> >> >>
>> >> >> >> -Dave
>> >> >> >>
>> >> >> >>
>> >> >> >> On Fri, Feb 1, 2013 at 4:50 PM, <pyt...@li...> wrote:
>> >> >> >>
>> >> >> >> > Today's Topics:
>> >> >> >> >
>> >> >> >> > 1. Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony Scopatz)
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > ----------------------------------------------------------------------
>> >> >> >> >
>> >> >> >> > Message: 1
>> >> >> >> > Date: Fri, 1 Feb 2013 15:50:11 -0600
>> >> >> >> > From: Anthony Scopatz <sc...@gm...>
>> >> >> >> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
>> >> >> >> > To: Discussion list for PyTables <pyt...@li...>
>> >> >> >> > Message-ID: <CAP...@ma...>
>> >> >> >> > Content-Type: text/plain; charset="iso-8859-1"
>> >> >> >> >
>> >> >> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
>> >> >> >> >
>> >> >> >> > > at the error:
>> >> >> >> > >
>> >> >> >> > > result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >> >> >> > >
>> >> >> >> > > nrows = 4620 and dtypeField is ('bool', (17, 9600))
>> >> >> >> > >
>> >> >> >> > > I'm not sure what that means as a dtype, but that's what it is.
>> >> >> >> > >
>> >> >> >> > > Forgive me if I'm being totally naive, but I thought the whole
>> >> >> >> > > point of __iter__ with PyTables was to do iteration on the fly,
>> >> >> >> > > so there is no preallocation.
>> >> >> >> > >
>> >> >> >> >
>> >> >> >> > Nope you are not being naive at all. That is the point.
>> >> >> >> >
>> >> >> >> > > If you have any ideas on this I'm all ears.
>> >> >> >> > >
>> >> >> >> >
>> >> >> >> > If you could send a minimal script which reproduces this error,
>> >> >> >> > that would help a lot.
>> >> >> >> >
>> >> >> >> > Be Well
>> >> >> >> > Anthony
>> >> >> >> >
>> >> >> >> > >
>> >> >> >> > > Thanks again.
>> >> >> >> > >
>> >> >> >> > > Dave
>> >> >> >> > >
>> >> >> >> > >
>> >> >> >> > > On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
>> >> >> >> > >
>> >> >> >> > >> Today's Topics:
>> >> >> >> > >>
>> >> >> >> > >> 1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
>> >> >> >> > >>
>> >> >> >> > >>
>> >> >> >> > >> ----------------------------------------------------------------------
>> >> >> >> > >>
>> >> >> >> > >> Message: 1
>> >> >> >> > >> Date: Fri, 1 Feb 2013 14:44:40 -0600
>> >> >> >> > >> From: Anthony Scopatz <sc...@gm...>
>> >> >> >> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>> >> >> >> > >> To: Discussion list for PyTables <pyt...@li...>
>> >> >> >> > >> Message-ID: <CAP...@ma...>
>> >> >> >> > >> Content-Type: text/plain; charset="iso-8859-1"
>> >> >> >> > >>
>> >> >> >> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
>> >> >> >> > >>
>> >> >> >> > >> > Hi Anthony,
>> >> >> >> > >> >
>> >> >> >> > >> > Thanks for the reply.
>> >> >> >> > >> >
>> >> >> >> > >> > I honestly don't know how to monitor my Python memory usage,
>> >> >> >> > >> > but I'm sure that it's caused by running out of memory.
>> >> >> >> > >> >
>> >> >> >> > >>
>> >> >> >> > >> Well, I would just run top or process monitor or something while
>> >> >> >> > >> running the python script to see what happens to memory usage as
>> >> >> >> > >> the script chugs along...
>> >> >> >> > >>
>> >> >> >> > >> > I'm just trying to find out how to fix it. My HDF5 table has
>> >> >> >> > >> > 4620 rows and the column I'm iterating over is a 17x9600
>> >> >> >> > >> > boolean matrix. The __iter__ method is preallocating an array
>> >> >> >> > >> > that is this size, which appears to be the root of the error.
>> >> >> >> > >> > I was hoping there is a fix somewhere in here to not have to
>> >> >> >> > >> > do this preallocation.
>> >> >> >> > >> >
>> >> >> >> > >>
>> >> >> >> > >> So a 17x9600 boolean matrix should only be 0.155 MB in space.
>> >> >> >> > >> 4620 of these is ~760 MB. If you have 2 GB of memory and you are
>> >> >> >> > >> iterating over 2 of these (templates & masks) it is conceivable
>> >> >> >> > >> that you are just running out of memory. Maybe there is a way
>> >> >> >> > >> that __iter__ could not preallocate something that is basically
>> >> >> >> > >> a temporary. What is the dtype of the templates array?
>> >> >> >> > >>
>> >> >> >> > >> Be Well
>> >> >> >> > >> Anthony
>> >> >> >> > >>
>> >> >> >> > >> >
>> >> >> >> > >> > Thanks again.
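
The size estimate quoted above can be checked with plain arithmetic. The sketch below is only a rough check, not anything taken from PyTables itself: it assumes one byte per NumPy bool and a 1 MiB I/O buffer (the thread only says "default 1MB"), and all variable names are made up for illustration. Under those assumptions it reproduces both byte counts from the ValueError reported elsewhere in this thread, plus the 6 rows per buffer that Josh mentions.

```python
# Sizes taken from the thread; one NumPy bool occupies one byte.
ROWS = 4620                   # rows in the irises table
CELL_SHAPE = (17, 20 * 480)   # shape of one templates / masks1 cell
BOOL_BYTES = 1

# Bytes in a single 17 x 9600 boolean cell.
cell_bytes = CELL_SHAPE[0] * CELL_SHAPE[1] * BOOL_BYTES

# Bytes needed to hold the whole column when the shape is honoured.
column_bytes = ROWS * cell_bytes

# Bytes allocated if the column is (wrongly) treated as scalar bools.
scalar_column_bytes = ROWS * BOOL_BYTES

# Assumption: a 1 MiB buffer, standing in for the "default 1MB" IO_BUFFER_SIZE.
IO_BUFFER_SIZE = 1024 * 1024
rows_per_buffer = IO_BUFFER_SIZE // cell_bytes

print("bytes per cell:      %d" % cell_bytes)
print("MiB per cell:        %.3f" % (cell_bytes / 2.0 ** 20))
print("bytes, full column:  %d" % column_bytes)
print("bytes, scalar bools: %d" % scalar_column_bytes)
print("rows per buffer:     %d" % rows_per_buffer)
```

The "got" figure in the error is exactly one byte per row, which is consistent with the column being read as scalar bools rather than as 17 x 9600 cells.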
>> >> >> >> > >> >
>> >> >> >> > >> >
>> >> >> >> > >> > On Fri, Feb 1, 2013 at 11:12 AM, <pyt...@li...> wrote:
>> >> >> >> > >> >
>> >> >> >> > >> >> Today's Topics:
>> >> >> >> > >> >>
>> >> >> >> > >> >> 1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony Scopatz)
>> >> >> >> > >> >>
>> >> >> >> > >> >>
>> >> >> >> > >> >> ----------------------------------------------------------------------
>> >> >> >> > >> >>
>> >> >> >> > >> >> Message: 1
>> >> >> >> > >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600
>> >> >> >> > >> >> From: Anthony Scopatz <sc...@gm...>
>> >> >> >> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9
>> >> >> >> > >> >> To: Discussion list for PyTables <pyt...@li...>
>> >> >> >> > >> >> Message-ID: <CAP...@ma...>
>> >> >> >> > >> >> Content-Type: text/plain; charset="iso-8859-1"
>> >> >> >> > >> >>
>> >> >> >> > >> >> Hi David,
>> >> >> >> > >> >>
>> >> >> >> > >> >> Sorry, I haven't had a ton of time recently. You seem to be
>> >> >> >> > >> >> getting a memory error on creating a numpy array. This kind of
>> >> >> >> > >> >> thing typically happens when you are out of memory. Does this
>> >> >> >> > >> >> seem to be the case with you? When this dies, is your memory
>> >> >> >> > >> >> usage at 100%? If so, this algorithm might require a little
>> >> >> >> > >> >> tweaking...
>> >> >> >> > >> >>
>> >> >> >> > >> >> Be Well
>> >> >> >> > >> >> Anthony
>> >> >> >> > >> >>
>> >> >> >> > >> >>
>> >> >> >> > >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:
>> >> >> >> > >> >>
>> >> >> >> > >> >> > I'm still having problems with this one. I can't tell if this
>> >> >> >> > >> >> > is something dumb I'm doing with itertools, or if it's
>> >> >> >> > >> >> > something in pytables.
>> >> >> >> > >> >> >
>> >> >> >> > >> >> > Would appreciate any help.
>> >> >> >> > >> >> >
>> >> >> >> > >> >> > Thanks
>> >> >> >> > >> >> >
>> >> >> >> > >> >> >
>> >> >> >> > >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:
>> >> >> >> > >> >> >
>> >> >> >> > >> >> >> I think I have to reopen this issue. I have been running fine
>> >> >> >> > >> >> >> for awhile using the combinations method from itertools, but
>> >> >> >> > >> >> >> have recently run into a memory error since I have recently
>> >> >> >> > >> >> >> quadrupled the size of the hdf file.
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> Here is my code again:
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> from itertools import combinations, izip
>> >> >> >> > >> >> >> with tb.openFile(h5_all, 'r') as f:
>> >> >> >> > >> >> >>     irises = f.root.irises
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >>     templates = f.root.irises.cols.templates
>> >> >> >> > >> >> >>     masks = f.root.irises.cols.masks1
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >>     N_irises = len(irises)
>> >> >> >> > >> >> >>     index = np.ones((20 * 480), np.bool)
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> >> >> >> > >> >> >>     D = np.empty((N_irises, N_irises))
>> >> >> >> > >> >> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(
>> >> >> >> > >> >> >>             izip(templates, masks, range(N_irises)), 2):
>> >> >> >> > >> >> >>         # print ii
>> >> >> >> > >> >> >>         D[ii, jj] = ham_dist(
>> >> >> >> > >> >> >>             t1[8, index],
>> >> >> >> > >> >> >>             t2[:, index],
>> >> >> >> > >> >> >>             m1[8, index],
>> >> >> >> > >> >> >>             m2[:, index],
>> >> >> >> > >> >> >>         )
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> And here is the error:
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> In [10]: get_hd3()
>> >> >> >> > >> >> >> 10669890 Comparisons
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> ---------------------------------------------------------------------------
>> >> >> >> > >> >> >> MemoryError                               Traceback (most recent call last)
>> >> >> >> > >> >> >> <ipython-input-10-cfb255ce7bd1> in <module>()
>> >> >> >> > >> >> >> ----> 1 get_hd3()
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >>     118 print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> >> >> >> > >> >> >>     119 D = np.empty((N_irises, N_irises))
>> >> >> >> > >> >> >> --> 120 for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>> >> >> >> > >> >> >>     121     # print ii
>> >> >> >> > >> >> >>     122     D[ii, jj] = ham_dist(
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
>> >> >> >> > >> >> >>    3274 for start_row in xrange(0, len(self), nrowsinbuf):
>> >> >> >> > >> >> >>    3275     end_row = min([start_row + nrowsinbuf, max_row])
>> >> >> >> > >> >> >> -> 3276     buf = table.read(start_row, end_row, 1, field=self.pathname)
>> >> >> >> > >> >> >>    3277     for row in buf:
>> >> >> >> > >> >> >>    3278         yield row
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
>> >> >> >> > >> >> >>    1772 (start, stop, step) = self._processRangeRead(start, stop, step)
>> >> >> >> > >> >> >>    1773
>> >> >> >> > >> >> >> -> 1774 arr = self._read(start, stop, step, field)
>> >> >> >> > >> >> >>    1775 return internal_to_flavor(arr, self.flavor)
>> >> >> >> > >> >> >>    1776
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
>> >> >> >> > >> >> >>    1719 if field:
>> >> >> >> > >> >> >>    1720     # Create a container for the results
>> >> >> >> > >> >> >> -> 1721     result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >> >> >> > >> >> >>    1722 else:
>> >> >> >> > >> >> >>    1723     # Recarray case
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> MemoryError:
>> >> >> >> > >> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
>> >> >> >> > >> >> >>    1720     # Create a container for the results
>> >> >> >> > >> >> >> -> 1721     result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >> >> >> > >> >> >>    1722 else:
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> Also, if you guys see any performance problems in my code,
>> >> >> >> > >> >> >> please let me know.
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> Thank you so much for the help.
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> -Dave
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
>> >> >> >> > >> >> >>
>> >> >> >> > >> >> >>> Today's Topics:
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>> 1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>> ----------------------------------------------------------------------
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>> Message: 1
>> >> >> >> > >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500
>> >> >> >> > >> >> >>> From: David Reed <dav...@gm...>
>> >> >> >> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
>> >> >> >> > >> >> >>> To: pyt...@li...
>> >> >> >> > >> >> >>> Message-ID: <CAM...@ma...>
>> >> >> >> > >> >> >>> Content-Type: text/plain; charset="iso-8859-1"
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>> I can't thank you guys enough for the help. I was able to add
>> >> >> >> > >> >> >>> the __iter__ function to the table.py file and everything
>> >> >> >> > >> >> >>> seems to be working great! I'm not quite as fast as I was with
>> >> >> >> > >> >> >>> iterating right off a matrix but pretty close. I was at 555
>> >> >> >> > >> >> >>> comparisons per second, and now I'm at 420.
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>> I handled the problem I mentioned earlier by doing this, and it seems
>> >> >> >> > >> >> >>> to work great:
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>> A = f.root.data.cols.A
>> >> >> >> > >> >> >>> B = f.root.data.cols.B
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>> D = np.empty((len(A), len(A)))
>> >> >> >> > >> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
>> >> >> >> > >> >> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>> Again, thanks a lot.
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>> -Dave
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
>> >> >> >> > >> >> >>>
>> >> >> >> > >> >> >>> > Today's Topics:
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> >    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
>> >> >> >> > >> >> >>> >    2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz)
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> > ----------------------------------------------------------------------
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> > Message: 1
>> >> >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
>> >> >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...>
>> >> >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>> >> >> >> > >> >> >>> > To: Discussion list for PyTables
>> >> >> >> > >> >> >>> >     <pyt...@li...>
>> >> >> >> > >> >> >>> > Message-ID:
>> >> >> >> > >> >> >>> >     <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=Gz...@ma...>
>> >> >> >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> > > Thanks a lot for the help so far guys!
>> >> >> >> > >> >> >>> > >
>> >> >> >> > >> >> >>> > > Looking at itertools, I found what I believe to be the perfect
>> >> >> >> > >> >> >>> > > function for what I need, itertools.combinations. This appears to
>> >> >> >> > >> >> >>> > > be a valid replacement for the method proposed.
>> >> >> >> > >> >> >>> > >
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> > Yes, combinations is awesome!
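A small sketch of why combinations is the right tool here, contrasted with the two-iterators-offset-by-one idea quoted elsewhere in this thread. It is written for Python 3 (zip instead of izip) and uses a plain list as a hypothetical stand-in for a table column, so nothing in it touches PyTables. The function names are made up for illustration.

```python
from itertools import combinations

# Hypothetical stand-in for a table column; any iterable of rows would do.
rows = ["r0", "r1", "r2", "r3"]


def adjacent_pairs(values):
    """Pair each value with its successor, as two iterators offset by one do."""
    lagging = iter(values)
    leading = iter(values)
    next(leading, None)  # throw the first value away
    return list(zip(lagging, leading))


def all_pairs(values):
    """Every unordered pair exactly once, in (earlier, later) order."""
    return list(combinations(values, 2))


adjacent = adjacent_pairs(rows)
every = all_pairs(rows)
print(len(adjacent), len(every))
```

The offset iterators only ever see neighbouring rows (N-1 pairs), while combinations visits all N(N-1)/2 pairs, which is what a full pairwise comparison needs.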
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> > >
>> >> >> >> > >> >> >>> > > One small problem I didn't mention is that my compare function
>> >> >> >> > >> >> >>> > > actually takes two columns from the table as inputs, like so:
>> >> >> >> > >> >> >>> > >
>> >> >> >> > >> >> >>> > > D = np.empty((N_elements, N_elements))
>> >> >> >> > >> >> >>> > > for ii in xrange(N_elements):
>> >> >> >> > >> >> >>> > >     for jj in xrange(ii+1, N_elements):
>> >> >> >> > >> >> >>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>> >> >> >> > >> >> >>> > >                             data['element2'][ii], data['element2'][jj])
>> >> >> >> > >> >> >>> > >
>> >> >> >> > >> >> >>> > > Is there an efficient way of using itertools with this structure?
>> >> >> >> > >> >> >>> > >
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> > You can always make two other iterators for each column. Since you
>> >> >> >> > >> >> >>> > have two columns you would have 4 iterators. I am not sure how fast
>> >> >> >> > >> >> >>> > this is going to be, but I am confident that there is a way to do
>> >> >> >> > >> >> >>> > this in one for-loop, which is going to be way faster than nested
>> >> >> >> > >> >> >>> > loops.
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> > Be Well
>> >> >> >> > >> >> >>> > Anthony
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> > >
>> >> >> >> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>> >> >> >> > >> >> >>> > >
>> >> >> >> > >> >> >>> > >> Today's Topics:
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >>    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >> ----------------------------------------------------------------------
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >> Message: 1
>> >> >> >> > >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
>> >> >> >> > >> >> >>> > >> From: Josh Ayers <jos...@gm...>
>> >> >> >> > >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> >> >> >> > >> >> >>> > >> To: Discussion list for PyTables
>> >> >> >> > >> >> >>> > >>     <pyt...@li...>
>> >> >> >> > >> >> >>> > >> Message-ID:
>> >> >> >> > >> >> >>> > >>     <CAC...@ma...>
>> >> >> >> > >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1"
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >> David,
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >> The change in issue 27 was only for iteration over a tables.Column
>> >> >> >> > >> >> >>> > >> instance. To use it, tweak Anthony's code as follows. This will
>> >> >> >> > >> >> >>> > >> iterate over the "element" column, as in your original example.
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >> Note also that this will only work with the development version of
>> >> >> >> > >> >> >>> > >> PyTables available on GitHub. It will be very slow using the
>> >> >> >> > >> >> >>> > >> released v2.4.0.
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >> from itertools import izip
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >> with tb.openFile(...) as f:
>> >> >> >> > >> >> >>> > >>     data = f.root.data.cols.element
>> >> >> >> > >> >> >>> > >>     data_i = iter(data)
>> >> >> >> > >> >> >>> > >>     data_j = iter(data)
>> >> >> >> > >> >> >>> > >>     data_i.next()  # throw the first value away
>> >> >> >> > >> >> >>> > >>     for i, j in izip(data_i, data_j):
>> >> >> >> > >> >> >>> > >>         compare(i, j)
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >> Hope that helps,
>> >> >> >> > >> >> >>> > >> Josh
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...>
>> >> >> >> > >> >> >>> > >> wrote:
>> >> >> >> > >> >> >>> > >>
>> >> >> >> > >> >> >>> > >> > Hi David,
>> >> >> >> > >> >> >>> > >> >
>> >> >> >> > >> >> >>> > >> > Tables and table column iteration have been overhauled fairly
>> >> >> >> > >> >> >>> > >> > recently [1]. So you might try creating two iterators, offset by
>> >> >> >> > >> >> >>> > >> > one, and then doing the comparison. I am hacking this out super
>> >> >> >> > >> >> >>> > >> > quick so please forgive me:
>> >> >> >> > >> >> >>> > >> >
>> >> >> >> > >> >> >>> > >> > from itertools import izip
>> >> >> >> > >> >> >>> > >> >
>> >> >> >> > >> >> >>> > >> > with tb.openFile(...) as f:
>> >> >> >> > >> >> >>> > >> >     data = f.root.data
>> >> >> >> > >> >> >>> > >> >     data_i = iter(data)
>> >> >> >> > >> >> >>> > >> >     data_j = iter(data)
>> >> >> >> > >> >> >>> > >> >     data_i.next()  # throw the first value away
>> >> >> >> > >> >> >>> > >> >     for i, j in izip(data_i, data_j):
>> >> >> >> > >> >> >>> > >> >         compare(i, j)
>> >> >> >> > >> >> >>> > >> >
>> >> >> >> > >> >> >>> > >> > You get the idea ;)
>> >> >> >> > >> >> >>> > >> >
>> >> >> >> > >> >> >>> > >> > Be Well
>> >> >> >> > >> >> >>> > >> > Anthony
>> >> >> >> > >> >> >>> > >> >
>> >> >> >> > >> >> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27
>> >> >> >> > >> >> >>> > >> >
>> >> >> >> > >> >> >>> > >> >
>> >> >> >> > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...>
>> >> >> >> > >> >> >>> > >> > wrote:
>> >> >> >> > >> >> >>> > >> >
>> >> >> >> > >> >> >>> > >> >> I was hoping someone could help me out here.
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >> This is from a post I put up on StackOverflow.
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >> I have a fairly large dataset that I store in HDF5 and access
>> >> >> >> > >> >> >>> > >> >> using PyTables. One operation I need to do on this dataset is a
>> >> >> >> > >> >> >>> > >> >> pairwise comparison between each of the elements. This requires
>> >> >> >> > >> >> >>> > >> >> 2 loops, one to iterate over each element, and an inner loop to
>> >> >> >> > >> >> >>> > >> >> iterate over every other element. This operation thus looks at
>> >> >> >> > >> >> >>> > >> >> N(N-1)/2 comparisons.
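To make the N(N-1)/2 count concrete, here is a short sketch (Python 3) that checks the closed form against an explicit enumeration. The row count 4620 is an inference from the "10669890 Comparisons" line printed elsewhere in this thread, not a figure anyone states directly, and n_comparisons is a name made up for this example.

```python
from itertools import combinations


def n_comparisons(n):
    """Number of unordered pairs among n elements: N(N-1)/2."""
    return n * (n - 1) // 2


# Cross-check the closed form against an explicit enumeration for a small N.
small_n = 6
enumerated = sum(1 for _ in combinations(range(small_n), 2))
assert enumerated == n_comparisons(small_n)

# Assumed row count (inferred, see above); gives 10669890.
print(n_comparisons(4620))
```

The count grows quadratically, so going from roughly 700 rows to several thousand multiplies the work by far more than the row ratio.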
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >> For fairly small sets I found it to be faster to dump the
>> >> >> >> > >> >> >>> > >> >> contents into a multidimensional numpy array and then do my
>> >> >> >> > >> >> >>> > >> >> iteration. I run into problems with large sets because of
>> >> >> >> > >> >> >>> > >> >> memory issues and need to access each element of the dataset
>> >> >> >> > >> >> >>> > >> >> at run time.
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >> Putting the elements into an array gives me about 600
>> >> >> >> > >> >> >>> > >> >> comparisons per second, while operating on the HDF5 data itself
>> >> >> >> > >> >> >>> > >> >> gives me about 300 comparisons per second.
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >> Is there a way to speed this process up?
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >> Example follows (this is not my real code, just an example):
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >> *Small Set*:
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
>> >> >> >> > >> >> >>> > >> >>     data = f.root.data
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >>     N_elements = len(data)
>> >> >> >> > >> >> >>> > >> >>     elements = np.empty((N_elements, int(1e5)))
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >>     # copy each row's 'element' field into the in-memory array
>> >> >> >> > >> >> >>> > >> >>     for ii, d in enumerate(data):
>> >> >> >> > >> >> >>> > >> >>         elements[ii] = d['element']
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >> D = np.empty((N_elements, N_elements))
>> >> >> >> > >> >> >>> > >> >> for ii in xrange(N_elements):
>> >> >> >> > >> >> >>> > >> >>     for jj in xrange(ii+1, N_elements):
>> >> >> >> > >> >> >>> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >> *Large Set*:
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
>> >> >> >> > >> >> >>> > >> >>     data = f.root.data
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >>     N_elements = len(data)
>> >> >> >> > >> >> >>> > >> >>
>> >> >> >> > >> >> >>> > >> >>     D = np.empty((N_elements, N_elements))
>> >> >> >> > >> >> >>> > >> >>     for ii in xrange(N_elements):
>> >> >> >> > >> >> >>> > >> >>         for jj in xrange(ii+1, N_elements):
>> >> >> >> > >> >> >>> > >> >>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
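For readers following along, a minimal sketch (Python 3, standard library only) showing that the nested ii/jj loops above and a single loop over itertools.combinations visit exactly the same index pairs. The compare function and the data here are made-up placeholders, not the ones from the thread.

```python
from itertools import combinations


def compare(a, b):
    """Hypothetical stand-in for the thread's compare(); here an absolute difference."""
    return abs(a - b)


elements = [3, 8, 1, 6]  # made-up data in place of data['element']
n = len(elements)

# Nested-loop form, as in the Large Set example: visits every pair with ii < jj.
nested = {}
for ii in range(n):
    for jj in range(ii + 1, n):
        nested[ii, jj] = compare(elements[ii], elements[jj])

# Single-loop form: combinations over (index, value) pairs yields the same ii < jj pairs.
single = {}
for (ii, a), (jj, b) in combinations(enumerate(elements), 2):
    single[ii, jj] = compare(a, b)

assert nested == single
```

Both forms fill only the entries with ii < jj. When the results go into an array created with np.empty, the diagonal and lower triangle are left uninitialized, so they should not be read without being set first.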
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> > ------------------------------
>> >> >> >> > >> >> >>> >
>> >> >> >> > >> >> >>> > Message: 2
>> >> >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600
>> >> >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...>
>> >> >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
... [truncated message content] |