From: David R. <dav...@gm...> - 2013-02-04 15:54:22
Hi Josh,

Here is my __iter__ code:

    def __iter__(self):
        table = self.table
        itemsize = self.dtype.itemsize
        nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
        max_row = len(self)
        for start_row in xrange(0, len(self), nrowsinbuf):
            end_row = min([start_row + nrowsinbuf, max_row])
            buf = table.read(start_row, end_row, 1, field=self.pathname)
            for row in buf:
                yield row

It does look different; I will try swapping in the code from GitHub and see what happens.

On Mon, Feb 4, 2013 at 9:59 AM, <pyt...@li...> wrote:

> Send Pytables-users mailing list submissions to
>     pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>     pyt...@li...
>
> You can reach the person managing the list at
>     pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers)
>    2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 1 Feb 2013 14:08:47 -0800
> From: Josh Ayers <jos...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
> To: Discussion list for PyTables <pyt...@li...>
> Message-ID:
> <CACOB4aPG4NZ6b2a3v=
> 1Ue...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> David,
>
> You added a custom version of table.Column.__iter__, correct? Could you
> also include that along with the script to reproduce the error?
>
> It seems like the problem may be in the 'nrowsinbuf' calculation - see
> [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're using the
> default 1MB value for IO_BUFFER_SIZE, it should be reading in chunks of
> 6 rows. Instead, it's reading the entire table.
>
> [1]:
> https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296
>
>
> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:
>
> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
> >
> >> at the error:
> >>
> >> result = numpy.empty(shape=nrows, dtype=dtypeField)
> >>
> >> nrows = 4620 and dtypeField is ('bool', (17, 9600))
> >>
> >> I'm not sure what that means as a dtype, but that's what it is.
> >>
> >> Forgive me if I'm being totally naive, but I thought the whole point of
> >> __iter__ with pytables was to do iteration on the fly, so there is no
> >> preallocation.
> >
> > Nope you are not being naive at all. That is the point.
> >
> >> If you have any ideas on this I'm all ears.
> >
> > If you could send a minimal script which reproduces this error, that would
> > help a lot.
> >
> > Be Well
> > Anthony
> >
> >> Thanks again.
> >>
> >> Dave
> >>
> >> On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
> >>
> >>> Today's Topics:
> >>>
> >>>    1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
> >>>
> >>> ----------------------------------------------------------------------
> >>>
> >>> Message: 1
> >>> Date: Fri, 1 Feb 2013 14:44:40 -0600
> >>> From: Anthony Scopatz <sc...@gm...>
> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
> >>> To: Discussion list for PyTables <pyt...@li...>
> >>> Message-ID: <CAP...@ma...>
> >>> Content-Type: text/plain; charset="iso-8859-1"
> >>>
> >>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
> >>>
> >>> > Hi Anthony,
> >>> >
> >>> > Thanks for the reply.
> >>> >
> >>> > I honestly don't know how to monitor my Python memory usage, but I'm sure
> >>> > that it's caused by running out of memory.
> >>>
> >>> Well, I would just run top or process monitor or something while running
> >>> the python script to see what happens to memory usage as the script chugs
> >>> along...
> >>>
> >>> > I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
> >>> > and the column I'm iterating over is a 17x9600 boolean matrix. The
> >>> > __iter__ method is preallocating an array that is this size, which appears
> >>> > to be the root of the error. I was hoping there is a fix somewhere in here to
> >>> > not have to do this preallocation.
> >>>
> >>> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
> >>> these is ~760 MB. If you have 2 GB of memory and you are iterating over 2
> >>> of these (templates & masks) it is conceivable that you are just running
> >>> out of memory. Maybe there is a way that __iter__ could not preallocate
> >>> something that is basically a temporary. What is the dtype of the
> >>> templates array?
> >>>
> >>> Be Well
> >>> Anthony
> >>>
> >>> > Thanks again.
>
> ------------------------------
>
> Message: 2
> Date: Mon, 4 Feb 2013 09:58:53 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6
> To: pyt...@li...
> Message-ID:
> <CAM6XA7=
> h50...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Anthony,
>
> Sorry to just get back to you. I can send a script; should I send a script
> that creates some fake data as well?
>
> -Dave
>
>
> On Fri, Feb 1, 2013 at 4:50 PM, <pyt...@li...> wrote:
>
> > Today's Topics:
> >
> >    1. Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony Scopatz)
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Fri, 1 Feb 2013 15:50:11 -0600
> > From: Anthony Scopatz <sc...@gm...>
> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
> > To: Discussion list for PyTables <pyt...@li...>
> > Message-ID: <CAP...@ma...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
> >
> > > at the error:
> > >
> > > result = numpy.empty(shape=nrows, dtype=dtypeField)
> > >
> > > nrows = 4620 and dtypeField is ('bool', (17, 9600))
> > >
> > > I'm not sure what that means as a dtype, but that's what it is.
> > >
> > > Forgive me if I'm being totally naive, but I thought the whole point of
> > > __iter__ with pytables was to do iteration on the fly, so there is no
> > > preallocation.
> >
> > Nope you are not being naive at all. That is the point.
> >
> > > If you have any ideas on this I'm all ears.
> >
> > If you could send a minimal script which reproduces this error, that would
> > help a lot.
> >
> > Be Well
> > Anthony
> >
> > > Thanks again.
> > >
> > > Dave
> > >
> > > On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
> > >
> > >> Today's Topics:
> > >>
> > >>    1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
> > >>
> > >> ----------------------------------------------------------------------
> > >>
> > >> Message: 1
> > >> Date: Fri, 1 Feb 2013 14:44:40 -0600
> > >> From: Anthony Scopatz <sc...@gm...>
> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
> > >> To: Discussion list for PyTables <pyt...@li...>
> > >> Message-ID: <CAP...@ma...>
> > >> Content-Type: text/plain; charset="iso-8859-1"
> > >>
> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
> > >>
> > >> > Hi Anthony,
> > >> >
> > >> > Thanks for the reply.
> > >> >
> > >> > I honestly don't know how to monitor my Python memory usage, but I'm sure
> > >> > that it's caused by running out of memory.
> > >>
> > >> Well, I would just run top or process monitor or something while running
> > >> the python script to see what happens to memory usage as the script chugs
> > >> along...
> > >>
> > >> > I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
> > >> > and the column I'm iterating over is a 17x9600 boolean matrix. The
> > >> > __iter__ method is preallocating an array that is this size, which appears
> > >> > to be the root of the error. I was hoping there is a fix somewhere in here to
> > >> > not have to do this preallocation.
> > >>
> > >> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
> > >> these is ~760 MB. If you have 2 GB of memory and you are iterating over 2
> > >> of these (templates & masks) it is conceivable that you are just running
> > >> out of memory. Maybe there is a way that __iter__ could not preallocate
> > >> something that is basically a temporary. What is the dtype of the
> > >> templates array?
> > >>
> > >> Be Well
> > >> Anthony
> > >>
> > >> > Thanks again.
> > >> >
> > >> > On Fri, Feb 1, 2013 at 11:12 AM, <pyt...@li...> wrote:
> > >> >
> > >> >> Today's Topics:
> > >> >>
> > >> >>    1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony Scopatz)
> > >> >>
> > >> >> ----------------------------------------------------------------------
> > >> >>
> > >> >> Message: 1
> > >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600
> > >> >> From: Anthony Scopatz <sc...@gm...>
> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9
> > >> >> To: Discussion list for PyTables <pyt...@li...>
> > >> >> Message-ID: <CAP...@ma...>
> > >> >> Content-Type: text/plain; charset="iso-8859-1"
> > >> >>
> > >> >> Hi David,
> > >> >>
> > >> >> Sorry, I haven't had a ton of time recently. You seem to be getting a
> > >> >> memory error on creating a numpy array. This kind of thing typically
> > >> >> happens when you are out of memory. Does this seem to be the case with
> > >> >> you? When this dies, is your memory usage at 100%? If so, this algorithm
> > >> >> might require a little tweaking...
> > >> >>
> > >> >> Be Well
> > >> >> Anthony
> > >> >>
> > >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:
> > >> >>
> > >> >> > I'm still having problems with this one. I can't tell if this is something
> > >> >> > dumb I'm doing with itertools, or if it's something in pytables.
> > >> >> >
> > >> >> > Would appreciate any help.
> > >> >> >
> > >> >> > Thanks
> > >> >> >
> > >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:
> > >> >> >
> > >> >> >> I think I have to reopen this issue. I have been running fine for a while
> > >> >> >> using the combinations method from itertools, but have recently run into a
> > >> >> >> memory error since I have recently quadrupled the size of the hdf file.
> > >> >> >>
> > >> >> >> Here is my code again:
> > >> >> >>
> > >> >> >> from itertools import combinations, izip
> > >> >> >> with tb.openFile(h5_all, 'r') as f:
> > >> >> >>     irises = f.root.irises
> > >> >> >>
> > >> >> >>     templates = f.root.irises.cols.templates
> > >> >> >>     masks = f.root.irises.cols.masks1
> > >> >> >>
> > >> >> >>     N_irises = len(irises)
> > >> >> >>     index = np.ones((20 * 480), np.bool)
> > >> >> >>
> > >> >> >>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> > >> >> >>     D = np.empty((N_irises, N_irises))
> > >> >> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> > >> >> >>         # print ii
> > >> >> >>         D[ii, jj] = ham_dist(
> > >> >> >>             t1[8, index],
> > >> >> >>             t2[:, index],
> > >> >> >>             m1[8, index],
> > >> >> >>             m2[:, index],
> > >> >> >>         )
> > >> >> >>
> > >> >> >> And here is the error:
> > >> >> >>
> > >> >> >> In [10]: get_hd3()
> > >> >> >> 10669890 Comparisons
> > >> >> >> ---------------------------------------------------------------------------
> > >> >> >> MemoryError                               Traceback (most recent call last)
> > >> >> >> <ipython-input-10-cfb255ce7bd1> in <module>()
> > >> >> >> ----> 1 get_hd3()
> > >> >> >>
> > >> >> >>     118     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> > >> >> >>     119     D = np.empty((N_irises, N_irises))
> > >> >> >> --> 120     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> > >> >> >>     121         # print ii
> > >> >> >>     122         D[ii, jj] = ham_dist(
> > >> >> >>
> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
> > >> >> >>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
> > >> >> >>    3275             end_row = min([start_row + nrowsinbuf, max_row])
> > >> >> >> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
> > >> >> >>    3277             for row in buf:
> > >> >> >>    3278                 yield row
> > >> >> >>
> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
> > >> >> >>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
> > >> >> >>    1773
> > >> >> >> -> 1774         arr = self._read(start, stop, step, field)
> > >> >> >>    1775         return internal_to_flavor(arr, self.flavor)
> > >> >> >>    1776
> > >> >> >>
> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
> > >> >> >>    1719         if field:
> > >> >> >>    1720             # Create a container for the results
> > >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> > >> >> >>    1722         else:
> > >> >> >>    1723             # Recarray case
> > >> >> >>
> > >> >> >> MemoryError:
> > >> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
> > >> >> >>    1720             # Create a container for the results
> > >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> > >> >> >>    1722         else:
> > >> >> >>
> > >> >> >> Also, if you guys see any performance problems in my code, please let me
> > >> >> >> know.
> > >> >> >>
> > >> >> >> Thank you so much for the help.
> > >> >> >>
> > >> >> >> -Dave
> > >> >> >>
> > >> >> >> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
> > >> >> >>
> > >> >> >>> Today's Topics:
> > >> >> >>>
> > >> >> >>>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
> > >> >> >>>
> > >> >> >>> ----------------------------------------------------------------------
> > >> >> >>>
> > >> >> >>> Message: 1
> > >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500
> > >> >> >>> From: David Reed <dav...@gm...>
> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
> > >> >> >>> To: pyt...@li...
> > >> >> >>> Message-ID: <CAM...@ma...>
> > >> >> >>> Content-Type: text/plain; charset="iso-8859-1"
> > >> >> >>>
> > >> >> >>> I can't thank you guys enough for the help. I was able to add the __iter__
> > >> >> >>> function to the table.py file and everything seems to be working great!
> > >> >> >>> I'm not quite as fast as I was with iterating right off a matrix, but pretty
> > >> >> >>> close. I was at 555 comparisons per second, and now I'm at 420.
> > >> >> >>>
> > >> >> >>> I handled the problem I mentioned earlier by doing this, and it seems to
> > >> >> >>> work great:
> > >> >> >>>
> > >> >> >>> A = f.root.data.cols.A
> > >> >> >>> B = f.root.data.cols.B
> > >> >> >>>
> > >> >> >>> D = np.empty((len(A), len(A)))
> > >> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
> > >> >> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
> > >> >> >>>
> > >> >> >>> Again, thanks a lot.
> > >> >> >>>
> > >> >> >>> -Dave
> > >> >> >>>
> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
> > >> >> >>>
> > >> >> >>> > Today's Topics:
> > >> >> >>> >
> > >> >> >>> >    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
> > >> >> >>> >    2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz)
> > >> >> >>> >
> > >> >> >>> > ----------------------------------------------------------------------
> > >> >> >>> >
> > >> >> >>> > Message: 1
> > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
> > >> >> >>> > From: Anthony Scopatz <sc...@gm...>
> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> > >> >> >>> > To: Discussion list for PyTables <pyt...@li...>
> > >> >> >>> > Message-ID:
> > >> >> >>> > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=
> > >> >> >>> > Gz...@ma...>
> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
> > >> >> >>> >
> > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
> > >> >> >>> >
> > >> >> >>> > > Thanks a lot for the help so far guys!
> > >> >> >>> > >
> > >> >> >>> > > Looking at itertools, I found what I believe to be the perfect function
> > >> >> >>> > > for what I need, itertools.combinations. This appears to be a valid
> > >> >> >>> > > replacement to the method proposed.
> > >> >> >>> >
> > >> >> >>> > Yes, combinations is awesome!
> > >> >> >>> >
> > >> >> >>> > > A small problem that I didn't mention is that my compare function
> > >> >> >>> > > actually takes as inputs 2 columns from the table. Like so:
> > >> >> >>> > >
> > >> >> >>> > > D = np.empty((N_elements, N_elements))
> > >> >> >>> > > for ii in xrange(N_elements):
> > >> >> >>> > >     for jj in xrange(ii+1, N_elements):
> > >> >> >>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> > >> >> >>> > >                             data['element2'][ii], data['element2'][jj])
> > >> >> >>> > >
> > >> >> >>> > > Is there an efficient way of using itertools with this structure?
> > >> >> >>> >
> > >> >> >>> > You can always make two other iterators for each column. Since you have
> > >> >> >>> > two columns you would have 4 iterators. I am not sure how fast this is
> > >> >> >>> > going to be but I am confident that there is definitely a way to do this in
> > >> >> >>> > one for-loop, which is going to be way faster than nested loops.
> > >> >> >>> >
> > >> >> >>> > Be Well
> > >> >> >>> > Anthony
> > >> >> >>> >
> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> > >> >> >>> > >
> > >> >> >>> > >> Today's Topics:
> > >> >> >>> > >>
> > >> >> >>> > >>    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
> > >> >> >>> > >>
> > >> >> >>> > >> ----------------------------------------------------------------------
> > >> >> >>> > >>
> > >> >> >>> > >> Message: 1
> > >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
> > >> >> >>> > >> From: Josh Ayers <jos...@gm...>
> > >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> > >> >> >>> > >> To: Discussion list for PyTables <pyt...@li...>
> > >> >> >>> > >> Message-ID: <CAC...@ma...>
> > >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1"
> > >> >> >>> > >>
> > >> >> >>> > >> David,
> > >> >> >>> > >>
> > >> >> >>> > >> The change in issue 27 was only for iteration over a tables.Column
> > >> >> >>> > >> instance. To use it, tweak Anthony's code as follows. This will iterate
> > >> >> >>> > >> over the "element" column, as in your original example.
> > >> >> >>> > >>
> > >> >> >>> > >> Note also that this will only work with the development version of
> > >> >> >>> > >> PyTables available on github. It will be very slow using the released
> > >> >> >>> > >> v2.4.0.
> > >> >> >>> > >>
> > >> >> >>> > >> from itertools import izip
> > >> >> >>> > >>
> > >> >> >>> > >> with tb.openFile(...) as f:
> > >> >> >>> > >>     data = f.root.data.cols.element
> > >> >> >>> > >>     data_i = iter(data)
> > >> >> >>> > >>     data_j = iter(data)
> > >> >> >>> > >>     data_i.next()  # throw the first value away
> > >> >> >>> > >>     for i, j in izip(data_i, data_j):
> > >> >> >>> > >>         compare(i, j)
> > >> >> >>> > >>
> > >> >> >>> > >> Hope that helps,
> > >> >> >>> > >> Josh
> > >> >> >>> > >>
> > >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
> > >> >> >>> > >>
> > >> >> >>> > >> > Hi David,
> > >> >> >>> > >> >
> > >> >> >>> > >> > Tables and table column iteration have been overhauled fairly recently
> > >> >> >>> > >> > [1]. So you might try creating two iterators, offset by one, and then
> > >> >> >>> > >> > doing the comparison. I am hacking this out super quick so please forgive
> > >> >> >>> > >> > me:
> > >> >> >>> > >> >
> > >> >> >>> > >> > from itertools import izip
> > >> >> >>> > >> >
> > >> >> >>> > >> > with tb.openFile(...) as f:
> > >> >> >>> > >> >     data = f.root.data
> > >> >> >>> > >> >     data_i = iter(data)
> > >> >> >>> > >> >     data_j = iter(data)
> > >> >> >>> > >> >     data_i.next()  # throw the first value away
> > >> >> >>> > >> >     for i, j in izip(data_i, data_j):
> > >> >> >>> > >> >         compare(i, j)
> > >> >> >>> > >> >
> > >> >> >>> > >> > You get the idea ;)
> > >> >> >>> > >> >
> > >> >> >>> > >> > Be Well
> > >> >> >>> > >> > Anthony
> > >> >> >>> > >> >
> > >> >> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27
> > >> >> >>> > >> >
> > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> > >> >> >>> > >> >
> > >> >> >>> > >> >> I was hoping someone could help me out here.
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> This is from a post I put up on StackOverflow:
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> I have a fairly large dataset that I store in HDF5 and access using
> > >> >> >>> > >> >> PyTables. One operation I need to do on this dataset is pairwise
> > >> >> >>> > >> >> comparison between each of the elements. This requires 2 loops, one to
> > >> >> >>> > >> >> iterate over each element, and an inner loop to iterate over every other
> > >> >> >>> > >> >> element. This operation thus looks at N(N-1)/2 comparisons.
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> For fairly small sets I found it to be faster to dump the contents into a
> > >> >> >>> > >> >> multidimensional numpy array and then do my iteration. I run into problems
> > >> >> >>> > >> >> with large sets because of memory issues and need to access each element of
> > >> >> >>> > >> >> the dataset at run time.
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> Putting the elements into an array gives me about 600 comparisons per
> > >> >> >>> > >> >> second, while operating on hdf5 data itself gives me about 300 comparisons
> > >> >> >>> > >> >> per second.
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> Is there a way to speed this process up?
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> Example follows (this is not my real code, just an example):
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> *Small Set*:
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> > >> >> >>> > >> >>     data = f.root.data
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>     N_elements = len(data)
> > >> >> >>> > >> >>     elements = np.empty((N_elements, 100000))
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>     for ii, d in enumerate(data):
> > >> >> >>> > >> >>         elements[ii] = d['element']
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> D = np.empty((N_elements, N_elements))
> > >> >> >>> > >> >> for ii in xrange(N_elements):
> > >> >> >>> > >> >>     for jj in xrange(ii+1, N_elements):
> > >> >> >>> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> *Large Set*:
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> > >> >> >>> > >> >>     data = f.root.data
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>     N_elements = len(data)
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>     D = np.empty((N_elements, N_elements))
> > >> >> >>> > >> >>     for ii in xrange(N_elements):
> > >> >> >>> > >> >>         for jj in xrange(ii+1, N_elements):
> > >> >> >>> > >> >>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> _______________________________________________
> > >> >> >>> > >> >> Pytables-users mailing list
> > >> >> >>> > >> >> Pyt...@li...
> > >> >> >>> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users
> > >> >> >>> > >>
> > >> >> >>> > >> ------------------------------
> > >> >> >>> > >>
> > >> >> >>> > >> End of Pytables-users Digest, Vol 80, Issue 3
> > >> >> >>> > >> *********************************************
> > >> >> >>> > > > >> >> >>> > ------------------------------ > > >> >> >>> > > > >> >> >>> > Message: 2 > > >> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600 > > >> >> >>> > From: Anthony Scopatz <sc...@gm...> > > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, > > >> Issue 4 > > >> >> >>> > To: Discussion list for PyTables > > >> >> >>> > <pyt...@li...> > > >> >> >>> > Message-ID: > > >> >> >>> > < > > >> >> >>> > > > >> CAP...@ma...> > > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" > > >> >> >>> > > > >> >> >>> > Josh is right that you can just edit the code by hand (which > > >> works > > >> >> but > > >> >> >>> > sucks). > > >> >> >>> > > > >> >> >>> > However, on Windows -- on the rare occasion when I also have > to > > >> >> >>> develop on > > >> >> >>> > it -- I typically use a distribution that includes a > compiler, > > >> >> cython, > > >> >> >>> > hdf5, and pytables already and then I install my development > > >> version > > >> >> >>> from > > >> >> >>> > github OVER this. I recommend either EPD or Anaconda, though > > >> other > > >> >> >>> > distributions listed here [1] might also work. > > >> >> >>> > > > >> >> >>> > Be well > > >> >> >>> > Anthony > > >> >> >>> > > > >> >> >>> > 1. http://numfocus.org/projects-2/software-distributions/ > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers < > > jos...@gm... > > >> > > > >> >> >>> wrote: > > >> >> >>> > > > >> >> >>> > > The change was in pure Python code, so you should be able > to > > >> just > > >> >> >>> paste > > >> >> >>> > in > > >> >> >>> > > the changes to your local copy. Start with the > > >> >> table.Column.__iter__ > > >> >> >>> > > method (lines 3296-3310) here. 
> > >> >> >>> > > > > >> >> >>> > > > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py > > >> >> >>> > > > > >> >> >>> > > It needs to be modified slightly because it uses some > > >> additional > > >> >> >>> features > > >> >> >>> > > that aren't available in the released version (the > > >> out=buf_slice > > >> >> >>> argument > > >> >> >>> > > to table.read). The following should work. > > >> >> >>> > > > > >> >> >>> > > def __iter__(self): > > >> >> >>> > > table = self.table > > >> >> >>> > > itemsize = self.dtype.itemsize > > >> >> >>> > > nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] > > // > > >> >> >>> itemsize > > >> >> >>> > > max_row = len(self) > > >> >> >>> > > for start_row in xrange(0, len(self), nrowsinbuf): > > >> >> >>> > > end_row = min([start_row + nrowsinbuf, > max_row]) > > >> >> >>> > > buf = table.read(start_row, end_row, 1, > > >> >> >>> field=self.pathname) > > >> >> >>> > > for row in buf: > > >> >> >>> > > yield row > > >> >> >>> > > > > >> >> >>> > > > > >> >> >>> > > I haven't tested this, but I think it will work. > > >> >> >>> > > > > >> >> >>> > > Josh > > >> >> >>> > > > > >> >> >>> > > > > >> >> >>> > > > > >> >> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed < > > >> >> dav...@gm...> > > >> >> >>> > wrote: > > >> >> >>> > > > > >> >> >>> > >> I apologize if I'm starting to sound helpless, but I'm > > forced > > >> to > > >> >> >>> work on > > >> >> >>> > >> Windows 7 at work and have never had luck compiling python > > >> source > > >> >> >>> > >> successfully. I have had to rely on precompiled binaries > > and > > >> now > > >> >> >>> its > > >> >> >>> > >> biting me in the butt. > > >> >> >>> > >> > > >> >> >>> > >> Is there any quick fix I can do to improve this iteration > > >> using > > >> >> >>> v2.4.0? 
> > >> >> >>> > >> > > >> >> >>> > >> > > >> >> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, < > > >> >> >>> > >> pyt...@li...> wrote: > > >> >> >>> > >> > > >> >> >>> > >>> Send Pytables-users mailing list submissions to > > >> >> >>> > >>> pyt...@li... > > >> >> >>> > >>> > > >> >> >>> > >>> To subscribe or unsubscribe via the World Wide Web, visit > > >> >> >>> > >>> > > >> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> >> >>> > >>> or, via email, send a message with subject or body 'help' > > to > > >> >> >>> > >>> pyt...@li... > > >> >> >>> > >>> > > >> >> >>> > >>> You can reach the person managing the list at > > >> >> >>> > >>> pyt...@li... > > >> >> >>> > >>> > > >> >> >>> > >>> When replying, please edit your Subject line so it is > more > > >> >> specific > > >> >> >>> > >>> than "Re: Contents of Pytables-users digest..." > > >> >> >>> > >>> > > >> >> >>> > >>> > > >> >> >>> > >>> Today's Topics: > > >> >> >>> > >>> > > >> >> >>> > >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 (David > > Reed) > > >> >> >>> > >>> 2. Re: Pytables-users Digest, Vol 80, Issue 3 (David > > Reed) > > >> >> >>> > >>> > > >> >> >>> > >>> > > >> >> >>> > >>> > > >> >> >>> > > >> ---------------------------------------------------------------------- > > >> >> >>> > >>> > > >> >> >>> > >>> Message: 1 > > >> >> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 > > >> >> >>> > >>> From: David Reed <dav...@gm...> > > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol > > 80, > > >> >> Issue > > >> >> >>> 2 > > >> >> >>> > >>> To: pyt...@li... 
> > >> >> >>> > >>> Message-ID: > > >> >> >>> > >>> <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha= > > >> >> >>> > >>> ev...@ma...> > > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1" > > >> >> >>> > >>> > > >> >> >>> > >>> Thanks Anthony, but unless Im missing something I don't > > think > > >> >> that > > >> >> >>> > method > > >> >> >>> > >>> will work since this will only be comparing the ith > element > > >> with > > >> >> >>> ith+1 > > >> >> >>> > >>> element. I still need 2 for loops right? > > >> >> >>> > >>> > > >> >> >>> > >>> Using itertools might speed things up though, I've never > > used > > >> >> them > > >> >> >>> so I > > >> >> >>> > >>> will give it a shot and let you know how it goes. Looks > > >> like I > > >> >> >>> need to > > >> >> >>> > >>> download the latest release before I do that too. Thanks > > for > > >> >> the > > >> >> >>> help. > > >> >> >>> > >>> > > >> >> >>> > >>> -Dave > > >> >> >>> > >>> > > >> >> >>> > >>> > > >> >> >>> > >>> > > >> >> >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, < > > >> >> >>> > >>> pyt...@li...> wrote: > > >> >> >>> > >>> > > >> >> >>> > >>> > Send Pytables-users mailing list submissions to > > >> >> >>> > >>> > pyt...@li... > > >> >> >>> > >>> > > > >> >> >>> > >>> > To subscribe or unsubscribe via the World Wide Web, > visit > > >> >> >>> > >>> > > > >> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> >> >>> > >>> > or, via email, send a message with subject or body > 'help' > > >> to > > >> >> >>> > >>> > pyt...@li... > > >> >> >>> > >>> > > > >> >> >>> > >>> > You can reach the person managing the list at > > >> >> >>> > >>> > pyt...@li... > > >> >> >>> > >>> > > > >> >> >>> > >>> > When replying, please edit your Subject line so it is > > more > > >> >> >>> specific > > >> >> >>> > >>> > than "Re: Contents of Pytables-users digest..." > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > Today's Topics: > > >> >> >>> > >>> > > > >> >> >>> > >>> > 1. 
Re: Nested Iteration of HDF5 using PyTables > > (Anthony > > >> >> >>> Scopatz) > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > > > >> >> > > ---------------------------------------------------------------------- > > >> >> >>> > >>> > > > >> >> >>> > >>> > Message: 1 > > >> >> >>> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 > > >> >> >>> > >>> > From: Anthony Scopatz <sc...@gm...> > > >> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 > > >> using > > >> >> >>> PyTables > > >> >> >>> > >>> > To: Discussion list for PyTables > > >> >> >>> > >>> > <pyt...@li...> > > >> >> >>> > >>> > Message-ID: > > >> >> >>> > >>> > <CAPk-6T5b= > > >> >> >>> > >>> > > 1EG...@ma... > > > > > >> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1" > > >> >> >>> > >>> > > > >> >> >>> > >>> > HI David, > > >> >> >>> > >>> > > > >> >> >>> > >>> > Tables and table column iteration have been overhauled > > >> fairly > > >> >> >>> > recently > > >> >> >>> > >>> [1]. > > >> >> >>> > >>> > So you might try creating two iterators, offset by > one, > > >> and > > >> >> then > > >> >> >>> > >>> doing the > > >> >> >>> > >>> > comparison. I am hacking this out super quick so > please > > >> >> forgive > > >> >> >>> me: > > >> >> >>> > >>> > > > >> >> >>> > >>> > from itertools import izip > > >> >> >>> > >>> > > > >> >> >>> > >>> > with tb.openFile(...) as f: > > >> >> >>> > >>> > data = f.root.data > > >> >> >>> > >>> > data_i = iter(data) > > >> >> >>> > >>> > data_j = iter(data) > > >> >> >>> > >>> > data_i.next() # throw the first value away > > >> >> >>> > >>> > for i, j in izip(data_i, data_j): > > >> >> >>> > >>> > compare(i, j) > > >> >> >>> > >>> > > > >> >> >>> > >>> > You get the idea ;) > > >> >> >>> > >>> > > > >> >> >>> > >>> > Be Well > > >> >> >>> > >>> > Anthony > > >> >> >>> > >>> > > > >> >> >>> > >>> > 1. 
https://github.com/PyTables/PyTables/issues/27 > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < > > >> >> >>> dav...@gm...> > > >> >> >>> > >>> wrote: > > >> >> >>> > >>> > > > >> >> >>> > >>> > > I was hoping someone could help me out here. > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > This is from a post I put up on StackOverflow, > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > I am have a fairly large dataset that I store in HDF5 > > and > > >> >> >>> access > > >> >> >>> > >>> using > > >> >> >>> > >>> > > PyTables. One operation I need to do on this dataset > > are > > >> >> >>> pairwise > > >> >> >>> > >>> > > comparisons between each of the elements. This > > requires 2 > > >> >> >>> loops, > > >> >> >>> > one > > >> >> >>> > >>> to > > >> >> >>> > >>> > > iterate over each element, and an inner loop to > iterate > > >> over > > >> >> >>> every > > >> >> >>> > >>> other > > >> >> >>> > >>> > > element. This operation thus looks at N(N-1)/2 > > >> comparisons. > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > For fairly small sets I found it to be faster to dump > > the > > >> >> >>> contents > > >> >> >>> > >>> into a > > >> >> >>> > >>> > > multdimensional numpy array and then do my > iteration. I > > >> run > > >> >> >>> into > > >> >> >>> > >>> problems > > >> >> >>> > >>> > > with large sets because of memory issues and need to > > >> access > > >> >> >>> each > > >> >> >>> > >>> element > > >> >> >>> > >>> > of > > >> >> >>> > >>> > > the dataset at run time. > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > Putting the elements into an array gives me about 600 > > >> >> >>> comparisons > > >> >> >>> > per > > >> >> >>> > >>> > > second, while operating on hdf5 data itself gives me > > >> about > > >> >> 300 > > >> >> >>> > >>> > comparisons > > >> >> >>> > >>> > > per second. > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > Is there a way to speed this process up? 
> > >> >> >>> > >>> > > > > >> >> >>> > >>> > > Example follows (this is not my real code, just an > > >> example): > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > *Small Set*: > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f: > > >> >> >>> > >>> > > data = f.root.data > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > N_elements = len(data) > > >> >> >>> > >>> > > elements = np.empty((N_irises, 1e5)) > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > for ii, d in enumerate(data): > > >> >> >>> > >>> > > elements[ii] = data['element'] > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > D = np.empty((N_irises, N_irises)) for ii in > > >> >> >>> xrange(N_elements): > > >> >> >>> > >>> > > for jj in xrange(ii+1, N_elements): > > >> >> >>> > >>> > > D[ii, jj] = compare(elements[ii], > elements[jj]) > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > *Large Set*: > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f: > > >> >> >>> > >>> > > data = f.root.data > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > N_elements = len(data) > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > D = np.empty((N_irises, N_irises)) > > >> >> >>> > >>> > > for ii in xrange(N_elements): > > >> >> >>> > >>> > > for jj in xrange(ii+1, N_elements): > > >> >> >>> > >>> > > D[ii, jj] = compare(data['element'][ii], > > >> >> >>> > >>> > data['element'][jj]) > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > > >> >> >>> > >>> > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > ------------------------------------------------------------------------------ > > >> >> >>> > >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# > > 2012, > > >> >> >>> HTML5, > > >> >> >>> > CSS, > > >> >> >>> > >>> > > MVC, Windows 8 Apps, JavaScript and much more. 
> > >> >> >>> > >>> > > > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > End of Pytables-users Digest, Vol 80, Issue 2 > > >> >> >>> > >>> > ********************************************* > > >> >> >>> > >>> > > > >> >> >>> > >>> -------------- next part -------------- > > >> >> >>> > >>> An HTML attachment was scrubbed... > > >> >> >>> > >>> > > >> >> >>> > >>> ------------------------------ > > >> >> >>> > >>> > > >> >> >>> > >>> Message: 2 > > >> >> >>> > >>> Date: Thu, 3 Jan 2013 15:17:01 -0500 > > >> >> >>> > >>> From: David Reed <dav...@gm...> > > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol > > 80, > > >> >> Issue > > >> >> >>> 3 > > >> >> >>> > >>> To: pyt...@li... > > >> >> >>> > >>> Message-ID: > > >> >> >>> > >>> < > > >> >> >>> > >>> > > >> >> CAM...@ma... > > >> >> >>> > > > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1" > > >> >> >>> > >>> > > >> >> >>> > >>> Thanks a lot for the help so far guys! > > >> >> >>> > >>> > > >> >> >>> > >>> Looking at itertools, I found what I believe to be the > > >> perfect > > >> >> >>> function > > >> >> >>> > >>> for > > >> >> >>> > >>> what I need, itertools.combinations. This appears to be a > > >> valid > > >> >> >>> > >>> replacement > > >> >> >>> > >>> to the method proposed. > > >> >> >>> > >>> > > >> >> >>> > >>> There is a small problem that I didn't mention is that my > > >> >> compare > > >> >> >>> > >>> function > > >> >> >>> > >>> actually takes as inputs 2 columns from the table. 
Like > so: > > >> >> >>> > >>> > > >> >> >>> > >>> D = np.empty((N_irises, N_irises)) > > >> >> >>> > >>> for ii in xrange(N_elements): > > >> >> >>> > >>> for jj in xrange(ii+1, N_elements): > > >> >> >>> > >>> D[ii, jj] = compare(data['element1'][ii], > > >> >> >>> > >>> data['element1'][jj],data['element2'][ii], > > >> >> >>> > >>> data['element2'][jj]) > > >> >> >>> > >>> > > >> >> >>> > >>> Is there an efficient way of using itertools with this > > >> >> structure? > > >> >> >>> > >>> > > >> >> >>> > >>> > > >> >> >>> > >>> On Thu, Jan 3, 2013 at 1:29 PM, < > > >> >> >>> > >>> pyt...@li...> wrote: > > >> >> >>> > >>> > > >> >> >>> > >>> > Send Pytables-users mailing list submissions to > > >> >> >>> > >>> > pyt...@li... > > >> >> >>> > >>> > > > >> >> >>> > >>> > To subscribe or unsubscribe via the World Wide Web, > visit > > >> >> >>> > >>> > > > >> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> >> >>> > >>> > or, via email, send a message with subject or body > 'help' > > >> to > > >> >> >>> > >>> > pyt...@li... > > >> >> >>> > >>> > > > >> >> >>> > >>> > You can reach the person managing the list at > > >> >> >>> > >>> > pyt...@li... > > >> >> >>> > >>> > > > >> >> >>> > >>> > When replying, please edit your Subject line so it is > > more > > >> >> >>> specific > > >> >> >>> > >>> > than "Re: Contents of Pytables-users digest..." > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > Today's Topics: > > >> >> >>> > >>> > > > >> >> >>> > >>> > 1. 
Re: Nested Iteration of HDF5 using PyTables (Josh > > >> Ayers) > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > > > >> >> > > ---------------------------------------------------------------------- > > >> >> >>> > >>> > > > >> >> >>> > >>> > Message: 1 > > >> >> >>> > >>> > Date: Thu, 3 Jan 2013 10:29:33 -0800 > > >> >> >>> > >>> > From: Josh Ayers <jos...@gm...> > > >> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 > > >> using > > >> >> >>> PyTables > > >> >> >>> > >>> > To: Discussion list for PyTables > > >> >> >>> > >>> > <pyt...@li...> > > >> >> >>> > >>> > Message-ID: > > >> >> >>> > >>> > < > > >> >> >>> > >>> > > > >> >> >>> > > CAC...@ma... > > >> > > > >> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1" > > >> >> >>> > >>> > > > >> >> >>> > >>> > David, > > >> >> >>> > >>> > > > >> >> >>> > >>> > The change in issue 27 was only for iteration over a > > >> >> >>> tables.Column > > >> >> >>> > >>> > instance. To use it, tweak Anthony's code as follows. > > >> This > > >> >> will > > >> >> >>> > >>> iterate > > >> >> >>> > >>> > over the "element" column, as in your original example. > > >> >> >>> > >>> > > > >> >> >>> > >>> > Note also that this will only work with the development > > >> >> version > > >> >> >>> of > > >> >> >>> > >>> PyTables > > >> >> >>> > >>> > available on github. It will be very slow using the > > >> released > > >> >> >>> v2.4.0. > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > from itertools import izip > > >> >> >>> > >>> > > > >> >> >>> > >>> > with tb.openFile(...) 
as f: > > >> >> >>> > >>> > data = f.root.data.cols.element > > >> >> >>> > >>> > data_i = iter(data) > > >> >> >>> > >>> > data_j = iter(data) > > >> >> >>> > >>> > data_i.next() # throw the first value away > > >> >> >>> > >>> > for i, j in izip(data_i, data_j): > > >> >> >>> > >>> > compare(i, j) > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > Hope that helps, > > >> >> >>> > >>> > Josh > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz < > > >> >> >>> sc...@gm...> > > >> >> >>> > >>> wrote: > > >> >> >>> > >>> > > > >> >> >>> > >>> > > HI David, > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > Tables and table column iteration have been > overhauled > > >> >> fairly > > >> >> >>> > >>> recently > > >> >> >>> > >>> > > [1]. So you might try creating two iterators, offset > > by > > >> >> one, > > >> >> >>> and > > >> >> >>> > >>> then > > >> >> >>> > >>> > > doing the comparison. I am hacking this out super > > quick > > >> so > > >> >> >>> please > > >> >> >>> > >>> > forgive > > >> >> >>> > >>> > > me: > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > from itertools import izip > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > with tb.openFile(...) as f: > > >> >> >>> > >>> > > data = f.root.data > > >> >> >>> > >>> > > data_i = iter(data) > > >> >> >>> > >>> > > data_j = iter(data) > > >> >> >>> > >>> > > data_i.next() # throw the first value away > > >> >> >>> > >>> > > for i, j in izip(data_i, data_j): > > >> >> >>> > >>> > > compare(i, j) > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > You get the idea ;) > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > Be Well > > >> >> >>> > >>> > > Anthony > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > 1. https://github.com/PyTables/PyTables/issues/27 > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > > > >> >> >>> > >>> > > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < > > >> >> >>> dav...@gm... 
> > >> >> >>> > > > > >> >> >>> > >>> > wrote: > > >> >> >>> > >>> > > > > >> >> >>> > >>> > >> I was hoping someone could help me out here. > > >> >> >>> > >>> > >> > > >> >> >>> > >>> > >> This is from a post I put up on StackOverflow, > > >> >> >>> > >>> > >> > > >> >> >>> > >>> > >> I am have a fairly large dataset that I store in > HDF5 > > >> and > > >> >> >>> access > > >> >> >>> > >>> using > > >> >> >>> > >>> > >> PyTables. One operation I need to do on this dataset > > are > > >> >> >>> pairwise > > >> >> >>> > >>> > >> comparisons between each of the elements. This > > requires > > >> 2 > > >> >> >>> loops, > > >> >> >>> > >>> one to > > >> >> >>> > >>> > >> iterate over each element, and an inner loop to > > iterate > > >> >> over > > >> >> >>> every > > >> >> >>> > >>> other > > >> >> >>> > >>> > >> element. This operation thus looks at N(N-1)/2 > > >> comparisons. > > >> >> >>> > >>> > >> > > >> >> >>> > >>> > >> For fairly small sets I found it to be faster to > dump > > >> the > > >> >> >>> contents > > >> >> >>> > >>> into > > >> >> >>> > >>> > a > > >> >> >>> > >>> > >> multdimensional numpy array and then do my > iteration. > > I > > >> run > > >> >> >>> into > > >> >> >>> > >>> > problems > > >> >> >>> > >>> > >> with large sets because of memory issues and need to > > >> access > > >> >> >>> each > > >> >> >>> > >>> > element of > > >> >> >>> > >>> > >> the dataset at run time. > > >> >> >>> > >>> > >> > > >> >> >>> > >>> > >> Putting the elements into an array gives me ... [truncated message content] |
From: David R. <dav...@gm...> - 2013-02-04 14:59:25
|
Hi Anthony, Sorry to just get back to you. I can send a script, should I send a script that creates some fake data as well? -Dave On Fri, Feb 1, 2013 at 4:50 PM, < pyt...@li...> wrote: > Send Pytables-users mailing list submissions to > pyt...@li... > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.sourceforge.net/lists/listinfo/pytables-users > or, via email, send a message with subject or body 'help' to > pyt...@li... > > You can reach the person managing the list at > pyt...@li... > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Pytables-users digest..." > > > Today's Topics: > > 1. Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony Scopatz) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 1 Feb 2013 15:50:11 -0600 > From: Anthony Scopatz <sc...@gm...> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4 > To: Discussion list for PyTables > <pyt...@li...> > Message-ID: > < > CAP...@ma...> > Content-Type: text/plain; charset="iso-8859-1" > > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote: > > > at the error: > > > > result = numpy.empty(shape=nrows, dtype=dtypeField) > > > > nrows = 4620 and dtypeField is ('bool', (17, 9600)) > > > > I'm not sure what that means as a dtype, but thats what it is. > > > > Forgive me if I'm being totally naive, but I thought the whole point of > > __iter__ with pyttables was to do iteration on the fly, so there is no > > preallocation. > > > > Nope you are not being naive at all. That is the point. > > > > If you have any ideas on this I'm all ears. > > > > If you could send a minimal script which reproduces this error, that would > help a lot. > > Be Well > Anthony > > > > > > > > Thanks again. > > > > Dave > > > > > > On Fri, Feb 1, 2013 at 3:45 PM, < > > pyt...@li...> wrote: > > > >> Send Pytables-users mailing list submissions to > >> pyt...@li... 
> >> > >> To subscribe or unsubscribe via the World Wide Web, visit > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> or, via email, send a message with subject or body 'help' to > >> pyt...@li... > >> > >> You can reach the person managing the list at > >> pyt...@li... > >> > >> When replying, please edit your Subject line so it is more specific > >> than "Re: Contents of Pytables-users digest..." > >> > >> > >> Today's Topics: > >> > >> 1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz) > >> > >> > >> ---------------------------------------------------------------------- > >> > >> Message: 1 > >> Date: Fri, 1 Feb 2013 14:44:40 -0600 > >> From: Anthony Scopatz <sc...@gm...> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2 > >> To: Discussion list for PyTables > >> <pyt...@li...> > >> Message-ID: > >> < > >> CAP...@ma...> > >> Content-Type: text/plain; charset="iso-8859-1" > >> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> > >> wrote: > >> > >> > Hi Anthony, > >> > > >> > Thanks for the reply. > >> > > >> > I honestly don't know how to monitor my Python memory usage, but I'm > >> sure > >> > that its caused by out of memory. > >> > > >> > >> Well, I would just run top or process monitor or something while running > >> the python script to see what happens to memory usage as the script > chugs > >> along... > >> > >> > >> > I'm just trying to find out how to fix it. My HDF5 table has 4620 > rows > >> > and the column I'm iterating over is a 17x9600 boolean matrix. The > >> > __iter__ method is preallocating an array that is this size which > >> appears > >> > to be root of the error. I was hoping there is a fix somewhere in > here > >> to > >> > not have to do this preallocation. > >> > > >> > >> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of > >> these is ~760 MB. 
If you have 2 GB of memory and you are iterating > over 2 > >> of these (templates & masks) it is conceivable that you are just running > >> out of memory. Maybe there is a way that __iter__ could not preallocate > >> something that is basically a temporary. What is the dtype of the > >> templates array? > >> > >> Be Well > >> Anthony > >> > >> > >> > > >> > Thanks again. > >> > > >> > > >> > > >> > > >> > On Fri, Feb 1, 2013 at 11:12 AM, < > >> > pyt...@li...> wrote: > >> > > >> >> Send Pytables-users mailing list submissions to > >> >> pyt...@li... > >> >> > >> >> To subscribe or unsubscribe via the World Wide Web, visit > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> or, via email, send a message with subject or body 'help' to > >> >> pyt...@li... > >> >> > >> >> You can reach the person managing the list at > >> >> pyt...@li... > >> >> > >> >> When replying, please edit your Subject line so it is more specific > >> >> than "Re: Contents of Pytables-users digest..." > >> >> > >> >> > >> >> Today's Topics: > >> >> > >> >> 1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony Scopatz) > >> >> > >> >> > >> >> > ---------------------------------------------------------------------- > >> >> > >> >> Message: 1 > >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600 > >> >> From: Anthony Scopatz <sc...@gm...> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9 > >> >> To: Discussion list for PyTables > >> >> <pyt...@li...> > >> >> Message-ID: > >> >> < > >> >> CAP...@ma...> > >> >> Content-Type: text/plain; charset="iso-8859-1" > >> >> > >> >> Hi David, > >> >> > >> >> Sorry, I haven't had a ton of time recently. You seem to be getting > a > >> >> memory error on creating a numpy array. This kind of thing typically > >> >> happens when you are out of memory. Does this seem to be the case > with > >> >> you? When this dies, is your memory usage at 100%? If so, this > >> algorithm > >> >> might require a little tweaking... 
> >> >> > >> >> Be Well > >> >> Anthony > >> >> > >> >> > >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> > >> >> wrote: > >> >> > >> >> > I'm still having problems with this one. I can't tell if this > >> something > >> >> > dumb Im doing with itertools, or if its something in pytables. > >> >> > > >> >> > Would appreciate any help. > >> >> > > >> >> > Thanks > >> >> > > >> >> > > >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed < > dav...@gm... > >> >> >wrote: > >> >> > > >> >> >> I think I have to reopen this issue. I have been running fine for > >> >> awhile > >> >> >> using the combinations method from itertools, but have recently > run > >> >> into a > >> >> >> memory since I have recently quadrupled the size of the hdf file. > >> >> >> > >> >> >> Here is my code again: > >> >> >> > >> >> >> from itertools import combinations, izip > >> >> >> with tb.openFile(h5_all, 'r') as f: > >> >> >> irises = f.root.irises > >> >> >> > >> >> >> templates = f.root.irises.cols.templates > >> >> >> masks = f.root.irises.cols.masks1 > >> >> >> > >> >> >> N_irises = len(irises) > >> >> >> index = np.ones((20 * 480), np.bool) > >> >> >> > >> >> >> print '%i Comparisons' % (N_irises*(N_irises - 1)/2) > >> >> >> D = np.empty((N_irises, N_irises)) > >> >> >> for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, > >> masks, > >> >> >> range(N_irises)), 2): > >> >> >> # print ii > >> >> >> D[ii, jj] = ham_dist( > >> >> >> t1[8, index], > >> >> >> t2[:, index], > >> >> >> m1[8, index], > >> >> >> m2[:, index], > >> >> >> ) > >> >> >> > >> >> >> And here is the error: > >> >> >> > >> >> >> In [10]: get_hd3() > >> >> >> 10669890 Comparisons > >> >> >> > >> >> >> > >> >> > >> > --------------------------------------------------------------------------- > >> >> >> MemoryError Traceback (most recent > >> call > >> >> >> last) > >> >> >> <ipython-input-10-cfb255ce7bd1> in <module>() > >> >> >> ----> 1 get_hd3() > >> >> >> > >> >> >> > >> >> >> 118 print '%i 
Comparisons' % > >> (N_irises*(N_irises - > >> >> >> 1)/2) > >> >> >> 119 D = np.empty((N_irises, N_irises)) > >> >> >> --> 120 for (t1, m1, ii), (t2, m2, jj) in > >> >> >> combinations(izip(temp > >> >> >> lates, masks, range(N_irises)), 2): > >> >> >> 121 # print ii > >> >> >> 122 D[ii, jj] = ham_dist( > >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self) > >> >> >> 3274 for start_row in xrange(0, len(self), nrowsinbuf): > >> >> >> 3275 end_row = min([start_row + nrowsinbuf, > max_row]) > >> >> >> -> 3276 buf = table.read(start_row, end_row, 1, > >> >> >> field=self.pathname) > >> >> >> > >> >> >> 3277 for row in buf: > >> >> >> 3278 yield row > >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, > start, > >> >> stop, > >> >> >> step, > >> >> >> field) > >> >> >> 1772 (start, stop, step) = > self._processRangeRead(start, > >> >> stop, > >> >> >> step) > >> >> >> 1773 > >> >> >> -> 1774 arr = self._read(start, stop, step, field) > >> >> >> 1775 return internal_to_flavor(arr, self.flavor) > >> >> >> 1776 > >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, > start, > >> >> >> stop, step, > >> >> >> field) > >> >> >> 1719 if field: > >> >> >> 1720 # Create a container for the results > >> >> >> -> 1721 result = numpy.empty(shape=nrows, > >> dtype=dtypeField) > >> >> >> 1722 else: > >> >> >> 1723 # Recarray case > >> >> >> > >> >> >> MemoryError: > >> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read() > >> >> >> 1720 # Create a container for the results > >> >> >> -> 1721 result = numpy.empty(shape=nrows, > >> dtype=dtypeField) > >> >> >> 1722 else: > >> >> >> > >> >> >> Also, if you guys see any performance problems in my code, please > >> let > >> >> me > >> >> >> know. > >> >> >> > >> >> >> Thank you so much for the help. 
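One thing that may be worth checking for the MemoryError above: `itertools.combinations` copies its whole input into an internal tuple before it yields the first pair, so wrapping a lazy column iterator in `combinations(...)` does not keep the iteration lazy. The sketch below demonstrates this with a plain generator standing in for the table columns; `rows` and `consumed` are names invented for the illustration, and it uses Python 3 syntax rather than the Python 2 of the thread. Whether this is the actual cause of the error in David's session is not established here.

```python
from itertools import combinations

consumed = []

def rows(n):
    """Stand-in for a lazy column iterator; records each row it hands out."""
    for i in range(n):
        consumed.append(i)
        yield i

pairs = combinations(rows(5), 2)
first = next(pairs)

# By the time the first pair is available, every row has been read.
print(first, len(consumed))
```

If that turns out to matter, one option (an assumption, not something proposed in the thread) is to run `combinations` over row indices only and fetch the rows in blocks, so that at most a couple of blocks are in memory at a time.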
> >> >> >> > >> >> >> -Dave > >> >> >> > >> >> >> > >> >> >> On Fri, Jan 4, 2013 at 8:57 AM, < > >> >> >> pyt...@li...> wrote: > >> >> >> > >> >> >>> Send Pytables-users mailing list submissions to > >> >> >>> pyt...@li... > >> >> >>> > >> >> >>> To subscribe or unsubscribe via the World Wide Web, visit > >> >> >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >>> or, via email, send a message with subject or body 'help' to > >> >> >>> pyt...@li... > >> >> >>> > >> >> >>> You can reach the person managing the list at > >> >> >>> pyt...@li... > >> >> >>> > >> >> >>> When replying, please edit your Subject line so it is more > specific > >> >> >>> than "Re: Contents of Pytables-users digest..." > >> >> >>> > >> >> >>> > >> >> >>> Today's Topics: > >> >> >>> > >> >> >>> 1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed) > >> >> >>> > >> >> >>> > >> >> >>> > >> ---------------------------------------------------------------------- > >> >> >>> > >> >> >>> Message: 1 > >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500 > >> >> >>> From: David Reed <dav...@gm...> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, > Issue > >> 8 > >> >> >>> To: pyt...@li... > >> >> >>> Message-ID: > >> >> >>> < > >> >> >>> > CAM...@ma... > >> > > >> >> >>> Content-Type: text/plain; charset="iso-8859-1" > >> >> >>> > >> >> >>> I can't thank you guys enough for the help. I was able to add > the > >> >> >>> __iter__ > >> >> >>> function to the table.py file and everything seems to be working > >> >> great! > >> >> >>> I'm not quite as fast as I was with iterating right of a matrix > >> but > >> >> >>> pretty > >> >> >>> close. I was at 555 comparisons per second, and now im at 420. 
> >> >> >>> > >> >> >>> I handled the problem I mentioned earlier by doing this, and it > >> seems > >> >> to > >> >> >>> work great: > >> >> >>> > >> >> >>> A = f.root.data.cols.A > >> >> >>> B = f.root.data.cols.B > >> >> >>> > >> >> >>> D = np.empty((len(A), len(A)) > >> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, > >> >> range(len(A))), > >> >> >>> 2): > >> >> >>> D[ii, jj] = compare(a1, a2, b1, b2) > >> >> >>> > >> >> >>> Again, thanks a lot. > >> >> >>> > >> >> >>> -Dave > >> >> >>> > >> >> >>> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, < > >> >> >>> pyt...@li...> wrote: > >> >> >>> > >> >> >>> > Send Pytables-users mailing list submissions to > >> >> >>> > pyt...@li... > >> >> >>> > > >> >> >>> > To subscribe or unsubscribe via the World Wide Web, visit > >> >> >>> > > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >>> > or, via email, send a message with subject or body 'help' to > >> >> >>> > pyt...@li... > >> >> >>> > > >> >> >>> > You can reach the person managing the list at > >> >> >>> > pyt...@li... > >> >> >>> > > >> >> >>> > When replying, please edit your Subject line so it is more > >> specific > >> >> >>> > than "Re: Contents of Pytables-users digest..." > >> >> >>> > > >> >> >>> > > >> >> >>> > Today's Topics: > >> >> >>> > > >> >> >>> > 1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony > >> Scopatz) > >> >> >>> > 2. 
Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony > >> Scopatz) > >> >> >>> > > >> >> >>> > > >> >> >>> > > >> >> > ---------------------------------------------------------------------- > >> >> >>> > > >> >> >>> > Message: 1 > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600 > >> >> >>> > From: Anthony Scopatz <sc...@gm...> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, > >> Issue 3 > >> >> >>> > To: Discussion list for PyTables > >> >> >>> > <pyt...@li...> > >> >> >>> > Message-ID: > >> >> >>> > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ= > >> >> >>> > Gz...@ma...> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" > >> >> >>> > > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed < > >> dav...@gm...> > >> >> >>> wrote: > >> >> >>> > > >> >> >>> > > Thanks a lot for the help so far guys! > >> >> >>> > > > >> >> >>> > > Looking at itertools, I found what I believe to be the > perfect > >> >> >>> function > >> >> >>> > > for what I need, itertools.combinations. This appears to be a > >> >> valid > >> >> >>> > > replacement to the method proposed. > >> >> >>> > > > >> >> >>> > > >> >> >>> > Yes, combinations is awesome! > >> >> >>> > > >> >> >>> > > >> >> >>> > > > >> >> >>> > > There is a small problem that I didn't mention is that my > >> compare > >> >> >>> > function > >> >> >>> > > actually takes as inputs 2 columns from the table. Like so: > >> >> >>> > > > >> >> >>> > > D = np.empty((N_irises, N_irises)) > >> >> >>> > > for ii in xrange(N_elements): > >> >> >>> > > for jj in xrange(ii+1, N_elements): > >> >> >>> > > D[ii, jj] = compare(data['element1'][ii], > >> >> >>> > data['element1'][jj],data['element2'][ii], > >> >> >>> > > data['element2'][jj]) > >> >> >>> > > > >> >> >>> > > Is there an efficient way of using itertools with this > >> structure? > >> >> >>> > > > >> >> >>> > > >> >> >>> > You can always make two other iterators for each column. 
Since > >> you > >> >> >>> have > >> >> >>> > two columns you would have 4 iterators. I am not sure how fast > >> >> this is > >> >> >>> > going to be but I am confident that there is definitely a way > to > >> do > >> >> >>> this in > >> >> >>> > one for-loop, which is going to be way faster than nested > loops. > >> >> >>> > > >> >> >>> > Be Well > >> >> >>> > Anthony > >> >> >>> > > >> >> >>> > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, < > >> >> >>> > > pyt...@li...> wrote: > >> >> >>> > > > >> >> >>> > >> Send Pytables-users mailing list submissions to > >> >> >>> > >> pyt...@li... > >> >> >>> > >> > >> >> >>> > >> To subscribe or unsubscribe via the World Wide Web, visit > >> >> >>> > >> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >>> > >> or, via email, send a message with subject or body 'help' to > >> >> >>> > >> pyt...@li... > >> >> >>> > >> > >> >> >>> > >> You can reach the person managing the list at > >> >> >>> > >> pyt...@li... > >> >> >>> > >> > >> >> >>> > >> When replying, please edit your Subject line so it is more > >> >> specific > >> >> >>> > >> than "Re: Contents of Pytables-users digest..." > >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> Today's Topics: > >> >> >>> > >> > >> >> >>> > >> 1. 
Re: Nested Iteration of HDF5 using PyTables (Josh > Ayers) > >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> ---------------------------------------------------------------------- > >> >> >>> > >> > >> >> >>> > >> Message: 1 > >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800 > >> >> >>> > >> From: Josh Ayers <jos...@gm...> > >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using > >> >> >>> PyTables > >> >> >>> > >> To: Discussion list for PyTables > >> >> >>> > >> <pyt...@li...> > >> >> >>> > >> Message-ID: > >> >> >>> > >> < > >> >> >>> > >> > >> >> CAC...@ma...> > >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1" > >> >> >>> > >> > >> >> >>> > >> David, > >> >> >>> > >> > >> >> >>> > >> The change in issue 27 was only for iteration over a > >> >> tables.Column > >> >> >>> > >> instance. To use it, tweak Anthony's code as follows. This > >> will > >> >> >>> > iterate > >> >> >>> > >> over the "element" column, as in your original example. > >> >> >>> > >> > >> >> >>> > >> Note also that this will only work with the development > >> version > >> >> of > >> >> >>> > >> PyTables > >> >> >>> > >> available on github. It will be very slow using the > released > >> >> >>> v2.4.0. > >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> from itertools import izip > >> >> >>> > >> > >> >> >>> > >> with tb.openFile(...) 
as f: > >> >> >>> > >> data = f.root.data.cols.element > >> >> >>> > >> data_i = iter(data) > >> >> >>> > >> data_j = iter(data) > >> >> >>> > >> data_i.next() # throw the first value away > >> >> >>> > >> for i, j in izip(data_i, data_j): > >> >> >>> > >> compare(i, j) > >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> Hope that helps, > >> >> >>> > >> Josh > >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz < > >> >> sc...@gm...> > >> >> >>> > >> wrote: > >> >> >>> > >> > >> >> >>> > >> > HI David, > >> >> >>> > >> > > >> >> >>> > >> > Tables and table column iteration have been overhauled > >> fairly > >> >> >>> recently > >> >> >>> > >> > [1]. So you might try creating two iterators, offset by > >> one, > >> >> and > >> >> >>> then > >> >> >>> > >> > doing the comparison. I am hacking this out super quick > so > >> >> please > >> >> >>> > >> forgive > >> >> >>> > >> > me: > >> >> >>> > >> > > >> >> >>> > >> > from itertools import izip > >> >> >>> > >> > > >> >> >>> > >> > with tb.openFile(...) as f: > >> >> >>> > >> > data = f.root.data > >> >> >>> > >> > data_i = iter(data) > >> >> >>> > >> > data_j = iter(data) > >> >> >>> > >> > data_i.next() # throw the first value away > >> >> >>> > >> > for i, j in izip(data_i, data_j): > >> >> >>> > >> > compare(i, j) > >> >> >>> > >> > > >> >> >>> > >> > You get the idea ;) > >> >> >>> > >> > > >> >> >>> > >> > Be Well > >> >> >>> > >> > Anthony > >> >> >>> > >> > > >> >> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27 > >> >> >>> > >> > > >> >> >>> > >> > > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < > >> >> >>> dav...@gm...> > >> >> >>> > >> wrote: > >> >> >>> > >> > > >> >> >>> > >> >> I was hoping someone could help me out here. 
> >> >> >>> > >> >>
> >> >> >>> > >> >> This is from a post I put up on StackOverflow:
> >> >> >>> > >> >>
> >> >> >>> > >> >> I have a fairly large dataset that I store in HDF5 and access using
> >> >> >>> > >> >> PyTables. One operation I need to perform on this dataset is a pairwise
> >> >> >>> > >> >> comparison between each of the elements. This requires 2 loops, one to
> >> >> >>> > >> >> iterate over each element, and an inner loop to iterate over every other
> >> >> >>> > >> >> element. This operation thus looks at N(N-1)/2 comparisons.
> >> >> >>> > >> >>
> >> >> >>> > >> >> For fairly small sets I found it to be faster to dump the contents into a
> >> >> >>> > >> >> multidimensional numpy array and then do my iteration. I run into problems
> >> >> >>> > >> >> with large sets because of memory issues and need to access each element of
> >> >> >>> > >> >> the dataset at run time.
> >> >> >>> > >> >>
> >> >> >>> > >> >> Putting the elements into an array gives me about 600 comparisons per
> >> >> >>> > >> >> second, while operating on the hdf5 data itself gives me about 300
> >> >> >>> > >> >> comparisons per second.
> >> >> >>> > >> >>
> >> >> >>> > >> >> Is there a way to speed this process up?
> >> >> >>> > >> >>
> >> >> >>> > >> >> Example follows (this is not my real code, just an example):
> >> >> >>> > >> >>
> >> >> >>> > >> >> *Small Set*:
> >> >> >>> > >> >>
> >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> >> >> >>> > >> >>     data = f.root.data
> >> >> >>> > >> >>
> >> >> >>> > >> >>     N_elements = len(data)
> >> >> >>> > >> >>     elements = np.empty((N_elements, int(1e5)))
> >> >> >>> > >> >>
> >> >> >>> > >> >>     # copy the 'element' field of each row into memory
> >> >> >>> > >> >>     for ii, d in enumerate(data):
> >> >> >>> > >> >>         elements[ii] = d['element']
> >> >> >>> > >> >>
> >> >> >>> > >> >> # only the upper triangle (jj > ii) of D is filled
> >> >> >>> > >> >> D = np.empty((N_elements, N_elements))
> >> >> >>> > >> >> for ii in xrange(N_elements):
> >> >> >>> > >> >>     for jj in xrange(ii+1, N_elements):
> >> >> >>> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
> >> >> >>> > >> >>
> >> >> >>> > >> >> *Large Set*:
> >> >> >>> > >> >>
> >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> >> >> >>> > >> >>     data = f.root.data
> >> >> >>> > >> >>
> >> >> >>> > >> >>     N_elements = len(data)
> >> >> >>> > >> >>
> >> >> >>> > >> >>     # every comparison reads both elements from the file
> >> >> >>> > >> >>     D = np.empty((N_elements, N_elements))
> >> >> >>> > >> >>     for ii in xrange(N_elements):
> >> >> >>> > >> >>         for jj in xrange(ii+1, N_elements):
> >> >> >>> > >> >>             D[ii, jj] = compare(data['element'][ii],
> >> >> >>> > >> >>                                 data['element'][jj])
> >> >> >>> > >> >>
> >> >> >>> > >> >> ------------------------------------------------------------------------------
> >> >> >>> > >> >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
> >> >> >>> > >> >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
> >> >> >>> > >> >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
> >> >> >>> > >> >> MVPs and experts.
> >> >> >>> > >> >> ON SALE this month only -- learn more at:
> >> >> >>> > >> >> http://p.sf.net/sfu/learnmore_122712
> >> >> >>> > >> >> _______________________________________________
> >> >> >>> > >> >> Pytables-users mailing list
> >> >> >>> > >> >> Pyt...@li...
> >> >> >>> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users
> >> >> >>> > >>
> >> >> >>> > >> End of Pytables-users Digest, Vol 80, Issue 3
> >> >> >>> > >> *********************************************
> >> >> >>> > > >> >> >>> > ------------------------------ > >> >> >>> > > >> >> >>> > Message: 2 > >> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600 > >> >> >>> > From: Anthony Scopatz <sc...@gm...> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, > >> Issue 4 > >> >> >>> > To: Discussion list for PyTables > >> >> >>> > <pyt...@li...> > >> >> >>> > Message-ID: > >> >> >>> > < > >> >> >>> > > >> CAP...@ma...> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" > >> >> >>> > > >> >> >>> > Josh is right that you can just edit the code by hand (which > >> works > >> >> but > >> >> >>> > sucks). > >> >> >>> > > >> >> >>> > However, on Windows -- on the rare occasion when I also have to > >> >> >>> develop on > >> >> >>> > it -- I typically use a distribution that includes a compiler, > >> >> cython, > >> >> >>> > hdf5, and pytables already and then I install my development > >> version > >> >> >>> from > >> >> >>> > github OVER this. I recommend either EPD or Anaconda, though > >> other > >> >> >>> > distributions listed here [1] might also work. > >> >> >>> > > >> >> >>> > Be well > >> >> >>> > Anthony > >> >> >>> > > >> >> >>> > 1. http://numfocus.org/projects-2/software-distributions/ > >> >> >>> > > >> >> >>> > > >> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers < > jos...@gm... > >> > > >> >> >>> wrote: > >> >> >>> > > >> >> >>> > > The change was in pure Python code, so you should be able to > >> just > >> >> >>> paste > >> >> >>> > in > >> >> >>> > > the changes to your local copy. Start with the > >> >> table.Column.__iter__ > >> >> >>> > > method (lines 3296-3310) here. 
> >> >> >>> > > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > > >> >> >>> > >> >> > >> > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py > >> >> >>> > > > >> >> >>> > > It needs to be modified slightly because it uses some > >> additional > >> >> >>> features > >> >> >>> > > that aren't available in the released version (the > >> out=buf_slice > >> >> >>> argument > >> >> >>> > > to table.read). The following should work. > >> >> >>> > > > >> >> >>> > > def __iter__(self): > >> >> >>> > > table = self.table > >> >> >>> > > itemsize = self.dtype.itemsize > >> >> >>> > > nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] > // > >> >> >>> itemsize > >> >> >>> > > max_row = len(self) > >> >> >>> > > for start_row in xrange(0, len(self), nrowsinbuf): > >> >> >>> > > end_row = min([start_row + nrowsinbuf, max_row]) > >> >> >>> > > buf = table.read(start_row, end_row, 1, > >> >> >>> field=self.pathname) > >> >> >>> > > for row in buf: > >> >> >>> > > yield row > >> >> >>> > > > >> >> >>> > > > >> >> >>> > > I haven't tested this, but I think it will work. > >> >> >>> > > > >> >> >>> > > Josh > >> >> >>> > > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed < > >> >> dav...@gm...> > >> >> >>> > wrote: > >> >> >>> > > > >> >> >>> > >> I apologize if I'm starting to sound helpless, but I'm > forced > >> to > >> >> >>> work on > >> >> >>> > >> Windows 7 at work and have never had luck compiling python > >> source > >> >> >>> > >> successfully. I have had to rely on precompiled binaries > and > >> now > >> >> >>> its > >> >> >>> > >> biting me in the butt. > >> >> >>> > >> > >> >> >>> > >> Is there any quick fix I can do to improve this iteration > >> using > >> >> >>> v2.4.0? 
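The buffered `__iter__` that Josh pastes above boils down to reading a fixed number of rows per call and yielding them one at a time. Here is a minimal, library-free sketch of that pattern in Python 3. The `read(start, stop)` callable is a hypothetical stand-in for `table.read(start, stop, 1, field=...)`, and `iter_in_chunks` is a name invented for this example; neither is part of PyTables. The `max(1, ...)` guard is an addition of this sketch, so that a row larger than the buffer still yields a chunk of one row instead of a zero step.

```python
def iter_in_chunks(read, nrows, itemsize, buffer_bytes=1024 * 1024):
    """Yield rows one by one, fetching at most one buffer's worth per read.

    read(start, stop) must return the rows in [start, stop).
    itemsize is the size in bytes of ONE FULL ROW.
    """
    rows_per_chunk = max(1, buffer_bytes // itemsize)
    for start in range(0, nrows, rows_per_chunk):
        stop = min(start + rows_per_chunk, nrows)
        for row in read(start, stop):
            yield row

# Toy data: 20 rows of 4 one-byte values, with a 24-byte buffer,
# so each read should cover 6 rows.
data = [[i] * 4 for i in range(20)]
calls = []

def read(start, stop):
    calls.append((start, stop))
    return data[start:stop]

rows = list(iter_in_chunks(read, len(data), itemsize=4, buffer_bytes=24))
print(calls)
```

The chunk size is only as good as the `itemsize` passed in. If that value describes a single array element rather than a whole row, the computed chunk can cover far more rows than intended; whether `self.dtype.itemsize` in the quoted code reports the full row size is not confirmed here.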
> >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, < > >> >> >>> > >> pyt...@li...> wrote: > >> >> >>> > >> > >> >> >>> > >>> Send Pytables-users mailing list submissions to > >> >> >>> > >>> pyt...@li... > >> >> >>> > >>> > >> >> >>> > >>> To subscribe or unsubscribe via the World Wide Web, visit > >> >> >>> > >>> > >> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >>> > >>> or, via email, send a message with subject or body 'help' > to > >> >> >>> > >>> pyt...@li... > >> >> >>> > >>> > >> >> >>> > >>> You can reach the person managing the list at > >> >> >>> > >>> pyt...@li... > >> >> >>> > >>> > >> >> >>> > >>> When replying, please edit your Subject line so it is more > >> >> specific > >> >> >>> > >>> than "Re: Contents of Pytables-users digest..." > >> >> >>> > >>> > >> >> >>> > >>> > >> >> >>> > >>> Today's Topics: > >> >> >>> > >>> > >> >> >>> > >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 (David > Reed) > >> >> >>> > >>> 2. Re: Pytables-users Digest, Vol 80, Issue 3 (David > Reed) > >> >> >>> > >>> > >> >> >>> > >>> > >> >> >>> > >>> > >> >> >>> > >> ---------------------------------------------------------------------- > >> >> >>> > >>> > >> >> >>> > >>> Message: 1 > >> >> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 > >> >> >>> > >>> From: David Reed <dav...@gm...> > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol > 80, > >> >> Issue > >> >> >>> 2 > >> >> >>> > >>> To: pyt...@li... > >> >> >>> > >>> Message-ID: > >> >> >>> > >>> <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha= > >> >> >>> > >>> ev...@ma...> > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1" > >> >> >>> > >>> > >> >> >>> > >>> Thanks Anthony, but unless Im missing something I don't > think > >> >> that > >> >> >>> > method > >> >> >>> > >>> will work since this will only be comparing the ith element > >> with > >> >> >>> ith+1 > >> >> >>> > >>> element. I still need 2 for loops right? 
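David's point can be seen on a toy list. The sketch below (Python 3, so `zip` and `next(it)` replace the thread's `izip` and `it.next()`) compares the two-offset-iterators idea with `itertools.combinations`; the list `items` is made up for the illustration.

```python
from itertools import combinations

items = ['a', 'b', 'c', 'd']

# Two iterators over the same data, one advanced by a single step:
# only neighbouring elements are ever paired.
it_i = iter(items)
it_j = iter(items)
next(it_j)
adjacent = list(zip(it_i, it_j))

# All unordered pairs, N*(N-1)/2 of them, from a single loop.
all_pairs = list(combinations(items, 2))

print(adjacent)
print(all_pairs)
```

For 4 items the offset iterators give 3 pairs while `combinations` gives all 6, which is the N(N-1)/2 count the nested loops produce.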
> >> >> >>> > >>> > >> >> >>> > >>> Using itertools might speed things up though, I've never > used > >> >> them > >> >> >>> so I > >> >> >>> > >>> will give it a shot and let you know how it goes. Looks > >> like I > >> >> >>> need to > >> >> >>> > >>> download the latest release before I do that too. Thanks > for > >> >> the > >> >> >>> help. > >> >> >>> > >>> > >> >> >>> > >>> -Dave > >> >> >>> > >>> > >> >> >>> > >>> > >> >> >>> > >>> > >> >> >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, < > >> >> >>> > >>> pyt...@li...> wrote: > >> >> >>> > >>> > >> >> >>> > >>> > Send Pytables-users mailing list submissions to > >> >> >>> > >>> > pyt...@li... > >> >> >>> > >>> > > >> >> >>> > >>> > To subscribe or unsubscribe via the World Wide Web, visit > >> >> >>> > >>> > > >> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >>> > >>> > or, via email, send a message with subject or body 'help' > >> to > >> >> >>> > >>> > pyt...@li... > >> >> >>> > >>> > > >> >> >>> > >>> > You can reach the person managing the list at > >> >> >>> > >>> > pyt...@li... > >> >> >>> > >>> > > >> >> >>> > >>> > When replying, please edit your Subject line so it is > more > >> >> >>> specific > >> >> >>> > >>> > than "Re: Contents of Pytables-users digest..." > >> >> >>> > >>> > > >> >> >>> > >>> > > >> >> >>> > >>> > Today's Topics: > >> >> >>> > >>> > > >> >> >>> > >>> > 1. 
Re: Nested Iteration of HDF5 using PyTables > (Anthony > >> >> >>> Scopatz) > >> >> >>> > >>> > > >> >> >>> > >>> > > >> >> >>> > >>> > > >> >> >>> > > >> >> > ---------------------------------------------------------------------- > >> >> >>> > >>> > > >> >> >>> > >>> > Message: 1 > >> >> >>> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 > >> >> >>> > >>> > From: Anthony Scopatz <sc...@gm...> > >> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 > >> using > >> >> >>> PyTables > >> >> >>> > >>> > To: Discussion list for PyTables > >> >> >>> > >>> > <pyt...@li...> > >> >> >>> > >>> > Message-ID: > >> >> >>> > >>> > <CAPk-6T5b= > >> >> >>> > >>> > 1EG...@ma... > > > >> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1" > >> >> >>> > >>> > > >> >> >>> > >>> > HI David, > >> >> >>> > >>> > > >> >> >>> > >>> > Tables and table column iteration have been overhauled > >> fairly > >> >> >>> > recently > >> >> >>> > >>> [1]. > >> >> >>> > >>> > So you might try creating two iterators, offset by one, > >> and > >> >> then > >> >> >>> > >>> doing the > >> >> >>> > >>> > comparison. I am hacking this out super quick so please > >> >> forgive > >> >> >>> me: > >> >> >>> > >>> > > >> >> >>> > >>> > from itertools import izip > >> >> >>> > >>> > > >> >> >>> > >>> > with tb.openFile(...) as f: > >> >> >>> > >>> > data = f.root.data > >> >> >>> > >>> > data_i = iter(data) > >> >> >>> > >>> > data_j = iter(data) > >> >> >>> > >>> > data_i.next() # throw the first value away > >> >> >>> > >>> > for i, j in izip(data_i, data_j): > >> >> >>> > >>> > compare(i, j) > >> >> >>> > >>> > > >> >> >>> > >>> > You get the idea ;) > >> >> >>> > >>> > > >> >> >>> > >>> > Be Well > >> >> >>> > >>> > Anthony > >> >> >>> > >>> > > >> >> >>> > >>> > 1. 
https://github.com/PyTables/PyTables/issues/27 > >> >> >>> > >>> > > >> >> >>> > >>> > > >> >> >>> > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < > >> >> >>> dav...@gm...> > >> >> >>> > >>> wrote: > >> >> >>> > >>> > > >> >> >>> > >>> > > I was hoping someone could help me out here. > >> >> >>> > >>> > > > >> >> >>> > >>> > > This is from a post I put up on StackOverflow, > >> >> >>> > >>> > > > >> >> >>> > >>> > > I am have a fairly large dataset that I store in HDF5 > and > >> >> >>> access > >> >> >>> > >>> using > >> >> >>> > >>> > > PyTables. One operation I need to do on this dataset > are > >> >> >>> pairwise > >> >> >>> > >>> > > comparisons between each of the elements. This > requires 2 > >> >> >>> loops, > >> >> >>> > one > >> >> >>> > >>> to > >> >> >>> > >>> > > iterate over each element, and an inner loop to iterate > >> over > >> >> >>> every > >> >> >>> > >>> other > >> >> >>> > >>> > > element. This operation thus looks at N(N-1)/2 > >> comparisons. > >> >> >>> > >>> > > > >> >> >>> > >>> > > For fairly small sets I found it to be faster to dump > the > >> >> >>> contents > >> >> >>> > >>> into a > >> >> >>> > >>> > > multdimensional numpy array and then do my iteration. I > >> run > >> >> >>> into > >> >> >>> > >>> problems > >> >> >>> > >>> > > with large sets because of memory issues and need to > >> access > >> >> >>> each > >> >> >>> > >>> element > >> >> >>> > >>> > of > >> >> >>> > >>> > > the dataset at run time. > >> >> >>> > >>> > > > >> >> >>> > >>> > > Putting the elements into an array gives me about 600 > >> >> >>> comparisons > >> >> >>> > per > >> >> >>> > >>> > > second, while operating on hdf5 data itself gives me > >> about > >> >> 300 > >> >> >>> > >>> > comparisons > >> >> >>> > >>> > > per second. > >> >> >>> > >>> > > > >> >> >>> > >>> > > Is there a way to speed this process up? 
> >> >> >>> > >>> > > > >> >> >>> > >>> > > Example follows (this is not my real code, just an > >> example): > >> >> >>> > >>> > > > >> >> >>> > >>> > > *Small Set*: > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f: > >> >> >>> > >>> > > data = f.root.data > >> >> >>> > >>> > > > >> >> >>> > >>> > > N_elements = len(data) > >> >> >>> > >>> > > elements = np.empty((N_irises, 1e5)) > >> >> >>> > >>> > > > >> >> >>> > >>> > > for ii, d in enumerate(data): > >> >> >>> > >>> > > elements[ii] = data['element'] > >> >> >>> > >>> > > > >> >> >>> > >>> > > D = np.empty((N_irises, N_irises)) for ii in > >> >> >>> xrange(N_elements): > >> >> >>> > >>> > > for jj in xrange(ii+1, N_elements): > >> >> >>> > >>> > > D[ii, jj] = compare(elements[ii], elements[jj]) > >> >> >>> > >>> > > > >> >> >>> > >>> > > *Large Set*: > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f: > >> >> >>> > >>> > > data = f.root.data > >> >> >>> > >>> > > > >> >> >>> > >>> > > N_elements = len(data) > >> >> >>> > >>> > > > >> >> >>> > >>> > > D = np.empty((N_irises, N_irises)) > >> >> >>> > >>> > > for ii in xrange(N_elements): > >> >> >>> > >>> > > for jj in xrange(ii+1, N_elements): > >> >> >>> > >>> > > D[ii, jj] = compare(data['element'][ii], > >> >> >>> > >>> > data['element'][jj]) > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > > > >> >> >>> > >>> > > >> >> >>> > >>> > >> >> >>> > > >> >> >>> > >> >> > >> > ------------------------------------------------------------------------------ > >> >> >>> > >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# > 2012, > >> >> >>> HTML5, > >> >> >>> > CSS, > >> >> >>> > >>> > > MVC, Windows 8 Apps, JavaScript and much more. 
> >> >> >>> > >>> > > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >>> > >>> > > >> >> >>> > >>> > > >> >> >>> > >>> > End of Pytables-users Digest, Vol 80, Issue 2 > >> >> >>> > >>> > ********************************************* > >> >> >>> > >>> > > >> >> >>> > >>> -------------- next part -------------- > >> >> >>> > >>> An HTML attachment was scrubbed... > >> >> >>> > >>> > >> >> >>> > >>> ------------------------------ > >> >> >>> > >>> > >> >> >>> > >>> Message: 2 > >> >> >>> > >>> Date: Thu, 3 Jan 2013 15:17:01 -0500 > >> >> >>> > >>> From: David Reed <dav...@gm...> > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol > 80, > >> >> Issue > >> >> >>> 3 > >> >> >>> > >>> To: pyt...@li... > >> >> >>> > >>> Message-ID: > >> >> >>> > >>> < > >> >> >>> > >>> > >> >> CAM...@ma... > >> >> >>> > > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1" > >> >> >>> > >>> > >> >> >>> > >>> Thanks a lot for the help so far guys! > >> >> >>> > >>> > >> >> >>> > >>> Looking at itertools, I found what I believe to be the > >> perfect > >> >> >>> function > >> >> >>> > >>> for > >> >> >>> > >>> what I need, itertools.combinations. This appears to be a > >> valid > >> >> >>> > >>> replacement > >> >> >>> > >>> to the method proposed. > >> >> >>> > >>> > >> >> >>> > >>> There is a small problem that I didn't mention is that my > >> >> compare > >> >> >>> > >>> function > >> >> >>> > >>> actually takes as inputs 2 columns from the table. Like so: > >> >> >>> > >>> > >> >> >>> > >>> D = np.empty((N_irises, N_irises)) > >> >> >>> > >>> for ii in xrange(N_elements): > >> >> >>> > >>> for jj in xrange(ii+1, N_elements): > >> >> >>> > >>> D[ii, jj] = compare(data['element1'][ii], > >> >> >>> > >>> data['element1'][jj],data['element2'][ii], > >> >> >>> > >>> data['element2'][jj]) > >> >> >>> > >>> > >> >> >>> > >>> Is there an efficient way of using itertools with this > >> >> structure? 
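One possible shape for an answer to the question just above, as a minimal sketch under assumptions: walk both columns in lockstep with `zip` (Python 3; the Python 2 used in this thread would use `itertools.izip`), attach the row index, and pass the triples to `itertools.combinations`. The column contents and the `compare` function are made-up stand-ins, not the poster's data.

```python
from itertools import combinations

# Hypothetical stand-ins for two table columns; in the thread these would be
# column objects such as f.root.data.cols.element1 and .element2.
col1 = [(1, 0, 1), (1, 1, 1), (0, 0, 1)]
col2 = [(1, 1, 0), (0, 1, 1), (1, 1, 1)]


def compare(a1, a2, b1, b2):
    """Toy comparison (an assumption): count positions where a1 and a2
    differ and both b1 and b2 are set."""
    return sum(1 for x, y, p, q in zip(a1, a2, b1, b2) if x != y and p and q)


n = len(col1)
D = [[0] * n for _ in range(n)]

# Each item is (value from col1, value from col2, row index).
rows = zip(col1, col2, range(n))
for (a1, b1, ii), (a2, b2, jj) in combinations(rows, 2):
    D[ii][jj] = compare(a1, a2, b1, b2)  # fills the upper triangle only
```

One design point worth keeping in mind: `combinations` copies its input into a tuple before yielding the first pair, so every row of both columns is held in memory at once even when the source iterators are lazy.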
> >> >> >>> > >>>
> >> >> >>> > >>> On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> >> >> >>> > >>>
> >> >> >>> > >>> > Message: 1
> >> >> >>> > >>> > Date: Thu, 3 Jan 2013 10:29:33 -0800
> >> >> >>> > >>> > From: Josh Ayers <jos...@gm...>
> >> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >> >> >>> > >>> >
> >> >> >>> > >>> > David,
> >> >> >>> > >>> >
> >> >> >>> > >>> > The change in issue 27 was only for iteration over a tables.Column
> >> >> >>> > >>> > instance. To use it, tweak Anthony's code as follows. This will iterate
> >> >> >>> > >>> > over the "element" column, as in your original example.
> >> >> >>> > >>> >
> >> >> >>> > >>> > Note also that this will only work with the development version of
> >> >> >>> > >>> > PyTables available on github. It will be very slow using the released
> >> >> >>> > >>> > v2.4.0.
> >> >> >>> > >>> >
> >> >> >>> > >>> > from itertools import izip
> >> >> >>> > >>> >
> >> >> >>> > >>> > with tb.openFile(...) as f:
> >> >> >>> > >>> >     data = f.root.data.cols.element
> >> >> >>> > >>> >     data_i = iter(data)
> >> >> >>> > >>> >     data_j = iter(data)
> >> >> >>> > >>> >     data_i.next() # throw the first value away
> >> >> >>> > >>> >     for i, j in izip(data_i, data_j):
> >> >> >>> > >>> >         compare(i, j)
> >> >> >>> > >>> >
> >> >> >>> > >>> > Hope that helps,
> >> >> >>> > >>> > Josh
> >> >> >>> > >>> >
> >> >> >>> > >>> > On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
> >> >> >>> > >>> >
> >> >> >>> > >>> > > Hi David,
> >> >> >>> > >>> > >
> >> >> >>> > >>> > > Tables and table column iteration have been overhauled fairly recently
> >> >> >>> > >>> > > [1]. So you might try creating two iterators, offset by one, and then
> >> >> >>> > >>> > > doing the comparison. I am hacking this out super quick so please
> >> >> >>> > >>> > > forgive me:
> >> >> >>> > >>> > >
> >> >> >>> > >>> > > from itertools import izip
> >> >> >>> > >>> > >
> >> >> >>> > >>> > > with tb.openFile(...) as f:
> >> >> >>> > >>> > >     data = f.root.data
> >> >> >>> > >>> > >     data_i = iter(data)
> >> >> >>> > >>> > >     data_j = iter(data)
> >> >> >>> > >>> > >     data_i.next() # throw the first value away
> >> >> >>> > >>> > >     for i, j in izip(data_i, data_j):
> >> >> >>> > >>> > >         compare(i, j)
> >> >> >>> > >>> > >
> >> >> >>> > >>> > > You get the idea ;)
> >> >> >>> > >>> > >
> >> >> >>> > >>> > > Be Well
> >> >> >>> > >>> > > Anthony
> >> >> >>> > >>> > >
> >> >> >>> > >>> > > 1. https://github.com/PyTables/PyTables/issues/27
> >> >> >>> > >>> > >
> >> >> >>> > >>> > > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >> >> >>> > >>> > >
> >> >> >>> > >>> > >> I was hoping someone could help me out here.
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >> This is from a post I put up on StackOverflow,
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >> I have a fairly large dataset that I store in HDF5 and access using
> >> >> >>> > >>> > >> PyTables. One operation I need to do on this dataset is pairwise
> >> >> >>> > >>> > >> comparisons between each of the elements. This requires 2 loops, one to
> >> >> >>> > >>> > >> iterate over each element, and an inner loop to iterate over every other
> >> >> >>> > >>> > >> element. This operation thus looks at N(N-1)/2 comparisons.
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >> For fairly small sets I found it to be faster to dump the contents into a
> >> >> >>> > >>> > >> multidimensional numpy array and then do my iteration. I run into problems
> >> >> >>> > >>> > >> with large sets because of memory issues and need to access each element of
> >> >> >>> > >>> > >> the dataset at run time.
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >> Putting the elements into an array gives me about 600 comparisons per
> >> >> >>> > >>> > >> second, while operating on hdf5 data itself gives me about 300 comparisons
> >> >> >>> > >>> > >> per second.
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >> Is there a way to speed this process up?
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >> Example follows (this is not my real code, just an example):
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >> *Small Set*:
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >> with tb.openFile(h5_file, 'r') as f:
> >> >> >>> > >>> > >>     data = f.root.data
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >>     N_elements = len(data)
> >> >> >>> > >>> > >>     elements = np.empty((N_irises, 1e5))
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >>     for ii, d in enumerate(data):
> >> >> >>> > >>> > >>         elements[ii] = data['element']
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >> D = np.empty((N_irises, N_irises))
> >> >> >>> > >>> > >> for ii in xrange(N_elements):
> >> >> >>> > >>> > >>     for jj in xrange(ii+1, N_elements):
> >> >> >>> > >>> > >>         D[ii, jj] = compare(elements[ii], elements[jj])
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >> *Large Set*:
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >> with tb.openFile(h5_file, 'r') as f:
> >> >> >>> > >>> > >>     data = f.root.data
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >>     N_elements = len(data)
> >> >> >>> > >>> > >>
> >> >> >>> > >>> > >>     D = np.empty((N_irises, N_irises))
> >> >> >>> > >>> > >>     for ii in xrange(N_elements):
> >> >> >>> > >>> > >>         for jj in xrange(ii+1, N_elements):
> >> >> >>> > >>> > >>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
> >> >> >>> -------------- ... [truncated message content] |
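A small sketch, in Python 3 and with made-up values, of how the offset-iterator snippets quoted in the thread above relate to the N(N-1)/2 comparisons the original question asks for. As far as those snippets go, zipping an iterator advanced by one against a fresh one visits neighbouring rows only, while `itertools.combinations` visits every unordered pair. The helper names are hypothetical.

```python
from itertools import combinations


def adjacent_pairs(values):
    """Pairs produced by zipping an iterator advanced by one against a fresh one."""
    it_i = iter(values)
    it_j = iter(values)
    next(it_i)  # throw the first value away, as in the quoted snippets
    return list(zip(it_i, it_j))


def n_comparisons(n):
    """Number of unordered pairs among n elements: N(N-1)/2."""
    return n * (n - 1) // 2


values = ["a", "b", "c", "d"]
adjacent = adjacent_pairs(values)         # N - 1 neighbouring pairs
every = list(combinations(values, 2))     # all N(N-1)/2 unordered pairs
```

For the 4620-row table discussed later in the thread, `n_comparisons(4620)` gives 10669890, which matches the "10669890 Comparisons" line printed in the posted session.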
From: Josh A. <jos...@gm...> - 2013-02-01 22:08:53
|
David,

You added a custom version of table.Column.__iter__, correct? Could you also include that along with the script to reproduce the error?

It seems like the problem may be in the 'nrowsinbuf' calculation - see [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're using the default 1MB value for IO_BUFFER_SIZE, it should be reading in chunks of 6 rows. Instead, it's reading the entire table.

[1]: https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296

On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:

> On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
>
>> at the error:
>>
>>     result = numpy.empty(shape=nrows, dtype=dtypeField)
>>
>> nrows = 4620 and dtypeField is ('bool', (17, 9600))
>>
>> I'm not sure what that means as a dtype, but that's what it is.
>>
>> Forgive me if I'm being totally naive, but I thought the whole point of
>> __iter__ with PyTables was to do iteration on the fly, so there is no
>> preallocation.
>
> Nope you are not being naive at all. That is the point.
>
>> If you have any ideas on this I'm all ears.
>
> If you could send a minimal script which reproduces this error, that would
> help a lot.
>
> Be Well
> Anthony
>
>> Thanks again.
>>
>> Dave
>>
>> On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
>>
>>> Message: 1
>>> Date: Fri, 1 Feb 2013 14:44:40 -0600
>>> From: Anthony Scopatz <sc...@gm...>
>>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>>>
>>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
>>>
>>> > Hi Anthony,
>>> >
>>> > Thanks for the reply.
>>> >
>>> > I honestly don't know how to monitor my Python memory usage, but I'm
>>> > sure that it's caused by running out of memory.
>>>
>>> Well, I would just run top or process monitor or something while running
>>> the python script to see what happens to memory usage as the script chugs
>>> along...
>>>
>>> > I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
>>> > and the column I'm iterating over is a 17x9600 boolean matrix. The
>>> > __iter__ method is preallocating an array that is this size which appears
>>> > to be the root of the error. I was hoping there is a fix somewhere in here
>>> > to not have to do this preallocation.
>>>
>>> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
>>> these is ~760 MB. If you have 2 GB of memory and you are iterating over 2
>>> of these (templates & masks) it is conceivable that you are just running
>>> out of memory. Maybe there is a way that __iter__ could not preallocate
>>> something that is basically a temporary. What is the dtype of the
>>> templates array?
>>>
>>> Be Well
>>> Anthony
>>>
>>> > Thanks again.
|
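To make the numbers in the message above concrete, here is a minimal sketch of what the `('bool', (17, 9600))` dtype means and of the buffer arithmetic Josh describes. The 1 MiB buffer size is an assumption taken from the "default 1MB" wording in the message; the actual default should be checked against the installed PyTables.

```python
import numpy as np

# Subarray dtype from the error report: each row is a 17 x 9600 block of booleans.
row_dtype = np.dtype((np.bool_, (17, 9600)))

bytes_per_row = row_dtype.itemsize       # 17 * 9600 * 1 byte per boolean
n_rows = 4620
column_bytes = n_rows * bytes_per_row    # the whole column held in memory at once

# Assumption: a 1 MiB I/O buffer, per the "default 1MB" mentioned in the message.
io_buffer_size = 1024 * 1024
rows_per_buffer = io_buffer_size // bytes_per_row

# numpy folds the subarray shape into the array shape, so asking for
# shape=2 with this dtype yields a (2, 17, 9600) boolean array.
sample = np.empty(shape=2, dtype=row_dtype)

print(bytes_per_row, rows_per_buffer, column_bytes, sample.shape)
```

Read this way, `numpy.empty(shape=4620, dtype=('bool', (17, 9600)))` is a request for the full 4620 x 17 x 9600 column, roughly 754 million bytes, rather than for a buffer of about six rows.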
From: Anthony S. <sc...@gm...> - 2013-02-01 21:50:43
|
On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:

> at the error:
>
>     result = numpy.empty(shape=nrows, dtype=dtypeField)
>
> nrows = 4620 and dtypeField is ('bool', (17, 9600))
>
> I'm not sure what that means as a dtype, but that's what it is.
>
> Forgive me if I'm being totally naive, but I thought the whole point of
> __iter__ with PyTables was to do iteration on the fly, so there is no
> preallocation.

Nope you are not being naive at all. That is the point.

> If you have any ideas on this I'm all ears.

If you could send a minimal script which reproduces this error, that would help a lot.

Be Well
Anthony

> Thanks again.
>
> Dave
>
> On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
>
>> Message: 1
>> Date: Fri, 1 Feb 2013 14:44:40 -0600
>> From: Anthony Scopatz <sc...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>>
>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
>>
>> > Hi Anthony,
>> >
>> > Thanks for the reply.
>> >
>> > I honestly don't know how to monitor my Python memory usage, but I'm
>> > sure that it's caused by running out of memory.
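For readers in the same position, one way to watch memory from inside a script is the standard library's `tracemalloc` module. This is only a sketch under assumptions: `tracemalloc` exists in Python 3.4 and later, so it would not have been available on the Python 2.7 install used in this thread, and the `bytearray` below is a stand-in for the expensive read.

```python
import tracemalloc

tracemalloc.start()

# Stand-in for the expensive step: allocate roughly 10 MB in one go.
buf = bytearray(10 * 1000 * 1000)

current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
del buf
after_free, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("peak traced memory: %.1f MB" % (peak / 1e6))
```

`tracemalloc` only sees allocations made through Python's own allocators, so memory taken by C libraries such as HDF5 may not show up; an external monitor remains the simpler cross-check.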
>> >
>>
>> Well, I would just run top or process monitor or something while running
>> the python script to see what happens to memory usage as the script chugs
>> along...
>>
>> > I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
>> > and the column I'm iterating over is a 17x9600 boolean matrix. The
>> > __iter__ method is preallocating an array that is this size which appears
>> > to be the root of the error. I was hoping there is a fix somewhere in here
>> > to not have to do this preallocation.
>>
>> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
>> these is ~760 MB. If you have 2 GB of memory and you are iterating over 2
>> of these (templates & masks) it is conceivable that you are just running
>> out of memory. Maybe there is a way that __iter__ could not preallocate
>> something that is basically a temporary. What is the dtype of the
>> templates array?
>>
>> Be Well
>> Anthony
>>
>> > Thanks again.
>> >
>> > On Fri, Feb 1, 2013 at 11:12 AM, <pyt...@li...> wrote:
>> >
>> >> Message: 1
>> >> Date: Fri, 1 Feb 2013 10:11:47 -0600
>> >> From: Anthony Scopatz <sc...@gm...>
>> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9
>> >>
>> >> Hi David,
>> >>
>> >> Sorry, I haven't had a ton of time recently. You seem to be getting a
>> >> memory error on creating a numpy array. This kind of thing typically
>> >> happens when you are out of memory. Does this seem to be the case with
>> >> you? When this dies, is your memory usage at 100%? If so, this algorithm
>> >> might require a little tweaking...
>> >>
>> >> Be Well
>> >> Anthony
>> >>
>> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:
>> >>
>> >> > I'm still having problems with this one. I can't tell if this is something
>> >> > dumb I'm doing with itertools, or if it's something in pytables.
>> >> >
>> >> > Would appreciate any help.
>> >> >
>> >> > Thanks
>> >> >
>> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:
>> >> >
>> >> >> I think I have to reopen this issue. I have been running fine for a while
>> >> >> using the combinations method from itertools, but have recently run into a
>> >> >> memory error since I have recently quadrupled the size of the hdf file.
>> >> >>
>> >> >> Here is my code again:
>> >> >>
>> >> >> from itertools import combinations, izip
>> >> >> with tb.openFile(h5_all, 'r') as f:
>> >> >>     irises = f.root.irises
>> >> >>
>> >> >>     templates = f.root.irises.cols.templates
>> >> >>     masks = f.root.irises.cols.masks1
>> >> >>
>> >> >>     N_irises = len(irises)
>> >> >>     index = np.ones((20 * 480), np.bool)
>> >> >>
>> >> >>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> >> >>     D = np.empty((N_irises, N_irises))
>> >> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>> >> >>         # print ii
>> >> >>         D[ii, jj] = ham_dist(
>> >> >>             t1[8, index],
>> >> >>             t2[:, index],
>> >> >>             m1[8, index],
>> >> >>             m2[:, index],
>> >> >>         )
>> >> >>
>> >> >> And here is the error:
>> >> >>
>> >> >> In [10]: get_hd3()
>> >> >> 10669890 Comparisons
>> >> >>
>> >> >> ---------------------------------------------------------------------------
>> >> >> MemoryError                               Traceback (most recent call last)
>> >> >> <ipython-input-10-cfb255ce7bd1> in <module>()
>> >> >> ----> 1 get_hd3()
>> >> >>
>> >> >>     118     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> >> >>     119     D = np.empty((N_irises, N_irises))
>> >> >> --> 120     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>> >> >>     121         # print ii
>> >> >>     122         D[ii, jj] = ham_dist(
>> >> >>
>> >> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
>> >> >>    3274     for start_row in xrange(0, len(self), nrowsinbuf):
>> >> >>    3275         end_row = min([start_row + nrowsinbuf, max_row])
>> >> >> -> 3276         buf = table.read(start_row, end_row, 1, field=self.pathname)
>> >> >>    3277         for row in buf:
>> >> >>    3278             yield row
>> >> >>
>> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
>> >> >>    1772     (start, stop, step) = self._processRangeRead(start, stop, step)
>> >> >>    1773
>> >> >> -> 1774     arr = self._read(start, stop, step, field)
>> >> >>    1775     return internal_to_flavor(arr, self.flavor)
>> >> >>    1776
>> >> >>
>> >> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
>> >> >>    1719     if field:
>> >> >>    1720         # Create a container for the results
>> >> >> -> 1721         result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >> >>    1722     else:
>> >> >>    1723         # Recarray case
>> >> >>
>> >> >> MemoryError:
>> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
>> >> >>    1720         # Create a container for the results
>> >> >> -> 1721         result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >> >>    1722     else:
>> >> >>
>> >> >> Also, if you guys see any performance problems in my code, please let me
>> >> >> know.
>> >> >>
>> >> >> Thank you so much for the help.
>> >> >>
>> >> >> -Dave
>> >> >>
>> >> >> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
>> >> >>
>> >> >>> Message: 1
>> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500
>> >> >>> From: David Reed <dav...@gm...>
>> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
>> >> >>>
>> >> >>> I can't thank you guys enough for the help. I was able to add the __iter__
>> >> >>> function to the table.py file and everything seems to be working great!
>> >> >>> I'm not quite as fast as I was with iterating right off a matrix but pretty
>> >> >>> close. I was at 555 comparisons per second, and now I'm at 420.
>> >> >>>
>> >> >>> I handled the problem I mentioned earlier by doing this, and it seems to
>> >> >>> work great:
>> >> >>>
>> >> >>> A = f.root.data.cols.A
>> >> >>> B = f.root.data.cols.B
>> >> >>>
>> >> >>> D = np.empty((len(A), len(A)))
>> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
>> >> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
>> >> >>>
>> >> >>> Again, thanks a lot.
>> >> >>>
>> >> >>> -Dave
>> >> >>>
>> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
>> >> >>>
>> >> >>> > Message: 1
>> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
>> >> >>> > From: Anthony Scopatz <sc...@gm...>
>> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>> >> >>> >
>> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
>> >> >>> >
>> >> >>> > > Thanks a lot for the help so far guys!
>> >> >>> > >
>> >> >>> > > Looking at itertools, I found what I believe to be the perfect function
>> >> >>> > > for what I need, itertools.combinations. This appears to be a valid
>> >> >>> > > replacement for the method proposed.
>> >> >>> >
>> >> >>> > Yes, combinations is awesome!
>> >> >>> >
>> >> >>> > > One small problem I didn't mention is that my compare function
>> >> >>> > > actually takes as inputs 2 columns from the table. Like so:
>> >> >>> > >
>> >> >>> > > D = np.empty((N_irises, N_irises))
>> >> >>> > > for ii in xrange(N_elements):
>> >> >>> > >     for jj in xrange(ii+1, N_elements):
>> >> >>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>> >> >>> > >                             data['element2'][ii], data['element2'][jj])
>> >> >>> > >
>> >> >>> > > Is there an efficient way of using itertools with this structure?
>> >> >>> >
>> >> >>> > You can always make two other iterators for each column. Since you have
>> >> >>> > two columns you would have 4 iterators. I am not sure how fast this is
>> >> >>> > going to be but I am confident that there is definitely a way to do this in
>> >> >>> > one for-loop, which is going to be way faster than nested loops.
>> >> >>> >
>> >> >>> > Be Well
>> >> >>> > Anthony
>> >> >>> >
>> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>> >> >>> > >
>> >> >>> > >> Message: 1
>> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
>> >> >>> > >> From: Josh Ayers <jos...@gm...>
>> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> >> >>> > >>
>> >> >>> > >> David,
>> >> >>> > >>
>> >> >>> > >> The change in issue 27 was only for iteration over a tables.Column
>> >> >>> > >> instance. To use it, tweak Anthony's code as follows. This will iterate
>> >> >>> > >> over the "element" column, as in your original example.
>> >> >>> > >>
>> >> >>> > >> Note also that this will only work with the development version of
>> >> >>> > >> PyTables available on github. It will be very slow using the released
>> >> >>> > >> v2.4.0.
>> >> >>> > >>
>> >> >>> > >> from itertools import izip
>> >> >>> > >>
>> >> >>> > >> with tb.openFile(...) as f:
>> >> >>> > >>     data = f.root.data.cols.element
>> >> >>> > >>     data_i = iter(data)
>> >> >>> > >>     data_j = iter(data)
>> >> >>> > >>     data_i.next() # throw the first value away
>> >> >>> > >>     for i, j in izip(data_i, data_j):
>> >> >>> > >>         compare(i, j)
>> >> >>> > >>
>> >> >>> > >> Hope that helps,
>> >> >>> > >> Josh
>> >> >>> > >>
>> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
>> >> >>> > >>
>> >> >>> > >> > Hi David,
>> >> >>> > >> >
>> >> >>> > >> > Tables and table column iteration have been overhauled fairly recently [1].
So you might try creating two iterators, offset by >> one, >> >> and >> >> >>> then >> >> >>> > >> > doing the comparison. I am hacking this out super quick so >> >> please >> >> >>> > >> forgive >> >> >>> > >> > me: >> >> >>> > >> > >> >> >>> > >> > from itertools import izip >> >> >>> > >> > >> >> >>> > >> > with tb.openFile(...) as f: >> >> >>> > >> > data = f.root.data >> >> >>> > >> > data_i = iter(data) >> >> >>> > >> > data_j = iter(data) >> >> >>> > >> > data_i.next() # throw the first value away >> >> >>> > >> > for i, j in izip(data_i, data_j): >> >> >>> > >> > compare(i, j) >> >> >>> > >> > >> >> >>> > >> > You get the idea ;) >> >> >>> > >> > >> >> >>> > >> > Be Well >> >> >>> > >> > Anthony >> >> >>> > >> > >> >> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27 >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < >> >> >>> dav...@gm...> >> >> >>> > >> wrote: >> >> >>> > >> > >> >> >>> > >> >> I was hoping someone could help me out here. >> >> >>> > >> >> >> >> >>> > >> >> This is from a post I put up on StackOverflow, >> >> >>> > >> >> >> >> >>> > >> >> I am have a fairly large dataset that I store in HDF5 and >> >> access >> >> >>> > using >> >> >>> > >> >> PyTables. One operation I need to do on this dataset are >> >> pairwise >> >> >>> > >> >> comparisons between each of the elements. This requires 2 >> >> loops, >> >> >>> one >> >> >>> > to >> >> >>> > >> >> iterate over each element, and an inner loop to iterate >> over >> >> >>> every >> >> >>> > >> other >> >> >>> > >> >> element. This operation thus looks at N(N-1)/2 comparisons. >> >> >>> > >> >> >> >> >>> > >> >> For fairly small sets I found it to be faster to dump the >> >> >>> contents >> >> >>> > >> into a >> >> >>> > >> >> multdimensional numpy array and then do my iteration. 
I run >> >> into >> >> >>> > >> problems >> >> >>> > >> >> with large sets because of memory issues and need to access >> >> each >> >> >>> > >> element of >> >> >>> > >> >> the dataset at run time. >> >> >>> > >> >> >> >> >>> > >> >> Putting the elements into an array gives me about 600 >> >> >>> comparisons per >> >> >>> > >> >> second, while operating on hdf5 data itself gives me about >> 300 >> >> >>> > >> comparisons >> >> >>> > >> >> per second. >> >> >>> > >> >> >> >> >>> > >> >> Is there a way to speed this process up? >> >> >>> > >> >> >> >> >>> > >> >> Example follows (this is not my real code, just an >> example): >> >> >>> > >> >> >> >> >>> > >> >> *Small Set*: >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f: >> >> >>> > >> >> data = f.root.data >> >> >>> > >> >> >> >> >>> > >> >> N_elements = len(data) >> >> >>> > >> >> elements = np.empty((N_irises, 1e5)) >> >> >>> > >> >> >> >> >>> > >> >> for ii, d in enumerate(data): >> >> >>> > >> >> elements[ii] = data['element'] >> >> >>> > >> >> >> >> >>> > >> >> D = np.empty((N_irises, N_irises)) for ii in >> >> xrange(N_elements): >> >> >>> > >> >> for jj in xrange(ii+1, N_elements): >> >> >>> > >> >> D[ii, jj] = compare(elements[ii], elements[jj]) >> >> >>> > >> >> >> >> >>> > >> >> *Large Set*: >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f: >> >> >>> > >> >> data = f.root.data >> >> >>> > >> >> >> >> >>> > >> >> N_elements = len(data) >> >> >>> > >> >> >> >> >>> > >> >> D = np.empty((N_irises, N_irises)) >> >> >>> > >> >> for ii in xrange(N_elements): >> >> >>> > >> >> for jj in xrange(ii+1, N_elements): >> >> >>> > >> >> D[ii, jj] = compare(data['element'][ii], >> >> >>> > >> data['element'][jj]) >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> > >> >> >> >>> > >> >> >>> >> >> >> ------------------------------------------------------------------------------ >> >> >>> > >> >> Master Visual 
Studio, SharePoint, SQL, ASP.NET, C# 2012, >> >> HTML5, >> >> >>> CSS, >> >> >>> > >> >> MVC, Windows 8 Apps, JavaScript and much more. Keep your >> >> skills >> >> >>> > current >> >> >>> > >> >> with LearnDevNow - 3,200 step-by-step video tutorials by >> >> >>> Microsoft >> >> >>> > >> >> MVPs and experts. ON SALE this month only -- learn more at: >> >> >>> > >> >> http://p.sf.net/sfu/learnmore_122712 >> >> >>> > >> >> _______________________________________________ >> >> >>> > >> >> Pytables-users mailing list >> >> >>> > >> >> Pyt...@li... >> >> >>> > >> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> >> >> >>> > >> >> >>> >> >> >> ------------------------------------------------------------------------------ >> >> >>> > >> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, >> >> HTML5, >> >> >>> CSS, >> >> >>> > >> > MVC, Windows 8 Apps, JavaScript and much more. Keep your >> skills >> >> >>> > current >> >> >>> > >> > with LearnDevNow - 3,200 step-by-step video tutorials by >> >> Microsoft >> >> >>> > >> > MVPs and experts. ON SALE this month only -- learn more at: >> >> >>> > >> > http://p.sf.net/sfu/learnmore_122712 >> >> >>> > >> > _______________________________________________ >> >> >>> > >> > Pytables-users mailing list >> >> >>> > >> > Pyt...@li... >> >> >>> > >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> -------------- next part -------------- >> >> >>> > >> An HTML attachment was scrubbed... 
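The quoted messages above mix two different iteration patterns, so a small sketch may help keep them apart. This is not code from the thread: it assumes Python 3 (built-in `zip` and `next()` in place of `izip` and `.next()`), uses a plain list as a stand-in for a PyTables column, and the names `adjacent_pairs` and `all_pairs` are made up for illustration. Two iterators offset by one visit only neighbouring rows, while `itertools.combinations` visits every unordered pair, which is the N(N-1)/2 figure from the original question.

```python
from itertools import combinations


def adjacent_pairs(column):
    """Pairs produced by two iterators over the same data, offset by one."""
    data_i = iter(column)
    data_j = iter(column)
    next(data_i, None)  # throw the first value away, as in the quoted code
    return list(zip(data_i, data_j))


def all_pairs(column):
    """Every unordered pair of rows: N(N-1)/2 of them."""
    return list(combinations(column, 2))


column = [10, 20, 30, 40]  # stand-in for a table column
print(adjacent_pairs(column))
print(len(all_pairs(column)))
```

For N rows the offset approach yields N-1 pairs, so it is only a substitute for the nested loops when neighbouring rows are all that need comparing.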
>> >> >>> > ------------------------------
>> >> >>> >
>> >> >>> > Message: 2
>> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600
>> >> >>> > From: Anthony Scopatz <sc...@gm...>
>> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
>> >> >>> > To: Discussion list for PyTables <pyt...@li...>
>> >> >>> > Message-ID: <CAP...@ma...>
>> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
>> >> >>> >
>> >> >>> > Josh is right that you can just edit the code by hand (which works but
>> >> >>> > sucks).
>> >> >>> >
>> >> >>> > However, on Windows -- on the rare occasion when I also have to develop
>> >> >>> > on it -- I typically use a distribution that already includes a
>> >> >>> > compiler, cython, hdf5, and pytables, and then I install my development
>> >> >>> > version from github OVER this. I recommend either EPD or Anaconda,
>> >> >>> > though other distributions listed here [1] might also work.
>> >> >>> >
>> >> >>> > Be well
>> >> >>> > Anthony
>> >> >>> >
>> >> >>> > 1. http://numfocus.org/projects-2/software-distributions/
>> >> >>> >
>> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
>> >> >>> >
>> >> >>> > > The change was in pure Python code, so you should be able to just
>> >> >>> > > paste in the changes to your local copy. Start with the
>> >> >>> > > table.Column.__iter__ method (lines 3296-3310) here.
>> >> >>> > >
>> >> >>> > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
>> >> >>> > >
>> >> >>> > > It needs to be modified slightly because it uses some additional
>> >> >>> > > features that aren't available in the released version (the
>> >> >>> > > out=buf_slice argument to table.read). The following should work.
>> >> >>> > >
>> >> >>> > > def __iter__(self):
>> >> >>> > >     table = self.table
>> >> >>> > >     itemsize = self.dtype.itemsize
>> >> >>> > >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>> >> >>> > >     max_row = len(self)
>> >> >>> > >     for start_row in xrange(0, len(self), nrowsinbuf):
>> >> >>> > >         end_row = min([start_row + nrowsinbuf, max_row])
>> >> >>> > >         buf = table.read(start_row, end_row, 1, field=self.pathname)
>> >> >>> > >         for row in buf:
>> >> >>> > >             yield row
>> >> >>> > >
>> >> >>> > > I haven't tested this, but I think it will work.
>> >> >>> > >
>> >> >>> > > Josh
>> >> >>> > >
>> >> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
>> >> >>> > >
>> >> >>> > >> I apologize if I'm starting to sound helpless, but I'm forced to
>> >> >>> > >> work on Windows 7 at work and have never had luck compiling python
>> >> >>> > >> source successfully. I have had to rely on precompiled binaries
>> >> >>> > >> and now it's biting me in the butt.
>> >> >>> > >>
>> >> >>> > >> Is there any quick fix I can do to improve this iteration using
>> >> >>> > >> v2.4.0?
>> >> >>> > >>
>> >> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
>> >> >>> > >>
>> >> >>> > >>> Today's Topics:
>> >> >>> > >>>
>> >> >>> > >>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
>> >> >>> > >>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
>> >> >>> > >>>
>> >> >>> > >>> ----------------------------------------------------------------------
>> >> >>> > >>>
>> >> >>> > >>> Message: 1
>> >> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
>> >> >>> > >>> From: David Reed <dav...@gm...>
>> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>> >> >>> > >>> To: pyt...@li...
>> >> >>> > >>> Message-ID: <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=ev...@ma...>
>> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
>> >> >>> > >>>
>> >> >>> > >>> Thanks Anthony, but unless I'm missing something I don't think that
>> >> >>> > >>> method will work, since it will only compare the ith element with
>> >> >>> > >>> the (i+1)th element. I still need 2 for loops, right?
>> >> >>> > >>>
>> >> >>> > >>> Using itertools might speed things up though. I've never used it,
>> >> >>> > >>> so I will give it a shot and let you know how it goes. Looks like
>> >> >>> > >>> I need to download the latest release before I do that too. Thanks
>> >> >>> > >>> for the help.
>> >> >>> > >>>
>> >> >>> > >>> -Dave
>> >> >>> > >>>
>> >> >>> > >>> ------------------------------
>> >> >>> > >>>
>> >> >>> > >>> Message: 2
>> >> >>> > >>> Date: Thu, 3 Jan 2013 15:17:01 -0500
>> >> >>> > >>> From: David Reed <dav...@gm...>
>> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>> >> >>> > >>> To: pyt...@li...
>> >> >>> > >>> Message-ID: <CAM...@ma...>
>> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
>> >> >>> > >>>
>> >> >>> > >>> Thanks a lot for the help so far guys!
>> >> >>> > >>>
>> >> >>> > >>> Looking at itertools, I found what I believe to be the perfect
>> >> >>> > >>> function for what I need, itertools.combinations. This appears to
>> >> >>> > >>> be a valid replacement for the method proposed.
>> >> >>> > >>>
>> >> >>> > >>> One small problem I didn't mention is that my compare function
>> >> >>> > >>> actually takes as inputs 2 columns from the table. Like so:
>> >> >>> > >>>
>> >> >>> > >>> D = np.empty((N_elements, N_elements))
>> >> >>> > >>> for ii in xrange(N_elements):
>> >> >>> > >>>     for jj in xrange(ii+1, N_elements):
>> >> >>> > >>>         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>> >> >>> > >>>                             data['element2'][ii], data['element2'][jj])
>> >> >>> > >>>
>> >> >>> > >>> Is there an efficient way of using itertools with this structure?
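One possible answer to the question above, along the lines of what the poster reports using in the 4 Jan message, is to zip the two columns together with a running index and feed the result to `itertools.combinations`. The sketch below is only an illustration under stated assumptions: it targets Python 3 (built-in `zip` rather than `izip`), uses small NumPy arrays as stand-ins for the two PyTables columns, and `compare` and `pairwise_matrix` are hypothetical names, not the poster's real code.

```python
from itertools import combinations

import numpy as np


def compare(a1, a2, b1, b2):
    """Hypothetical stand-in for the poster's comparison function."""
    return float(np.abs(a1 - a2).sum() + np.abs(b1 - b2).sum())


def pairwise_matrix(col_a, col_b):
    """Fill the upper triangle of D using a single for-loop over row pairs."""
    n = len(col_a)
    D = np.zeros((n, n))
    # Each item carries both column values plus the row index.
    rows = zip(col_a, col_b, range(n))
    for (a1, b1, ii), (a2, b2, jj) in combinations(rows, 2):
        D[ii, jj] = compare(a1, a2, b1, b2)
    return D


# Stand-ins for the two table columns (3 rows each).
col_a = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
col_b = np.array([[1.0, 1.0], [1.0, 2.0], [3.0, 3.0]])
D = pairwise_matrix(col_a, col_b)
print(D)
```

One design caveat: `itertools.combinations` stores its whole input internally before yielding pairs, so this pattern holds every row in memory at once. For a dataset that does not fit in memory, a nested loop over two buffered column iterators may be the safer choice.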
>> >> >>> > >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >>> > >>> >> >> >>> > >>> >> >> >>> > >>> End of Pytables-users Digest, Vol 80, Issue 4 >> >> >>> > >>> ********************************************* >> >> >>> > >>> >> >> >>> > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > >> >> >>> >> >> >> ------------------------------------------------------------------------------ >> >> >>> > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, >> HTML5, >> >> >>> CSS, >> >> >>> > >> MVC, Windows 8 Apps, JavaScript and much more. Keep your >> skills >> >> >>> current >> >> >>> > >> with LearnDevNow - 3,200 step-by-step video tutorials by >> >> Microsoft >> >> >>> > >> MVPs and experts. ON SALE this month only -- learn more at: >> >> >>> > >> http://p.sf.net/sfu/learnmore_122712 >> >> >>> > >> _______________________________________________ >> >> >>> > >> Pytables-users mailing list >> >> >>> > >> Pyt...@li... >> >> >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >>> > >> >> >> >>> > >> >> >> >>> > > >> >> >>> > > >> >> >>> > > >> >> >>> > >> >> >>> >> >> >> ------------------------------------------------------------------------------ >> >> >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, >> HTML5, >> >> CSS, >> >> >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> >> >>> current >> >> >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by >> Microsoft >> >> >>> > > MVPs and experts. ON SALE this month only -- learn more at: >> >> >>> > > http://p.sf.net/sfu/learnmore_122712 >> >> >>> > > _______________________________________________ >> >> >>> > > Pytables-users mailing list >> >> >>> > > Pyt...@li... >> >> >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >>> > > >> >> >>> > > >> >> >>> > -------------- next part -------------- >> >> >>> > An HTML attachment was scrubbed... 
>> >> >>> > >> >> >>> > ------------------------------ >> >> >>> > >> >> >>> > >> >> >>> > >> >> >>> >> >> >> ------------------------------------------------------------------------------ >> >> >>> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >> >> CSS, >> >> >>> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> >> current >> >> >>> > with LearnDevNow - 3,200 step-by-step video tutorials by >> Microsoft >> >> >>> > MVPs and experts. ON SALE this month only -- learn more at: >> >> >>> > http://p.sf.net/sfu/learnmore_122712 >> >> >>> > >> >> >>> > ------------------------------ >> >> >>> > >> >> >>> > _______________________________________________ >> >> >>> > Pytables-users mailing list >> >> >>> > Pyt...@li... >> >> >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >>> > >> >> >>> > >> >> >>> > End of Pytables-users Digest, Vol 80, Issue 8 >> >> >>> > ********************************************* >> >> >>> > >> >> >>> -------------- next part -------------- >> >> >>> An HTML attachment was scrubbed... >> >> >>> >> >> >>> ------------------------------ >> >> >>> >> >> >>> >> >> >>> >> >> >> ------------------------------------------------------------------------------ >> >> >>> Master HTML5, CSS3, ASP.NET, MVC, AJAX, Knockout.js, Web API and >> >> >>> much more. Get web development skills now with LearnDevNow - >> >> >>> 350+ hours of step-by-step video tutorials by Microsoft MVPs and >> >> experts. >> >> >>> SALE $99.99 this month only -- learn more at: >> >> >>> http://p.sf.net/sfu/learnmore_122812 >> >> >>> >> >> >>> ------------------------------ >> >> >>> >> >> >>> _______________________________________________ >> >> >>> Pytables-users mailing list >> >> >>> Pyt...@li... 
>> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >>> >> >> >>> >> >> >>> End of Pytables-users Digest, Vol 80, Issue 9 >> >> >>> ********************************************* >> >> >>> >> >> >> >> >> >> >> >> > >> >> > >> >> > >> >> >> ------------------------------------------------------------------------------ >> >> > Everyone hates slow websites. So do we. >> >> > Make your web apps faster with AppDynamics >> >> > Download AppDynamics Lite for free today: >> >> > http://p.sf.net/sfu/appdyn_d2d_jan >> >> > _______________________________________________ >> >> > Pytables-users mailing list >> >> > Pyt...@li... >> >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> > >> >> -------------- next part -------------- >> >> An HTML attachment was scrubbed... >> >> >> >> ------------------------------ >> >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> Everyone hates slow websites. So do we. >> >> Make your web apps faster with AppDynamics >> >> Download AppDynamics Lite for free today: >> >> http://p.sf.net/sfu/appdyn_d2d_jan >> >> >> >> ------------------------------ >> >> >> >> _______________________________________________ >> >> Pytables-users > > ... > > [Message clipped] > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_jan > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: David R. <dav...@gm...> - 2013-02-01 21:28:17
|
at the error:

    result = numpy.empty(shape=nrows, dtype=dtypeField)

nrows = 4620 and dtypeField is ('bool', (17, 9600)). I'm not sure what that
means as a dtype, but that's what it is.

Forgive me if I'm being totally naive, but I thought the whole point of
__iter__ with PyTables was to do iteration on the fly, so there is no
preallocation.

If you have any ideas on this I'm all ears.

Thanks again.

Dave

On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:

> Message: 1
> Date: Fri, 1 Feb 2013 14:44:40 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>
> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
>
> > Hi Anthony,
> >
> > Thanks for the reply.
> >
> > I honestly don't know how to monitor my Python memory usage, but I'm sure
> > that it's caused by running out of memory.
>
> Well, I would just run top or process monitor or something while running
> the python script to see what happens to memory usage as the script chugs
> along...
>
> > I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
> > and the column I'm iterating over is a 17x9600 boolean matrix. The
> > __iter__ method is preallocating an array that is this size, which appears
> > to be the root of the error. I was hoping there is a fix somewhere in here
> > to not have to do this preallocation.
>
> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
> these is ~760 MB. If you have 2 GB of memory and you are iterating over 2
> of these (templates & masks) it is conceivable that you are just running
> out of memory. Maybe there is a way that __iter__ could not preallocate
> something that is basically a temporary. What is the dtype of the
> templates array?
>
> Be Well
> Anthony
>
> > Thanks again.
> >
> > On Fri, Feb 1, 2013 at 11:12 AM, <pyt...@li...> wrote:
> >
> >> Message: 1
> >> Date: Fri, 1 Feb 2013 10:11:47 -0600
> >> From: Anthony Scopatz <sc...@gm...>
> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9
> >>
> >> Hi David,
> >>
> >> Sorry, I haven't had a ton of time recently. You seem to be getting a
> >> memory error on creating a numpy array. This kind of thing typically
> >> happens when you are out of memory. Does this seem to be the case with
> >> you?
> >> When this dies, is your memory usage at 100%? If so, this algorithm
> >> might require a little tweaking...
> >>
> >> Be Well
> >> Anthony
> >>
> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:
> >>
> >> > I'm still having problems with this one. I can't tell if this is
> >> > something dumb I'm doing with itertools, or if it's something in
> >> > pytables.
> >> >
> >> > Would appreciate any help.
> >> >
> >> > Thanks
> >> >
> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:
> >> >
> >> >> I think I have to reopen this issue. I have been running fine for
> >> >> a while using the combinations method from itertools, but have
> >> >> recently run into a memory error since I quadrupled the size of the
> >> >> hdf file.
> >> >>
> >> >> Here is my code again:
> >> >>
> >> >> from itertools import combinations, izip
> >> >> with tb.openFile(h5_all, 'r') as f:
> >> >>     irises = f.root.irises
> >> >>
> >> >>     templates = f.root.irises.cols.templates
> >> >>     masks = f.root.irises.cols.masks1
> >> >>
> >> >>     N_irises = len(irises)
> >> >>     index = np.ones((20 * 480), np.bool)
> >> >>
> >> >>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >> >>     D = np.empty((N_irises, N_irises))
> >> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> >> >>         # print ii
> >> >>         D[ii, jj] = ham_dist(
> >> >>             t1[8, index],
> >> >>             t2[:, index],
> >> >>             m1[8, index],
> >> >>             m2[:, index],
> >> >>         )
> >> >>
> >> >> And here is the error:
> >> >>
> >> >> In [10]: get_hd3()
> >> >> 10669890 Comparisons
> >> >>
> >> >> ---------------------------------------------------------------------------
> >> >> MemoryError                               Traceback (most recent call last)
> >> >> <ipython-input-10-cfb255ce7bd1> in <module>()
> >> >> ----> 1 get_hd3()
> >> >>
> >> >>     118     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >> >>     119     D = np.empty((N_irises, N_irises))
> >> >> --> 120     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> >> >>     121         # print ii
> >> >>     122         D[ii, jj] = ham_dist(
> >> >>
> >> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
> >> >>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
> >> >>    3275             end_row = min([start_row + nrowsinbuf, max_row])
> >> >> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
> >> >>    3277             for row in buf:
> >> >>    3278                 yield row
> >> >>
> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
> >> >>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
> >> >>    1773
> >> >> -> 1774         arr = self._read(start, stop, step, field)
> >> >>    1775         return internal_to_flavor(arr, self.flavor)
> >> >>    1776
> >> >>
> >> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
> >> >>    1719         if field:
> >> >>    1720             # Create a container for the results
> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> >>    1722         else:
> >> >>    1723             # Recarray case
> >> >>
> >> >> MemoryError:
> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
> >> >>    1720             # Create a container for the results
> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> >>    1722         else:
> >> >>
> >> >> Also, if you guys see any performance problems in my code, please let
> >> >> me know.
> >> >>
> >> >> Thank you so much for the help.
> >> >>
> >> >> -Dave
> >> >>
> >> >> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
> >> >>
> >> >>> Message: 1
> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500
> >> >>> From: David Reed <dav...@gm...>
> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
> >> >>>
> >> >>> I can't thank you guys enough for the help. I was able to add the
> >> >>> __iter__ function to the table.py file and everything seems to be
> >> >>> working great! I'm not quite as fast as I was with iterating right
> >> >>> off a matrix, but pretty close. I was at 555 comparisons per second,
> >> >>> and now I'm at 420.
> >> >>>
> >> >>> I handled the problem I mentioned earlier by doing this, and it seems
> >> >>> to work great:
> >> >>>
> >> >>> A = f.root.data.cols.A
> >> >>> B = f.root.data.cols.B
> >> >>>
> >> >>> D = np.empty((len(A), len(A)))
> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
> >> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
> >> >>>
> >> >>> Again, thanks a lot.
> >> >>>
> >> >>> -Dave
> >> >>>
> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
> >> >>>
> >> >>> > Message: 1
> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
> >> >>> > From: Anthony Scopatz <sc...@gm...>
> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> >> >>> >
> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
> >> >>> >
> >> >>> > > Thanks a lot for the help so far guys!
> >> >>> > >
> >> >>> > > Looking at itertools, I found what I believe to be the perfect
> >> >>> > > function for what I need, itertools.combinations. This appears to
> >> >>> > > be a valid replacement to the method proposed.
> >> >>> >
> >> >>> > Yes, combinations is awesome!
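
For readers following along, here is a minimal sketch of the single-loop pattern under discussion. It is written for Python 3 (where `zip` replaces `izip`) and uses plain lists as stand-ins for table columns; `pairwise_results` and the toy `compare` lambda are hypothetical names, not part of PyTables or of anyone's code in this thread.

```python
from itertools import combinations


def pairwise_results(col_a, col_b, compare):
    """Return {(ii, jj): compare(...)} for every row pair with ii < jj."""
    # zip() pairs the two columns row by row; enumerate() attaches the index.
    rows = ((ii, a, b) for ii, (a, b) in enumerate(zip(col_a, col_b)))
    results = {}
    # combinations() yields each unordered pair exactly once, in index order.
    for (ii, a1, b1), (jj, a2, b2) in combinations(rows, 2):
        results[ii, jj] = compare(a1, a2, b1, b2)
    return results


# Toy stand-ins for two table columns and a comparison function.
col_a = [1, 2, 3, 4]
col_b = [10, 20, 30, 40]
pairs = pairwise_results(
    col_a, col_b, lambda a1, a2, b1, b2: (a2 - a1) + (b2 - b1)
)
print(len(pairs))  # 4 * 3 / 2 pairs
```

One thing worth knowing about this pattern: `itertools.combinations` copies its input into a tuple before yielding anything, so every row of both columns is held in memory for the duration of the loop, even when the input is a lazy iterator.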
> >> >>> >
> >> >>> > > There is a small problem that I didn't mention: my compare
> >> >>> > > function actually takes as inputs 2 columns from the table.
> >> >>> > > Like so:
> >> >>> > >
> >> >>> > > D = np.empty((N_elements, N_elements))
> >> >>> > > for ii in xrange(N_elements):
> >> >>> > >     for jj in xrange(ii+1, N_elements):
> >> >>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> >> >>> > >                             data['element2'][ii], data['element2'][jj])
> >> >>> > >
> >> >>> > > Is there an efficient way of using itertools with this structure?
> >> >>> >
> >> >>> > You can always make two other iterators for each column. Since you
> >> >>> > have two columns you would have 4 iterators. I am not sure how fast
> >> >>> > this is going to be but I am confident that there is definitely a
> >> >>> > way to do this in one for-loop, which is going to be way faster than
> >> >>> > nested loops.
> >> >>> >
> >> >>> > Be Well
> >> >>> > Anthony
> >> >>> >
> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> >> >>> > >
> >> >>> > >> Message: 1
> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
> >> >>> > >> From: Josh Ayers <jos...@gm...>
> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >> >>> > >>
> >> >>> > >> David,
> >> >>> > >>
> >> >>> > >> The change in issue 27 was only for iteration over a tables.Column
> >> >>> > >> instance. To use it, tweak Anthony's code as follows. This will
> >> >>> > >> iterate over the "element" column, as in your original example.
> >> >>> > >>
> >> >>> > >> Note also that this will only work with the development version of
> >> >>> > >> PyTables available on github. It will be very slow using the
> >> >>> > >> released v2.4.0.
> >> >>> > >>
> >> >>> > >> from itertools import izip
> >> >>> > >>
> >> >>> > >> with tb.openFile(...) as f:
> >> >>> > >>     data = f.root.data.cols.element
> >> >>> > >>     data_i = iter(data)
> >> >>> > >>     data_j = iter(data)
> >> >>> > >>     data_i.next() # throw the first value away
> >> >>> > >>     for i, j in izip(data_i, data_j):
> >> >>> > >>         compare(i, j)
> >> >>> > >>
> >> >>> > >> Hope that helps,
> >> >>> > >> Josh
> >> >>> > >>
> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
> >> >>> > >>
> >> >>> > >> > Hi David,
> >> >>> > >> >
> >> >>> > >> > Tables and table column iteration have been overhauled fairly
> >> >>> > >> > recently [1]. So you might try creating two iterators, offset by
> >> >>> > >> > one, and then doing the comparison. I am hacking this out super
> >> >>> > >> > quick so please forgive me:
> >> >>> > >> >
> >> >>> > >> > from itertools import izip
> >> >>> > >> >
> >> >>> > >> > with tb.openFile(...) as f:
> >> >>> > >> >     data = f.root.data
> >> >>> > >> >     data_i = iter(data)
> >> >>> > >> >     data_j = iter(data)
> >> >>> > >> >     data_i.next() # throw the first value away
> >> >>> > >> >     for i, j in izip(data_i, data_j):
> >> >>> > >> >         compare(i, j)
> >> >>> > >> >
> >> >>> > >> > You get the idea ;)
> >> >>> > >> >
> >> >>> > >> > Be Well
> >> >>> > >> > Anthony
> >> >>> > >> >
> >> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27
> >> >>> > >> >
> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >> >>> > >> >
> >> >>> > >> >> I was hoping someone could help me out here.
> >> >>> > >> >>
> >> >>> > >> >> This is from a post I put up on StackOverflow.
> >> >>> > >> >>
> >> >>> > >> >> I have a fairly large dataset that I store in HDF5 and access
> >> >>> > >> >> using PyTables. One operation I need to do on this dataset is
> >> >>> > >> >> pairwise comparison between each of the elements. This requires
> >> >>> > >> >> 2 loops, one to iterate over each element, and an inner loop to
> >> >>> > >> >> iterate over every other element. This operation thus looks at
> >> >>> > >> >> N(N-1)/2 comparisons.
> >> >>> > >> >>
> >> >>> > >> >> For fairly small sets I found it to be faster to dump the
> >> >>> > >> >> contents into a multidimensional numpy array and then do my
> >> >>> > >> >> iteration. I run into problems with large sets because of
> >> >>> > >> >> memory issues and need to access each element of the dataset
> >> >>> > >> >> at run time.
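
For scale, the N(N-1)/2 count and the in-memory size of one column can be worked out from figures reported elsewhere in this thread (4620 rows, one 17x9600 boolean matrix per row, about 420 comparisons per second). The sketch below is plain arithmetic in Python 3. One byte per boolean is the NumPy convention; carrying the 420 comparisons per second over to the larger table is an assumption, and `n_pairs` is a hypothetical helper name.

```python
def n_pairs(n):
    """Number of unordered pairs among n elements: N(N-1)/2."""
    return n * (n - 1) // 2


n_rows = 4620                # rows in the table, as reported in this thread
bytes_per_row = 17 * 9600    # one 17x9600 boolean matrix, 1 byte per element

comparisons = n_pairs(n_rows)
column_bytes = n_rows * bytes_per_row
# Assumes the 420 comparisons/s reported in this thread still holds.
hours_at_420 = comparisons / 420 / 3600

print(comparisons)             # 10669890
print(column_bytes)            # 753984000
print(round(hours_at_420, 1))  # 7.1
```

The first number matches the "10669890 Comparisons" line printed in the traceback session, and the second is the roughly 750 MB that a single fully loaded column would occupy.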
> >> >>> > >> >>
> >> >>> > >> >> Putting the elements into an array gives me about 600
> >> >>> > >> >> comparisons per second, while operating on hdf5 data itself
> >> >>> > >> >> gives me about 300 comparisons per second.
> >> >>> > >> >>
> >> >>> > >> >> Is there a way to speed this process up?
> >> >>> > >> >>
> >> >>> > >> >> Example follows (this is not my real code, just an example):
> >> >>> > >> >>
> >> >>> > >> >> *Small Set*:
> >> >>> > >> >>
> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> >> >>> > >> >>     data = f.root.data
> >> >>> > >> >>
> >> >>> > >> >>     N_elements = len(data)
> >> >>> > >> >>     elements = np.empty((N_elements, 1e5))
> >> >>> > >> >>
> >> >>> > >> >>     for ii, d in enumerate(data):
> >> >>> > >> >>         elements[ii] = d['element']
> >> >>> > >> >>
> >> >>> > >> >> D = np.empty((N_elements, N_elements))
> >> >>> > >> >> for ii in xrange(N_elements):
> >> >>> > >> >>     for jj in xrange(ii+1, N_elements):
> >> >>> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
> >> >>> > >> >>
> >> >>> > >> >> *Large Set*:
> >> >>> > >> >>
> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> >> >>> > >> >>     data = f.root.data
> >> >>> > >> >>
> >> >>> > >> >>     N_elements = len(data)
> >> >>> > >> >>
> >> >>> > >> >>     D = np.empty((N_elements, N_elements))
> >> >>> > >> >>     for ii in xrange(N_elements):
> >> >>> > >> >>         for jj in xrange(ii+1, N_elements):
> >> >>> > >> >>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
> >> >>> >
> >> >>> > ------------------------------
> >> >>> >
> >> >>> > Message: 2
> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600
> >> >>> > From: Anthony Scopatz <sc...@gm...>
> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
> >> >>> >
> >> >>> > Josh is right that you can just edit the code by hand (which works
> >> >>> > but sucks).
> >> >>> >
> >> >>> > However, on Windows -- on the rare occasion when I also have to
> >> >>> > develop on it -- I typically use a distribution that includes a
> >> >>> > compiler, cython, hdf5, and pytables already and then I install my
> >> >>> > development version from github OVER this. I recommend either EPD or
> >> >>> > Anaconda, though other distributions listed here [1] might also work.
> >> >>> >
> >> >>> > Be well
> >> >>> > Anthony
> >> >>> >
> >> >>> > 1. http://numfocus.org/projects-2/software-distributions/
> >> >>> >
> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
> >> >>> >
> >> >>> > > The change was in pure Python code, so you should be able to just
> >> >>> > > paste in the changes to your local copy. Start with the
> >> >>> > > table.Column.__iter__ method (lines 3296-3310) here.
> >> >>> > >
> >> >>> > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
> >> >>> > >
> >> >>> > > It needs to be modified slightly because it uses some additional
> >> >>> > > features that aren't available in the released version (the
> >> >>> > > out=buf_slice argument to table.read). The following should work.
> >> >>> > >
> >> >>> > > def __iter__(self):
> >> >>> > >     table = self.table
> >> >>> > >     itemsize = self.dtype.itemsize
> >> >>> > >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> >> >>> > >     max_row = len(self)
> >> >>> > >     for start_row in xrange(0, len(self), nrowsinbuf):
> >> >>> > >         end_row = min([start_row + nrowsinbuf, max_row])
> >> >>> > >         buf = table.read(start_row, end_row, 1, field=self.pathname)
> >> >>> > >         for row in buf:
> >> >>> > >             yield row
> >> >>> > >
> >> >>> > > I haven't tested this, but I think it will work.
> >> >>> > > > >> >>> > > Josh > >> >>> > > > >> >>> > > > >> >>> > > > >> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed < > >> dav...@gm...> > >> >>> > wrote: > >> >>> > > > >> >>> > >> I apologize if I'm starting to sound helpless, but I'm forced > to > >> >>> work on > >> >>> > >> Windows 7 at work and have never had luck compiling python > source > >> >>> > >> successfully. I have had to rely on precompiled binaries and > now > >> >>> its > >> >>> > >> biting me in the butt. > >> >>> > >> > >> >>> > >> Is there any quick fix I can do to improve this iteration using > >> >>> v2.4.0? > >> >>> > >> > >> >>> > >> > >> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, < > >> >>> > >> pyt...@li...> wrote: > >> >>> > >> > >> >>> > >>> Send Pytables-users mailing list submissions to > >> >>> > >>> pyt...@li... > >> >>> > >>> > >> >>> > >>> To subscribe or unsubscribe via the World Wide Web, visit > >> >>> > >>> > >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >>> > >>> or, via email, send a message with subject or body 'help' to > >> >>> > >>> pyt...@li... > >> >>> > >>> > >> >>> > >>> You can reach the person managing the list at > >> >>> > >>> pyt...@li... > >> >>> > >>> > >> >>> > >>> When replying, please edit your Subject line so it is more > >> specific > >> >>> > >>> than "Re: Contents of Pytables-users digest..." > >> >>> > >>> > >> >>> > >>> > >> >>> > >>> Today's Topics: > >> >>> > >>> > >> >>> > >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed) > >> >>> > >>> 2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed) > >> >>> > >>> > >> >>> > >>> > >> >>> > >>> > >> >>> > ---------------------------------------------------------------------- > >> >>> > >>> > >> >>> > >>> Message: 1 > >> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 > >> >>> > >>> From: David Reed <dav...@gm...> > >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, > >> Issue > >> >>> 2 > >> >>> > >>> To: pyt...@li... 
> >> >>> > >>> Message-ID:
> >> >>> > >>>     <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=ev...@ma...>
> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
> >> >>> > >>>
> >> >>> > >>> Thanks Anthony, but unless I'm missing something I don't think that method
> >> >>> > >>> will work since this will only be comparing the ith element with ith+1
> >> >>> > >>> element. I still need 2 for loops right?
> >> >>> > >>>
> >> >>> > >>> Using itertools might speed things up though, I've never used them so I
> >> >>> > >>> will give it a shot and let you know how it goes. Looks like I need to
> >> >>> > >>> download the latest release before I do that too. Thanks for the help.
> >> >>> > >>>
> >> >>> > >>> -Dave
> >> >>> > >>>
> >> >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, <pyt...@li...> wrote:
> >> >>> > >>>
> >> >>> > >>> > Today's Topics:
> >> >>> > >>> >
> >> >>> > >>> >    1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)
> >> >>> > >>> >
> >> >>> > >>> > ----------------------------------------------------------------------
> >> >>> > >>> >
> >> >>> > >>> > Message: 1
> >> >>> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600
> >> >>> > >>> > From: Anthony Scopatz <sc...@gm...>
> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >> >>> > >>> > To: Discussion list for PyTables
> >> >>> > >>> >     <pyt...@li...>
> >> >>> > >>> > Message-ID:
> >> >>> > >>> >     <CAPk-6T5b=1EG...@ma...>
> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1"
> >> >>> > >>> >
> >> >>> > >>> > Hi David,
> >> >>> > >>> >
> >> >>> > >>> > Tables and table column iteration have been overhauled fairly recently [1].
> >> >>> > >>> > So you might try creating two iterators, offset by one, and then doing the
> >> >>> > >>> > comparison. I am hacking this out super quick so please forgive me:
> >> >>> > >>> >
> >> >>> > >>> > from itertools import izip
> >> >>> > >>> >
> >> >>> > >>> > with tb.openFile(...) as f:
> >> >>> > >>> >     data = f.root.data
> >> >>> > >>> >     data_i = iter(data)
> >> >>> > >>> >     data_j = iter(data)
> >> >>> > >>> >     data_i.next()  # throw the first value away
> >> >>> > >>> >     for i, j in izip(data_i, data_j):
> >> >>> > >>> >         compare(i, j)
> >> >>> > >>> >
> >> >>> > >>> > You get the idea ;)
> >> >>> > >>> >
> >> >>> > >>> > Be Well
> >> >>> > >>> > Anthony
> >> >>> > >>> >
> >> >>> > >>> > 1. https://github.com/PyTables/PyTables/issues/27
> >> >>> > >>> >
> >> >>> > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >> >>> > >>> >
> >> >>> > >>> > > I was hoping someone could help me out here.
> >> >>> > >>> > >
> >> >>> > >>> > > This is from a post I put up on StackOverflow,
> >> >>> > >>> > >
> >> >>> > >>> > > I have a fairly large dataset that I store in HDF5 and access using
> >> >>> > >>> > > PyTables. One operation I need to do on this dataset is pairwise
> >> >>> > >>> > > comparisons between each of the elements. This requires 2 loops, one to
> >> >>> > >>> > > iterate over each element, and an inner loop to iterate over every other
> >> >>> > >>> > > element. This operation thus looks at N(N-1)/2 comparisons.
> >> >>> > >>> > >
> >> >>> > >>> > > For fairly small sets I found it to be faster to dump the contents into a
> >> >>> > >>> > > multidimensional numpy array and then do my iteration. I run into problems
> >> >>> > >>> > > with large sets because of memory issues and need to access each element of
> >> >>> > >>> > > the dataset at run time.
> >> >>> > >>> > >
> >> >>> > >>> > > Putting the elements into an array gives me about 600 comparisons per
> >> >>> > >>> > > second, while operating on hdf5 data itself gives me about 300 comparisons
> >> >>> > >>> > > per second.
> >> >>> > >>> > >
> >> >>> > >>> > > Is there a way to speed this process up?
> >> >>> > >>> > >
> >> >>> > >>> > > Example follows (this is not my real code, just an example):
> >> >>> > >>> > >
> >> >>> > >>> > > *Small Set*:
> >> >>> > >>> > >
> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f:
> >> >>> > >>> > >     data = f.root.data
> >> >>> > >>> > >
> >> >>> > >>> > >     N_elements = len(data)
> >> >>> > >>> > >     elements = np.empty((N_irises, 1e5))
> >> >>> > >>> > >
> >> >>> > >>> > >     for ii, d in enumerate(data):
> >> >>> > >>> > >         elements[ii] = data['element']
> >> >>> > >>> > >
> >> >>> > >>> > > D = np.empty((N_irises, N_irises))
> >> >>> > >>> > > for ii in xrange(N_elements):
> >> >>> > >>> > >     for jj in xrange(ii+1, N_elements):
> >> >>> > >>> > >         D[ii, jj] = compare(elements[ii], elements[jj])
> >> >>> > >>> > >
> >> >>> > >>> > > *Large Set*:
> >> >>> > >>> > >
> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f:
> >> >>> > >>> > >     data = f.root.data
> >> >>> > >>> > >
> >> >>> > >>> > >     N_elements = len(data)
> >> >>> > >>> > >
> >> >>> > >>> > >     D = np.empty((N_irises, N_irises))
> >> >>> > >>> > >     for ii in xrange(N_elements):
> >> >>> > >>> > >         for jj in xrange(ii+1, N_elements):
> >> >>> > >>> > >             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
> >> >>> > >>> > >
> >> >>> > >>> ------------------------------
> >> >>> > >>>
> >> >>> > >>> Message: 2
> >> >>> > >>> Date: Thu, 3 Jan 2013 15:17:01 -0500
> >> >>> > >>> From: David Reed <dav...@gm...>
> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> >> >>> > >>> To: pyt...@li...
> >> >>> > >>> Message-ID:
> >> >>> > >>>     <CAM...@ma...>
> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
> >> >>> > >>>
> >> >>> > >>> Thanks a lot for the help so far guys!
> >> >>> > >>>
> >> >>> > >>> Looking at itertools, I found what I believe to be the perfect function for
> >> >>> > >>> what I need, itertools.combinations. This appears to be a valid replacement
> >> >>> > >>> to the method proposed.
> >> >>> > >>>
> >> >>> > >>> One small problem that I didn't mention is that my compare function
> >> >>> > >>> actually takes as inputs 2 columns from the table. Like so:
> >> >>> > >>>
> >> >>> > >>> D = np.empty((N_irises, N_irises))
> >> >>> > >>> for ii in xrange(N_elements):
> >> >>> > >>>     for jj in xrange(ii+1, N_elements):
> >> >>> > >>>         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> >> >>> > >>>                             data['element2'][ii], data['element2'][jj])
> >> >>> > >>>
> >> >>> > >>> Is there an efficient way of using itertools with this structure?
> >> >>> > >>>
> >> >>> > >>> On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> >> >>> > >>>
> >> >>> > >>> > Today's Topics:
> >> >>> > >>> >
> >> >>> > >>> >    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
> >> >>> > >>> >
> >> >>> > >>> > ----------------------------------------------------------------------
> >> >>> > >>> >
> >> >>> > >>> > Message: 1
> >> >>> > >>> > Date: Thu, 3 Jan 2013 10:29:33 -0800
> >> >>> > >>> > From: Josh Ayers <jos...@gm...>
> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >> >>> > >>> > To: Discussion list for PyTables
> >> >>> > >>> >     <pyt...@li...>
> >> >>> > >>> > Message-ID:
> >> >>> > >>> >     <CAC...@ma...>
> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1"
> >> >>> > >>> >
> >> >>> > >>> > David,
> >> >>> > >>> >
> >> >>> > >>> > The change in issue 27 was only for iteration over a tables.Column
> >> >>> > >>> > instance. To use it, tweak Anthony's code as follows. This will iterate
> >> >>> > >>> > over the "element" column, as in your original example.
> >> >>> > >>> >
> >> >>> > >>> > Note also that this will only work with the development version of PyTables
> >> >>> > >>> > available on github. It will be very slow using the released v2.4.0.
> >> >>> > >>> >
> >> >>> > >>> > from itertools import izip
> >> >>> > >>> >
> >> >>> > >>> > with tb.openFile(...) as f:
> >> >>> > >>> >     data = f.root.data.cols.element
> >> >>> > >>> >     data_i = iter(data)
> >> >>> > >>> >     data_j = iter(data)
> >> >>> > >>> >     data_i.next()  # throw the first value away
> >> >>> > >>> >     for i, j in izip(data_i, data_j):
> >> >>> > >>> >         compare(i, j)
> >> >>> > >>> >
> >> >>> > >>> > Hope that helps,
> >> >>> > >>> > Josh
> >> >>> > >>> >
|
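Note on the buffer arithmetic discussed above: the following is a minimal sketch, not PyTables' actual implementation, of iterating over a column in buffer-sized blocks. It assumes the 1 MB IO_BUFFER_SIZE default that Josh mentions and the 17 x 9600 boolean rows David describes. The names `iter_in_chunks` and `read_rows` are hypothetical stand-ins (an in-memory NumPy array plays the role of the on-disk column), and it is written for Python 3, so it uses `range` where the thread's Python 2 code uses `xrange`.

```python
import numpy as np


def iter_in_chunks(read_rows, n_rows, itemsize, buffer_bytes=1024 * 1024):
    """Yield rows one at a time while fetching them in buffer-sized blocks.

    read_rows(start, stop) is a hypothetical callable that returns rows
    start..stop-1; in the thread this role is played by table.read().
    """
    # Number of whole rows that fit in the buffer (at least one).
    rows_per_chunk = max(1, buffer_bytes // itemsize)
    for start in range(0, n_rows, rows_per_chunk):
        stop = min(start + rows_per_chunk, n_rows)
        for row in read_rows(start, stop):
            yield row


# Stand-in for the on-disk column: 20 rows, each a 17 x 9600 boolean matrix.
column = np.zeros((20, 17, 9600), dtype=np.bool_)
itemsize = column[0].nbytes                   # bytes per row
rows_per_chunk = (1024 * 1024) // itemsize    # rows per 1 MB buffer

chunk_sizes = []                              # records how many rows each read fetched


def read_rows(start, stop):
    chunk_sizes.append(stop - start)
    return column[start:stop]


n_yielded = sum(1 for _ in iter_in_chunks(read_rows, len(column), itemsize))
```

With these numbers, each read should fetch only a handful of rows rather than the whole column, which is the behaviour Josh expected from the patched `__iter__`.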
From: Anthony S. <sc...@gm...> - 2013-02-01 20:45:12
|
On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:

> Hi Anthony,
>
> Thanks for the reply.
>
> I honestly don't know how to monitor my Python memory usage, but I'm sure
> that it's caused by running out of memory.
>

Well, I would just run top or a process monitor or something while running the
Python script to see what happens to memory usage as the script chugs along...

> I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
> and the column I'm iterating over is a 17x9600 boolean matrix. The
> __iter__ method is preallocating an array of this size, which appears
> to be the root of the error. I was hoping there is a fix somewhere in here to
> not have to do this preallocation.
>

So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of these
is ~760 MB. If you have 2 GB of memory and you are iterating over 2 of these
(templates & masks), it is conceivable that you are just running out of memory.
Maybe there is a way that __iter__ could avoid preallocating something that is
basically a temporary. What is the dtype of the templates array?

Be Well
Anthony

> Thanks again.
>
> On Fri, Feb 1, 2013 at 11:12 AM, <pyt...@li...> wrote:
>
>> Today's Topics:
>>
>>    1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony Scopatz)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 1 Feb 2013 10:11:47 -0600
>> From: Anthony Scopatz <sc...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9
>> To: Discussion list for PyTables <pyt...@li...>
>> Message-ID: <CAP...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi David,
>>
>> Sorry, I haven't had a ton of time recently. You seem to be getting a
>> memory error on creating a numpy array. This kind of thing typically
>> happens when you are out of memory. Does this seem to be the case with
>> you? When this dies, is your memory usage at 100%? If so, this algorithm
>> might require a little tweaking...
>>
>> Be Well
>> Anthony
>>
>> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:
>>
>> > I'm still having problems with this one. I can't tell if this is something
>> > dumb I'm doing with itertools, or if it's something in pytables.
>> >
>> > Would appreciate any help.
>> >
>> > Thanks
>> >
>> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:
>> >
>> >> I think I have to reopen this issue. I have been running fine for a while
>> >> using the combinations method from itertools, but have recently run into a
>> >> memory error since I have recently quadrupled the size of the hdf file.
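The sizes quoted above are easy to sanity-check. The sketch below is plain
Python 3 arithmetic, not PyTables code. It takes the 4620 rows and the 17x9600
shape from the message above, and it assumes one byte per boolean element and
a 1 MiB I/O buffer; both values, and all of the names, are assumptions made
for illustration.

```python
# Figures from the thread: 4620 rows, each a 17 x 9600 boolean matrix.
N_ROWS = 4620
ROW_SHAPE = (17, 9600)

# Assumptions for this sketch (not taken from any PyTables API):
BYTES_PER_BOOL = 1                    # one byte per boolean element
ASSUMED_IO_BUFFER_SIZE = 1024 * 1024  # a 1 MiB read buffer


def row_nbytes(shape, itemsize):
    """Bytes needed to hold one row of the given shape."""
    nbytes = itemsize
    for dim in shape:
        nbytes *= dim
    return nbytes


def column_nbytes(n_rows, shape, itemsize):
    """Bytes needed to hold every row of one column at once."""
    return n_rows * row_nbytes(shape, itemsize)


def rows_per_buffer(buffer_size, shape, itemsize):
    """How many whole rows fit in a buffer of buffer_size bytes."""
    return buffer_size // row_nbytes(shape, itemsize)


def n_pairs(n):
    """Number of unordered pairs among n items, N(N-1)/2."""
    return n * (n - 1) // 2


print("bytes per row:    ", row_nbytes(ROW_SHAPE, BYTES_PER_BOOL))
print("bytes per column: ", column_nbytes(N_ROWS, ROW_SHAPE, BYTES_PER_BOOL))
print("rows per buffer:  ", rows_per_buffer(ASSUMED_IO_BUFFER_SIZE, ROW_SHAPE, BYTES_PER_BOOL))
print("pairs to compare: ", n_pairs(N_ROWS))
```

Under those assumptions one column comes to about 754 MB, so holding two such
columns in memory at the same time would need roughly 1.5 GB before counting
the result matrix.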
>> >>
>> >> Here is my code again:
>> >>
>> >> from itertools import combinations, izip
>> >> with tb.openFile(h5_all, 'r') as f:
>> >>     irises = f.root.irises
>> >>
>> >>     templates = f.root.irises.cols.templates
>> >>     masks = f.root.irises.cols.masks1
>> >>
>> >>     N_irises = len(irises)
>> >>     index = np.ones((20 * 480), np.bool)
>> >>
>> >>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> >>     D = np.empty((N_irises, N_irises))
>> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>> >>         # print ii
>> >>         D[ii, jj] = ham_dist(
>> >>             t1[8, index],
>> >>             t2[:, index],
>> >>             m1[8, index],
>> >>             m2[:, index],
>> >>         )
>> >>
>> >> And here is the error:
>> >>
>> >> In [10]: get_hd3()
>> >> 10669890 Comparisons
>> >>
>> >> ---------------------------------------------------------------------------
>> >> MemoryError                               Traceback (most recent call last)
>> >> <ipython-input-10-cfb255ce7bd1> in <module>()
>> >> ----> 1 get_hd3()
>> >>
>> >>     118     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> >>     119     D = np.empty((N_irises, N_irises))
>> >> --> 120     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>> >>     121         # print ii
>> >>     122         D[ii, jj] = ham_dist(
>> >>
>> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
>> >>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
>> >>    3275             end_row = min([start_row + nrowsinbuf, max_row])
>> >> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
>> >>    3277             for row in buf:
>> >>    3278                 yield row
>> >>
>> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
>> >>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
>> >>    1773
>> >> -> 1774         arr = self._read(start, stop, step, field)
>> >>    1775         return internal_to_flavor(arr, self.flavor)
>> >>    1776
>> >>
>> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
>> >>    1719         if field:
>> >>    1720             # Create a container for the results
>> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >>    1722         else:
>> >>    1723             # Recarray case
>> >>
>> >> MemoryError:
>> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
>> >>    1720             # Create a container for the results
>> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >>    1722         else:
>> >>
>> >> Also, if you guys see any performance problems in my code, please let me
>> >> know.
>> >>
>> >> Thank you so much for the help.
>> >>
>> >> -Dave
>> >>
>> >> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
>> >>
>> >>> Today's Topics:
>> >>>
>> >>>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
>> >>>
>> >>> ----------------------------------------------------------------------
>> >>>
>> >>> Message: 1
>> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500
>> >>> From: David Reed <dav...@gm...>
>> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
>> >>> To: pyt...@li...
>> >>> Message-ID: <CAM...@ma...>
>> >>> Content-Type: text/plain; charset="iso-8859-1"
>> >>>
>> >>> I can't thank you guys enough for the help. I was able to add the __iter__
>> >>> function to the table.py file and everything seems to be working great!
>> >>> I'm not quite as fast as I was when iterating right off a matrix, but
>> >>> pretty close. I was at 555 comparisons per second, and now I'm at 420.
>> >>>
>> >>> I handled the problem I mentioned earlier by doing this, and it seems to
>> >>> work great:
>> >>>
>> >>> A = f.root.data.cols.A
>> >>> B = f.root.data.cols.B
>> >>>
>> >>> D = np.empty((len(A), len(A)))
>> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
>> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
>> >>>
>> >>> Again, thanks a lot.
>> >>>
>> >>> -Dave
>> >>>
>> >>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
>> >>>
>> >>> > Today's Topics:
>> >>> >
>> >>> >    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
>> >>> >    2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz)
>> >>> >
>> >>> > ----------------------------------------------------------------------
>> >>> >
>> >>> > Message: 1
>> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
>> >>> > From: Anthony Scopatz <sc...@gm...>
>> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>> >>> > To: Discussion list for PyTables <pyt...@li...>
>> >>> > Message-ID:
>> >>> > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=
>> >>> > Gz...@ma...>
>> >>> > Content-Type: text/plain; charset="iso-8859-1"
>> >>> >
>> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
>> >>> >
>> >>> > > Thanks a lot for the help so far guys!
>> >>> > >
>> >>> > > Looking at itertools, I found what I believe to be the perfect function
>> >>> > > for what I need, itertools.combinations. This appears to be a valid
>> >>> > > replacement for the method proposed.
>> >>> > >
>> >>> >
>> >>> > Yes, combinations is awesome!
>> >>> >
>> >>> > > There is a small problem that I didn't mention: my compare function
>> >>> > > actually takes as inputs 2 columns from the table. Like so:
>> >>> > >
>> >>> > > D = np.empty((N_irises, N_irises))
>> >>> > > for ii in xrange(N_elements):
>> >>> > >     for jj in xrange(ii+1, N_elements):
>> >>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>> >>> > >                             data['element2'][ii], data['element2'][jj])
>> >>> > >
>> >>> > > Is there an efficient way of using itertools with this structure?
>> >>> > >
>> >>> >
>> >>> > You can always make two other iterators for each column. Since you have
>> >>> > two columns you would have 4 iterators. I am not sure how fast this is
>> >>> > going to be but I am confident that there is definitely a way to do this in
>> >>> > one for-loop, which is going to be way faster than nested loops.
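One way the single-loop idea could look, as a minimal sketch rather than
anyone's actual code. It is written for Python 3 (built-in zip instead of
itertools.izip) and uses plain lists as stand-ins for the two table columns.
The names pairwise_compare and toy_compare are hypothetical and exist only for
this illustration.

```python
from itertools import combinations


def pairwise_compare(col_a, col_b, compare):
    """Yield (i, j, result) for every unordered pair of rows with i < j.

    col_a and col_b are two equally long sequences standing in for two
    table columns; compare is any function of (a1, a2, b1, b2).
    """
    # Bundle each row's two column values together with its row index.
    rows = zip(col_a, col_b, range(len(col_a)))
    # A single loop over all pairs replaces the nested ii/jj loops.
    for (a1, b1, i), (a2, b2, j) in combinations(rows, 2):
        yield i, j, compare(a1, a2, b1, b2)


def toy_compare(a1, a2, b1, b2):
    """Hypothetical stand-in for the real comparison function."""
    return (a1 + a2) * (b1 + b2)


col_a = [1, 2, 3]
col_b = [10, 20, 30]
results = list(pairwise_compare(col_a, col_b, toy_compare))
for i, j, value in results:
    print(i, j, value)
```

One caveat worth keeping in mind: itertools.combinations copies its whole
input into a tuple before it yields the first pair, so every row of both
columns is held in memory at once. For columns too large for RAM, a chunked
scheme would be needed instead.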
>> >>> >
>> >>> > Be Well
>> >>> > Anthony
>> >>> >
>> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>> >>> > >
>> >>> > >> Today's Topics:
>> >>> > >>
>> >>> > >>    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
>> >>> > >>
>> >>> > >> ----------------------------------------------------------------------
>> >>> > >>
>> >>> > >> Message: 1
>> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
>> >>> > >> From: Josh Ayers <jos...@gm...>
>> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> >>> > >> To: Discussion list for PyTables <pyt...@li...>
>> >>> > >> Message-ID: <CAC...@ma...>
>> >>> > >> Content-Type: text/plain; charset="iso-8859-1"
>> >>> > >>
>> >>> > >> David,
>> >>> > >>
>> >>> > >> The change in issue 27 was only for iteration over a tables.Column
>> >>> > >> instance. To use it, tweak Anthony's code as follows. This will iterate
>> >>> > >> over the "element" column, as in your original example.
>> >>> > >>
>> >>> > >> Note also that this will only work with the development version of PyTables
>> >>> > >> available on github. It will be very slow using the released v2.4.0.
>> >>> > >>
>> >>> > >> from itertools import izip
>> >>> > >>
>> >>> > >> with tb.openFile(...) as f:
>> >>> > >>     data = f.root.data.cols.element
>> >>> > >>     data_i = iter(data)
>> >>> > >>     data_j = iter(data)
>> >>> > >>     data_i.next() # throw the first value away
>> >>> > >>     for i, j in izip(data_i, data_j):
>> >>> > >>         compare(i, j)
>> >>> > >>
>> >>> > >> Hope that helps,
>> >>> > >> Josh
>> >>> > >>
>> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
>> >>> > >>
>> >>> > >> > Hi David,
>> >>> > >> >
>> >>> > >> > Tables and table column iteration have been overhauled fairly recently
>> >>> > >> > [1]. So you might try creating two iterators, offset by one, and then
>> >>> > >> > doing the comparison. I am hacking this out super quick so please forgive
>> >>> > >> > me:
>> >>> > >> >
>> >>> > >> > from itertools import izip
>> >>> > >> >
>> >>> > >> > with tb.openFile(...) as f:
>> >>> > >> >     data = f.root.data
>> >>> > >> >     data_i = iter(data)
>> >>> > >> >     data_j = iter(data)
>> >>> > >> >     data_i.next() # throw the first value away
>> >>> > >> >     for i, j in izip(data_i, data_j):
>> >>> > >> >         compare(i, j)
>> >>> > >> >
>> >>> > >> > You get the idea ;)
>> >>> > >> >
>> >>> > >> > Be Well
>> >>> > >> > Anthony
>> >>> > >> >
>> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27
>> >>> > >> >
>> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>> >>> > >> >
>> >>> > >> >> I was hoping someone could help me out here.
>> >>> > >> >>
>> >>> > >> >> This is from a post I put up on StackOverflow,
>> >>> > >> >>
>> >>> > >> >> I have a fairly large dataset that I store in HDF5 and access using
>> >>> > >> >> PyTables. One operation I need to do on this dataset is pairwise
>> >>> > >> >> comparisons between each of the elements. This requires 2 loops, one to
>> >>> > >> >> iterate over each element, and an inner loop to iterate over every other
>> >>> > >> >> element. This operation thus looks at N(N-1)/2 comparisons.
>> >>> > >> >>
>> >>> > >> >> For fairly small sets I found it to be faster to dump the contents into a
>> >>> > >> >> multidimensional numpy array and then do my iteration. I run into problems
>> >>> > >> >> with large sets because of memory issues and need to access each element of
>> >>> > >> >> the dataset at run time.
>> >>> > >> >>
>> >>> > >> >> Putting the elements into an array gives me about 600 comparisons per
>> >>> > >> >> second, while operating on hdf5 data itself gives me about 300 comparisons
>> >>> > >> >> per second.
>> >>> > >> >>
>> >>> > >> >> Is there a way to speed this process up?
>> >>> > >> >>
>> >>> > >> >> Example follows (this is not my real code, just an example):
>> >>> > >> >>
>> >>> > >> >> *Small Set*:
>> >>> > >> >>
>> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
>> >>> > >> >>     data = f.root.data
>> >>> > >> >>
>> >>> > >> >>     N_elements = len(data)
>> >>> > >> >>     elements = np.empty((N_irises, 1e5))
>> >>> > >> >>
>> >>> > >> >>     for ii, d in enumerate(data):
>> >>> > >> >>         elements[ii] = data['element']
>> >>> > >> >>
>> >>> > >> >>     D = np.empty((N_irises, N_irises))
>> >>> > >> >>     for ii in xrange(N_elements):
>> >>> > >> >>         for jj in xrange(ii+1, N_elements):
>> >>> > >> >>             D[ii, jj] = compare(elements[ii], elements[jj])
>> >>> > >> >>
>> >>> > >> >> *Large Set*:
>> >>> > >> >>
>> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
>> >>> > >> >>     data = f.root.data
>> >>> > >> >>
>> >>> > >> >>     N_elements = len(data)
>> >>> > >> >>
>> >>> > >> >>     D = np.empty((N_irises, N_irises))
>> >>> > >> >>     for ii in xrange(N_elements):
>> >>> > >> >>         for jj in xrange(ii+1, N_elements):
>> >>> > >> >>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
>> >>> > >> >>
>> >>> >
>> >>> > ------------------------------
>> >>> >
>> >>> > Message: 2
>> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600
>> >>> > From: Anthony Scopatz <sc...@gm...>
>> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
>> >>> > To: Discussion list for PyTables <pyt...@li...>
>> >>> > Message-ID: <CAP...@ma...>
>> >>> > Content-Type: text/plain; charset="iso-8859-1"
>> >>> >
>> >>> > Josh is right that you can just edit the code by hand (which works but
>> >>> > sucks).
>> >>> >
>> >>> > However, on Windows -- on the rare occasion when I also have to develop on
>> >>> > it -- I typically use a distribution that includes a compiler, cython,
>> >>> > hdf5, and pytables already and then I install my development version from
>> >>> > github OVER this. I recommend either EPD or Anaconda, though other
>> >>> > distributions listed here [1] might also work.
>> >>> >
>> >>> > Be well
>> >>> > Anthony
>> >>> >
>> >>> > 1. http://numfocus.org/projects-2/software-distributions/
>> >>> >
>> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
>> >>> >
>> >>> > > The change was in pure Python code, so you should be able to just paste in
>> >>> > > the changes to your local copy. Start with the table.Column.__iter__
>> >>> > > method (lines 3296-3310) here.
>> >>> > >
>> >>> > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
>> >>> > >
>> >>> > > It needs to be modified slightly because it uses some additional features
>> >>> > > that aren't available in the released version (the out=buf_slice argument
>> >>> > > to table.read). The following should work.
>> >>> > >
>> >>> > > def __iter__(self):
>> >>> > >     table = self.table
>> >>> > >     itemsize = self.dtype.itemsize
>> >>> > >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>> >>> > >     max_row = len(self)
>> >>> > >     for start_row in xrange(0, len(self), nrowsinbuf):
>> >>> > >         end_row = min([start_row + nrowsinbuf, max_row])
>> >>> > >         buf = table.read(start_row, end_row, 1, field=self.pathname)
>> >>> > >         for row in buf:
>> >>> > >             yield row
>> >>> > >
>> >>> > > I haven't tested this, but I think it will work.
>> >>> > >
>> >>> > > Josh
>> >>> > >
>> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
>> >>> > >
>> >>> > >> I apologize if I'm starting to sound helpless, but I'm forced to work on
>> >>> > >> Windows 7 at work and have never had luck compiling python source
>> >>> > >> successfully. I have had to rely on precompiled binaries and now it's
>> >>> > >> biting me in the butt.
>> >>> > >>
>> >>> > >> Is there any quick fix I can do to improve this iteration using v2.4.0?
>> >>> > >>
>> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
>> >>> > >>
>> >>> > >>> Today's Topics:
>> >>> > >>>
>> >>> > >>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
>> >>> > >>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
>> >>> > >>>
>> >>> > >>> ----------------------------------------------------------------------
>> >>> > >>>
>> >>> > >>> Message: 1
>> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
>> >>> > >>> From: David Reed <dav...@gm...>
>> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>> >>> > >>> To: pyt...@li...
>> >>> > >>> Message-ID:
>> >>> > >>> <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=
>> >>> > >>> ev...@ma...>
>> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
>> >>> > >>>
>> >>> > >>> Thanks Anthony, but unless I'm missing something I don't think that method
>> >>> > >>> will work, since this will only be comparing the ith element with the
>> >>> > >>> (i+1)th element. I still need 2 for loops, right?
>> >>> > >>>
>> >>> > >>> Using itertools might speed things up though. I've never used them, so I
>> >>> > >>> will give it a shot and let you know how it goes. Looks like I need to
>> >>> > >>> download the latest release before I do that too. Thanks for the help.
>> >>> > >>>
>> >>> > >>> -Dave
>> >>> > >>>
>> >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, <pyt...@li...> wrote:
>> >>> > >>>
>> >>> > >>> > >> >>> > >>> > >> >>> > >>> > Today's Topics: >> >>> > >>> > >> >>> > >>> > 1. Re: Nested Iteration of HDF5 using PyTables (Anthony >> >>> Scopatz) >> >>> > >>> > >> >>> > >>> > >> >>> > >>> > >> >>> > >> ---------------------------------------------------------------------- >> >>> > >>> > >> >>> > >>> > Message: 1 >> >>> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 >> >>> > >>> > From: Anthony Scopatz <sc...@gm...> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using >> >>> PyTables >> >>> > >>> > To: Discussion list for PyTables >> >>> > >>> > <pyt...@li...> >> >>> > >>> > Message-ID: >> >>> > >>> > <CAPk-6T5b= >> >>> > >>> > 1EG...@ma...> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1" >> >>> > >>> > >> >>> > >>> > HI David, >> >>> > >>> > >> >>> > >>> > Tables and table column iteration have been overhauled fairly >> >>> > recently >> >>> > >>> [1]. >> >>> > >>> > So you might try creating two iterators, offset by one, and >> then >> >>> > >>> doing the >> >>> > >>> > comparison. I am hacking this out super quick so please >> forgive >> >>> me: >> >>> > >>> > >> >>> > >>> > from itertools import izip >> >>> > >>> > >> >>> > >>> > with tb.openFile(...) as f: >> >>> > >>> > data = f.root.data >> >>> > >>> > data_i = iter(data) >> >>> > >>> > data_j = iter(data) >> >>> > >>> > data_i.next() # throw the first value away >> >>> > >>> > for i, j in izip(data_i, data_j): >> >>> > >>> > compare(i, j) >> >>> > >>> > >> >>> > >>> > You get the idea ;) >> >>> > >>> > >> >>> > >>> > Be Well >> >>> > >>> > Anthony >> >>> > >>> > >> >>> > >>> > 1. https://github.com/PyTables/PyTables/issues/27 >> >>> > >>> > >> >>> > >>> > >> >>> > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < >> >>> dav...@gm...> >> >>> > >>> wrote: >> >>> > >>> > >> >>> > >>> > > I was hoping someone could help me out here. 
>> >>> > >>> > > >> >>> > >>> > > This is from a post I put up on StackOverflow, >> >>> > >>> > > >> >>> > >>> > > I am have a fairly large dataset that I store in HDF5 and >> >>> access >> >>> > >>> using >> >>> > >>> > > PyTables. One operation I need to do on this dataset are >> >>> pairwise >> >>> > >>> > > comparisons between each of the elements. This requires 2 >> >>> loops, >> >>> > one >> >>> > >>> to >> >>> > >>> > > iterate over each element, and an inner loop to iterate over >> >>> every >> >>> > >>> other >> >>> > >>> > > element. This operation thus looks at N(N-1)/2 comparisons. >> >>> > >>> > > >> >>> > >>> > > For fairly small sets I found it to be faster to dump the >> >>> contents >> >>> > >>> into a >> >>> > >>> > > multdimensional numpy array and then do my iteration. I run >> >>> into >> >>> > >>> problems >> >>> > >>> > > with large sets because of memory issues and need to access >> >>> each >> >>> > >>> element >> >>> > >>> > of >> >>> > >>> > > the dataset at run time. >> >>> > >>> > > >> >>> > >>> > > Putting the elements into an array gives me about 600 >> >>> comparisons >> >>> > per >> >>> > >>> > > second, while operating on hdf5 data itself gives me about >> 300 >> >>> > >>> > comparisons >> >>> > >>> > > per second. >> >>> > >>> > > >> >>> > >>> > > Is there a way to speed this process up? 
>> >>> > >>> > > >> >>> > >>> > > Example follows (this is not my real code, just an example): >> >>> > >>> > > >> >>> > >>> > > *Small Set*: >> >>> > >>> > > >> >>> > >>> > > >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f: >> >>> > >>> > > data = f.root.data >> >>> > >>> > > >> >>> > >>> > > N_elements = len(data) >> >>> > >>> > > elements = np.empty((N_irises, 1e5)) >> >>> > >>> > > >> >>> > >>> > > for ii, d in enumerate(data): >> >>> > >>> > > elements[ii] = data['element'] >> >>> > >>> > > >> >>> > >>> > > D = np.empty((N_irises, N_irises)) for ii in >> >>> xrange(N_elements): >> >>> > >>> > > for jj in xrange(ii+1, N_elements): >> >>> > >>> > > D[ii, jj] = compare(elements[ii], elements[jj]) >> >>> > >>> > > >> >>> > >>> > > *Large Set*: >> >>> > >>> > > >> >>> > >>> > > >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f: >> >>> > >>> > > data = f.root.data >> >>> > >>> > > >> >>> > >>> > > N_elements = len(data) >> >>> > >>> > > >> >>> > >>> > > D = np.empty((N_irises, N_irises)) >> >>> > >>> > > for ii in xrange(N_elements): >> >>> > >>> > > for jj in xrange(ii+1, N_elements): >> >>> > >>> > > D[ii, jj] = compare(data['element'][ii], >> >>> > >>> > data['element'][jj]) >> >>> > >>> > > >> >>> > >>> > > >> >>> > >>> > > >> >>> > >>> > > >> >>> > >>> > >> >>> > >>> >> >>> > >> >>> >> ------------------------------------------------------------------------------ >> >>> > >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, >> >>> HTML5, >> >>> > CSS, >> >>> > >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your >> skills >> >>> > >>> current >> >>> > >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by >> >>> Microsoft >> >>> > >>> > > MVPs and experts. ON SALE this month only -- learn more at: >> >>> > >>> > > http://p.sf.net/sfu/learnmore_122712 >> >>> > >>> > > _______________________________________________ >> >>> > >>> > > Pytables-users mailing list >> >>> > >>> > > Pyt...@li... 
|
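The buffer-size arithmetic discussed in this thread can be checked with a short sketch. It uses only figures quoted in the messages (4620 rows, one 17x9600 boolean entry per row) and assumes IO_BUFFER_SIZE is 1 MiB, as the "default 1MB" remark suggests. The helper names row_nbytes and chunk_bounds are made up for illustration and are not part of PyTables.

```python
# Sketch only: figures are taken from the thread; the buffer size is assumed.
ROW_SHAPE = (17, 9600)        # one boolean matrix per table row
BYTES_PER_BOOL = 1            # numpy stores each bool in one byte
N_ROWS = 4620
IO_BUFFER_SIZE = 1024 * 1024  # assumed default of 1 MiB


def row_nbytes(shape, bytes_per_item):
    """Size in bytes of one column entry with the given shape."""
    n = bytes_per_item
    for dim in shape:
        n *= dim
    return n


def chunk_bounds(n_rows, rows_per_chunk):
    """Yield (start, stop) row ranges, mirroring the loop in __iter__."""
    for start in range(0, n_rows, rows_per_chunk):
        yield start, min(start + rows_per_chunk, n_rows)


itemsize = row_nbytes(ROW_SHAPE, BYTES_PER_BOOL)  # 163200 bytes per row
nrowsinbuf = IO_BUFFER_SIZE // itemsize           # 6 rows per buffered read
whole_column = N_ROWS * itemsize                  # 753984000 bytes if read at once
n_pairs = N_ROWS * (N_ROWS - 1) // 2              # 10669890 pairwise comparisons
```

Under these assumptions a correctly chunked read holds about 6 rows (under 1 MB) at a time, while a single read of the whole column would need roughly 754 MB for one array, which is consistent with a MemoryError raised inside numpy.empty.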
From: David R. <dav...@gm...> - 2013-02-01 18:44:22
|
Hi Anthony,

Thanks for the reply. I honestly don't know how to monitor my Python
memory usage, but I'm sure that it's caused by running out of memory. I'm
just trying to find out how to fix it.

My HDF5 table has 4620 rows, and each entry in the column I'm iterating
over is a 17x9600 boolean matrix. The __iter__ method is preallocating an
array of that full size, which appears to be the root of the error. I was
hoping there is a fix somewhere in here that avoids this preallocation.

Thanks again.

On Fri, Feb 1, 2013 at 11:12 AM, <pyt...@li...> wrote:

> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony Scopatz)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 1 Feb 2013 10:11:47 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9
>
> Hi David,
>
> Sorry, I haven't had a ton of time recently. You seem to be getting a
> memory error on creating a numpy array. This kind of thing typically
> happens when you are out of memory. Does this seem to be the case with
> you? When this dies, is your memory usage at 100%? If so, this algorithm
> might require a little tweaking...
>
> Be Well
> Anthony
>
> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:
>
> > I'm still having problems with this one.
> > I can't tell if this is something dumb I'm doing with itertools, or
> > if it's something in pytables.
> >
> > Would appreciate any help.
> >
> > Thanks
> >
> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:
> >
> >> I think I have to reopen this issue. I have been running fine for a
> >> while using the combinations method from itertools, but have recently
> >> run into a memory error since I quadrupled the size of the HDF file.
> >>
> >> Here is my code again:
> >>
> >> from itertools import combinations, izip
> >> with tb.openFile(h5_all, 'r') as f:
> >>     irises = f.root.irises
> >>
> >>     templates = f.root.irises.cols.templates
> >>     masks = f.root.irises.cols.masks1
> >>
> >>     N_irises = len(irises)
> >>     index = np.ones((20 * 480), np.bool)
> >>
> >>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >>     D = np.empty((N_irises, N_irises))
> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(
> >>             izip(templates, masks, range(N_irises)), 2):
> >>         # print ii
> >>         D[ii, jj] = ham_dist(
> >>             t1[8, index],
> >>             t2[:, index],
> >>             m1[8, index],
> >>             m2[:, index],
> >>         )
> >>
> >> And here is the error:
> >>
> >> In [10]: get_hd3()
> >> 10669890 Comparisons
> >> ---------------------------------------------------------------------------
> >> MemoryError                               Traceback (most recent call last)
> >> <ipython-input-10-cfb255ce7bd1> in <module>()
> >> ----> 1 get_hd3()
> >>
> >>     118     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >>     119     D = np.empty((N_irises, N_irises))
> >> --> 120     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> >>     121         # print ii
> >>     122         D[ii, jj] = ham_dist(
> >>
> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
> >>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
> >>    3275             end_row = min([start_row + nrowsinbuf, max_row])
> >> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
> >>    3277             for row in buf:
> >>    3278                 yield row
> >>
> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
> >>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
> >>    1773
> >> -> 1774         arr = self._read(start, stop, step, field)
> >>    1775         return internal_to_flavor(arr, self.flavor)
> >>    1776
> >>
> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
> >>    1719         if field:
> >>    1720             # Create a container for the results
> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >>    1722         else:
> >>    1723             # Recarray case
> >>
> >> MemoryError:
> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
> >>    1720             # Create a container for the results
> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >>    1722         else:
> >>
> >> Also, if you guys see any performance problems in my code, please let
> >> me know.
> >>
> >> Thank you so much for the help.
> >>
> >> -Dave
> >>
> >> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
> >>
> >>> Today's Topics:
> >>>
> >>>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
> >>>
> >>> ----------------------------------------------------------------------
> >>>
> >>> Message: 1
> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500
> >>> From: David Reed <dav...@gm...>
> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
> >>> To: pyt...@li...
> >>> Message-ID: > >>> < > >>> CAM...@ma...> > >>> Content-Type: text/plain; charset="iso-8859-1" > >>> > >>> I can't thank you guys enough for the help. I was able to add the > >>> __iter__ > >>> function to the table.py file and everything seems to be working great! > >>> I'm not quite as fast as I was with iterating right of a matrix but > >>> pretty > >>> close. I was at 555 comparisons per second, and now im at 420. > >>> > >>> I handled the problem I mentioned earlier by doing this, and it seems > to > >>> work great: > >>> > >>> A = f.root.data.cols.A > >>> B = f.root.data.cols.B > >>> > >>> D = np.empty((len(A), len(A)) > >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, > range(len(A))), > >>> 2): > >>> D[ii, jj] = compare(a1, a2, b1, b2) > >>> > >>> Again, thanks a lot. > >>> > >>> -Dave > >>> > >>> > >>> On Thu, Jan 3, 2013 at 6:31 PM, < > >>> pyt...@li...> wrote: > >>> > >>> > Send Pytables-users mailing list submissions to > >>> > pyt...@li... > >>> > > >>> > To subscribe or unsubscribe via the World Wide Web, visit > >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users > >>> > or, via email, send a message with subject or body 'help' to > >>> > pyt...@li... > >>> > > >>> > You can reach the person managing the list at > >>> > pyt...@li... > >>> > > >>> > When replying, please edit your Subject line so it is more specific > >>> > than "Re: Contents of Pytables-users digest..." > >>> > > >>> > > >>> > Today's Topics: > >>> > > >>> > 1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz) > >>> > 2. 
Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz) > >>> > > >>> > > >>> > > ---------------------------------------------------------------------- > >>> > > >>> > Message: 1 > >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600 > >>> > From: Anthony Scopatz <sc...@gm...> > >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3 > >>> > To: Discussion list for PyTables > >>> > <pyt...@li...> > >>> > Message-ID: > >>> > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ= > >>> > Gz...@ma...> > >>> > Content-Type: text/plain; charset="iso-8859-1" > >>> > > >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> > >>> wrote: > >>> > > >>> > > Thanks a lot for the help so far guys! > >>> > > > >>> > > Looking at itertools, I found what I believe to be the perfect > >>> function > >>> > > for what I need, itertools.combinations. This appears to be a valid > >>> > > replacement to the method proposed. > >>> > > > >>> > > >>> > Yes, combinations is awesome! > >>> > > >>> > > >>> > > > >>> > > There is a small problem that I didn't mention is that my compare > >>> > function > >>> > > actually takes as inputs 2 columns from the table. Like so: > >>> > > > >>> > > D = np.empty((N_irises, N_irises)) > >>> > > for ii in xrange(N_elements): > >>> > > for jj in xrange(ii+1, N_elements): > >>> > > D[ii, jj] = compare(data['element1'][ii], > >>> > data['element1'][jj],data['element2'][ii], > >>> > > data['element2'][jj]) > >>> > > > >>> > > Is there an efficient way of using itertools with this structure? > >>> > > > >>> > > >>> > You can always make two other iterators for each column. Since you > >>> have > >>> > two columns you would have 4 iterators. I am not sure how fast this > is > >>> > going to be but I am confident that there is definitely a way to do > >>> this in > >>> > one for-loop, which is going to be way faster than nested loops. 
> >>> > > >>> > Be Well > >>> > Anthony > >>> > > >>> > > >>> > > > >>> > > > >>> > > On Thu, Jan 3, 2013 at 1:29 PM, < > >>> > > pyt...@li...> wrote: > >>> > > > >>> > >> Send Pytables-users mailing list submissions to > >>> > >> pyt...@li... > >>> > >> > >>> > >> To subscribe or unsubscribe via the World Wide Web, visit > >>> > >> > https://lists.sourceforge.net/lists/listinfo/pytables-users > >>> > >> or, via email, send a message with subject or body 'help' to > >>> > >> pyt...@li... > >>> > >> > >>> > >> You can reach the person managing the list at > >>> > >> pyt...@li... > >>> > >> > >>> > >> When replying, please edit your Subject line so it is more > specific > >>> > >> than "Re: Contents of Pytables-users digest..." > >>> > >> > >>> > >> > >>> > >> Today's Topics: > >>> > >> > >>> > >> 1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers) > >>> > >> > >>> > >> > >>> > >> > >>> ---------------------------------------------------------------------- > >>> > >> > >>> > >> Message: 1 > >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800 > >>> > >> From: Josh Ayers <jos...@gm...> > >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using > >>> PyTables > >>> > >> To: Discussion list for PyTables > >>> > >> <pyt...@li...> > >>> > >> Message-ID: > >>> > >> < > >>> > >> > CAC...@ma...> > >>> > >> Content-Type: text/plain; charset="iso-8859-1" > >>> > >> > >>> > >> David, > >>> > >> > >>> > >> The change in issue 27 was only for iteration over a tables.Column > >>> > >> instance. To use it, tweak Anthony's code as follows. This will > >>> > iterate > >>> > >> over the "element" column, as in your original example. > >>> > >> > >>> > >> Note also that this will only work with the development version of > >>> > >> PyTables > >>> > >> available on github. It will be very slow using the released > >>> v2.4.0. > >>> > >> > >>> > >> > >>> > >> from itertools import izip > >>> > >> > >>> > >> with tb.openFile(...) 
as f: > >>> > >> data = f.root.data.cols.element > >>> > >> data_i = iter(data) > >>> > >> data_j = iter(data) > >>> > >> data_i.next() # throw the first value away > >>> > >> for i, j in izip(data_i, data_j): > >>> > >> compare(i, j) > >>> > >> > >>> > >> > >>> > >> Hope that helps, > >>> > >> Josh > >>> > >> > >>> > >> > >>> > >> > >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz < > sc...@gm...> > >>> > >> wrote: > >>> > >> > >>> > >> > HI David, > >>> > >> > > >>> > >> > Tables and table column iteration have been overhauled fairly > >>> recently > >>> > >> > [1]. So you might try creating two iterators, offset by one, > and > >>> then > >>> > >> > doing the comparison. I am hacking this out super quick so > please > >>> > >> forgive > >>> > >> > me: > >>> > >> > > >>> > >> > from itertools import izip > >>> > >> > > >>> > >> > with tb.openFile(...) as f: > >>> > >> > data = f.root.data > >>> > >> > data_i = iter(data) > >>> > >> > data_j = iter(data) > >>> > >> > data_i.next() # throw the first value away > >>> > >> > for i, j in izip(data_i, data_j): > >>> > >> > compare(i, j) > >>> > >> > > >>> > >> > You get the idea ;) > >>> > >> > > >>> > >> > Be Well > >>> > >> > Anthony > >>> > >> > > >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27 > >>> > >> > > >>> > >> > > >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < > >>> dav...@gm...> > >>> > >> wrote: > >>> > >> > > >>> > >> >> I was hoping someone could help me out here. > >>> > >> >> > >>> > >> >> This is from a post I put up on StackOverflow, > >>> > >> >> > >>> > >> >> I am have a fairly large dataset that I store in HDF5 and > access > >>> > using > >>> > >> >> PyTables. One operation I need to do on this dataset are > pairwise > >>> > >> >> comparisons between each of the elements. This requires 2 > loops, > >>> one > >>> > to > >>> > >> >> iterate over each element, and an inner loop to iterate over > >>> every > >>> > >> other > >>> > >> >> element. 
This operation thus looks at N(N-1)/2 comparisons. > >>> > >> >> > >>> > >> >> For fairly small sets I found it to be faster to dump the > >>> contents > >>> > >> into a > >>> > >> >> multdimensional numpy array and then do my iteration. I run > into > >>> > >> problems > >>> > >> >> with large sets because of memory issues and need to access > each > >>> > >> element of > >>> > >> >> the dataset at run time. > >>> > >> >> > >>> > >> >> Putting the elements into an array gives me about 600 > >>> comparisons per > >>> > >> >> second, while operating on hdf5 data itself gives me about 300 > >>> > >> comparisons > >>> > >> >> per second. > >>> > >> >> > >>> > >> >> Is there a way to speed this process up? > >>> > >> >> > >>> > >> >> Example follows (this is not my real code, just an example): > >>> > >> >> > >>> > >> >> *Small Set*: > >>> > >> >> > >>> > >> >> > >>> > >> >> with tb.openFile(h5_file, 'r') as f: > >>> > >> >> data = f.root.data > >>> > >> >> > >>> > >> >> N_elements = len(data) > >>> > >> >> elements = np.empty((N_irises, 1e5)) > >>> > >> >> > >>> > >> >> for ii, d in enumerate(data): > >>> > >> >> elements[ii] = data['element'] > >>> > >> >> > >>> > >> >> D = np.empty((N_irises, N_irises)) for ii in > xrange(N_elements): > >>> > >> >> for jj in xrange(ii+1, N_elements): > >>> > >> >> D[ii, jj] = compare(elements[ii], elements[jj]) > >>> > >> >> > >>> > >> >> *Large Set*: > >>> > >> >> > >>> > >> >> > >>> > >> >> with tb.openFile(h5_file, 'r') as f: > >>> > >> >> data = f.root.data > >>> > >> >> > >>> > >> >> N_elements = len(data) > >>> > >> >> > >>> > >> >> D = np.empty((N_irises, N_irises)) > >>> > >> >> for ii in xrange(N_elements): > >>> > >> >> for jj in xrange(ii+1, N_elements): > >>> > >> >> D[ii, jj] = compare(data['element'][ii], > >>> > >> data['element'][jj]) > >>> > >> >> > >>> > >> >> > >>> > >> >> > >>> > >> >> > >>> > >> > >>> > > >>> > ------------------------------------------------------------------------------ > >>> > >> >> Master 
Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, > HTML5, > >>> CSS, > >>> > >> >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > >>> > current > >>> > >> >> with LearnDevNow - 3,200 step-by-step video tutorials by > >>> Microsoft > >>> > >> >> MVPs and experts. ON SALE this month only -- learn more at: > >>> > >> >> http://p.sf.net/sfu/learnmore_122712 > >>> > >> >> _______________________________________________ > >>> > >> >> Pytables-users mailing list > >>> > >> >> Pyt...@li... > >>> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >>> > >> >> > >>> > >> >> > >>> > >> > > >>> > >> > > >>> > >> > > >>> > >> > >>> > > >>> > ------------------------------------------------------------------------------ > >>> > >> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > >>> CSS, > >>> > >> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > >>> > current > >>> > >> > with LearnDevNow - 3,200 step-by-step video tutorials by > Microsoft > >>> > >> > MVPs and experts. ON SALE this month only -- learn more at: > >>> > >> > http://p.sf.net/sfu/learnmore_122712 > >>> > >> > _______________________________________________ > >>> > >> > Pytables-users mailing list > >>> > >> > Pyt...@li... > >>> > >> > https://lists.sourceforge.net/lists/listinfo/pytables-users > >>> > >> > > >>> > >> > > >>> > >> -------------- next part -------------- > >>> > >> An HTML attachment was scrubbed... > >>> > >> > >>> > >> ------------------------------ > >>> > >> > >>> > >> > >>> > >> > >>> > > >>> > ------------------------------------------------------------------------------ > >>> > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > >>> CSS, > >>> > >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > >>> current > >>> > >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > >>> > >> MVPs and experts. 
ON SALE this month only -- learn more at: > >>> > >> http://p.sf.net/sfu/learnmore_122712 > >>> > >> > >>> > >> ------------------------------ > >>> > >> > >>> > >> _______________________________________________ > >>> > >> Pytables-users mailing list > >>> > >> Pyt...@li... > >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >>> > >> > >>> > >> > >>> > >> End of Pytables-users Digest, Vol 80, Issue 3 > >>> > >> ********************************************* > >>> > >> > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > >>> > ------------------------------------------------------------------------------ > >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > CSS, > >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > >>> current > >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > >>> > > MVPs and experts. ON SALE this month only -- learn more at: > >>> > > http://p.sf.net/sfu/learnmore_122712 > >>> > > _______________________________________________ > >>> > > Pytables-users mailing list > >>> > > Pyt...@li... > >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > >>> > > > >>> > > > >>> > -------------- next part -------------- > >>> > An HTML attachment was scrubbed... > >>> > > >>> > ------------------------------ > >>> > > >>> > Message: 2 > >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600 > >>> > From: Anthony Scopatz <sc...@gm...> > >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4 > >>> > To: Discussion list for PyTables > >>> > <pyt...@li...> > >>> > Message-ID: > >>> > < > >>> > CAP...@ma...> > >>> > Content-Type: text/plain; charset="iso-8859-1" > >>> > > >>> > Josh is right that you can just edit the code by hand (which works > but > >>> > sucks). 
> >>> >
> >>> > However, on Windows -- on the rare occasion when I also have to develop on
> >>> > it -- I typically use a distribution that includes a compiler, cython,
> >>> > hdf5, and pytables already and then I install my development version from
> >>> > github OVER this. I recommend either EPD or Anaconda, though other
> >>> > distributions listed here [1] might also work.
> >>> >
> >>> > Be well
> >>> > Anthony
> >>> >
> >>> > 1. http://numfocus.org/projects-2/software-distributions/
> >>> >
> >>> >
> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
> >>> >
> >>> > > The change was in pure Python code, so you should be able to just paste in
> >>> > > the changes to your local copy. Start with the table.Column.__iter__
> >>> > > method (lines 3296-3310) here.
> >>> > >
> >>> > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
> >>> > >
> >>> > > It needs to be modified slightly because it uses some additional features
> >>> > > that aren't available in the released version (the out=buf_slice argument
> >>> > > to table.read). The following should work.
> >>> > >
> >>> > >     def __iter__(self):
> >>> > >         table = self.table
> >>> > >         itemsize = self.dtype.itemsize
> >>> > >         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> >>> > >         max_row = len(self)
> >>> > >         for start_row in xrange(0, len(self), nrowsinbuf):
> >>> > >             end_row = min([start_row + nrowsinbuf, max_row])
> >>> > >             buf = table.read(start_row, end_row, 1, field=self.pathname)
> >>> > >             for row in buf:
> >>> > >                 yield row
> >>> > >
> >>> > >
> >>> > > I haven't tested this, but I think it will work.
> >>> > >
> >>> > > Josh
> >>> > >
> >>> > >
> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
> >>> > >
> >>> > >> I apologize if I'm starting to sound helpless, but I'm forced to work on
> >>> > >> Windows 7 at work and have never had luck compiling python source
> >>> > >> successfully. I have had to rely on precompiled binaries and now its
> >>> > >> biting me in the butt.
> >>> > >>
> >>> > >> Is there any quick fix I can do to improve this iteration using v2.4.0?
> >>> > >>
> >>> > >>
> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
> >>> > >>
> >>> > >>> Send Pytables-users mailing list submissions to
> >>> > >>>         pyt...@li...
> >>> > >>>
> >>> > >>> To subscribe or unsubscribe via the World Wide Web, visit
> >>> > >>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
> >>> > >>> or, via email, send a message with subject or body 'help' to
> >>> > >>>         pyt...@li...
> >>> > >>>
> >>> > >>> You can reach the person managing the list at
> >>> > >>>         pyt...@li...
> >>> > >>>
> >>> > >>> When replying, please edit your Subject line so it is more specific
> >>> > >>> than "Re: Contents of Pytables-users digest..."
> >>> > >>>
> >>> > >>>
> >>> > >>> Today's Topics:
> >>> > >>>
> >>> > >>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
> >>> > >>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
> >>> > >>>
> >>> > >>>
> >>> > >>> ----------------------------------------------------------------------
> >>> > >>>
> >>> > >>> Message: 1
> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
> >>> > >>> From: David Reed <dav...@gm...>
> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
> >>> > >>> To: pyt...@li...
> >>> > >>> Message-ID:
> >>> > >>>         <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=
> >>> > >>> ev...@ma...>
> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
> >>> > >>>
> >>> > >>> Thanks Anthony, but unless Im missing something I don't think that method
> >>> > >>> will work since this will only be comparing the ith element with ith+1
> >>> > >>> element. I still need 2 for loops right?
> >>> > >>>
> >>> > >>> Using itertools might speed things up though, I've never used them so I
> >>> > >>> will give it a shot and let you know how it goes. Looks like I need to
> >>> > >>> download the latest release before I do that too. Thanks for the help.
> >>> > >>>
> >>> > >>> -Dave
> >>> > >>>
> >>> > >>>
> >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, <pyt...@li...> wrote:
> >>> > >>>
> >>> > >>> > Send Pytables-users mailing list submissions to
> >>> > >>> >         pyt...@li...
> >>> > >>> >
> >>> > >>> > To subscribe or unsubscribe via the World Wide Web, visit
> >>> > >>> >         https://lists.sourceforge.net/lists/listinfo/pytables-users
> >>> > >>> > or, via email, send a message with subject or body 'help' to
> >>> > >>> >         pyt...@li...
> >>> > >>> >
> >>> > >>> > You can reach the person managing the list at
> >>> > >>> >         pyt...@li...
> >>> > >>> >
> >>> > >>> > When replying, please edit your Subject line so it is more specific
> >>> > >>> > than "Re: Contents of Pytables-users digest..."
> >>> > >>> >
> >>> > >>> >
> >>> > >>> > Today's Topics:
> >>> > >>> >
> >>> > >>> >    1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)
> >>> > >>> >
> >>> > >>> >
> >>> > >>> > ----------------------------------------------------------------------
> >>> > >>> >
> >>> > >>> > Message: 1
> >>> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600
> >>> > >>> > From: Anthony Scopatz <sc...@gm...>
> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >>> > >>> > To: Discussion list for PyTables
> >>> > >>> >         <pyt...@li...>
> >>> > >>> > Message-ID:
> >>> > >>> >         <CAPk-6T5b=
> >>> > >>> > 1EG...@ma...>
> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1"
> >>> > >>> >
> >>> > >>> > HI David,
> >>> > >>> >
> >>> > >>> > Tables and table column iteration have been overhauled fairly recently [1].
> >>> > >>> > So you might try creating two iterators, offset by one, and then doing the
> >>> > >>> > comparison. I am hacking this out super quick so please forgive me:
> >>> > >>> >
> >>> > >>> >     from itertools import izip
> >>> > >>> >
> >>> > >>> >     with tb.openFile(...) as f:
> >>> > >>> >         data = f.root.data
> >>> > >>> >         data_i = iter(data)
> >>> > >>> >         data_j = iter(data)
> >>> > >>> >         data_i.next() # throw the first value away
> >>> > >>> >         for i, j in izip(data_i, data_j):
> >>> > >>> >             compare(i, j)
> >>> > >>> >
> >>> > >>> > You get the idea ;)
> >>> > >>> >
> >>> > >>> > Be Well
> >>> > >>> > Anthony
> >>> > >>> >
> >>> > >>> > 1. https://github.com/PyTables/PyTables/issues/27
> >>> > >>> >
> >>> > >>> >
> >>> > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >>> > >>> >
> >>> > >>> > > I was hoping someone could help me out here.
> >>> > >>> > >
> >>> > >>> > > This is from a post I put up on StackOverflow,
> >>> > >>> > >
> >>> > >>> > > I am have a fairly large dataset that I store in HDF5 and access using
> >>> > >>> > > PyTables. One operation I need to do on this dataset are pairwise
> >>> > >>> > > comparisons between each of the elements. This requires 2 loops, one to
> >>> > >>> > > iterate over each element, and an inner loop to iterate over every other
> >>> > >>> > > element. This operation thus looks at N(N-1)/2 comparisons.
> >>> > >>> > >
> >>> > >>> > > For fairly small sets I found it to be faster to dump the contents into a
> >>> > >>> > > multdimensional numpy array and then do my iteration. I run into problems
> >>> > >>> > > with large sets because of memory issues and need to access each element of
> >>> > >>> > > the dataset at run time.
> >>> > >>> > >
> >>> > >>> > > Putting the elements into an array gives me about 600 comparisons per
> >>> > >>> > > second, while operating on hdf5 data itself gives me about 300 comparisons
> >>> > >>> > > per second.
> >>> > >>> > >
> >>> > >>> > > Is there a way to speed this process up?
> >>> > >>> > >
> >>> > >>> > > Example follows (this is not my real code, just an example):
> >>> > >>> > >
> >>> > >>> > > *Small Set*:
> >>> > >>> > >
> >>> > >>> > >     with tb.openFile(h5_file, 'r') as f:
> >>> > >>> > >         data = f.root.data
> >>> > >>> > >
> >>> > >>> > >         N_elements = len(data)
> >>> > >>> > >         elements = np.empty((N_irises, 1e5))
> >>> > >>> > >
> >>> > >>> > >         for ii, d in enumerate(data):
> >>> > >>> > >             elements[ii] = data['element']
> >>> > >>> > >
> >>> > >>> > >     D = np.empty((N_irises, N_irises))
> >>> > >>> > >     for ii in xrange(N_elements):
> >>> > >>> > >         for jj in xrange(ii+1, N_elements):
> >>> > >>> > >             D[ii, jj] = compare(elements[ii], elements[jj])
> >>> > >>> > >
> >>> > >>> > > *Large Set*:
> >>> > >>> > >
> >>> > >>> > >     with tb.openFile(h5_file, 'r') as f:
> >>> > >>> > >         data = f.root.data
> >>> > >>> > >
> >>> > >>> > >         N_elements = len(data)
> >>> > >>> > >
> >>> > >>> > >         D = np.empty((N_irises, N_irises))
> >>> > >>> > >         for ii in xrange(N_elements):
> >>> > >>> > >             for jj in xrange(ii+1, N_elements):
> >>> > >>> > >                 D[ii, jj] = compare(data['element'][ii], data['element'][jj])
> >>> > >>> > >
> >>> > >>> ------------------------------
> >>> > >>>
> >>> > >>> Message: 2
> >>> > >>> Date: Thu, 3 Jan 2013 15:17:01 -0500
> >>> > >>> From: David Reed <dav...@gm...>
> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> >>> > >>> To: pyt...@li...
> >>> > >>> Message-ID:
> >>> > >>>         <CAM...@ma...>
> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
> >>> > >>>
> >>> > >>> Thanks a lot for the help so far guys!
> >>> > >>>
> >>> > >>> Looking at itertools, I found what I believe to be the perfect function for
> >>> > >>> what I need, itertools.combinations. This appears to be a valid replacement
> >>> > >>> to the method proposed.
> >>> > >>>
> >>> > >>> There is a small problem that I didn't mention is that my compare function
> >>> > >>> actually takes as inputs 2 columns from the table. Like so:
> >>> > >>>
> >>> > >>>     D = np.empty((N_irises, N_irises))
> >>> > >>>     for ii in xrange(N_elements):
> >>> > >>>         for jj in xrange(ii+1, N_elements):
> >>> > >>>             D[ii, jj] = compare(data['element1'][ii],
> >>> > >>>                                 data['element1'][jj],data['element2'][ii],
> >>> > >>>                                 data['element2'][jj])
> >>> > >>>
> >>> > >>> Is there an efficient way of using itertools with this structure?
> >>> > >>>
> >>> > >>>
> >>> > >>> On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> >>> > >>>
> >>> > >>> > Send Pytables-users mailing list submissions to
> >>> > >>> >         pyt...@li...
> >>> > >>> >
> >>> > >>> > To subscribe or unsubscribe via the World Wide Web, visit
> >>> > >>> >         https://lists.sourceforge.net/lists/listinfo/pytables-users
> >>> > >>> > or, via email, send a message with subject or body 'help' to
> >>> > >>> >         pyt...@li...
> >>> > >>> >
> >>> > >>> > You can reach the person managing the list at
> >>> > >>> >         pyt...@li...
> >>> > >>> >
> >>> > >>> > When replying, please edit your Subject line so it is more specific
> >>> > >>> > than "Re: Contents of Pytables-users digest..."
> >>> > >>> >
> >>> > >>> >
> >>> > >>> > Today's Topics:
> >>> > >>> >
> >>> > >>> >    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
> >>> > >>> >
> >>> > >>> >
> >>> > >>> > ----------------------------------------------------------------------
> >>> > >>> >
> >>> > >>> > Message: 1
> >>> > >>> > Date: Thu, 3 Jan 2013 10:29:33 -0800
> >>> > >>> > From: Josh Ayers <jos...@gm...>
> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >>> > >>> > To: Discussion list for PyTables
> >>> > >>> >         <pyt...@li...>
> >>> > >>> > Message-ID:
> >>> > >>> >         <CAC...@ma...>
> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1"
> >>> > >>> >
> >>> > >>> > David,
> >>> > >>> >
> >>> > >>> > The change in issue 27 was only for iteration over a tables.Column
> >>> > >>> > instance. To use it, tweak Anthony's code as follows. This will iterate
> >>> > >>> > over the "element" column, as in your original example.
> >>> > >>> >
> >>> > >>> > Note also that this will only work with the development version of PyTables
> >>> > >>> > available on github. It will be very slow using the released v2.4.0.
> >>> > >>> >
> >>> > >>> >
> >>> > >>> >     from itertools import izip
> >>> > >>> >
> >>> > >>> >     with tb.openFile(...) as f:
> >>> > >>> >         data = f.root.data.cols.element
> >>> > >>> >         data_i = iter(data)
> >>> > >>> >         data_j = iter(data)
> >>> > >>> >         data_i.next() # throw the first value away
> >>> > >>> >         for i, j in izip(data_i, data_j):
> >>> > >>> >             compare(i, j)
> >>> > >>> >
> >>> > >>> >
> >>> > >>> > Hope that helps,
> >>> > >>> > Josh
> >>> > >>> >
|
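The buffered column iteration quoted in the message above can be sketched without PyTables, to show how the rows-per-buffer figure drives the reads. This is only an illustration under stated assumptions: `iter_in_chunks` and `read_rows` are hypothetical names, `read_rows(start, stop)` stands in for a call such as `table.read(start, stop, 1, field=...)`, and the 163200-byte row size and 1 MB buffer are the figures mentioned earlier in the thread. The `max(1, ...)` guard is an addition that is not in the quoted code. It is written for Python 3, so `range` replaces `xrange`.

```python
def iter_in_chunks(read_rows, n_rows, itemsize, buffer_bytes=1024 * 1024):
    """Yield rows one at a time while holding at most one buffer of rows.

    read_rows(start, stop) is a hypothetical stand-in for a buffered read
    such as table.read(start, stop, 1, field=...).
    """
    # Rows that fit in one buffer; the max() guard (an assumption, not in
    # the quoted code) avoids a zero step when one row exceeds the buffer.
    rows_per_buffer = max(1, buffer_bytes // itemsize)
    for start in range(0, n_rows, rows_per_buffer):
        stop = min(start + rows_per_buffer, n_rows)
        for row in read_rows(start, stop):
            yield row


# Stand-in data source: a plain list instead of an HDF5 column.
column = list(range(20))
reads = []


def read_rows(start, stop):
    reads.append((start, stop))  # record each buffered read
    return column[start:stop]


rows = list(iter_in_chunks(read_rows, len(column), itemsize=163200))
```

With these numbers each read covers six rows, so only a small slice of the column is resident at any time, provided the consumer does not keep references to earlier rows.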
From: Anthony S. <sc...@gm...> - 2013-02-01 16:12:17
|
Hi David,

Sorry, I haven't had a ton of time recently. You seem to be getting a
memory error on creating a numpy array. This kind of thing typically
happens when you are out of memory. Does this seem to be the case with
you? When this dies, is your memory usage at 100%? If so, this algorithm
might require a little tweaking...

Be Well
Anthony


On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:

> I'm still having problems with this one. I can't tell if this something
> dumb Im doing with itertools, or if its something in pytables.
>
> Would appreciate any help.
>
> Thanks
>
>
> On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...>wrote:
>
>> I think I have to reopen this issue. I have been running fine for awhile
>> using the combinations method from itertools, but have recently run into a
>> memory since I have recently quadrupled the size of the hdf file.
>>
>> Here is my code again:
>>
>>     from itertools import combinations, izip
>>     with tb.openFile(h5_all, 'r') as f:
>>         irises = f.root.irises
>>
>>         templates = f.root.irises.cols.templates
>>         masks = f.root.irises.cols.masks1
>>
>>         N_irises = len(irises)
>>         index = np.ones((20 * 480), np.bool)
>>
>>         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>>         D = np.empty((N_irises, N_irises))
>>         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks,
>>                 range(N_irises)), 2):
>>             # print ii
>>             D[ii, jj] = ham_dist(
>>                 t1[8, index],
>>                 t2[:, index],
>>                 m1[8, index],
>>                 m2[:, index],
>>                 )
>>
>> And here is the error:
>>
>> In [10]: get_hd3()
>> 10669890 Comparisons
>>
>> ---------------------------------------------------------------------------
>> MemoryError                               Traceback (most recent call last)
>> <ipython-input-10-cfb255ce7bd1> in <module>()
>> ----> 1 get_hd3()
>>
>>
>>     118         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>>     119         D = np.empty((N_irises, N_irises))
>> --> 120         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>>     121             # print ii
>>     122             D[ii, jj] = ham_dist(
>>
>> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
>>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
>>    3275             end_row = min([start_row + nrowsinbuf, max_row])
>> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
>>    3277             for row in buf:
>>    3278                 yield row
>>
>> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
>>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
>>    1773
>> -> 1774         arr = self._read(start, stop, step, field)
>>    1775         return internal_to_flavor(arr, self.flavor)
>>    1776
>>
>> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
>>    1719         if field:
>>    1720             # Create a container for the results
>> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>>    1722         else:
>>    1723             # Recarray case
>>
>> MemoryError:
>> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
>>    1720             # Create a container for the results
>> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>>    1722         else:
>>
>> Also, if you guys see any performance problems in my code, please let me
>> know.
>>
>> Thank you so much for the help.
>>
>> -Dave
>>
>>
>> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
>>
>>> Send Pytables-users mailing list submissions to
>>>         pyt...@li...
>>>
>>> To subscribe or unsubscribe via the World Wide Web, visit
>>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>>> or, via email, send a message with subject or body 'help' to
>>>         pyt...@li...
>>>
>>> You can reach the person managing the list at
>>>         pyt...@li...
>>>
>>> When replying, please edit your Subject line so it is more specific
>>> than "Re: Contents of Pytables-users digest..."
>>>
>>>
>>> Today's Topics:
>>>
>>>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Fri, 4 Jan 2013 08:56:28 -0500
>>> From: David Reed <dav...@gm...>
>>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
>>> To: pyt...@li...
>>> Message-ID:
>>>         <CAM...@ma...>
>>> Content-Type: text/plain; charset="iso-8859-1"
>>>
>>> I can't thank you guys enough for the help. I was able to add the __iter__
>>> function to the table.py file and everything seems to be working great!
>>> I'm not quite as fast as I was with iterating right of a matrix but pretty
>>> close. I was at 555 comparisons per second, and now im at 420.
>>>
>>> I handled the problem I mentioned earlier by doing this, and it seems to
>>> work great:
>>>
>>>     A = f.root.data.cols.A
>>>     B = f.root.data.cols.B
>>>
>>>     D = np.empty((len(A), len(A))
>>>     for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
>>>         D[ii, jj] = compare(a1, a2, b1, b2)
>>>
>>> Again, thanks a lot.
>>>
>>> -Dave
>>>
>>>
>>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
>>>
>>> > Send Pytables-users mailing list submissions to
>>> >         pyt...@li...
>>> >
>>> > To subscribe or unsubscribe via the World Wide Web, visit
>>> >         https://lists.sourceforge.net/lists/listinfo/pytables-users
>>> > or, via email, send a message with subject or body 'help' to
>>> >         pyt...@li...
>>> >
>>> > You can reach the person managing the list at
>>> >         pyt...@li...
>>> >
>>> > When replying, please edit your Subject line so it is more specific
>>> > than "Re: Contents of Pytables-users digest..."
>>> >
>>> >
>>> > Today's Topics:
>>> >
>>> >    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
>>> >    2.
>>> > Message: 1
>>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
>>> > From: Anthony Scopatz <sc...@gm...>
>>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>>> >
>>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
>>> >
>>> > > Thanks a lot for the help so far guys!
>>> > >
>>> > > Looking at itertools, I found what I believe to be the perfect function
>>> > > for what I need, itertools.combinations. This appears to be a valid
>>> > > replacement for the method proposed.
>>> >
>>> > Yes, combinations is awesome!
>>> >
>>> > > One small problem I didn't mention is that my compare function
>>> > > actually takes as inputs 2 columns from the table. Like so:
>>> > >
>>> > > D = np.empty((N_elements, N_elements))
>>> > > for ii in xrange(N_elements):
>>> > >     for jj in xrange(ii+1, N_elements):
>>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>>> > >                             data['element2'][ii], data['element2'][jj])
>>> > >
>>> > > Is there an efficient way of using itertools with this structure?
>>> >
>>> > You can always make two other iterators for each column. Since you have
>>> > two columns you would have 4 iterators. I am not sure how fast this is
>>> > going to be but I am confident that there is definitely a way to do this in
>>> > one for-loop, which is going to be way faster than nested loops.
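A minimal sketch of the one-loop approach described above, assuming two small in-memory NumPy arrays stand in for the table columns. The names `compare`, `pairwise_distances`, `col_a`, and `col_b` are hypothetical and not part of PyTables. It is written for Python 3, where the built-in `zip` plays the role of `izip`.

```python
from itertools import combinations

import numpy as np


def compare(a1, a2, b1, b2):
    """Hypothetical stand-in for the real comparison function."""
    return float(np.abs(a1 - a2).sum() + np.abs(b1 - b2).sum())


def pairwise_distances(col_a, col_b):
    """Fill the upper triangle of the result with one loop over all row pairs."""
    n = len(col_a)
    dist = np.zeros((n, n))
    # Bundle the row index with the matching entry of each column.
    rows = zip(range(n), col_a, col_b)
    for (ii, a1, b1), (jj, a2, b2) in combinations(rows, 2):
        dist[ii, jj] = compare(a1, a2, b1, b2)
    return dist


# Assumed toy data: 4 rows, 3 values per row in each column.
col_a = np.arange(12).reshape(4, 3)
col_b = np.ones((4, 3))
D = pairwise_distances(col_a, col_b)
```

Note that `itertools.combinations` copies its input into a tuple before yielding pairs, so every row is held in memory at once. For a table that does not fit in RAM, this sketch would need a different pairing strategy.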
>>> >
>>> > Be Well
>>> > Anthony
>>> >
>>> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>>> > >
>>> > >> Message: 1
>>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
>>> > >> From: Josh Ayers <jos...@gm...>
>>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>>> > >>
>>> > >> David,
>>> > >>
>>> > >> The change in issue 27 was only for iteration over a tables.Column
>>> > >> instance. To use it, tweak Anthony's code as follows. This will iterate
>>> > >> over the "element" column, as in your original example.
>>> > >>
>>> > >> Note also that this will only work with the development version of
>>> > >> PyTables available on github. It will be very slow using the released v2.4.0.
>>> > >>
>>> > >> from itertools import izip
>>> > >>
>>> > >> with tb.openFile(...) as f:
>>> > >>     data = f.root.data.cols.element
>>> > >>     data_i = iter(data)
>>> > >>     data_j = iter(data)
>>> > >>     data_i.next()  # throw the first value away
>>> > >>     for i, j in izip(data_i, data_j):
>>> > >>         compare(i, j)
>>> > >>
>>> > >> Hope that helps,
>>> > >> Josh
>>> > >>
>>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
>>> > >>
>>> > >> > Hi David,
>>> > >> >
>>> > >> > Tables and table column iteration have been overhauled fairly recently
>>> > >> > [1]. So you might try creating two iterators, offset by one, and then
>>> > >> > doing the comparison. I am hacking this out super quick so please forgive me:
>>> > >> >
>>> > >> > from itertools import izip
>>> > >> >
>>> > >> > with tb.openFile(...) as f:
>>> > >> >     data = f.root.data
>>> > >> >     data_i = iter(data)
>>> > >> >     data_j = iter(data)
>>> > >> >     data_i.next()  # throw the first value away
>>> > >> >     for i, j in izip(data_i, data_j):
>>> > >> >         compare(i, j)
>>> > >> >
>>> > >> > You get the idea ;)
>>> > >> >
>>> > >> > Be Well
>>> > >> > Anthony
>>> > >> >
>>> > >> > 1. https://github.com/PyTables/PyTables/issues/27
>>> > >> >
>>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>>> > >> >
>>> > >> >> I was hoping someone could help me out here.
>>> > >> >>
>>> > >> >> This is from a post I put up on StackOverflow,
>>> > >> >>
>>> > >> >> I have a fairly large dataset that I store in HDF5 and access using
>>> > >> >> PyTables. One operation I need to do on this dataset is pairwise
>>> > >> >> comparisons between each of the elements. This requires 2 loops, one to
>>> > >> >> iterate over each element, and an inner loop to iterate over every other
>>> > >> >> element. This operation thus looks at N(N-1)/2 comparisons.
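To make the N(N-1)/2 figure concrete, here is a small sketch (Python 3, with the hypothetical helper names `pair_count` and `nested_loop_pairs`) that enumerates the index pairs visited by the two nested loops and checks them against `itertools.combinations`.

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def nested_loop_pairs(n):
    """Index pairs visited by an outer loop over ii and an inner loop over jj > ii."""
    return [(ii, jj) for ii in range(n) for jj in range(ii + 1, n)]


n = 5
pairs = nested_loop_pairs(n)
# The nested loops visit exactly the pairs that combinations() yields.
same_as_itertools = pairs == list(combinations(range(n), 2))
```

For N = 1000 rows this already amounts to 499500 comparisons, which is why the per-comparison read cost dominates the run time.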
>>> > >> >>
>>> > >> >> For fairly small sets I found it to be faster to dump the contents into a
>>> > >> >> multidimensional numpy array and then do my iteration. I run into problems
>>> > >> >> with large sets because of memory issues and need to access each element of
>>> > >> >> the dataset at run time.
>>> > >> >>
>>> > >> >> Putting the elements into an array gives me about 600 comparisons per
>>> > >> >> second, while operating on hdf5 data itself gives me about 300 comparisons
>>> > >> >> per second.
>>> > >> >>
>>> > >> >> Is there a way to speed this process up?
>>> > >> >>
>>> > >> >> Example follows (this is not my real code, just an example):
>>> > >> >>
>>> > >> >> *Small Set*:
>>> > >> >>
>>> > >> >> with tb.openFile(h5_file, 'r') as f:
>>> > >> >>     data = f.root.data
>>> > >> >>
>>> > >> >>     N_elements = len(data)
>>> > >> >>     elements = np.empty((N_elements, int(1e5)))
>>> > >> >>
>>> > >> >>     for ii, d in enumerate(data):
>>> > >> >>         elements[ii] = d['element']
>>> > >> >>
>>> > >> >> D = np.empty((N_elements, N_elements))
>>> > >> >> for ii in xrange(N_elements):
>>> > >> >>     for jj in xrange(ii+1, N_elements):
>>> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
>>> > >> >>
>>> > >> >> *Large Set*:
>>> > >> >>
>>> > >> >> with tb.openFile(h5_file, 'r') as f:
>>> > >> >>     data = f.root.data
>>> > >> >>
>>> > >> >>     N_elements = len(data)
>>> > >> >>
>>> > >> >>     D = np.empty((N_elements, N_elements))
>>> > >> >>     for ii in xrange(N_elements):
>>> > >> >>         for jj in xrange(ii+1, N_elements):
>>> > >> >>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
>>> >
>>> > Message: 2
>>> > Date: Thu, 3 Jan 2013 17:30:59 -0600
>>> > From: Anthony Scopatz <sc...@gm...>
>>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
>>> >
>>> > Josh is right that you can just edit the code by hand (which works but
>>> > sucks).
>>> >
>>> > However, on Windows -- on the rare occasion when I also have to develop on
>>> > it -- I typically use a distribution that includes a compiler, cython,
>>> > hdf5, and pytables already and then I install my development version from
>>> > github OVER this. I recommend either EPD or Anaconda, though other
>>> > distributions listed here [1] might also work.
>>> >
>>> > Be well
>>> > Anthony
>>> >
>>> > 1. http://numfocus.org/projects-2/software-distributions/
>>> >
>>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
>>> >
>>> > > The change was in pure Python code, so you should be able to just paste in
>>> > > the changes to your local copy. Start with the table.Column.__iter__
>>> > > method (lines 3296-3310) here.
>>> > >
>>> > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
>>> > >
>>> > > It needs to be modified slightly because it uses some additional features
>>> > > that aren't available in the released version (the out=buf_slice argument
>>> > > to table.read). The following should work.
>>> > >
>>> > > def __iter__(self):
>>> > >     table = self.table
>>> > >     itemsize = self.dtype.itemsize
>>> > >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>>> > >     max_row = len(self)
>>> > >     for start_row in xrange(0, len(self), nrowsinbuf):
>>> > >         end_row = min([start_row + nrowsinbuf, max_row])
>>> > >         buf = table.read(start_row, end_row, 1, field=self.pathname)
>>> > >         for row in buf:
>>> > >             yield row
>>> > >
>>> > > I haven't tested this, but I think it will work.
>>> > >
>>> > > Josh
>>> > >
>>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
>>> > >
>>> > >> I apologize if I'm starting to sound helpless, but I'm forced to work on
>>> > >> Windows 7 at work and have never had luck compiling python source
>>> > >> successfully. I have had to rely on precompiled binaries and now it's
>>> > >> biting me in the butt.
>>> > >>
>>> > >> Is there any quick fix I can do to improve this iteration using v2.4.0?
>>> > >>
>>> > >> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
>>> > >>>
>>> > >>> Message: 1
>>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
>>> > >>> From: David Reed <dav...@gm...>
>>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>>> > >>>
>>> > >>> Thanks Anthony, but unless I'm missing something I don't think that method
>>> > >>> will work, since it will only be comparing the ith element with the (i+1)th
>>> > >>> element. I still need 2 for loops, right?
>>> > >>>
>>> > >>> Using itertools might speed things up though. I've never used them, so I
>>> > >>> will give it a shot and let you know how it goes. Looks like I need to
>>> > >>> download the latest release before I do that too. Thanks for the help.
>>> > >>>
>>> > >>> -Dave
>>> > >>>
>>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, <pyt...@li...> wrote:
Get web development skills now with LearnDevNow - >>> 350+ hours of step-by-step video tutorials by Microsoft MVPs and experts. >>> SALE $99.99 this month only -- learn more at: >>> http://p.sf.net/sfu/learnmore_122812 >>> >>> ------------------------------ >>> >>> _______________________________________________ >>> Pytables-users mailing list >>> Pyt...@li... >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >>> >>> >>> End of Pytables-users Digest, Vol 80, Issue 9 >>> ********************************************* >>> >> >> > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_jan > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: David R. <dav...@gm...> - 2013-02-01 12:16:04
|
I'm still having problems with this one. I can't tell if this is something dumb I'm doing with itertools, or if it's something in PyTables. I would appreciate any help.

Thanks

On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:

> I think I have to reopen this issue. I had been running fine for a while
> using the combinations method from itertools, but have recently run into a
> memory error since I quadrupled the size of the HDF file.
>
> Here is my code again:
>
> from itertools import combinations, izip
> with tb.openFile(h5_all, 'r') as f:
>     irises = f.root.irises
>
>     templates = f.root.irises.cols.templates
>     masks = f.root.irises.cols.masks1
>
>     N_irises = len(irises)
>     index = np.ones((20 * 480), np.bool)
>
>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>     D = np.empty((N_irises, N_irises))
>     for (t1, m1, ii), (t2, m2, jj) in combinations(
>             izip(templates, masks, range(N_irises)), 2):
>         # print ii
>         D[ii, jj] = ham_dist(
>             t1[8, index],
>             t2[:, index],
>             m1[8, index],
>             m2[:, index],
>         )
>
> And here is the error:
>
> In [10]: get_hd3()
> 10669890 Comparisons
> ---------------------------------------------------------------------------
> MemoryError                               Traceback (most recent call last)
> <ipython-input-10-cfb255ce7bd1> in <module>()
> ----> 1 get_hd3()
>
>     118         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>     119         D = np.empty((N_irises, N_irises))
> --> 120         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>     121             # print ii
>     122             D[ii, jj] = ham_dist(
>
> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
>    3275             end_row = min([start_row + nrowsinbuf, max_row])
> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
>    3277             for row in buf:
>    3278                 yield row
>
> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
>    1773
> -> 1774         arr = self._read(start, stop, step, field)
>    1775         return internal_to_flavor(arr, self.flavor)
>    1776
>
> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
>    1719         if field:
>    1720             # Create a container for the results
> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>    1722         else:
>    1723             # Recarray case
>
> MemoryError:
>
> c:\python27\lib\site-packages\tables\table.py(1721)_read()
>    1720             # Create a container for the results
> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>    1722         else:
>
> Also, if you guys see any performance problems in my code, please let me
> know.
>
> Thank you so much for the help.
>
> -Dave
>
> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
>
>> Message: 1
>> Date: Fri, 4 Jan 2013 08:56:28 -0500
>> From: David Reed <dav...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
>>
>> I can't thank you guys enough for the help. I was able to add the
>> __iter__ function to the table.py file and everything seems to be working
>> great! I'm not quite as fast as I was when iterating right off a matrix,
>> but I'm pretty close.
>> I was at 555 comparisons per second, and now I'm at 420.
>>
>> I handled the problem I mentioned earlier by doing this, and it seems to
>> work great:
>>
>> A = f.root.data.cols.A
>> B = f.root.data.cols.B
>>
>> D = np.empty((len(A), len(A)))
>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
>>     D[ii, jj] = compare(a1, a2, b1, b2)
>>
>> Again, thanks a lot.
>>
>> -Dave
>>
>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
>>
>> > Message: 1
>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
>> > From: Anthony Scopatz <sc...@gm...>
>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>> >
>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
>> >
>> > > Thanks a lot for the help so far guys!
>> > >
>> > > Looking at itertools, I found what I believe to be the perfect function
>> > > for what I need, itertools.combinations. This appears to be a valid
>> > > replacement for the method proposed.
>> >
>> > Yes, combinations is awesome!
>> >
>> > > One small problem I didn't mention is that my compare function
>> > > actually takes two columns from the table as inputs. Like so:
>> > >
>> > > D = np.empty((N_elements, N_elements))
>> > > for ii in xrange(N_elements):
>> > >     for jj in xrange(ii+1, N_elements):
>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>> > >                             data['element2'][ii], data['element2'][jj])
>> > >
>> > > Is there an efficient way of using itertools with this structure?
>> >
>> > You can always make two other iterators for each column. Since you have
>> > two columns you would have 4 iterators. I am not sure how fast this is
>> > going to be, but I am confident that there is a way to do this in
>> > one for-loop, which is going to be way faster than nested loops.
>> >
>> > Be Well
>> > Anthony
>> >
>> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>> > >> Message: 1
>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
>> > >> From: Josh Ayers <jos...@gm...>
>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> > >>
>> > >> David,
>> > >>
>> > >> The change in issue 27 was only for iteration over a tables.Column
>> > >> instance. To use it, tweak Anthony's code as follows. This will iterate
>> > >> over the "element" column, as in your original example.
>> > >>
>> > >> Note also that this will only work with the development version of
>> > >> PyTables available on github. It will be very slow using the released
>> > >> v2.4.0.
>> > >>
>> > >> from itertools import izip
>> > >>
>> > >> with tb.openFile(...) as f:
>> > >>     data = f.root.data.cols.element
>> > >>     data_i = iter(data)
>> > >>     data_j = iter(data)
>> > >>     data_i.next()  # throw the first value away
>> > >>     for i, j in izip(data_i, data_j):
>> > >>         compare(i, j)
>> > >>
>> > >> Hope that helps,
>> > >> Josh
>> > >>
>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
>> > >>
>> > >> > Hi David,
>> > >> >
>> > >> > Tables and table column iteration have been overhauled fairly recently
>> > >> > [1]. So you might try creating two iterators, offset by one, and then
>> > >> > doing the comparison. I am hacking this out super quick so please
>> > >> > forgive me:
>> > >> >
>> > >> > from itertools import izip
>> > >> >
>> > >> > with tb.openFile(...) as f:
>> > >> >     data = f.root.data
>> > >> >     data_i = iter(data)
>> > >> >     data_j = iter(data)
>> > >> >     data_i.next()  # throw the first value away
>> > >> >     for i, j in izip(data_i, data_j):
>> > >> >         compare(i, j)
>> > >> >
>> > >> > You get the idea ;)
>> > >> >
>> > >> > Be Well
>> > >> > Anthony
>> > >> >
>> > >> > [1] https://github.com/PyTables/PyTables/issues/27
>> > >> >
>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>> > >> >
>> > >> >> I was hoping someone could help me out here.
>> > >> >>
>> > >> >> This is from a post I put up on StackOverflow.
>> > >> >>
>> > >> >> I have a fairly large dataset that I store in HDF5 and access using
>> > >> >> PyTables. One operation I need to do on this dataset is pairwise
>> > >> >> comparison between each of the elements. This requires 2 loops, one to
>> > >> >> iterate over each element, and an inner loop to iterate over every
>> > >> >> other element. This operation thus looks at N(N-1)/2 comparisons.
>> > >> >>
>> > >> >> For fairly small sets I found it to be faster to dump the contents
>> > >> >> into a multidimensional numpy array and then do my iteration. I run
>> > >> >> into problems with large sets because of memory issues and need to
>> > >> >> access each element of the dataset at run time.
>> > >> >>
>> > >> >> Putting the elements into an array gives me about 600 comparisons per
>> > >> >> second, while operating on the hdf5 data itself gives me about 300
>> > >> >> comparisons per second.
>> > >> >>
>> > >> >> Is there a way to speed this process up?
>> > >> >>
>> > >> >> Example follows (this is not my real code, just an example):
>> > >> >>
>> > >> >> *Small Set*:
>> > >> >>
>> > >> >> with tb.openFile(h5_file, 'r') as f:
>> > >> >>     data = f.root.data
>> > >> >>
>> > >> >>     N_elements = len(data)
>> > >> >>     elements = np.empty((N_elements, int(1e5)))
>> > >> >>
>> > >> >>     # copy every row's 'element' field into memory
>> > >> >>     for ii, d in enumerate(data):
>> > >> >>         elements[ii] = d['element']
>> > >> >>
>> > >> >> D = np.empty((N_elements, N_elements))
>> > >> >> for ii in xrange(N_elements):
>> > >> >>     for jj in xrange(ii+1, N_elements):
>> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
>> > >> >>
>> > >> >> *Large Set*:
>> > >> >>
>> > >> >> with tb.openFile(h5_file, 'r') as f:
>> > >> >>     data = f.root.data
>> > >> >>
>> > >> >>     N_elements = len(data)
>> > >> >>
>> > >> >>     D = np.empty((N_elements, N_elements))
>> > >> >>     for ii in xrange(N_elements):
>> > >> >>         for jj in xrange(ii+1, N_elements):
>> > >> >>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
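
A middle ground between the "Small Set" and "Large Set" approaches quoted above is to read the data in blocks and compare block against block, so that only two blocks are in memory at once. The following is a minimal sketch of that idea in plain Python, not something proposed in the thread. `read_block(start, stop)` and `compare` are hypothetical stand-ins; with PyTables the read would presumably be a ranged read of the column, but that mapping is an assumption here.

```python
def pairwise_blocked(read_block, n_rows, block_size, compare):
    """Yield (i, j, compare(row_i, row_j)) for every pair i < j.

    At most two blocks of `block_size` rows are held at a time.
    `read_block(start, stop)` and `compare` are hypothetical stand-ins.
    """
    for i0 in range(0, n_rows, block_size):
        i1 = min(i0 + block_size, n_rows)
        block_i = read_block(i0, i1)
        for j0 in range(i0, n_rows, block_size):
            j1 = min(j0 + block_size, n_rows)
            # On the diagonal, reuse the outer block instead of re-reading it.
            block_j = block_i if j0 == i0 else read_block(j0, j1)
            for i, row_i in enumerate(block_i, start=i0):
                for j, row_j in enumerate(block_j, start=j0):
                    if i < j:  # skip self-pairs and mirrored pairs
                        yield i, j, compare(row_i, row_j)


# Toy data standing in for the HDF5 table.
rows = [3, 1, 4, 1, 5, 9, 2]
pairs = list(pairwise_blocked(lambda a, b: rows[a:b], len(rows), 3,
                              lambda x, y: abs(x - y)))
```

The block size trades memory for the number of reads: larger blocks mean fewer reads of the inner block, at the cost of holding more rows at once.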
>> > Message: 2
>> > Date: Thu, 3 Jan 2013 17:30:59 -0600
>> > From: Anthony Scopatz <sc...@gm...>
>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
>> >
>> > Josh is right that you can just edit the code by hand (which works but
>> > sucks).
>> >
>> > However, on Windows -- on the rare occasion when I also have to develop
>> > on it -- I typically use a distribution that already includes a compiler,
>> > Cython, HDF5, and PyTables, and then I install my development version
>> > from github OVER this. I recommend either EPD or Anaconda, though other
>> > distributions listed here [1] might also work.
>> >
>> > Be well
>> > Anthony
>> >
>> > [1] http://numfocus.org/projects-2/software-distributions/
>> >
>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
>> >
>> > > The change was in pure Python code, so you should be able to just paste
>> > > the changes into your local copy. Start with the table.Column.__iter__
>> > > method (lines 3296-3310) here.
>> > >
>> > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
>> > >
>> > > It needs to be modified slightly because it uses some additional
>> > > features that aren't available in the released version (the
>> > > out=buf_slice argument to table.read). The following should work.
>> > >
>> > > def __iter__(self):
>> > >     table = self.table
>> > >     itemsize = self.dtype.itemsize
>> > >     # number of rows that fit in one I/O buffer
>> > >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>> > >     max_row = len(self)
>> > >     for start_row in xrange(0, len(self), nrowsinbuf):
>> > >         end_row = min([start_row + nrowsinbuf, max_row])
>> > >         buf = table.read(start_row, end_row, 1, field=self.pathname)
>> > >         for row in buf:
>> > >             yield row
>> > >
>> > > I haven't tested this, but I think it will work.
>> > >
>> > > Josh
>> > >
>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
>> > >
>> > >> I apologize if I'm starting to sound helpless, but I'm forced to work
>> > >> on Windows 7 at work and have never had luck compiling Python source
>> > >> successfully. I have had to rely on precompiled binaries and now it's
>> > >> biting me in the butt.
>> > >>
>> > >> Is there any quick fix I can do to improve this iteration using v2.4.0?
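
The buffering logic in Josh's `__iter__` above can be sketched without PyTables to make the arithmetic easier to check. In the sketch below, `read(start, stop)` is a hypothetical stand-in for `table.read`, and the `max(1, ...)` guard is an addition that is not in the quoted code: it keeps the step from becoming zero when a single row is larger than the buffer.

```python
def rows_per_buffer(buffer_bytes, itemsize):
    """Number of rows that fit in the buffer, never less than one."""
    return max(1, buffer_bytes // itemsize)


def iter_buffered(read, n_rows, nrowsinbuf):
    """Yield rows one at a time, reading at most `nrowsinbuf` rows per call.

    `read(start, stop)` is a hypothetical stand-in for a ranged table read.
    """
    for start in range(0, n_rows, nrowsinbuf):
        stop = min(start + nrowsinbuf, n_rows)
        for row in read(start, stop):
            yield row


# Toy data and a reader that records which ranges were requested.
data = list(range(10))
calls = []


def read(start, stop):
    calls.append((start, stop))
    return data[start:stop]


out = list(iter_buffered(read, len(data), rows_per_buffer(32, 8)))
```

As an illustration, assuming a 1 MiB buffer (1048576 bytes) and rows of 163200 bytes, `rows_per_buffer(1048576, 163200)` gives 6 rows per read.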
>> > >>> Message: 1
>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
>> > >>> From: David Reed <dav...@gm...>
>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>> > >>>
>> > >>> Thanks Anthony, but unless I'm missing something I don't think that
>> > >>> method will work, since it only compares the ith element with the
>> > >>> (i+1)th element. I still need 2 for loops, right?
>> > >>>
>> > >>> Using itertools might speed things up though. I've never used it, so I
>> > >>> will give it a shot and let you know how it goes. Looks like I need to
>> > >>> download the latest release before I do that too. Thanks for the help.
>> > >>>
>> > >>> -Dave
|
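A note on the two iteration patterns quoted in this thread: two iterators offset by one visit only neighbouring rows (N-1 pairs), while the nested loops in the original question visit every unordered pair (N(N-1)/2 pairs). The sketch below shows the difference on a plain list. It is an illustration only: it uses Python 3 spellings (zip and next(it) where the thread's Python 2 code uses izip and it.next()), and the helper names adjacent_pairs and all_pairs are made up for this example.

```python
from itertools import combinations


def adjacent_pairs(rows):
    """Pairs (rows[k], rows[k+1]): what two iterators offset by one produce."""
    first = iter(rows)
    second = iter(rows)
    next(second, None)  # throw the first value away (no error on empty input)
    return list(zip(first, second))


def all_pairs(rows):
    """Every unordered pair of rows, N*(N-1)/2 in total."""
    return list(combinations(rows, 2))


rows = ["r0", "r1", "r2", "r3"]
print(adjacent_pairs(rows))   # 3 neighbouring pairs
print(len(all_pairs(rows)))   # 6 == 4*3/2
```

For a full pairwise distance matrix, the second pattern is the one that matches the nested loops.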
From: David R. <dav...@gm...> - 2013-01-30 22:00:53
|
I think I have to reopen this issue. I have been running fine for a while
using the combinations method from itertools, but I have run into a memory
error since I recently quadrupled the size of the HDF5 file.

Here is my code again:

    from itertools import combinations, izip

    with tb.openFile(h5_all, 'r') as f:
        irises = f.root.irises
        templates = f.root.irises.cols.templates
        masks = f.root.irises.cols.masks1

        N_irises = len(irises)
        index = np.ones((20 * 480), np.bool)

        print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
        D = np.empty((N_irises, N_irises))
        for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
            # print ii
            D[ii, jj] = ham_dist(
                t1[8, index],
                t2[:, index],
                m1[8, index],
                m2[:, index],
                )

And here is the error:

    In [10]: get_hd3()
    10669890 Comparisons
    ---------------------------------------------------------------------------
    MemoryError                               Traceback (most recent call last)
    <ipython-input-10-cfb255ce7bd1> in <module>()
    ----> 1 get_hd3()

        118         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
        119         D = np.empty((N_irises, N_irises))
    --> 120         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
        121             # print ii
        122             D[ii, jj] = ham_dist(

    c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
       3274         for start_row in xrange(0, len(self), nrowsinbuf):
       3275             end_row = min([start_row + nrowsinbuf, max_row])
    -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
       3277             for row in buf:
       3278                 yield row

    c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
       1772         (start, stop, step) = self._processRangeRead(start, stop, step)
       1773
    -> 1774         arr = self._read(start, stop, step, field)
       1775         return internal_to_flavor(arr, self.flavor)
       1776

    c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
       1719         if field:
       1720             # Create a container for the results
    -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
       1722         else:
       1723             # Recarray case

    MemoryError:
    > c:\python27\lib\site-packages\tables\table.py(1721)_read()
       1720             # Create a container for the results
    -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
       1722         else:

Also, if you guys see any performance problems in my code, please let me
know.

Thank you so much for the help.

-Dave

On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:

> Send Pytables-users mailing list submissions to
>         pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>         pyt...@li...
>
> You can reach the person managing the list at
>         pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 4 Jan 2013 08:56:28 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
> To: pyt...@li...
> Message-ID:
>         <CAM...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I can't thank you guys enough for the help. I was able to add the __iter__
> function to the table.py file and everything seems to be working great!
> I'm not quite as fast as I was when iterating right off a matrix, but pretty
> close. I was at 555 comparisons per second, and now I'm at 420.
>
> I handled the problem I mentioned earlier by doing this, and it seems to
> work great:
>
>     A = f.root.data.cols.A
>     B = f.root.data.cols.B
>
>     D = np.empty((len(A), len(A)))
>     for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
>         D[ii, jj] = compare(a1, a2, b1, b2)
>
> Again, thanks a lot.
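A note on the combinations(izip(...), 2) pattern above: itertools.combinations copies its whole input into a tuple before it yields the first pair, so every row of every zipped column ends up in memory at once. That is one plausible contributor to the MemoryError reported in this thread once the file grew, though the thread does not confirm it. The sketch below is a blocked alternative, not the PyTables API. The name blocked_pairwise and the read_block(start, stop) callable are hypothetical; with PyTables, read_block might wrap something like table.read(start, stop, field=...). It is written for Python 3 and assumes block_size >= 1.

```python
from itertools import combinations, product


def blocked_pairwise(read_block, n_rows, block_size):
    """Yield (ii, jj, row_i, row_j) for every ii < jj.

    read_block(start, stop) is a hypothetical callable returning rows
    start..stop-1 as a sequence. At most two blocks are held at a time.
    """
    starts = list(range(0, n_rows, block_size))
    for a in starts:
        block_a = read_block(a, min(a + block_size, n_rows))
        # Pairs whose two rows both fall inside block a.
        for (i, ri), (j, rj) in combinations(enumerate(block_a, a), 2):
            yield i, j, ri, rj
        # Pairs between block a and every later block.
        for b in starts:
            if b <= a:
                continue
            block_b = read_block(b, min(b + block_size, n_rows))
            for (i, ri), (j, rj) in product(enumerate(block_a, a),
                                            enumerate(block_b, b)):
                yield i, j, ri, rj


# Stand-in for an HDF5 column: a plain list sliced by read_block.
rows = [x * 10 for x in range(10)]
pairs = list(blocked_pairwise(lambda s, e: rows[s:e], len(rows), 3))
print(len(pairs))  # 45 == 10*9/2
```

The trade-off is extra I/O: each later block is re-read once per earlier block, so the number of block reads grows with (N / block_size) squared, in exchange for a memory footprint bounded by two blocks plus the output matrix D.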
> > -Dave > > > On Thu, Jan 3, 2013 at 6:31 PM, < > pyt...@li...> wrote: > > > Send Pytables-users mailing list submissions to > > pyt...@li... > > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > or, via email, send a message with subject or body 'help' to > > pyt...@li... > > > > You can reach the person managing the list at > > pyt...@li... > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of Pytables-users digest..." > > > > > > Today's Topics: > > > > 1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz) > > 2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 3 Jan 2013 17:26:55 -0600 > > From: Anthony Scopatz <sc...@gm...> > > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3 > > To: Discussion list for PyTables > > <pyt...@li...> > > Message-ID: > > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ= > > Gz...@ma...> > > Content-Type: text/plain; charset="iso-8859-1" > > > > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> > wrote: > > > > > Thanks a lot for the help so far guys! > > > > > > Looking at itertools, I found what I believe to be the perfect function > > > for what I need, itertools.combinations. This appears to be a valid > > > replacement to the method proposed. > > > > > > > Yes, combinations is awesome! > > > > > > > > > > There is a small problem that I didn't mention is that my compare > > function > > > actually takes as inputs 2 columns from the table. 
Like so: > > > > > > D = np.empty((N_irises, N_irises)) > > > for ii in xrange(N_elements): > > > for jj in xrange(ii+1, N_elements): > > > D[ii, jj] = compare(data['element1'][ii], > > data['element1'][jj],data['element2'][ii], > > > data['element2'][jj]) > > > > > > Is there an efficient way of using itertools with this structure? > > > > > > > You can always make two other iterators for each column. Since you have > > two columns you would have 4 iterators. I am not sure how fast this is > > going to be but I am confident that there is definitely a way to do this > in > > one for-loop, which is going to be way faster than nested loops. > > > > Be Well > > Anthony > > > > > > > > > > > > > On Thu, Jan 3, 2013 at 1:29 PM, < > > > pyt...@li...> wrote: > > > > > >> Send Pytables-users mailing list submissions to > > >> pyt...@li... > > >> > > >> To subscribe or unsubscribe via the World Wide Web, visit > > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> or, via email, send a message with subject or body 'help' to > > >> pyt...@li... > > >> > > >> You can reach the person managing the list at > > >> pyt...@li... > > >> > > >> When replying, please edit your Subject line so it is more specific > > >> than "Re: Contents of Pytables-users digest..." > > >> > > >> > > >> Today's Topics: > > >> > > >> 1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers) > > >> > > >> > > >> ---------------------------------------------------------------------- > > >> > > >> Message: 1 > > >> Date: Thu, 3 Jan 2013 10:29:33 -0800 > > >> From: Josh Ayers <jos...@gm...> > > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables > > >> To: Discussion list for PyTables > > >> <pyt...@li...> > > >> Message-ID: > > >> < > > >> CAC...@ma...> > > >> Content-Type: text/plain; charset="iso-8859-1" > > >> > > >> David, > > >> > > >> The change in issue 27 was only for iteration over a tables.Column > > >> instance. 
To use it, tweak Anthony's code as follows. This will > > iterate > > >> over the "element" column, as in your original example. > > >> > > >> Note also that this will only work with the development version of > > >> PyTables > > >> available on github. It will be very slow using the released v2.4.0. > > >> > > >> > > >> from itertools import izip > > >> > > >> with tb.openFile(...) as f: > > >> data = f.root.data.cols.element > > >> data_i = iter(data) > > >> data_j = iter(data) > > >> data_i.next() # throw the first value away > > >> for i, j in izip(data_i, data_j): > > >> compare(i, j) > > >> > > >> > > >> Hope that helps, > > >> Josh > > >> > > >> > > >> > > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> > > >> wrote: > > >> > > >> > HI David, > > >> > > > >> > Tables and table column iteration have been overhauled fairly > recently > > >> > [1]. So you might try creating two iterators, offset by one, and > then > > >> > doing the comparison. I am hacking this out super quick so please > > >> forgive > > >> > me: > > >> > > > >> > from itertools import izip > > >> > > > >> > with tb.openFile(...) as f: > > >> > data = f.root.data > > >> > data_i = iter(data) > > >> > data_j = iter(data) > > >> > data_i.next() # throw the first value away > > >> > for i, j in izip(data_i, data_j): > > >> > compare(i, j) > > >> > > > >> > You get the idea ;) > > >> > > > >> > Be Well > > >> > Anthony > > >> > > > >> > 1. https://github.com/PyTables/PyTables/issues/27 > > >> > > > >> > > > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> > > >> wrote: > > >> > > > >> >> I was hoping someone could help me out here. > > >> >> > > >> >> This is from a post I put up on StackOverflow, > > >> >> > > >> >> I am have a fairly large dataset that I store in HDF5 and access > > using > > >> >> PyTables. One operation I need to do on this dataset are pairwise > > >> >> comparisons between each of the elements. 
This requires 2 loops, > one > > to > > >> >> iterate over each element, and an inner loop to iterate over every > > >> other > > >> >> element. This operation thus looks at N(N-1)/2 comparisons. > > >> >> > > >> >> For fairly small sets I found it to be faster to dump the contents > > >> into a > > >> >> multdimensional numpy array and then do my iteration. I run into > > >> problems > > >> >> with large sets because of memory issues and need to access each > > >> element of > > >> >> the dataset at run time. > > >> >> > > >> >> Putting the elements into an array gives me about 600 comparisons > per > > >> >> second, while operating on hdf5 data itself gives me about 300 > > >> comparisons > > >> >> per second. > > >> >> > > >> >> Is there a way to speed this process up? > > >> >> > > >> >> Example follows (this is not my real code, just an example): > > >> >> > > >> >> *Small Set*: > > >> >> > > >> >> > > >> >> with tb.openFile(h5_file, 'r') as f: > > >> >> data = f.root.data > > >> >> > > >> >> N_elements = len(data) > > >> >> elements = np.empty((N_irises, 1e5)) > > >> >> > > >> >> for ii, d in enumerate(data): > > >> >> elements[ii] = data['element'] > > >> >> > > >> >> D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): > > >> >> for jj in xrange(ii+1, N_elements): > > >> >> D[ii, jj] = compare(elements[ii], elements[jj]) > > >> >> > > >> >> *Large Set*: > > >> >> > > >> >> > > >> >> with tb.openFile(h5_file, 'r') as f: > > >> >> data = f.root.data > > >> >> > > >> >> N_elements = len(data) > > >> >> > > >> >> D = np.empty((N_irises, N_irises)) > > >> >> for ii in xrange(N_elements): > > >> >> for jj in xrange(ii+1, N_elements): > > >> >> D[ii, jj] = compare(data['element'][ii], > > >> data['element'][jj]) > > >> >> > > >> >> > > >> >> > > >> >> > > >> > > > ------------------------------------------------------------------------------ > > >> >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > CSS, > > >> >> MVC, Windows 8 Apps, 
JavaScript and much more. Keep your skills > > current > > >> >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > > >> >> MVPs and experts. ON SALE this month only -- learn more at: > > >> >> http://p.sf.net/sfu/learnmore_122712 > > >> >> _______________________________________________ > > >> >> Pytables-users mailing list > > >> >> Pyt...@li... > > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> >> > > >> >> > > >> > > > >> > > > >> > > > >> > > > ------------------------------------------------------------------------------ > > >> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, > CSS, > > >> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > > current > > >> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > > >> > MVPs and experts. ON SALE this month only -- learn more at: > > >> > http://p.sf.net/sfu/learnmore_122712 > > >> > _______________________________________________ > > >> > Pytables-users mailing list > > >> > Pyt...@li... > > >> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> > > > >> > > > >> -------------- next part -------------- > > >> An HTML attachment was scrubbed... > > >> > > >> ------------------------------ > > >> > > >> > > >> > > > ------------------------------------------------------------------------------ > > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, > > >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills > current > > >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > > >> MVPs and experts. ON SALE this month only -- learn more at: > > >> http://p.sf.net/sfu/learnmore_122712 > > >> > > >> ------------------------------ > > >> > > >> _______________________________________________ > > >> Pytables-users mailing list > > >> Pyt...@li... 
> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >> > > >> > > >> End of Pytables-users Digest, Vol 80, Issue 3 > > >> ********************************************* > > >> > > > > > > > > > > > > > > > ------------------------------------------------------------------------------ > > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, > > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current > > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > > > MVPs and experts. ON SALE this month only -- learn more at: > > > http://p.sf.net/sfu/learnmore_122712 > > > _______________________________________________ > > > Pytables-users mailing list > > > Pyt...@li... > > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > > > > -------------- next part -------------- > > An HTML attachment was scrubbed... > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 3 Jan 2013 17:30:59 -0600 > > From: Anthony Scopatz <sc...@gm...> > > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4 > > To: Discussion list for PyTables > > <pyt...@li...> > > Message-ID: > > < > > CAP...@ma...> > > Content-Type: text/plain; charset="iso-8859-1" > > > > Josh is right that you can just edit the code by hand (which works but > > sucks). > > > > However, on Windows -- on the rare occasion when I also have to develop > on > > it -- I typically use a distribution that includes a compiler, cython, > > hdf5, and pytables already and then I install my development version from > > github OVER this. I recommend either EPD or Anaconda, though other > > distributions listed here [1] might also work. > > > > Be well > > Anthony > > > > 1. 
http://numfocus.org/projects-2/software-distributions/ > > > > > > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote: > > > > > The change was in pure Python code, so you should be able to just paste > > in > > > the changes to your local copy. Start with the table.Column.__iter__ > > > method (lines 3296-3310) here. > > > > > > > > > > > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py > > > > > > It needs to be modified slightly because it uses some additional > features > > > that aren't available in the released version (the out=buf_slice > argument > > > to table.read). The following should work. > > > > > > def __iter__(self): > > > table = self.table > > > itemsize = self.dtype.itemsize > > > nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize > > > max_row = len(self) > > > for start_row in xrange(0, len(self), nrowsinbuf): > > > end_row = min([start_row + nrowsinbuf, max_row]) > > > buf = table.read(start_row, end_row, 1, > field=self.pathname) > > > for row in buf: > > > yield row > > > > > > > > > I haven't tested this, but I think it will work. > > > > > > Josh > > > > > > > > > > > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> > > wrote: > > > > > >> I apologize if I'm starting to sound helpless, but I'm forced to work > on > > >> Windows 7 at work and have never had luck compiling python source > > >> successfully. I have had to rely on precompiled binaries and now its > > >> biting me in the butt. > > >> > > >> Is there any quick fix I can do to improve this iteration using > v2.4.0? > > >> > > >> > > >> On Thu, Jan 3, 2013 at 3:17 PM, < > > >> pyt...@li...> wrote: > > >> > > >>> Send Pytables-users mailing list submissions to > > >>> pyt...@li... 
> > >>> > > >>> To subscribe or unsubscribe via the World Wide Web, visit > > >>> https://lists.sourceforge.net/lists/listinfo/pytables-users > > >>> or, via email, send a message with subject or body 'help' to > > >>> pyt...@li... > > >>> > > >>> You can reach the person managing the list at > > >>> pyt...@li... > > >>> > > >>> When replying, please edit your Subject line so it is more specific > > >>> than "Re: Contents of Pytables-users digest..." > > >>> > > >>> > > >>> Today's Topics: > > >>> > > >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed) > > >>> 2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed) > > >>> > > >>> > > >>> > ---------------------------------------------------------------------- > > >>> > > >>> Message: 1 > > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 > > >>> From: David Reed <dav...@gm...> > > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2 > > >>> To: pyt...@li... > > >>> Message-ID: > > >>> <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha= > > >>> ev...@ma...> > > >>> Content-Type: text/plain; charset="iso-8859-1" > > >>> > > >>> Thanks Anthony, but unless Im missing something I don't think that > > method > > >>> will work since this will only be comparing the ith element with > ith+1 > > >>> element. I still need 2 for loops right? > > >>> > > >>> Using itertools might speed things up though, I've never used them > so I > > >>> will give it a shot and let you know how it goes. Looks like I need > to > > >>> download the latest release before I do that too. Thanks for the > help. > > >>> > > >>> -Dave > > >>> > > >>> > > >>> > > >>> On Thu, Jan 3, 2013 at 12:12 PM, < > > >>> pyt...@li...> wrote: > > >>> > > >>> > Send Pytables-users mailing list submissions to > > >>> > pyt...@li... 
> > >>> > > > >>> > To subscribe or unsubscribe via the World Wide Web, visit > > >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > >>> > or, via email, send a message with subject or body 'help' to > > >>> > pyt...@li... > > >>> > > > >>> > You can reach the person managing the list at > > >>> > pyt...@li... > > >>> > > > >>> > When replying, please edit your Subject line so it is more specific > > >>> > than "Re: Contents of Pytables-users digest..." > > >>> > > > >>> > > > >>> > Today's Topics: > > >>> > > > >>> > 1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz) > > >>> > > > >>> > > > >>> > > > ---------------------------------------------------------------------- > > >>> > > > >>> > Message: 1 > > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 > > >>> > From: Anthony Scopatz <sc...@gm...> > > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using > PyTables > > >>> > To: Discussion list for PyTables > > >>> > <pyt...@li...> > > >>> > Message-ID: > > >>> > <CAPk-6T5b= > > >>> > 1EG...@ma...> > > >>> > Content-Type: text/plain; charset="iso-8859-1" > > >>> > > > >>> > HI David, > > >>> > > > >>> > Tables and table column iteration have been overhauled fairly > > recently > > >>> [1]. > > >>> > So you might try creating two iterators, offset by one, and then > > >>> doing the > > >>> > comparison. I am hacking this out super quick so please forgive > me: > > >>> > > > >>> > from itertools import izip > > >>> > > > >>> > with tb.openFile(...) as f: > > >>> > data = f.root.data > > >>> > data_i = iter(data) > > >>> > data_j = iter(data) > > >>> > data_i.next() # throw the first value away > > >>> > for i, j in izip(data_i, data_j): > > >>> > compare(i, j) > > >>> > > > >>> > You get the idea ;) > > >>> > > > >>> > Be Well > > >>> > Anthony > > >>> > > > >>> > 1. https://github.com/PyTables/PyTables/issues/27 > > >>> > > > >>> > > > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm... 
> wrote:
>
> > I was hoping someone could help me out here.
> >
> > This is from a post I put up on StackOverflow,
> >
> > I have a fairly large dataset that I store in HDF5 and access using
> > PyTables. One operation I need to do on this dataset is a pairwise
> > comparison between each of the elements. This requires 2 loops, one to
> > iterate over each element, and an inner loop to iterate over every
> > other element. This operation thus looks at N(N-1)/2 comparisons.
> >
> > For fairly small sets I found it to be faster to dump the contents into
> > a multidimensional numpy array and then do my iteration. I run into
> > problems with large sets because of memory issues and need to access
> > each element of the dataset at run time.
> >
> > Putting the elements into an array gives me about 600 comparisons per
> > second, while operating on hdf5 data itself gives me about 300
> > comparisons per second.
> >
> > Is there a way to speed this process up?
> >
> > Example follows (this is not my real code, just an example):
> >
> > *Small Set*:
> >
> > with tb.openFile(h5_file, 'r') as f:
> >     data = f.root.data
> >
> >     N_elements = len(data)
> >     elements = np.empty((N_elements, int(1e5)))
> >
> >     for ii, d in enumerate(data):
> >         elements[ii] = d['element']
> >
> > D = np.empty((N_elements, N_elements))
> > for ii in xrange(N_elements):
> >     for jj in xrange(ii+1, N_elements):
> >         D[ii, jj] = compare(elements[ii], elements[jj])
> >
> > *Large Set*:
> >
> > with tb.openFile(h5_file, 'r') as f:
> >     data = f.root.data
> >
> >     N_elements = len(data)
> >
> >     D = np.empty((N_elements, N_elements))
> >     for ii in xrange(N_elements):
> >         for jj in xrange(ii+1, N_elements):
> >             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
>
> ------------------------------
>
> Message: 2
> Date: Thu, 3 Jan 2013 15:17:01 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> To: pyt...@li...
> Message-ID: <CAM...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Thanks a lot for the help so far guys!
>
> Looking at itertools, I found what I believe to be the perfect function
> for what I need, itertools.combinations. This appears to be a valid
> replacement to the method proposed.
>
> A small problem that I didn't mention is that my compare function
> actually takes as inputs 2 columns from the table. Like so:
>
> D = np.empty((N_elements, N_elements))
> for ii in xrange(N_elements):
>     for jj in xrange(ii+1, N_elements):
>         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>                             data['element2'][ii], data['element2'][jj])
>
> Is there an efficient way of using itertools with this structure?
>
> On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>
> > Message: 1
> > Date: Thu, 3 Jan 2013 10:29:33 -0800
> > From: Josh Ayers <jos...@gm...>
> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> > To: Discussion list for PyTables <pyt...@li...>
> > Message-ID: <CAC...@ma...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > David,
> >
> > The change in issue 27 was only for iteration over a tables.Column
> > instance. To use it, tweak Anthony's code as follows. This will iterate
> > over the "element" column, as in your original example.
> >
> > Note also that this will only work with the development version of
> > PyTables available on github. It will be very slow using the released
> > v2.4.0.
> >
> > from itertools import izip
> >
> > with tb.openFile(...) as f:
> >     data = f.root.data.cols.element
> >     data_i = iter(data)
> >     data_j = iter(data)
> >     data_i.next()  # throw the first value away
> >     for i, j in izip(data_i, data_j):
> >         compare(i, j)
> >
> > Hope that helps,
> > Josh
> >
> > On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
> >
> > > Hi David,
> > >
> > > Tables and table column iteration have been overhauled fairly
> > > recently [1]. So you might try creating two iterators, offset by
> > > one, and then doing the comparison. I am hacking this out super
> > > quick so please forgive me:
> > >
> > > from itertools import izip
> > >
> > > with tb.openFile(...) as f:
> > >     data = f.root.data
> > >     data_i = iter(data)
> > >     data_j = iter(data)
> > >     data_i.next()  # throw the first value away
> > >     for i, j in izip(data_i, data_j):
> > >         compare(i, j)
> > >
> > > You get the idea ;)
> > >
> > > Be Well
> > > Anthony
> > >
> > > 1. https://github.com/PyTables/PyTables/issues/27
|
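The thread above moves from two offset iterators (which, as written, pair each row only with its neighbour) to itertools.combinations, which yields every pair i < j. Below is a minimal sketch of that single-loop form, assuming plain Python lists stand in for the two table columns; `all_pairs`, the toy data, and the lambda are hypothetical names used only for illustration and are not part of PyTables.

```python
from itertools import combinations

def all_pairs(col_a, col_b, compare):
    """Call compare(a1, a2, b1, b2) once for every row pair i < j.

    col_a and col_b are any equal-length iterables (plain lists here,
    standing in for the two table columns).
    """
    rows = enumerate(zip(col_a, col_b))
    scores = {}
    # combinations() copies its input into a tuple first, so every row is
    # held in memory while the pairs are generated.
    for (i, (a1, b1)), (j, (a2, b2)) in combinations(rows, 2):
        scores[(i, j)] = compare(a1, a2, b1, b2)
    return scores

# Toy stand-ins for two columns and a compare function.
col_a = [1.0, 2.0, 4.0]
col_b = [10.0, 20.0, 40.0]
scores = all_pairs(col_a, col_b,
                   lambda a1, a2, b1, b2: abs(a1 - a2) + abs(b1 - b2))
print(sorted(scores.items()))
```

Because combinations() materializes its input, this form trades the nested loops for memory: it suits columns that fit in RAM, and a very large table would presumably still need some chunked strategy.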
From: Anthony S. <sc...@gm...> - 2013-01-23 16:45:24
|
Hi Jeff,

I think that this is related to the way that numexpr works (but I don't
have time to look into it) [1]. You may be able to set this value
dynamically. Alternatively, the limit seems to be 256 (on my machine), so
that seems like the threshold not to go above.

Hope this helps.

Be Well
Anthony

1. bottom of the page, http://code.google.com/p/numexpr/

On Wed, Jan 23, 2013 at 10:33 AM, Jeff Reback <jr...@ya...> wrote:

> It seems there is a limit to the condition syntax when using readWhere.
> I get various exceptions when passing an increasing number of terms.
>
> Is this some kind of hard-coded limit?
>
> Is there a way to pre-compile this and test for it (e.g. when I am
> actually creating the condition)? My alternative is simply to drop that
> part of the condition and filter out after.
>
> thanks,
>
> Jeff
>
> ans -> [n->2 ,len_selector->58 ] --> (399,)
> ans -> [n->10 ,len_selector->234 ] --> (999,)
> ans -> [n->100 ,len_selector->2304 ] --> (999,)
> ans -> [n->200 ,len_selector->4704 ] --> (999,)
> ans -> [n->254 ,len_selector->6000 ] --> chr() arg not in range(256)
> ans -> [n->255 ,len_selector->6024 ] --> chr() arg not in range(256)
> ans -> [n->300 ,len_selector->7104 ] --> chr() arg not in range(256)
> ans -> [n->400 ,len_selector->9504 ] --> maximum recursion depth exceeded while calling a Python object
> ans -> [n->500 ,len_selector->11904 ] --> maximum recursion depth exceeded while calling a Python object
|
From: Jeff R. <jr...@ya...> - 2013-01-23 16:33:23
|
It seems there is a limit to the condition syntax when using readWhere.
I get various exceptions when passing an increasing number of terms.

Is this some kind of hard-coded limit?

Is there a way to pre-compile this and test for it (e.g. when I am
actually creating the condition)? My alternative is simply to drop that
part of the condition and filter out after.

thanks,

Jeff

ans -> [n->2 ,len_selector->58 ] --> (399,)
ans -> [n->10 ,len_selector->234 ] --> (999,)
ans -> [n->100 ,len_selector->2304 ] --> (999,)
ans -> [n->200 ,len_selector->4704 ] --> (999,)
ans -> [n->254 ,len_selector->6000 ] --> chr() arg not in range(256)
ans -> [n->255 ,len_selector->6024 ] --> chr() arg not in range(256)
ans -> [n->300 ,len_selector->7104 ] --> chr() arg not in range(256)
ans -> [n->400 ,len_selector->9504 ] --> maximum recursion depth exceeded while calling a Python object
ans -> [n->500 ,len_selector->11904 ] --> maximum recursion depth exceeded while calling a Python object

------------ script to reproduce --------

#!/usr/local/bin/python
import tables
import numpy as np
import datetime, time

test_file = 'test_select.h5'

# build a small test table
handle = tables.openFile(test_file, "w")
node = handle.createGroup(handle.root, 'test')
table = handle.createTable(node, 'table', dict(
    index = tables.Int64Col(),
    column = tables.StringCol(25),
    values = tables.FloatCol(shape=(3)),
    ))

# add data
r = table.row
for i in xrange(1000):
    r['index'] = i
    r['column'] = ("str-%d" % (i % 5))
    r['values'] = np.arange(3)
    r.append()
table.flush()
handle.close()

def read_for(n):
    handle = tables.openFile(test_file,"r")
    # one "(column == 'str-k')" term per k, OR-ed together
    selector = "(index >= 1) & %s" % '(' + ' | '.join([ "(column == 'str-%s')" % v for v in range(n) ]) + ')'
    #print "selector -> [%s] --> %s" % (n,selector)
    try:
        ans = handle.root.test.table.readWhere(selector)
        print "ans -> [n->%-20.20s,len_selector->%-20.20s] --> %s" % (n,len(selector),ans.shape)
    except (Exception), detail:
        print "ans -> [n->%-20.20s,len_selector->%-20.20s] --> %s" % (n,len(selector),str(detail))
    handle.close()

for n in [ 2, 10, 100, 200, 254, 255, 300, 400, 500 ]:
    read_for(n)
|
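One way to stay under whatever the term limit turns out to be is to split the OR-ed terms into several shorter conditions and run one query per condition. The sketch below only builds the condition strings, in the same format as the script above; `batched_conditions` and its `max_terms=100` default are hypothetical, chosen for illustration rather than taken from PyTables or numexpr.

```python
def batched_conditions(column, values, prefix="(index >= 1)", max_terms=100):
    """Yield condition strings that each OR together at most max_terms tests."""
    values = list(values)
    for start in range(0, len(values), max_terms):
        batch = values[start:start + max_terms]
        ored = " | ".join("(%s == '%s')" % (column, v) for v in batch)
        yield "%s & (%s)" % (prefix, ored)

values = ["str-%d" % v for v in range(250)]
conditions = list(batched_conditions("column", values))
print(len(conditions))          # number of separate queries to run
print(conditions[-1][:60])
```

Each string could then be passed to readWhere as in the script, with the per-batch results concatenated afterwards. As long as the values are distinct, a row can match at most one batch, so no row should appear twice.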
From: Anthony S. <sc...@gm...> - 2013-01-23 16:31:16
|
Yeah, indexing with a list (rather than a tuple) has a different meaning.
The most notable place I have seen list-indexing used is with numpy
structured arrays. In all other locations the tuple slicing is for
drilling down different dimensions, as you say.

On Wed, Jan 23, 2013 at 10:25 AM, Andreas Hilboll <li...@hi...> wrote:

> Hi Anthony,
>
> thanks for your input. However, I need to slice in multiple dimensions
> simultaneously, because my array is very large and I don't want to clog
> memory.
>
> However, I found out that it works with a tuple of slice objects, so
> _ds[tuple(coord_slice)] works as expected.
>
> Cheers, Andreas.
|
From: Andreas H. <li...@hi...> - 2013-01-23 16:25:40
|
On Wed 23 Jan 2013 16:57:27 CET, Anthony Scopatz wrote:
> Hi Andreas,
>
> I think that the problem here is that coord_slice is actually a list
> of slices, which you can't index by. (Though, you may be able to in
> numpy...)
>
> Try something like _ds[coord_slice[0]] instead.
>
> Be Well
> Anthony

Hi Anthony,

thanks for your input. However, I need to slice in multiple dimensions
simultaneously, because my array is very large and I don't want to clog
memory.

However, I found out that it works with a tuple of slice objects, so
_ds[tuple(coord_slice)] works as expected.

Cheers, Andreas.
|
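The tuple-versus-list distinction in this exchange can be seen without PyTables at all. The sketch below uses a hypothetical `KeyRecorder` class that simply returns whatever key it is indexed with, plus a hypothetical helper `slices_for` that builds an index for any number of dimensions; neither is part of PyTables or numpy.

```python
class KeyRecorder(object):
    """Hypothetical stand-in for an array: returns the key it receives."""
    def __getitem__(self, key):
        return key

def slices_for(bounds):
    """Build an N-d index from a sequence of (start, stop) pairs."""
    return tuple(slice(start, stop) for start, stop in bounds)

rec = KeyRecorder()
index = slices_for([(0, 31), (0, 5760), (0, 2880)])

# Comma-separated slices inside [] arrive as a tuple of slice objects.
print(rec[0:31, 0:5760, 0:2880] == index)
# A list of slices is passed through as a list, a different kind of key.
print(type(rec[list(index)]).__name__)
```

Since the usual `a[i:j, k:l]` syntax already hands the object a tuple, building the tuple programmatically reproduces that syntax for an unknown number of dimensions, which fits the observation that _ds[tuple(coord_slice)] works.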
From: Anthony S. <sc...@gm...> - 2013-01-23 15:57:54
|
Hi Andreas,

I think that the problem here is that coord_slice is actually a list of
slices, which you can't index by. (Though, you may be able to in numpy...)

Try something like _ds[coord_slice[0]] instead.

Be Well
Anthony

On Tue, Jan 22, 2013 at 8:44 AM, Andreas Hilboll <li...@hi...> wrote:

> Hi,
>
> how can I use Python's built-in `slice` object on CArray? Currently, I'm
> trying
>
> In: coord_slice
> Out: [slice(0, 31, None), slice(0, 5760, None), slice(0, 2880, None)]
>
> In: _ds
> Out: /data/mydata (CArray(31, 5760, 2880), shuffle, blosc(5)) ''
>   atom := Float32Atom(shape=(), dflt=0.0)
>   maindim := 0
>   flavor := 'numpy'
>   byteorder := 'little'
>   chunkshape := (1, 45, 2880)
>
> In: _ds[coord_slice]
> Out: *** TypeError: long() argument must be a string or a number,
> not 'slice'
>
> The problem is that I want to write something generic, and I don't know
> beforehand how many dimensions the CArray has. My current plan is to
> create a tuple of slice objects programmatically (using list
> comprehension), and then use this tuple as index. But apparently it
> doesn't work with pytables 2.3.1.
>
> Any suggestions on how to accomplish my task are greatly appreciated :)
>
> Cheers, Andreas.
|
From: Andreas H. <li...@hi...> - 2013-01-23 15:38:48
|
Hi,

how can I use Python's built-in `slice` object on CArray? Currently, I'm
trying

In: coord_slice
Out: [slice(0, 31, None), slice(0, 5760, None), slice(0, 2880, None)]

In: _ds
Out: /data/mydata (CArray(31, 5760, 2880), shuffle, blosc(5)) ''
  atom := Float32Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (1, 45, 2880)

In: _ds[coord_slice]
Out: *** TypeError: long() argument must be a string or a number, not 'slice'

The problem is that I want to write something generic, and I don't know
beforehand how many dimensions the CArray has. My current plan is to
create a tuple of slice objects programmatically (using list
comprehension), and then use this tuple as index. But apparently it
doesn't work with pytables 2.3.1.

Any suggestions on how to accomplish my task are greatly appreciated :)

Cheers, Andreas.
|
From: Anthony S. <sc...@gm...> - 2013-01-04 23:15:50
|
Glad that this worked for you David!

On Fri, Jan 4, 2013 at 7:56 AM, David Reed <dav...@gm...> wrote:

> I can't thank you guys enough for the help. I was able to add the
> __iter__ function to the table.py file and everything seems to be working
> great! I'm not quite as fast as I was with iterating right off a matrix
> but pretty close. I was at 555 comparisons per second, and now I'm at 420.
>
> I handled the problem I mentioned earlier by doing this, and it seems to
> work great:
>
> A = f.root.data.cols.A
> B = f.root.data.cols.B
>
> D = np.empty((len(A), len(A)))
> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
>     D[ii, jj] = compare(a1, a2, b1, b2)
>
> Again, thanks a lot.
>
> -Dave
>
> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
>
>> Message: 1
>> Date: Thu, 3 Jan 2013 17:26:55 -0600
>> From: Anthony Scopatz <sc...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>> To: Discussion list for PyTables <pyt...@li...>
>> Message-ID: <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=Gz...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
>>
>> > Thanks a lot for the help so far guys!
>> >
>> > Looking at itertools, I found what I believe to be the perfect function
>> > for what I need, itertools.combinations. This appears to be a valid
>> > replacement to the method proposed.
>>
>> Yes, combinations is awesome!
>>
>> > A small problem that I didn't mention is that my compare function
>> > actually takes as inputs 2 columns from the table. Like so:
>> >
>> > D = np.empty((N_elements, N_elements))
>> > for ii in xrange(N_elements):
>> >     for jj in xrange(ii+1, N_elements):
>> >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>> >                             data['element2'][ii], data['element2'][jj])
>> >
>> > Is there an efficient way of using itertools with this structure?
>>
>> You can always make two other iterators for each column. Since you have
>> two columns you would have 4 iterators. I am not sure how fast this is
>> going to be but I am confident that there is definitely a way to do this
>> in one for-loop, which is going to be way faster than nested loops.
>>
>> Be Well
>> Anthony
>>
>> > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>> >
>> >> Message: 1
>> >> Date: Thu, 3 Jan 2013 10:29:33 -0800
>> >> From: Josh Ayers <jos...@gm...>
>> >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> >> To: Discussion list for PyTables <pyt...@li...>
>> >> Message-ID: <CAC...@ma...>
>> >> Content-Type: text/plain; charset="iso-8859-1"
>> >>
>> >> David,
>> >>
>> >> The change in issue 27 was only for iteration over a tables.Column
>> >> instance. To use it, tweak Anthony's code as follows. This will
>> >> iterate over the "element" column, as in your original example.
>> >>
>> >> Note also that this will only work with the development version of
>> >> PyTables available on github. It will be very slow using the released
>> >> v2.4.0.
>> >>
>> >> from itertools import izip
>> >>
>> >> with tb.openFile(...) as f:
>> >>     data = f.root.data.cols.element
>> >>     data_i = iter(data)
>> >>     data_j = iter(data)
>> >>     data_i.next()  # throw the first value away
>> >>     for i, j in izip(data_i, data_j):
>> >>         compare(i, j)
>> >>
>> >> Hope that helps,
>> >> Josh
>> >>
>> >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
>> >>
>> >> > Hi David,
>> >> >
>> >> > Tables and table column iteration have been overhauled fairly
>> >> > recently [1]. So you might try creating two iterators, offset by
>> >> > one, and then doing the comparison. I am hacking this out super
>> >> > quick so please forgive me:
>> >> >
>> >> > from itertools import izip
>> >> >
>> >> > with tb.openFile(...) as f:
>> >> >     data = f.root.data
>> >> >     data_i = iter(data)
>> >> >     data_j = iter(data)
>> >> >     data_i.next()  # throw the first value away
>> >> >     for i, j in izip(data_i, data_j):
>> >> >         compare(i, j)
>> >> >
>> >> > You get the idea ;)
>> >> >
>> >> > Be Well
>> >> > Anthony
>> >> >
>> >> > 1. 
https://github.com/PyTables/PyTables/issues/27 >> >> > >> >> > >> >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> >> >> wrote: >> >> > >> >> >> I was hoping someone could help me out here. >> >> >> >> >> >> This is from a post I put up on StackOverflow, >> >> >> >> >> >> I am have a fairly large dataset that I store in HDF5 and access >> using >> >> >> PyTables. One operation I need to do on this dataset are pairwise >> >> >> comparisons between each of the elements. This requires 2 loops, >> one to >> >> >> iterate over each element, and an inner loop to iterate over every >> >> other >> >> >> element. This operation thus looks at N(N-1)/2 comparisons. >> >> >> >> >> >> For fairly small sets I found it to be faster to dump the contents >> >> into a >> >> >> multdimensional numpy array and then do my iteration. I run into >> >> problems >> >> >> with large sets because of memory issues and need to access each >> >> element of >> >> >> the dataset at run time. >> >> >> >> >> >> Putting the elements into an array gives me about 600 comparisons >> per >> >> >> second, while operating on hdf5 data itself gives me about 300 >> >> comparisons >> >> >> per second. >> >> >> >> >> >> Is there a way to speed this process up? 
>> >> >> >> >> >> Example follows (this is not my real code, just an example): >> >> >> >> >> >> *Small Set*: >> >> >> >> >> >> >> >> >> with tb.openFile(h5_file, 'r') as f: >> >> >> data = f.root.data >> >> >> >> >> >> N_elements = len(data) >> >> >> elements = np.empty((N_irises, 1e5)) >> >> >> >> >> >> for ii, d in enumerate(data): >> >> >> elements[ii] = data['element'] >> >> >> >> >> >> D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): >> >> >> for jj in xrange(ii+1, N_elements): >> >> >> D[ii, jj] = compare(elements[ii], elements[jj]) >> >> >> >> >> >> *Large Set*: >> >> >> >> >> >> >> >> >> with tb.openFile(h5_file, 'r') as f: >> >> >> data = f.root.data >> >> >> >> >> >> N_elements = len(data) >> >> >> >> >> >> D = np.empty((N_irises, N_irises)) >> >> >> for ii in xrange(N_elements): >> >> >> for jj in xrange(ii+1, N_elements): >> >> >> D[ii, jj] = compare(data['element'][ii], >> >> data['element'][jj]) >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >> CSS, >> >> >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> current >> >> >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> >> >> MVPs and experts. ON SALE this month only -- learn more at: >> >> >> http://p.sf.net/sfu/learnmore_122712 >> >> >> _______________________________________________ >> >> >> Pytables-users mailing list >> >> >> Pyt...@li... >> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> >> >> >> >> >> > >> >> > >> >> > >> >> >> ------------------------------------------------------------------------------ >> >> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> >> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> current >> >> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> >> > MVPs and experts. 
ON SALE this month only -- learn more at: >> >> > http://p.sf.net/sfu/learnmore_122712 >> >> > _______________________________________________ >> >> > Pytables-users mailing list >> >> > Pyt...@li... >> >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> > >> >> -------------- next part -------------- >> >> An HTML attachment was scrubbed... >> >> >> >> ------------------------------ >> >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current >> >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> >> MVPs and experts. ON SALE this month only -- learn more at: >> >> http://p.sf.net/sfu/learnmore_122712 >> >> >> >> ------------------------------ >> >> >> >> _______________________________________________ >> >> Pytables-users mailing list >> >> Pyt...@li... >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> >> >> >> End of Pytables-users Digest, Vol 80, Issue 3 >> >> ********************************************* >> >> >> > >> > >> > >> > >> ------------------------------------------------------------------------------ >> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current >> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> > MVPs and experts. ON SALE this month only -- learn more at: >> > http://p.sf.net/sfu/learnmore_122712 >> > _______________________________________________ >> > Pytables-users mailing list >> > Pyt...@li... >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> > >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> >> ------------------------------ >> >> Message: 2 >> Date: Thu, 3 Jan 2013 17:30:59 -0600 >> From: Anthony Scopatz <sc...@gm...> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4 >> To: Discussion list for PyTables >> <pyt...@li...> >> Message-ID: >> < >> CAP...@ma...> >> Content-Type: text/plain; charset="iso-8859-1" >> >> Josh is right that you can just edit the code by hand (which works but >> sucks). >> >> However, on Windows -- on the rare occasion when I also have to develop on >> it -- I typically use a distribution that includes a compiler, cython, >> hdf5, and pytables already and then I install my development version from >> github OVER this. I recommend either EPD or Anaconda, though other >> distributions listed here [1] might also work. >> >> Be well >> Anthony >> >> 1. http://numfocus.org/projects-2/software-distributions/ >> >> >> On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote: >> >> > The change was in pure Python code, so you should be able to just paste >> in >> > the changes to your local copy. Start with the table.Column.__iter__ >> > method (lines 3296-3310) here. >> > >> > >> > >> https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py >> > >> > It needs to be modified slightly because it uses some additional >> features >> > that aren't available in the released version (the out=buf_slice >> argument >> > to table.read). The following should work. >> > >> > def __iter__(self): >> > table = self.table >> > itemsize = self.dtype.itemsize >> > nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize >> > max_row = len(self) >> > for start_row in xrange(0, len(self), nrowsinbuf): >> > end_row = min([start_row + nrowsinbuf, max_row]) >> > buf = table.read(start_row, end_row, 1, field=self.pathname) >> > for row in buf: >> > yield row >> > >> > >> > I haven't tested this, but I think it will work. 
>> > >> > Josh >> > >> > >> > >> > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> >> wrote: >> > >> >> I apologize if I'm starting to sound helpless, but I'm forced to work >> on >> >> Windows 7 at work and have never had luck compiling python source >> >> successfully. I have had to rely on precompiled binaries and now its >> >> biting me in the butt. >> >> >> >> Is there any quick fix I can do to improve this iteration using v2.4.0? >> >> >> >> >> >> On Thu, Jan 3, 2013 at 3:17 PM, < >> >> pyt...@li...> wrote: >> >> >> >>> Send Pytables-users mailing list submissions to >> >>> pyt...@li... >> >>> >> >>> To subscribe or unsubscribe via the World Wide Web, visit >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >>> or, via email, send a message with subject or body 'help' to >> >>> pyt...@li... >> >>> >> >>> You can reach the person managing the list at >> >>> pyt...@li... >> >>> >> >>> When replying, please edit your Subject line so it is more specific >> >>> than "Re: Contents of Pytables-users digest..." >> >>> >> >>> >> >>> Today's Topics: >> >>> >> >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed) >> >>> 2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed) >> >>> >> >>> >> >>> ---------------------------------------------------------------------- >> >>> >> >>> Message: 1 >> >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 >> >>> From: David Reed <dav...@gm...> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2 >> >>> To: pyt...@li... >> >>> Message-ID: >> >>> <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha= >> >>> ev...@ma...> >> >>> Content-Type: text/plain; charset="iso-8859-1" >> >>> >> >>> Thanks Anthony, but unless Im missing something I don't think that >> method >> >>> will work since this will only be comparing the ith element with ith+1 >> >>> element. I still need 2 for loops right? 
>> >>> >> >>> Using itertools might speed things up though, I've never used them so >> I >> >>> will give it a shot and let you know how it goes. Looks like I need >> to >> >>> download the latest release before I do that too. Thanks for the >> help. >> >>> >> >>> -Dave >> >>> >> >>> >> >>> >> >>> On Thu, Jan 3, 2013 at 12:12 PM, < >> >>> pyt...@li...> wrote: >> >>> >> >>> > Send Pytables-users mailing list submissions to >> >>> > pyt...@li... >> >>> > >> >>> > To subscribe or unsubscribe via the World Wide Web, visit >> >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >>> > or, via email, send a message with subject or body 'help' to >> >>> > pyt...@li... >> >>> > >> >>> > You can reach the person managing the list at >> >>> > pyt...@li... >> >>> > >> >>> > When replying, please edit your Subject line so it is more specific >> >>> > than "Re: Contents of Pytables-users digest..." >> >>> > >> >>> > >> >>> > Today's Topics: >> >>> > >> >>> > 1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz) >> >>> > >> >>> > >> >>> > >> ---------------------------------------------------------------------- >> >>> > >> >>> > Message: 1 >> >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 >> >>> > From: Anthony Scopatz <sc...@gm...> >> >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using >> PyTables >> >>> > To: Discussion list for PyTables >> >>> > <pyt...@li...> >> >>> > Message-ID: >> >>> > <CAPk-6T5b= >> >>> > 1EG...@ma...> >> >>> > Content-Type: text/plain; charset="iso-8859-1" >> >>> > >> >>> > HI David, >> >>> > >> >>> > Tables and table column iteration have been overhauled fairly >> recently >> >>> [1]. >> >>> > So you might try creating two iterators, offset by one, and then >> >>> doing the >> >>> > comparison. I am hacking this out super quick so please forgive me: >> >>> > >> >>> > from itertools import izip >> >>> > >> >>> > with tb.openFile(...) 
as f: >> >>> > data = f.root.data >> >>> > data_i = iter(data) >> >>> > data_j = iter(data) >> >>> > data_i.next() # throw the first value away >> >>> > for i, j in izip(data_i, data_j): >> >>> > compare(i, j) >> >>> > >> >>> > You get the idea ;) >> >>> > >> >>> > Be Well >> >>> > Anthony >> >>> > >> >>> > 1. https://github.com/PyTables/PyTables/issues/27 >> >>> > >> >>> > >> >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> >> >>> wrote: >> >>> > >> >>> > > I was hoping someone could help me out here. >> >>> > > >> >>> > > This is from a post I put up on StackOverflow, >> >>> > > >> >>> > > I am have a fairly large dataset that I store in HDF5 and access >> >>> using >> >>> > > PyTables. One operation I need to do on this dataset are pairwise >> >>> > > comparisons between each of the elements. This requires 2 loops, >> one >> >>> to >> >>> > > iterate over each element, and an inner loop to iterate over every >> >>> other >> >>> > > element. This operation thus looks at N(N-1)/2 comparisons. >> >>> > > >> >>> > > For fairly small sets I found it to be faster to dump the contents >> >>> into a >> >>> > > multdimensional numpy array and then do my iteration. I run into >> >>> problems >> >>> > > with large sets because of memory issues and need to access each >> >>> element >> >>> > of >> >>> > > the dataset at run time. >> >>> > > >> >>> > > Putting the elements into an array gives me about 600 comparisons >> per >> >>> > > second, while operating on hdf5 data itself gives me about 300 >> >>> > comparisons >> >>> > > per second. >> >>> > > >> >>> > > Is there a way to speed this process up? 
>> >>> > > >> >>> > > Example follows (this is not my real code, just an example): >> >>> > > >> >>> > > *Small Set*: >> >>> > > >> >>> > > >> >>> > > with tb.openFile(h5_file, 'r') as f: >> >>> > > data = f.root.data >> >>> > > >> >>> > > N_elements = len(data) >> >>> > > elements = np.empty((N_irises, 1e5)) >> >>> > > >> >>> > > for ii, d in enumerate(data): >> >>> > > elements[ii] = data['element'] >> >>> > > >> >>> > > D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): >> >>> > > for jj in xrange(ii+1, N_elements): >> >>> > > D[ii, jj] = compare(elements[ii], elements[jj]) >> >>> > > >> >>> > > *Large Set*: >> >>> > > >> >>> > > >> >>> > > with tb.openFile(h5_file, 'r') as f: >> >>> > > data = f.root.data >> >>> > > >> >>> > > N_elements = len(data) >> >>> > > >> >>> > > D = np.empty((N_irises, N_irises)) >> >>> > > for ii in xrange(N_elements): >> >>> > > for jj in xrange(ii+1, N_elements): >> >>> > > D[ii, jj] = compare(data['element'][ii], >> >>> > data['element'][jj]) >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > >> >>> >> ------------------------------------------------------------------------------ >> >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >> CSS, >> >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> >>> current >> >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> >>> > > MVPs and experts. ON SALE this month only -- learn more at: >> >>> > > http://p.sf.net/sfu/learnmore_122712 >> >>> > > _______________________________________________ >> >>> > > Pytables-users mailing list >> >>> > > Pyt...@li... >> >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >>> > > >> >>> > > >> >>> > -------------- next part -------------- >> >>> > An HTML attachment was scrubbed... 
>> >>> > >> >>> > ------------------------------ >> >>> > >> >>> > >> >>> > >> >>> >> ------------------------------------------------------------------------------ >> >>> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >> CSS, >> >>> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> current >> >>> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> >>> > MVPs and experts. ON SALE this month only -- learn more at: >> >>> > http://p.sf.net/sfu/learnmore_122712 >> >>> > >> >>> > ------------------------------ >> >>> > >> >>> > _______________________________________________ >> >>> > Pytables-users mailing list >> >>> > Pyt...@li... >> >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >>> > >> >>> > >> >>> > End of Pytables-users Digest, Vol 80, Issue 2 >> >>> > ********************************************* >> >>> > >> >>> -------------- next part -------------- >> >>> An HTML attachment was scrubbed... >> >>> >> >>> ------------------------------ >> >>> >> >>> Message: 2 >> >>> Date: Thu, 3 Jan 2013 15:17:01 -0500 >> >>> From: David Reed <dav...@gm...> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3 >> >>> To: pyt...@li... >> >>> Message-ID: >> >>> < >> >>> CAM...@ma...> >> >>> Content-Type: text/plain; charset="iso-8859-1" >> >>> >> >>> Thanks a lot for the help so far guys! >> >>> >> >>> Looking at itertools, I found what I believe to be the perfect >> function >> >>> for >> >>> what I need, itertools.combinations. This appears to be a valid >> >>> replacement >> >>> to the method proposed. >> >>> >> >>> There is a small problem that I didn't mention is that my compare >> >>> function >> >>> actually takes as inputs 2 columns from the table. 
Like so: >> >>> >> >>> D = np.empty((N_irises, N_irises)) >> >>> for ii in xrange(N_elements): >> >>> for jj in xrange(ii+1, N_elements): >> >>> D[ii, jj] = compare(data['element1'][ii], >> >>> data['element1'][jj],data['element2'][ii], >> >>> data['element2'][jj]) >> >>> >> >>> Is there an efficient way of using itertools with this structure? >> >>> >> >>> >> >>> On Thu, Jan 3, 2013 at 1:29 PM, < >> >>> pyt...@li...> wrote: >> >>> >> >>> > Send Pytables-users mailing list submissions to >> >>> > pyt...@li... >> >>> > >> >>> > To subscribe or unsubscribe via the World Wide Web, visit >> >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >>> > or, via email, send a message with subject or body 'help' to >> >>> > pyt...@li... >> >>> > >> >>> > You can reach the person managing the list at >> >>> > pyt...@li... >> >>> > >> >>> > When replying, please edit your Subject line so it is more specific >> >>> > than "Re: Contents of Pytables-users digest..." >> >>> > >> >>> > >> >>> > Today's Topics: >> >>> > >> >>> > 1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers) >> >>> > >> >>> > >> >>> > >> ---------------------------------------------------------------------- >> >>> > >> >>> > Message: 1 >> >>> > Date: Thu, 3 Jan 2013 10:29:33 -0800 >> >>> > From: Josh Ayers <jos...@gm...> >> >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using >> PyTables >> >>> > To: Discussion list for PyTables >> >>> > <pyt...@li...> >> >>> > Message-ID: >> >>> > < >> >>> > CAC...@ma...> >> >>> > Content-Type: text/plain; charset="iso-8859-1" >> >>> > >> >>> > David, >> >>> > >> >>> > The change in issue 27 was only for iteration over a tables.Column >> >>> > instance. To use it, tweak Anthony's code as follows. This will >> >>> iterate >> >>> > over the "element" column, as in your original example. >> >>> > >> >>> > Note also that this will only work with the development version of >> >>> PyTables >> >>> > available on github. 
It will be very slow using the released >> v2.4.0. >> >>> > >> >>> > >> >>> > from itertools import izip >> >>> > >> >>> > with tb.openFile(...) as f: >> >>> > data = f.root.data.cols.element >> >>> > data_i = iter(data) >> >>> > data_j = iter(data) >> >>> > data_i.next() # throw the first value away >> >>> > for i, j in izip(data_i, data_j): >> >>> > compare(i, j) >> >>> > >> >>> > >> >>> > Hope that helps, >> >>> > Josh >> >>> > >> >>> > >> >>> > >> >>> > On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> >> >>> wrote: >> >>> > >> >>> > > HI David, >> >>> > > >> >>> > > Tables and table column iteration have been overhauled fairly >> >>> recently >> >>> > > [1]. So you might try creating two iterators, offset by one, and >> >>> then >> >>> > > doing the comparison. I am hacking this out super quick so please >> >>> > forgive >> >>> > > me: >> >>> > > >> >>> > > from itertools import izip >> >>> > > >> >>> > > with tb.openFile(...) as f: >> >>> > > data = f.root.data >> >>> > > data_i = iter(data) >> >>> > > data_j = iter(data) >> >>> > > data_i.next() # throw the first value away >> >>> > > for i, j in izip(data_i, data_j): >> >>> > > compare(i, j) >> >>> > > >> >>> > > You get the idea ;) >> >>> > > >> >>> > > Be Well >> >>> > > Anthony >> >>> > > >> >>> > > 1. https://github.com/PyTables/PyTables/issues/27 >> >>> > > >> >>> > > >> >>> > > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < >> dav...@gm...> >> >>> > wrote: >> >>> > > >> >>> > >> I was hoping someone could help me out here. >> >>> > >> >> >>> > >> This is from a post I put up on StackOverflow, >> >>> > >> >> >>> > >> I am have a fairly large dataset that I store in HDF5 and access >> >>> using >> >>> > >> PyTables. One operation I need to do on this dataset are pairwise >> >>> > >> comparisons between each of the elements. This requires 2 loops, >> >>> one to >> >>> > >> iterate over each element, and an inner loop to iterate over >> every >> >>> other >> >>> > >> element. 
This operation thus looks at N(N-1)/2 comparisons. >> >>> > >> >> >>> > >> For fairly small sets I found it to be faster to dump the >> contents >> >>> into >> >>> > a >> >>> > >> multdimensional numpy array and then do my iteration. I run into >> >>> > problems >> >>> > >> with large sets because of memory issues and need to access each >> >>> > element of >> >>> > >> the dataset at run time. >> >>> > >> >> >>> > >> Putting the elements into an array gives me about 600 comparisons >> >>> per >> >>> > >> second, while operating on hdf5 data itself gives me about 300 >> >>> > comparisons >> >>> > >> per second. >> >>> > >> >> >>> > >> Is there a way to speed this process up? >> >>> > >> >> >>> > >> Example follows (this is not my real code, just an example): >> >>> > >> >> >>> > >> *Small Set*: >> >>> > >> >> >>> > >> >> >>> > >> with tb.openFile(h5_file, 'r') as f: >> >>> > >> data = f.root.data >> >>> > >> >> >>> > >> N_elements = len(data) >> >>> > >> elements = np.empty((N_irises, 1e5)) >> >>> > >> >> >>> > >> for ii, d in enumerate(data): >> >>> > >> elements[ii] = data['element'] >> >>> > >> >> >>> > >> D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): >> >>> > >> for jj in xrange(ii+1, N_elements): >> >>> > >> D[ii, jj] = compare(elements[ii], elements[jj]) >> >>> > >> >> >>> > >> *Large Set*: >> >>> > >> >> >>> > >> >> >>> > >> with tb.openFile(h5_file, 'r') as f: >> >>> > >> data = f.root.data >> >>> > >> >> >>> > >> N_elements = len(data) >> >>> > >> >> >>> > >> D = np.empty((N_irises, N_irises)) >> >>> > >> for ii in xrange(N_elements): >> >>> > >> for jj in xrange(ii+1, N_elements): >> >>> > >> D[ii, jj] = compare(data['element'][ii], >> >>> > data['element'][jj]) >> >>> > >> >> >>> > >> >> >>> > >> >> >>> > >> >> >>> > >> >>> >> ------------------------------------------------------------------------------ >> >>> > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >> >>> CSS, >> >>> > >> MVC, Windows 8 Apps, JavaScript 
and much more. Keep your skills >> >>> current >> >>> > >> with LearnDevNow - 3,200 step-by-step video tutorials by >> Microsoft >> >>> > >> MVPs and experts. ON SALE this month only -- learn more at: >> >>> > >> http://p.sf.net/sfu/learnmore_122712 >> >>> > >> _______________________________________________ >> >>> > >> Pytables-users mailing list >> >>> > >> Pyt...@li... >> >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >>> > >> >> >>> > >> >> >>> > > >> >>> > > >> >>> > > >> >>> > >> >>> >> ------------------------------------------------------------------------------ >> >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >> CSS, >> >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> >>> current >> >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> >>> > > MVPs and experts. ON SALE this month only -- learn more at: >> >>> > > http://p.sf.net/sfu/learnmore_122712 >> >>> > > _______________________________________________ >> >>> > > Pytables-users mailing list >> >>> > > Pyt...@li... >> >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >>> > > >> >>> > > >> >>> > -------------- next part -------------- >> >>> > An HTML attachment was scrubbed... >> >>> > >> >>> > ------------------------------ >> >>> > >> >>> > >> >>> > >> >>> >> ------------------------------------------------------------------------------ >> >>> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >> CSS, >> >>> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> current >> >>> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> >>> > MVPs and experts. ON SALE this month only -- learn more at: >> >>> > http://p.sf.net/sfu/learnmore_122712 >> >>> > >> >>> > ------------------------------ >> >>> > >> >>> > _______________________________________________ >> >>> > Pytables-users mailing list >> >>> > Pyt...@li... 
>> >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >>> > >> >>> > >> >>> > End of Pytables-users Digest, Vol 80, Issue 3 >> >>> > ********************************************* >> >>> > >> >>> -------------- next part -------------- >> >>> An HTML attachment was scrubbed... >> >>> >> >>> ------------------------------ >> >>> >> >>> >> >>> >> ------------------------------------------------------------------------------ >> >>> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> >>> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> current >> >>> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> >>> MVPs and experts. ON SALE this month only -- learn more at: >> >>> http://p.sf.net/sfu/learnmore_122712 >> >>> >> >>> ------------------------------ >> >>> >> >>> _______________________________________________ >> >>> Pytables-users mailing list >> >>> Pyt...@li... >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >>> >> >>> >> >>> End of Pytables-users Digest, Vol 80, Issue 4 >> >>> ********************************************* >> >>> >> >> >> >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current >> >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> >> MVPs and experts. ON SALE this month only -- learn more at: >> >> http://p.sf.net/sfu/learnmore_122712 >> >> _______________________________________________ >> >> Pytables-users mailing list >> >> Pyt...@li... >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> >> >> > >> > >> > >> ------------------------------------------------------------------------------ >> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> > MVC, Windows 8 Apps, JavaScript and much more. 
Keep your skills current >> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> > MVPs and experts. ON SALE this month only -- learn more at: >> > http://p.sf.net/sfu/learnmore_122712 >> > _______________________________________________ >> > Pytables-users mailing list >> > Pyt...@li... >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> > >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> >> ------------------------------ >> >> >> ------------------------------------------------------------------------------ >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> MVPs and experts. ON SALE this month only -- learn more at: >> http://p.sf.net/sfu/learnmore_122712 >> >> ------------------------------ >> >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> End of Pytables-users Digest, Vol 80, Issue 8 >> ********************************************* >> > > > > ------------------------------------------------------------------------------ > Master HTML5, CSS3, ASP.NET, MVC, AJAX, Knockout.js, Web API and > much more. Get web development skills now with LearnDevNow - > 350+ hours of step-by-step video tutorials by Microsoft MVPs and experts. > SALE $99.99 this month only -- learn more at: > http://p.sf.net/sfu/learnmore_122812 > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
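For readers skimming the thread, here is a minimal sketch of the pattern it converges on: read each column in fixed-size buffers, zip the columns together with a row index, and let itertools.combinations drive a single loop over all pairs. It is written for Python 3 (zip and range rather than the izip and xrange used in the messages), plain lists stand in for PyTables columns, and the names iter_buffered, pairwise_matrix and the toy compare are hypothetical, not part of PyTables or of anyone's real code.

```python
from itertools import combinations


def iter_buffered(column, rows_per_buffer):
    """Yield items one at a time, reading `column` in slices of rows_per_buffer."""
    n = len(column)
    for start in range(0, n, rows_per_buffer):
        stop = min(start + rows_per_buffer, n)
        # Assumption: with PyTables, this slice is where a buffered read
        # such as table.read(start, stop, ...) would happen.
        for item in column[start:stop]:
            yield item


def pairwise_matrix(col_a, col_b, compare, rows_per_buffer=2):
    """Fill the upper triangle of an N x N nested list with compare() results."""
    n = len(col_a)
    result = [[None] * n for _ in range(n)]
    rows = zip(iter_buffered(col_a, rows_per_buffer),
               iter_buffered(col_b, rows_per_buffer),
               range(n))
    # One loop over every unordered pair of rows, N(N-1)/2 iterations.
    for (a1, b1, i), (a2, b2, j) in combinations(rows, 2):
        result[i][j] = compare(a1, a2, b1, b2)
    return result


def compare(a1, a2, b1, b2):
    # Hypothetical stand-in for the thread's compare() function.
    return abs(a1 - a2) + abs(b1 - b2)


col_a = [1, 4, 9]
col_b = [10, 20, 40]
matrix = pairwise_matrix(col_a, col_b, compare)
```

One caveat worth checking against the itertools documentation: combinations appears to copy its whole input into a tuple before yielding the first pair, so every row is held in memory at once. For a dataset that does not fit in RAM, the nested-loop form over two buffered iterators may still be needed.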
From: David R. <dav...@gm...> - 2013-01-04 13:56:58
|
I can't thank you guys enough for the help. I was able to add the __iter__ function to the table.py file and everything seems to be working great! I'm not quite as fast as I was with iterating right of a matrix but pretty close. I was at 555 comparisons per second, and now im at 420. I handled the problem I mentioned earlier by doing this, and it seems to work great: A = f.root.data.cols.A B = f.root.data.cols.B D = np.empty((len(A), len(A)) for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2): D[ii, jj] = compare(a1, a2, b1, b2) Again, thanks a lot. -Dave On Thu, Jan 3, 2013 at 6:31 PM, < pyt...@li...> wrote: > Send Pytables-users mailing list submissions to > pyt...@li... > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.sourceforge.net/lists/listinfo/pytables-users > or, via email, send a message with subject or body 'help' to > pyt...@li... > > You can reach the person managing the list at > pyt...@li... > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Pytables-users digest..." > > > Today's Topics: > > 1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz) > 2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 3 Jan 2013 17:26:55 -0600 > From: Anthony Scopatz <sc...@gm...> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3 > To: Discussion list for PyTables > <pyt...@li...> > Message-ID: > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ= > Gz...@ma...> > Content-Type: text/plain; charset="iso-8859-1" > > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote: > > > Thanks a lot for the help so far guys! > > > > Looking at itertools, I found what I believe to be the perfect function > > for what I need, itertools.combinations. This appears to be a valid > > replacement to the method proposed. 
> >
>
> Yes, combinations is awesome!
>
> >
> > There is a small problem that I didn't mention is that my compare function
> > actually takes as inputs 2 columns from the table. Like so:
> >
> >     D = np.empty((N_irises, N_irises))
> >     for ii in xrange(N_elements):
> >         for jj in xrange(ii+1, N_elements):
> >             D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> >                                 data['element2'][ii], data['element2'][jj])
> >
> > Is there an efficient way of using itertools with this structure?
> >
>
> You can always make two other iterators for each column. Since you have
> two columns you would have 4 iterators. I am not sure how fast this is
> going to be but I am confident that there is definitely a way to do this in
> one for-loop, which is going to be way faster than nested loops.
>
> Be Well
> Anthony
>
> > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> >
> >> Message: 1
> >> Date: Thu, 3 Jan 2013 10:29:33 -0800
> >> From: Josh Ayers <jos...@gm...>
> >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >>
> >> David,
> >>
> >> The change in issue 27 was only for iteration over a tables.Column
> >> instance. To use it, tweak Anthony's code as follows. This will iterate
> >> over the "element" column, as in your original example.
> >>
> >> Note also that this will only work with the development version of
> >> PyTables available on github. It will be very slow using the released v2.4.0.
> >>
> >>     from itertools import izip
> >>
> >>     with tb.openFile(...) as f:
> >>         data = f.root.data.cols.element
> >>         data_i = iter(data)
> >>         data_j = iter(data)
> >>         data_i.next()  # throw the first value away
> >>         for i, j in izip(data_i, data_j):
> >>             compare(i, j)
> >>
> >> Hope that helps,
> >> Josh
|
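For anyone landing on this thread later: the two ideas discussed above (reading a column in buffer-sized chunks, and walking every pair once with itertools.combinations) can be sketched without PyTables at all. This is only a rough illustration under stated assumptions, not the code anyone in the thread ran. It is written for Python 3, so `zip` stands in for `izip`, and the names `iter_chunks`, `pairwise_matrix` and the `read` callback are hypothetical stand-ins (`read(start, stop)` plays the role of a buffered read such as the `table.read(...)` call in Josh's `__iter__`).

```python
from itertools import combinations


def iter_chunks(read, nrows, nrowsinbuf):
    """Yield rows one at a time while fetching them nrowsinbuf at a time.

    `read(start, stop)` is a hypothetical callback returning the rows
    in [start, stop); it stands in for a buffered table read.
    """
    for start in range(0, nrows, nrowsinbuf):
        stop = min(start + nrowsinbuf, nrows)
        for row in read(start, stop):
            yield row


def pairwise_matrix(col_a, col_b, compare):
    """Fill the upper triangle of an N x N list-of-lists with compare().

    Each item handed to combinations() is (a, b, row index), so every
    unordered pair of rows is visited exactly once: N(N-1)/2 calls.
    """
    n = len(col_a)
    dist = [[0.0] * n for _ in range(n)]
    rows = zip(col_a, col_b, range(n))
    for (a1, b1, ii), (a2, b2, jj) in combinations(rows, 2):
        dist[ii][jj] = compare(a1, a2, b1, b2)
    return dist


if __name__ == "__main__":
    a = [1, 2, 3]
    b = [10, 20, 30]
    # toy compare(): sum of the differences in both columns
    d = pairwise_matrix(a, b, lambda a1, a2, b1, b2: (a2 - a1) + (b2 - b1))
    print(d)
    # chunked iteration over the same list, two rows per "buffer"
    print(list(iter_chunks(lambda s, e: a[s:e], len(a), 2)))
```

One thing worth keeping in mind: combinations() copies its input into a tuple before producing any pairs, so this pattern ends up holding every row in memory at once. That is probably fine for moderately sized columns, but for a set that does not fit in RAM a nested loop over chunked reads may still be needed.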
From: Anthony S. <sc...@gm...> - 2013-01-03 23:31:29
|
Josh is right that you can just edit the code by hand (which works but sucks).

However, on Windows -- on the rare occasion when I also have to develop on it -- I typically use a distribution that includes a compiler, cython, hdf5, and pytables already and then I install my development version from github OVER this. I recommend either EPD or Anaconda, though other distributions listed here [1] might also work.

Be well
Anthony

1. http://numfocus.org/projects-2/software-distributions/

On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:

> The change was in pure Python code, so you should be able to just paste in
> the changes to your local copy. Start with the table.Column.__iter__
> method (lines 3296-3310) here.
>
> https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
>
> It needs to be modified slightly because it uses some additional features
> that aren't available in the released version (the out=buf_slice argument
> to table.read). The following should work.
>
>     def __iter__(self):
>         table = self.table
>         itemsize = self.dtype.itemsize
>         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>         max_row = len(self)
>         for start_row in xrange(0, len(self), nrowsinbuf):
>             end_row = min([start_row + nrowsinbuf, max_row])
>             buf = table.read(start_row, end_row, 1, field=self.pathname)
>             for row in buf:
>                 yield row
>
> I haven't tested this, but I think it will work.
>
> Josh
>
> On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
>
>> I apologize if I'm starting to sound helpless, but I'm forced to work on
>> Windows 7 at work and have never had luck compiling python source
>> successfully. I have had to rely on precompiled binaries and now its
>> biting me in the butt.
>>
>> Is there any quick fix I can do to improve this iteration using v2.4.0?
>>
>> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
>>
>>> Send Pytables-users mailing list submissions to
>>> pyt...@li...
>>> >>> To subscribe or unsubscribe via the World Wide Web, visit >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >>> or, via email, send a message with subject or body 'help' to >>> pyt...@li... >>> >>> You can reach the person managing the list at >>> pyt...@li... >>> >>> When replying, please edit your Subject line so it is more specific >>> than "Re: Contents of Pytables-users digest..." >>> >>> >>> Today's Topics: >>> >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed) >>> 2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed) >>> >>> >>> ---------------------------------------------------------------------- >>> >>> Message: 1 >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 >>> From: David Reed <dav...@gm...> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2 >>> To: pyt...@li... >>> Message-ID: >>> <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha= >>> ev...@ma...> >>> Content-Type: text/plain; charset="iso-8859-1" >>> >>> Thanks Anthony, but unless Im missing something I don't think that method >>> will work since this will only be comparing the ith element with ith+1 >>> element. I still need 2 for loops right? >>> >>> Using itertools might speed things up though, I've never used them so I >>> will give it a shot and let you know how it goes. Looks like I need to >>> download the latest release before I do that too. Thanks for the help. >>> >>> -Dave >>> >>> >>> >>> On Thu, Jan 3, 2013 at 12:12 PM, < >>> pyt...@li...> wrote: >>> >>> > Send Pytables-users mailing list submissions to >>> > pyt...@li... >>> > >>> > To subscribe or unsubscribe via the World Wide Web, visit >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >>> > or, via email, send a message with subject or body 'help' to >>> > pyt...@li... >>> > >>> > You can reach the person managing the list at >>> > pyt...@li... 
>>> > >>> > When replying, please edit your Subject line so it is more specific >>> > than "Re: Contents of Pytables-users digest..." >>> > >>> > >>> > Today's Topics: >>> > >>> > 1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz) >>> > >>> > >>> > ---------------------------------------------------------------------- >>> > >>> > Message: 1 >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 >>> > From: Anthony Scopatz <sc...@gm...> >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables >>> > To: Discussion list for PyTables >>> > <pyt...@li...> >>> > Message-ID: >>> > <CAPk-6T5b= >>> > 1EG...@ma...> >>> > Content-Type: text/plain; charset="iso-8859-1" >>> > >>> > HI David, >>> > >>> > Tables and table column iteration have been overhauled fairly recently >>> [1]. >>> > So you might try creating two iterators, offset by one, and then >>> doing the >>> > comparison. I am hacking this out super quick so please forgive me: >>> > >>> > from itertools import izip >>> > >>> > with tb.openFile(...) as f: >>> > data = f.root.data >>> > data_i = iter(data) >>> > data_j = iter(data) >>> > data_i.next() # throw the first value away >>> > for i, j in izip(data_i, data_j): >>> > compare(i, j) >>> > >>> > You get the idea ;) >>> > >>> > Be Well >>> > Anthony >>> > >>> > 1. https://github.com/PyTables/PyTables/issues/27 >>> > >>> > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> >>> wrote: >>> > >>> > > I was hoping someone could help me out here. >>> > > >>> > > This is from a post I put up on StackOverflow, >>> > > >>> > > I am have a fairly large dataset that I store in HDF5 and access >>> using >>> > > PyTables. One operation I need to do on this dataset are pairwise >>> > > comparisons between each of the elements. This requires 2 loops, one >>> to >>> > > iterate over each element, and an inner loop to iterate over every >>> other >>> > > element. This operation thus looks at N(N-1)/2 comparisons. 
>>> > > >>> > > For fairly small sets I found it to be faster to dump the contents >>> into a >>> > > multdimensional numpy array and then do my iteration. I run into >>> problems >>> > > with large sets because of memory issues and need to access each >>> element >>> > of >>> > > the dataset at run time. >>> > > >>> > > Putting the elements into an array gives me about 600 comparisons per >>> > > second, while operating on hdf5 data itself gives me about 300 >>> > comparisons >>> > > per second. >>> > > >>> > > Is there a way to speed this process up? >>> > > >>> > > Example follows (this is not my real code, just an example): >>> > > >>> > > *Small Set*: >>> > > >>> > > >>> > > with tb.openFile(h5_file, 'r') as f: >>> > > data = f.root.data >>> > > >>> > > N_elements = len(data) >>> > > elements = np.empty((N_irises, 1e5)) >>> > > >>> > > for ii, d in enumerate(data): >>> > > elements[ii] = data['element'] >>> > > >>> > > D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): >>> > > for jj in xrange(ii+1, N_elements): >>> > > D[ii, jj] = compare(elements[ii], elements[jj]) >>> > > >>> > > *Large Set*: >>> > > >>> > > >>> > > with tb.openFile(h5_file, 'r') as f: >>> > > data = f.root.data >>> > > >>> > > N_elements = len(data) >>> > > >>> > > D = np.empty((N_irises, N_irises)) >>> > > for ii in xrange(N_elements): >>> > > for jj in xrange(ii+1, N_elements): >>> > > D[ii, jj] = compare(data['element'][ii], >>> > data['element'][jj]) >>> > > >>> > > >>> > > >>> > > >>> > >>> ------------------------------------------------------------------------------ >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >>> current >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>> > > MVPs and experts. 
ON SALE this month only -- learn more at: >>> > > http://p.sf.net/sfu/learnmore_122712 >>> > > _______________________________________________ >>> > > Pytables-users mailing list >>> > > Pyt...@li... >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users >>> > > >>> > > >>> > -------------- next part -------------- >>> > An HTML attachment was scrubbed... >>> > >>> > ------------------------------ >>> > >>> > >>> > >>> ------------------------------------------------------------------------------ >>> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >>> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current >>> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>> > MVPs and experts. ON SALE this month only -- learn more at: >>> > http://p.sf.net/sfu/learnmore_122712 >>> > >>> > ------------------------------ >>> > >>> > _______________________________________________ >>> > Pytables-users mailing list >>> > Pyt...@li... >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >>> > >>> > >>> > End of Pytables-users Digest, Vol 80, Issue 2 >>> > ********************************************* >>> > >>> -------------- next part -------------- >>> An HTML attachment was scrubbed... >>> >>> ------------------------------ >>> >>> Message: 2 >>> Date: Thu, 3 Jan 2013 15:17:01 -0500 >>> From: David Reed <dav...@gm...> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3 >>> To: pyt...@li... >>> Message-ID: >>> < >>> CAM...@ma...> >>> Content-Type: text/plain; charset="iso-8859-1" >>> >>> Thanks a lot for the help so far guys! >>> >>> Looking at itertools, I found what I believe to be the perfect function >>> for >>> what I need, itertools.combinations. This appears to be a valid >>> replacement >>> to the method proposed. >>> >>> There is a small problem that I didn't mention is that my compare >>> function >>> actually takes as inputs 2 columns from the table. 
Like so: >>> >>> D = np.empty((N_irises, N_irises)) >>> for ii in xrange(N_elements): >>> for jj in xrange(ii+1, N_elements): >>> D[ii, jj] = compare(data['element1'][ii], >>> data['element1'][jj],data['element2'][ii], >>> data['element2'][jj]) >>> >>> Is there an efficient way of using itertools with this structure? >>> >>> >>> On Thu, Jan 3, 2013 at 1:29 PM, < >>> pyt...@li...> wrote: >>> >>> > Send Pytables-users mailing list submissions to >>> > pyt...@li... >>> > >>> > To subscribe or unsubscribe via the World Wide Web, visit >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >>> > or, via email, send a message with subject or body 'help' to >>> > pyt...@li... >>> > >>> > You can reach the person managing the list at >>> > pyt...@li... >>> > >>> > When replying, please edit your Subject line so it is more specific >>> > than "Re: Contents of Pytables-users digest..." >>> > >>> > >>> > Today's Topics: >>> > >>> > 1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers) >>> > >>> > >>> > ---------------------------------------------------------------------- >>> > >>> > Message: 1 >>> > Date: Thu, 3 Jan 2013 10:29:33 -0800 >>> > From: Josh Ayers <jos...@gm...> >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables >>> > To: Discussion list for PyTables >>> > <pyt...@li...> >>> > Message-ID: >>> > < >>> > CAC...@ma...> >>> > Content-Type: text/plain; charset="iso-8859-1" >>> > >>> > David, >>> > >>> > The change in issue 27 was only for iteration over a tables.Column >>> > instance. To use it, tweak Anthony's code as follows. This will >>> iterate >>> > over the "element" column, as in your original example. >>> > >>> > Note also that this will only work with the development version of >>> PyTables >>> > available on github. It will be very slow using the released v2.4.0. >>> > >>> > >>> > from itertools import izip >>> > >>> > with tb.openFile(...) 
as f: >>> > data = f.root.data.cols.element >>> > data_i = iter(data) >>> > data_j = iter(data) >>> > data_i.next() # throw the first value away >>> > for i, j in izip(data_i, data_j): >>> > compare(i, j) >>> > >>> > >>> > Hope that helps, >>> > Josh >>> > >>> > >>> > >>> > On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> >>> wrote: >>> > >>> > > HI David, >>> > > >>> > > Tables and table column iteration have been overhauled fairly >>> recently >>> > > [1]. So you might try creating two iterators, offset by one, and >>> then >>> > > doing the comparison. I am hacking this out super quick so please >>> > forgive >>> > > me: >>> > > >>> > > from itertools import izip >>> > > >>> > > with tb.openFile(...) as f: >>> > > data = f.root.data >>> > > data_i = iter(data) >>> > > data_j = iter(data) >>> > > data_i.next() # throw the first value away >>> > > for i, j in izip(data_i, data_j): >>> > > compare(i, j) >>> > > >>> > > You get the idea ;) >>> > > >>> > > Be Well >>> > > Anthony >>> > > >>> > > 1. https://github.com/PyTables/PyTables/issues/27 >>> > > >>> > > >>> > > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> >>> > wrote: >>> > > >>> > >> I was hoping someone could help me out here. >>> > >> >>> > >> This is from a post I put up on StackOverflow, >>> > >> >>> > >> I am have a fairly large dataset that I store in HDF5 and access >>> using >>> > >> PyTables. One operation I need to do on this dataset are pairwise >>> > >> comparisons between each of the elements. This requires 2 loops, >>> one to >>> > >> iterate over each element, and an inner loop to iterate over every >>> other >>> > >> element. This operation thus looks at N(N-1)/2 comparisons. >>> > >> >>> > >> For fairly small sets I found it to be faster to dump the contents >>> into >>> > a >>> > >> multdimensional numpy array and then do my iteration. 
I run into >>> > problems >>> > >> with large sets because of memory issues and need to access each >>> > element of >>> > >> the dataset at run time. >>> > >> >>> > >> Putting the elements into an array gives me about 600 comparisons >>> per >>> > >> second, while operating on hdf5 data itself gives me about 300 >>> > comparisons >>> > >> per second. >>> > >> >>> > >> Is there a way to speed this process up? >>> > >> >>> > >> Example follows (this is not my real code, just an example): >>> > >> >>> > >> *Small Set*: >>> > >> >>> > >> >>> > >> with tb.openFile(h5_file, 'r') as f: >>> > >> data = f.root.data >>> > >> >>> > >> N_elements = len(data) >>> > >> elements = np.empty((N_irises, 1e5)) >>> > >> >>> > >> for ii, d in enumerate(data): >>> > >> elements[ii] = data['element'] >>> > >> >>> > >> D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): >>> > >> for jj in xrange(ii+1, N_elements): >>> > >> D[ii, jj] = compare(elements[ii], elements[jj]) >>> > >> >>> > >> *Large Set*: >>> > >> >>> > >> >>> > >> with tb.openFile(h5_file, 'r') as f: >>> > >> data = f.root.data >>> > >> >>> > >> N_elements = len(data) >>> > >> >>> > >> D = np.empty((N_irises, N_irises)) >>> > >> for ii in xrange(N_elements): >>> > >> for jj in xrange(ii+1, N_elements): >>> > >> D[ii, jj] = compare(data['element'][ii], >>> > data['element'][jj]) >>> > >> >>> > >> >>> > >> >>> > >> >>> > >>> ------------------------------------------------------------------------------ >>> > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, >>> CSS, >>> > >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >>> current >>> > >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>> > >> MVPs and experts. ON SALE this month only -- learn more at: >>> > >> http://p.sf.net/sfu/learnmore_122712 >>> > >> _______________________________________________ >>> > >> Pytables-users mailing list >>> > >> Pyt...@li... 
>>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >>> > >> >>> > >> >>> > > >>> > > >>> > > >>> > >>> ------------------------------------------------------------------------------ >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >>> current >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>> > > MVPs and experts. ON SALE this month only -- learn more at: >>> > > http://p.sf.net/sfu/learnmore_122712 >>> > > _______________________________________________ >>> > > Pytables-users mailing list >>> > > Pyt...@li... >>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users >>> > > >>> > > >>> > -------------- next part -------------- >>> > An HTML attachment was scrubbed... >>> > >>> > ------------------------------ >>> > >>> > >>> > >>> ------------------------------------------------------------------------------ >>> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >>> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current >>> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>> > MVPs and experts. ON SALE this month only -- learn more at: >>> > http://p.sf.net/sfu/learnmore_122712 >>> > >>> > ------------------------------ >>> > >>> > _______________________________________________ >>> > Pytables-users mailing list >>> > Pyt...@li... >>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >>> > >>> > >>> > End of Pytables-users Digest, Vol 80, Issue 3 >>> > ********************************************* >>> > >>> -------------- next part -------------- >>> An HTML attachment was scrubbed... >>> >>> ------------------------------ >>> >>> >>> ------------------------------------------------------------------------------ >>> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >>> MVC, Windows 8 Apps, JavaScript and much more. 
Keep your skills current >>> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>> MVPs and experts. ON SALE this month only -- learn more at: >>> http://p.sf.net/sfu/learnmore_122712 >>> >>> ------------------------------ >>> >>> _______________________________________________ >>> Pytables-users mailing list >>> Pyt...@li... >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >>> >>> >>> End of Pytables-users Digest, Vol 80, Issue 4 >>> ********************************************* >>> >> >> >> >> ------------------------------------------------------------------------------ >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> MVPs and experts. ON SALE this month only -- learn more at: >> http://p.sf.net/sfu/learnmore_122712 >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > > > ------------------------------------------------------------------------------ > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > MVPs and experts. ON SALE this month only -- learn more at: > http://p.sf.net/sfu/learnmore_122712 > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
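The buffering idea in the __iter__ patch quoted above can be tried without PyTables at all. What follows is only a minimal sketch, assuming Python 3 (range rather than xrange) and a hypothetical read_rows(start, stop) callable standing in for table.read(...); the names iter_in_chunks, buffer_size and itemsize are invented for illustration and are not part of PyTables.

```python
def iter_in_chunks(read_rows, n_rows, buffer_size, itemsize):
    """Yield rows one at a time while reading them in buffer-sized blocks.

    read_rows(start, stop) is a hypothetical stand-in for a call such as
    table.read(start, stop, 1, field=...); it must return a sequence of rows.
    """
    # Number of rows that fit in one buffer. Never let it drop to zero,
    # otherwise range() would be given a step of 0 and raise ValueError.
    rows_per_chunk = max(buffer_size // itemsize, 1)
    for start in range(0, n_rows, rows_per_chunk):
        stop = min(start + rows_per_chunk, n_rows)
        for row in read_rows(start, stop):
            yield row


# Stand-in data: 10 "rows" of 4 bytes each, read through a 12-byte buffer.
rows = list(range(10))
reads = []  # records which (start, stop) blocks were requested


def read_rows(start, stop):
    reads.append((start, stop))
    return rows[start:stop]


result = list(iter_in_chunks(read_rows, len(rows), buffer_size=12, itemsize=4))
```

The max(..., 1) guard is an addition that is not in the quoted patch; it only matters when a single row is larger than the buffer.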
From: Anthony S. <sc...@gm...> - 2013-01-03 23:27:23
|
On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:

> Thanks a lot for the help so far guys!
>
> Looking at itertools, I found what I believe to be the perfect function
> for what I need, itertools.combinations. This appears to be a valid
> replacement to the method proposed.

Yes, combinations is awesome!

> There is a small problem that I didn't mention, which is that my compare
> function actually takes as inputs 2 columns from the table. Like so:
>
> D = np.empty((N_elements, N_elements))
> for ii in xrange(N_elements):
>     for jj in xrange(ii+1, N_elements):
>         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>                             data['element2'][ii], data['element2'][jj])
>
> Is there an efficient way of using itertools with this structure?

You can always make two other iterators for each column. Since you have two
columns you would have 4 iterators. I am not sure how fast this is going to
be, but I am confident that there is a way to do this in one for-loop,
which is going to be way faster than nested loops.

Be Well
Anthony
|
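One possible reading of the single-loop suggestion above is to zip the two columns together and hand the result to itertools.combinations, so that each pair carries both column values. This is a minimal sketch, assuming Python 3 and plain lists standing in for the element1 and element2 columns; the compare function here is a made-up placeholder, not David's real one.

```python
from itertools import combinations


def compare(a1, b1, a2, b2):
    """Placeholder comparison; stands in for the real four-argument compare."""
    return abs(a1 - b1) + abs(a2 - b2)


# Plain lists standing in for two table columns of equal length.
element1 = [1.0, 4.0, 9.0]
element2 = [2.0, 2.0, 5.0]

# Pair each row index with its two column values, then take every
# unordered pair of rows: N(N-1)/2 combinations in a single loop.
rows = enumerate(zip(element1, element2))
D = {}
for (ii, (a1, a2)), (jj, (b1, b2)) in combinations(rows, 2):
    D[ii, jj] = compare(a1, b1, a2, b2)
```

Note that combinations() buffers its whole input before yielding the first pair, so on its own this does not get around the memory limits described for large sets.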
From: Anthony S. <sc...@gm...> - 2013-01-03 23:15:08
|
Yup, that is right, thanks Josh!

On Thu, Jan 3, 2013 at 12:29 PM, Josh Ayers <jos...@gm...> wrote:

> David,
>
> The change in issue 27 was only for iteration over a tables.Column
> instance. To use it, tweak Anthony's code as follows. This will iterate
> over the "element" column, as in your original example.
>
> Note also that this will only work with the development version of
> PyTables available on github. It will be very slow using the released
> v2.4.0.
>
> from itertools import izip
>
> with tb.openFile(...) as f:
>     data = f.root.data.cols.element
>     data_i = iter(data)
>     data_j = iter(data)
>     data_i.next()  # throw the first value away
>     for i, j in izip(data_i, data_j):
>         compare(i, j)
>
> Hope that helps,
> Josh
>
> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
>
>> Hi David,
>>
>> Tables and table column iteration have been overhauled fairly recently
>> [1]. So you might try creating two iterators, offset by one, and then
>> doing the comparison. I am hacking this out super quick so please forgive
>> me:
>>
>> from itertools import izip
>>
>> with tb.openFile(...) as f:
>>     data = f.root.data
>>     data_i = iter(data)
>>     data_j = iter(data)
>>     data_i.next()  # throw the first value away
>>     for i, j in izip(data_i, data_j):
>>         compare(i, j)
>>
>> You get the idea ;)
>>
>> Be Well
>> Anthony
>>
>> 1. https://github.com/PyTables/PyTables/issues/27
>>
>> On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>>
>>> I was hoping someone could help me out here.
>>>
>>> This is from a post I put up on StackOverflow,
>>>
>>> I have a fairly large dataset that I store in HDF5 and access using
>>> PyTables. One operation I need to do on this dataset is pairwise
>>> comparisons between each of the elements. This requires 2 loops, one to
>>> iterate over each element, and an inner loop to iterate over every other
>>> element. This operation thus looks at N(N-1)/2 comparisons.
>>>
>>> For fairly small sets I found it to be faster to dump the contents into
>>> a multidimensional numpy array and then do my iteration. I run into
>>> problems with large sets because of memory issues and need to access each
>>> element of the dataset at run time.
>>>
>>> Putting the elements into an array gives me about 600 comparisons per
>>> second, while operating on hdf5 data itself gives me about 300
>>> comparisons per second.
>>>
>>> Is there a way to speed this process up?
>>>
>>> Example follows (this is not my real code, just an example):
>>>
>>> *Small Set*:
>>>
>>> with tb.openFile(h5_file, 'r') as f:
>>>     data = f.root.data
>>>
>>>     N_elements = len(data)
>>>     elements = np.empty((N_elements, int(1e5)))
>>>
>>>     for ii, d in enumerate(data):
>>>         elements[ii] = d['element']
>>>
>>> D = np.empty((N_elements, N_elements))
>>> for ii in xrange(N_elements):
>>>     for jj in xrange(ii+1, N_elements):
>>>         D[ii, jj] = compare(elements[ii], elements[jj])
>>>
>>> *Large Set*:
>>>
>>> with tb.openFile(h5_file, 'r') as f:
>>>     data = f.root.data
>>>
>>>     N_elements = len(data)
>>>
>>>     D = np.empty((N_elements, N_elements))
>>>     for ii in xrange(N_elements):
>>>         for jj in xrange(ii+1, N_elements):
>>>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
|
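To make the difference between the two approaches in this exchange concrete: zipping two iterators offset by one visits only neighbouring elements (N-1 pairs), whereas the nested loops in the original question visit all N(N-1)/2 pairs. A small sketch, assuming Python 3 (the built-in zip and next() in place of izip and .next()) and a plain list standing in for the table column:

```python
from itertools import combinations

data = ['a', 'b', 'c', 'd']  # stand-in for a table column

# Two iterators over the same data, one advanced by a single element.
data_i = iter(data)
data_j = iter(data)
next(data_i)  # throw the first value away
adjacent_pairs = list(zip(data_i, data_j))

# Every unordered pair, equivalent to the nested ii/jj loops.
all_pairs = list(combinations(data, 2))
```

With four elements the offset iterators give three pairs and combinations gives six, so the offset trick alone is not a replacement for the full pairwise comparison.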
From: Josh A. <jos...@gm...> - 2013-01-03 21:46:22
|
The change was in pure Python code, so you should be able to just paste in
the changes to your local copy. Start with the table.Column.__iter__ method
(lines 3296-3310) here.

https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py

It needs to be modified slightly because it uses some additional features
that aren't available in the released version (the out=buf_slice argument
to table.read). The following should work.

def __iter__(self):
    table = self.table
    itemsize = self.dtype.itemsize
    nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
    max_row = len(self)
    for start_row in xrange(0, len(self), nrowsinbuf):
        end_row = min([start_row + nrowsinbuf, max_row])
        buf = table.read(start_row, end_row, 1, field=self.pathname)
        for row in buf:
            yield row

I haven't tested this, but I think it will work.

Josh

On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:

> I apologize if I'm starting to sound helpless, but I'm forced to work on
> Windows 7 at work and have never had luck compiling python source
> successfully. I have had to rely on precompiled binaries and now it's
> biting me in the butt.
>
> Is there any quick fix I can do to improve this iteration using v2.4.0?
>
> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
>
>> Today's Topics:
>>
>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
>> 2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
>>
>> Message: 1
>> Date: Thu, 3 Jan 2013 13:44:29 -0500
>> From: David Reed <dav...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>>
>> Thanks Anthony, but unless I'm missing something I don't think that method
>> will work since this will only be comparing the ith element with ith+1
>> element. I still need 2 for loops right?
>>
>> Using itertools might speed things up though, I've never used them so I
>> will give it a shot and let you know how it goes. Looks like I need to
>> download the latest release before I do that too. Thanks for the help.
>>
>> -Dave
>>
>> Message: 2
>> Date: Thu, 3 Jan 2013 15:17:01 -0500
>> From: David Reed <dav...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>>
>> Thanks a lot for the help so far guys!
>>
>> Looking at itertools, I found what I believe to be the perfect function
>> for what I need, itertools.combinations. This appears to be a valid
>> replacement to the method proposed.
>>
>> There is a small problem that I didn't mention, which is that my compare
>> function actually takes as inputs 2 columns from the table.
Like so: >> >> D = np.empty((N_irises, N_irises)) >> for ii in xrange(N_elements): >> for jj in xrange(ii+1, N_elements): >> D[ii, jj] = compare(data['element1'][ii], >> data['element1'][jj],data['element2'][ii], >> data['element2'][jj]) >> >> Is there an efficient way of using itertools with this structure? >> >> >> On Thu, Jan 3, 2013 at 1:29 PM, < >> pyt...@li...> wrote: >> >> > Send Pytables-users mailing list submissions to >> > pyt...@li... >> > >> > To subscribe or unsubscribe via the World Wide Web, visit >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > or, via email, send a message with subject or body 'help' to >> > pyt...@li... >> > >> > You can reach the person managing the list at >> > pyt...@li... >> > >> > When replying, please edit your Subject line so it is more specific >> > than "Re: Contents of Pytables-users digest..." >> > >> > >> > Today's Topics: >> > >> > 1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers) >> > >> > >> > ---------------------------------------------------------------------- >> > >> > Message: 1 >> > Date: Thu, 3 Jan 2013 10:29:33 -0800 >> > From: Josh Ayers <jos...@gm...> >> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables >> > To: Discussion list for PyTables >> > <pyt...@li...> >> > Message-ID: >> > < >> > CAC...@ma...> >> > Content-Type: text/plain; charset="iso-8859-1" >> > >> > David, >> > >> > The change in issue 27 was only for iteration over a tables.Column >> > instance. To use it, tweak Anthony's code as follows. This will >> iterate >> > over the "element" column, as in your original example. >> > >> > Note also that this will only work with the development version of >> PyTables >> > available on github. It will be very slow using the released v2.4.0. >> > >> > >> > from itertools import izip >> > >> > with tb.openFile(...) 
as f: >> > data = f.root.data.cols.element >> > data_i = iter(data) >> > data_j = iter(data) >> > data_i.next() # throw the first value away >> > for i, j in izip(data_i, data_j): >> > compare(i, j) >> > >> > >> > Hope that helps, >> > Josh >> > >> > >> > >> > On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> >> wrote: >> > >> > > HI David, >> > > >> > > Tables and table column iteration have been overhauled fairly recently >> > > [1]. So you might try creating two iterators, offset by one, and then >> > > doing the comparison. I am hacking this out super quick so please >> > forgive >> > > me: >> > > >> > > from itertools import izip >> > > >> > > with tb.openFile(...) as f: >> > > data = f.root.data >> > > data_i = iter(data) >> > > data_j = iter(data) >> > > data_i.next() # throw the first value away >> > > for i, j in izip(data_i, data_j): >> > > compare(i, j) >> > > >> > > You get the idea ;) >> > > >> > > Be Well >> > > Anthony >> > > >> > > 1. https://github.com/PyTables/PyTables/issues/27 >> > > >> > > >> > > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> >> > wrote: >> > > >> > >> I was hoping someone could help me out here. >> > >> >> > >> This is from a post I put up on StackOverflow, >> > >> >> > >> I am have a fairly large dataset that I store in HDF5 and access >> using >> > >> PyTables. One operation I need to do on this dataset are pairwise >> > >> comparisons between each of the elements. This requires 2 loops, one >> to >> > >> iterate over each element, and an inner loop to iterate over every >> other >> > >> element. This operation thus looks at N(N-1)/2 comparisons. >> > >> >> > >> For fairly small sets I found it to be faster to dump the contents >> into >> > a >> > >> multdimensional numpy array and then do my iteration. I run into >> > problems >> > >> with large sets because of memory issues and need to access each >> > element of >> > >> the dataset at run time. 
>> > >> >> > >> Putting the elements into an array gives me about 600 comparisons per >> > >> second, while operating on hdf5 data itself gives me about 300 >> > comparisons >> > >> per second. >> > >> >> > >> Is there a way to speed this process up? >> > >> >> > >> Example follows (this is not my real code, just an example): >> > >> >> > >> *Small Set*: >> > >> >> > >> >> > >> with tb.openFile(h5_file, 'r') as f: >> > >> data = f.root.data >> > >> >> > >> N_elements = len(data) >> > >> elements = np.empty((N_irises, 1e5)) >> > >> >> > >> for ii, d in enumerate(data): >> > >> elements[ii] = data['element'] >> > >> >> > >> D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): >> > >> for jj in xrange(ii+1, N_elements): >> > >> D[ii, jj] = compare(elements[ii], elements[jj]) >> > >> >> > >> *Large Set*: >> > >> >> > >> >> > >> with tb.openFile(h5_file, 'r') as f: >> > >> data = f.root.data >> > >> >> > >> N_elements = len(data) >> > >> >> > >> D = np.empty((N_irises, N_irises)) >> > >> for ii in xrange(N_elements): >> > >> for jj in xrange(ii+1, N_elements): >> > >> D[ii, jj] = compare(data['element'][ii], >> > data['element'][jj]) >> > >> >> > >> >> > >> >> > >> >> > >> ------------------------------------------------------------------------------ >> > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> > >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> current >> > >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> > >> MVPs and experts. ON SALE this month only -- learn more at: >> > >> http://p.sf.net/sfu/learnmore_122712 >> > >> _______________________________________________ >> > >> Pytables-users mailing list >> > >> Pyt...@li... 
>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> >> > >> >> > > >> > > >> > > >> > >> ------------------------------------------------------------------------------ >> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >> current >> > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> > > MVPs and experts. ON SALE this month only -- learn more at: >> > > http://p.sf.net/sfu/learnmore_122712 >> > > _______________________________________________ >> > > Pytables-users mailing list >> > > Pyt...@li... >> > > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > > >> > > >> > -------------- next part -------------- >> > An HTML attachment was scrubbed... >> > >> > ------------------------------ >> > >> > >> > >> ------------------------------------------------------------------------------ >> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current >> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> > MVPs and experts. ON SALE this month only -- learn more at: >> > http://p.sf.net/sfu/learnmore_122712 >> > >> > ------------------------------ >> > >> > _______________________________________________ >> > Pytables-users mailing list >> > Pyt...@li... >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> > >> > End of Pytables-users Digest, Vol 80, Issue 3 >> > ********************************************* >> > >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> >> ------------------------------ >> >> >> ------------------------------------------------------------------------------ >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> MVC, Windows 8 Apps, JavaScript and much more. 
Keep your skills current >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> MVPs and experts. ON SALE this month only -- learn more at: >> http://p.sf.net/sfu/learnmore_122712 >> >> ------------------------------ >> >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> End of Pytables-users Digest, Vol 80, Issue 4 >> ********************************************* >> > > > > ------------------------------------------------------------------------------ > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > MVPs and experts. ON SALE this month only -- learn more at: > http://p.sf.net/sfu/learnmore_122712 > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: David R. <dav...@gm...> - 2013-01-03 21:25:51
|
I apologize if I'm starting to sound helpless, but I'm forced to work on Windows 7 at work and have never had any luck compiling Python source successfully. I have had to rely on precompiled binaries, and now it's biting me in the butt.

Is there any quick fix I can do to improve this iteration using v2.4.0?
|
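One possible stopgap for the v2.4.0 question above, offered only as a sketch: read the column in blocks and iterate over each block in memory, which is roughly what the custom __iter__ posted later in this thread does with table.read(start_row, end_row, 1, field=...). The names iter_in_chunks, read_rows and chunk_rows are made up for illustration, and the list below is a stand-in for an on-disk column so the snippet runs without PyTables.

```python
def iter_in_chunks(read_rows, n_rows, chunk_rows):
    """Yield rows one at a time while fetching them chunk_rows at a time.

    read_rows(start, stop) is a hypothetical callable returning rows
    [start, stop). With PyTables it might be something like
    lambda a, b: table.read(a, b, field='element')  (assumption, untested here).
    """
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        for row in read_rows(start, stop):
            yield row


# Stand-in for an on-disk column, so the sketch is self-contained.
fake_column = list(range(10))
calls = []  # records which blocks were requested


def read_rows(start, stop):
    calls.append((start, stop))
    return fake_column[start:stop]


rows = list(iter_in_chunks(read_rows, len(fake_column), 4))
```

The point of the design is that each block costs one read call rather than one call per row; the right chunk_rows depends on row size and available memory, so it would need tuning against the real table.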
From: David R. <dav...@gm...> - 2013-01-03 20:17:28
|
Thanks a lot for the help so far guys! Looking at itertools, I found what I believe to be the perfect function for what I need, itertools.combinations. This appears to be a valid replacement to the method proposed. There is a small problem that I didn't mention is that my compare function actually takes as inputs 2 columns from the table. Like so: D = np.empty((N_irises, N_irises)) for ii in xrange(N_elements): for jj in xrange(ii+1, N_elements): D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],data['element2'][ii], data['element2'][jj]) Is there an efficient way of using itertools with this structure? On Thu, Jan 3, 2013 at 1:29 PM, < pyt...@li...> wrote: > Send Pytables-users mailing list submissions to > pyt...@li... > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.sourceforge.net/lists/listinfo/pytables-users > or, via email, send a message with subject or body 'help' to > pyt...@li... > > You can reach the person managing the list at > pyt...@li... > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Pytables-users digest..." > > > Today's Topics: > > 1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 3 Jan 2013 10:29:33 -0800 > From: Josh Ayers <jos...@gm...> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables > To: Discussion list for PyTables > <pyt...@li...> > Message-ID: > < > CAC...@ma...> > Content-Type: text/plain; charset="iso-8859-1" > > David, > > The change in issue 27 was only for iteration over a tables.Column > instance. To use it, tweak Anthony's code as follows. This will iterate > over the "element" column, as in your original example. > > Note also that this will only work with the development version of PyTables > available on github. It will be very slow using the released v2.4.0. 
> from itertools import izip
>
> with tb.openFile(...) as f:
>     data = f.root.data.cols.element
>     data_i = iter(data)
>     data_j = iter(data)
>     data_i.next()  # throw the first value away
>     for i, j in izip(data_i, data_j):
>         compare(i, j)
>
> Hope that helps,
> Josh
>
> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
>
> > Hi David,
> >
> > Tables and table column iteration have been overhauled fairly recently
> > [1]. So you might try creating two iterators, offset by one, and then
> > doing the comparison. I am hacking this out super quick, so please
> > forgive me:
> >
> > from itertools import izip
> >
> > with tb.openFile(...) as f:
> >     data = f.root.data
> >     data_i = iter(data)
> >     data_j = iter(data)
> >     data_i.next()  # throw the first value away
> >     for i, j in izip(data_i, data_j):
> >         compare(i, j)
> >
> > You get the idea ;)
> >
> > Be Well
> > Anthony
> >
> > 1. https://github.com/PyTables/PyTables/issues/27
> >
> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >
> > > I was hoping someone could help me out here.
> > >
> > > This is from a post I put up on StackOverflow:
> > >
> > > I have a fairly large dataset that I store in HDF5 and access using
> > > PyTables. One operation I need to do on this dataset is a pairwise
> > > comparison between each of the elements. This requires 2 loops, one to
> > > iterate over each element, and an inner loop to iterate over every
> > > other element. This operation thus looks at N(N-1)/2 comparisons.
> > >
> > > For fairly small sets I found it to be faster to dump the contents
> > > into a multidimensional numpy array and then do my iteration. I run
> > > into problems with large sets because of memory issues and need to
> > > access each element of the dataset at run time.
> > >
> > > Putting the elements into an array gives me about 600 comparisons per
> > > second, while operating on the hdf5 data itself gives me about 300
> > > comparisons per second.
> > >
> > > Is there a way to speed this process up?
> > >
> > > Example follows (this is not my real code, just an example):
> > >
> > > *Small Set*:
> > >
> > > with tb.openFile(h5_file, 'r') as f:
> > >     data = f.root.data
> > >
> > >     N_elements = len(data)
> > >     elements = np.empty((N_elements, 100000))
> > >
> > >     # copy every row's 'element' field into memory
> > >     for ii, d in enumerate(data):
> > >         elements[ii] = d['element']
> > >
> > > D = np.empty((N_elements, N_elements))
> > > for ii in xrange(N_elements):
> > >     for jj in xrange(ii+1, N_elements):
> > >         D[ii, jj] = compare(elements[ii], elements[jj])
> > >
> > > *Large Set*:
> > >
> > > with tb.openFile(h5_file, 'r') as f:
> > >     data = f.root.data
> > >
> > >     N_elements = len(data)
> > >
> > >     D = np.empty((N_elements, N_elements))
> > >     for ii in xrange(N_elements):
> > >         for jj in xrange(ii+1, N_elements):
> > >             # each access reads from the HDF5 file
> > >             D[ii, jj] = compare(data['element'][ii],
> > >                                 data['element'][jj])
|
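The question quoted above weighs two extremes: loading every element into memory, or fetching each element from the HDF5 file on every comparison. One possible middle path is to read rows in blocks and compare block against block, so that only two blocks are held in memory at a time. What follows is a minimal sketch of that idea, not code from the thread, and it is written in Python 3 rather than the Python 2 used in the messages. The names `block_ranges`, `pairwise_blocked`, and `read_rows` are hypothetical; `read_rows(start, stop)` stands in for something like `table.read(start, stop, field='element')`, and plain lists replace the HDF5 file so the sketch runs on its own.

```python
from itertools import combinations


def block_ranges(n_rows, block_size):
    """Yield (start, stop) row ranges that cover 0..n_rows in order."""
    for start in range(0, n_rows, block_size):
        yield start, min(start + block_size, n_rows)


def pairwise_blocked(read_rows, n_rows, block_size, compare):
    """Compare every pair (i, j) with i < j, reading rows one block at a time.

    read_rows(start, stop) is a hypothetical callable returning the rows in
    [start, stop) as a sequence. Returns a dict mapping (i, j) to
    compare(row_i, row_j).
    """
    results = {}
    blocks = list(block_ranges(n_rows, block_size))
    for a, (a_start, a_stop) in enumerate(blocks):
        rows_a = read_rows(a_start, a_stop)
        # Pairs that fall inside the same block.
        for (i, ri), (j, rj) in combinations(enumerate(rows_a, a_start), 2):
            results[(i, j)] = compare(ri, rj)
        # Pairs between this block and every later block.
        for b_start, b_stop in blocks[a + 1:]:
            rows_b = read_rows(b_start, b_stop)
            for i, ri in enumerate(rows_a, a_start):
                for j, rj in enumerate(rows_b, b_start):
                    results[(i, j)] = compare(ri, rj)
    return results


# Stand-in data so the sketch runs without an HDF5 file (assumption).
rows = [[k, k + 1, k + 2] for k in range(7)]


def read_rows(start, stop):
    return rows[start:stop]


def compare(x, y):
    # Placeholder metric: sum of absolute differences.
    return sum(abs(p - q) for p, q in zip(x, y))


out = pairwise_blocked(read_rows, len(rows), block_size=3, compare=compare)
```

The number of comparisons is still N(N-1)/2; only the number of reads changes, from one per element access to one per pair of blocks. Whether that helps in practice depends on the block size, the row size, and how expensive `compare` is relative to the I/O.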
From: David R. <dav...@gm...> - 2013-01-03 18:45:01
|
Thanks Anthony, but unless I'm missing something, I don't think that method will work, since it only compares the i-th element with the (i+1)-th element. I still need two for loops, right?

Using itertools might speed things up, though. I've never used it, so I will give it a shot and let you know how it goes. Looks like I need to download the latest release before I do that, too.

Thanks for the help.

-Dave
|
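The difference Dave points out can be checked on a small list. This is an illustrative sketch in Python 3 (the thread itself uses Python 2's `izip` and `.next()`); the helper names `adjacent_pairs` and `all_pairs` are made up for the example, and a list of letters stands in for the table rows.

```python
from itertools import combinations, tee


def adjacent_pairs(items):
    """Pair each item with its successor, as two offset iterators would."""
    first, second = tee(items)
    next(second, None)  # advance one iterator by a single step
    return list(zip(first, second))


def all_pairs(items):
    """Every unordered pair, equivalent to the nested for loops."""
    return list(combinations(items, 2))


elements = ["a", "b", "c", "d"]
adjacent = adjacent_pairs(elements)    # N - 1 pairs
everything = all_pairs(elements)       # N(N-1)/2 pairs
```

Note that `itertools.combinations` copies its input into memory before producing pairs, so for a dataset too large for RAM it would have to run over row indices rather than over the rows themselves.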