From: Anthony S. <sc...@gm...> - 2013-02-01 21:50:43
On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:

> at the error:
>
> result = numpy.empty(shape=nrows, dtype=dtypeField)
>
> nrows = 4620 and dtypeField is ('bool', (17, 9600))
>
> I'm not sure what that means as a dtype, but that's what it is.
>
> Forgive me if I'm being totally naive, but I thought the whole point of
> __iter__ with PyTables was to do iteration on the fly, so there is no
> preallocation.
>

Nope, you are not being naive at all.  That is the point.

> If you have any ideas on this I'm all ears.
>

If you could send a minimal script which reproduces this error, that would
help a lot.

Be Well
Anthony

>
> Thanks again.
>
> Dave
>
>
> On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
>
>> Send Pytables-users mailing list submissions to
>>         pyt...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>> or, via email, send a message with subject or body 'help' to
>>         pyt...@li...
>>
>> You can reach the person managing the list at
>>         pyt...@li...
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Pytables-users digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 1 Feb 2013 14:44:40 -0600
>> From: Anthony Scopatz <sc...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>> To: Discussion list for PyTables <pyt...@li...>
>> Message-ID: <CAP...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
>>
>> > Hi Anthony,
>> >
>> > Thanks for the reply.
>> >
>> > I honestly don't know how to monitor my Python memory usage, but I'm
>> > sure that it's caused by being out of memory.
>> >
>>
>> Well, I would just run top or process monitor or something while running
>> the python script to see what happens to memory usage as the script chugs
>> along...
>>
>> > I'm just trying to find out how to fix it.  My HDF5 table has 4620 rows
>> > and the column I'm iterating over is a 17x9600 boolean matrix.  The
>> > __iter__ method is preallocating an array that is this size, which
>> > appears to be the root of the error.  I was hoping there is a fix
>> > somewhere in here to not have to do this preallocation.
>> >
>>
>> So a 17x9600 boolean matrix should only be 0.155 MB in space.  4620 of
>> these is ~760 MB.  If you have 2 GB of memory and you are iterating over 2
>> of these (templates & masks) it is conceivable that you are just running
>> out of memory.  Maybe there is a way that __iter__ could not preallocate
>> something that is basically a temporary.  What is the dtype of the
>> templates array?
>>
>> Be Well
>> Anthony
>>
>> >
>> > Thanks again.
>> >
>> > On Fri, Feb 1, 2013 at 11:12 AM, <pyt...@li...> wrote:
>> >
>> >> Today's Topics:
>> >>
>> >>    1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony Scopatz)
>> >>
>> >>
>> >> ----------------------------------------------------------------------
>> >>
>> >> Message: 1
>> >> Date: Fri, 1 Feb 2013 10:11:47 -0600
>> >> From: Anthony Scopatz <sc...@gm...>
>> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9
>> >> To: Discussion list for PyTables <pyt...@li...>
>> >> Message-ID: <CAP...@ma...>
>> >> Content-Type: text/plain; charset="iso-8859-1"
>> >>
>> >> Hi David,
>> >>
>> >> Sorry, I haven't had a ton of time recently.  You seem to be getting a
>> >> memory error on creating a numpy array.  This kind of thing typically
>> >> happens when you are out of memory.  Does this seem to be the case with
>> >> you?  When this dies, is your memory usage at 100%?  If so, this
>> >> algorithm might require a little tweaking...
>> >>
>> >> Be Well
>> >> Anthony
>> >>
>> >>
>> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:
>> >>
>> >> > I'm still having problems with this one.  I can't tell if this is
>> >> > something dumb I'm doing with itertools, or if it's something in
>> >> > PyTables.
>> >> >
>> >> > Would appreciate any help.
>> >> >
>> >> > Thanks
>> >> >
>> >> >
>> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:
>> >> >
>> >> >> I think I have to reopen this issue.  I have been running fine for
>> >> >> a while using the combinations method from itertools, but have
>> >> >> recently run into a memory error since I quadrupled the size of
>> >> >> the hdf file.
>> >> >>
>> >> >> Here is my code again:
>> >> >>
>> >> >> from itertools import combinations, izip
>> >> >> with tb.openFile(h5_all, 'r') as f:
>> >> >>     irises = f.root.irises
>> >> >>
>> >> >>     templates = f.root.irises.cols.templates
>> >> >>     masks = f.root.irises.cols.masks1
>> >> >>
>> >> >>     N_irises = len(irises)
>> >> >>     index = np.ones((20 * 480), np.bool)
>> >> >>
>> >> >>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> >> >>     D = np.empty((N_irises, N_irises))
>> >> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>> >> >>         # print ii
>> >> >>         D[ii, jj] = ham_dist(
>> >> >>             t1[8, index],
>> >> >>             t2[:, index],
>> >> >>             m1[8, index],
>> >> >>             m2[:, index],
>> >> >>         )
>> >> >>
>> >> >> And here is the error:
>> >> >>
>> >> >> In [10]: get_hd3()
>> >> >> 10669890 Comparisons
>> >> >>
>> >> >> ---------------------------------------------------------------------------
>> >> >> MemoryError                               Traceback (most recent call last)
>> >> >> <ipython-input-10-cfb255ce7bd1> in <module>()
>> >> >> ----> 1 get_hd3()
>> >> >>
>> >> >>     118         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> >> >>     119         D = np.empty((N_irises, N_irises))
>> >> >> --> 120         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>> >> >>     121             # print ii
>> >> >>     122             D[ii, jj] = ham_dist(
>> >> >>
>> >> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
>> >> >>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
>> >> >>    3275             end_row = min([start_row + nrowsinbuf, max_row])
>> >> >> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
>> >> >>    3277             for row in buf:
>> >> >>    3278                 yield row
>> >> >>
>> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
>> >> >>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
>> >> >>    1773
>> >> >> -> 1774         arr = self._read(start, stop, step, field)
>> >> >>    1775         return internal_to_flavor(arr, self.flavor)
>> >> >>    1776
>> >> >>
>> >> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
>> >> >>    1719         if field:
>> >> >>    1720             # Create a container for the results
>> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >> >>    1722         else:
>> >> >>    1723             # Recarray case
>> >> >>
>> >> >> MemoryError:
>> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
>> >> >>    1720             # Create a container for the results
>> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >> >>    1722         else:
>> >> >>
>> >> >> Also, if you guys see any performance problems in my code, please
>> >> >> let me know.
>> >> >>
>> >> >> Thank you so much for the help.
>> >> >>
>> >> >> -Dave
>> >> >>
>> >> >>
>> >> >> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
>> >> >>
>> >> >>> Today's Topics:
>> >> >>>
>> >> >>>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
>> >> >>>
>> >> >>>
>> >> >>> ----------------------------------------------------------------------
>> >> >>>
>> >> >>> Message: 1
>> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500
>> >> >>> From: David Reed <dav...@gm...>
>> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
>> >> >>> To: pyt...@li...
>> >> >>> Message-ID: <CAM...@ma...>
>> >> >>> Content-Type: text/plain; charset="iso-8859-1"
>> >> >>>
>> >> >>> I can't thank you guys enough for the help.  I was able to add the
>> >> >>> __iter__ function to the table.py file and everything seems to be
>> >> >>> working great!  I'm not quite as fast as I was with iterating right
>> >> >>> off a matrix, but pretty close.  I was at 555 comparisons per
>> >> >>> second, and now I'm at 420.
>> >> >>>
>> >> >>> I handled the problem I mentioned earlier by doing this, and it
>> >> >>> seems to work great:
>> >> >>>
>> >> >>> A = f.root.data.cols.A
>> >> >>> B = f.root.data.cols.B
>> >> >>>
>> >> >>> D = np.empty((len(A), len(A)))
>> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
>> >> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
>> >> >>>
>> >> >>> Again, thanks a lot.
>> >> >>>
>> >> >>> -Dave
>> >> >>>
>> >> >>>
>> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
>> >> >>>
>> >> >>> > Today's Topics:
>> >> >>> >
>> >> >>> >    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
>> >> >>> >    2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz)
>> >> >>> >
>> >> >>> >
>> >> >>> > ----------------------------------------------------------------------
>> >> >>> >
>> >> >>> > Message: 1
>> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
>> >> >>> > From: Anthony Scopatz <sc...@gm...>
>> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>> >> >>> > To: Discussion list for PyTables <pyt...@li...>
>> >> >>> > Message-ID: <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=Gz...@ma...>
>> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
>> >> >>> >
>> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
>> >> >>> >
>> >> >>> > > Thanks a lot for the help so far guys!
>> >> >>> > >
>> >> >>> > > Looking at itertools, I found what I believe to be the perfect
>> >> >>> > > function for what I need, itertools.combinations.  This appears
>> >> >>> > > to be a valid replacement for the method proposed.
>> >> >>> > >
>> >> >>> >
>> >> >>> > Yes, combinations is awesome!
>> >> >>> >
>> >> >>> > > There is a small problem that I didn't mention: my compare
>> >> >>> > > function actually takes as inputs 2 columns from the table.
>> >> >>> > > Like so:
>> >> >>> > >
>> >> >>> > > D = np.empty((N_elements, N_elements))
>> >> >>> > > for ii in xrange(N_elements):
>> >> >>> > >     for jj in xrange(ii+1, N_elements):
>> >> >>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>> >> >>> > >                             data['element2'][ii], data['element2'][jj])
>> >> >>> > >
>> >> >>> > > Is there an efficient way of using itertools with this structure?
>> >> >>> > >
>> >> >>> >
>> >> >>> > You can always make two other iterators for each column.  Since you
>> >> >>> > have two columns you would have 4 iterators.  I am not sure how fast
>> >> >>> > this is going to be, but I am confident that there is definitely a
>> >> >>> > way to do this in one for-loop, which is going to be way faster than
>> >> >>> > nested loops.
>> >> >>> >
>> >> >>> > Be Well
>> >> >>> > Anthony
>> >> >>> >
>> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>> >> >>> > >
>> >> >>> > >> Today's Topics:
>> >> >>> > >>
>> >> >>> > >>    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
>> >> >>> > >>
>> >> >>> > >>
>> >> >>> > >> ----------------------------------------------------------------------
>> >> >>> > >>
>> >> >>> > >> Message: 1
>> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
>> >> >>> > >> From: Josh Ayers <jos...@gm...>
>> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> >> >>> > >> To: Discussion list for PyTables <pyt...@li...>
>> >> >>> > >> Message-ID: <CAC...@ma...>
>> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1"
>> >> >>> > >>
>> >> >>> > >> David,
>> >> >>> > >>
>> >> >>> > >> The change in issue 27 was only for iteration over a tables.Column
>> >> >>> > >> instance.  To use it, tweak Anthony's code as follows.  This will
>> >> >>> > >> iterate over the "element" column, as in your original example.
>> >> >>> > >>
>> >> >>> > >> Note also that this will only work with the development version of
>> >> >>> > >> PyTables available on github.  It will be very slow using the
>> >> >>> > >> released v2.4.0.
>> >> >>> > >>
>> >> >>> > >> from itertools import izip
>> >> >>> > >>
>> >> >>> > >> with tb.openFile(...) as f:
>> >> >>> > >>     data = f.root.data.cols.element
>> >> >>> > >>     data_i = iter(data)
>> >> >>> > >>     data_j = iter(data)
>> >> >>> > >>     data_i.next() # throw the first value away
>> >> >>> > >>     for i, j in izip(data_i, data_j):
>> >> >>> > >>         compare(i, j)
>> >> >>> > >>
>> >> >>> > >> Hope that helps,
>> >> >>> > >> Josh
>> >> >>> > >>
>> >> >>> > >>
>> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
>> >> >>> > >>
>> >> >>> > >> > Hi David,
>> >> >>> > >> >
>> >> >>> > >> > Tables and table column iteration have been overhauled fairly
>> >> >>> > >> > recently [1].  So you might try creating two iterators, offset
>> >> >>> > >> > by one, and then doing the comparison.  I am hacking this out
>> >> >>> > >> > super quick so please forgive me:
>> >> >>> > >> >
>> >> >>> > >> > from itertools import izip
>> >> >>> > >> >
>> >> >>> > >> > with tb.openFile(...) as f:
>> >> >>> > >> >     data = f.root.data
>> >> >>> > >> >     data_i = iter(data)
>> >> >>> > >> >     data_j = iter(data)
>> >> >>> > >> >     data_i.next() # throw the first value away
>> >> >>> > >> >     for i, j in izip(data_i, data_j):
>> >> >>> > >> >         compare(i, j)
>> >> >>> > >> >
>> >> >>> > >> > You get the idea ;)
>> >> >>> > >> >
>> >> >>> > >> > Be Well
>> >> >>> > >> > Anthony
>> >> >>> > >> >
>> >> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27
>> >> >>> > >> >
>> >> >>> > >> >
>> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>> >> >>> > >> >
>> >> >>> > >> >> I was hoping someone could help me out here.
>> >> >>> > >> >>
>> >> >>> > >> >> This is from a post I put up on StackOverflow:
>> >> >>> > >> >>
>> >> >>> > >> >> I have a fairly large dataset that I store in HDF5 and access
>> >> >>> > >> >> using PyTables.  One operation I need to do on this dataset is
>> >> >>> > >> >> pairwise comparison between each of the elements.  This
>> >> >>> > >> >> requires 2 loops, one to iterate over each element, and an
>> >> >>> > >> >> inner loop to iterate over every other element.  This
>> >> >>> > >> >> operation thus looks at N(N-1)/2 comparisons.
>> >> >>> > >> >>
>> >> >>> > >> >> For fairly small sets I found it to be faster to dump the
>> >> >>> > >> >> contents into a multidimensional numpy array and then do my
>> >> >>> > >> >> iteration.  I run into problems with large sets because of
>> >> >>> > >> >> memory issues and need to access each element of the dataset
>> >> >>> > >> >> at run time.
>> >> >>> > >> >>
>> >> >>> > >> >> Putting the elements into an array gives me about 600
>> >> >>> > >> >> comparisons per second, while operating on hdf5 data itself
>> >> >>> > >> >> gives me about 300 comparisons per second.
>> >> >>> > >> >>
>> >> >>> > >> >> Is there a way to speed this process up?
>> >> >>> > >> >>
>> >> >>> > >> >> Example follows (this is not my real code, just an example):
>> >> >>> > >> >>
>> >> >>> > >> >> *Small Set*:
>> >> >>> > >> >>
>> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
>> >> >>> > >> >>     data = f.root.data
>> >> >>> > >> >>
>> >> >>> > >> >>     N_elements = len(data)
>> >> >>> > >> >>     elements = np.empty((N_elements, 100000))
>> >> >>> > >> >>
>> >> >>> > >> >>     for ii, d in enumerate(data):
>> >> >>> > >> >>         elements[ii] = d['element']
>> >> >>> > >> >>
>> >> >>> > >> >> D = np.empty((N_elements, N_elements))
>> >> >>> > >> >> for ii in xrange(N_elements):
>> >> >>> > >> >>     for jj in xrange(ii+1, N_elements):
>> >> >>> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
>> >> >>> > >> >>
>> >> >>> > >> >> *Large Set*:
>> >> >>> > >> >>
>> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
>> >> >>> > >> >>     data = f.root.data
>> >> >>> > >> >>
>> >> >>> > >> >>     N_elements = len(data)
>> >> >>> > >> >>
>> >> >>> > >> >>     D = np.empty((N_elements, N_elements))
>> >> >>> > >> >>     for ii in xrange(N_elements):
>> >> >>> > >> >>         for jj in xrange(ii+1, N_elements):
>> >> >>> > >> >>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
>> >> >>> > >> >>
>> >> >>> > >> >> ------------------------------------------------------------------------------
>> >> >>> > >> >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
>> >> >>> > >> >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
>> >> >>> > >> >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
>> >> >>> > >> >> MVPs and experts. ON SALE this month only -- learn more at:
>> >> >>> > >> >> http://p.sf.net/sfu/learnmore_122712
>> >> >>> > >> >> _______________________________________________
>> >> >>> > >> >> Pytables-users mailing list
>> >> >>> > >> >> Pyt...@li...
>> >> >>> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users
>> >> >>> > >> >>
>> >> >>> > >>
>> >> >>> > >> End of Pytables-users Digest, Vol 80, Issue 3
>> >> >>> > >> *********************************************
>> >> >>> > >>
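A note on the two-iterator snippet quoted above: pairing an iterator with a copy of itself advanced by one yields only neighbouring rows (N - 1 pairs), while the original question asks for all N(N-1)/2 pairs. The sketch below illustrates the difference on a plain list. It uses Python 3 syntax (zip and next() instead of izip and .next()), and the names offset_pairs, all_pairs and rows are made up for illustration; none of them are PyTables API.

```python
from itertools import combinations


def offset_pairs(items):
    """Pairs produced by two iterators over the same data, one advanced by one."""
    ahead, behind = iter(items), iter(items)
    next(ahead, None)  # throw the first value away, as in the quoted snippet
    return list(zip(ahead, behind))


def all_pairs(items):
    """Every unordered pair, i.e. the N(N-1)/2 comparisons asked for."""
    return list(combinations(items, 2))


rows = ["r0", "r1", "r2", "r3"]  # stand-in for rows read from a table column
print(offset_pairs(rows))    # neighbours only: N - 1 pairs
print(len(all_pairs(rows)))  # all pairs: N(N-1)/2
```

For the four stand-in rows this gives three neighbouring pairs versus six pairs in total, which is why a single offset is not enough for a full pairwise comparison.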
>> >> >>> > ------------------------------
>> >> >>> >
>> >> >>> > Message: 2
>> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600
>> >> >>> > From: Anthony Scopatz <sc...@gm...>
>> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
>> >> >>> > To: Discussion list for PyTables <pyt...@li...>
>> >> >>> > Message-ID: <CAP...@ma...>
>> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
>> >> >>> >
>> >> >>> > Josh is right that you can just edit the code by hand (which works
>> >> >>> > but sucks).
>> >> >>> >
>> >> >>> > However, on Windows -- on the rare occasion when I also have to
>> >> >>> > develop on it -- I typically use a distribution that includes a
>> >> >>> > compiler, cython, hdf5, and pytables already and then I install my
>> >> >>> > development version from github OVER this.  I recommend either EPD
>> >> >>> > or Anaconda, though other distributions listed here [1] might also
>> >> >>> > work.
>> >> >>> >
>> >> >>> > Be well
>> >> >>> > Anthony
>> >> >>> >
>> >> >>> > 1. http://numfocus.org/projects-2/software-distributions/
>> >> >>> >
>> >> >>> >
>> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
>> >> >>> >
>> >> >>> > > The change was in pure Python code, so you should be able to just
>> >> >>> > > paste in the changes to your local copy.  Start with the
>> >> >>> > > table.Column.__iter__ method (lines 3296-3310) here.
>> >> >>> > >
>> >> >>> > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
>> >> >>> > >
>> >> >>> > > It needs to be modified slightly because it uses some additional
>> >> >>> > > features that aren't available in the released version (the
>> >> >>> > > out=buf_slice argument to table.read).  The following should work.
>> >> >>> > >
>> >> >>> > > def __iter__(self):
>> >> >>> > >     table = self.table
>> >> >>> > >     itemsize = self.dtype.itemsize
>> >> >>> > >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>> >> >>> > >     max_row = len(self)
>> >> >>> > >     for start_row in xrange(0, len(self), nrowsinbuf):
>> >> >>> > >         end_row = min([start_row + nrowsinbuf, max_row])
>> >> >>> > >         buf = table.read(start_row, end_row, 1, field=self.pathname)
>> >> >>> > >         for row in buf:
>> >> >>> > >             yield row
>> >> >>> > >
>> >> >>> > > I haven't tested this, but I think it will work.
>> >> >>> > >
>> >> >>> > > Josh
>> >> >>> > >
>> >> >>> > >
>> >> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
>> >> >>> > >
>> >> >>> > >> I apologize if I'm starting to sound helpless, but I'm forced to
>> >> >>> > >> work on Windows 7 at work and have never had luck compiling
>> >> >>> > >> python source successfully.  I have had to rely on precompiled
>> >> >>> > >> binaries and now it's biting me in the butt.
>> >> >>> > >>
>> >> >>> > >> Is there any quick fix I can do to improve this iteration using
>> >> >>> > >> v2.4.0?
>> >> >>> > >>
>> >> >>> > >>
>> >> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
>> >> >>> > >>
>> >> >>> > >>> Today's Topics:
>> >> >>> > >>>
>> >> >>> > >>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
>> >> >>> > >>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
>> >> >>> > >>>
>> >> >>> > >>>
>> >> >>> > >>> ----------------------------------------------------------------------
>> >> >>> > >>>
>> >> >>> > >>> Message: 1
>> >> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
>> >> >>> > >>> From: David Reed <dav...@gm...>
>> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>> >> >>> > >>> To: pyt...@li...
>> >> >>> > >>> Message-ID: <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=ev...@ma...>
>> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
>> >> >>> > >>>
>> >> >>> > >>> Thanks Anthony, but unless I'm missing something I don't think
>> >> >>> > >>> that method will work, since this will only be comparing the
>> >> >>> > >>> ith element with the (i+1)th element.  I still need 2 for
>> >> >>> > >>> loops, right?
>> >> >>> > >>>
>> >> >>> > >>> Using itertools might speed things up though.  I've never used
>> >> >>> > >>> them, so I will give it a shot and let you know how it goes.
>> >> >>> > >>> Looks like I need to download the latest release before I do
>> >> >>> > >>> that too.  Thanks for the help.
>> >> >>> > >>>
>> >> >>> > >>> -Dave
>> >> >>> > >>>
> > Message: 1
> > Date: Thu, 3 Jan 2013 11:11:47 -0600
> > From: Anthony Scopatz <sc...@gm...>
> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> > To: Discussion list for PyTables <pyt...@li...>
> >
> > Hi David,
> >
> > Table and table column iteration have been overhauled fairly recently [1].
> > So you might try creating two iterators, offset by one, and then doing the
> > comparison. I am hacking this out super quick, so please forgive me:
> >
> >     from itertools import izip
> >
> >     with tb.openFile(...) as f:
> >         data = f.root.data
> >         data_i = iter(data)
> >         data_j = iter(data)
> >         data_i.next()  # throw the first value away
> >         for i, j in izip(data_i, data_j):
> >             compare(i, j)
> >
> > You get the idea ;)
> >
> > Be Well
> > Anthony
> >
> > 1. https://github.com/PyTables/PyTables/issues/27
> >
> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >
> > > I was hoping someone could help me out here.
> > >
> > > This is from a post I put up on StackOverflow:
> > >
> > > I have a fairly large dataset that I store in HDF5 and access using
> > > PyTables. One operation I need to do on this dataset is a pairwise
> > > comparison between each of the elements. This requires 2 loops: one to
> > > iterate over each element, and an inner loop to iterate over every other
> > > element. This operation thus looks at N(N-1)/2 comparisons.
> > >
> > > For fairly small sets I found it to be faster to dump the contents into a
> > > multidimensional numpy array and then do my iteration. I run into problems
> > > with large sets because of memory issues and need to access each element
> > > of the dataset at run time.
> > >
> > > Putting the elements into an array gives me about 600 comparisons per
> > > second, while operating on the hdf5 data itself gives me about 300
> > > comparisons per second.
> > >
> > > Is there a way to speed this process up?
> > >
> > > Example follows (this is not my real code, just an example):
> > >
> > > *Small Set*:
> > >
> > >     import numpy as np
> > >     import tables as tb
> > >
> > >     with tb.openFile(h5_file, 'r') as f:
> > >         data = f.root.data
> > >
> > >         N_elements = len(data)
> > >         elements = np.empty((N_elements, 100000))
> > >
> > >         # copy the 'element' field of every row into memory
> > >         for ii, d in enumerate(data):
> > >             elements[ii] = d['element']
> > >
> > >     D = np.empty((N_elements, N_elements))
> > >     for ii in xrange(N_elements):
> > >         for jj in xrange(ii+1, N_elements):
> > >             D[ii, jj] = compare(elements[ii], elements[jj])
> > >
> > > *Large Set*:
> > >
> > >     with tb.openFile(h5_file, 'r') as f:
> > >         data = f.root.data
> > >
> > >         N_elements = len(data)
> > >
> > >         D = np.empty((N_elements, N_elements))
> > >         for ii in xrange(N_elements):
> > >             for jj in xrange(ii+1, N_elements):
> > >                 D[ii, jj] = compare(data['element'][ii],
> > >                                     data['element'][jj])
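For anyone reading this thread on Python 3, here is a rough sketch of the offset-iterator idea from Anthony's message. It uses a plain list as a stand-in for the table rows (an assumption; no PyTables calls are made), and `adjacent_pairs` is a hypothetical helper name. `izip` and `.next()` become the built-in `zip` and `next()`. Note that two iterators offset by one only ever visit neighbouring rows, so this gives N-1 pairs rather than the N(N-1)/2 pairs of the original nested loop.

```python
def adjacent_pairs(rows):
    """Pair each row with its predecessor using two offset iterators."""
    rows_i = iter(rows)
    rows_j = iter(rows)
    next(rows_i, None)  # throw the first value away (no error if rows is empty)
    # zip stops when the shorter iterator (rows_i) is exhausted
    return list(zip(rows_i, rows_j))


rows = [10, 20, 30, 40]  # stand-in for the table rows
pairs = adjacent_pairs(rows)
n = len(rows)

print(pairs)                          # [(20, 10), (30, 20), (40, 30)]
print(len(pairs), n * (n - 1) // 2)   # 3 6
```

So the offset trick is a fit for "compare each row with the next one"; a full pairwise comparison still needs a nested loop or something equivalent.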
> Message: 2
> Date: Thu, 3 Jan 2013 15:17:01 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> To: pyt...@li...
>
> Thanks a lot for the help so far guys!
>
> Looking at itertools, I found what I believe to be the perfect function for
> what I need: itertools.combinations. This appears to be a valid replacement
> for the method proposed.
>
> One small problem I didn't mention is that my compare function actually
> takes as inputs 2 columns from the table. Like so:
>
>     D = np.empty((N_elements, N_elements))
>     for ii in xrange(N_elements):
>         for jj in xrange(ii+1, N_elements):
>             D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>                                 data['element2'][ii], data['element2'][jj])
>
> Is there an efficient way of using itertools with this structure?
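One possible way to use `itertools.combinations` with two columns, sketched in Python 3 under some assumptions: the two columns are plain lists standing in for `element1` and `element2`, and `compare` and `pairwise` are hypothetical stand-ins, not anything from PyTables. The sketch iterates over index pairs instead of over the rows themselves. As far as I know, `combinations` copies its input into a tuple before yielding anything, so feeding it the rows directly would pull the whole column into memory, while feeding it `range(n)` only holds the indices.

```python
from itertools import combinations


def compare(a1, b1, a2, b2):
    # Hypothetical stand-in for the real four-argument comparison.
    return abs(a1 - b1) + abs(a2 - b2)


def pairwise(col1, col2):
    """Return {(ii, jj): compare(...)} for every index pair with ii < jj."""
    n = len(col1)
    return {
        (ii, jj): compare(col1[ii], col1[jj], col2[ii], col2[jj])
        for ii, jj in combinations(range(n), 2)
    }


element1 = [1, 4, 9]  # stand-in for data['element1']
element2 = [2, 2, 5]  # stand-in for data['element2']
D = pairwise(element1, element2)
print(D)  # {(0, 1): 3, (0, 2): 11, (1, 2): 8}
```

This visits the same `(ii, jj)` pairs as the nested `xrange` loops, in the same order. It does not by itself make the per-element reads any cheaper; each `col[ii]` lookup still costs whatever the underlying container charges for indexed access.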
> On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>
> > Message: 1
> > Date: Thu, 3 Jan 2013 10:29:33 -0800
> > From: Josh Ayers <jos...@gm...>
> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> > To: Discussion list for PyTables <pyt...@li...>
> >
> > David,
> >
> > The change in issue 27 was only for iteration over a tables.Column
> > instance. To use it, tweak Anthony's code as follows. This will iterate
> > over the "element" column, as in your original example.
> >
> > Note also that this will only work with the development version of
> > PyTables available on github. It will be very slow using the released
> > v2.4.0.
> >
> >     from itertools import izip
> >
> >     with tb.openFile(...) as f:
> >         data = f.root.data.cols.element
> >         data_i = iter(data)
> >         data_j = iter(data)
> >         data_i.next()  # throw the first value away
> >         for i, j in izip(data_i, data_j):
> >             compare(i, j)
> >
> > Hope that helps,
> > Josh
|