From: Anthony S. <sc...@gm...> - 2013-01-03 23:27:23
On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:

> Thanks a lot for the help so far guys!
>
> Looking at itertools, I found what I believe to be the perfect function
> for what I need, itertools.combinations. This appears to be a valid
> replacement for the method proposed.

Yes, combinations is awesome!

> There is a small problem that I didn't mention: my compare function
> actually takes as inputs 2 columns from the table. Like so:
>
>     D = np.empty((N_elements, N_elements))
>     for ii in xrange(N_elements):
>         for jj in xrange(ii+1, N_elements):
>             D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>                                 data['element2'][ii], data['element2'][jj])
>
> Is there an efficient way of using itertools with this structure?

You can always make two iterators for each column. Since you have two
columns, you would have 4 iterators. I am not sure how fast this is going
to be, but I am confident that there is a way to do this in one for-loop,
which is going to be way faster than nested loops.

Be Well
Anthony

> On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>
>> Send Pytables-users mailing list submissions to
>>     pyt...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>     https://lists.sourceforge.net/lists/listinfo/pytables-users
>>
>> Today's Topics:
>>
>>    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Thu, 3 Jan 2013 10:29:33 -0800
>> From: Josh Ayers <jos...@gm...>
>> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> To: Discussion list for PyTables <pyt...@li...>
>> Message-ID: <CAC...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> David,
>>
>> The change in issue 27 was only for iteration over a tables.Column
>> instance. To use it, tweak Anthony's code as follows. This will iterate
>> over the "element" column, as in your original example.
>>
>> Note also that this will only work with the development version of
>> PyTables available on github. It will be very slow using the released
>> v2.4.0.
>>
>>     from itertools import izip
>>
>>     with tb.openFile(...) as f:
>>         data = f.root.data.cols.element
>>         data_i = iter(data)
>>         data_j = iter(data)
>>         data_i.next()  # throw the first value away
>>         for i, j in izip(data_i, data_j):
>>             compare(i, j)
>>
>> Hope that helps,
>> Josh
>>
>> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
>>
>> > Hi David,
>> >
>> > Tables and table column iteration have been overhauled fairly recently
>> > [1]. So you might try creating two iterators, offset by one, and then
>> > doing the comparison. I am hacking this out super quick so please
>> > forgive me:
>> >
>> >     from itertools import izip
>> >
>> >     with tb.openFile(...) as f:
>> >         data = f.root.data
>> >         data_i = iter(data)
>> >         data_j = iter(data)
>> >         data_i.next()  # throw the first value away
>> >         for i, j in izip(data_i, data_j):
>> >             compare(i, j)
>> >
>> > You get the idea ;)
>> >
>> > Be Well
>> > Anthony
>> >
>> > 1. https://github.com/PyTables/PyTables/issues/27
>> >
>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>> >
>> >> I was hoping someone could help me out here.
>> >>
>> >> This is from a post I put up on StackOverflow.
>> >>
>> >> I have a fairly large dataset that I store in HDF5 and access using
>> >> PyTables. One operation I need to do on this dataset is pairwise
>> >> comparison between each of the elements. This requires 2 loops, one
>> >> to iterate over each element, and an inner loop to iterate over every
>> >> other element. This operation thus looks at N(N-1)/2 comparisons.
>> >>
>> >> For fairly small sets I found it to be faster to dump the contents
>> >> into a multidimensional numpy array and then do my iteration. I run
>> >> into problems with large sets because of memory issues and need to
>> >> access each element of the dataset at run time.
>> >>
>> >> Putting the elements into an array gives me about 600 comparisons per
>> >> second, while operating on hdf5 data itself gives me about 300
>> >> comparisons per second.
>> >>
>> >> Is there a way to speed this process up?
>> >>
>> >> Example follows (this is not my real code, just an example):
>> >>
>> >> *Small Set*:
>> >>
>> >>     with tb.openFile(h5_file, 'r') as f:
>> >>         data = f.root.data
>> >>
>> >>         N_elements = len(data)
>> >>         elements = np.empty((N_elements, int(1e5)))
>> >>
>> >>         # copy every row's 'element' field into memory
>> >>         for ii, d in enumerate(data):
>> >>             elements[ii] = d['element']
>> >>
>> >>     D = np.empty((N_elements, N_elements))
>> >>     for ii in xrange(N_elements):
>> >>         for jj in xrange(ii+1, N_elements):
>> >>             D[ii, jj] = compare(elements[ii], elements[jj])
>> >>
>> >> *Large Set*:
>> >>
>> >>     with tb.openFile(h5_file, 'r') as f:
>> >>         data = f.root.data
>> >>
>> >>         N_elements = len(data)
>> >>
>> >>         D = np.empty((N_elements, N_elements))
>> >>         for ii in xrange(N_elements):
>> >>             for jj in xrange(ii+1, N_elements):
>> >>                 D[ii, jj] = compare(data['element'][ii],
>> >>                                     data['element'][jj])
>> >>
>>
>> End of Pytables-users Digest, Vol 80, Issue 3
>> *********************************************
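
For the two-column case raised at the top of this message, here is a minimal sketch of the single-loop idea using itertools.combinations. It assumes both columns can be read as ordinary, equal-length Python iterables. The `compare` body, the sample values, and the name `pairwise_compare` are hypothetical stand-ins for illustration, not part of PyTables. It is written for Python 3 (built-in `zip` rather than `izip`), unlike the Python 2 snippets quoted above.

```python
from itertools import combinations


def compare(a1, b1, a2, b2):
    """Hypothetical stand-in for the real compare function."""
    return abs(a1 - b1) + abs(a2 - b2)


def pairwise_compare(col1, col2, compare):
    """Run compare on every unordered row pair (ii < jj) in one loop.

    col1 and col2 are equal-length iterables, e.g. two table columns.
    Returns a dict mapping (ii, jj) to the comparison result.
    """
    # Pair up the two columns row by row and remember each row's index.
    rows = enumerate(zip(col1, col2))
    results = {}
    for (ii, (a1, a2)), (jj, (b1, b2)) in combinations(rows, 2):
        # Same argument order as the nested-loop version:
        # element1[ii], element1[jj], element2[ii], element2[jj]
        results[ii, jj] = compare(a1, b1, a2, b2)
    return results


# Made-up sample data standing in for the 'element1'/'element2' columns.
element1 = [1.0, 4.0, 6.0]
element2 = [10.0, 10.0, 13.0]
D = pairwise_compare(element1, element2, compare)
```

One caveat: itertools.combinations stores every item it is given before producing pairs, so this sketch holds all rows in memory. It removes the nested Python loops but does not by itself solve the memory problem described for the large set.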