From: David R. <dav...@gm...> - 2013-01-03 21:25:51
|
I apologize if I'm starting to sound helpless, but I'm forced to work on Windows 7 at work and have never had any luck compiling Python source successfully. I have had to rely on precompiled binaries, and now it's biting me in the butt. Is there any quick fix I can do to improve this iteration using v2.4.0?

On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:

> Send Pytables-users mailing list submissions to
>     pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>     pyt...@li...
>
> You can reach the person managing the list at
>     pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 3 Jan 2013 13:44:29 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
> To: pyt...@li...
> Message-ID:
>     <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=ev...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Thanks Anthony, but unless I'm missing something I don't think that method
> will work, since it only compares the ith element with the (i+1)th
> element. I still need 2 for loops, right?
>
> Using itertools might speed things up though. I've never used it, so I
> will give it a shot and let you know how it goes. Looks like I need to
> download the latest release before I do that too. Thanks for the help.
>
> -Dave
>
>
> On Thu, Jan 3, 2013 at 12:12 PM, <pyt...@li...> wrote:
>
> > Today's Topics:
> >
> >    1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Thu, 3 Jan 2013 11:11:47 -0600
> > From: Anthony Scopatz <sc...@gm...>
> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> > To: Discussion list for PyTables
> >     <pyt...@li...>
> > Message-ID:
> >     <CAPk-6T5b=1EG...@ma...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Hi David,
> >
> > Tables and table column iteration have been overhauled fairly recently [1].
> > So you might try creating two iterators, offset by one, and then doing the
> > comparison. I am hacking this out super quick so please forgive me:
> >
> >     from itertools import izip
> >
> >     with tb.openFile(...) as f:
> >         data = f.root.data
> >         data_i = iter(data)
> >         data_j = iter(data)
> >         data_i.next()  # throw the first value away
> >         for i, j in izip(data_i, data_j):
> >             compare(i, j)
> >
> > You get the idea ;)
> >
> > Be Well
> > Anthony
> >
> > 1. https://github.com/PyTables/PyTables/issues/27
> >
> >
> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >
> > > I was hoping someone could help me out here.
> > >
> > > This is from a post I put up on StackOverflow,
> > >
> > > I have a fairly large dataset that I store in HDF5 and access using
> > > PyTables. One operation I need to do on this dataset is pairwise
> > > comparison between each of the elements.
> > > This requires 2 loops, one to iterate over each element, and an inner
> > > loop to iterate over every other element. This operation thus looks at
> > > N(N-1)/2 comparisons.
> > >
> > > For fairly small sets I found it to be faster to dump the contents into a
> > > multidimensional numpy array and then do my iteration. I run into problems
> > > with large sets because of memory issues and need to access each element
> > > of the dataset at run time.
> > >
> > > Putting the elements into an array gives me about 600 comparisons per
> > > second, while operating on hdf5 data itself gives me about 300 comparisons
> > > per second.
> > >
> > > Is there a way to speed this process up?
> > >
> > > Example follows (this is not my real code, just an example):
> > >
> > > *Small Set*:
> > >
> > >     with tb.openFile(h5_file, 'r') as f:
> > >         data = f.root.data
> > >
> > >         N_elements = len(data)
> > >         elements = np.empty((N_elements, 100000))
> > >
> > >         # copy every row's 'element' field into memory
> > >         for ii, d in enumerate(data):
> > >             elements[ii] = d['element']
> > >
> > >     D = np.empty((N_elements, N_elements))
> > >     for ii in xrange(N_elements):
> > >         for jj in xrange(ii+1, N_elements):
> > >             D[ii, jj] = compare(elements[ii], elements[jj])
> > >
> > > *Large Set*:
> > >
> > >     with tb.openFile(h5_file, 'r') as f:
> > >         data = f.root.data
> > >
> > >         N_elements = len(data)
> > >
> > >         # read each element from the file at comparison time
> > >         D = np.empty((N_elements, N_elements))
> > >         for ii in xrange(N_elements):
> > >             for jj in xrange(ii+1, N_elements):
> > >                 D[ii, jj] = compare(data['element'][ii], data['element'][jj])
> > >
> > > ------------------------------------------------------------------------------
> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
> > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
> > > MVPs and experts.
> > > ON SALE this month only -- learn more at:
> > > http://p.sf.net/sfu/learnmore_122712
> > > _______________________________________________
> > > Pytables-users mailing list
> > > Pyt...@li...
> > > https://lists.sourceforge.net/lists/listinfo/pytables-users
> >
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> >
> > ------------------------------
> >
> > End of Pytables-users Digest, Vol 80, Issue 2
> > *********************************************
>
> ------------------------------
>
> Message: 2
> Date: Thu, 3 Jan 2013 15:17:01 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> To: pyt...@li...
> Message-ID:
>     <CAM...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Thanks a lot for the help so far guys!
>
> Looking at itertools, I found what I believe to be the perfect function for
> what I need, itertools.combinations. This appears to be a valid replacement
> for the method proposed.
>
> There is a small problem that I didn't mention: my compare function
> actually takes as inputs 2 columns from the table.
> Like so:
>
>     D = np.empty((N_elements, N_elements))
>     for ii in xrange(N_elements):
>         for jj in xrange(ii+1, N_elements):
>             D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>                                 data['element2'][ii], data['element2'][jj])
>
> Is there an efficient way of using itertools with this structure?
>
>
> On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>
> > Today's Topics:
> >
> >    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Thu, 3 Jan 2013 10:29:33 -0800
> > From: Josh Ayers <jos...@gm...>
> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> > To: Discussion list for PyTables
> >     <pyt...@li...>
> > Message-ID:
> >     <CAC...@ma...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > David,
> >
> > The change in issue 27 was only for iteration over a tables.Column
> > instance. To use it, tweak Anthony's code as follows. This will iterate
> > over the "element" column, as in your original example.
> >
> > Note also that this will only work with the development version of PyTables
> > available on github. It will be very slow using the released v2.4.0.
> >
> >     from itertools import izip
> >
> >     with tb.openFile(...) as f:
> >         data = f.root.data.cols.element
> >         data_i = iter(data)
> >         data_j = iter(data)
> >         data_i.next()  # throw the first value away
> >         for i, j in izip(data_i, data_j):
> >             compare(i, j)
> >
> > Hope that helps,
> > Josh
> > End of Pytables-users Digest, Vol 80, Issue 3
> > *********************************************
>
> End of Pytables-users Digest, Vol 80, Issue 4
> *********************************************
|
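
On the quick-fix question at the top of this message: one option that may help on v2.4.0 without the development version is to read rows in blocks and compare block against block, so the file is touched once per block rather than once per element. What follows is only a sketch and has not been run against PyTables 2.4.0. It assumes that slicing a column (for example f.root.data.cols.element1[start:stop]) returns that range of rows in one read, that both columns have the same length, and that compare takes the four arguments shown earlier in the thread. The names blocked_pairs, block and toy_compare are made up for this illustration. Pairs inside a block come from itertools.combinations, pairs across two blocks from itertools.product, and only two blocks are held in memory at a time.

```python
from itertools import combinations, product


def blocked_pairs(col1, col2, compare, block=256):
    """Yield (i, j, value) for every pair i < j, reading rows block by block.

    col1 and col2 are assumed to support len() and slicing (lists, NumPy
    arrays, or, by assumption, PyTables columns). blocked_pairs and block
    are hypothetical names for this sketch.
    """
    n = len(col1)
    for bi in range(0, n, block):
        # One slice per column per block instead of one access per element.
        a1, a2 = col1[bi:bi + block], col2[bi:bi + block]
        # Pairs that fall inside the current block.
        for i, j in combinations(range(len(a1)), 2):
            yield bi + i, bi + j, compare(a1[i], a1[j], a2[i], a2[j])
        # Pairs between the current block and every later block.
        for bj in range(bi + block, n, block):
            b1, b2 = col1[bj:bj + block], col2[bj:bj + block]
            for i, j in product(range(len(a1)), range(len(b1))):
                yield bi + i, bj + j, compare(a1[i], b1[j], a2[i], b2[j])


if __name__ == "__main__":
    # Toy data standing in for the two table columns.
    e1 = [1.0, 4.0, 9.0, 16.0, 25.0]
    e2 = [2.0, 3.0, 5.0, 7.0, 11.0]

    def toy_compare(x1, y1, x2, y2):
        # Placeholder metric; the real compare() is not shown in the thread.
        return abs(x1 - y1) + abs(x2 - y2)

    for i, j, value in blocked_pairs(e1, e2, toy_compare, block=2):
        print(i, j, value)
```

With a table, the loop would presumably look like "for i, j, v in blocked_pairs(data.cols.element1, data.cols.element2, compare): D[i, j] = v", with D allocated as in the original example. A larger block means fewer reads but more memory, so the block size is something to tune against the size of one element.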