From: Anthony S. <sc...@gm...> - 2013-01-03 23:15:08
Yup, that is right, thanks Josh!

On Thu, Jan 3, 2013 at 12:29 PM, Josh Ayers <jos...@gm...> wrote:

> David,
>
> The change in issue 27 was only for iteration over a tables.Column
> instance. To use it, tweak Anthony's code as follows. This will iterate
> over the "element" column, as in your original example.
>
> Note also that this will only work with the development version of
> PyTables available on GitHub. It will be very slow using the released
> v2.4.0.
>
>
> from itertools import izip
> import tables as tb
>
> with tb.openFile(...) as f:
>     data = f.root.data.cols.element
>     data_i = iter(data)
>     data_j = iter(data)
>     data_i.next()  # throw the first value away
>     for i, j in izip(data_i, data_j):
>         compare(i, j)
>
>
> Hope that helps,
> Josh
>
>
> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
>
>> Hi David,
>>
>> Tables and table column iteration have been overhauled fairly recently
>> [1]. So you might try creating two iterators, offset by one, and then
>> doing the comparison. I am hacking this out super quick so please forgive
>> me:
>>
>> from itertools import izip
>> import tables as tb
>>
>> with tb.openFile(...) as f:
>>     data = f.root.data
>>     data_i = iter(data)
>>     data_j = iter(data)
>>     data_i.next()  # throw the first value away
>>     for i, j in izip(data_i, data_j):
>>         compare(i, j)
>>
>> You get the idea ;)
>>
>> Be Well
>> Anthony
>>
>> 1. https://github.com/PyTables/PyTables/issues/27
>>
>>
>> On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>>
>>> I was hoping someone could help me out here.
>>>
>>> This is from a post I put up on StackOverflow,
>>>
>>> I have a fairly large dataset that I store in HDF5 and access using
>>> PyTables. One operation I need to do on this dataset is a pairwise
>>> comparison between each of the elements. This requires 2 loops, one to
>>> iterate over each element, and an inner loop to iterate over every other
>>> element. This operation thus looks at N(N-1)/2 comparisons.
>>>
>>> For fairly small sets I found it to be faster to dump the contents into
>>> a multidimensional numpy array and then do my iteration. I run into problems
>>> with large sets because of memory issues and need to access each element of
>>> the dataset at run time.
>>>
>>> Putting the elements into an array gives me about 600 comparisons per
>>> second, while operating on hdf5 data itself gives me about 300 comparisons
>>> per second.
>>>
>>> Is there a way to speed this process up?
>>>
>>> Example follows (this is not my real code, just an example):
>>>
>>> *Small Set*:
>>>
>>> import numpy as np
>>> import tables as tb
>>>
>>> with tb.openFile(h5_file, 'r') as f:
>>>     data = f.root.data
>>>
>>>     N_elements = len(data)
>>>     elements = np.empty((N_elements, 100000))
>>>
>>>     # copy each row's 'element' field into memory
>>>     for ii, d in enumerate(data):
>>>         elements[ii] = d['element']
>>>
>>> D = np.empty((N_elements, N_elements))
>>> for ii in xrange(N_elements):
>>>     for jj in xrange(ii+1, N_elements):
>>>         D[ii, jj] = compare(elements[ii], elements[jj])
>>>
>>> *Large Set*:
>>>
>>> with tb.openFile(h5_file, 'r') as f:
>>>     data = f.root.data
>>>
>>>     N_elements = len(data)
>>>
>>>     D = np.empty((N_elements, N_elements))
>>>     for ii in xrange(N_elements):
>>>         for jj in xrange(ii+1, N_elements):
>>>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
>>>
>>> _______________________________________________
>>> Pytables-users mailing list
>>> Pyt...@li...
>>> https://lists.sourceforge.net/lists/listinfo/pytables-users
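
For anyone reading this thread later: the two-iterator trick above can be tried without an HDF5 file. The sketch below is only an illustration in Python 3, where `zip` and `next()` stand in for `izip` and `.next()`. The helper names `adjacent_pairs` and `all_pairs` and the `elements` list are made up for the example and are not part of PyTables. Note that two iterators offset by one visit only neighbouring elements; if every one of the N(N-1)/2 pairs is needed, as in David's original question, `itertools.combinations` is one way to enumerate them.

```python
from itertools import combinations


def adjacent_pairs(values):
    """Pair each element with the one before it, using two offset iterators."""
    it_i = iter(values)
    it_j = iter(values)
    next(it_i, None)  # throw the first value away, as in the thread
    return zip(it_i, it_j)  # yields (values[k + 1], values[k])


def all_pairs(values):
    """Enumerate every unordered pair, N(N-1)/2 of them in total."""
    return combinations(values, 2)


# Plain list standing in for a column read from a file (hypothetical data).
elements = [10, 20, 30, 40]

adjacent = list(adjacent_pairs(elements))
every = list(all_pairs(elements))

print(adjacent)
print(len(every))
```

`adjacent_pairs` needs an object that hands out a fresh iterator on each `iter()` call, such as a list; whether a given on-disk container behaves that way is something to check against its own documentation.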