Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9

I think I have to reopen this issue.  I have been running fine for a while
using the combinations method from itertools, but have recently run into a
memory error since quadrupling the size of the HDF5 file.

Here is my code again:

from itertools import combinations, izip

with tb.openFile(h5_all, 'r') as f:
    irises = f.root.irises

    templates = f.root.irises.cols.templates
    masks = f.root.irises.cols.masks1

    N_irises = len(irises)
    index = np.ones((20 * 480), np.bool)

    print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
    D = np.empty((N_irises, N_irises))
    for (t1, m1, ii), (t2, m2, jj) in combinations(
            izip(templates, masks, range(N_irises)), 2):
        # print ii
        D[ii, jj] = ham_dist(
            t1[8, index],
            t2[:, index],
            m1[8, index],
            m2[:, index],
        )

And here is the error:

In [10]: get_hd3()
10669890 Comparisons
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-10-cfb255ce7bd1> in <module>()
----> 1 get_hd3()

    118                 print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
    119                 D = np.empty((N_irises, N_irises))
--> 120                 for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
    121                         # print ii
    122                         D[ii, jj] = ham_dist(

c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
   3274         for start_row in xrange(0, len(self), nrowsinbuf):
   3275             end_row = min([start_row + nrowsinbuf, max_row])
-> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
   3277             for row in buf:
   3278                 yield row

c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
   1772         (start, stop, step) = self._processRangeRead(start, stop, step)
   1773
-> 1774         arr = self._read(start, stop, step, field)
   1775         return internal_to_flavor(arr, self.flavor)
   1776

c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
   1719         if field:
   1720             # Create a container for the results
-> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
   1722         else:
   1723             # Recarray case

MemoryError:
> c:\python27\lib\site-packages\tables\table.py(1721)_read()
   1720             # Create a container for the results
-> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
   1722         else:
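
A possible explanation, offered as a guess rather than a confirmed diagnosis: itertools.combinations copies its whole input into a tuple before it yields the first pair, so the izip(...) in the loop above ends up holding every template and mask row in memory at once. One way around that might be to walk the table in blocks and keep only two blocks in memory at a time. The sketch below is written for Python 3 and uses plain NumPy arrays as stand-ins for the PyTables columns; it assumes the columns can be sliced like arrays (templates[i0:i1]). The names pairwise_blocked and demo_compare and the block size are made up for illustration, and demo_compare is only a placeholder for ham_dist, whose real definition is not shown in this thread.

```python
import numpy as np


def demo_compare(t1, t2, m1, m2):
    """Placeholder for ham_dist: fraction of differing bits where both masks are set."""
    valid = m1 & m2
    n_valid = np.count_nonzero(valid)
    if n_valid == 0:
        return 0.0
    return np.count_nonzero((t1 != t2) & valid) / n_valid


def pairwise_blocked(templates, masks, compare, block=256):
    """Fill the upper triangle of D, reading rows one block at a time."""
    n = len(templates)
    D = np.zeros((n, n))
    for i0 in range(0, n, block):
        i1 = min(i0 + block, n)
        # Slicing is assumed to pull only rows i0..i1-1 into memory.
        t_i, m_i = templates[i0:i1], masks[i0:i1]
        for j0 in range(i0, n, block):
            j1 = min(j0 + block, n)
            t_j, m_j = templates[j0:j1], masks[j0:j1]
            for ii in range(i0, i1):
                # Only visit pairs with jj > ii.
                for jj in range(max(j0, ii + 1), j1):
                    D[ii, jj] = compare(t_i[ii - i0], t_j[jj - j0],
                                        m_i[ii - i0], m_j[jj - j0])
    return D


# Tiny in-memory example: 7 rows of 5 bits each, all mask bits valid.
rng = np.random.RandomState(0)
templates = rng.rand(7, 5) > 0.5
masks = np.ones((7, 5), dtype=bool)
D = pairwise_blocked(templates, masks, demo_compare, block=3)
```

With this layout the peak memory is roughly two blocks of templates and masks plus D itself, at the cost of re-reading the later blocks once per outer block.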

Also, if you guys see any performance problems in my code, please let me
know.
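
On memory use more generally, some rough arithmetic can be read off the output above. The comparison count is N(N-1)/2, so the number of rows can be recovered from it, and D is an N x N array of float64 (the NumPy default for np.empty). The helper name rows_from_pair_count is made up for this sketch, which is written for Python 3.8 or later (it relies on math.isqrt).

```python
import math


def rows_from_pair_count(pairs):
    """Solve n*(n-1)/2 == pairs for n (raises if pairs is not of that form)."""
    n = (1 + math.isqrt(1 + 8 * pairs)) // 2
    if n * (n - 1) // 2 != pairs:
        raise ValueError("not a valid pair count")
    return n


n_rows = rows_from_pair_count(10669890)   # count printed before the traceback
d_bytes_float64 = n_rows * n_rows * 8     # np.empty defaults to float64
d_bytes_float32 = n_rows * n_rows * 4     # if single precision were enough

print(n_rows)                               # 4620
print(round(d_bytes_float64 / 2.0**20, 1))  # 162.8 (MiB)
print(round(d_bytes_float32 / 2.0**20, 1))  # 81.4 (MiB)
```

So D alone takes about 163 MiB before any template or mask is read. Note also that np.empty leaves the diagonal and lower triangle uninitialized, since the loop only writes entries with ii < jj.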

Thank you so much for the help.

-Dave

On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:

> Send Pytables-users mailing list submissions to
>         pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>         pyt...@li...
>
> You can reach the person managing the list at
>         pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 4 Jan 2013 08:56:28 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
> To: pyt...@li...
> Message-ID:
>         <
> CAM...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I can't thank you guys enough for the help.  I was able to add the __iter__
> function to the table.py file and everything seems to be working great!
> I'm not quite as fast as I was when iterating right off a matrix, but pretty
> close.  I was at 555 comparisons per second, and now I'm at 420.
>
> I handled the problem I mentioned earlier by doing this, and it seems to
> work great:
>
> A = f.root.data.cols.A
> B = f.root.data.cols.B
>
> D = np.empty((len(A), len(A)))
> for (a1, b1, ii), (a2, b2, jj) in combinations(
>         izip(A, B, range(len(A))), 2):
>     D[ii, jj] = compare(a1, a2, b1, b2)
>
> Again, thanks a lot.
>
> -Dave
>
>
> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
>
> > Today's Topics:
> >
> >    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
> >    2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Thu, 3 Jan 2013 17:26:55 -0600
> > From: Anthony Scopatz <sc...@gm...>
> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> > To: Discussion list for PyTables
> >         <pyt...@li...>
> > Message-ID:
> >         <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=
> > Gz...@ma...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
> >
> > > Thanks a lot for the help so far guys!
> > >
> > > Looking at itertools, I found what I believe to be the perfect function
> > > for what I need, itertools.combinations. This appears to be a valid
> > > replacement to the method proposed.
> > >
> >
> > Yes, combinations is awesome!
> >
> >
> > >
> > > There is a small problem that I didn't mention: my compare function
> > > actually takes as inputs 2 columns from the table. Like so:
> > >
> > > D = np.empty((N_elements, N_elements))
> > > for ii in xrange(N_elements):
> > >     for jj in xrange(ii+1, N_elements):
> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> > >                             data['element2'][ii], data['element2'][jj])
> > >
> > > Is there an efficient way of using itertools with this structure?
> > >
> >
> > You can always make two other iterators for each column.  Since you have
> > two columns you would have 4 iterators.  I am not sure how fast this is
> > going to be but I am confident that there is definitely a way to do this in
> > one for-loop, which is going to be way faster than nested loops.
> >
> > Be Well
> > Anthony
> >
> >
> > >
> > >
> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> > >
> > >> Today's Topics:
> > >>
> > >>    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
> > >>
> > >>
> > >> ----------------------------------------------------------------------
> > >>
> > >> Message: 1
> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
> > >> From: Josh Ayers <jos...@gm...>
> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> > >> To: Discussion list for PyTables
> > >>         <pyt...@li...>
> > >> Message-ID:
> > >>         <
> > >> CAC...@ma...>
> > >> Content-Type: text/plain; charset="iso-8859-1"
> > >>
> > >> David,
> > >>
> > >> The change in issue 27 was only for iteration over a tables.Column
> > >> instance.  To use it, tweak Anthony's code as follows.  This will iterate
> > >> over the "element" column, as in your original example.
> > >>
> > >> Note also that this will only work with the development version of PyTables
> > >> available on github.  It will be very slow using the released v2.4.0.
> > >>
> > >>
> > >> from itertools import izip
> > >>
> > >> with tb.openFile(...) as f:
> > >>     data = f.root.data.cols.element
> > >>     data_i = iter(data)
> > >>     data_j = iter(data)
> > >>     data_i.next() # throw the first value away
> > >>     for i, j in izip(data_i, data_j):
> > >>         compare(i, j)
> > >>
> > >>
> > >> Hope that helps,
> > >> Josh
> > >>
> > >>
> > >>
> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
> > >>
> > >> > Hi David,
> > >> >
> > >> > Tables and table column iteration have been overhauled fairly recently
> > >> > [1].  So you might try creating two iterators, offset by one, and then
> > >> > doing the comparison.  I am hacking this out super quick so please forgive
> > >> > me:
> > >> >
> > >> > from itertools import izip
> > >> >
> > >> > with tb.openFile(...) as f:
> > >> >     data = f.root.data
> > >> >     data_i = iter(data)
> > >> >     data_j = iter(data)
> > >> >     data_i.next() # throw the first value away
> > >> >     for i, j in izip(data_i, data_j):
> > >> >         compare(i, j)
> > >> >
> > >> > You get the idea ;)
> > >> >
> > >> > Be Well
> > >> > Anthony
> > >> >
> > >> > 1. https://github.com/PyTables/PyTables/issues/27
> > >> >
> > >> >
> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> > >> >
> > >> >> I was hoping someone could help me out here.
> > >> >>
> > >> >> This is from a post I put up on StackOverflow,
> > >> >>
> > >> >> I have a fairly large dataset that I store in HDF5 and access using
> > >> >> PyTables. One operation I need to do on this dataset is pairwise
> > >> >> comparisons between each of the elements. This requires 2 loops, one to
> > >> >> iterate over each element, and an inner loop to iterate over every other
> > >> >> element. This operation thus looks at N(N-1)/2 comparisons.
> > >> >>
> > >> >> For fairly small sets I found it to be faster to dump the contents into a
> > >> >> multidimensional numpy array and then do my iteration. I run into problems
> > >> >> with large sets because of memory issues and need to access each element of
> > >> >> the dataset at run time.
> > >> >>
> > >> >> Putting the elements into an array gives me about 600 comparisons per
> > >> >> second, while operating on hdf5 data itself gives me about 300 comparisons
> > >> >> per second.
> > >> >>
> > >> >> Is there a way to speed this process up?
> > >> >>
> > >> >> Example follows (this is not my real code, just an example):
> > >> >>
> > >> >> *Small Set*:
> > >> >>
> > >> >>
> > >> >> with tb.openFile(h5_file, 'r') as f:
> > >> >>     data = f.root.data
> > >> >>
> > >> >>     N_elements = len(data)
> > >> >>     elements = np.empty((N_elements, int(1e5)))
> > >> >>
> > >> >>     # copy each row's 'element' field into memory
> > >> >>     for ii, d in enumerate(data):
> > >> >>         elements[ii] = d['element']
> > >> >>
> > >> >> D = np.empty((N_elements, N_elements))
> > >> >> for ii in xrange(N_elements):
> > >> >>     for jj in xrange(ii+1, N_elements):
> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
> > >> >>
> > >> >>  *Large Set*:
> > >> >>
> > >> >>
> > >> >> with tb.openFile(h5_file, 'r') as f:
> > >> >>     data = f.root.data
> > >> >>
> > >> >>     N_elements = len(data)
> > >> >>
> > >> >>     D = np.empty((N_elements, N_elements))
> > >> >>     for ii in xrange(N_elements):
> > >> >>         for jj in xrange(ii+1, N_elements):
> > >> >>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >>
> >
> ------------------------------------------------------------------------------
> > >> >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5,
> CSS,
> > >> >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills
> > current
> > >> >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
> > >> >> MVPs and experts. ON SALE this month only -- learn more at:
> > >> >> http://p.sf.net/sfu/learnmore_122712
> > >> >> _______________________________________________
> > >> >> Pytables-users mailing list
> > >> >> Pyt...@li...
> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users
> > >> >>
> > >> >>
> > >> End of Pytables-users Digest, Vol 80, Issue 3
> > >> *********************************************
> > >>
> > ------------------------------
> >
> > Message: 2
> > Date: Thu, 3 Jan 2013 17:30:59 -0600
> > From: Anthony Scopatz <sc...@gm...>
> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
> > To: Discussion list for PyTables
> >         <pyt...@li...>
> > Message-ID:
> >         <
> > CAP...@ma...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Josh is right that you can just edit the code by hand (which works but
> > sucks).
> >
> > However, on Windows -- on the rare occasion when I also have to develop on
> > it -- I typically use a distribution that includes a compiler, cython,
> > hdf5, and pytables already and then I install my development version from
> > github OVER this.  I recommend either EPD or Anaconda, though other
> > distributions listed here [1] might also work.
> >
> > Be well
> > Anthony
> >
> > 1. http://numfocus.org/projects-2/software-distributions/
> >
> >
> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
> >
> > > The change was in pure Python code, so you should be able to just paste in
> > > the changes to your local copy.  Start with the table.Column.__iter__
> > > method (lines 3296-3310) here.
> > >
> > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
> > >
> > > It needs to be modified slightly because it uses some additional features
> > > that aren't available in the released version (the out=buf_slice argument
> > > to table.read).  The following should work.
> > >
> > > def __iter__(self):
> > >     table = self.table
> > >     itemsize = self.dtype.itemsize
> > >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> > >     max_row = len(self)
> > >     for start_row in xrange(0, len(self), nrowsinbuf):
> > >         end_row = min([start_row + nrowsinbuf, max_row])
> > >         buf = table.read(start_row, end_row, 1, field=self.pathname)
> > >         for row in buf:
> > >             yield row
> > >
> > >
> > > I haven't tested this, but I think it will work.
> > >
> > > Josh
> > >
> > >
> > >
> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
> > >
> > >> I apologize if I'm starting to sound helpless, but I'm forced to work on
> > >> Windows 7 at work and have never had luck compiling python source
> > >> successfully.  I have had to rely on precompiled binaries and now it's
> > >> biting me in the butt.
> > >>
> > >> Is there any quick fix I can do to improve this iteration using v2.4.0?
> > >>
> > >>
> > >> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
> > >>
> > >>> Today's Topics:
> > >>>
> > >>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
> > >>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
> > >>>
> > >>>
> > >>>
> ----------------------------------------------------------------------
> > >>>
> > >>> Message: 1
> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
> > >>> From: David Reed <dav...@gm...>
> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
> > >>> To: pyt...@li...
> > >>> Message-ID:
> > >>>         <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=
> > >>> ev...@ma...>
> > >>> Content-Type: text/plain; charset="iso-8859-1"
> > >>>
> > >>> Thanks Anthony, but unless I'm missing something I don't think that method
> > >>> will work since this will only be comparing the ith element with the (i+1)th
> > >>> element.  I still need 2 for loops, right?
> > >>>
> > >>> Using itertools might speed things up though. I've never used them, so I
> > >>> will give it a shot and let you know how it goes.  Looks like I need to
> > >>> download the latest release before I do that too.  Thanks for the help.
> > >>>
> > >>> -Dave
> > >>>
> > >>>
> > >>>
> > >>> ------------------------------
> > >>>
> > >>> Message: 2
> > >>> Date: Thu, 3 Jan 2013 15:17:01 -0500
> > >>> From: David Reed <dav...@gm...>
> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> > >>> To: pyt...@li...
> > >>> Message-ID:
> > >>>         <
> > >>> CAM...@ma...>
> > >>> Content-Type: text/plain; charset="iso-8859-1"
> > >>>
> > >>> Thanks a lot for the help so far guys!
> > >>>
> > >>> Looking at itertools, I found what I believe to be the perfect function for
> > >>> what I need, itertools.combinations. This appears to be a valid replacement
> > >>> to the method proposed.
> > >>>
> > >>> There is a small problem that I didn't mention: my compare function
> > >>> actually takes as inputs 2 columns from the table. Like so:
> > >>>
> > >>> D = np.empty((N_elements, N_elements))
> > >>> for ii in xrange(N_elements):
> > >>>     for jj in xrange(ii+1, N_elements):
> > >>>         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> > >>>                             data['element2'][ii], data['element2'][jj])
> > >>>
> > >>> Is there an efficient way of using itertools with this structure?
> > >>>
> > >>>
> > >>> On Thu, Jan 3, 2013 at 1:29 PM, <
> > >>> pyt...@li...> wrote:
> > >>>
> > >>> > Send Pytables-users mailing list submissions to
> > >>> >         pyt...@li...
> > >>> >
> > >>> > To subscribe or unsubscribe via the World Wide Web, visit
> > >>> >
> https://lists.sourceforge.net/lists/listinfo/pytables-users
> > >>> > or, via email, send a message with subject or body 'help' to
> > >>> >         pyt...@li...
> > >>> >
> > >>> > You can reach the person managing the list at
> > >>> >         pyt...@li...
> > >>> >
> > >>> > When replying, please edit your Subject line so it is more specific
> > >>> > than "Re: Contents of Pytables-users digest..."
> > >>> >
> > >>> >
> > >>> > Today's Topics:
> > >>> >
> > >>> >    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
> > >>> >
> > >>> >
> > >>> >
> > ----------------------------------------------------------------------
> > >>> >
> > >>> > Message: 1
> > >>> > Date: Thu, 3 Jan 2013 10:29:33 -0800
> > >>> > From: Josh Ayers <jos...@gm...>
> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using
> PyTables
> > >>> > To: Discussion list for PyTables
> > >>> >         <pyt...@li...>
> > >>> > Message-ID:
> > >>> >         <
> > >>> > CAC...@ma...
> >
> > >>> > Content-Type: text/plain; charset="iso-8859-1"
> > >>> >
> > >>> > David,
> > >>> >
> > >>> > The change in issue 27 was only for iteration over a tables.Column
> > >>> > instance.  To use it, tweak Anthony's code as follows.  This will
> > >>> iterate
> > >>> > over the "element" column, as in your original example.
> > >>> >
> > >>> > Note also that this will only work with the development version of
> > >>> PyTables
> > >>> > available on github.  It will be very slow using the released
> v2.4.0.
> > >>> >
> > >>> >
> > >>> > from itertools import izip
> > >>> >
> > >>> > with tb.openFile(...) as f:
> > >>> >     data = f.root.data.cols.element
> > >>> >     data_i = iter(data)
> > >>> >     data_j = iter(data)
> > >>> >     data_i.next() # throw the first value away
> > >>> >     for i, j in izip(data_i, data_j):
> > >>> >         compare(i, j)
> > >>> >
> > >>> >
> > >>> > Hope that helps,
> > >>> > Josh
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...
> >
> > >>> wrote:
> > >>> >
> > >>> > > HI David,
> > >>> > >
> > >>> > > Tables and table column iteration have been overhauled fairly
> > >>> recently
> > >>> > > [1].  So you might try creating two iterators, offset by one, and
> > >>> then
> > >>> > > doing the comparison.  I am hacking this out super quick so
> please
> > >>> > forgive
> > >>> > > me:
> > >>> > >
> > >>> > > from itertools import izip
> > >>> > >
> > >>> > > with tb.openFile(...) as f:
> > >>> > >     data = f.root.data
> > >>> > >     data_i = iter(data)
> > >>> > >     data_j = iter(data)
> > >>> > >     data_i.next() # throw the first value away
> > >>> > >     for i, j in izip(data_i, data_j):
> > >>> > >         compare(i, j)
> > >>> > >
> > >>> > > You get the idea ;)
> > >>> > >
> > >>> > > Be Well
> > >>> > > Anthony
> > >>> > >
> > >>> > > 1. https://github.com/PyTables/PyTables/issues/27
> > >>> > >
> > >>> > >
> > >>> > > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <
> dav...@gm...
> > >
> > >>> > wrote:
> > >>> > >
> > >>> > >> I was hoping someone could help me out here.
> > >>> > >>
> > >>> > >> This is from a post I put up on StackOverflow,
> > >>> > >>
> > >>> > >> I am have a fairly large dataset that I store in HDF5 and access
> > >>> using
> > >>> > >> PyTables. One operation I need to do on this dataset are
> pairwise
> > >>> > >> comparisons between each of the elements. This requires 2 loops,
> > >>> one to
> > >>> > >> iterate over each element, and an inner loop to iterate over
> every
> > >>> other
> > >>> > >> element. This operation thus looks at N(N-1)/2 comparisons.
> > >>> > >>
> > >>> > >> For fairly small sets I found it to be faster to dump the
> contents
> > >>> into
> > >>> > a
> > >>> > >> multdimensional numpy array and then do my iteration. I run into
> > >>> > problems
> > >>> > >> with large sets because of memory issues and need to access each
> > >>> > element of
> > >>> > >> the dataset at run time.
> > >>> > >>
> > >>> > >> Putting the elements into an array gives me about 600
> comparisons
> > >>> per
> > >>> > >> second, while operating on hdf5 data itself gives me about 300
> > >>> > comparisons
> > >>> > >> per second.
> > >>> > >>
> > >>> > >> Is there a way to speed this process up?
> > >>> > >>
> > >>> > >> Example follows (this is not my real code, just an example):
> > >>> > >>
> > >>> > >> *Small Set*:
> > >>> > >>
> > >>> > >>
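> > >>> > >> import numpy as np
> > >>> > >> import tables as tb
> > >>> > >>
> > >>> > >> with tb.openFile(h5_file, 'r') as f:
> > >>> > >>     data = f.root.data
> > >>> > >>
> > >>> > >>     N_elements = len(data)
> > >>> > >>     # array dimensions must be integers, not the float 1e5
> > >>> > >>     elements = np.empty((N_elements, int(1e5)))
> > >>> > >>
> > >>> > >>     # copy each row's 'element' field into memory
> > >>> > >>     for ii, d in enumerate(data):
> > >>> > >>         elements[ii] = d['element']
> > >>> > >>
> > >>> > >> # only the upper triangle (ii < jj) of D is filled
> > >>> > >> D = np.empty((N_elements, N_elements))
> > >>> > >> for ii in xrange(N_elements):
> > >>> > >>     for jj in xrange(ii+1, N_elements):
> > >>> > >>         D[ii, jj] = compare(elements[ii], elements[jj])
> > >>> > >>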
> > >>> > >>
> > >>> > >>  *Large Set*:
> > >>> > >>
> > >>> > >>
> > >>> > >> with tb.openFile(h5_file, 'r') as f:
> > >>> > >>     data = f.root.data
> > >>> > >>
> > >>> > >>     N_elements = len(data)
> > >>> > >>
> > >>> > >>     # each access below reads one row of the 'element' column from disk
> > >>> > >>     D = np.empty((N_elements, N_elements))
> > >>> > >>     for ii in xrange(N_elements):
> > >>> > >>         for jj in xrange(ii+1, N_elements):
> > >>> > >>             D[ii, jj] = compare(data.cols.element[ii],
> > >>> > >>                                 data.cols.element[jj])
> > >>> > >>
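
One possible middle ground between the two versions above is to read the column in fixed-size blocks, so that at most two blocks are held in memory while each row is read far fewer times than in the per-element loop. The following is only a sketch under stated assumptions: `blockwise_pairs` and `read_block` are hypothetical names, a plain Python list stands in for the on-disk column, and `compare` is a placeholder for the real comparison function.

```python
def blockwise_pairs(n, block, read_block):
    """Yield (ii, jj, a, b) for every ii < jj, holding at most two blocks.

    read_block(start, stop) is a hypothetical helper that returns rows
    start..stop-1 as a sequence.
    """
    for i0 in range(0, n, block):
        i1 = min(i0 + block, n)
        rows_i = read_block(i0, i1)
        for j0 in range(i0, n, block):
            j1 = min(j0 + block, n)
            # Reuse the already loaded block on the diagonal.
            rows_j = rows_i if j0 == i0 else read_block(j0, j1)
            for ii in range(i0, i1):
                # On the diagonal block start at ii + 1, otherwise at j0.
                for jj in range(max(ii + 1, j0), j1):
                    yield ii, jj, rows_i[ii - i0], rows_j[jj - j0]


# Stand-in data: a list plays the role of the on-disk column.
column = [3, 1, 4, 1, 5, 9, 2]


def read_block(start, stop):
    return column[start:stop]


def compare(a, b):
    # Placeholder comparison, not the one used in the thread.
    return abs(a - b)


n = len(column)
D = {}
for ii, jj, a, b in blockwise_pairs(n, 3, read_block):
    D[ii, jj] = compare(a, b)
```

With PyTables, `read_block` could plausibly be a sliced column read such as `data.cols.element[start:stop]`. The block size trades memory use against the number of reads; no timing is claimed for this sketch.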
> > >>> > >> _______________________________________________
> > >>> > >> Pytables-users mailing list
> > >>> > >> Pyt...@li...
> > >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users
> > >>> >
> > >>> > End of Pytables-users Digest, Vol 80, Issue 3
> > >>> > *********************************************
> > >>>
> > >>> End of Pytables-users Digest, Vol 80, Issue 4
> > >>> *********************************************
> >
> > End of Pytables-users Digest, Vol 80, Issue 8
> > *********************************************
>
> End of Pytables-users Digest, Vol 80, Issue 9
> *********************************************
>