Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 7

On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> wrote:

> Hi Josh,
>
> Here is my __iter__ code:
>
> def __iter__(self):
>     table = self.table
>     itemsize = self.dtype.itemsize
>     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>     max_row = len(self)
>     for start_row in xrange(0, len(self), nrowsinbuf):
>         end_row = min([start_row + nrowsinbuf, max_row])
>         buf = table.read(start_row, end_row, 1, field=self.pathname)
>         for row in buf:
>             yield row
>
> It does look different, I will try swapping in the code from github and
> see what happens.
>

Yes, please let us know how that goes!  Otherwise send the list both the
test data generator script and the script that fails.

Be Well
Anthony
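
For readers following along, here is a minimal sketch of the buffered iteration pattern that the quoted __iter__ implements.  It is not the PyTables implementation: read_rows() is a hypothetical stand-in for table.read(start, stop, 1, field=...), the column is a small in-memory numpy array, and the max(1, ...) guard is an addition of this sketch.

```python
import numpy as np


def iter_rows_buffered(read_rows, nrows, itemsize, buffer_size=1024 * 1024):
    """Yield rows one at a time, reading at most one buffer's worth per call.

    read_rows(start, stop) is a hypothetical stand-in for
    table.read(start, stop, 1, field=...).
    """
    # Guard added in this sketch: always read at least one row, so a row
    # larger than the buffer does not produce a zero step for range().
    nrowsinbuf = max(1, buffer_size // itemsize)
    for start in range(0, nrows, nrowsinbuf):
        stop = min(start + nrowsinbuf, nrows)
        for row in read_rows(start, stop):
            yield row


# Stand-in for an on-disk column: 20 rows of 17 x 9600 booleans.
column = np.zeros((20, 17, 9600), dtype=np.bool_)
chunk_sizes = []


def read_rows(start, stop):
    chunk_sizes.append(stop - start)  # record how many rows each read pulls in
    return column[start:stop]


n_seen = sum(1 for _ in iter_rows_buffered(read_rows, len(column), column[0].nbytes))
print(n_seen, chunk_sizes)
```

With a 1 MiB buffer and 163200-byte rows, each read should cover at most 6 rows, so only a few rows are resident at any time.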

>
>
> On Mon, Feb 4, 2013 at 9:59 AM, <
> pyt...@li...> wrote:
>
>> Send Pytables-users mailing list submissions to
>>         pyt...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>> or, via email, send a message with subject or body 'help' to
>>         pyt...@li...
>>
>> You can reach the person managing the list at
>>         pyt...@li...
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Pytables-users digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers)
>>    2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 1 Feb 2013 14:08:47 -0800
>> From: Josh Ayers <jos...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
>> To: Discussion list for PyTables
>>         <pyt...@li...>
>> Message-ID:
>>         <CACOB4aPG4NZ6b2a3v=
>> 1Ue...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> David,
>>
>> You added a custom version of table.Column.__iter__, correct?  Could you
>> also include that along with the script to reproduce the error?
>>
>> It seems like the problem may be in the 'nrowsinbuf' calculation - see
>> [1].  Each of your rows is 17 x 9600 = 163200 bytes.  If you're using the
>> default 1MB value for IO_BUFFER_SIZE, it should be reading in chunks of 6
>> rows.  Instead, it's reading the entire table.
>>
>> [1]:
>> https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296
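
The arithmetic in the message above can be checked with a few lines of numpy.  This is only a sketch: IO_BUFFER_SIZE is assumed to be exactly 1 MiB (the "default 1MB value" mentioned above), and the row dtype is taken from the dtypeField reported elsewhere in the thread.

```python
import numpy as np

IO_BUFFER_SIZE = 1024 * 1024  # assumed: the 1 MB default mentioned above
row_dtype = np.dtype((np.bool_, (17, 9600)))  # one row of the column

row_bytes = row_dtype.itemsize  # bytes per row: 17 * 9600 one-byte booleans
nrowsinbuf = IO_BUFFER_SIZE // row_bytes  # whole rows that fit in one buffer
print(row_bytes, nrowsinbuf)
```

Under these assumptions a correct 'nrowsinbuf' is 6, so each call to table.read() should allocate roughly 1 MB rather than the whole column.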
>>
>>
>>
>> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...>
>> wrote:
>>
>> >
>> >
>> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...>
>> wrote:
>> >
>> >> at the error:
>> >>
>> >> result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >>
>> >> nrows = 4620 and dtypeField is ('bool', (17, 9600))
>> >>
>> >> I'm not sure what that means as a dtype, but that's what it is.
>> >>
>> >> Forgive me if I'm being totally naive, but I thought the whole point of
>> >> __iter__ with PyTables was to do iteration on the fly, so there is no
>> >> preallocation.
>> >>
>> >
>> > Nope you are not being naive at all.  That is the point.
>> >
>> >
>> >>  If you have any ideas on this I'm all ears.
>> >>
>> >
>> > If you could send a minimal script which reproduces this error, that
>> > would help a lot.
>> >
>> > Be Well
>> > Anthony
>> >
>> >
>> >>
>> >>
>> >>  Thanks again.
>> >>
>> >> Dave
>> >>
>> >>
>> >> On Fri, Feb 1, 2013 at 3:45 PM, <
>> >> pyt...@li...> wrote:
>> >>
>> >>>
>> >>>
>> >>> Today's Topics:
>> >>>
>> >>>    1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
>> >>>
>> >>>
>> >>> ----------------------------------------------------------------------
>> >>>
>> >>> Message: 1
>> >>> Date: Fri, 1 Feb 2013 14:44:40 -0600
>> >>> From: Anthony Scopatz <sc...@gm...>
>> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>> >>> To: Discussion list for PyTables
>> >>>         <pyt...@li...>
>> >>> Message-ID:
>> >>>         <
>> >>> CAP...@ma...>
>> >>> Content-Type: text/plain; charset="iso-8859-1"
>> >>>
>> >>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...>
>> >>> wrote:
>> >>>
>> >>> > Hi Anthony,
>> >>> >
>> >>> > Thanks for the reply.
>> >>> >
>> >>> > I honestly don't know how to monitor my Python memory usage, but I'm
>> >>> > sure that it's caused by running out of memory.
>> >>> >
>> >>>
>> >>> Well, I would just run top or process monitor or something while
>> >>> running the python script to see what happens to memory usage as the
>> >>> script chugs along...
>> >>>
>> >>>
>> >>> >  I'm just trying to find out how to fix it.  My HDF5 table has 4620
>> >>> > rows and the column I'm iterating over is a 17x9600 boolean matrix.
>> >>> > The __iter__ method is preallocating an array that is this size, which
>> >>> > appears to be the root of the error.  I was hoping there is a fix
>> >>> > somewhere in here to not have to do this preallocation.
>> >>> >
>> >>>
>> >>> So a 17x9600 boolean matrix should only be 0.155 MB in space.  4620 of
>> >>> these is ~760 MB.  If you have 2 GB of memory and you are iterating
>> >>> over 2 of these (templates & masks) it is conceivable that you are just
>> >>> running out of memory.  Maybe there is a way that __iter__ could not
>> >>> preallocate something that is basically a temporary.  What is the dtype
>> >>> of the templates array?
>> >>>
>> >>> Be Well
>> >>> Anthony
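
As a rough check of the estimate above, and of what a dtype like ('bool', (17, 9600)) means, here is a small numpy sketch.  It only uses the numbers quoted in the thread (4620 rows, 17 x 9600 booleans per row); it does not touch PyTables.

```python
import numpy as np

dtype_field = np.dtype(("bool", (17, 9600)))  # the dtypeField from the error
nrows = 4620

# A subarray dtype is folded into the array's shape when the array is created,
# so empty(shape=n, dtype=dtype_field) is an (n, 17, 9600) boolean array.
small = np.empty(shape=2, dtype=dtype_field)
print(small.shape, small.dtype)

# Size of the allocation that numpy.empty(shape=nrows, dtype=dtypeField) asks for.
total_bytes = nrows * dtype_field.itemsize
print(total_bytes, total_bytes / 2**20)
```

So reading the whole column at once asks for 753,984,000 bytes (about 719 MiB) in a single block, which is in line with the rough figure above and has to be found twice when two such columns are iterated together.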
>> >>>
>> >>>
>> >>> >
>> >>> > Thanks again.
>> >>>
>> >>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Mon, 4 Feb 2013 09:58:53 -0500
>> From: David Reed <dav...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6
>> To: pyt...@li...
>> Message-ID:
>>         <CAM6XA7=
>> h50...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi Anthony,
>>
>> Sorry to just get back to you. I can send a script, should I send a script
>> that creates some fake data as well?
>>
>> -Dave
>>
>>
>> On Fri, Feb 1, 2013 at 4:50 PM, <
>> pyt...@li...> wrote:
>>
>> >
>> >
>> > Today's Topics:
>> >
>> >    1. Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony Scopatz)
>> >
>> >
>> > ----------------------------------------------------------------------
>> >
>> > Message: 1
>> > Date: Fri, 1 Feb 2013 15:50:11 -0600
>> > From: Anthony Scopatz <sc...@gm...>
>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
>> > To: Discussion list for PyTables
>> >         <pyt...@li...>
>> > Message-ID:
>> >         <
>> > CAP...@ma...>
>> > Content-Type: text/plain; charset="iso-8859-1"
>> >
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Fri, Feb 1, 2013 at 11:12 AM, <
>> > >> > pyt...@li...> wrote:
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> Today's Topics:
>> > >> >>
>> > >> >>    1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony Scopatz)
>> > >> >>
>> > >> >>
>> > >> >>
>> > ----------------------------------------------------------------------
>> > >> >>
>> > >> >> Message: 1
>> > >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600
>> > >> >> From: Anthony Scopatz <sc...@gm...>
>> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80,
>> Issue 9
>> > >> >> To: Discussion list for PyTables
>> > >> >>         <pyt...@li...>
>> > >> >> Message-ID:
>> > >> >>         <
>> > >> >>
>> CAP...@ma...>
>> > >> >> Content-Type: text/plain; charset="iso-8859-1"
>> > >> >>
>> > >> >> Hi David,
>> > >> >>
>> > >> >> Sorry, I haven't had a ton of time recently.  You seem to be getting
>> > >> >> a memory error on creating a numpy array.  This kind of thing
>> > >> >> typically happens when you are out of memory.  Does this seem to be
>> > >> >> the case with you?  When this dies, is your memory usage at 100%?  If
>> > >> >> so, this algorithm might require a little tweaking...
>> > >> >>
>> > >> >> Be Well
>> > >> >> Anthony
>> > >> >>
>> > >> >>
>> > >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <
>> dav...@gm...>
>> > >> >> wrote:
>> > >> >>
>> > >> >> > I'm still having problems with this one.  I can't tell if this is
>> > >> >> > something dumb I'm doing with itertools, or if it's something in
>> > >> >> > PyTables.
>> > >> >> >
>> > >> >> > Would appreciate any help.
>> > >> >> >
>> > >> >> > Thanks
>> > >> >> >
>> > >> >> >
>> > >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed <
>> > dav...@gm...
>> > >> >> >wrote:
>> > >> >> >
>> > >> >> >> I think I have to reopen this issue.  I have been running fine for
>> > >> >> >> a while using the combinations method from itertools, but have
>> > >> >> >> recently run into a memory error since I quadrupled the size of the
>> > >> >> >> HDF5 file.
>> > >> >> >>
>> > >> >> >> Here is my code again:
>> > >> >> >>
>> > >> >> >> from itertools import combinations, izip
>> > >> >> >>
>> > >> >> >> with tb.openFile(h5_all, 'r') as f:
>> > >> >> >>     irises = f.root.irises
>> > >> >> >>
>> > >> >> >>     templates = f.root.irises.cols.templates
>> > >> >> >>     masks = f.root.irises.cols.masks1
>> > >> >> >>
>> > >> >> >>     N_irises = len(irises)
>> > >> >> >>     index = np.ones((20 * 480), np.bool)
>> > >> >> >>
>> > >> >> >>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> > >> >> >>     D = np.empty((N_irises, N_irises))
>> > >> >> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(
>> > >> >> >>             izip(templates, masks, range(N_irises)), 2):
>> > >> >> >>         # print ii
>> > >> >> >>         D[ii, jj] = ham_dist(
>> > >> >> >>             t1[8, index],
>> > >> >> >>             t2[:, index],
>> > >> >> >>             m1[8, index],
>> > >> >> >>             m2[:, index],
>> > >> >> >>         )
>> > >> >> >>
>> > >> >> >> And here is the error:
>> > >> >> >>
>> > >> >> >> In [10]: get_hd3()
>> > >> >> >> 10669890 Comparisons
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> ---------------------------------------------------------------------------
>> > >> >> >> MemoryError                               Traceback (most recent call last)
>> > >> >> >> <ipython-input-10-cfb255ce7bd1> in <module>()
>> > >> >> >> ----> 1 get_hd3()
>> > >> >> >>
>> > >> >> >>     118                 print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> > >> >> >>     119                 D = np.empty((N_irises, N_irises))
>> > >> >> >> --> 120                 for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>> > >> >> >>     121                         # print ii
>> > >> >> >>     122                         D[ii, jj] = ham_dist(
>> > >> >> >>
>> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
>> > >> >> >>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
>> > >> >> >>    3275             end_row = min([start_row + nrowsinbuf, max_row])
>> > >> >> >> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
>> > >> >> >>    3277             for row in buf:
>> > >> >> >>    3278                 yield row
>> > >> >> >>
>> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
>> > >> >> >>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
>> > >> >> >>    1773
>> > >> >> >> -> 1774         arr = self._read(start, stop, step, field)
>> > >> >> >>    1775         return internal_to_flavor(arr, self.flavor)
>> > >> >> >>    1776
>> > >> >> >>
>> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
>> > >> >> >>    1719         if field:
>> > >> >> >>    1720             # Create a container for the results
>> > >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>> > >> >> >>    1722         else:
>> > >> >> >>    1723             # Recarray case
>> > >> >> >>
>> > >> >> >> MemoryError:
>> > >> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
>> > >> >> >>    1720             # Create a container for the results
>> > >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>> > >> >> >>    1722         else:
>> > >> >> >> Also, if you guys see any performance problems in my code, please
>> > >> >> >> let me know.
>> > >> >> >>
>> > >> >> >> Thank you so much for the help.
>> > >> >> >>
>> > >> >> >> -Dave
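
For scale, the figures in this message can be reproduced with plain arithmetic.  The sketch below assumes N_irises = 4620 (the row count reported elsewhere in the thread) and that np.empty() uses its default float64 dtype for D.

```python
import numpy as np

n_irises = 4620  # assumed: the table's row count quoted in the thread

# Number of unordered pairs visited by combinations(..., 2).
n_pairs = n_irises * (n_irises - 1) // 2

# Memory taken by the result matrix D = np.empty((N_irises, N_irises)).
d_bytes = n_irises * n_irises * np.dtype(np.float64).itemsize

print(n_pairs, d_bytes, d_bytes / 2**20)
```

The pair count matches the "10669890 Comparisons" line printed before the traceback, and D by itself accounts for roughly 163 MiB on top of whatever the column reads allocate.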
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> On Fri, Jan 4, 2013 at 8:57 AM, <
>> > >> >> >> pyt...@li...> wrote:
>> > >> >> >>
>> > >> >> >>>
>> > >> >> >>>
>> > >> >> >>> Today's Topics:
>> > >> >> >>>
>> > >> >> >>>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
>> > >> >> >>>
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> ----------------------------------------------------------------------
>> > >> >> >>>
>> > >> >> >>> Message: 1
>> > >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500
>> > >> >> >>> From: David Reed <dav...@gm...>
>> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80,
>> > Issue
>> > >> 8
>> > >> >> >>> To: pyt...@li...
>> > >> >> >>> Message-ID:
>> > >> >> >>>         <
>> > >> >> >>>
>> > CAM...@ma...
>> > >> >
>> > >> >> >>> Content-Type: text/plain; charset="iso-8859-1"
>> > >> >> >>>
>> > >> >> >>> I can't thank you guys enough for the help.  I was able to add the
>> > >> >> >>> __iter__ function to the table.py file and everything seems to be
>> > >> >> >>> working great!  I'm not quite as fast as I was with iterating right
>> > >> >> >>> off a matrix, but pretty close.  I was at 555 comparisons per
>> > >> >> >>> second, and now I'm at 420.
>> > >> >> >>>
>> > >> >> >>> I handled the problem I mentioned earlier by doing this, and it
>> > >> >> >>> seems to work great:
>> > >> >> >>>
>> > >> >> >>> A = f.root.data.cols.A
>> > >> >> >>> B = f.root.data.cols.B
>> > >> >> >>>
>> > >> >> >>> D = np.empty((len(A), len(A)))
>> > >> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(
>> > >> >> >>>         izip(A, B, range(len(A))), 2):
>> > >> >> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
>> > >> >> >>>
>> > >> >> >>> Again, thanks a lot.
>> > >> >> >>>
>> > >> >> >>> -Dave
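
A self-contained Python 3 version of the pattern above might look like the sketch below (zip replaces izip).  The arrays A and B are small in-memory stand-ins for the two table columns, and hamming() is a hypothetical compare() written only for illustration; neither comes from the thread.

```python
from itertools import combinations

import numpy as np


def hamming(a1, a2, b1, b2):
    """Hypothetical compare(): fraction of differing bits where both masks are set."""
    valid = b1 & b2
    n_valid = valid.sum()
    return ((a1 ^ a2) & valid).sum() / n_valid if n_valid else np.nan


rng = np.random.default_rng(0)
A = rng.random((5, 8)) > 0.5      # stand-in for f.root.data.cols.A
B = np.ones((5, 8), dtype=bool)   # stand-in for f.root.data.cols.B

# Only the upper triangle (ii < jj) is filled; everything else stays NaN.
D = np.full((len(A), len(A)), np.nan)
for (a1, b1, ii), (a2, b2, jj) in combinations(zip(A, B, range(len(A))), 2):
    D[ii, jj] = hamming(a1, a2, b1, b2)

print(D)
```

One caveat worth keeping in mind: the equivalent code given in the itertools documentation starts with pool = tuple(iterable), so combinations() holds every (row, mask, index) triple in memory at once, however lazily the columns themselves are read.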
>> > >> >> >>>
>> > >> >> >>>
>> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <
>> > >> >> >>> pyt...@li...> wrote:
>> > >> >> >>>
>> > >> >> >>> >
>> > >> >> >>> >
>> > >> >> >>> > Today's Topics:
>> > >> >> >>> >
>> > >> >> >>> >    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony
>> > >> Scopatz)
>> > >> >> >>> >    2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony
>> > >> Scopatz)
>> > >> >> >>> >
>> > >> >> >>> >
>> > >> >> >>> >
>> > >> >>
>> > ----------------------------------------------------------------------
>> > >> >> >>> >
>> > >> >> >>> > Message: 1
>> > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
>> > >> >> >>> > From: Anthony Scopatz <sc...@gm...>
>> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80,
>> > >> Issue 3
>> > >> >> >>> > To: Discussion list for PyTables
>> > >> >> >>> >         <pyt...@li...>
>> > >> >> >>> > Message-ID:
>> > >> >> >>> >         <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=
>> > >> >> >>> > Gz...@ma...>
>> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
>> > >> >> >>> >
>> > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <
>> > >> dav...@gm...>
>> > >> >> >>> wrote:
>> > >> >> >>> >
>> > >> >> >>> > > Thanks a lot for the help so far guys!
>> > >> >> >>> > >
>> > >> >> >>> > > Looking at itertools, I found what I believe to be the perfect
>> > >> >> >>> > > function for what I need, itertools.combinations. This appears to
>> > >> >> >>> > > be a valid replacement to the method proposed.
>> > >> >> >>> > >
>> > >> >> >>> >
>> > >> >> >>> > Yes, combinations is awesome!
>> > >> >> >>> >
>> > >> >> >>> >
>> > >> >> >>> > >
>> > >> >> >>> > > There is a small problem that I didn't mention: my compare
>> > >> >> >>> > > function actually takes as inputs 2 columns from the table. Like
>> > >> >> >>> > > so:
>> > >> >> >>> > >
>> > >> >> >>> > > D = np.empty((N_elements, N_elements))
>> > >> >> >>> > > for ii in xrange(N_elements):
>> > >> >> >>> > >     for jj in xrange(ii+1, N_elements):
>> > >> >> >>> > >         D[ii, jj] = compare(data['element1'][ii],
>> > >> >> >>> > >                             data['element1'][jj],
>> > >> >> >>> > >                             data['element2'][ii],
>> > >> >> >>> > >                             data['element2'][jj])
>> > >> >> >>> > >
>> > >> >> >>> > > Is there an efficient way of using itertools with this structure?
>> > >> >> >>> > >
>> > >> >> >>> >
>> > >> >> >>> > You can always make two other iterators for each column.  Since you
>> > >> >> >>> > have two columns you would have 4 iterators.  I am not sure how
>> > >> >> >>> > fast this is going to be but I am confident that there is
>> > >> >> >>> > definitely a way to do this in one for-loop, which is going to be
>> > >> >> >>> > way faster than nested loops.
>> > >> >> >>> >
>> > >> >> >>> > Be Well
>> > >> >> >>> > Anthony
>> > >> >> >>> >
>> > >> >> >>> >
>> > >> >> >>> > >
>> > >> >> >>> > >
>> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, <
>> > >> >> >>> > > pyt...@li...> wrote:
>> > >> >> >>> > >
>> > >> >> >>> > >>
>> > >> >> >>> > >>
>> > >> >> >>> > >> Today's Topics:
>> > >> >> >>> > >>
>> > >> >> >>> > >>    1. Re: Nested Iteration of HDF5 using PyTables (Josh
>> > Ayers)
>> > >> >> >>> > >>
>> > >> >> >>> > >>
>> > >> >> >>> > >>
>> > >> >> >>>
>> > >>
>> ----------------------------------------------------------------------
>> > >> >> >>> > >>
>> > >> >> >>> > >> Message: 1
>> > >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
>> > >> >> >>> > >> From: Josh Ayers <jos...@gm...>
>> > >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5
>> using
>> > >> >> >>> PyTables
>> > >> >> >>> > >> To: Discussion list for PyTables
>> > >> >> >>> > >>         <pyt...@li...>
>> > >> >> >>> > >> Message-ID:
>> > >> >> >>> > >>         <
>> > >> >> >>> > >>
>> > >> >>
>> CAC...@ma...>
>> > >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1"
>> > >> >> >>> > >>
>> > >> >> >>> > >> David,
>> > >> >> >>> > >>
>> > >> >> >>> > >> The change in issue 27 was only for iteration over a tables.Column
>> > >> >> >>> > >> instance.  To use it, tweak Anthony's code as follows.  This will
>> > >> >> >>> > >> iterate over the "element" column, as in your original example.
>> > >> >> >>> > >>
>> > >> >> >>> > >> Note also that this will only work with the development version of
>> > >> >> >>> > >> PyTables available on github.  It will be very slow using the
>> > >> >> >>> > >> released v2.4.0.
>> > >> >> >>> > >>
>> > >> >> >>> > >>
>> > >> >> >>> > >> from itertools import izip
>> > >> >> >>> > >>
>> > >> >> >>> > >> with tb.openFile(...) as f:
>> > >> >> >>> > >>     data = f.root.data.cols.element
>> > >> >> >>> > >>     data_i = iter(data)
>> > >> >> >>> > >>     data_j = iter(data)
>> > >> >> >>> > >>     data_i.next() # throw the first value away
>> > >> >> >>> > >>     for i, j in izip(data_i, data_j):
>> > >> >> >>> > >>         compare(i, j)
>> > >> >> >>> > >>
>> > >> >> >>> > >>
>> > >> >> >>> > >> Hope that helps,
>> > >> >> >>> > >> Josh
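
For anyone trying the snippet above on Python 3, where izip and the .next() method no longer exist, a minimal equivalent is sketched below.  The list is a stand-in for the column iterator; no PyTables call is involved.

```python
data = [10, 20, 30, 40]  # stand-in for f.root.data.cols.element

data_i = iter(data)
data_j = iter(data)
next(data_i)  # throw the first value away (Python 3 spelling of data_i.next())

# zip() stops at the shorter iterator, so this pairs each element with
# the one immediately before it.
pairs = list(zip(data_i, data_j))
print(pairs)
```

Note that two iterators offset by one visit only the N-1 adjacent pairs, not all N(N-1)/2 pairs from the original question, which is presumably why the thread moves on to itertools.combinations.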
>> > >> >> >>> > >>
>> > >> >> >>> > >>
>> > >> >> >>> > >>
>> > >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <
>> > >> >> sc...@gm...>
>> > >> >> >>> > >> wrote:
>> > >> >> >>> > >>
>> > >> >> >>> > >> > HI David,
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > Tables and table column iteration have been overhauled fairly
>> > >> >> >>> > >> > recently [1].  So you might try creating two iterators, offset by
>> > >> >> >>> > >> > one, and then doing the comparison.  I am hacking this out super
>> > >> >> >>> > >> > quick so please forgive me:
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > from itertools import izip
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > with tb.openFile(...) as f:
>> > >> >> >>> > >> >     data = f.root.data
>> > >> >> >>> > >> >     data_i = iter(data)
>> > >> >> >>> > >> >     data_j = iter(data)
>> > >> >> >>> > >> >     data_i.next() # throw the first value away
>> > >> >> >>> > >> >     for i, j in izip(data_i, data_j):
>> > >> >> >>> > >> >         compare(i, j)
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > You get the idea ;)
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > Be Well
>> > >> >> >>> > >> > Anthony
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <
>> > >> >> >>> dav...@gm...>
>> > >> >> >>> > >> wrote:
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >> I was hoping someone could help me out here.
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> This is from a post I put up on StackOverflow,
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> I have a fairly large dataset that I store in HDF5 and access
>> > >> >> >>> > >> >> using PyTables. One operation I need to do on this dataset is
>> > >> >> >>> > >> >> pairwise comparisons between each of the elements. This requires 2
>> > >> >> >>> > >> >> loops, one to iterate over each element, and an inner loop to
>> > >> >> >>> > >> >> iterate over every other element. This operation thus looks at
>> > >> >> >>> > >> >> N(N-1)/2 comparisons.
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> For fairly small sets I found it to be faster to dump the contents
>> > >> >> >>> > >> >> into a multidimensional numpy array and then do my iteration. I run
>> > >> >> >>> > >> >> into problems with large sets because of memory issues and need to
>> > >> >> >>> > >> >> access each element of the dataset at run time.
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> Putting the elements into an array gives me about 600 comparisons
>> > >> >> >>> > >> >> per second, while operating on hdf5 data itself gives me about 300
>> > >> >> >>> > >> >> comparisons per second.
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> Is there a way to speed this process up?
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> Example follows (this is not my real code, just an example):
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> *Small Set*:
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
>> > >> >> >>> > >> >>     data = f.root.data
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >>     N_elements = len(data)
>> > >> >> >>> > >> >>     elements = np.empty((N_elements, int(1e5)))
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >>     for ii, d in enumerate(data):
>> > >> >> >>> > >> >>         elements[ii] = d['element']
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> D = np.empty((N_elements, N_elements))
>> > >> >> >>> > >> >> for ii in xrange(N_elements):
>> > >> >> >>> > >> >>     for jj in xrange(ii+1, N_elements):
>> > >> >> >>> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >>  *Large Set*:
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
>> > >> >> >>> > >> >>     data = f.root.data
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >>     N_elements = len(data)
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >>     D = np.empty((N_elements, N_elements))
>> > >> >> >>> > >> >>     for ii in xrange(N_elements):
>> > >> >> >>> > >> >>         for jj in xrange(ii+1, N_elements):
>> > >> >> >>> > >> >>             D[ii, jj] = compare(data['element'][ii],
>> > >> >> >>> > >> >>                                 data['element'][jj])
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> _______________________________________________
>> > >> >> >>> > >> >> Pytables-users mailing list
>> > >> >> >>> > >> >> Pyt...@li...
>> > >> >> >>> > >> >>
>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> End of Pytables-users Digest, Vol 80, Issue 3
>> > >> >> >>> > >> *********************************************
>> > >> >> >>> > >>
>> > >> >> >>> > >
>> > >> >> >>> > >
>> > >> >> >>> > >
>> > >> >> >>> > >
>> > >> >> >>> >
>> > >> >> >>> > ------------------------------
>> > >> >> >>> >
>> > >> >> >>> > Message: 2
>> > >> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600
>> > >> >> >>> > From: Anthony Scopatz <sc...@gm...>
>> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
>> > >> >> >>> > To: Discussion list for PyTables
>> > >> >> >>> >         <pyt...@li...>
>> > >> >> >>> > Message-ID:
>> > >> >> >>> >         <
>> > >> >> >>> >
>> > >> CAP...@ma...>
>> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
>> > >> >> >>> >
>> > >> >> >>> > Josh is right that you can just edit the code by hand (which works
>> > >> >> >>> > but sucks).
>> > >> >> >>> >
>> > >> >> >>> > However, on Windows -- on the rare occasion when I also have to
>> > >> >> >>> > develop on it -- I typically use a distribution that already
>> > >> >> >>> > includes a compiler, Cython, HDF5, and PyTables, and then I install
>> > >> >> >>> > my development version from GitHub OVER this.  I recommend either
>> > >> >> >>> > EPD or Anaconda, though other distributions listed here [1] might
>> > >> >> >>> > also work.
>> > >> >> >>> >
>> > >> >> >>> > Be well
>> > >> >> >>> > Anthony
>> > >> >> >>> >
>> > >> >> >>> > 1. http://numfocus.org/projects-2/software-distributions/
>> > >> >> >>> >
>> > >> >> >>> >
>> > >> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
>> > >> >> >>> >
>> > >> >> >>> > > The change was in pure Python code, so you should be able to just
>> > >> >> >>> > > paste in the changes to your local copy.  Start with the
>> > >> >> >>> > > table.Column.__iter__ method (lines 3296-3310) here.
>> > >> >> >>> > >
>> > >> >> >>> > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
>> > >> >> >>> > >
>> > >> >> >>> > > It needs to be modified slightly because it uses some additional
>> > >> >> >>> > > features that aren't available in the released version (the
>> > >> >> >>> > > out=buf_slice argument to table.read).  The following should work.
>> > >> >> >>> > >
>> > >> >> >>> > > def __iter__(self):
>> > >> >> >>> > >         table = self.table
>> > >> >> >>> > >         itemsize = self.dtype.itemsize
>> > >> >> >>> > >         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>> > >> >> >>> > >         max_row = len(self)
>> > >> >> >>> > >         for start_row in xrange(0, len(self), nrowsinbuf):
>> > >> >> >>> > >             end_row = min([start_row + nrowsinbuf, max_row])
>> > >> >> >>> > >             buf = table.read(start_row, end_row, 1, field=self.pathname)
>> > >> >> >>> > >             for row in buf:
>> > >> >> >>> > >                 yield row
>> > >> >> >>> > >
>> > >> >> >>> > >
>> > >> >> >>> > > I haven't tested this, but I think it will work.
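For anyone following along, the buffering idea in the __iter__ above can be tried without PyTables at all. What follows is only a minimal Python 3 sketch, not the library's code: iter_in_chunks, read and backing are hypothetical names, and read(start, stop) merely stands in for table.read. The max(1, ...) guard is my own assumption, added because a step of 0 (an item larger than the buffer) would make range raise.

```python
def iter_in_chunks(read, nrows, rows_per_chunk):
    """Yield rows one by one while fetching them rows_per_chunk at a time."""
    step = max(1, rows_per_chunk)  # assumed guard: a step of 0 would raise
    for start in range(0, nrows, step):
        stop = min(start + step, nrows)
        # One bulk read per chunk instead of one read per row.
        for row in read(start, stop):
            yield row


# Hypothetical stand-in for a table column: a list plus a read() that
# records which slices were requested.
backing = list(range(10))
calls = []


def read(start, stop):
    calls.append((start, stop))
    return backing[start:stop]


rows = list(iter_in_chunks(read, len(backing), 4))
```

Ten rows come back in order, but read is only called three times, which is the whole point of sizing the chunk from the I/O buffer.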
>> > >> >> >>> > >
>> > >> >> >>> > > Josh
>> > >> >> >>> > >
>> > >> >> >>> > >
>> > >> >> >>> > >
>> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
>> > >> >> >>> > >
>> > >> >> >>> > >> I apologize if I'm starting to sound helpless, but I'm forced to
>> > >> >> >>> > >> work on Windows 7 at work and have never had luck compiling Python
>> > >> >> >>> > >> source successfully.  I have had to rely on precompiled binaries,
>> > >> >> >>> > >> and now it's biting me in the butt.
>> > >> >> >>> > >>
>> > >> >> >>> > >> Is there any quick fix I can do to improve this iteration using
>> > >> >> >>> > >> v2.4.0?
>> > >> >> >>> > >>
>> > >> >> >>> > >>
>> > >> >> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, <
>> > >> >> >>> > >> pyt...@li...> wrote:
>> > >> >> >>> > >>
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> Today's Topics:
>> > >> >> >>> > >>>
>> > >> >> >>> > >>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
>> > >> >> >>> > >>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
>> > >> >> >>> > >>>
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> ----------------------------------------------------------------------
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> Message: 1
>> > >> >> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
>> > >> >> >>> > >>> From: David Reed <dav...@gm...>
>> > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>> > >> >> >>> > >>> To: pyt...@li...
>> > >> >> >>> > >>> Message-ID:
>> > >> >> >>> > >>>         <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=
>> > >> >> >>> > >>> ev...@ma...>
>> > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> Thanks Anthony, but unless I'm missing something I don't think that
>> > >> >> >>> > >>> method will work, since it will only be comparing the ith element
>> > >> >> >>> > >>> with the (i+1)th element.  I still need 2 for loops, right?
>> > >> >> >>> > >>>
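To make that difference concrete, here is a small Python 3 sketch using a plain list in place of the table (adjacent_pairs and values are illustrative names, not anything from PyTables). Two iterators offset by one only ever produce neighbouring pairs, N-1 of them, whereas itertools.combinations produces every unordered pair, N(N-1)/2 of them.

```python
from itertools import combinations, tee


def adjacent_pairs(items):
    """Pair each item with its successor using two offset iterators."""
    first, second = tee(items)
    next(second, None)  # advance the second iterator by one
    return zip(first, second)


values = ["a", "b", "c", "d"]

adjacent = list(adjacent_pairs(values))     # neighbours only
all_pairs = list(combinations(values, 2))   # every unordered pair
```

With four items the offset approach gives 3 pairs and combinations gives 6, so for a full pairwise comparison the second form (or the explicit double loop) is the one that matches the original problem.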
>> > >> >> >>> > >>> Using itertools might speed things up though. I've never used it, so
>> > >> >> >>> > >>> I will give it a shot and let you know how it goes.  Looks like I
>> > >> >> >>> > >>> need to download the latest release before I do that too.  Thanks
>> > >> >> >>> > >>> for the help.
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> -Dave
>> > >> >> >>> > >>>
>> > >> >> >>> > >>>
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, <
>> > >> >> >>> > >>> pyt...@li...> wrote:
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > Today's Topics:
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> >    1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > ----------------------------------------------------------------------
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > Message: 1
>> > >> >> >>> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600
>> > >> >> >>> > >>> > From: Anthony Scopatz <sc...@gm...>
>> > >> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>> > >> >> >>> > >>> > To: Discussion list for PyTables
>> > >> >> >>> > >>> >         <pyt...@li...>
>> > >> >> >>> > >>> > Message-ID:
>> > >> >> >>> > >>> >         <CAPk-6T5b=
>> > >> >> >>> > >>> >
>> 1EG...@ma...
>> > >
>> > >> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1"
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > Hi David,
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > Tables and table column iteration have been overhauled fairly
>> > >> >> >>> > >>> > recently [1].  So you might try creating two iterators, offset by
>> > >> >> >>> > >>> > one, and then doing the comparison.  I am hacking this out super
>> > >> >> >>> > >>> > quick so please forgive me:
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > from itertools import izip
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > with tb.openFile(...) as f:
>> > >> >> >>> > >>> >     data = f.root.data
>> > >> >> >>> > >>> >     data_i = iter(data)
>> > >> >> >>> > >>> >     data_j = iter(data)
>> > >> >> >>> > >>> >     data_i.next() # throw the first value away
>> > >> >> >>> > >>> >     for i, j in izip(data_i, data_j):
>> > >> >> >>> > >>> >         compare(i, j)
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > You get the idea ;)
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > Be Well
>> > >> >> >>> > >>> > Anthony
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > 1. https://github.com/PyTables/PyTables/issues/27
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> > > I was hoping someone could help me out here.
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > > This is from a post I put up on StackOverflow,
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > > I have a fairly large dataset that I store in HDF5 and access
>> > >> >> >>> > >>> > > using PyTables. One operation I need to do on this dataset is
>> > >> >> >>> > >>> > > pairwise comparison between each of the elements. This requires
>> > >> >> >>> > >>> > > 2 loops, one to iterate over each element, and an inner loop to
>> > >> >> >>> > >>> > > iterate over every other element. This operation thus looks at
>> > >> >> >>> > >>> > > N(N-1)/2 comparisons.
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > > For fairly small sets I found it to be faster to dump the
>> > >> >> >>> > >>> > > contents into a multidimensional numpy array and then do my
>> > >> >> >>> > >>> > > iteration. I run into problems with large sets because of memory
>> > >> >> >>> > >>> > > issues and need to access each element of the dataset at run
>> > >> >> >>> > >>> > > time.
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > > Putting the elements into an array gives me about 600
>> > >> >> >>> > >>> > > comparisons per second, while operating on hdf5 data itself
>> > >> >> >>> > >>> > > gives me about 300 comparisons per second.
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > > Is there a way to speed this process up?
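One middle ground between "everything in memory" and "one read per element" is to walk the pairs block by block, so that only two blocks are held at a time. This is a hedged Python 3 sketch of that idea and not something tested against PyTables: blocked_pairs, read_block and rows are hypothetical names, and read_block(start, stop) stands in for whatever slice read the real table offers.

```python
def blocked_pairs(read_block, nrows, block_size):
    """Yield (i, j, row_i, row_j) for all i < j, reading rows in blocks."""
    for a in range(0, nrows, block_size):
        block_a = read_block(a, min(a + block_size, nrows))
        # Only blocks at or after block a are needed to cover i < j.
        for b in range(a, nrows, block_size):
            if b == a:
                block_b = block_a  # reuse the block already in memory
            else:
                block_b = read_block(b, min(b + block_size, nrows))
            for ia, row_a in enumerate(block_a):
                for ib, row_b in enumerate(block_b):
                    i, j = a + ia, b + ib
                    if i < j:
                        yield i, j, row_a, row_b


# Hypothetical stand-in data: a list and a read function that logs its calls.
rows = ["r0", "r1", "r2", "r3", "r4"]
reads = []


def read_block(start, stop):
    reads.append((start, stop))
    return rows[start:stop]


pairs = [(i, j) for i, j, _, _ in blocked_pairs(read_block, len(rows), 2)]
```

Every pair with i < j is visited exactly once, while the number of reads grows with the square of the number of blocks rather than the square of the number of rows. Whether that beats the 300 comparisons per second quoted above depends on the cost of compare, which the thread does not show.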
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > > Example follows (this is not my real code, just an
>> > >> example):
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > > *Small Set*:
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f:
>> > >> >> >>> > >>> > >     data = f.root.data
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > >     N_elements = len(data)
>> > >> >> >>> > >>> > >     elements = np.empty((N_elements, 100000))
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > >     for ii, d in enumerate(data):
>> > >> >> >>> > >>> > >         elements[ii] = d['element']
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > > D = np.empty((N_elements, N_elements))
>> > >> >> >>> > >>> > > for ii in xrange(N_elements):
>> > >> >> >>> > >>> > >     for jj in xrange(ii+1, N_elements):
>> > >> >> >>> > >>> > >         D[ii, jj] = compare(elements[ii], elements[jj])
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > >  *Large Set*:
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f:
>> > >> >> >>> > >>> > >     data = f.root.data
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > >     N_elements = len(data)
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > >     D = np.empty((N_elements, N_elements))
>> > >> >> >>> > >>> > >     for ii in xrange(N_elements):
>> > >> >> >>> > >>> > >         for jj in xrange(ii+1, N_elements):
>> > >> >> >>> > >>> > >             D[ii, jj] = compare(data['element'][ii],
>> > >> >> >>> > >>> > >                                 data['element'][jj])
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > >
>> > >> >> >>> > >>> > End of Pytables-users Digest, Vol 80, Issue 2
>> > >> >> >>> > >>> > *********************************************
>> > >> >> >>> > >>> >
>> > >> >> >>> > >>> ------------------------------
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> Message: 2
>> > >> >> >>> > >>> Date: Thu, 3 Jan 2013 15:17:01 -0500
>> > >> >> >>> > >>> From: David Reed <dav...@gm...>
>> > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>> > >> >> >>> > >>> To: pyt...@li...
>> > >> >> >>> > >>> Message-ID:
>> > >> >> >>> > >>>         <
>> > >> >> >>> > >>>
>> > >> >>
>> CAM...@ma...
>> > >> >> >>> >
>> > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> Thanks a lot for the help so far guys!
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> Looking at itertools, I found what I believe to be the perfect
>> > >> >> >>> > >>> function for what I need, itertools.combinations. This appears to
>> > >> >> >>> > >>> be a valid replacement for the method proposed.
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> One small problem I didn't mention is that my compare function
>> > >> >> >>> > >>> actually takes 2 columns from the table as inputs. Like so:
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> D = np.empty((N_elements, N_elements))
>> > >> >> >>> > >>> for ii in xrange(N_elements):
>> > >> >> >>> > >>>     for jj in xrange(ii+1, N_elements):
>> > >> >> >>> > >>>         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>> > >> >> >>> > >>>                             data['element2'][ii], data['element2'][jj])
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> Is there an efficient way of using itertools with this structure?
>> > >> >> >>> > >>>
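One way to feed two columns through itertools.combinations is to zip them into rows first and keep the row index alongside. The Python 3 sketch below uses plain lists as stand-ins for the two columns; pairwise_compare, element1, element2 and this particular compare are made up for illustration, with the argument order copied from the loop above. Note that combinations stores its whole input as a tuple internally, so this is a convenience for sets that fit in memory rather than a fix for the large-set case.

```python
from itertools import combinations


def pairwise_compare(col1, col2, compare):
    """Yield (i, j, result) for every unordered row pair with i < j."""
    rows = enumerate(zip(col1, col2))
    for (i, (a1, a2)), (j, (b1, b2)) in combinations(rows, 2):
        # Same argument order as compare(e1[ii], e1[jj], e2[ii], e2[jj]).
        yield i, j, compare(a1, b1, a2, b2)


# Stand-in columns and a made-up compare function.
element1 = [1.0, 2.0, 4.0]
element2 = [10.0, 20.0, 40.0]


def compare(a1, b1, a2, b2):
    return abs(a1 - b1) + abs(a2 - b2)


results = list(pairwise_compare(element1, element2, compare))
```

Each result carries its (i, j) indices, so filling D[i, j] from the generator is a one-line loop.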
>> > >> >> >>> > >>>
>> > >> >> >>> > >>> On Thu, Jan 3, 2013 at 1:29 PM, <
>> > >> >> >>> > >>> pyt...@li...> wrote:
>> > >> >> >>> > >>>
>
> ...
>
> [Message clipped]
>
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>