pytables-users Mailing List for PyTables - Hierarchical datasets (Page 11)

Hi Josh,

Here is my __iter__ code:

def __iter__(self):
        table = self.table
        itemsize = self.dtype.itemsize
        nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
        max_row = len(self)
        for start_row in xrange(0, len(self), nrowsinbuf):
            end_row = min([start_row + nrowsinbuf, max_row])
            buf = table.read(start_row, end_row, 1, field=self.pathname)
            for row in buf:
                yield row

It does look different from the version on GitHub, so I will try swapping
that code in and see what happens.
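
For readers following along, the buffering idea behind the __iter__ method above can be sketched without PyTables. This is only an illustration under stated assumptions, not the library's code: `iter_in_chunks`, `read_rows`, and the sample data are hypothetical names, the 1 MiB buffer size is assumed from the "default 1MB" mentioned in the thread, and the sketch is written for Python 3 whereas the thread uses Python 2.

```python
def iter_in_chunks(read_rows, n_rows, row_nbytes, buffer_nbytes):
    """Yield rows one at a time while reading at most one buffer's worth per call.

    read_rows(start, stop) is a hypothetical callable returning rows [start, stop).
    """
    # Whole rows that fit in the buffer. The max(1, ...) guard is an addition of
    # this sketch: without it, a row larger than the buffer gives a step of zero.
    rows_per_read = max(1, buffer_nbytes // row_nbytes)
    for start in range(0, n_rows, rows_per_read):
        stop = min(start + rows_per_read, n_rows)
        for row in read_rows(start, stop):
            yield row


# Stand-in data source that records how many rows each read returned.
data = list(range(20))
read_sizes = []

def read_rows(start, stop):
    chunk = data[start:stop]
    read_sizes.append(len(chunk))
    return chunk

# Row size from the thread (17 x 9600 one-byte booleans), 1 MiB buffer assumed.
rows = list(iter_in_chunks(read_rows, len(data), 17 * 9600, 1024 * 1024))
print(read_sizes)
```

With these numbers each read covers 6 rows, so only a small slice of the column is in memory at any time. If the row size used in the division is wrong, the chunk size is wrong with it.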

On Mon, Feb 4, 2013 at 9:59 AM, <
pyt...@li...> wrote:

> Send Pytables-users mailing list submissions to
>         pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>         pyt...@li...
>
> You can reach the person managing the list at
>         pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers)
>    2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 1 Feb 2013 14:08:47 -0800
> From: Josh Ayers <jos...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
> To: Discussion list for PyTables
>         <pyt...@li...>
> Message-ID:
>         <CACOB4aPG4NZ6b2a3v=
> 1Ue...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> David,
>
> You added a custom version of table.Column.__iter__, correct?  Could you
> also include that along with the script to reproduce the error?
>
> It seems like the problem may be in the 'nrowsinbuf' calculation - see
> [1].  Each of your rows is 17 x 9600 = 163200 bytes.  If you're using the
> default 1MB value for IO_BUFFER_SIZE, it should be reading in chunks of 6
> rows.  Instead, it's reading the entire table.
>
> [1]:
> https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 4 Feb 2013 09:58:53 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6
> To: pyt...@li...
> Message-ID:
>         <CAM6XA7=
> h50...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Anthony,
>
> Sorry to only just be getting back to you.  I can send a script.  Should I
> also send one that creates some fake data?
>
> -Dave
>
>
> On Fri, Feb 1, 2013 at 4:50 PM, <
> pyt...@li...> wrote:
>
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Fri, 1 Feb 2013 15:50:11 -0600
> > From: Anthony Scopatz <sc...@gm...>
> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
> > To: Discussion list for PyTables
> >         <pyt...@li...>
> > Message-ID:
> >         <
> > CAP...@ma...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...>
> wrote:
> >
> > > at the error:
> > >
> > > result = numpy.empty(shape=nrows, dtype=dtypeField)
> > >
> > > nrows = 4620 and dtypeField is ('bool', (17, 9600))
> > >
> > > I'm not sure what that means as a dtype, but that's what it is.
> > >
> > > Forgive me if I'm being totally naive, but I thought the whole point of
> > > __iter__ with PyTables was to do iteration on the fly, so there is no
> > > preallocation.
> > >
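
On the dtype question above: in NumPy, a pair such as ('bool', (17, 9600)) describes a subarray dtype, that is, a single item made up of a 17 x 9600 block of booleans. The sketch below (Python 3 with NumPy, using a small row count as a stand-in for nrows = 4620) shows what numpy.empty does with such a dtype. The variable names are illustrative and not taken from PyTables.

```python
import numpy as np

# The field dtype reported in the traceback: one item is a 17 x 9600 block of bools.
dtype_field = np.dtype(('bool', (17, 9600)))
print(dtype_field.shape)     # dimensions of a single item
print(dtype_field.itemsize)  # bytes per item

# Creating an array with a subarray dtype appends the item dimensions to the
# array shape, so the allocation covers every requested row at once.
nrows = 3  # small stand-in; the thread reports nrows = 4620
result = np.empty(shape=nrows, dtype=dtype_field)
print(result.shape, result.dtype, result.nbytes)
```

With nrows = 4620 the same call asks for 4620 x 163200 bytes in one block, which is the allocation that fails in the traceback.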
> >
> > Nope you are not being naive at all.  That is the point.
> >
> >
> > >  If you have any ideas on this I'm all ears.
> > >
> >
> > If you could send a minimal script which reproduces this error, that
> > would help a lot.
> >
> > Be Well
> > Anthony
> >
> >
> > >
> > >
> > >  Thanks again.
> > >
> > > Dave
> > >
> > >
> > > On Fri, Feb 1, 2013 at 3:45 PM, <
> > > pyt...@li...> wrote:
> > >
> > >> ----------------------------------------------------------------------
> > >>
> > >> Message: 1
> > >> Date: Fri, 1 Feb 2013 14:44:40 -0600
> > >> From: Anthony Scopatz <sc...@gm...>
> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
> > >> To: Discussion list for PyTables
> > >>         <pyt...@li...>
> > >> Message-ID:
> > >>         <
> > >> CAP...@ma...>
> > >> Content-Type: text/plain; charset="iso-8859-1"
> > >>
> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...>
> > >> wrote:
> > >>
> > >> > Hi Anthony,
> > >> >
> > >> > Thanks for the reply.
> > >> >
> > >> > I honestly don't know how to monitor my Python memory usage, but I'm
> > >> > sure that it's caused by running out of memory.
> > >> >
> > >>
> > >> Well, I would just run top or process monitor or something while
> > >> running the python script to see what happens to memory usage as the
> > >> script chugs along...
> > >>
> > >>
> > >> >  I'm just trying to find out how to fix it.  My HDF5 table has 4620
> > >> > rows and the column I'm iterating over is a 17x9600 boolean matrix.  The
> > >> > __iter__ method is preallocating an array that is this size, which
> > >> > appears to be the root of the error.  I was hoping there is a fix
> > >> > somewhere in here to not have to do this preallocation.
> > >> >
> > >>
> > >> So a 17x9600 boolean matrix should only be about 0.16 MB (163200 bytes).
> > >> 4620 of these is ~750 MB.  If you have 2 GB of memory and you are
> > >> iterating over 2 of these (templates & masks), it is conceivable that you
> > >> are just running out of memory.  Maybe there is a way that __iter__ could
> > >> avoid preallocating something that is basically a temporary.  What is the
> > >> dtype of the templates array?
> > >>
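
The estimate above can be checked with a few lines of arithmetic. This sketch assumes one byte per boolean element, which is how NumPy stores bool, and takes the row count and matrix shape from the thread. The constant names are illustrative.

```python
ROWS = 4620
BYTES_PER_ROW = 17 * 9600  # one byte per boolean element (NumPy bool)

column_bytes = ROWS * BYTES_PER_ROW
print(BYTES_PER_ROW)           # bytes in one 17 x 9600 matrix
print(BYTES_PER_ROW / 2**20)   # the same in MiB
print(column_bytes / 1e6)      # whole column in MB (10**6 bytes)
print(2 * column_bytes / 1e9)  # templates plus masks, in GB
```

Two such columns held fully in memory come to about 1.5 GB, which is close to the 2 GB mentioned, so running out of memory is plausible if both are read in full.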
> > >> Be Well
> > >> Anthony
> > >>
> > >>
> > >> >
> > >> > Thanks again.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Fri, Feb 1, 2013 at 11:12 AM, <
> > >> > pyt...@li...> wrote:
> > >> >
> > ----------------------------------------------------------------------
> > >> >>
> > >> >> Message: 1
> > >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600
> > >> >> From: Anthony Scopatz <sc...@gm...>
> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue
> 9
> > >> >> To: Discussion list for PyTables
> > >> >>         <pyt...@li...>
> > >> >> Message-ID:
> > >> >>         <
> > >> >> CAP...@ma...
> >
> > >> >> Content-Type: text/plain; charset="iso-8859-1"
> > >> >>
> > >> >> Hi David,
> > >> >>
> > >> >> Sorry, I haven't had a ton of time recently.  You seem to be getting
> > >> >> a memory error on creating a numpy array.  This kind of thing
> > >> >> typically happens when you are out of memory.  Does this seem to be
> > >> >> the case with you?  When this dies, is your memory usage at 100%?  If
> > >> >> so, this algorithm might require a little tweaking...
> > >> >>
> > >> >> Be Well
> > >> >> Anthony
> > >> >>
> > >> >>
> > >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...
> >
> > >> >> wrote:
> > >> >>
> > >> >> > I'm still having problems with this one.  I can't tell if this is
> > >> >> > something dumb I'm doing with itertools, or if it's something in
> > >> >> > PyTables.
> > >> >> >
> > >> >> > Would appreciate any help.
> > >> >> >
> > >> >> > Thanks
> > >> >> >
> > >> >> >
> > >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed <
> > dav...@gm...
> > >> >> >wrote:
> > >> >> >
> > >> >> >> I think I have to reopen this issue.  I have been running fine for
> > >> >> >> a while using the combinations method from itertools, but have
> > >> >> >> recently run into a memory error since I quadrupled the size of the
> > >> >> >> HDF5 file.
> > >> >> >>
> > >> >> >> Here is my code again:
> > >> >> >>
> > >> >> >> from itertools import combinations, izip
> > >> >> >>
> > >> >> >> with tb.openFile(h5_all, 'r') as f:
> > >> >> >>     irises = f.root.irises
> > >> >> >>
> > >> >> >>     templates = f.root.irises.cols.templates
> > >> >> >>     masks = f.root.irises.cols.masks1
> > >> >> >>
> > >> >> >>     N_irises = len(irises)
> > >> >> >>     index = np.ones((20 * 480), np.bool)
> > >> >> >>
> > >> >> >>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> > >> >> >>     D = np.empty((N_irises, N_irises))
> > >> >> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(
> > >> >> >>             izip(templates, masks, range(N_irises)), 2):
> > >> >> >>         # print ii
> > >> >> >>         D[ii, jj] = ham_dist(
> > >> >> >>             t1[8, index],
> > >> >> >>             t2[:, index],
> > >> >> >>             m1[8, index],
> > >> >> >>             m2[:, index],
> > >> >> >>         )
> > >> >> >>
> > >> >> >> And here is the error:
> > >> >> >>
> > >> >> >> In [10]: get_hd3()
> > >> >> >> 10669890 Comparisons
> > >> >> >> ---------------------------------------------------------------------------
> > >> >> >> MemoryError                               Traceback (most recent call last)
> > >> >> >> <ipython-input-10-cfb255ce7bd1> in <module>()
> > >> >> >> ----> 1 get_hd3()
> > >> >> >>
> > >> >> >>     118                 print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> > >> >> >>     119                 D = np.empty((N_irises, N_irises))
> > >> >> >> --> 120                 for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> > >> >> >>     121                         # print ii
> > >> >> >>     122                         D[ii, jj] = ham_dist(
> > >> >> >>
> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
> > >> >> >>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
> > >> >> >>    3275             end_row = min([start_row + nrowsinbuf, max_row])
> > >> >> >> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
> > >> >> >>    3277             for row in buf:
> > >> >> >>    3278                 yield row
> > >> >> >>
> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
> > >> >> >>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
> > >> >> >>    1773
> > >> >> >> -> 1774         arr = self._read(start, stop, step, field)
> > >> >> >>    1775         return internal_to_flavor(arr, self.flavor)
> > >> >> >>    1776
> > >> >> >>
> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
> > >> >> >>    1719         if field:
> > >> >> >>    1720             # Create a container for the results
> > >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> > >> >> >>    1722         else:
> > >> >> >>    1723             # Recarray case
> > >> >> >>
> > >> >> >> MemoryError:
> > >> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
> > >> >> >>    1720             # Create a container for the results
> > >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> > >> >> >>    1722         else:
> > >> >> >>
> > >> >> >> Also, if you guys see any performance problems in my code, please
> > >> >> >> let me know.
> > >> >> >>
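
One memory-related point about the loop above, offered as a hedged aside rather than a diagnosis of the traceback: itertools.combinations stores its whole input in a tuple before yielding pairs, so every row produced by the column iterators is held in memory at once. A possible alternative is to generate pairs of row indices and fetch the rows on demand. In the sketch below, `pairwise_distances`, `get_row`, `compare`, and the toy data are hypothetical stand-ins, and the code is written for Python 3.

```python
from itertools import combinations

def pairwise_distances(n_rows, get_row, compare):
    """Return {(i, j): compare(row_i, row_j)} for all i < j.

    Only index pairs go through combinations(); rows are fetched on demand.
    """
    results = {}
    current_i, row_i = None, None
    for i, j in combinations(range(n_rows), 2):
        if i != current_i:
            # combinations() yields pairs grouped by i, so row i is fetched once.
            current_i, row_i = i, get_row(i)
        results[(i, j)] = compare(row_i, get_row(j))
    return results


# Toy stand-ins: each "row" is a tuple of bits, compare counts differing bits.
rows = [(0, 0, 1), (0, 1, 1), (1, 1, 0)]
hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
print(pairwise_distances(len(rows), rows.__getitem__, hamming))
```

The trade-off is that row j is re-read for every pair, so this exchanges memory for N(N-1)/2 reads. Reading the inner rows in buffered chunks would be a natural refinement.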
> > >> >> >> Thank you so much for the help.
> > >> >> >>
> > >> >> >> -Dave
> > >> >> >>
> > >> >> >>
> > >> >> >> On Fri, Jan 4, 2013 at 8:57 AM, <
> > >> >> >> pyt...@li...> wrote:
> > >> >> >>
> > >> ----------------------------------------------------------------------
> > >> >> >>>
> > >> >> >>> Message: 1
> > >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500
> > >> >> >>> From: David Reed <dav...@gm...>
> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80,
> > Issue
> > >> 8
> > >> >> >>> To: pyt...@li...
> > >> >> >>> Message-ID:
> > >> >> >>>         <
> > >> >> >>>
> > CAM...@ma...
> > >> >
> > >> >> >>> Content-Type: text/plain; charset="iso-8859-1"
> > >> >> >>>
> > >> >> >>> I can't thank you guys enough for the help.  I was able to add the
> > >> >> >>> __iter__ function to the table.py file and everything seems to be
> > >> >> >>> working great!  I'm not quite as fast as I was when iterating right
> > >> >> >>> off a matrix, but pretty close.  I was at 555 comparisons per second,
> > >> >> >>> and now I'm at 420.
> > >> >> >>>
> > >> >> >>> I handled the problem I mentioned earlier by doing this, and it
> > >> >> >>> seems to work great:
> > >> >> >>>
> > >> >> >>> A = f.root.data.cols.A
> > >> >> >>> B = f.root.data.cols.B
> > >> >> >>>
> > >> >> >>> D = np.empty((len(A), len(A)))
> > >> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(
> > >> >> >>>         izip(A, B, range(len(A))), 2):
> > >> >> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
> > >> >> >>>
> > >> >> >>> Again, thanks a lot.
> > >> >> >>>
> > >> >> >>> -Dave
> > >> >> >>>
> > >> >> >>>
> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <
> > >> >> >>> pyt...@li...> wrote:
> > >> >> >>>
> > >> >>
> > ----------------------------------------------------------------------
> > >> >> >>> >
> > >> >> >>> > Message: 1
> > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
> > >> >> >>> > From: Anthony Scopatz <sc...@gm...>
> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80,
> > >> Issue 3
> > >> >> >>> > To: Discussion list for PyTables
> > >> >> >>> >         <pyt...@li...>
> > >> >> >>> > Message-ID:
> > >> >> >>> >         <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=
> > >> >> >>> > Gz...@ma...>
> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
> > >> >> >>> >
> > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <
> > >> dav...@gm...>
> > >> >> >>> wrote:
> > >> >> >>> >
> > >> >> >>> > > Thanks a lot for the help so far guys!
> > >> >> >>> > >
> > >> >> >>> > > Looking at itertools, I found what I believe to be the perfect
> > >> >> >>> > > function for what I need, itertools.combinations. This appears to
> > >> >> >>> > > be a valid replacement for the method proposed.
> > >> >> >>> > >
> > >> >> >>> >
> > >> >> >>> > Yes, combinations is awesome!
> > >> >> >>> >
> > >> >> >>> >
> > >> >> >>> > >
> > >> >> >>> > > There is a small problem that I didn't mention: my compare
> > >> >> >>> > > function actually takes as inputs 2 columns from the table. Like so:
> > >> >> >>> > >
> > >> >> >>> > > D = np.empty((N_elements, N_elements))
> > >> >> >>> > > for ii in xrange(N_elements):
> > >> >> >>> > >     for jj in xrange(ii+1, N_elements):
> > >> >> >>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> > >> >> >>> > >                             data['element2'][ii], data['element2'][jj])
> > >> >> >>> > >
> > >> >> >>> > > Is there an efficient way of using itertools with this structure?
> > >> >> >>> > >
> > >> >> >>> >
> > >> >> >>> > You can always make two other iterators for each column.  Since you
> > >> >> >>> > have two columns you would have 4 iterators.  I am not sure how fast
> > >> >> >>> > this is going to be but I am confident that there is definitely a way
> > >> >> >>> > to do this in one for-loop, which is going to be way faster than
> > >> >> >>> > nested loops.
> > >> >> >>> >
> > >> >> >>> > Be Well
> > >> >> >>> > Anthony
> > >> >> >>> >
> > >> >> >>> >
> > >> >> >>> > >
> > >> >> >>> > >
> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, <
> > >> >> >>> > > pyt...@li...> wrote:
> > >> >> >>> > >
> > >> >> >>>
> > >> ----------------------------------------------------------------------
> > >> >> >>> > >>
> > >> >> >>> > >> Message: 1
> > >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
> > >> >> >>> > >> From: Josh Ayers <jos...@gm...>
> > >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5
> using
> > >> >> >>> PyTables
> > >> >> >>> > >> To: Discussion list for PyTables
> > >> >> >>> > >>         <pyt...@li...>
> > >> >> >>> > >> Message-ID:
> > >> >> >>> > >>         <
> > >> >> >>> > >>
> > >> >> CAC...@ma...
> >
> > >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1"
> > >> >> >>> > >>
> > >> >> >>> > >> David,
> > >> >> >>> > >>
> > >> >> >>> > >> The change in issue 27 was only for iteration over a tables.Column
> > >> >> >>> > >> instance.  To use it, tweak Anthony's code as follows.  This will
> > >> >> >>> > >> iterate over the "element" column, as in your original example.
> > >> >> >>> > >>
> > >> >> >>> > >> Note also that this will only work with the development version of
> > >> >> >>> > >> PyTables available on github.  It will be very slow using the
> > >> >> >>> > >> released v2.4.0.
> > >> >> >>> > >>
> > >> >> >>> > >>
> > >> >> >>> > >> from itertools import izip
> > >> >> >>> > >>
> > >> >> >>> > >> with tb.openFile(...) as f:
> > >> >> >>> > >>     data = f.root.data.cols.element
> > >> >> >>> > >>     data_i = iter(data)
> > >> >> >>> > >>     data_j = iter(data)
> > >> >> >>> > >>     data_i.next() # throw the first value away
> > >> >> >>> > >>     for i, j in izip(data_i, data_j):
> > >> >> >>> > >>         compare(i, j)
> > >> >> >>> > >>
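
As a small illustration of what the two offset iterators above produce (Python 3, with a plain list standing in for the column; the names are illustrative): advancing one iterator by a single step before zipping pairs each element with its predecessor, so only adjacent elements are compared rather than every pair.

```python
data = ['a', 'b', 'c', 'd']  # stand-in for f.root.data.cols.element

data_i = iter(data)
data_j = iter(data)
next(data_i)  # throw the first value away, as in the snippet above

# zip stops when the shorter iterator (data_i) is exhausted.
pairs = list(zip(data_i, data_j))
print(pairs)  # → [('b', 'a'), ('c', 'b'), ('d', 'c')]
```

That gives N-1 comparisons. Covering all N(N-1)/2 pairs needs something like itertools.combinations, which the later messages in the thread adopt.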
> > >> >> >>> > >>
> > >> >> >>> > >> Hope that helps,
> > >> >> >>> > >> Josh
> > >> >> >>> > >>
> > >> >> >>> > >>
> > >> >> >>> > >>
> > >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <
> > >> >> sc...@gm...>
> > >> >> >>> > >> wrote:
> > >> >> >>> > >>
> > >> >> >>> > >> > Hi David,
> > >> >> >>> > >> >
> > >> >> >>> > >> > Tables and table column iteration have been overhauled fairly
> > >> >> >>> > >> > recently [1].  So you might try creating two iterators, offset by
> > >> >> >>> > >> > one, and then doing the comparison.  I am hacking this out super
> > >> >> >>> > >> > quick so please forgive me:
> > >> >> >>> > >> >
> > >> >> >>> > >> > from itertools import izip
> > >> >> >>> > >> >
> > >> >> >>> > >> > with tb.openFile(...) as f:
> > >> >> >>> > >> >     data = f.root.data
> > >> >> >>> > >> >     data_i = iter(data)
> > >> >> >>> > >> >     data_j = iter(data)
> > >> >> >>> > >> >     data_i.next() # throw the first value away
> > >> >> >>> > >> >     for i, j in izip(data_i, data_j):
> > >> >> >>> > >> >         compare(i, j)
> > >> >> >>> > >> >
> > >> >> >>> > >> > You get the idea ;)
> > >> >> >>> > >> >
> > >> >> >>> > >> > Be Well
> > >> >> >>> > >> > Anthony
> > >> >> >>> > >> >
> > >> >> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27
> > >> >> >>> > >> >
> > >> >> >>> > >> >
> > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <
> > >> >> >>> dav...@gm...>
> > >> >> >>> > >> wrote:
> > >> >> >>> > >> >
> > >> >> >>> > >> >> I was hoping someone could help me out here.
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> This is from a post I put up on StackOverflow,
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> I have a fairly large dataset that I store in HDF5 and access
> > >> >> >>> > >> >> using PyTables. One operation I need to do on this dataset is a
> > >> >> >>> > >> >> pairwise comparison between each of the elements. This requires 2
> > >> >> >>> > >> >> loops, one to iterate over each element, and an inner loop to
> > >> >> >>> > >> >> iterate over every other element. This operation thus looks at
> > >> >> >>> > >> >> N(N-1)/2 comparisons.
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> For fairly small sets I found it to be faster to dump the contents
> > >> >> >>> > >> >> into a multidimensional numpy array and then do my iteration. I run
> > >> >> >>> > >> >> into problems with large sets because of memory issues and need to
> > >> >> >>> > >> >> access each element of the dataset at run time.
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> Putting the elements into an array gives me about 600 comparisons
> > >> >> >>> > >> >> per second, while operating on hdf5 data itself gives me about 300
> > >> >> >>> > >> >> comparisons per second.
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> Is there a way to speed this process up?
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> Example follows (this is not my real code, just an example):
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> *Small Set*:
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> > >> >> >>> > >> >>     data = f.root.data
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>     N_elements = len(data)
> > >> >> >>> > >> >>     elements = np.empty((N_elements, 100000))
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>     # Copy the 'element' field of every row into memory
> > >> >> >>> > >> >>     for ii, d in enumerate(data):
> > >> >> >>> > >> >>         elements[ii] = d['element']
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> D = np.empty((N_elements, N_elements))
> > >> >> >>> > >> >> # Compare each pair (ii, jj) with ii < jj once
> > >> >> >>> > >> >> for ii in xrange(N_elements):
> > >> >> >>> > >> >>     for jj in xrange(ii+1, N_elements):
> > >> >> >>> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>  *Large Set*:
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> > >> >> >>> > >> >>     data = f.root.data
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>     N_elements = len(data)
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>     D = np.empty((N_elements, N_elements))
> > >> >> >>> > >> >>     for ii in xrange(N_elements):
> > >> >> >>> > >> >>         for jj in xrange(ii+1, N_elements):
> > >> >> >>> > >> >>             D[ii, jj] = compare(data['element'][ii],
> > >> >> >>> > >> >>                                 data['element'][jj])
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>
> > >> >> >>> > >> >>
> > >> >> >>> > >> End of Pytables-users Digest, Vol 80, Issue 3
> > >> >> >>> > >> *********************************************
> > >> >> >>> > >>
> > >> >> >>> > >
> > >> >> >>> > >
> > >> >> >>> > >
> > >> >> >>> > >
> > >> >> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> ------------------------------------------------------------------------------
> > >> >> >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012,
> > >> HTML5,
> > >> >> CSS,
> > >> >> >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your
> > skills
> > >> >> >>> current
> > >> >> >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by
> > >> Microsoft
> > >> >> >>> > > MVPs and experts. ON SALE this month only -- learn more at:
> > >> >> >>> > > http://p.sf.net/sfu/learnmore_122712
> > >> >> >>> > > _______________________________________________
> > >> >> >>> > > Pytables-users mailing list
> > >> >> >>> > > Pyt...@li...
> > >> >> >>> > >
> https://lists.sourceforge.net/lists/listinfo/pytables-users
> > >> >> >>> > >
> > >> >> >>> > >
> > >> >> >>> > -------------- next part --------------
> > >> >> >>> > An HTML attachment was scrubbed...
> > >> >> >>> >
> > >> >> >>> > ------------------------------
> > >> >> >>> >
> > >> >> >>> > Message: 2
> > >> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600
> > >> >> >>> > From: Anthony Scopatz <sc...@gm...>
> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
> > >> >> >>> > To: Discussion list for PyTables
> > >> >> >>> >         <pyt...@li...>
> > >> >> >>> > Message-ID:
> > >> >> >>> >         <CAP...@ma...>
> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
> > >> >> >>> >
> > >> >> >>> > Josh is right that you can just edit the code by hand (which works but
> > >> >> >>> > sucks).
> > >> >> >>> >
> > >> >> >>> > However, on Windows -- on the rare occasion when I also have to develop
> > >> >> >>> > on it -- I typically use a distribution that includes a compiler, cython,
> > >> >> >>> > hdf5, and pytables already and then I install my development version from
> > >> >> >>> > github OVER this.  I recommend either EPD or Anaconda, though other
> > >> >> >>> > distributions listed here [1] might also work.
> > >> >> >>> >
> > >> >> >>> > Be well
> > >> >> >>> > Anthony
> > >> >> >>> >
> > >> >> >>> > 1. http://numfocus.org/projects-2/software-distributions/
> > >> >> >>> >
> > >> >> >>> >
> > >> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
> > >> >> >>> >
> > >> >> >>> > > The change was in pure Python code, so you should be able to just paste
> > >> >> >>> > > in the changes to your local copy.  Start with the table.Column.__iter__
> > >> >> >>> > > method (lines 3296-3310) here.
> > >> >> >>> > >
> > >> >> >>> > > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
> > >> >> >>> > >
> > >> >> >>> > > It needs to be modified slightly because it uses some additional
> > >> >> >>> > > features that aren't available in the released version (the
> > >> >> >>> > > out=buf_slice argument to table.read).  The following should work.
> > >> >> >>> > >
> > >> >> >>> > > def __iter__(self):
> > >> >> >>> > >         table = self.table
> > >> >> >>> > >         itemsize = self.dtype.itemsize
> > >> >> >>> > >         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> > >> >> >>> > >         max_row = len(self)
> > >> >> >>> > >         for start_row in xrange(0, len(self), nrowsinbuf):
> > >> >> >>> > >             end_row = min([start_row + nrowsinbuf, max_row])
> > >> >> >>> > >             buf = table.read(start_row, end_row, 1, field=self.pathname)
> > >> >> >>> > >             for row in buf:
> > >> >> >>> > >                 yield row
> > >> >> >>> > >
> > >> >> >>> > >
> > >> >> >>> > > I haven't tested this, but I think it will work.
> > >> >> >>> > >
> > >> >> >>> > > Josh
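
A note on the nrowsinbuf line in the method above: if a single row is larger than IO_BUFFER_SIZE, the integer division gives 0, and a step of 0 is not a valid range step. Below is a minimal Python 3 sketch of the same buffered loop with a guard. It assumes a hypothetical read(start, stop) callable standing in for table.read(start, stop, 1, field=...); iter_rows and its parameters are illustrative names, not PyTables API.

```python
def iter_rows(read, nrows, itemsize, buffer_size=1024 * 1024):
    """Yield rows one at a time while fetching them in buffer-sized blocks.

    read(start, stop) is a hypothetical stand-in for a slice read of a
    table column; it must return the rows in [start, stop).
    """
    # Assumption: fall back to one row per read when a row does not fit
    # in the buffer, instead of ending up with a step of 0.
    nrowsinbuf = max(1, buffer_size // itemsize)
    for start_row in range(0, nrows, nrowsinbuf):
        end_row = min(start_row + nrowsinbuf, nrows)
        for row in read(start_row, end_row):
            yield row


# Toy data: a plain list plays the role of the on-disk column.
rows = [[float(i)] * 4 for i in range(10)]


def read(start, stop):
    return rows[start:stop]


# Rows of 32 bytes with a 100-byte buffer: three rows per read.
buffered = list(iter_rows(read, len(rows), itemsize=32, buffer_size=100))
# Rows larger than the buffer: the guard gives one row per read.
oversized = list(iter_rows(read, len(rows), itemsize=2000, buffer_size=1000))
```

Both calls should return every row exactly once and in order; only the number of read calls differs.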
> > >> >> >>> > >
> > >> >> >>> > >
> > >> >> >>> > >
> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
> > >> >> >>> > >
> > >> >> >>> > >> I apologize if I'm starting to sound helpless, but I'm forced to work
> > >> >> >>> > >> on Windows 7 at work and have never had luck compiling Python source
> > >> >> >>> > >> successfully.  I have had to rely on precompiled binaries and now
> > >> >> >>> > >> it's biting me in the butt.
> > >> >> >>> > >>
> > >> >> >>> > >> Is there any quick fix I can do to improve this iteration using v2.4.0?
> > >> >> >>> > >>
> > >> >> >>> > >>
> > >> >> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, <
> > >> >> >>> > >> pyt...@li...> wrote:
> > >> >> >>> > >>
> > >> >> >>> > >>> Today's Topics:
> > >> >> >>> > >>>
> > >> >> >>> > >>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
> > >> >> >>> > >>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
> > >> >> >>> > >>>
> > >> >> >>> > >>>
> > >> >> >>> > >>> ----------------------------------------------------------------------
> > >> >> >>> > >>>
> > >> >> >>> > >>> Message: 1
> > >> >> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
> > >> >> >>> > >>> From: David Reed <dav...@gm...>
> > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
> > >> >> >>> > >>> To: pyt...@li...
> > >> >> >>> > >>> Message-ID:
> > >> >> >>> > >>>         <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=ev...@ma...>
> > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
> > >> >> >>> > >>>
> > >> >> >>> > >>> Thanks Anthony, but unless I'm missing something I don't think that
> > >> >> >>> > >>> method will work, since it will only compare the ith element with
> > >> >> >>> > >>> the (i+1)th element.  I still need 2 for loops, right?
> > >> >> >>> > >>>
> > >> >> >>> > >>> Using itertools might speed things up though. I've never used it, so I
> > >> >> >>> > >>> will give it a shot and let you know how it goes.  Looks like I need
> > >> >> >>> > >>> to download the latest release before I do that too.  Thanks for the
> > >> >> >>> > >>> help.
> > >> >> >>> > >>>
> > >> >> >>> > >>> -Dave
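
To make that point concrete, here is a small Python 3 sketch with a plain list standing in for the table column (zip replaces izip; the list and variable names are made up for illustration). Two iterators offset by one only visit neighbouring pairs, while the nested loops visit every pair.

```python
items = ["a", "b", "c", "d"]  # illustrative stand-in for the column values

# Two iterators over the same data, one advanced by a single step.
it_i = iter(items)
it_j = iter(items)
next(it_i)  # throw the first value away
adjacent_pairs = list(zip(it_i, it_j))

# The nested loops from the original question: every pair with ii < jj.
n = len(items)
all_pairs = [(items[ii], items[jj])
             for ii in range(n)
             for jj in range(ii + 1, n)]

print(len(adjacent_pairs))  # N - 1 pairs
print(len(all_pairs))       # N(N-1)/2 pairs
```

So the offset-iterator version covers N - 1 of the N(N-1)/2 comparisons, and a second level of iteration is still needed for the rest.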
> > >> >> >>> > >>>
> > >> >> >>> > >>>
> > >> >> >>> > >>>
> > >> >> >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, <
> > >> >> >>> > >>> pyt...@li...> wrote:
> > >> >> >>> > >>>
> > >> >> >>> > >>> > Today's Topics:
> > >> >> >>> > >>> >
> > >> >> >>> > >>> >    1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)
> > >> >> >>> > >>> >
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > ----------------------------------------------------------------------
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > Message: 1
> > >> >> >>> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600
> > >> >> >>> > >>> > From: Anthony Scopatz <sc...@gm...>
> > >> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> > >> >> >>> > >>> > To: Discussion list for PyTables
> > >> >> >>> > >>> >         <pyt...@li...>
> > >> >> >>> > >>> > Message-ID:
> > >> >> >>> > >>> >         <CAPk-6T5b=1EG...@ma...>
> > >> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1"
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > Hi David,
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > Tables and table column iteration have been overhauled fairly
> > >> >> >>> > >>> > recently [1].  So you might try creating two iterators, offset by
> > >> >> >>> > >>> > one, and then doing the comparison.  I am hacking this out super
> > >> >> >>> > >>> > quick so please forgive me:
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > from itertools import izip
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > with tb.openFile(...) as f:
> > >> >> >>> > >>> >     data = f.root.data
> > >> >> >>> > >>> >     data_i = iter(data)
> > >> >> >>> > >>> >     data_j = iter(data)
> > >> >> >>> > >>> >     data_i.next() # throw the first value away
> > >> >> >>> > >>> >     for i, j in izip(data_i, data_j):
> > >> >> >>> > >>> >         compare(i, j)
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > You get the idea ;)
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > Be Well
> > >> >> >>> > >>> > Anthony
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > 1. https://github.com/PyTables/PyTables/issues/27
> > >> >> >>> > >>> >
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > > I was hoping someone could help me out here.
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > > This is from a post I put up on StackOverflow:
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > > I have a fairly large dataset that I store in HDF5 and access using
> > >> >> >>> > >>> > > PyTables. One operation I need to do on this dataset is pairwise
> > >> >> >>> > >>> > > comparison between each of the elements. This requires 2 loops, one
> > >> >> >>> > >>> > > to iterate over each element, and an inner loop to iterate over
> > >> >> >>> > >>> > > every other element. This operation thus looks at N(N-1)/2
> > >> >> >>> > >>> > > comparisons.
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > > For fairly small sets I found it to be faster to dump the contents
> > >> >> >>> > >>> > > into a multidimensional numpy array and then do my iteration. I run
> > >> >> >>> > >>> > > into problems with large sets because of memory issues and need to
> > >> >> >>> > >>> > > access each element of the dataset at run time.
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > > Putting the elements into an array gives me about 600 comparisons
> > >> >> >>> > >>> > > per second, while operating on the hdf5 data itself gives me about
> > >> >> >>> > >>> > > 300 comparisons per second.
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > > Is there a way to speed this process up?
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > > Example follows (this is not my real code, just an example):
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > > *Small Set*:
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f:
> > >> >> >>> > >>> > >     data = f.root.data
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > >     N_elements = len(data)
> > >> >> >>> > >>> > >     elements = np.empty((N_elements, int(1e5)))
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > >     for ii, d in enumerate(data):
> > >> >> >>> > >>> > >         elements[ii] = d['element']
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > > D = np.empty((N_elements, N_elements))
> > >> >> >>> > >>> > > for ii in xrange(N_elements):
> > >> >> >>> > >>> > >     for jj in xrange(ii+1, N_elements):
> > >> >> >>> > >>> > >         D[ii, jj] = compare(elements[ii], elements[jj])
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > >  *Large Set*:
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f:
> > >> >> >>> > >>> > >     data = f.root.data
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > >     N_elements = len(data)
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > >     D = np.empty((N_elements, N_elements))
> > >> >> >>> > >>> > >     for ii in xrange(N_elements):
> > >> >> >>> > >>> > >         for jj in xrange(ii+1, N_elements):
> > >> >> >>> > >>> > >             D[ii, jj] = compare(data['element'][ii],
> > >> >> >>> > >>> > >                                 data['element'][jj])
> > >> >> >>> > >>> > >
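
A possible middle ground between the two versions above, sketched in Python 3 under stated assumptions: read the rows in blocks that fit in memory and compare block against block, so that each read fetches many rows at once instead of one. This is a swapped-in technique, not the method from the question. read(start, stop) is a hypothetical callable standing in for a slice read of the table, and blockwise_pairs and block are illustrative names.

```python
def blockwise_pairs(read, nrows, block):
    """Yield (ii, jj, row_i, row_j) for every ii < jj, reading `block` rows per call.

    read(start, stop) is a hypothetical stand-in for a slice read that
    returns the rows in [start, stop).
    """
    for i0 in range(0, nrows, block):
        rows_i = read(i0, min(i0 + block, nrows))
        # Only blocks at or after the current one are needed for ii < jj.
        for j0 in range(i0, nrows, block):
            # Reuse the block already in memory on the diagonal.
            rows_j = rows_i if j0 == i0 else read(j0, min(j0 + block, nrows))
            for di, row_i in enumerate(rows_i):
                ii = i0 + di
                for dj, row_j in enumerate(rows_j):
                    jj = j0 + dj
                    if jj > ii:
                        yield ii, jj, row_i, row_j


# Toy data: a plain list plays the role of the on-disk table.
rows = list(range(7))


def read(start, stop):
    return rows[start:stop]


pairs = [(ii, jj) for ii, jj, _, _ in blockwise_pairs(read, len(rows), block=3)]
```

At most two blocks are held in memory at a time, so the block size trades memory for the number of reads.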
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> > >
> > >> >> >>> > >>> ------------------------------
> > >> >> >>> > >>>
> > >> >> >>> > >>> Message: 2
> > >> >> >>> > >>> Date: Thu, 3 Jan 2013 15:17:01 -0500
> > >> >> >>> > >>> From: David Reed <dav...@gm...>
> > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> > >> >> >>> > >>> To: pyt...@li...
> > >> >> >>> > >>> Message-ID:
> > >> >> >>> > >>>         <CAM...@ma...>
> > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1"
> > >> >> >>> > >>>
> > >> >> >>> > >>> Thanks a lot for the help so far guys!
> > >> >> >>> > >>>
> > >> >> >>> > >>> Looking at itertools, I found what I believe to be the perfect function
> > >> >> >>> > >>> for what I need, itertools.combinations. This appears to be a valid
> > >> >> >>> > >>> replacement for the method proposed.
> > >> >> >>> > >>>
> > >> >> >>> > >>> There is a small problem that I didn't mention: my compare function
> > >> >> >>> > >>> actually takes as inputs 2 columns from the table. Like so:
> > >> >> >>> > >>>
> > >> >> >>> > >>> D = np.empty((N_elements, N_elements))
> > >> >> >>> > >>> for ii in xrange(N_elements):
> > >> >> >>> > >>>     for jj in xrange(ii+1, N_elements):
> > >> >> >>> > >>>         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> > >> >> >>> > >>>                             data['element2'][ii], data['element2'][jj])
> > >> >> >>> > >>>
> > >> >> >>> > >>> Is there an efficient way of using itertools with this structure?
> > >> >> >>> > >>>
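
One possible shape for this, as a Python 3 sketch rather than a tested answer: zip the two columns together, tag each row with its index via enumerate, and feed the result to itertools.combinations. pairwise_matrix, col1, col2 and the toy compare are hypothetical names; plain lists stand in for the table columns and a list of lists stands in for the NumPy array D.

```python
from itertools import combinations


def pairwise_matrix(col1, col2, compare):
    """Fill the upper triangle of D with compare() over all row pairs ii < jj."""
    n = len(col1)
    D = [[0.0] * n for _ in range(n)]
    # Each item is (index, (value_from_col1, value_from_col2)).
    indexed_rows = enumerate(zip(col1, col2))
    for (ii, (a1, a2)), (jj, (b1, b2)) in combinations(indexed_rows, 2):
        # Same argument order as in the nested-loop version:
        # element1[ii], element1[jj], element2[ii], element2[jj].
        D[ii][jj] = compare(a1, b1, a2, b2)
    return D


# Toy columns and a toy compare function, for illustration only.
col1 = [1, 2, 3]
col2 = [10, 20, 30]


def compare(a1, b1, a2, b2):
    return (b1 - a1) + (b2 - a2)


D = pairwise_matrix(col1, col2, compare)
```

Note that itertools.combinations copies its whole input into a tuple before yielding anything, so every row is held in memory at once; for the large set that may bring back the memory problem.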
> > >> >> >>> > >>>
> > >> >> >>> > >>> On Thu, Jan 3, 2013 at 1:29 PM, <
> > >> >> >>> > >>> pyt...@li...> wrote:
> > >> >> >>> > >>>
> > >> >> >>> > >>> > Today's Topics:
> > >> >> >>> > >>> >
> > >> >> >>> > >>> >    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
> > >> >> >>> > >>> >
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > ----------------------------------------------------------------------
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > Message: 1
> > >> >> >>> > >>> > Date: Thu, 3 Jan 2013 10:29:33 -0800
> > >> >> >>> > >>> > From: Josh Ayers <jos...@gm...>
> > >> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> > >> >> >>> > >>> > To: Discussion list for PyTables
> > >> >> >>> > >>> >         <pyt...@li...>
> > >> >> >>> > >>> > Message-ID:
> > >> >> >>> > >>> >         <CAC...@ma...>
> > >> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1"
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > David,
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > The change in issue 27 was only for iteration over a tables.Column
> > >> >> >>> > >>> > instance.  To use it, tweak Anthony's code as follows.  This will
> > >> >> >>> > >>> > iterate over the "element" column, as in your original example.
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > Note also that this will only work with the development version of
> > >> >> >>> > >>> > PyTables available on github.  It will be very slow using the
> > >> >> >>> > >>> > released v2.4.0.
> > >> >> >>> > >>> >
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > from itertools import izip
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > with tb.openFile(...) as f:
> > >> >> >>> > >>> >     data = f.root.data.cols.element
> > >> >> >>> > >>> >     data_i = iter(data)
> > >> >> >>> > >>> >     data_j = iter(data)
> > >> >> >>> > >>> >     data_i.next() # throw the first value away
> > >> >> >>> > >>> >     for i, j in izip(data_i, data_j):
> > >> >> >>> > >>> >         compare(i, j)
> > >> >> >>> > >>> >
> > >> >> >>> > >>> >
> > >> >> >>> > >>> > Hope that helps,
> > >> >> >>> > >>> > Josh
> > >> >> >>> > >>> >
> > >> >> >>> > >>> >
> > >> >> >>> > >>> >

[truncated message content]
