Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4


I apologize if I'm starting to sound helpless, but I'm forced to work on
Windows 7 at work and have never had luck compiling Python source
successfully.  I have had to rely on precompiled binaries, and now it's
biting me in the butt.

Is there any quick fix I can do to speed up this iteration using v2.4.0?

On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:

> Send Pytables-users mailing list submissions to
>         pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>         pyt...@li...
>
> You can reach the person managing the list at
>         pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 3 Jan 2013 13:44:29 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
> To: pyt...@li...
> Message-ID:
>         <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=
> ev...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Thanks Anthony, but unless I'm missing something I don't think that method
> will work, since it only compares the ith element with the (i+1)th
> element.  I still need 2 for loops, right?
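
A minimal sketch of the difference described above, using a plain Python list as a stand-in for the table rows (the list and its values are illustrative assumptions, not data from this thread). It is written for Python 3, where the built-in zip plays the role of izip:

```python
from itertools import combinations

rows = ["a", "b", "c", "d"]  # hypothetical stand-in for the table rows

# Two iterators offset by one: only adjacent rows are paired.
it_i = iter(rows)
it_j = iter(rows)
next(it_i)  # throw the first value away
adjacent_pairs = list(zip(it_i, it_j))

# itertools.combinations: every unordered pair of rows.
all_pairs = list(combinations(rows, 2))

print(len(adjacent_pairs))  # N - 1 pairs
print(len(all_pairs))       # N*(N-1)/2 pairs
```

With N = 4 the offset iterators give N - 1 pairs, while the full pairwise comparison needs N(N-1)/2, which is what the nested loops (or combinations) produce.
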
>
> Using itertools might speed things up though, I've never used them so I
> will give it a shot and let you know how it goes.  Looks like I need to
> download the latest release before I do that too.  Thanks for the help.
>
> -Dave
>
>
>
> On Thu, Jan 3, 2013 at 12:12 PM, <pyt...@li...> wrote:
>
> >
> > Today's Topics:
> >
> >    1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Thu, 3 Jan 2013 11:11:47 -0600
> > From: Anthony Scopatz <sc...@gm...>
> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> > To: Discussion list for PyTables
> >         <pyt...@li...>
> > Message-ID:
> >         <CAPk-6T5b=
> > 1EG...@ma...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Hi David,
> >
> > Tables and table column iteration have been overhauled fairly recently [1].
> > So you might try creating two iterators, offset by one, and then doing the
> > comparison.  I am hacking this out super quick so please forgive me:
> >
> > from itertools import izip
> >
> > with tb.openFile(...) as f:
> >     data = f.root.data
> >     data_i = iter(data)
> >     data_j = iter(data)
> >     data_i.next() # throw the first value away
> >     for i, j in izip(data_i, data_j):
> >         compare(i, j)
> >
> > You get the idea ;)
> >
> > Be Well
> > Anthony
> >
> > 1. https://github.com/PyTables/PyTables/issues/27
> >
> >
> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >
> > > I was hoping someone could help me out here.
> > >
> > > This is from a post I put up on StackOverflow,
> > >
> > > I have a fairly large dataset that I store in HDF5 and access using
> > > PyTables. One operation I need to do on this dataset is a pairwise
> > > comparison between each of the elements. This requires 2 loops, one to
> > > iterate over each element, and an inner loop to iterate over every other
> > > element. This operation thus looks at N(N-1)/2 comparisons.
> > >
> > > For fairly small sets I found it to be faster to dump the contents into a
> > > multidimensional numpy array and then do my iteration. I run into problems
> > > with large sets because of memory issues and need to access each element of
> > > the dataset at run time.
> > >
> > > Putting the elements into an array gives me about 600 comparisons per
> > > second, while operating on the HDF5 data itself gives me about 300
> > > comparisons per second.
> > >
> > > Is there a way to speed this process up?
> > >
> > > Example follows (this is not my real code, just an example):
> > >
> > > *Small Set*:
> > >
> > >
> > > with tb.openFile(h5_file, 'r') as f:
> > >     data = f.root.data
> > >
> > >     N_elements = len(data)
> > >     elements = np.empty((N_elements, 100000))
> > >
> > >     # copy the 'element' field of each row into memory
> > >     for ii, d in enumerate(data):
> > >         elements[ii] = d['element']
> > >
> > > D = np.empty((N_elements, N_elements))
> > > for ii in xrange(N_elements):
> > >     for jj in xrange(ii+1, N_elements):
> > >         D[ii, jj] = compare(elements[ii], elements[jj])
> > >
> > >  *Large Set*:
> > >
> > >
> > > with tb.openFile(h5_file, 'r') as f:
> > >     data = f.root.data
> > >
> > >     N_elements = len(data)
> > >
> > >     D = np.empty((N_elements, N_elements))
> > >     for ii in xrange(N_elements):
> > >         for jj in xrange(ii+1, N_elements):
> > >             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
> > >
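
One possible middle ground between the two variants above is to read the data in blocks, so that only two blocks are in memory at a time and the file is not indexed once per comparison. The following is only a sketch under stated assumptions: read_block and pairwise_by_blocks are hypothetical names, the list stands in for the HDF5 column, and nothing here has been timed against PyTables. It is written for Python 3.

```python
from itertools import combinations

def pairwise_by_blocks(read_block, n_rows, block_size, compare):
    """Compare every unordered pair of rows, holding at most two blocks in memory.

    read_block(start, stop) is a hypothetical callable returning rows
    start..stop-1 as a sequence; with PyTables it could wrap something like
    table.read(start, stop, field='element').
    """
    results = {}
    for a in range(0, n_rows, block_size):
        block_a = read_block(a, min(a + block_size, n_rows))
        # pairs that fall inside block_a
        for (i, x), (j, y) in combinations(enumerate(block_a, a), 2):
            results[(i, j)] = compare(x, y)
        # pairs between block_a and each later block
        for b in range(a + block_size, n_rows, block_size):
            block_b = read_block(b, min(b + block_size, n_rows))
            for i, x in enumerate(block_a, a):
                for j, y in enumerate(block_b, b):
                    results[(i, j)] = compare(x, y)
    return results

# Usage with an in-memory list standing in for the HDF5 column.
values = [3, 1, 4, 1, 5, 9, 2]
D = pairwise_by_blocks(lambda s, e: values[s:e], len(values), 3,
                       lambda x, y: abs(x - y))
```

Whether this helps in practice would depend on the block size and on how expensive compare is relative to the reads.
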
> > >
> > >
> > >
> >
> > > _______________________________________________
> > > Pytables-users mailing list
> > > Pyt...@li...
> > > https://lists.sourceforge.net/lists/listinfo/pytables-users
> > >
> > >
> >
> > End of Pytables-users Digest, Vol 80, Issue 2
> > *********************************************
> >
> ------------------------------
>
> Message: 2
> Date: Thu, 3 Jan 2013 15:17:01 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> To: pyt...@li...
> Message-ID:
>         <
> CAM...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Thanks a lot for the help so far guys!
>
> Looking at itertools, I found what I believe to be the perfect function for
> what I need, itertools.combinations. This appears to be a valid replacement
> for the method proposed.
>
> One small problem I didn't mention is that my compare function actually
> takes 2 columns from the table as inputs. Like so:
>
> D = np.empty((N_elements, N_elements))
> for ii in xrange(N_elements):
>     for jj in xrange(ii+1, N_elements):
>         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>                             data['element2'][ii], data['element2'][jj])
>
> Is there an efficient way of using itertools with this structure?
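
A minimal sketch of one way this could look, assuming the two columns can be paired row by row with zip. The compare body and the column values are hypothetical placeholders, and the lists stand in for the 'element1' and 'element2' columns. It is written for Python 3.

```python
from itertools import combinations

def compare(a1, b1, a2, b2):
    # hypothetical placeholder for the real comparison function
    return abs(a1 - b1) + abs(a2 - b2)

element1 = [1.0, 2.0, 4.0]   # stand-in for the 'element1' column
element2 = [0.5, 0.5, 1.5]   # stand-in for the 'element2' column

# Pair the two columns row by row, then take every unordered pair of rows.
rows = enumerate(zip(element1, element2))
D = {}
for (ii, (a1, a2)), (jj, (b1, b2)) in combinations(rows, 2):
    # same argument order as the nested-loop version above
    D[ii, jj] = compare(a1, b1, a2, b2)
```

Note that itertools.combinations copies its input into a tuple before producing pairs, so passing it the rows of a table that does not fit in memory would not avoid the memory problem. In that case combinations(range(N_elements), 2) over the indices keeps only the index pairs in play, at the cost of reading each element from the file again.
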
>
>
> On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>
> >
> > Today's Topics:
> >
> >    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Thu, 3 Jan 2013 10:29:33 -0800
> > From: Josh Ayers <jos...@gm...>
> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> > To: Discussion list for PyTables
> >         <pyt...@li...>
> > Message-ID:
> >         <
> > CAC...@ma...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > David,
> >
> > The change in issue 27 was only for iteration over a tables.Column
> > instance.  To use it, tweak Anthony's code as follows.  This will iterate
> > over the "element" column, as in your original example.
> >
> > Note also that this will only work with the development version of PyTables
> > available on github.  It will be very slow using the released v2.4.0.
> >
> >
> > from itertools import izip
> >
> > with tb.openFile(...) as f:
> >     data = f.root.data.cols.element
> >     data_i = iter(data)
> >     data_j = iter(data)
> >     data_i.next() # throw the first value away
> >     for i, j in izip(data_i, data_j):
> >         compare(i, j)
> >
> >
> > Hope that helps,
> > Josh
> >
> >
> >
> > End of Pytables-users Digest, Vol 80, Issue 3
> > *********************************************
> >
> End of Pytables-users Digest, Vol 80, Issue 4
> *********************************************
>