Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4

Josh is right that you can just edit the code by hand (which works but
sucks).

However, on Windows -- on the rare occasion when I also have to develop on
it -- I typically use a distribution that includes a compiler, cython,
hdf5, and pytables already and then I install my development version from
github OVER this.  I recommend either EPD or Anaconda, though other
distributions listed here [1] might also work.

Be well
Anthony

1. http://numfocus.org/projects-2/software-distributions/

On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:

> The change was in pure Python code, so you should be able to just paste in
> the changes to your local copy.  Start with the table.Column.__iter__
> method (lines 3296-3310) here.
>
>
> https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
>
> It needs to be modified slightly because it uses some additional features
> that aren't available in the released version (the out=buf_slice argument
> to table.read).  The following should work.
>
> def __iter__(self):
>     table = self.table
>     itemsize = self.dtype.itemsize
>     # Number of rows that fit in one I/O buffer.
>     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>     max_row = len(self)
>     # Read the column one buffer at a time and yield its rows.
>     for start_row in xrange(0, max_row, nrowsinbuf):
>         end_row = min(start_row + nrowsinbuf, max_row)
>         buf = table.read(start_row, end_row, 1, field=self.pathname)
>         for row in buf:
>             yield row
>
>
> I haven't tested this, but I think it will work.
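The buffering idea in the patch above can be tried without PyTables. What follows is a minimal sketch, assuming a hypothetical read_slice(start, stop) callable that stands in for a call such as table.read(start, stop, 1, field=...). The names buffered_iter, read_slice and buffer_bytes are illustrative only and are not part of the PyTables API.

```python
def buffered_iter(read_slice, nrows, itemsize, buffer_bytes):
    """Yield rows one at a time while fetching them in buffer-sized chunks.

    read_slice(start, stop) is a hypothetical stand-in for a call such as
    table.read(start, stop, 1, field=...); it must return a sequence of rows.
    """
    # Rows per chunk; always at least one so the loop makes progress.
    rows_per_chunk = max(1, buffer_bytes // itemsize)
    for start in range(0, nrows, rows_per_chunk):
        stop = min(start + rows_per_chunk, nrows)
        for row in read_slice(start, stop):
            yield row


# Stand-in data source: a plain list plus a log of the slices requested.
rows = list(range(10))
calls = []


def read_slice(start, stop):
    calls.append((start, stop))
    return rows[start:stop]


result = list(buffered_iter(read_slice, len(rows), itemsize=8, buffer_bytes=32))
```

With these made-up sizes each chunk holds 4 rows, so the 10 rows arrive in three reads instead of ten. The max(1, ...) guard is an addition in this sketch: if a single row were larger than the buffer, the step would otherwise be zero and range would raise ValueError.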
>
> Josh
>
>
>
> On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
>
>> I apologize if I'm starting to sound helpless, but I'm forced to work on
>> Windows 7 at work and have never had luck compiling python source
>> successfully.  I have had to rely on precompiled binaries and now it's
>> biting me in the butt.
>>
>> Is there any quick fix I can do to improve this iteration using v2.4.0?
>>
>>
>> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
>>
>>> Send Pytables-users mailing list submissions to
>>>         pyt...@li...
>>>
>>> To subscribe or unsubscribe via the World Wide Web, visit
>>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>>> or, via email, send a message with subject or body 'help' to
>>>         pyt...@li...
>>>
>>> You can reach the person managing the list at
>>>         pyt...@li...
>>>
>>> When replying, please edit your Subject line so it is more specific
>>> than "Re: Contents of Pytables-users digest..."
>>>
>>>
>>> Today's Topics:
>>>
>>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
>>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Thu, 3 Jan 2013 13:44:29 -0500
>>> From: David Reed <dav...@gm...>
>>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
>>> To: pyt...@li...
>>> Message-ID:
>>>         <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=
>>> ev...@ma...>
>>> Content-Type: text/plain; charset="iso-8859-1"
>>>
>>> Thanks Anthony, but unless I'm missing something I don't think that method
>>> will work, since it only compares the ith element with the (i+1)th
>>> element.  I still need 2 for loops, right?
>>>
>>> Using itertools might speed things up though, I've never used them so I
>>> will give it a shot and let you know how it goes.  Looks like I need to
>>> download the latest release before I do that too.  Thanks for the help.
>>>
>>> -Dave
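To make the difference concrete: two iterators offset by one only visit adjacent pairs, while the nested loops visit every unordered pair. Here is a small sketch on a plain list, where items is made-up sample data. It is written in Python 3 syntax; under Python 2, zip would be itertools.izip.

```python
from itertools import combinations

items = ['a', 'b', 'c', 'd']

# Two iterators offset by one: only neighbouring elements get compared.
it_i = iter(items)
it_j = iter(items)
next(it_i)  # throw the first value away
adjacent = list(zip(it_i, it_j))

# Nested loops (ii < jj): every unordered pair, N*(N-1)/2 of them.
nested = [(items[ii], items[jj])
          for ii in range(len(items))
          for jj in range(ii + 1, len(items))]

# itertools.combinations produces the same pairs in the same order.
all_pairs = list(combinations(items, 2))
```

For 4 items the offset iterators give 3 pairs, whereas the nested loops and itertools.combinations both give 4*3/2 = 6.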
>>>
>>>
>>>
>>> On Thu, Jan 3, 2013 at 12:12 PM, <pyt...@li...> wrote:
>>>
>>> > Today's Topics:
>>> >
>>> >    1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)
>>> >
>>> >
>>> > ----------------------------------------------------------------------
>>> >
>>> > Message: 1
>>> > Date: Thu, 3 Jan 2013 11:11:47 -0600
>>> > From: Anthony Scopatz <sc...@gm...>
>>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>>> > To: Discussion list for PyTables
>>> >         <pyt...@li...>
>>> > Message-ID:
>>> >         <CAPk-6T5b=
>>> > 1EG...@ma...>
>>> > Content-Type: text/plain; charset="iso-8859-1"
>>> >
>>> > Hi David,
>>> >
>>> > Tables and table column iteration have been overhauled fairly recently
>>> > [1].  So you might try creating two iterators, offset by one, and then
>>> > doing the comparison.  I am hacking this out super quick so please
>>> > forgive me:
>>> >
>>> > import tables as tb
>>> > from itertools import izip
>>> >
>>> > with tb.openFile(...) as f:
>>> >     data = f.root.data
>>> >     data_i = iter(data)
>>> >     data_j = iter(data)
>>> >     next(data_i)  # throw the first value away
>>> >     for i, j in izip(data_i, data_j):
>>> >         compare(i, j)
>>> >
>>> > You get the idea ;)
>>> >
>>> > Be Well
>>> > Anthony
>>> >
>>> > 1. https://github.com/PyTables/PyTables/issues/27
>>> >
>>> >
>>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>>> >
>>> > > I was hoping someone could help me out here.
>>> > >
>>> > > This is from a post I put up on StackOverflow,
>>> > >
>>> > > I have a fairly large dataset that I store in HDF5 and access using
>>> > > PyTables. One operation I need to do on this dataset is pairwise
>>> > > comparison between each of the elements. This requires 2 loops, one to
>>> > > iterate over each element, and an inner loop to iterate over every other
>>> > > element. This operation thus looks at N(N-1)/2 comparisons.
>>> > >
>>> > > For fairly small sets I found it to be faster to dump the contents into a
>>> > > multidimensional numpy array and then do my iteration. I run into problems
>>> > > with large sets because of memory issues and need to access each element
>>> > > of the dataset at run time.
>>> > >
>>> > > Putting the elements into an array gives me about 600 comparisons per
>>> > > second, while operating on hdf5 data itself gives me about 300
>>> > > comparisons per second.
>>> > >
>>> > > Is there a way to speed this process up?
>>> > >
>>> > > Example follows (this is not my real code, just an example):
>>> > >
>>> > > *Small Set*:
>>> > >
>>> > >
>>> > > with tb.openFile(h5_file, 'r') as f:
>>> > >     data = f.root.data
>>> > >
>>> > >     N_elements = len(data)
>>> > >     elements = np.empty((N_elements, 100000))
>>> > >
>>> > >     for ii, d in enumerate(data):
>>> > >         elements[ii] = d['element']
>>> > >
>>> > > D = np.empty((N_elements, N_elements))
>>> > > for ii in xrange(N_elements):
>>> > >     for jj in xrange(ii+1, N_elements):
>>> > >         D[ii, jj] = compare(elements[ii], elements[jj])
>>> > >
>>> > >  *Large Set*:
>>> > >
>>> > >
>>> > > with tb.openFile(h5_file, 'r') as f:
>>> > >     data = f.root.data
>>> > >
>>> > >     N_elements = len(data)
>>> > >
>>> > >     D = np.empty((N_elements, N_elements))
>>> > >     for ii in xrange(N_elements):
>>> > >         for jj in xrange(ii+1, N_elements):
>>> > >             D[ii, jj] = compare(data['element'][ii],
>>> > >                                 data['element'][jj])
>>> > >
>>> > >
>>> > >
>>> > >
>>> >
>>> ------------------------------------------------------------------------------
>>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
>>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills
>>> current
>>> > > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
>>> > > MVPs and experts. ON SALE this month only -- learn more at:
>>> > > http://p.sf.net/sfu/learnmore_122712
>>> > > _______________________________________________
>>> > > Pytables-users mailing list
>>> > > Pyt...@li...
>>> > > https://lists.sourceforge.net/lists/listinfo/pytables-users
>>> > >
>>> > >
>>> > End of Pytables-users Digest, Vol 80, Issue 2
>>> > *********************************************
>>> >
>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Thu, 3 Jan 2013 15:17:01 -0500
>>> From: David Reed <dav...@gm...>
>>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
>>> To: pyt...@li...
>>> Message-ID:
>>>         <
>>> CAM...@ma...>
>>> Content-Type: text/plain; charset="iso-8859-1"
>>>
>>> Thanks a lot for the help so far guys!
>>>
>>> Looking at itertools, I found what I believe to be the perfect function
>>> for what I need, itertools.combinations. This appears to be a valid
>>> replacement for the method proposed.
>>>
>>> One small problem I didn't mention is that my compare function actually
>>> takes as inputs 2 columns from the table. Like so:
>>>
>>> D = np.empty((N_elements, N_elements))
>>> for ii in xrange(N_elements):
>>>     for jj in xrange(ii+1, N_elements):
>>>         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>>>                             data['element2'][ii], data['element2'][jj])
>>>
>>> Is there an efficient way of using itertools with this structure?
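One possible way to combine itertools.combinations with a four-argument compare function is to zip the two columns together first, so that each item carries both values. This is only a sketch on in-memory lists: col1, col2 and this compare are made-up stand-ins, and a plain nested list replaces the numpy array so the example has no dependencies. With PyTables the lists would presumably be replaced by the two column iterators, which is not tested here.

```python
from itertools import combinations

# Made-up stand-ins for the 'element1' and 'element2' columns.
col1 = [1.0, 2.0, 4.0]
col2 = [10.0, 20.0, 40.0]


def compare(a_i, a_j, b_i, b_j):
    """Hypothetical comparison; the real function is not shown in the thread."""
    return abs(a_i - a_j) + abs(b_i - b_j)


n = len(col1)
D = [[0.0] * n for _ in range(n)]

# Pair each row index with its two column values, then take all pairs ii < jj.
rows = enumerate(zip(col1, col2))
for (ii, (a_i, b_i)), (jj, (a_j, b_j)) in combinations(rows, 2):
    D[ii][jj] = compare(a_i, a_j, b_i, b_j)
```

Note that itertools.combinations copies its whole input into a tuple before producing any pairs, so on its own it does not avoid the memory problem described for large sets.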
>>>
>>>
>>> On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
>>>
>>> > Today's Topics:
>>> >
>>> >    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
>>> >
>>> >
>>> > ----------------------------------------------------------------------
>>> >
>>> > Message: 1
>>> > Date: Thu, 3 Jan 2013 10:29:33 -0800
>>> > From: Josh Ayers <jos...@gm...>
>>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
>>> > To: Discussion list for PyTables
>>> >         <pyt...@li...>
>>> > Message-ID:
>>> >         <
>>> > CAC...@ma...>
>>> > Content-Type: text/plain; charset="iso-8859-1"
>>> >
>>> > David,
>>> >
>>> > The change in issue 27 was only for iteration over a tables.Column
>>> > instance.  To use it, tweak Anthony's code as follows.  This will
>>> > iterate over the "element" column, as in your original example.
>>> >
>>> > Note also that this will only work with the development version of
>>> > PyTables available on github.  It will be very slow using the released
>>> > v2.4.0.
>>> >
>>> >
>>> > import tables as tb
>>> > from itertools import izip
>>> >
>>> > with tb.openFile(...) as f:
>>> >     data = f.root.data.cols.element
>>> >     data_i = iter(data)
>>> >     data_j = iter(data)
>>> >     next(data_i)  # throw the first value away
>>> >     for i, j in izip(data_i, data_j):
>>> >         compare(i, j)
>>> >
>>> >
>>> > Hope that helps,
>>> > Josh
>>> >
>>> >
>>> >
>>> > End of Pytables-users Digest, Vol 80, Issue 3
>>> > *********************************************
>>> >
>>>
>>> End of Pytables-users Digest, Vol 80, Issue 4
>>> *********************************************
>>>
>>
>>
>>
>
>