From: Thadeus B. <tha...@th...> - 2013-03-07 23:27:22
|
I have a PyTables file that receives many appends to a Table throughout the day: the file is opened, a small bit of data is appended, and the file is closed. The open/append/close cycle can happen many times in a minute. Anywhere from 1 to 500 rows are appended at any given time. By the end of the day, this file is expected to have roughly 66000 rows. Chunkshape is set to 1500 for no particular reason (it doesn't seem to make a difference, and some other files can be 5 million/day). BLOSC with level 9 compression is used on the table. Data is never deleted from the table. There are roughly 12 columns on the Table.

The problem is that at the end of the day this file is 1GB in size. I don't understand why the file is growing so big. The tbl.size_on_disk shows a meager 20MB.

I have used ptrepack with --keep-source-filters and --chunkshape=keep. The new file is only 30MB in size, which is reasonable. I have also used ptrepack with --chunkshape=auto and, although it set the chunkshape to around 388, there was no significant change in file size from the chunkshape of 1500.

Is PyTables not re-using chunks on new appends? When 50 rows are appended, is it still writing a chunk sized for 1500 rows? When the next append comes along, does it write a brand new chunk instead of opening the old chunk and appending the data? Should my chunksize really be "expected rows to append each time" instead of "expected total rows"?

-- Thadeus |
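For scale, the figures quoted in this message can be put side by side. The sketch below is only arithmetic on those numbers: the decimal megabyte convention and the variable names are assumptions made for illustration. The last step takes the "one fresh chunk per append" idea from the question as an unconfirmed hypothesis, ignoring compression and HDF5 metadata; it is not a statement of how PyTables or HDF5 actually allocate space.

```python
# Figures quoted in the message; all sizes are approximate.
MB = 10**6                      # assumption: decimal megabytes
rows_per_day = 66_000
file_bytes = 1_000 * MB         # the "1GB" file at the end of the day
table_bytes = 20 * MB           # value reported by tbl.size_on_disk
repacked_bytes = 30 * MB        # file size after ptrepack
chunk_rows = 1500               # chunkshape of the table

# Average bytes each row costs in the bloated file and in the repacked one.
per_row_file = file_bytes / rows_per_day
per_row_repacked = repacked_bytes / rows_per_day

# How many times larger the live file is than its repacked copy.
bloat = file_bytes / repacked_bytes

# Unconfirmed hypothesis from the question: every append occupies a whole
# chunk of chunk_rows rows. Under that assumption alone, the bloat factor
# would correspond to this average number of rows per append.
implied_rows_per_append = chunk_rows / bloat

print('bytes per row in the live file:     %.0f' % per_row_file)
print('bytes per row in the repacked file: %.0f' % per_row_repacked)
print('bloat factor:                       %.1f' % bloat)
print('implied rows per append:            %.0f' % implied_rows_per_append)
```

The implied append size can be compared with the stated range of 1 to 500 rows per append, but agreement would only make the hypothesis plausible, not prove it.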
From: Anthony S. <sc...@gm...> - 2013-03-07 16:52:26
|
Hey Tim,

Awesome dataset! And neat image!

As per your request, a couple of minor things I noticed were that you probably don't need to do the sanity check each time (great for debugging, but not needed always), you are using masked arrays which, while sometimes convenient, are generally slower than creating an array, a mask and applying the mask to the array, and you seem to be downcasting from float64 to float32 for some reason that I am not entirely clear on (size, speed?).

To the more major question of write performance, one thing that you could try is compression <http://pytables.github.com/usersguide/optimization.html#compression-issues>. You might want to do some timing studies to find the best compressor and level. Performance here can vary a lot based on how similar your data is (and how close similar data is to each other). If you have got a bunch of zeros and only a few real data points, even zlib 1 is going to be blazing fast compared to writing all those zeros out explicitly.

Another thing you could try doing is switching to EArray and using the append() method. This might save PyTables, numpy, hdf5, etc. from having to check that the shape of "sst_node[qual_indices]" is actually the same as the data you are giving it. Additionally, dumping a block of memory to the file directly (via append()) is generally faster than having to resolve fancy indexes (which are notoriously the slow part of even numpy).

Lastly, as a general comment, you seem to be doing a lot of stuff in the innermost loop, including writing to disk. I would look at how you could restructure this to move as much as possible out of this loop. Your data seems to be about 12 GB for a year, so this is probably too big to build up the full sst array completely in memory prior to writing. That is, unless you have a computer much bigger than my laptop ;). But issuing one fat write command is probably going to be faster than making 365 of them.

Happy hacking!
Be Well
Anthony |
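The point about compression and mostly-empty data can be tried without any HDF5 file at all. What follows is a minimal sketch using only the Python standard library. It builds one chunk-sized buffer of float32 values filled with NaN, mirroring the 1 x 1080 x 240 chunk and the NaN default from Tim's code, sets a handful of values, and compresses it with zlib level 1. The sample values and positions are invented for illustration, and plain zlib stands in for whatever filter PyTables would apply, so the ratio is only indicative.

```python
import zlib
from array import array

CHUNK_ITEMS = 1080 * 240            # items in one 1 x 1080 x 240 chunk

# float32 buffer filled with NaN, like a chunk holding no quality pixels
chunk = array('f', [float('nan')]) * CHUNK_ITEMS

# a few made-up "real" values at arbitrary positions (hypothetical data)
for position, value in [(10, 18.5), (5000, 21.25), (200000, 3.0)]:
    chunk[position] = value

raw = chunk.tobytes()
packed = zlib.compress(raw, 1)      # level 1 is the fastest zlib setting

ratio = len(raw) / len(packed)
print('raw %d bytes, compressed %d bytes, ratio %.0fx'
      % (len(raw), len(packed), ratio))

# decompressing gives back exactly the same bytes
restored = zlib.decompress(packed)
print('round trip ok:', restored == raw)
```

Timing the same buffer at several levels, for example with the timeit module, would be one way to start the "timing studies" suggested above.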
From: Tim B. <tim...@ma...> - 2013-03-07 05:26:20
|
I'm producing a large chunked HDF5 file using CArray and want to clarify that the performance I'm getting is what would normally be expected.

The source data is a large annual satellite dataset - 365 days x 4320 latitude by 8640 longitude of 32-bit floats. I'm only interested in pixels of a certain quality, so I am iterating over the source data (which is in daily files) and then determining the indices of all quality pixels in that day. There are usually about 2 million quality pixels in a day.

I then set the equivalent CArray locations to the value of the quality pixels. As you can see in the code below, the source numpy array is 1 x 4320 x 8640. So for addressing the CArray, I simply take the first index and set it to the current day to map indices to the 365 x 4320 x 8640 CArray.

I've tried a couple of different chunkshapes. As I will be reading the HDF sequentially day by day and as the data comes from a polar orbit, I'm using a 1 x 1080 x 240 chunk to try and optimize for chunks that will have no data (and therefore reduce the total file size). You can see an image of an example day at

http://data.nodc.noaa.gov/pathfinder/Version5.2/browse_images/2011/sea_surface_temperature/20110101001735-NODC-L3C_GHRSST-SSTskin-AVHRR_Pathfinder-PFV5.2_NOAA19_G_2011001_night-v02.0-fv01.0-sea_surface_temperature.png

To produce a day takes about 2.5 minutes on a Linux (Ubuntu 12.04) machine with two SSDs in RAID 0. The system has 64GB of RAM but I don't think memory is a constraint here.

Looking at a profile, most of that 2.5 minutes is spent in _g_writeCoords in tables.hdf5Extension.Array

Here's the pertinent code:

    for year in range(2011, 2012):

        # create dataset and add global attrs
        annualfile_path = '%sPF4km/V5.2/hdf/annual/PF52-%d-c1080x240-test.h5' % (crwdir, year)
        print 'Creating ' + annualfile_path

        with tables.openFile(annualfile_path, 'w', title=('Pathfinder V5.2 %d' % year)) as h5f:

            # write lat lons
            lat_node = h5f.createArray('/', 'lat', lats, title='latitude')
            lon_node = h5f.createArray('/', 'lon', lons, title='longitude')

            # glob all the region summaries in a year
            files = [glob.glob('%sPF4km/V5.2/%d/*night*' % (crwdir, year))[0]]
            print 'Found %d days' % len(files)
            files.sort()

            # create a 365 x 4320 x 8640 array
            shape = (NUMDAYS, 4320, 8640)
            atom = tables.Float32Atom(dflt=np.nan)
            # we chunk into daily slices and then further chunk days
            sst_node = h5f.createCArray(h5f.root, 'sst', atom, shape,
                                        chunkshape=(1, 1080, 240))

            for filename in files:

                # get day
                day = int(filename[-25:-22])
                print 'Processing %d day %d' % (year, day)

                ds = Dataset(filename)
                kelvin64 = ds.variables['sea_surface_temperature'][:]
                qual = ds.variables['quality_level'][:]
                ds.close()

                # convert sst to single precision with nan as mask
                kelvin32 = np.ma.filled(kelvin64, fill_value=np.nan).astype(np.float32)
                sst = kelvin32 - 273.15

                # find all quality >4 locations
                qual_indices = np.where(np.ma.filled(qual) >= 4)
                print 'Found %d quality pixels' % len(qual_indices[0])

                # qual_indices is actually a 3D index. so set first sst quality index
                # to match the current day and write to sst_node
                qual_indices[0].flags.writeable = True
                qual_indices[0][:] = day

                sst_node[qual_indices] = sst[0,qual_indices[1],qual_indices[2]]

                # sanity check that max values are the same in sst_node as source sst data
                print 'sst max %4.1f node max %4.1f' % (
                    np.nanmax(sst[0,qual_indices[1],qual_indices[2]]),
                    np.nanmax(sst_node[day]))

Would value any comments on this :-)

Thanks,

Tim Burgess |
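The sizes involved follow directly from the shapes given in this message. This is a small worked calculation rather than anything measured: it assumes 4 bytes per float32 value and takes "about 2 million" quality pixels per day at face value. The constant names are made up for the sketch.

```python
# Shapes taken from the message; float32 values take 4 bytes each.
NUM_DAYS, NUM_LAT, NUM_LON = 365, 4320, 8640
CHUNK_SHAPE = (1, 1080, 240)
ITEM_BYTES = 4
QUALITY_PIXELS_PER_DAY = 2_000_000    # "about 2 million" in the message

# Dense (uncompressed) size of one day and of the whole year.
pixels_per_day = NUM_LAT * NUM_LON
day_bytes = pixels_per_day * ITEM_BYTES
year_bytes = day_bytes * NUM_DAYS

# Size of one chunk and the number of chunks that tile one day.
chunk_bytes = CHUNK_SHAPE[0] * CHUNK_SHAPE[1] * CHUNK_SHAPE[2] * ITEM_BYTES
chunks_per_day = (NUM_LAT // CHUNK_SHAPE[1]) * (NUM_LON // CHUNK_SHAPE[2])

# Share of a day's pixels that actually carry a quality value.
quality_fraction = QUALITY_PIXELS_PER_DAY / pixels_per_day

print('one day, dense:   %d bytes' % day_bytes)
print('one year, dense:  %d bytes' % year_bytes)
print('one chunk:        %d bytes' % chunk_bytes)
print('chunks per day:   %d' % chunks_per_day)
print('quality fraction: %.1f%%' % (100 * quality_fraction))
```

The chunks tile a day exactly, so the chunk count times the chunk size equals the dense size of one day.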
From: Francesc A. <fa...@gm...> - 2013-03-01 21:46:22
|
Yes, if the fletcher32 filter is used, it is always verified during reads. I have never experienced a failure myself, but a look at the code:

http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.1/src/unpacked/src/H5Zfletcher32.c

is enough to see that this is the case.

Francesc

On 2/27/13 9:35 PM, Anthony Scopatz wrote:
> Sorry, I don't know. I have never used this feature. Maybe someone
> who has can chime in.
>
> On Wed, Feb 27, 2013 at 2:26 PM, Frédéric Bastien <no...@no...> wrote:
>
> > That is fine with me. I just want to detect if my data got corrupted
> > by hardware problems.
> >
> > Does anyone know if it always gets verified? Do you know if this causes
> > a significant speed difference?
> >
> > thanks
> >
> > Frédéric
> >
> > On Wed, Feb 27, 2013 at 3:21 PM, Anthony Scopatz <sc...@gm...> wrote:
> > > I think that the checksum is on the compressed data...
> > >
> > > On Wed, Feb 27, 2013 at 2:16 PM, Frédéric Bastien <no...@no...> wrote:
> > >> Hi,
> > >>
> > >> we just got some problems with our file server and this brings up the
> > >> question of how to detect corrupted files.
> > >>
> > >> There is a way to specify a filter when creating a table that adds a
> > >> checksum [1].
> > >>
> > >> My question is: when a file is created with a checksum, is it always
> > >> verified when the chunks are uncompressed? Can we specify when we open
> > >> the file whether we want to check it or not? The examples I found only
> > >> talk about it when we create the file.
> > >>
> > >> thanks
> > >>
> > >> Frédéric Bastien
> > >>
> > >> [1] http://pytables.github.com/usersguide/libref/helper_classes.html
> > >>
> > >> ------------------------------------------------------------------------------
> > >> Everyone hates slow websites. So do we.
> > >> Make your web apps faster with AppDynamics
> > >> Download AppDynamics Lite for free today:
> > >> http://p.sf.net/sfu/appdyn_d2d_feb
> > >> _______________________________________________
> > >> Pytables-users mailing list
> > >> Pyt...@li...
> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users

-- Francesc Alted |
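To make the idea of "verified during reads" concrete, here is a minimal sketch of a checksum being stored with some data and checked again later. It uses a generic textbook Fletcher-32 over little-endian 16-bit words. The HDF5 filter in H5Zfletcher32.c may differ in word order, padding and other details, so this is not expected to reproduce HDF5's checksum values. The function names and the sample payload are invented for illustration.

```python
def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 over little-endian 16-bit words, zero-padded."""
    if len(data) % 2:
        data += b'\x00'             # pad odd-length input with one zero byte
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


def verify(data: bytes, stored_checksum: int) -> bool:
    """Recompute the checksum on read and compare it with the stored one."""
    return fletcher32(data) == stored_checksum


payload = b'abcde'                  # stands in for one chunk of data
stored = fletcher32(payload)        # computed once, at write time

corrupted = bytearray(payload)
corrupted[2] ^= 0x01                # simulate a single flipped bit on disk

intact_ok = verify(payload, stored)
corrupted_ok = verify(bytes(corrupted), stored)
print('intact data passes:   ', intact_ok)
print('corrupted data passes:', corrupted_ok)
```

A read path that calls something like verify() on every chunk is what makes silent hardware corruption show up as an error instead of as wrong numbers.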
From: Anthony S. <sc...@gm...> - 2013-02-28 01:38:09
|
On Wed, Feb 27, 2013 at 2:24 PM, David Reed <dav...@gm...> wrote:

> Thanks for getting back Anthony,
>
> So I originally published to this list looking for an efficient way of
> doing pairwise comparisons on an HDF5 table that had only about 700
> elements. You guys sorta guided me in the direction of itertools while
> also notifying me of a bug fix that was recently pushed which had more
> efficient iteration.
>
> This worked great! and really sped up my comparisons, and I was flying
> high for quite awhile. Things started breaking though when I upped the #
> of elements to about 5000. I gave some code that created some sim data and
> you were getting the same error on your machine. I put this code up as
> Gist here: https://gist.github.com/dvreed77/fa3060b18257008df383
>
> Again, if you can think of anything, I'll try to do the leg work as best
> as I can.

Ahh. Thanks for the reminder David.

One thing I thought of was to maybe change the table chunkshape. I tried setting this to (50,) in the createTable() call, but that was clearly too low of a value. The problem, it seems to me, is that the byte sizes are far too low. I am seeing the following traceback:

scopatz@ares ~/Downloads $ python tbl_error.py
10669890 Comparisons
Traceback (most recent call last):
  File "tbl_error.py", line 63, in <module>
    get_hd()
  File "tbl_error.py", line 55, in get_hd
    print c.next()
  File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 3308, in __iter__
    out=buf_slice)
  File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1807, in read
    arr = self._read(start, stop, step, field, out)
  File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1732, in _read
    bytes_required))
ValueError: output array size invalid, got 4620 bytes, need 753984000 bytes

This problem is being caused by the fact that the dtype in the __iter__() method on line 3308 of table.py is NOT reading in the shape properly for some reason. Instead of interpreting masks1 as a 17x20*480 column of bools, it is interpreting it as a scalar column of bools.

Unfortunately, I don't have time to look into how to fix it. Hopefully, you can!

Be Well
Anthony

> On Wed, Feb 27, 2013 at 3:06 PM, <pyt...@li...> wrote:
>
>> Send Pytables-users mailing list submissions to
>>         pyt...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>> or, via email, send a message with subject or body 'help' to
>>         pyt...@li...
>>
>> You can reach the person managing the list at
>>         pyt...@li...
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Pytables-users digest..."
>>
>> Today's Topics:
>>
>>    1. Re: Pytables-users Digest, Vol 81, Issue 11 (Anthony Scopatz)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 27 Feb 2013 14:05:38 -0600
>> From: Anthony Scopatz <sc...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 11
>> To: Discussion list for PyTables <pyt...@li...>
>> Message-ID: <CAP...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi David,
>>
>> Sorry about the delay. I have mostly forgotten what exactly this issue
>> was. I am pretty swamped this week so I could throw out some WAGs but I
>> don't think I'll be able to do any real work myself on it.
>>
>> Be Well
>> Anthony
>>
>> On Mon, Feb 25, 2013 at 2:15 PM, David Reed <dav...@gm...> wrote:
>>
>> > Anthony,
>> >
>> > I've had a chance recently to revisit this problem and am not getting
>> > anywhere. I was hoping I might be able to get more support in getting
>> > this working. If you have some ideas, throw them out and I can do the
>> > leg work and see what I can come up with.
>> >
>> > -David
>> >
>> > On Mon, Feb 4, 2013 at 3:44 PM, <pyt...@li...> wrote:
>> >
>> >> Today's Topics:
>> >>
>> >>    1. Re: Pytables-users Digest, Vol 81, Issue 9 (Anthony Scopatz)
>> >>
>> >> ----------------------------------------------------------------------
>> >>
>> >> Message: 1
>> >> Date: Mon, 4 Feb 2013 14:43:37 -0600
>> >> From: Anthony Scopatz <sc...@gm...>
>> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 9
>> >> Message-ID: <CAP...@ma...>
>> >>
>> >> Hey David,
>> >>
>> >> I am getting the following error now:
>> >>
>> >> scopatz@ares ~ $ python t.py
>> >> 10669890 Comparisons
>> >> Traceback (most recent call last):
>> >>   File "t.py", line 61, in <module>
>> >>     get_hd()
>> >>   File "t.py", line 54, in get_hd
>> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 3308, in __iter__
>> >>     out=buf_slice)
>> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1807, in read
>> >>     arr = self._read(start, stop, step, field, out)
>> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1732, in _read
>> >>     bytes_required))
>> >> ValueError: output array size invalid, got 4620 bytes, need 753984000 bytes
>> >>
>> >> And I had to change the phasors line to the following:
>> >>
>> >>     r['phasors'] = np.empty((17, 20*240), complex)
>> >>
>> >> Thanks.
>> >> Be Well
>> >> Anthony
>> >>
>> >> On Mon, Feb 4, 2013 at 1:41 PM, David Reed <dav...@gm...> wrote:
>> >>
>> >> > I didn't have any luck. I replaced that __iter__ function, which led
>> >> > me to replace the read function, which led me to replace the _read
>> >> > function, and I eventually got another error.
>> >> >
>> >> > Below are 2 functions and my HDF5 Table class declaration. They should
>> >> > be self explanatory. I wasn't sure if attachments would go through and
>> >> > this is pretty small, so I figured it would be ok just to post. I
>> >> > apologize if this is a bit cluttered. I would also appreciate any
>> >> > comments on how I assign the results to the matrix D; this does not
>> >> > seem very pythonic at all and I could use some advice there if it's
>> >> > easy. (The ii*jj is just a placeholder for a more sophisticated
>> >> > measure.) Thanks again!
>> >> >
>> >> > import numpy as np
>> >> > import tables as tb
>> >> >
>> >> > class Iris(tb.IsDescription):
>> >> >     subject_id = tb.IntCol()
>> >> >     iris_id = tb.IntCol()
>> >> >     database = tb.StringCol(5)
>> >> >     is_left = tb.BoolCol()
>> >> >     is_flipped = tb.BoolCol()
>> >> >     templates = tb.BoolCol(shape=(17, 20*480))
>> >> >     masks1 = tb.BoolCol(shape=(17, 20*480))
>> >> >     phasors = tb.ComplexCol(itemsize=8, shape=(17, 20*240))
>> >> >     masks2 = tb.BoolCol(shape=(17, 20*240))
>> >> >
>> >> >
>> >> > def create_hdf5():
>> >> >     """
>> >> >     """
>> >> >     with tb.openFile('test.h5', 'w') as f:
>> >> >
>> >> >         # Create and fill the table of irises",
>> >> >         irises = f.createTable(f.root, 'irises', Iris, 'Irises',
>> >> >                                filters=tb.Filters(1))
>> >> >         for ii in range(4620):
>> >> >
>> >> >             r = irises.row
>> >> >             r['subject_id'] = ii
>> >> >             r['iris_id'] = 0
>> >> >             r['database'] = 'test'
>> >> >             r['is_left'] = True
>> >> >             r['is_flipped'] = False
>> >> >             r['templates'] = np.empty((17, 20*480), np.bool8)
>> >> >             r['masks1'] = np.empty((17, 20*480), np.bool8)
>> >> >             r['phasors'] = np.empty((17, 20*240)) + 1j*np.empty((17, 20*240))
>> >> >             r['masks2'] = np.empty((17, 20*240), np.bool8)
>> >> >             r.append()
>> >> >
>> >> >         irises.flush()
>> >> >
>> >> > def get_hd():
>> >> >     """
>> >> >     """
>> >> >     from itertools import combinations, izip
>> >> >     with tb.openFile('test.h5') as f:
>> >> >         irises = f.root.irises
>> >> >
>> >> >         templates = f.root.irises.cols.templates
>> >> >         masks = f.root.irises.cols.masks1
>> >> >
>> >> >         N_irises = len(irises)
>> >> >
>> >> >         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>> >> >         D = np.empty((N_irises, N_irises))
>> >> >         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks,
>> >> >                 range(N_irises)), 2):
>> >> >             D[ii, jj] = ii*jj
>> >> >
>> >> >         np.save('test', D)
>> >> >
>> >> > On Mon, Feb 4, 2013 at 11:16 AM, <pyt...@li...> wrote:
>> >> >
>> >> >> Today's Topics:
>> >> >>
>> >> >>    1. Re: Pytables-users Digest, Vol 81, Issue 7 (Anthony Scopatz)
>> >> >>
>> >> >> ----------------------------------------------------------------------
>> >> >>
>> >> >> Message: 1
>> >> >> Date: Mon, 4 Feb 2013 10:16:24 -0600
>> >> >> From: Anthony Scopatz <sc...@gm...>
>> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 7
>> >> >> Message-ID: <CAP...@ma...>
>> >> >>
>> >> >> On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> wrote:
>> >> >>
>> >> >> > Hi Josh,
>> >> >> >
>> >> >> > Here is my __iter__ code:
>> >> >> >
>> >> >> >     def __iter__(self):
>> >> >> >         table = self.table
>> >> >> >         itemsize = self.dtype.itemsize
>> >> >> >         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>> >> >> >         max_row = len(self)
>> >> >> >         for start_row in xrange(0, len(self), nrowsinbuf):
>> >> >> >             end_row = min([start_row + nrowsinbuf, max_row])
>> >> >> >             buf = table.read(start_row, end_row, 1, field=self.pathname)
>> >> >> >             for row in buf:
>> >> >> >                 yield row
>> >> >> >
>> >> >> > It does look different, I will try swapping in the code from github
>> >> >> > and see what happens.
>> >> >>
>> >> >> Yes, please let us know how that goes!
Otherwise send the list both >> >> the >> >> >> test data generator script and the script that fails. >> >> >> >> >> >> Be Well >> >> >> Anthony >> >> >> >> >> >> >> >> >> > >> >> >> > >> >> >> > On Mon, Feb 4, 2013 at 9:59 AM, < >> >> >> > pyt...@li...> wrote: >> >> >> > >> >> >> >> Send Pytables-users mailing list submissions to >> >> >> >> pyt...@li... >> >> >> >> >> >> >> >> To subscribe or unsubscribe via the World Wide Web, visit >> >> >> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> >> or, via email, send a message with subject or body 'help' to >> >> >> >> pyt...@li... >> >> >> >> >> >> >> >> You can reach the person managing the list at >> >> >> >> pyt...@li... >> >> >> >> >> >> >> >> When replying, please edit your Subject line so it is more >> specific >> >> >> >> than "Re: Contents of Pytables-users digest..." >> >> >> >> >> >> >> >> >> >> >> >> Today's Topics: >> >> >> >> >> >> >> >> 1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers) >> >> >> >> 2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed) >> >> >> >> >> >> >> >> >> >> >> >> >> >> ---------------------------------------------------------------------- >> >> >> >> >> >> >> >> Message: 1 >> >> >> >> Date: Fri, 1 Feb 2013 14:08:47 -0800 >> >> >> >> From: Josh Ayers <jos...@gm...> >> >> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, >> Issue 4 >> >> >> >> To: Discussion list for PyTables >> >> >> >> <pyt...@li...> >> >> >> >> Message-ID: >> >> >> >> <CACOB4aPG4NZ6b2a3v= >> >> >> >> 1Ue...@ma...> >> >> >> >> Content-Type: text/plain; charset="iso-8859-1" >> >> >> >> >> >> >> >> David, >> >> >> >> >> >> >> >> You added a custom version of table.Column.__iter__, correct? >> Could >> >> >> you >> >> >> >> also include that along with the script to reproduce the error? >> >> >> >> >> >> >> >> It seems like the problem may be in the 'nrowsinbuf' calculation >> - >> >> see >> >> >> >> [1]. Each of your rows is 17 x 9600 = 163200 bytes. 
If you're using the default 1MB value for IO_BUFFER_SIZE, it should be
reading in chunks of 6 rows. Instead, it's reading the entire table.

[1]: https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296

On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:

> On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
>
>> at the error:
>>
>>     result = numpy.empty(shape=nrows, dtype=dtypeField)
>>
>> nrows = 4620 and dtypeField is ('bool', (17, 9600))
>>
>> I'm not sure what that means as a dtype, but that's what it is.
>>
>> Forgive me if I'm being totally naive, but I thought the whole point of
>> __iter__ with PyTables was to do iteration on the fly, so there is no
>> preallocation.
>
> Nope, you are not being naive at all. That is the point.
>
>> If you have any ideas on this I'm all ears.
>
> If you could send a minimal script which reproduces this error, that would
> help a lot.
>
> Be Well
> Anthony
>
>> Thanks again.
>>
>> Dave
>>
>> On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
>>
>>> Send Pytables-users mailing list submissions to
>>>     pyt...@li...
>>>
>>> To subscribe or unsubscribe via the World Wide Web, visit
>>>     https://lists.sourceforge.net/lists/listinfo/pytables-users
>>> or, via email, send a message with subject or body 'help' to
>>>     pyt...@li...
>>>
>>> You can reach the person managing the list at
>>>     pyt...@li...
>>>
>>> When replying, please edit your Subject line so it is more specific
>>> than "Re: Contents of Pytables-users digest..."
>>>
>>> Today's Topics:
>>>
>>>    1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Fri, 1 Feb 2013 14:44:40 -0600
>>> From: Anthony Scopatz <sc...@gm...>
>>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>>> To: Discussion list for PyTables <pyt...@li...>
>>> Message-ID: <CAP...@ma...>
>>> Content-Type: text/plain; charset="iso-8859-1"
>>>
>>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
>>>
>>> > Hi Anthony,
>>> >
>>> > Thanks for the reply.
>>> >
>>> > I honestly don't know how to monitor my Python memory usage, but I'm
>>> > sure that it's caused by running out of memory.
>>>
>>> Well, I would just run top or process monitor or something while running
>>> the python script to see what happens to memory usage as the script chugs
>>> along...
>>>
>>> > I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
>>> > and the column I'm iterating over is a 17x9600 boolean matrix. The
>>> > __iter__ method is preallocating an array that is this size, which
>>> > appears to be the root of the error. I was hoping there is a fix
>>> > somewhere in here to not have to do this preallocation.
>>>
>>> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
>>> these is ~760 MB. If you have 2 GB of memory and you are iterating over 2
>>> of these (templates & masks) it is conceivable that you are just running
>>> out of memory. Maybe there is a way that __iter__ could not preallocate
>>> something that is basically a temporary. What is the dtype of the
>>> templates array?
>>>
>>> Be Well
>>> Anthony
>>>
>>> > Thanks again.

------------------------------

Message: 2
Date: Mon, 4 Feb 2013 09:58:53 -0500
From: David Reed <dav...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6
To: pyt...@li...
Message-ID: <CAM6XA7=h50...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

Hi Anthony,

Sorry to just get back to you. I can send a script; should I send a script
that creates some fake data as well?

-Dave

On Fri, Feb 1, 2013 at 4:50 PM, <pyt...@li...> wrote:

> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony Scopatz)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 1 Feb 2013 15:50:11 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
> To: Discussion list for PyTables <pyt...@li...>
> Message-ID: <CAP...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
>
> > at the error:
> >
> >     result = numpy.empty(shape=nrows, dtype=dtypeField)
> >
> > nrows = 4620 and dtypeField is ('bool', (17, 9600))
> >
> > I'm not sure what that means as a dtype, but that's what it is.
> >
> > Forgive me if I'm being totally naive, but I thought the whole point of
> > __iter__ with PyTables was to do iteration on the fly, so there is no
> > preallocation.
>
> Nope, you are not being naive at all. That is the point.
>
> > If you have any ideas on this I'm all ears.
>
> If you could send a minimal script which reproduces this error, that would
> help a lot.
>
> Be Well
> Anthony
>
> > Thanks again.
> >
> > Dave
> >
> > On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
> >
> >> Today's Topics:
> >>
> >>    1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
> >>
> >> ----------------------------------------------------------------------
> >>
> >> Message: 1
> >> Date: Fri, 1 Feb 2013 14:44:40 -0600
> >> From: Anthony Scopatz <sc...@gm...>
> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
> >> To: Discussion list for PyTables <pyt...@li...>
> >> Message-ID: <CAP...@ma...>
> >> Content-Type: text/plain; charset="iso-8859-1"
> >>
> >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
> >>
> >> > Hi Anthony,
> >> >
> >> > Thanks for the reply.
> >> >
> >> > I honestly don't know how to monitor my Python memory usage, but I'm
> >> > sure that it's caused by running out of memory.
> >>
> >> Well, I would just run top or process monitor or something while running
> >> the python script to see what happens to memory usage as the script
> >> chugs along...
> >>
> >> > I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
> >> > and the column I'm iterating over is a 17x9600 boolean matrix. The
> >> > __iter__ method is preallocating an array that is this size, which
> >> > appears to be the root of the error. I was hoping there is a fix
> >> > somewhere in here to not have to do this preallocation.
> >>
> >> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
> >> these is ~760 MB. If you have 2 GB of memory and you are iterating over
> >> 2 of these (templates & masks) it is conceivable that you are just
> >> running out of memory. Maybe there is a way that __iter__ could not
> >> preallocate something that is basically a temporary. What is the dtype
> >> of the templates array?
> >>
> >> Be Well
> >> Anthony
> >>
> >> > Thanks again.
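The figures quoted in this exchange can be checked with plain arithmetic. The
following is a minimal sketch, assuming one byte per boolean (which is how
NumPy stores them) and taking the "default 1MB" IO_BUFFER_SIZE mentioned in
the thread to mean 1024*1024 bytes. The constant names are illustrative and
are not part of PyTables.

```python
# Back-of-the-envelope check of the sizes discussed in the thread.
ROWS = 4620                      # rows in the table
CELL_SHAPE = (17, 9600)          # shape of one boolean cell in the column
BYTES_PER_BOOL = 1               # assumption: one byte per boolean
IO_BUFFER_BYTES = 1024 * 1024    # assumption: "1MB" means 2**20 bytes

bytes_per_row = CELL_SHAPE[0] * CELL_SHAPE[1] * BYTES_PER_BOOL
bytes_per_column = ROWS * bytes_per_row
rows_per_buffer = IO_BUFFER_BYTES // bytes_per_row
n_pairs = ROWS * (ROWS - 1) // 2

print("bytes per row:    %d" % bytes_per_row)
print("bytes per column: %d" % bytes_per_column)
print("rows per buffer:  %d" % rows_per_buffer)
print("pairs to compare: %d" % n_pairs)
```

Under these assumptions one row is 163,200 bytes and the whole column is
753,984,000 bytes (about 719 MiB), in line with the ~760 MB estimate; reading
two such columns in full needs roughly twice that. A buffer of this size holds
6 rows, and the pair count is 10,669,890, the number of comparisons reported
in the thread.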
> >> > On Fri, Feb 1, 2013 at 11:12 AM, <pyt...@li...> wrote:
> >> >
> >> >> Today's Topics:
> >> >>
> >> >>    1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony Scopatz)
> >> >>
> >> >> ----------------------------------------------------------------------
> >> >>
> >> >> Message: 1
> >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600
> >> >> From: Anthony Scopatz <sc...@gm...>
> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9
> >> >> To: Discussion list for PyTables <pyt...@li...>
> >> >> Message-ID: <CAP...@ma...>
> >> >> Content-Type: text/plain; charset="iso-8859-1"
> >> >>
> >> >> Hi David,
> >> >>
> >> >> Sorry, I haven't had a ton of time recently. You seem to be getting a
> >> >> memory error on creating a numpy array. This kind of thing typically
> >> >> happens when you are out of memory. Does this seem to be the case with
> >> >> you? When this dies, is your memory usage at 100%? If so, this
> >> >> algorithm might require a little tweaking...
> >> >>
> >> >> Be Well
> >> >> Anthony
> >> >>
> >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:
> >> >>
> >> >> > I'm still having problems with this one. I can't tell if this is
> >> >> > something dumb I'm doing with itertools, or if it's something in
> >> >> > PyTables.
> >> >> >
> >> >> > Would appreciate any help.
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:
> >> >> >
> >> >> >> I think I have to reopen this issue. I have been running fine for a
> >> >> >> while using the combinations method from itertools, but have
> >> >> >> recently run into a memory error since I have recently quadrupled
> >> >> >> the size of the hdf file.
> >> >> >> Here is my code again:
> >> >> >>
> >> >> >> from itertools import combinations, izip
> >> >> >> with tb.openFile(h5_all, 'r') as f:
> >> >> >>     irises = f.root.irises
> >> >> >>
> >> >> >>     templates = f.root.irises.cols.templates
> >> >> >>     masks = f.root.irises.cols.masks1
> >> >> >>
> >> >> >>     N_irises = len(irises)
> >> >> >>     index = np.ones((20 * 480), np.bool)
> >> >> >>
> >> >> >>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >> >> >>     D = np.empty((N_irises, N_irises))
> >> >> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> >> >> >>         # print ii
> >> >> >>         D[ii, jj] = ham_dist(
> >> >> >>             t1[8, index],
> >> >> >>             t2[:, index],
> >> >> >>             m1[8, index],
> >> >> >>             m2[:, index],
> >> >> >>         )
> >> >> >>
> >> >> >> And here is the error:
> >> >> >>
> >> >> >> In [10]: get_hd3()
> >> >> >> 10669890 Comparisons
> >> >> >> ---------------------------------------------------------------------------
> >> >> >> MemoryError                               Traceback (most recent call last)
> >> >> >> <ipython-input-10-cfb255ce7bd1> in <module>()
> >> >> >> ----> 1 get_hd3()
> >> >> >>
> >> >> >>     118         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >> >> >>     119         D = np.empty((N_irises, N_irises))
> >> >> >> --> 120         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> >> >> >>     121             # print ii
> >> >> >>     122             D[ii, jj] = ham_dist(
> >> >> >>
> >> >> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
> >> >> >>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
> >> >> >>    3275             end_row = min([start_row + nrowsinbuf, max_row])
> >> >> >> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
> >> >> >>    3277             for row in buf:
> >> >> >>    3278                 yield row
> >> >> >>
> >> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
> >> >> >>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
> >> >> >>    1773
> >> >> >> -> 1774         arr = self._read(start, stop, step, field)
> >> >> >>    1775         return internal_to_flavor(arr, self.flavor)
> >> >> >>    1776
> >> >> >>
> >> >> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
> >> >> >>    1719         if field:
> >> >> >>    1720             # Create a container for the results
> >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> >> >>    1722         else:
> >> >> >>    1723             # Recarray case
> >> >> >>
> >> >> >> MemoryError:
> >> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
> >> >> >>    1720             # Create a container for the results
> >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> >> >>    1722         else:
> >> >> >>
> >> >> >> Also, if you guys see any performance problems in my code, please let
> >> >> >> me know.
> >> >> >>
> >> >> >> Thank you so much for the help.
> >> >> >>
> >> >> >> -Dave
> >> >> >>
> >> >> >> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
> >> >> >>
> >> >> >>> Today's Topics:
> >> >> >>>
> >> >> >>>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
> >> >> >>>
> >> >> >>> ----------------------------------------------------------------------
> >> >> >>>
> >> >> >>> Message: 1
> >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500
> >> >> >>> From: David Reed <dav...@gm...>
> >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
> >> >> >>> To: pyt...@li...
> >> >> >>> Message-ID: <CAM...@ma...>
> >> >> >>> Content-Type: text/plain; charset="iso-8859-1"
> >> >> >>>
> >> >> >>> I can't thank you guys enough for the help. I was able to add the
> >> >> >>> __iter__ function to the table.py file and everything seems to be
> >> >> >>> working great! I'm not quite as fast as I was with iterating right
> >> >> >>> off a matrix, but pretty close. I was at 555 comparisons per second,
> >> >> >>> and now I'm at 420.
> >> >> >>>
> >> >> >>> I handled the problem I mentioned earlier by doing this, and it
> >> >> >>> seems to work great:
> >> >> >>>
> >> >> >>> A = f.root.data.cols.A
> >> >> >>> B = f.root.data.cols.B
> >> >> >>>
> >> >> >>> D = np.empty((len(A), len(A)))
> >> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
> >> >> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
> >> >> >>>
> >> >> >>> Again, thanks a lot.
> >> >> >>>
> >> >> >>> -Dave
> >> >> >>>
> >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
> >> >> >>>
> >> >> >>> > Today's Topics:
> >> >> >>> >
> >> >> >>> >    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
> >> >> >>> >    2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz)
> >> >> >>> >
> >> >> >>> > ----------------------------------------------------------------------
> >> >> >>> >
> >> >> >>> > Message: 1
> >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
> >> >> >>> > From: Anthony Scopatz <sc...@gm...>
> >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> >> >> >>> > To: Discussion list for PyTables <pyt...@li...>
> >> >> >>> > Message-ID: <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=Gz...@ma...>
> >> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
> >> >> >>> >
> >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
> >> >> >>> >
> >> >> >>> > > Thanks a lot for the help so far guys!
> >> >> >>> > >
> >> >> >>> > > Looking at itertools, I found what I believe to be the perfect
> >> >> >>> > > function for what I need, itertools.combinations. This appears
> >> >> >>> > > to be a valid replacement to the method proposed.
> >> >> >>> >
> >> >> >>> > Yes, combinations is awesome!
> >> >> >>> >
> >> >> >>> > > There is a small problem that I didn't mention, which is that my
> >> >> >>> > > compare function actually takes as inputs 2 columns from the
> >> >> >>> > > table. Like so:
> >> >> >>> > >
> >> >> >>> > > D = np.empty((N_elements, N_elements))
> >> >> >>> > > for ii in xrange(N_elements):
> >> >> >>> > >     for jj in xrange(ii+1, N_elements):
> >> >> >>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> >> >> >>> > >                             data['element2'][ii], data['element2'][jj])
> >> >> >>> > >
> >> >> >>> > > Is there an efficient way of using itertools with this structure?
> >> >> >>> >
> >> >> >>> > You can always make two other iterators for each column. Since you
> >> >> >>> > have two columns you would have 4 iterators. I am not sure how fast
> >> >> >>> > this is going to be but I am confident that there is definitely a
> >> >> >>> > way to do this in one for-loop, which is going to be way faster
> >> >> >>> > than nested loops.
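The single-loop idea described above can be sketched in Python 3 with the
standard library alone. This is an illustration on toy data, not the poster's
code: `pairwise`, `toy_compare`, `A` and `B` are made-up names, and plain lists
stand in for the two table columns.

```python
from itertools import combinations


def pairwise(col_a, col_b, compare):
    """Call compare(a1, a2, b1, b2) once for every unordered pair of rows."""
    results = {}
    # Zip the two columns so each item carries both values, and number the rows.
    rows = enumerate(zip(col_a, col_b))
    # combinations(..., 2) yields every pair (i, j) with i < j exactly once.
    for (i, (a1, b1)), (j, (a2, b2)) in combinations(rows, 2):
        results[i, j] = compare(a1, a2, b1, b2)
    return results


# Toy stand-ins for the two columns (hypothetical data).
A = [1, 2, 3, 4]
B = [10, 20, 30, 40]


def toy_compare(a1, a2, b1, b2):
    # Placeholder metric; the real compare function is not shown in the thread.
    return abs(a1 - a2) + abs(b1 - b2)


D = pairwise(A, B, toy_compare)
print(len(D), "pairs compared")
```

One caveat: itertools.combinations copies its whole input into a tuple before
it yields the first pair, so when the inputs are table columns every row ends
up in memory at once. That is fine for small tables but matters for large ones.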
> >> >> >>> >
> >> >> >>> > Be Well
> >> >> >>> > Anthony
> >> >> >>> >
> >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> >> >> >>> > >
> >> >> >>> > >> Today's Topics:
> >> >> >>> > >>
> >> >> >>> > >>    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
> >> >> >>> > >>
> >> >> >>> > >> ----------------------------------------------------------------------
> >> >> >>> > >>
> >> >> >>> > >> Message: 1
> >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
> >> >> >>> > >> From: Josh Ayers <jos...@gm...>
> >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >> >> >>> > >> To: Discussion list for PyTables <pyt...@li...>
> >> >> >>> > >> Message-ID: <CAC...@ma...>
> >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1"
> >> >> >>> > >>
> >> >> >>> > >> David,
> >> >> >>> > >>
> >> >> >>> > >> The change in issue 27 was only for iteration over a tables.Column
> >> >> >>> > >> instance. To use it, tweak Anthony's code as follows. This will
> >> >> >>> > >> iterate over the "element" column, as in your original example.
> >> >> >>> > >>
> >> >> >>> > >> Note also that this will only work with the development version of
> >> >> >>> > >> PyTables available on github. It will be very slow using the
> >> >> >>> > >> released v2.4.0.
> >> >> >>> > >>
> >> >> >>> > >> from itertools import izip
> >> >> >>> > >>
> >> >> >>> > >> with tb.openFile(...) as f:
> >> >> >>> > >>     data = f.root.data.cols.element
> >> >> >>> > >>     data_i = iter(data)
> >> >> >>> > >>     data_j = iter(data)
> >> >> >>> > >>     data_i.next()  # throw the first value away
> >> >> >>> > >>     for i, j in izip(data_i, data_j):
> >> >> >>> > >>         compare(i, j)
> >> >> >>> > >>
> >> >> >>> > >> Hope that helps,
> >> >> >>> > >> Josh
> >> >> >>> > >>
> >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
> >> >> >>> > >>
> >> >> >>> > >> > Hi David,
> >> >> >>> > >> >
> >> >> >>> > >> > Tables and table column iteration have been overhauled fairly
> >> >> >>> > >> > recently [1]. So you might try creating two iterators, offset
> >> >> >>> > >> > by one, and then doing the comparison. I am hacking this out
> >> >> >>> > >> > super quick so please forgive me:
> >> >> >>> > >> >
> >> >> >>> > >> > from itertools import izip
> >> >> >>> > >> >
> >> >> >>> > >> > with tb.openFile(...) as f:
> >> >> >>> > >> >     data = f.root.data
> >> >> >>> > >> >     data_i = iter(data)
> >> >> >>> > >> >     data_j = iter(data)
> >> >> >>> > >> >     data_i.next()  # throw the first value away
> >> >> >>> > >> >     for i, j in izip(data_i, data_j):
> >> >> >>> > >> >         compare(i, j)
> >> >> >>> > >> >
> >> >> >>> > >> > You get the idea ;)
> >> >> >>> > >> >
> >> >> >>> > >> > Be Well
> >> >> >>> > >> > Anthony
> >> >> >>> > >> >
> >> >> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27
> >> >> >>> > >> >
> >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >> >> >>> > >> >
> >> >> >>> > >> >> I was hoping someone could help me out here.
> >> >> >>> > >> >>
> >> >> >>> > >> >> This is from a post I put up on StackOverflow.
> >> >> >>> > >> >>
> >> >> >>> > >> >> I have a fairly large dataset that I store in HDF5 and access
> >> >> >>> > >> >> using PyTables. One operation I need to do on this dataset is
> >> >> >>> > >> >> pairwise comparisons between each of the elements. This
> >> >> >>> > >> >> requires 2 loops, one to iterate over each element, and an
> >> >> >>> > >> >> inner loop to iterate over every other element. This operation
> >> >> >>> > >> >> thus looks at N(N-1)/2 comparisons.
> >> >> >>> > >> >>
> >> >> >>> > >> >> For fairly small sets I found it to be faster to dump the
> >> >> >>> > >> >> contents into a multidimensional numpy array and then do my
> >> >> >>> > >> >> iteration. I run into problems with large sets because of
> >> >> >>> > >> >> memory issues and need to access each element of the dataset
> >> >> >>> > >> >> at run time.
> >> >> >>> > >> >>
> >> >> >>> > >> >> Putting the elements into an array gives me about 600
> >> >> >>> > >> >> comparisons per second, while operating on hdf5 data itself
> >> >> >>> > >> >> gives me about 300 comparisons per second.
> >> >> >>> > >> >>
> >> >> >>> > >> >> Is there a way to speed this process up?
>> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> Example follows (this is not my real code, >> just >> >> an >> >> >> >> > >> example): >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> *Small Set*: >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f: >> >> >> >> > >> >> >>> > >> >> data = f.root.data >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> N_elements = len(data) >> >> >> >> > >> >> >>> > >> >> elements = np.empty((N_irises, 1e5)) >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> for ii, d in enumerate(data): >> >> >> >> > >> >> >>> > >> >> elements[ii] = data['element'] >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> D = np.empty((N_irises, N_irises)) for ii in >> >> >> >> > >> >> xrange(N_elements): >> >> >> >> > >> >> >>> > >> >> for jj in xrange(ii+1, N_elements): >> >> >> >> > >> >> >>> > >> >> D[ii, jj] = compare(elements[ii], >> >> >> >> elements[jj]) >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> *Large Set*: >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f: >> >> >> >> > >> >> >>> > >> >> data = f.root.data >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> N_elements = len(data) >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> D = np.empty((N_irises, N_irises)) >> >> >> >> > >> >> >>> > >> >> for ii in xrange(N_elements): >> >> >> >> > >> >> >>> > >> >> for jj in xrange(ii+1, N_elements): >> >> >> >> > >> >> >>> > >> >> D[ii, jj] = >> >> >> compare(data['element'][ii], >> >> >> >> > >> >> >>> > >> data['element'][jj]) >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> >> >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> >> >> >> >> > >> >> >> >> >> >> > >> >> >> >> >> 
>> >> >> >> > >> >> >>> > Message: 2
>> >> >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600
>> >> >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...>
>> >> >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
... [truncated message content] |
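The question quoted above asks for all N(N-1)/2 pairs without loading the whole dataset into memory. One possible approach (not taken from the thread) is to read rows in fixed-size blocks and compare within and across blocks, so that at most two blocks are held at once. This is only a sketch under stated assumptions: `blockwise_pairs` and `read_block` are hypothetical names, not PyTables API, and the code targets Python 3 rather than the Python 2 used in the thread. With PyTables, `read_block` might wrap something like `table.read(start, stop)`.

```python
from itertools import combinations


def blockwise_pairs(read_block, n_rows, block_size):
    """Yield (i, row_i, j, row_j) for every pair i < j.

    read_block(start, stop) is a caller-supplied (hypothetical) function that
    returns rows start..stop-1 as a sequence. At most two blocks are held in
    memory at any time.
    """
    for a in range(0, n_rows, block_size):
        block_a = read_block(a, min(a + block_size, n_rows))

        # Pairs whose two members both live in block a.
        for (i, row_i), (j, row_j) in combinations(enumerate(block_a, a), 2):
            yield i, row_i, j, row_j

        # Pairs with one member in block a and the other in a later block.
        for b in range(a + block_size, n_rows, block_size):
            block_b = read_block(b, min(b + block_size, n_rows))
            for i, row_i in enumerate(block_a, a):
                for j, row_j in enumerate(block_b, b):
                    yield i, row_i, j, row_j


if __name__ == "__main__":
    # Stand-in data; a real reader would pull rows from the HDF5 file instead.
    data = [x * x for x in range(10)]
    n_pairs = sum(1 for _ in blockwise_pairs(lambda s, e: data[s:e], len(data), 4))
    print(n_pairs)  # N(N-1)/2 for N = 10
```

Note that `itertools.combinations` copies its whole input into a tuple before yielding anything, which is why the sketch only applies it to one block at a time. The trade-off is that later blocks are re-read once per earlier block, so a larger `block_size` means fewer reads but more memory.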
From: Anthony S. <sc...@gm...> - 2013-02-27 20:37:36
|
Sorry, I don't know. I have never used this feature. Maybe someone who has can chime in.

On Wed, Feb 27, 2013 at 2:26 PM, Frédéric Bastien <no...@no...> wrote:
> That is fine with me. I just want to detect whether my data got corrupted
> by hardware problems.
>
> Does anyone know if it is always verified? Do you know if this causes a
> significant speed difference?
>
> thanks
>
> Frédéric
>
> On Wed, Feb 27, 2013 at 3:21 PM, Anthony Scopatz <sc...@gm...> wrote:
> > I think that the checksum is on the compressed data...
> >
> > On Wed, Feb 27, 2013 at 2:16 PM, Frédéric Bastien <no...@no...> wrote:
> >> Hi,
> >>
> >> we just had some problems with our file server, which raises the
> >> question of how to detect corrupted files.
> >>
> >> There is a way to specify a filter when creating a table that adds a
> >> checksum [1].
> >>
> >> My question is: when a file is created with a checksum, is it always
> >> verified when the chunks are uncompressed? Can we specify when we open
> >> the file whether we want to check it or not? The examples I found only
> >> talk about it when we create the file.
> >>
> >> thanks
> >>
> >> Frédéric Bastien
> >>
> >> [1] http://pytables.github.com/usersguide/libref/helper_classes.html |
From: Frédéric B. <no...@no...> - 2013-02-27 20:27:55
|
That is fine with me. I just want to detect whether my data got corrupted by hardware problems.

Does anyone know if it is always verified? Do you know if this causes a significant speed difference?

thanks

Frédéric

On Wed, Feb 27, 2013 at 3:21 PM, Anthony Scopatz <sc...@gm...> wrote:
> I think that the checksum is on the compressed data...
>
> On Wed, Feb 27, 2013 at 2:16 PM, Frédéric Bastien <no...@no...> wrote:
>> Hi,
>>
>> we just had some problems with our file server, which raises the
>> question of how to detect corrupted files.
>>
>> There is a way to specify a filter when creating a table that adds a
>> checksum [1].
>>
>> My question is: when a file is created with a checksum, is it always
>> verified when the chunks are uncompressed? Can we specify when we open
>> the file whether we want to check it or not? The examples I found only
>> talk about it when we create the file.
>>
>> thanks
>>
>> Frédéric Bastien
>>
>> [1] http://pytables.github.com/usersguide/libref/helper_classes.html |
From: David R. <dav...@gm...> - 2013-02-27 20:25:47
|
Thanks for getting back, Anthony.

I originally posted to this list looking for an efficient way of doing pairwise comparisons on an HDF5 table that had only about 700 elements. You guys sorta guided me in the direction of itertools while also notifying me of a recently pushed bug fix that made iteration more efficient. This worked great and really sped up my comparisons, and I was flying high for quite a while.

Things started breaking, though, when I upped the number of elements to about 5000. I gave some code that created some sim data, and you were getting the same error on your machine. I put this code up as a Gist here: https://gist.github.com/dvreed77/fa3060b18257008df383

Again, if you can think of anything, I'll try to do the legwork as best I can.

On Wed, Feb 27, 2013 at 3:06 PM, <pyt...@li...> wrote:
> Message: 1
> Date: Wed, 27 Feb 2013 14:05:38 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 11
>
> Hi David,
>
> Sorry about the delay. I have mostly forgotten what exactly this issue
> was. I am pretty swamped this week so I could throw out some WAGs but I
> don't think I'll be able to do any real work myself on it.
>
> Be Well
> Anthony
>
> On Mon, Feb 25, 2013 at 2:15 PM, David Reed <dav...@gm...> wrote:
> > Anthony,
> >
> > I've had a chance recently to revisit this problem and am not getting
> > anywhere. I was hoping I might be able to get more support in getting
> > this working. If you have some ideas, throw them out and I can do the
> > legwork and see what I can come up with.
> >
> > -David
> >
> > On Mon, Feb 4, 2013 at 3:44 PM, <pyt...@li...> wrote:
> >> Message: 1
> >> Date: Mon, 4 Feb 2013 14:43:37 -0600
> >> From: Anthony Scopatz <sc...@gm...>
> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 9
> >>
> >> Hey David,
> >>
> >> I am getting the following error now:
> >>
> >> scopatz@ares ~ $ python t.py
> >> 10669890 Comparisons
> >> Traceback (most recent call last):
> >>   File "t.py", line 61, in <module>
> >>     get_hd()
> >>   File "t.py", line 54, in get_hd
> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 3308, in __iter__
> >>     out=buf_slice)
> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1807, in read
> >>     arr = self._read(start, stop, step, field, out)
> >>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1732, in _read
> >>     bytes_required))
> >> ValueError: output array size invalid, got 4620 bytes, need 753984000 bytes
> >>
> >> And I had to change the phasors line to the following:
> >>
> >> r['phasors'] = np.empty((17, 20*240), complex)
> >>
> >> Thanks.
> >> Be Well
> >> Anthony
> >>
> >> On Mon, Feb 4, 2013 at 1:41 PM, David Reed <dav...@gm...> wrote:
> >> > I didn't have any luck. I replaced that __iter__ function, which led to
> >> > me replacing the read function, which led to me replacing the _read
> >> > function, and I eventually got another error.
> >> >
> >> > Below are 2 functions and my HDF5 Table class declaration. They should
> >> > be self-explanatory. I wasn't sure if attachments would go through, and
> >> > this is pretty small, so I figured it would be OK just to post. I
> >> > apologize if this is a bit cluttered. I would also appreciate any
> >> > comments on how I assign the results to the matrix D; this does not
> >> > seem very pythonic at all, and I could use some advice there if it's
> >> > easy. (The ii*jj is just a placeholder for a more sophisticated
> >> > measure.) Thanks again!
> >> >
> >> > import numpy as np
> >> > import tables as tb
> >> >
> >> > class Iris(tb.IsDescription):
> >> >     subject_id = tb.IntCol()
> >> >     iris_id = tb.IntCol()
> >> >     database = tb.StringCol(5)
> >> >     is_left = tb.BoolCol()
> >> >     is_flipped = tb.BoolCol()
> >> >     templates = tb.BoolCol(shape=(17, 20*480))
> >> >     masks1 = tb.BoolCol(shape=(17, 20*480))
> >> >     phasors = tb.ComplexCol(itemsize=8, shape=(17, 20*240))
> >> >     masks2 = tb.BoolCol(shape=(17, 20*240))
> >> >
> >> >
> >> > def create_hdf5():
> >> >     """Create test.h5 with 4620 rows of placeholder iris data."""
> >> >     with tb.openFile('test.h5', 'w') as f:
> >> >
> >> >         # Create and fill the table of irises
> >> >         irises = f.createTable(f.root, 'irises', Iris, 'Irises',
> >> >                                filters=tb.Filters(1))
> >> >         for ii in range(4620):
> >> >
> >> >             r = irises.row
> >> >             r['subject_id'] = ii
> >> >             r['iris_id'] = 0
> >> >             r['database'] = 'test'
> >> >             r['is_left'] = True
> >> >             r['is_flipped'] = False
> >> >             r['templates'] = np.empty((17, 20*480), np.bool8)
> >> >             r['masks1'] = np.empty((17, 20*480), np.bool8)
> >> >             r['phasors'] = np.empty((17, 20*240)) + 1j*np.empty((17, 20*240))
> >> >             r['masks2'] = np.empty((17, 20*240), np.bool8)
> >> >             r.append()
> >> >
> >> >         irises.flush()
> >> >
> >> > def get_hd():
> >> >     """Iterate over all pairs of rows and fill the matrix D."""
> >> >     from itertools import combinations, izip
> >> >     with tb.openFile('test.h5') as f:
> >> >         irises = f.root.irises
> >> >
> >> >         templates = f.root.irises.cols.templates
> >> >         masks = f.root.irises.cols.masks1
> >> >
> >> >         N_irises = len(irises)
> >> >
> >> >         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >> >         D = np.empty((N_irises, N_irises))
> >> >         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks,
> >> >                                                             range(N_irises)), 2):
> >> >             D[ii, jj] = ii*jj
> >> >
> >> >     np.save('test', D)
> >> >
> >> > On Mon, Feb 4, 2013 at 11:16 AM, <pyt...@li...> wrote:
> >> >> Message: 1
> >> >> Date: Mon, 4 Feb 2013 10:16:24 -0600
> >> >> From: Anthony Scopatz <sc...@gm...>
> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 7
> >> >>
> >> >> On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> wrote:
> >> >> > Hi Josh,
> >> >> >
> >> >> > Here is my __iter__ code:
> >> >> >
> >> >> > def __iter__(self):
> >> >> >     table = self.table
> >> >> >     itemsize = self.dtype.itemsize
> >> >> >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> >> >> >     max_row = len(self)
> >> >> >     for start_row in xrange(0, len(self), nrowsinbuf):
> >> >> >         end_row = min([start_row + nrowsinbuf, max_row])
> >> >> >         buf = table.read(start_row, end_row, 1, field=self.pathname)
> >> >> >         for row in buf:
> >> >> >             yield row
> >> >> >
> >> >> > It does look different. I will try swapping in the code from GitHub
> >> >> > and see what happens.
> >> >>
> >> >> Yes, please let us know how that goes! Otherwise send the list both
> >> >> the test data generator script and the script that fails.
> >> >>
> >> >> Be Well
> >> >> Anthony
> >> >>
> >> >> > On Mon, Feb 4, 2013 at 9:59 AM, <pyt...@li...> wrote:
> >> >> >> Message: 1
> >> >> >> Date: Fri, 1 Feb 2013 14:08:47 -0800
> >> >> >> From: Josh Ayers <jos...@gm...>
> >> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
> >> >> >>
> >> >> >> David,
> >> >> >>
> >> >> >> You added a custom version of table.Column.__iter__, correct?
> >> >> >> Could you also include that along with the script to reproduce the
> >> >> >> error?
> >> >> >>
> >> >> >> It seems like the problem may be in the 'nrowsinbuf' calculation -
> >> >> >> see [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're
> >> >> >> using the default 1MB value for IO_BUFFER_SIZE, it should be reading
> >> >> >> in chunks of 6 rows. Instead, it's reading the entire table.
> >> >> >>
> >> >> >> [1]: https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296
> >> >> >>
> >> >> >> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:
> >> >> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
> >> >> >> >> at the error:
> >> >> >> >>
> >> >> >> >> result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> >> >> >>
> >> >> >> >> nrows = 4620 and dtypeField is ('bool', (17, 9600))
> >> >> >> >>
> >> >> >> >> I'm not sure what that means as a dtype, but that's what it is.
> >> >> >> >>
> >> >> >> >> Forgive me if I'm being totally naive, but I thought the whole
> >> >> >> >> point of __iter__ with pytables was to do iteration on the fly,
> >> >> >> >> so there is no preallocation.
> >> >> >> >
> >> >> >> > Nope, you are not being naive at all. That is the point.
> >> >> >> >
> >> >> >> >> If you have any ideas on this I'm all ears.
> >> >> >> >
> >> >> >> > If you could send a minimal script which reproduces this error,
> >> >> >> > that would help a lot.
> >> >> >> >
> >> >> >> > Be Well
> >> >> >> > Anthony
> >> >> >> >
> >> >> >> >> Thanks again.
> >> >> >> >>
> >> >> >> >> Dave
> >> >> >> >>
> >> >> >> >> On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
> >> >> >> >>> Message: 1
> >> >> >> >>> Date: Fri, 1 Feb 2013 14:44:40 -0600
> >> >> >> >>> From: Anthony Scopatz <sc...@gm...>
> >> >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
> >> >> >> >>>
> >> >> >> >>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
> >> >> >> >>> > Hi Anthony,
> >> >> >> >>> >
> >> >> >> >>> > Thanks for the reply.
> >> >> >> >>> >
> >> >> >> >>> > I honestly don't know how to monitor my Python memory usage,
> >> >> >> >>> > but I'm sure that it's caused by running out of memory.
> >> >> >> >>>
> >> >> >> >>> Well, I would just run top or process monitor or something while
> >> >> >> >>> running the python script to see what happens to memory usage as
> >> >> >> >>> the script chugs along...
> >> >> >> >>>
> >> >> >> >>> > I'm just trying to find out how to fix it. My HDF5 table has
> >> >> >> >>> > 4620 rows and the column I'm iterating over is a 17x9600
> >> >> >> >>> > boolean matrix. The __iter__ method is preallocating an array
> >> >> >> >>> > that is this size, which appears to be the root of the error.
> >> >> >> >>> > I was hoping there is a fix somewhere in here to not have to
> >> >> >> >>> > do this preallocation.
> >> >> >> >>>
> >> >> >> >>> So a 17x9600 boolean matrix should only be 0.155 MB in space.
> >> >> >> >>> 4620 of these is ~760 MB. If you have 2 GB of memory and you are
> >> >> >> >>> iterating over 2 of these (templates & masks) it is conceivable
> >> >> >> >>> that you are just running out of memory. Maybe there is a way
> >> >> >> >>> that __iter__ could not preallocate something that is basically
> >> >> >> >>> a temporary. What is the dtype of the templates array?
> >> >> >> >>>
> >> >> >> >>> Be Well
> >> >> >> >>> Anthony
> >> >> >> >>>
> >> >> >> >>> > Thanks again.
> >> >> >>
> >> >> >> Message: 2
> >> >> >> Date: Mon, 4 Feb 2013 09:58:53 -0500
> >> >> >> From: David Reed <dav...@gm...>
> >> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6
> >> >> >> To: pyt...@li...
> >> >> >> Message-ID: > >> >> >> <CAM6XA7= > >> >> >> h50...@ma...> > >> >> >> Content-Type: text/plain; charset="iso-8859-1" > >> >> >> > >> >> >> Hi Anthony, > >> >> >> > >> >> >> Sorry to just get back to you. I can send a script, should I send > a > >> >> script > >> >> >> that creates some fake data as well? > >> >> >> > >> >> >> -Dave > >> >> >> > >> >> >> > >> >> >> On Fri, Feb 1, 2013 at 4:50 PM, < > >> >> >> pyt...@li...> wrote: > >> >> >> > >> >> >> > Send Pytables-users mailing list submissions to > >> >> >> > pyt...@li... > >> >> >> > > >> >> >> > To subscribe or unsubscribe via the World Wide Web, visit > >> >> >> > > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >> > or, via email, send a message with subject or body 'help' to > >> >> >> > pyt...@li... > >> >> >> > > >> >> >> > You can reach the person managing the list at > >> >> >> > pyt...@li... > >> >> >> > > >> >> >> > When replying, please edit your Subject line so it is more > >> specific > >> >> >> > than "Re: Contents of Pytables-users digest..." > >> >> >> > > >> >> >> > > >> >> >> > Today's Topics: > >> >> >> > > >> >> >> > 1. 
Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony > Scopatz) > >> >> >> > > >> >> >> > > >> >> >> > > >> >> > ---------------------------------------------------------------------- > >> >> >> > > >> >> >> > Message: 1 > >> >> >> > Date: Fri, 1 Feb 2013 15:50:11 -0600 > >> >> >> > From: Anthony Scopatz <sc...@gm...> > >> >> >> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, > >> Issue 4 > >> >> >> > To: Discussion list for PyTables > >> >> >> > <pyt...@li...> > >> >> >> > Message-ID: > >> >> >> > < > >> >> >> > > >> CAP...@ma...> > >> >> >> > Content-Type: text/plain; charset="iso-8859-1" > >> >> >> > > >> >> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed < > >> dav...@gm...> > >> >> >> wrote: > >> >> >> > > >> >> >> > > at the error: > >> >> >> > > > >> >> >> > > result = numpy.empty(shape=nrows, dtype=dtypeField) > >> >> >> > > > >> >> >> > > nrows = 4620 and dtypeField is ('bool', (17, 9600)) > >> >> >> > > > >> >> >> > > I'm not sure what that means as a dtype, but thats what it is. > >> >> >> > > > >> >> >> > > Forgive me if I'm being totally naive, but I thought the whole > >> >> point > >> >> >> of > >> >> >> > > __iter__ with pyttables was to do iteration on the fly, so > there > >> >> is no > >> >> >> > > preallocation. > >> >> >> > > > >> >> >> > > >> >> >> > Nope you are not being naive at all. That is the point. > >> >> >> > > >> >> >> > > >> >> >> > > If you have any ideas on this I'm all ears. > >> >> >> > > > >> >> >> > > >> >> >> > If you could send a minimal script which reproduces this error, > >> that > >> >> >> would > >> >> >> > help a lot. > >> >> >> > > >> >> >> > Be Well > >> >> >> > Anthony > >> >> >> > > >> >> >> > > >> >> >> > > > >> >> >> > > > >> >> >> > > Thanks again. 
> >> >> >> > > > >> >> >> > > Dave > >> >> >> > > > >> >> >> > > > >> >> >> > > On Fri, Feb 1, 2013 at 3:45 PM, < > >> >> >> > > pyt...@li...> wrote: > >> >> >> > > > >> >> >> > >> Send Pytables-users mailing list submissions to > >> >> >> > >> pyt...@li... > >> >> >> > >> > >> >> >> > >> To subscribe or unsubscribe via the World Wide Web, visit > >> >> >> > >> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >> > >> or, via email, send a message with subject or body 'help' to > >> >> >> > >> pyt...@li... > >> >> >> > >> > >> >> >> > >> You can reach the person managing the list at > >> >> >> > >> pyt...@li... > >> >> >> > >> > >> >> >> > >> When replying, please edit your Subject line so it is more > >> >> specific > >> >> >> > >> than "Re: Contents of Pytables-users digest..." > >> >> >> > >> > >> >> >> > >> > >> >> >> > >> Today's Topics: > >> >> >> > >> > >> >> >> > >> 1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony > >> Scopatz) > >> >> >> > >> > >> >> >> > >> > >> >> >> > >> > >> >> >> > >> ---------------------------------------------------------------------- > >> >> >> > >> > >> >> >> > >> Message: 1 > >> >> >> > >> Date: Fri, 1 Feb 2013 14:44:40 -0600 > >> >> >> > >> From: Anthony Scopatz <sc...@gm...> > >> >> >> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, > >> >> Issue 2 > >> >> >> > >> To: Discussion list for PyTables > >> >> >> > >> <pyt...@li...> > >> >> >> > >> Message-ID: > >> >> >> > >> < > >> >> >> > >> > >> >> CAP...@ma...> > >> >> >> > >> Content-Type: text/plain; charset="iso-8859-1" > >> >> >> > >> > >> >> >> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed < > >> >> dav...@gm...> > >> >> >> > >> wrote: > >> >> >> > >> > >> >> >> > >> > Hi Anthony, > >> >> >> > >> > > >> >> >> > >> > Thanks for the reply. 
> >> >> >> > >> > > >> >> >> > >> > I honestly don't know how to monitor my Python memory > usage, > >> but > >> >> >> I'm > >> >> >> > >> sure > >> >> >> > >> > that its caused by out of memory. > >> >> >> > >> > > >> >> >> > >> > >> >> >> > >> Well, I would just run top or process monitor or something > >> while > >> >> >> running > >> >> >> > >> the python script to see what happens to memory usage as the > >> >> script > >> >> >> > chugs > >> >> >> > >> along... > >> >> >> > >> > >> >> >> > >> > >> >> >> > >> > I'm just trying to find out how to fix it. My HDF5 table > >> has > >> >> 4620 > >> >> >> > rows > >> >> >> > >> > and the column I'm iterating over is a 17x9600 boolean > >> matrix. > >> >> The > >> >> >> > >> > __iter__ method is preallocating an array that is this size > >> >> which > >> >> >> > >> appears > >> >> >> > >> > to be root of the error. I was hoping there is a fix > >> somewhere > >> >> in > >> >> >> > here > >> >> >> > >> to > >> >> >> > >> > not have to do this preallocation. > >> >> >> > >> > > >> >> >> > >> > >> >> >> > >> So a 17x9600 boolean matrix should only be 0.155 MB in space. > >> >> 4620 > >> >> >> of > >> >> >> > >> these is ~760 MB. If you have 2 GB of memory and you are > >> >> iterating > >> >> >> > over 2 > >> >> >> > >> of these (templates & masks) it is conceivable that you are > >> just > >> >> >> running > >> >> >> > >> out of memory. Maybe there is a way that __iter__ could not > >> >> >> preallocate > >> >> >> > >> something that is basically a temporary. What is the dtype > of > >> the > >> >> >> > >> templates array? > >> >> >> > >> > >> >> >> > >> Be Well > >> >> >> > >> Anthony > >> >> >> > >> > >> >> >> > >> > >> >> >> > >> > > >> >> >> > >> > Thanks again. 
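Anthony's estimate above can be checked with plain arithmetic. The sketch below assumes one byte per boolean element (which is how NumPy stores `bool`) and uses a hypothetical helper name, `column_nbytes`; the row count and cell shape are the ones quoted in the thread.

```python
def column_nbytes(nrows, cell_shape, itemsize=1):
    """Bytes needed to hold `nrows` cells of shape `cell_shape` in memory.

    itemsize=1 is an assumption: one byte per boolean element.
    """
    cell = itemsize
    for dim in cell_shape:
        cell *= dim
    return nrows * cell


one_row = column_nbytes(1, (17, 9600))        # a single 17x9600 boolean cell
one_column = column_nbytes(4620, (17, 9600))  # the whole column held at once

print("one row:     %.3f MiB" % (one_row / 2.0 ** 20))
print("one column:  %.0f MB" % (one_column / 1e6))
print("two columns: %.2f GB" % (2 * one_column / 1e9))
```

This works out to about 0.156 MiB per row and about 754 MB per column, in line with the ~760 MB figure, so holding both the templates and masks columns in full needs roughly 1.5 GB.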
> >> >> >> > >> > > >> >> >> > >> > > >> >> >> > >> > > >> >> >> > >> > > >> >> >> > >> > On Fri, Feb 1, 2013 at 11:12 AM, < > >> >> >> > >> > pyt...@li...> wrote: > >> >> >> > >> > > >> >> >> > >> >> Send Pytables-users mailing list submissions to > >> >> >> > >> >> pyt...@li... > >> >> >> > >> >> > >> >> >> > >> >> To subscribe or unsubscribe via the World Wide Web, visit > >> >> >> > >> >> > >> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >> > >> >> or, via email, send a message with subject or body 'help' > to > >> >> >> > >> >> pyt...@li... > >> >> >> > >> >> > >> >> >> > >> >> You can reach the person managing the list at > >> >> >> > >> >> pyt...@li... > >> >> >> > >> >> > >> >> >> > >> >> When replying, please edit your Subject line so it is more > >> >> >> specific > >> >> >> > >> >> than "Re: Contents of Pytables-users digest..." > >> >> >> > >> >> > >> >> >> > >> >> > >> >> >> > >> >> Today's Topics: > >> >> >> > >> >> > >> >> >> > >> >> 1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony > >> >> Scopatz) > >> >> >> > >> >> > >> >> >> > >> >> > >> >> >> > >> >> > >> >> >> > > >> >> > ---------------------------------------------------------------------- > >> >> >> > >> >> > >> >> >> > >> >> Message: 1 > >> >> >> > >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600 > >> >> >> > >> >> From: Anthony Scopatz <sc...@gm...> > >> >> >> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol > 80, > >> >> >> Issue 9 > >> >> >> > >> >> To: Discussion list for PyTables > >> >> >> > >> >> <pyt...@li...> > >> >> >> > >> >> Message-ID: > >> >> >> > >> >> < > >> >> >> > >> >> > >> >> >> > CAP...@ma...> > >> >> >> > >> >> Content-Type: text/plain; charset="iso-8859-1" > >> >> >> > >> >> > >> >> >> > >> >> Hi David, > >> >> >> > >> >> > >> >> >> > >> >> Sorry, I haven't had a ton of time recently. You seem to > be > >> >> >> getting > >> >> >> > a > >> >> >> > >> >> memory error on creating a numpy array. 
This kind of > thing > >> >> >> typically > >> >> >> > >> >> happens when you are out of memory. Does this seem to be > >> the > >> >> case > >> >> >> > with > >> >> >> > >> >> you? When this dies, is your memory usage at 100%? If > so, > >> >> this > >> >> >> > >> algorithm > >> >> >> > >> >> might require a little tweaking... > >> >> >> > >> >> > >> >> >> > >> >> Be Well > >> >> >> > >> >> Anthony > >> >> >> > >> >> > >> >> >> > >> >> > >> >> >> > >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed < > >> >> >> dav...@gm...> > >> >> >> > >> >> wrote: > >> >> >> > >> >> > >> >> >> > >> >> > I'm still having problems with this one. I can't tell > if > >> >> this > >> >> >> > >> something > >> >> >> > >> >> > dumb Im doing with itertools, or if its something in > >> >> pytables. > >> >> >> > >> >> > > >> >> >> > >> >> > Would appreciate any help. > >> >> >> > >> >> > > >> >> >> > >> >> > Thanks > >> >> >> > >> >> > > >> >> >> > >> >> > > >> >> >> > >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed < > >> >> >> > dav...@gm... > >> >> >> > >> >> >wrote: > >> >> >> > >> >> > > >> >> >> > >> >> >> I think I have to reopen this issue. I have been > running > >> >> fine > >> >> >> for > >> >> >> > >> >> awhile > >> >> >> > >> >> >> using the combinations method from itertools, but have > >> >> recently > >> >> >> > run > >> >> >> > >> >> into a > >> >> >> > >> >> >> memory since I have recently quadrupled the size of the > >> hdf > >> >> >> file. 
> Here is my code again:
>
>     from itertools import combinations, izip
>     with tb.openFile(h5_all, 'r') as f:
>         irises = f.root.irises
>
>         templates = f.root.irises.cols.templates
>         masks = f.root.irises.cols.masks1
>
>         N_irises = len(irises)
>         index = np.ones((20 * 480), np.bool)
>
>         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>         D = np.empty((N_irises, N_irises))
>         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>             # print ii
>             D[ii, jj] = ham_dist(
>                 t1[8, index],
>                 t2[:, index],
>                 m1[8, index],
>                 m2[:, index],
>             )
>
> And here is the error:
>
> In [10]: get_hd3()
> 10669890 Comparisons
> ---------------------------------------------------------------------------
> MemoryError                               Traceback (most recent call last)
> <ipython-input-10-cfb255ce7bd1> in <module>()
> ----> 1 get_hd3()
>
>     118         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>     119         D = np.empty((N_irises, N_irises))
> --> 120         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>     121             # print ii
>     122             D[ii, jj] = ham_dist(
>
> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
>    3275             end_row = min([start_row + nrowsinbuf, max_row])
> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
>    3277             for row in buf:
>    3278                 yield row
>
> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
>    1773
> -> 1774         arr = self._read(start, stop, step, field)
>    1775         return internal_to_flavor(arr, self.flavor)
>    1776
>
> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
>    1719         if field:
>    1720             # Create a container for the results
> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>    1722         else:
>    1723             # Recarray case
>
> MemoryError:
> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
>    1720             # Create a container for the results
> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>    1722         else:
>
> Also, if you guys see any performance problems in my code, please let me
> know.
>
> Thank you so much for the help.
>
> -Dave
>
>
> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
>
>> Send Pytables-users mailing list submissions to
>>         pyt...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>> or, via email, send a message with subject or body 'help' to
>>         pyt...@li...
>>
>> You can reach the person managing the list at
>>         pyt...@li...
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Pytables-users digest..."
>>
>>
>> Today's Topics:
>>
>>    1.
Re: Pytables-users Digest, Vol 80, Issue 8 > (David > >> >> Reed) > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> > >> >> >> > >> ---------------------------------------------------------------------- > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> Message: 1 > >> >> >> > >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500 > >> >> >> > >> >> >>> From: David Reed <dav...@gm...> > >> >> >> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, > Vol > >> >> 80, > >> >> >> > Issue > >> >> >> > >> 8 > >> >> >> > >> >> >>> To: pyt...@li... > >> >> >> > >> >> >>> Message-ID: > >> >> >> > >> >> >>> < > >> >> >> > >> >> >>> > >> >> >> > > >> CAM...@ma... > >> >> >> > >> > > >> >> >> > >> >> >>> Content-Type: text/plain; charset="iso-8859-1" > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> I can't thank you guys enough for the help. I was > able > >> to > >> >> add > >> >> >> > the > >> >> >> > >> >> >>> __iter__ > >> >> >> > >> >> >>> function to the table.py file and everything seems to > be > >> >> >> working > >> >> >> > >> >> great! > >> >> >> > >> >> >>> I'm not quite as fast as I was with iterating right > of > >> a > >> >> >> matrix > >> >> >> > >> but > >> >> >> > >> >> >>> pretty > >> >> >> > >> >> >>> close. I was at 555 comparisons per second, and now > im > >> at > >> >> >> 420. 
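A caveat on the combinations-based loop used in this thread: `itertools.combinations` copies its whole input into a tuple before it yields the first pair, so handing it a column iterator pulls every row into memory. The sketch below is one possible workaround, not the approach the posters settled on. It generates pairs of row indices lazily and fetches each row through a caller-supplied function. `pairwise_by_index`, `get_row`, `hamming` and the toy data are hypothetical names used for illustration; with PyTables, `get_row` would be whatever per-row read fits the table.

```python
from itertools import combinations


def pairwise_by_index(nrows, get_row, compare):
    """Yield (i, j, compare(row_i, row_j)) for every pair with i < j.

    combinations still pools its input, but here the input is only
    `nrows` small integers, not the row data. Row i is fetched once per
    outer index; row j is fetched once per pair.
    """
    current_i, row_i = None, None
    for i, j in combinations(range(nrows), 2):
        if i != current_i:
            current_i, row_i = i, get_row(i)
        yield i, j, compare(row_i, get_row(j))


def hamming(a, b):
    """Toy comparison: number of positions where two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy stand-in for a table column: three short binary rows.
rows = [[0, 1, 1], [1, 1, 0], [0, 1, 0]]
results = list(pairwise_by_index(len(rows), rows.__getitem__, hamming))
print(results)
```

This trades memory for I/O, since row j is re-read for every pair. Reading rows in blocks and comparing block against block would be a middle ground if per-row reads turn out to be too slow.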
> >> >> >> > >> >> >>> > >> >> >> > >> >> >>> I handled the problem I mentioned earlier by doing > this, > >> >> and > >> >> >> it > >> >> >> > >> seems > >> >> >> > >> >> to > >> >> >> > >> >> >>> work great: > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> A = f.root.data.cols.A > >> >> >> > >> >> >>> B = f.root.data.cols.B > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> D = np.empty((len(A), len(A)) > >> >> >> > >> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, > >> B, > >> >> >> > >> >> range(len(A))), > >> >> >> > >> >> >>> 2): > >> >> >> > >> >> >>> D[ii, jj] = compare(a1, a2, b1, b2) > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> Again, thanks a lot. > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> -Dave > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, < > >> >> >> > >> >> >>> pyt...@li...> wrote: > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > Send Pytables-users mailing list submissions to > >> >> >> > >> >> >>> > pyt...@li... > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > To subscribe or unsubscribe via the World Wide Web, > >> visit > >> >> >> > >> >> >>> > > >> >> >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >> > >> >> >>> > or, via email, send a message with subject or body > >> >> 'help' to > >> >> >> > >> >> >>> > > pyt...@li... > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > You can reach the person managing the list at > >> >> >> > >> >> >>> > pyt...@li... > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > When replying, please edit your Subject line so it > is > >> >> more > >> >> >> > >> specific > >> >> >> > >> >> >>> > than "Re: Contents of Pytables-users digest..." > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > Today's Topics: > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > 1. Re: Pytables-users Digest, Vol 80, Issue 3 > >> (Anthony > >> >> >> > >> Scopatz) > >> >> >> > >> >> >>> > 2. 
Re: Pytables-users Digest, Vol 80, Issue 4 > >> (Anthony > >> >> >> > >> Scopatz) > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> > >> >> >> > > >> >> > ---------------------------------------------------------------------- > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > Message: 1 > >> >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600 > >> >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...> > >> >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, > >> Vol > >> >> 80, > >> >> >> > >> Issue 3 > >> >> >> > >> >> >>> > To: Discussion list for PyTables > >> >> >> > >> >> >>> > <pyt...@li...> > >> >> >> > >> >> >>> > Message-ID: > >> >> >> > >> >> >>> > > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ= > >> >> >> > >> >> >>> > Gz...@ma...> > >> >> >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed < > >> >> >> > >> dav...@gm...> > >> >> >> > >> >> >>> wrote: > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > Thanks a lot for the help so far guys! > >> >> >> > >> >> >>> > > > >> >> >> > >> >> >>> > > Looking at itertools, I found what I believe to be > >> the > >> >> >> > perfect > >> >> >> > >> >> >>> function > >> >> >> > >> >> >>> > > for what I need, itertools.combinations. This > >> appears > >> >> to > >> >> >> be a > >> >> >> > >> >> valid > >> >> >> > >> >> >>> > > replacement to the method proposed. > >> >> >> > >> >> >>> > > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > Yes, combinations is awesome! > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > > >> >> >> > >> >> >>> > > There is a small problem that I didn't mention is > >> that > >> >> my > >> >> >> > >> compare > >> >> >> > >> >> >>> > function > >> >> >> > >> >> >>> > > actually takes as inputs 2 columns from the table. 
> >> Like > >> >> >> so: > >> >> >> > >> >> >>> > > > >> >> >> > >> >> >>> > > D = np.empty((N_irises, N_irises)) > >> >> >> > >> >> >>> > > for ii in xrange(N_elements): > >> >> >> > >> >> >>> > > for jj in xrange(ii+1, N_elements): > >> >> >> > >> >> >>> > > D[ii, jj] = compare(data['element1'][ii], > >> >> >> > >> >> >>> > data['element1'][jj],data['element2'][ii], > >> >> >> > >> >> >>> > > data['element2'][jj]) > >> >> >> > >> >> >>> > > > >> >> >> > >> >> >>> > > Is there an efficient way of using itertools with > >> this > >> >> >> > >> structure? > >> >> >> > >> >> >>> > > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > You can always make two other iterators for each > >> column. > >> >> >> Since > >> >> >> > >> you > >> >> >> > >> >> >>> have > >> >> >> > >> >> >>> > two columns you would have 4 iterators. I am not > sure > >> >> how > >> >> >> fast > >> >> >> > >> >> this is > >> >> >> > >> >> >>> > going to be but I am confident that there is > >> definitely a > >> >> >> way > >> >> >> > to > >> >> >> > >> do > >> >> >> > >> >> >>> this in > >> >> >> > >> >> >>> > one for-loop, which is going to be way faster than > >> nested > >> >> >> > loops. > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > Be Well > >> >> >> > >> >> >>> > Anthony > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > > >> >> >> > >> >> >>> > > > >> >> >> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, < > >> >> >> > >> >> >>> > > pyt...@li...> > >> wrote: > >> >> >> > >> >> >>> > > > >> >> >> > >> >> >>> > >> Send Pytables-users mailing list submissions to > >> >> >> > >> >> >>> > >> pyt...@li... 
> >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> To subscribe or unsubscribe via the World Wide > Web, > >> >> visit > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> > https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >> > >> >> >>> > >> or, via email, send a message with subject or > body > >> >> >> 'help' to > >> >> >> > >> >> >>> > >> > >> pyt...@li... > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> You can reach the person managing the list at > >> >> >> > >> >> >>> > >> > pyt...@li... > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> When replying, please edit your Subject line so > it > >> is > >> >> >> more > >> >> >> > >> >> specific > >> >> >> > >> >> >>> > >> than "Re: Contents of Pytables-users digest..." > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> Today's Topics: > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> 1. Re: Nested Iteration of HDF5 using PyTables > >> >> (Josh > >> >> >> > Ayers) > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> >> >> > >> > >> >> >> > >> ---------------------------------------------------------------------- > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> Message: 1 > >> >> >> > >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800 > >> >> >> > >> >> >>> > >> From: Josh Ayers <jos...@gm...> > >> >> >> > >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of > >> HDF5 > >> >> >> using > >> >> >> > >> >> >>> PyTables > >> >> >> > >> >> >>> > >> To: Discussion list for PyTables > >> >> >> > >> >> >>> > >> <pyt...@li...> > >> >> >> > >> >> >>> > >> Message-ID: > >> >> >> > >> >> >>> > >> < > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> > >> >> >> > CAC...@ma...> > >> >> >> > >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1" > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> David, > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> The change in 
issue 27 was only for iteration > over > >> a > >> >> >> > >> >> tables.Column > >> >> >> > >> >> >>> > >> instance. To use it, tweak Anthony's code as > >> follows. > >> >> >> This > >> >> >> > >> will > >> >> >> > >> >> >>> > iterate > >> >> >> > >> >> >>> > >> over the "element" column, as in your original > >> >> example. > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> Note also that this will only work with the > >> >> development > >> >> >> > >> version > >> >> >> > >> >> of > >> >> >> > >> >> >>> > >> PyTables > >> >> >> > >> >> >>> > >> available on github. It will be very slow using > >> the > >> >> >> > released > >> >> >> > >> >> >>> v2.4.0. > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> from itertools import izip > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> with tb.openFile(...) as f: > >> >> >> > >> >> >>> > >> data = f.root.data.cols.element > >> >> >> > >> >> >>> > >> data_i = iter(data) > >> >> >> > >> >> >>> > >> data_j = iter(data) > >> >> >> > >> >> >>> > >> data_i.next() # throw the first value away > >> >> >> > >> >> >>> > >> for i, j in izip(data_i, data_j): > >> >> >> > >> >> >>> > >> compare(i, j) > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> Hope that helps, > >> >> >> > >> >> >>> > >> Josh > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz < > >> >> >> > >> >> sc...@gm...> > >> >> >> > >> >> >>> > >> wrote: > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > HI David, > >> >> >> > >> >> >>> > >> > > >> >> >> > >> >> >>> > >> > Tables and table column iteration have been > >> >> overhauled > >> >> >> > >> fairly > >> >> >> > >> >> >>> recently > >> >> >> > >> >> >>> > >> > [1]. 
So you might try creating two iterators, > >> >> offset > >> >> >> by > >> >> >> > >> one, > >> >> >> > >> >> and > >> >> >> > >> >> >>> then > >> >> >> > >> >> >>> > >> > doing the comparison. I am hacking this out > >> super > >> >> >> quick > >> >> >> > so > >> >> >> > >> >> please > >> >> >> > >> >> >>> > >> forgive > >> >> >> > >> >> >>> > >> > me: > >> >> >> > >> >> >>> > >> > > >> >> >> > >> >> >>> > >> > from itertools import izip > >> >> >> > >> >> >>> > >> > > >> >> >> > >> >> >>> > >> > with tb.openFile(...) as f: > >> >> >> > >> >> >>> > >> > data = f.root.data > >> >> >> > >> >> >>> > >> > data_i = iter(data) > >> >> >> > >> >> >>> > >> > data_j = iter(data) > >> >> >> > >> >> >>> > >> > data_i.next() # throw the first value away > >> >> >> > >> >> >>> > >> > for i, j in izip(data_i, data_j): > >> >> >> > >> >> >>> > >> > compare(i, j) > >> >> >> > >> >> >>> > >> > > >> >> >> > >> >> >>> > >> > You get the idea ;) > >> >> >> > >> >> >>> > >> > > >> >> >> > >> >> >>> > >> > Be Well > >> >> >> > >> >> >>> > >> > Anthony > >> >> >> > >> >> >>> > >> > > >> >> >> > >> >> >>> > >> > 1. > >> https://github.com/PyTables/PyTables/issues/27 > >> >> >> > >> >> >>> > >> > > >> >> >> > >> >> >>> > >> > > >> >> >> > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < > >> >> >> > >> >> >>> dav...@gm...> > >> >> >> > >> >> >>> > >> wrote: > >> >> >> > >> >> >>> > >> > > >> >> >> > >> >> >>> > >> >> I was hoping someone could help me out here. > >> >> >> > >> >> >>> > >> >> > >> >> >> > >> >> >>> > >> >> This is from a post I put up on StackOverflow, > >> >> >> > >> >> >>> > >> >> > >> >> >> > >> >> >>> > >> >> I am have a fairly large dataset that I store > in > >> >> HDF5 > >> >> >> and > >> >> >> > >> >> access > >> >> >> > >> >> >>> > using > >> >> >> > >> >> >>> > >> >> PyTables. One operation I need to do on this > >> >> dataset > >> >> >> are > >> >> >> > >> >> pairwise > >> >> >> > >> >> >>> > >> >> comparisons between each of the elements. 
This > >> >> >> requires 2 > >> >> >> > >> >> loops, > >> >> >> > >> >> >>> one > >> >> >> > >> >> >>> > to > >> >> >> > >> >> >>> > >> >> iterate over each element, and an inner loop > to > >> >> >> iterate > >> >> >> > >> over > >> >> >> > >> >> >>> every > >> >> >> > >> >> >>> > >> other > >> >> >> > >> >> >>> > >> >> element. This operation thus looks at N(N-1)/2 > >> >> >> > comparisons. > >> >> >> > >> >> >>> > >> >> > >> >> >> > >> >> >>> > >> >> For fairly small sets I found it to be faster > to > >> >> dump > >> >> >> the > >> >> >> > >> >> >>> contents > >> >> >> > >> >> >>> > >> into a > >> >> >> > >> >> >>> > >> >> multdimensional numpy array and then do my > >> >> iteration. > >> >> >> I > >> >> >> > run > >> >> >> > >> >> into > >> >> >> > >> >> >>> > >> problems > >> >> >> > >> >> >>> > >> >> with large sets because of memory issues and > >> need > >> >> to > >> >> >> > access > >> >> >> > >> >> each > >> >> >> > >> >> >>> > >> element of > >> >> >> > >> >> >>> > >> >> the dataset at run time. > >> >> >> > >> >> >>> > >> >> > >> >> >> > >> >> >>> > >> >> Putting the elements into an array gives me > >> about > >> >> 600 > >> >> >> > >> >> >>> comparisons per > >> >> >> > >> >> >>> > >> >> second, while operating on hdf5 data itself > >> gives > >> >> me > >> >> >> > about > >> >> >> > >> 300 > >> >> >> > >> >> >>> > >> comparisons > >> >> >> > >> >> >>> > >> >> per second. > >> >> >> > >> >> >>> > >> >> > >> >> >> > >> >> >>> > >> >> Is there a way to speed this process up? 
> Example follows (this is not my real code, just an example):
>
> *Small Set*:
>
>     with tb.openFile(h5_file, 'r') as f:
>         data = f.root.data
>
>         N_elements = len(data)
>         elements = np.empty((N_elements, int(1e5)))
>
>         # copy each row's 'element' field into the in-memory array
>         for ii, d in enumerate(data):
>             elements[ii] = d['element']
>
>     D = np.empty((N_elements, N_elements))
>     for ii in xrange(N_elements):
>         for jj in xrange(ii+1, N_elements):
>             D[ii, jj] = compare(elements[ii], elements[jj])
>
> *Large Set*:
>
>     with tb.openFile(h5_file, 'r') as f:
>         data = f.root.data
>
>         N_elements = len(data)
>
>         D = np.empty((N_elements, N_elements))
>         for ii in xrange(N_elements):
>             for jj in xrange(ii+1, N_elements):
>                 D[ii, jj] = compare(data['element'][ii], data['element'][jj])
>
> ------------------------------
>
> Message: 2
> Date: Thu, 3 Jan 2013 17:30:59 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
> To: Discussion list for PyTables <pyt...@li...>
> Message-ID: <CAP...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Josh is right that you can just edit the code by hand (which works but
> sucks).
>
> However, on Windows -- on the rare occasion when I also have to develop on
> it -- I typically use a distribution that includes a compiler, cython,
> hdf5, and pytables already and then I install my development version from
> github OVER this. I recommend either EPD or Anaconda, though other
> distributions listed here [1] might also work.
>
> Be well
> Anthony
>
> 1. http://numfocus.org/projects-2/software-distributions/
>
> On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
>
> > The change was in pure Python code, so you should be able to just paste
> > in the changes to your local copy. Start with the table.Column.__iter__
> > method (lines 3296-3310) here.
> >
> > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
> >
> > It needs to be modified slightly because it uses some additional features
> > that aren't available in the released version (the out=buf_slice argument
> > to table.read). The following should work.
> >
> > def __iter__(self):
> >     table = self.table
> >     itemsize = self.dtype.itemsize
> >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> >     max_row = len(self)
> >     for start_row in xrange(0, len(self), nrow... [truncated message content] |
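The buffered iteration idea in the quoted `__iter__` (read a block of rows sized to an I/O buffer, yield them one at a time, then read the next block) can be sketched without PyTables. This is only an illustration under stated assumptions: `rows_per_buffer`, `iter_rows_buffered` and `read_block` are hypothetical names, not PyTables API, and the 1 MiB buffer and 17 x 9600 one-byte rows are example values. It is written for Python 3, unlike the Python 2 code in the thread.

```python
def rows_per_buffer(buffer_bytes, row_bytes):
    """Number of whole rows that fit in one I/O buffer (never less than one)."""
    return max(1, buffer_bytes // row_bytes)


def iter_rows_buffered(read_block, n_rows, rows_in_buf):
    """Yield rows one at a time while fetching them block by block.

    read_block(start, stop) is a hypothetical callable standing in for a
    ranged read; only one block is held in memory at any moment.
    """
    for start in range(0, n_rows, rows_in_buf):
        stop = min(start + rows_in_buf, n_rows)
        for row in read_block(start, stop):
            yield row


# Example values (assumptions): 1 MiB buffer, rows of 17 x 9600 one-byte items.
ROW_BYTES = 17 * 9600
BUFFER_BYTES = 1024 * 1024
ROWS_IN_BUF = rows_per_buffer(BUFFER_BYTES, ROW_BYTES)

# A plain list stands in for the table column.
fake_column = list(range(20))
block_sizes = []


def read_block(start, stop):
    block_sizes.append(stop - start)  # record how many rows each read fetched
    return fake_column[start:stop]


rows = list(iter_rows_buffered(read_block, len(fake_column), ROWS_IN_BUF))
print(ROWS_IN_BUF, block_sizes, rows == fake_column)
```

If the per-row size used in the division is too small (for example the size of one element rather than one whole row), the computed block becomes far larger than intended, which is one way a "buffered" read can end up pulling in the whole column.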
From: Anthony S. <sc...@gm...> - 2013-02-27 20:23:44
|
I think that the checksum is on the compressed data...

On Wed, Feb 27, 2013 at 2:16 PM, Frédéric Bastien <no...@no...> wrote:
> Hi,
>
> we just had a problem with our file server, which raises the question
> of how to detect corrupted files.
>
> There is a way to specify a filter when creating a table that adds a
> checksum [1].
>
> My question is: when a file is created with a checksum, is it always
> verified when the chunks are decompressed? Can we specify, when we open
> the file, whether we want to check it or not? The examples I found only
> talk about it when we create the file.
>
> thanks
>
> Frédéric Bastien
>
> [1] http://pytables.github.com/usersguide/libref/helper_classes.html |
From: Frédéric B. <no...@no...> - 2013-02-27 20:18:35
|
Hi,

we just had a problem with our file server, which raises the question of how to detect corrupted files.

There is a way to specify a filter when creating a table that adds a checksum [1].

My question is: when a file is created with a checksum, is it always verified when the chunks are decompressed? Can we specify, when we open the file, whether we want to check it or not? The examples I found only talk about it when we create the file.

thanks

Frédéric Bastien

[1] http://pytables.github.com/usersguide/libref/helper_classes.html |
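For readers landing on this question: the filter referred to is, as far as I can tell, the `fletcher32` flag of `tables.Filters`. My understanding is that HDF5 checks such checksums by default whenever a chunk is read, and that the switch for this lives at the HDF5 level (`H5Pset_edc_check`); whether PyTables exposes it on open is something I have not confirmed. The sketch below is NOT the HDF5 implementation. It is a textbook Fletcher-32 over little-endian 16-bit words, with hypothetical helpers `store_chunk` and `load_chunk`, and it only shows the principle of a per-chunk checksum that is computed on write and verified on read.

```python
def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 over little-endian 16-bit words (illustrative only)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    sum1 = 0
    sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


def store_chunk(payload: bytes):
    """Hypothetical writer: keep the checksum next to the (compressed) bytes."""
    return payload, fletcher32(payload)


def load_chunk(payload: bytes, checksum: int) -> bytes:
    """Hypothetical reader: refuse to return bytes whose checksum does not match."""
    if fletcher32(payload) != checksum:
        raise ValueError("chunk checksum mismatch: data is corrupted")
    return payload


chunk, check = store_chunk(b"abcde" * 100)
assert load_chunk(chunk, check) == chunk  # an intact chunk passes

damaged = bytearray(chunk)
damaged[10] ^= 0xFF  # simulate a corrupted byte on disk
try:
    load_chunk(bytes(damaged), check)
except ValueError as err:
    print(err)
```

Because the checksum in this sketch is taken over the bytes as stored, a mismatch is detected before any decompression would be attempted, which matches the intuition in the reply that the checksum covers the compressed data.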
From: Anthony S. <sc...@gm...> - 2013-02-27 20:06:49
|
Hi David,

Sorry about the delay. I have mostly forgotten what exactly this issue was. I am pretty swamped this week, so I could throw out some WAGs, but I don't think I'll be able to do any real work on it myself.

Be Well
Anthony

On Mon, Feb 25, 2013 at 2:15 PM, David Reed <dav...@gm...> wrote:
> Anthony,
>
> I've had a chance recently to revisit this problem and am not getting
> anywhere. I was hoping I might be able to get more support in getting this
> working. If you have some ideas, throw them out and I can do the legwork
> and see what I can come up with.
>
> -David
>
> On Mon, Feb 4, 2013 at 3:44 PM, <pyt...@li...> wrote:
>
> Send Pytables-users mailing list submissions to
>     pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>     pyt...@li...
>
> You can reach the person managing the list at
>     pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 81, Issue 9 (Anthony Scopatz)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 4 Feb 2013 14:43:37 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 9
> To: Discussion list for PyTables <pyt...@li...>
> Message-ID: <CAP...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hey David,
>
> I am getting the following error now:
>
> scopatz@ares ~ $ python t.py
> 10669890 Comparisons
> Traceback (most recent call last):
>   File "t.py", line 61, in <module>
>     get_hd()
>   File "t.py", line 54, in get_hd
>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 3308, in __iter__
>     out=buf_slice)
>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1807, in read
>     arr = self._read(start, stop, step, field, out)
>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1732, in _read
>     bytes_required))
> ValueError: output array size invalid, got 4620 bytes, need 753984000 bytes
>
> And I had to change the phasors line to the following:
>
> r['phasors'] = np.empty((17, 20*240), complex)
>
> Thanks.
> Be Well
> Anthony
>
> On Mon, Feb 4, 2013 at 1:41 PM, David Reed <dav...@gm...> wrote:
>
> I didn't have any luck. I replaced that __iter__ function, which led to me
> replacing the read function, which led to me replacing the _read function,
> and I eventually got another error.
>
> Below are 2 functions and my HDF5 Table class declaration. They should be
> self-explanatory. I wasn't sure if attachments would go through and this
> is pretty small, so I figured it would be ok just to post. I apologize if
> this is a bit cluttered. I would also appreciate any comments on how I
> assign the results to the matrix D; this does not seem very pythonic at
> all and I could use some advice there if it's easy. (The ii*jj is just a
> placeholder for a more sophisticated measure.) Thanks again!
>
> import numpy as np
> import tables as tb
>
> class Iris(tb.IsDescription):
>     subject_id = tb.IntCol()
>     iris_id = tb.IntCol()
>     database = tb.StringCol(5)
>     is_left = tb.BoolCol()
>     is_flipped = tb.BoolCol()
>     templates = tb.BoolCol(shape=(17, 20*480))
>     masks1 = tb.BoolCol(shape=(17, 20*480))
>     phasors = tb.ComplexCol(itemsize=8, shape=(17, 20*240))
>     masks2 = tb.BoolCol(shape=(17, 20*240))
>
>
> def create_hdf5():
>     """
>     """
>     with tb.openFile('test.h5', 'w') as f:
>
>         # Create and fill the table of irises
>         irises = f.createTable(f.root, 'irises', Iris, 'Irises',
>                                filters=tb.Filters(1))
>         for ii in range(4620):
>
>             r = irises.row
>             r['subject_id'] = ii
>             r['iris_id'] = 0
>             r['database'] = 'test'
>             r['is_left'] = True
>             r['is_flipped'] = False
>             r['templates'] = np.empty((17, 20*480), np.bool8)
>             r['masks1'] = np.empty((17, 20*480), np.bool8)
>             r['phasors'] = np.empty((17, 20*240)) + 1j*np.empty((17, 20*240))
>             r['masks2'] = np.empty((17, 20*240), np.bool8)
>             r.append()
>
>         irises.flush()
>
> def get_hd():
>     """
>     """
>     from itertools import combinations, izip
>     with tb.openFile('test.h5') as f:
>         irises = f.root.irises
>
>         templates = f.root.irises.cols.templates
>         masks = f.root.irises.cols.masks1
>
>         N_irises = len(irises)
>
>         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>         D = np.empty((N_irises, N_irises))
>         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks,
>                                                             range(N_irises)), 2):
>             D[ii, jj] = ii*jj
>
>     np.save('test', D)
>
> On Mon, Feb 4, 2013 at 11:16 AM, <pyt...@li...> wrote:
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 81, Issue 7 (Anthony Scopatz)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 4 Feb 2013 10:16:24 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 7
>
> On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> wrote:
>
> > Hi Josh,
> >
> > Here is my __iter__ code:
> >
> > def __iter__(self):
> >     table = self.table
> >     itemsize = self.dtype.itemsize
> >     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> >     max_row = len(self)
> >     for start_row in xrange(0, len(self), nrowsinbuf):
> >         end_row = min([start_row + nrowsinbuf, max_row])
> >         buf = table.read(start_row, end_row, 1, field=self.pathname)
> >         for row in buf:
> >             yield row
> >
> > It does look different, I will try swapping in the code from github and
> > see what happens.
>
> Yes, please let us know how that goes! Otherwise send the list both the
> test data generator script and the script that fails.
>
> Be Well
> Anthony
>
> On Mon, Feb 4, 2013 at 9:59 AM, <pyt...@li...> wrote:
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers)
>    2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 1 Feb 2013 14:08:47 -0800
> From: Josh Ayers <jos...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
>
> David,
>
> You added a custom version of table.Column.__iter__, correct? Could you
> also include that along with the script to reproduce the error?
>
> It seems like the problem may be in the 'nrowsinbuf' calculation - see
> [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're using the
> default 1MB value for IO_BUFFER_SIZE, it should be reading in chunks of 6
> rows. Instead, it's reading the entire table.
>
> [1]: https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296
>
> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:
>
> On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
>
> > at the error:
> >
> > result = numpy.empty(shape=nrows, dtype=dtypeField)
> >
> > nrows = 4620 and dtypeField is ('bool', (17, 9600))
> >
> > I'm not sure what that means as a dtype, but that's what it is.
> >
> > Forgive me if I'm being totally naive, but I thought the whole point of
> > __iter__ with PyTables was to do iteration on the fly, so there is no
> > preallocation.
>
> Nope, you are not being naive at all. That is the point.
>
> > If you have any ideas on this I'm all ears.
>
> If you could send a minimal script which reproduces this error, that would
> help a lot.
>
> Be Well
> Anthony
>
> > Thanks again.
> >
> > Dave
>
> On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 1 Feb 2013 14:44:40 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>
> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
>
> > Hi Anthony,
> >
> > Thanks for the reply.
> >
> > I honestly don't know how to monitor my Python memory usage, but I'm
> > sure that it's caused by running out of memory.
>
> Well, I would just run top or process monitor or something while running
> the python script to see what happens to memory usage as the script chugs
> along...
>
> > I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
> > and the column I'm iterating over is a 17x9600 boolean matrix. The
> > __iter__ method is preallocating an array that is this size, which
> > appears to be the root of the error. I was hoping there is a fix
> > somewhere in here to not have to do this preallocation.
>
> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
> these is ~760 MB. If you have 2 GB of memory and you are iterating over 2
> of these (templates & masks) it is conceivable that you are just running
> out of memory. Maybe there is a way that __iter__ could not preallocate
> something that is basically a temporary. What is the dtype of the
> templates array?
>
> Be Well
> Anthony
>
> > Thanks again.
>
> ------------------------------
>
> Message: 2
> Date: Mon, 4 Feb 2013 09:58:53 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6
>
> Hi Anthony,
>
> Sorry to just get back to you. I can send a script; should I send a script
> that creates some fake data as well?
>
> -Dave
>
> On Fri, Feb 1, 2013 at 11:12 AM, <pyt...@li...> wrote:
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony Scopatz)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 1 Feb 2013 10:11:47 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9
>
> Hi David,
>
> Sorry, I haven't had a ton of time recently. You seem to be getting a
> memory error on creating a numpy array. This kind of thing typically
> happens when you are out of memory. Does this seem to be the case with
> you? When this dies, is your memory usage at 100%? If so, this algorithm
> might require a little tweaking...
>
> Be Well
> Anthony
>
> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:
>
> I'm still having problems with this one. I can't tell if this is something
> dumb I'm doing with itertools, or if it's something in pytables.
>
> Would appreciate any help.
>
> Thanks
>
> On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:
>
> I think I have to reopen this issue. I have been running fine for a while
> using the combinations method from itertools, but have recently run into a
> memory error since I have recently quadrupled the size of the hdf file.
>
> Here is my code again:
>
> from itertools import combinations, izip
> with tb.openFile(h5_all, 'r') as f:
>     irises = f.root.irises
>
>     templates = f.root.irises.cols.templates
>     masks = f.root.irises.cols.masks1
>
>     N_irises = len(irises)
>     index = np.ones((20 * 480), np.bool)
>
>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>     D = np.empty((N_irises, N_irises))
>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks,
>                                                         range(N_irises)), 2):
>         # print ii
>         D[ii, jj] = ham_dist(
>             t1[8, index],
>             t2[:, index],
>             m1[8, index],
>             m2[:, index],
>         )
>
> And here is the error:
>
> In [10]: get_hd3()
> 10669890 Comparisons
> ---------------------------------------------------------------------------
> MemoryError                               Traceback (most recent call last)
> <ipython-input-10-cfb255ce7bd1> in <module>()
> ----> 1 get_hd3()
>
>     118     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>     119     D = np.empty((N_irises, N_irises))
> --> 120     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>     121         # print ii
>     122         D[ii, jj] = ham_dist(
>
> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
>    3275             end_row = min([start_row + nrowsinbuf, max_row])
> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
>    3277             for row in buf:
>    3278                 yield row
>
> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
>    1773
> -> 1774         arr = self._read(start, stop, step, field)
>    1775         return internal_to_flavor(arr, self.flavor)
>    1776
>
> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
>    1719         if field:
>    1720             # Create a container for the results
> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>    1722         else:
>    1723             # Recarray case
>
> MemoryError:
> c:\python27\lib\site-packages\tables\table.py(1721)_read()
>    1720             # Create a container for the results
> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
>    1722         else:
>
> Also, if you guys see any performance problems in my code, please let me
> know.
>
> Thank you so much for the help.
>
> -Dave
>
> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 4 Jan 2013 08:56:28 -0500
> From: David Reed <dav...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
>
> I can't thank you guys enough for the help. I was able to add the __iter__
> function to the table.py file and everything seems to be working great!
> I'm not quite as fast as I was with iterating right off a matrix but
> pretty close. I was at 555 comparisons per second, and now I'm at 420.
>
> I handled the problem I mentioned earlier by doing this, and it seems to
> work great:
>
> A = f.root.data.cols.A
> B = f.root.data.cols.B
>
> D = np.empty((len(A), len(A)))
> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
>     D[ii, jj] = compare(a1, a2, b1, b2)
>
> Again, thanks a lot.
>> >> >> > >> >> >>> >> >> >> > >> >> >>> -Dave >> >> >> > >> >> >>> >> >> >> > >> >> >>> >> >> >> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, < >> >> >> > >> >> >>> pyt...@li...> wrote: >> >> >> > >> >> >>> >> >> >> > >> >> >>> > Send Pytables-users mailing list submissions to >> >> >> > >> >> >>> > pyt...@li... >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > To subscribe or unsubscribe via the World Wide Web, >> visit >> >> >> > >> >> >>> > >> >> >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> > >> >> >>> > or, via email, send a message with subject or body >> >> 'help' to >> >> >> > >> >> >>> > pyt...@li... >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > You can reach the person managing the list at >> >> >> > >> >> >>> > pyt...@li... >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > When replying, please edit your Subject line so it is >> >> more >> >> >> > >> specific >> >> >> > >> >> >>> > than "Re: Contents of Pytables-users digest..." >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > Today's Topics: >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > 1. Re: Pytables-users Digest, Vol 80, Issue 3 >> (Anthony >> >> >> > >> Scopatz) >> >> >> > >> >> >>> > 2. 
Re: Pytables-users Digest, Vol 80, Issue 4 >> (Anthony >> >> >> > >> Scopatz) >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >> >> >> > >> >> ---------------------------------------------------------------------- >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > Message: 1 >> >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600 >> >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...> >> >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, >> Vol >> >> 80, >> >> >> > >> Issue 3 >> >> >> > >> >> >>> > To: Discussion list for PyTables >> >> >> > >> >> >>> > <pyt...@li...> >> >> >> > >> >> >>> > Message-ID: >> >> >> > >> >> >>> > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ= >> >> >> > >> >> >>> > Gz...@ma...> >> >> >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed < >> >> >> > >> dav...@gm...> >> >> >> > >> >> >>> wrote: >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > > Thanks a lot for the help so far guys! >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > Looking at itertools, I found what I believe to be >> the >> >> >> > perfect >> >> >> > >> >> >>> function >> >> >> > >> >> >>> > > for what I need, itertools.combinations. This >> appears >> >> to >> >> >> be a >> >> >> > >> >> valid >> >> >> > >> >> >>> > > replacement to the method proposed. >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > Yes, combinations is awesome! >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > There is a small problem that I didn't mention is >> that >> >> my >> >> >> > >> compare >> >> >> > >> >> >>> > function >> >> >> > >> >> >>> > > actually takes as inputs 2 columns from the table. 
>> Like >> >> >> so: >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > D = np.empty((N_irises, N_irises)) >> >> >> > >> >> >>> > > for ii in xrange(N_elements): >> >> >> > >> >> >>> > > for jj in xrange(ii+1, N_elements): >> >> >> > >> >> >>> > > D[ii, jj] = compare(data['element1'][ii], >> >> >> > >> >> >>> > data['element1'][jj],data['element2'][ii], >> >> >> > >> >> >>> > > data['element2'][jj]) >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > Is there an efficient way of using itertools with >> this >> >> >> > >> structure? >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > You can always make two other iterators for each >> column. >> >> >> Since >> >> >> > >> you >> >> >> > >> >> >>> have >> >> >> > >> >> >>> > two columns you would have 4 iterators. I am not sure >> >> how >> >> >> fast >> >> >> > >> >> this is >> >> >> > >> >> >>> > going to be but I am confident that there is >> definitely a >> >> >> way >> >> >> > to >> >> >> > >> do >> >> >> > >> >> >>> this in >> >> >> > >> >> >>> > one for-loop, which is going to be way faster than >> nested >> >> >> > loops. >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > Be Well >> >> >> > >> >> >>> > Anthony >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, < >> >> >> > >> >> >>> > > pyt...@li...> >> wrote: >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > >> Send Pytables-users mailing list submissions to >> >> >> > >> >> >>> > >> pyt...@li... >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> To subscribe or unsubscribe via the World Wide Web, >> >> visit >> >> >> > >> >> >>> > >> >> >> >> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> > >> >> >>> > >> or, via email, send a message with subject or body >> >> >> 'help' to >> >> >> > >> >> >>> > >> >> pyt...@li... 
>> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> You can reach the person managing the list at >> >> >> > >> >> >>> > >> pyt...@li... >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> When replying, please edit your Subject line so it >> is >> >> >> more >> >> >> > >> >> specific >> >> >> > >> >> >>> > >> than "Re: Contents of Pytables-users digest..." >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> Today's Topics: >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> 1. Re: Nested Iteration of HDF5 using PyTables >> >> (Josh >> >> >> > Ayers) >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> >> >> >> > >> >> >> >> >> ---------------------------------------------------------------------- >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> Message: 1 >> >> >> > >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800 >> >> >> > >> >> >>> > >> From: Josh Ayers <jos...@gm...> >> >> >> > >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of >> HDF5 >> >> >> using >> >> >> > >> >> >>> PyTables >> >> >> > >> >> >>> > >> To: Discussion list for PyTables >> >> >> > >> >> >>> > >> <pyt...@li...> >> >> >> > >> >> >>> > >> Message-ID: >> >> >> > >> >> >>> > >> < >> >> >> > >> >> >>> > >> >> >> >> > >> >> >> >> >> CAC...@ma...> >> >> >> > >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1" >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> David, >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> The change in issue 27 was only for iteration over >> a >> >> >> > >> >> tables.Column >> >> >> > >> >> >>> > >> instance. To use it, tweak Anthony's code as >> follows. >> >> >> This >> >> >> > >> will >> >> >> > >> >> >>> > iterate >> >> >> > >> >> >>> > >> over the "element" column, as in your original >> >> example. 
>> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> Note also that this will only work with the >> >> development >> >> >> > >> version >> >> >> > >> >> of >> >> >> > >> >> >>> > >> PyTables >> >> >> > >> >> >>> > >> available on github. It will be very slow using >> the >> >> >> > released >> >> >> > >> >> >>> v2.4.0. >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> from itertools import izip >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> with tb.openFile(...) as f: >> >> >> > >> >> >>> > >> data = f.root.data.cols.element >> >> >> > >> >> >>> > >> data_i = iter(data) >> >> >> > >> >> >>> > >> data_j = iter(data) >> >> >> > >> >> >>> > >> data_i.next() # throw the first value away >> >> >> > >> >> >>> > >> for i, j in izip(data_i, data_j): >> >> >> > >> >> >>> > >> compare(i, j) >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> Hope that helps, >> >> >> > >> >> >>> > >> Josh >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz < >> >> >> > >> >> sc...@gm...> >> >> >> > >> >> >>> > >> wrote: >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> > HI David, >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > Tables and table column iteration have been >> >> overhauled >> >> >> > >> fairly >> >> >> > >> >> >>> recently >> >> >> > >> >> >>> > >> > [1]. So you might try creating two iterators, >> >> offset >> >> >> by >> >> >> > >> one, >> >> >> > >> >> and >> >> >> > >> >> >>> then >> >> >> > >> >> >>> > >> > doing the comparison. I am hacking this out >> super >> >> >> quick >> >> >> > so >> >> >> > >> >> please >> >> >> > >> >> >>> > >> forgive >> >> >> > >> >> >>> > >> > me: >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > from itertools import izip >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > with tb.openFile(...) 
as f: >> >> >> > >> >> >>> > >> > data = f.root.data >> >> >> > >> >> >>> > >> > data_i = iter(data) >> >> >> > >> >> >>> > >> > data_j = iter(data) >> >> >> > >> >> >>> > >> > data_i.next() # throw the first value away >> >> >> > >> >> >>> > >> > for i, j in izip(data_i, data_j): >> >> >> > >> >> >>> > >> > compare(i, j) >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > You get the idea ;) >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > Be Well >> >> >> > >> >> >>> > >> > Anthony >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > 1. >> https://github.com/PyTables/PyTables/issues/27 >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < >> >> >> > >> >> >>> dav...@gm...> >> >> >> > >> >> >>> > >> wrote: >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> >> I was hoping someone could help me out here. >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> This is from a post I put up on StackOverflow, >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> I am have a fairly large dataset that I store in >> >> HDF5 >> >> >> and >> >> >> > >> >> access >> >> >> > >> >> >>> > using >> >> >> > >> >> >>> > >> >> PyTables. One operation I need to do on this >> >> dataset >> >> >> are >> >> >> > >> >> pairwise >> >> >> > >> >> >>> > >> >> comparisons between each of the elements. This >> >> >> requires 2 >> >> >> > >> >> loops, >> >> >> > >> >> >>> one >> >> >> > >> >> >>> > to >> >> >> > >> >> >>> > >> >> iterate over each element, and an inner loop to >> >> >> iterate >> >> >> > >> over >> >> >> > >> >> >>> every >> >> >> > >> >> >>> > >> other >> >> >> > >> >> >>> > >> >> element. This operation thus looks at N(N-1)/2 >> >> >> > comparisons. 
>> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> For fairly small sets I found it to be faster to >> >> dump >> >> >> the >> >> >> > >> >> >>> contents >> >> >> > >> >> >>> > >> into a >> >> >> > >> >> >>> > >> >> multdimensional numpy array and then do my >> >> iteration. >> >> >> I >> >> >> > run >> >> >> > >> >> into >> >> >> > >> >> >>> > >> problems >> >> >> > >> >> >>> > >> >> with large sets because of memory issues and >> need >> >> to >> >> >> > access >> >> >> > >> >> each >> >> >> > >> >> >>> > >> element of >> >> >> > >> >> >>> > >> >> the dataset at run time. >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> Putting the elements into an array gives me >> about >> >> 600 >> >> >> > >> >> >>> comparisons per >> >> >> > >> >> >>> > >> >> second, while operating on hdf5 data itself >> gives >> >> me >> >> >> > about >> >> >> > >> 300 >> >> >> > >> >> >>> > >> comparisons >> >> >> > >> >> >>> > >> >> per second. >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> Is there a way to speed this process up? 
>> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> Example follows (this is not my real code, just >> an >> >> >> > >> example): >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> *Small Set*: >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f: >> >> >> > >> >> >>> > >> >> data = f.root.data >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> N_elements = len(data) >> >> >> > >> >> >>> > >> >> elements = np.empty((N_irises, 1e5)) >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> for ii, d in enumerate(data): >> >> >> > >> >> >>> > >> >> elements[ii] = data['element'] >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> D = np.empty((N_irises, N_irises)) for ii in >> >> >> > >> >> xrange(N_elements): >> >> >> > >> >> >>> > >> >> for jj in xrange(ii+1, N_elements): >> >> >> > >> >> >>> > >> >> D[ii, jj] = compare(elements[ii], >> >> >> elements[jj]) >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> *Large Set*: >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f: >> >> >> > >> >> >>> > >> >> data = f.root.data >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> N_elements = len(data) >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> D = np.empty((N_irises, N_irises)) >> >> >> > >> >> >>> > >> >> for ii in xrange(N_elements): >> >> >> > >> >> >>> > >> >> for jj in xrange(ii+1, N_elements): >> >> >> > >> >> >>> > >> >> D[ii, jj] = >> >> compare(data['element'][ii], >> >> >> > >> >> >>> > >> data['element'][jj]) >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> > >> >> >>> >> >> >> > >> >> >> >> >> > >> >> >> >> > >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> >> > >> >> >>> > >> >> Master Visual 
Studio, SharePoint, SQL, ASP.NET, >> C# >> >> >> 2012, >> >> >> > >> >> HTML5, >> >> >> > >> >> >>> CSS, >> >> >> > >> >> >>> > >> >> MVC, Windows 8 Apps, JavaScript and much more. >> Keep >> >> >> your >> >> >> > >> >> skills >> >> >> > >> >> >>> > current >> >> >> > >> >> >>> > >> >> with LearnDevNow - 3,200 step-by-step video >> >> tutorials >> >> >> by >> >> >> > >> >> >>> Microsoft >> >> >> > >> >> >>> > >> >> MVPs and experts. ON SALE this month only -- >> learn >> >> >> more >> >> >> > at: >> >> >> > >> >> >>> > >> >> http://p.sf.net/sfu/learnmore_122712 >> >> >> > >> >> >>> > >> >> _______________________________________________ >> >> >> > >> >> >>> > >> >> Pytables-users mailing list >> >> >> > >> >> >>> > >> >> Pyt...@li... >> >> >> > >> >> >>> > >> >> >> >> >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> >> >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> > >> >> >>> >> >> >> > >> >> >> >> >> > >> >> >> >> > >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> >> > >> >> >>> > >> > Master Visual Studio, SharePoint, SQL, ASP.NET, >> C# >> >> >> 2012, >> >> >> > >> >> HTML5, >> >> >> > >> >> >>> CSS, >> >> >> > >> >> >>> > >> > MVC, Windows 8 Apps, JavaScript and much more. >> Keep >> >> >> your >> >> >> > >> skills >> >> >> > >> >> >>> > current >> >> >> > >> >> >>> > >> > with LearnDevNow - 3,200 step-by-step video >> >> tutorials >> >> >> by >> >> >> > >> >> Microsoft >> >> >> > >> >> >>> > >> > MVPs and experts. ON SALE this month only -- >> learn >> >> more >> >> >> > at: >> >> >> > >> >> >>> > >> > http://p.sf.net/sfu/learnmore_122712 >> >> >> > >> >> >>> > >> > _______________________________________________ >> >> >> > >> >> >>> > >> > Pytables-users mailing list >> >> >> > >> >> >>> > >> > Pyt...@li... 
>> >> >> > >> >> >>> > >> > >> >> >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> > >> >> >> > >> >> >>> > >> -------------- next part -------------- >> >> >> > >> >> >>> > >> An HTML attachment was scrubbed... >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> ------------------------------ >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> > >> >> >>> >> >> >> > >> >> >> >> >> > >> >> >> >> > >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> >> > >> >> >>> > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# >> >> 2012, >> >> >> > >> HTML5, >> >> >> > >> >> >>> CSS, >> >> >> > >> >> >>> > >> MVC, Windows 8 Apps, JavaScript and much more. Keep >> >> your >> >> >> > >> skills >> >> >> > >> >> >>> current >> >> >> > >> >> >>> > >> with LearnDevNow - 3,200 step-by-step video >> tutorials >> >> by >> >> >> > >> >> Microsoft >> >> >> > >> >> >>> > >> MVPs and experts. ON SALE this month only -- learn >> >> more >> >> >> at: >> >> >> > >> >> >>> > >> http://p.sf.net/sfu/learnmore_122712 >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> ------------------------------ >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> _______________________________________________ >> >> >> > >> >> >>> > >> Pytables-users mailing list >> >> >> > >> >> >>> > >> Pyt...@li... 
>> >> >> > >> >> >>> > >> >> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> End of Pytables-users Digest, Vol 80, Issue 3 >> >> >> > >> >> >>> > >> ********************************************* >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> >> >> >> > >> >> >> >> >> > >> >> >> >> > >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> >> > >> >> >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# >> >> 2012, >> >> >> > >> HTML5, >> >> >> > >> >> CSS, >> >> >> > >> >> >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep >> >> your >> >> >> > skills >> >> >> > >> >> >>> current >> >> >> > >> >> >>> > > with LearnDevNow - 3,200 step-by-step video >> tutorials >> >> by >> >> >> > >> Microsoft >> >> >> > >> >> >>> > > MVPs and experts. ON SALE this month only -- learn >> more >> >> >> at: >> >> >> > >> >> >>> > > http://p.sf.net/sfu/learnmore_122712 >> >> >> > >> >> >>> > > _______________________________________________ >> >> >> > >> >> >>> > > Pytables-users mailing list >> >> >> > >> >> >>> > > Pyt...@li... >> >> >> > >> >> >>> > > >> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > -------------- next part -------------- >> >> >> > >> >> >>> > An HTML attachment was scrubbed... 
>> >> >> > >> >> >>> > >> >> >> > >> >> >>> > ------------------------------ >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > Message: 2 >> >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600 >> >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...> >> >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, >> Vol >> >> 80, >> >> >> > >> Issue 4 >> >> >> > >> >> >>> > To: Discussion list for PyTables >> >> >> > >> >> >>> > <pyt...@li...> >> >> >> > >> >> >>> > Message-ID: >> >> >> > >> >> >>> > < >> >> >> > >> >> >>> > >> >> >> > >> >> >> CAP...@ma...> >> >> >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > Josh is right that you can just edit the code by hand >> >> (which >> >> >> > >> works >> >> >> > >> >> but >> >> >> > >> >> >>> > sucks). >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > However, on Windows -- on the rare occasion when I >> also >> >> >> have to >> >> >> > >> >> >>> develop on >> >> >> > >> >> >>> > it -- I typically use a distribution that includes a >> >> >> compiler, >> >> >> > >> >> cython, >> >> >> > >> >> >>> > hdf5, and pytables already and then I install my >> >> development >> >> >> > >> version >> >> >> > >> >> >>> from >> >> >> > >> >> >>> > github OVER this. I recommend either EPD or Anaconda, >> >> >> though >> >> >> > >> other >> >> >> > >> >> >>> > distributions listed here [1] might also work. >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > Be well >> >> >> > >> >> >>> > Anthony >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > 1. >> >> http://numfocus.org/projects-2/software-distributions/ >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers < >> >> >> > jos...@gm... 
>> >> >> > >> > >> >> >> > >> >> >>> wrote: >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > > The change was in pure Python code, so you should be >> >> able >> >> >> to >> >> >> > >> just >> >> >> > >> >> >>> paste >> >> >> > >> >> >>> > in >> >> >> > >> >> >>> > > the changes to your local copy. Start with the >> >> >> > >> >> table.Column.__iter__ >> >> >> > >> >> >>> > > method (lines 3296-3310) here. >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> >> >> >> > >> >> >> >> >> > >> >> >> >> > >> >> >> >> >> >> https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > It needs to be modified slightly because it uses >> some >> >> >> > >> additional >> >> >> > >> >> >>> features >> >> >> > >> >> >>> > > that aren't available in the released version (the >> >> >> > >> out=buf_slice >> >> >> > >> >> >>> argument >> >> >> > >> >> >>> > > to table.read). The following should work. >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > def __iter__(self): >> >> >> > >> >> >>> > > table = self.table >> >> >> > >> >> >>> > > itemsize = self.dtype.itemsize >> >> >> > >> >> >>> > > nrowsinbuf = >> >> >> table._v_file.params['IO_BUFFER_SIZE'] >> >> >> > // >> >> >> > >> >> >>> itemsize >> >> >> > >> >> >>> > > max_row = len(self) >> >> >> > >> >> >>> > > for start_row in xrange(0, len(self), >> >> nrowsinbuf): >> >> >> > >> >> >>> > > end_row = min([start_row + nrowsinbuf, >> >> >> max_row]) >> >> >> > >> >> >>> > > buf = table.read(start_row, end_row, 1, >> >> >> > >> >> >>> field=self.pathname) >> >> >> > >> >> >>> > > for row in buf: >> >> >> > >> >> >>> > > yield row >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > I haven't tested this, but I think it will work. 
>> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > Josh >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed < >> >> >> > >> >> dav...@gm...> >> >> >> > >> >> >>> > wrote: >> >> >> > >> >> >>> > > >> >> >> > >> >> >>> > >> I apologize if I'm starting to sound helpless, but >> I'm >> >> >> > forced >> >> >> > >> to >> >> >> > >> >> >>> work on >> >> >> > >> >> >>> > >> Windows 7 at work and have never had luck compiling >> >> >> python >> >> >> > >> source >> >> >> > >> >> >>> > >> successfully. I have had to rely on precompiled >> >> binaries >> >> >> > and >> >> >> > >> now >> >> >> > >> >> >>> its >> >> >> > >> >> >>> > >> biting me in the butt. >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> Is there any quick fix I can do to improve this >> >> iteration >> >> >> > >> using >> >> >> > >> >> >>> v2.4.0? >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, < >> >> >> > >> >> >>> > >> pyt...@li...> >> wrote: >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >>> Send Pytables-users mailing list submissions to >> >> >> > >> >> >>> > >>> pyt...@li... >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> To subscribe or unsubscribe via the World Wide >> Web, >> >> >> visit >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> > >> >> >>> > >>> or, via email, send a message with subject or body >> >> >> 'help' >> >> >> > to >> >> >> > >> >> >>> > >>> >> pyt...@li... >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> You can reach the person managing the list at >> >> >> > >> >> >>> > >>> >> pyt...@li... >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> When replying, please edit your Subject line so >> it is >> >> >> more >> >> >> > >> >> specific >> >> >> > >> >> >>> > >>> than "Re: Contents of Pytables-users digest..." 
>> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> Today's Topics: >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 >> >> (David >> >> >> > Reed) >> >> >> > >> >> >>> > >>> 2. Re: Pytables-users Digest, Vol 80, Issue 3 >> >> (David >> >> >> > Reed) >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> >> >> >> > >> >> >> >> >> ---------------------------------------------------------------------- >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> Message: 1 >> >> >> > >> >> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 >> >> >> > >> >> >>> > >>> From: David Reed <dav...@gm...> >> >> >> > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users >> Digest, >> >> Vol >> >> >> > 80, >> >> >> > >> >> Issue >> >> >> > >> >> >>> 2 >> >> >> > >> >> >>> > >>> To: pyt...@li... >> >> >> > >> >> >>> > >>> Message-ID: >> >> >> > >> >> >>> > >>> >> >> <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha= >> >> >> > >> >> >>> > >>> ev...@ma...> >> >> >> > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1" >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> Thanks Anthony, but unless Im missing something I >> >> don't >> >> >> > think >> >> >> > >> >> that >> >> >> > >> >> >>> > method >> >> >> > >> >> >>> > >>> will work since this will only be comparing the >> ith >> >> >> element >> >> >> > >> with >> >> >> > >> >> >>> ith+1 >> >> >> > >> >> >>> > >>> element. I still need 2 for loops right? >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> Using itertools might speed things up though, I've >> >> never >> >> >> > used >> >> >> > >> >> them >> >> >> > >> >> >>> so I >> >> >> > >> >> >>> > >>> will give it a shot and let you know how it goes. >> >> Looks >> >> >> > >> like I >> >> >> > >> >> >>> need to >> >> >> > >> >> >>> > >>> download the latest release before I do that too. 
>> >> >> Thanks >> >> >> > for >> >> >> > >> >> the >> >> >> > >> >> >>> help. >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> -Dave >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, < >> >> >> > >> >> >>> > >>> pyt...@li...> >> wrote: >> >> >> > >> >> >>> > >>> >> >> >> > >> >> >>> > >>> > Send Pytables-users mailing list submissions to >> >> >> > >> >> >>> > >>> > pyt...@li... >> >> >> > >> >> >>> > >>> > >> >> >> > >> >> >>> > >>> > To subscribe or unsubscribe via the World Wide >> Web, >> >> >> visit >> >> >> > >> >> >>> > >>> > >> >> >> > >> >> >>> >> >> https://lists.sourceforge.net/lists/listin... [truncated message content] |
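For readers following the digest quoted above, here is a minimal sketch of the single-loop pairwise pattern discussed there. It is not the posters' actual code. It assumes two in-memory NumPy arrays stand in for the two table columns, uses Python 3's built-in `zip` in place of `izip`, and uses a hypothetical `compare` (a masked Hamming-style fraction) as a placeholder for the real measure. The helper name `pairwise` is also made up for illustration.

```python
from itertools import combinations

import numpy as np


def compare(a1, a2, b1, b2):
    # Hypothetical stand-in for the thread's compare(): fraction of
    # positions where a1 and a2 differ, counting only positions that
    # both masks b1 and b2 mark as valid.
    valid = b1 & b2
    n_valid = valid.sum()
    if n_valid == 0:
        return np.nan
    return ((a1 ^ a2) & valid).sum() / n_valid


def pairwise(A, B):
    # A and B are assumed to be 2-D boolean arrays with one row per item.
    n = len(A)
    D = np.zeros((n, n))
    # combinations() yields each unordered pair once, with ii < jj,
    # so only the upper triangle of D is filled.
    for (a1, b1, ii), (a2, b2, jj) in combinations(zip(A, B, range(n)), 2):
        D[ii, jj] = compare(a1, a2, b1, b2)
    return D
```

One thing to keep in mind: `itertools.combinations` copies its input into a tuple before yielding any pairs, so every row produced by the zipped iterators is held in memory at once. For a table that does not fit in memory, this pattern by itself will not avoid loading everything.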
From: David R. <dav...@gm...> - 2013-02-25 20:16:35
|
Anthony,

I've had a chance recently to revisit this problem and am not getting anywhere. I was hoping I might be able to get more support in getting this working. If you have some ideas, throw them out and I can do the leg work and see what I can come up with.

-David

On Mon, Feb 4, 2013 at 3:44 PM, <pyt...@li...> wrote:

> Send Pytables-users mailing list submissions to
>         pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>         pyt...@li...
>
> You can reach the person managing the list at
>         pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 81, Issue 9 (Anthony Scopatz)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 4 Feb 2013 14:43:37 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 9
> To: Discussion list for PyTables <pyt...@li...>
> Message-ID: <CAP...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hey David,
>
> I am getting the following error now:
>
> scopatz@ares ~ $ python t.py
> 10669890 Comparisons
> Traceback (most recent call last):
>   File "t.py", line 61, in <module>
>     get_hd()
>   File "t.py", line 54, in get_hd
>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 3308, in __iter__
>     out=buf_slice)
>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1807, in read
>     arr = self._read(start, stop, step, field, out)
>   File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1732, in _read
>     bytes_required))
> ValueError: output array size invalid, got 4620 bytes, need 753984000 bytes
>
> And I had to change the phasors line to the following:
>
> r['phasors'] = np.empty((17, 20*240), complex)
>
> Thanks.
> Be Well
> Anthony
>
>
> On Mon, Feb 4, 2013 at 1:41 PM, David Reed <dav...@gm...> wrote:
>
> > I didn't have any luck. I replaced that __iter__ function, which led to me
> > replacing the read function, which led to me replacing the _read function,
> > and I eventually got another error.
> >
> > Below are 2 functions and my HDF5 Table class declaration. They should be
> > self-explanatory. I wasn't sure if attachments would go through and this
> > is pretty small, so I figured it would be OK just to post. I apologize if
> > this is a bit cluttered. I would also appreciate any comments on how I
> > assign the results to the matrix D; this does not seem very pythonic at all
> > and I could use some advice there if it's easy. (The ii*jj is just a
> > placeholder for a more sophisticated measure.) Thanks again!
> >
> > import numpy as np
> > import tables as tb
> >
> > class Iris(tb.IsDescription):
> >     subject_id = tb.IntCol()
> >     iris_id = tb.IntCol()
> >     database = tb.StringCol(5)
> >     is_left = tb.BoolCol()
> >     is_flipped = tb.BoolCol()
> >     templates = tb.BoolCol(shape=(17, 20*480))
> >     masks1 = tb.BoolCol(shape=(17, 20*480))
> >     phasors = tb.ComplexCol(itemsize=8, shape=(17, 20*240))
> >     masks2 = tb.BoolCol(shape=(17, 20*240))
> >
> >
> > def create_hdf5():
> >     """
> >     """
> >     with tb.openFile('test.h5', 'w') as f:
> >
> >         # Create and fill the table of irises
> >         irises = f.createTable(f.root, 'irises', Iris, 'Irises',
> >                                filters=tb.Filters(1))
> >         for ii in range(4620):
> >
> >             r = irises.row
> >             r['subject_id'] = ii
> >             r['iris_id'] = 0
> >             r['database'] = 'test'
> >             r['is_left'] = True
> >             r['is_flipped'] = False
> >             r['templates'] = np.empty((17, 20*480), np.bool8)
> >             r['masks1'] = np.empty((17, 20*480), np.bool8)
> >             r['phasors'] = np.empty((17, 20*240)) + 1j*np.empty((17, 20*240))
> >             r['masks2']
= np.empty((17, 20*240), np.bool8) > > r.append() > > > > irises.flush() > > > > def get_hd(): > > """ > > """ > > from itertools import combinations, izip > > with tb.openFile('test.h5') as f: > > irises = f.root.irises > > > > templates = f.root.irises.cols.templates > > masks = f.root.irises.cols.masks1 > > > > N_irises = len(irises) > > > > print '%i Comparisons' % (N_irises*(N_irises - 1)/2) > > D = np.empty((N_irises, N_irises)) > > for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, > > range(N_irises)), 2): > > D[ii, jj] = ii*jj > > > > np.save('test', D) > > > > > > > > > > On Mon, Feb 4, 2013 at 11:16 AM, < > > pyt...@li...> wrote: > > > >> Send Pytables-users mailing list submissions to > >> pyt...@li... > >> > >> To subscribe or unsubscribe via the World Wide Web, visit > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> or, via email, send a message with subject or body 'help' to > >> pyt...@li... > >> > >> You can reach the person managing the list at > >> pyt...@li... > >> > >> When replying, please edit your Subject line so it is more specific > >> than "Re: Contents of Pytables-users digest..." > >> > >> > >> Today's Topics: > >> > >> 1. 
Re: Pytables-users Digest, Vol 81, Issue 7 (Anthony Scopatz) > >> > >> > >> ---------------------------------------------------------------------- > >> > >> Message: 1 > >> Date: Mon, 4 Feb 2013 10:16:24 -0600 > >> From: Anthony Scopatz <sc...@gm...> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 7 > >> To: Discussion list for PyTables > >> <pyt...@li...> > >> Message-ID: > >> < > >> CAP...@ma...> > >> Content-Type: text/plain; charset="iso-8859-1" > >> > >> On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> > >> wrote: > >> > >> > Hi Josh, > >> > > >> > Here is my __iter__ code: > >> > > >> > def __iter__(self): > >> > table = self.table > >> > itemsize = self.dtype.itemsize > >> > nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // > itemsize > >> > max_row = len(self) > >> > for start_row in xrange(0, len(self), nrowsinbuf): > >> > end_row = min([start_row + nrowsinbuf, max_row]) > >> > buf = table.read(start_row, end_row, 1, > field=self.pathname) > >> > for row in buf: > >> > yield row > >> > > >> > It does look different, I will try swapping in the code from github > and > >> > see what happens. > >> > > >> > >> Yes, please let us know how that goes! Otherwise send the list both the > >> test data generator script and the script that fails. > >> > >> Be Well > >> Anthony > >> > >> > >> > > >> > > >> > On Mon, Feb 4, 2013 at 9:59 AM, < > >> > pyt...@li...> wrote: > >> > > >> >> Send Pytables-users mailing list submissions to > >> >> pyt...@li... > >> >> > >> >> To subscribe or unsubscribe via the World Wide Web, visit > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> or, via email, send a message with subject or body 'help' to > >> >> pyt...@li... > >> >> > >> >> You can reach the person managing the list at > >> >> pyt...@li... > >> >> > >> >> When replying, please edit your Subject line so it is more specific > >> >> than "Re: Contents of Pytables-users digest..." 
> >> >>
> >> >>
> >> >> Today's Topics:
> >> >>
> >> >> 1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers)
> >> >> 2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed)
> >> >>
> >> >>
> >> >> ----------------------------------------------------------------------
> >> >>
> >> >> Message: 1
> >> >> Date: Fri, 1 Feb 2013 14:08:47 -0800
> >> >> From: Josh Ayers <jos...@gm...>
> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
> >> >> To: Discussion list for PyTables
> >> >> <pyt...@li...>
> >> >> Message-ID:
> >> >> <CACOB4aPG4NZ6b2a3v=
> >> >> 1Ue...@ma...>
> >> >> Content-Type: text/plain; charset="iso-8859-1"
> >> >>
> >> >> David,
> >> >>
> >> >> You added a custom version of table.Column.__iter__, correct? Could you also include that along with the script to reproduce the error?
> >> >>
> >> >> It seems like the problem may be in the 'nrowsinbuf' calculation - see [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're using the default 1MB value for IO_BUFFER_SIZE, it should be reading in chunks of 6 rows. Instead, it's reading the entire table.
> >> >>
> >> >> [1]: https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:
> >> >>
> >> >> >
> >> >> >
> >> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
> >> >> >
> >> >> >> at the error:
> >> >> >>
> >> >> >> result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> >> >>
> >> >> >> nrows = 4620 and dtypeField is ('bool', (17, 9600))
> >> >> >>
> >> >> >> I'm not sure what that means as a dtype, but that's what it is.
> >> >> >>
> >> >> >> Forgive me if I'm being totally naive, but I thought the whole point of __iter__ with pytables was to do iteration on the fly, so there is no preallocation. 
> >> >> >> > >> >> > > >> >> > Nope you are not being naive at all. That is the point. > >> >> > > >> >> > > >> >> >> If you have any ideas on this I'm all ears. > >> >> >> > >> >> > > >> >> > If you could send a minimal script which reproduces this error, > that > >> >> would > >> >> > help a lot. > >> >> > > >> >> > Be Well > >> >> > Anthony > >> >> > > >> >> > > >> >> >> > >> >> >> > >> >> >> Thanks again. > >> >> >> > >> >> >> Dave > >> >> >> > >> >> >> > >> >> >> On Fri, Feb 1, 2013 at 3:45 PM, < > >> >> >> pyt...@li...> wrote: > >> >> >> > >> >> >>> Send Pytables-users mailing list submissions to > >> >> >>> pyt...@li... > >> >> >>> > >> >> >>> To subscribe or unsubscribe via the World Wide Web, visit > >> >> >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >>> or, via email, send a message with subject or body 'help' to > >> >> >>> pyt...@li... > >> >> >>> > >> >> >>> You can reach the person managing the list at > >> >> >>> pyt...@li... > >> >> >>> > >> >> >>> When replying, please edit your Subject line so it is more > specific > >> >> >>> than "Re: Contents of Pytables-users digest..." > >> >> >>> > >> >> >>> > >> >> >>> Today's Topics: > >> >> >>> > >> >> >>> 1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony > Scopatz) > >> >> >>> > >> >> >>> > >> >> >>> > >> ---------------------------------------------------------------------- > >> >> >>> > >> >> >>> Message: 1 > >> >> >>> Date: Fri, 1 Feb 2013 14:44:40 -0600 > >> >> >>> From: Anthony Scopatz <sc...@gm...> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, > Issue > >> 2 > >> >> >>> To: Discussion list for PyTables > >> >> >>> <pyt...@li...> > >> >> >>> Message-ID: > >> >> >>> < > >> >> >>> > CAP...@ma... 
> >> > > >> >> >>> Content-Type: text/plain; charset="iso-8859-1" > >> >> >>> > >> >> >>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed < > >> dav...@gm...> > >> >> >>> wrote: > >> >> >>> > >> >> >>> > Hi Anthony, > >> >> >>> > > >> >> >>> > Thanks for the reply. > >> >> >>> > > >> >> >>> > I honestly don't know how to monitor my Python memory usage, > but > >> I'm > >> >> >>> sure > >> >> >>> > that its caused by out of memory. > >> >> >>> > > >> >> >>> > >> >> >>> Well, I would just run top or process monitor or something while > >> >> running > >> >> >>> the python script to see what happens to memory usage as the > script > >> >> chugs > >> >> >>> along... > >> >> >>> > >> >> >>> > >> >> >>> > I'm just trying to find out how to fix it. My HDF5 table has > >> 4620 > >> >> >>> rows > >> >> >>> > and the column I'm iterating over is a 17x9600 boolean matrix. > >> The > >> >> >>> > __iter__ method is preallocating an array that is this size > which > >> >> >>> appears > >> >> >>> > to be root of the error. I was hoping there is a fix somewhere > >> in > >> >> >>> here to > >> >> >>> > not have to do this preallocation. > >> >> >>> > > >> >> >>> > >> >> >>> So a 17x9600 boolean matrix should only be 0.155 MB in space. > >> 4620 of > >> >> >>> these is ~760 MB. If you have 2 GB of memory and you are > iterating > >> >> over > >> >> >>> 2 > >> >> >>> of these (templates & masks) it is conceivable that you are just > >> >> running > >> >> >>> out of memory. Maybe there is a way that __iter__ could not > >> >> preallocate > >> >> >>> something that is basically a temporary. What is the dtype of > the > >> >> >>> templates array? > >> >> >>> > >> >> >>> Be Well > >> >> >>> Anthony > >> >> >>> > >> >> >>> > >> >> >>> > > >> >> >>> > Thanks again. > >> >> >>> > >> >> >>> > >> >> -------------- next part -------------- > >> >> An HTML attachment was scrubbed... 
> >> >> > >> >> ------------------------------ > >> >> > >> >> Message: 2 > >> >> Date: Mon, 4 Feb 2013 09:58:53 -0500 > >> >> From: David Reed <dav...@gm...> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6 > >> >> To: pyt...@li... > >> >> Message-ID: > >> >> <CAM6XA7= > >> >> h50...@ma...> > >> >> Content-Type: text/plain; charset="iso-8859-1" > >> >> > >> >> Hi Anthony, > >> >> > >> >> Sorry to just get back to you. I can send a script, should I send a > >> script > >> >> that creates some fake data as well? > >> >> > >> >> -Dave > >> >> > >> >> > >> >> On Fri, Feb 1, 2013 at 4:50 PM, < > >> >> pyt...@li...> wrote: > >> >> > >> >> > Send Pytables-users mailing list submissions to > >> >> > pyt...@li... > >> >> > > >> >> > To subscribe or unsubscribe via the World Wide Web, visit > >> >> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> > or, via email, send a message with subject or body 'help' to > >> >> > pyt...@li... > >> >> > > >> >> > You can reach the person managing the list at > >> >> > pyt...@li... > >> >> > > >> >> > When replying, please edit your Subject line so it is more specific > >> >> > than "Re: Contents of Pytables-users digest..." > >> >> > > >> >> > > >> >> > Today's Topics: > >> >> > > >> >> > 1. Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony Scopatz) > >> >> > > >> >> > > >> >> > > >> ---------------------------------------------------------------------- > >> >> > > >> >> > Message: 1 > >> >> > Date: Fri, 1 Feb 2013 15:50:11 -0600 > >> >> > From: Anthony Scopatz <sc...@gm...> > >> >> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue > 4 > >> >> > To: Discussion list for PyTables > >> >> > <pyt...@li...> > >> >> > Message-ID: > >> >> > < > >> >> > CAP...@ma... > > > >> >> > Content-Type: text/plain; charset="iso-8859-1" > >> >> > > >> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm... 
> > > >> >> wrote: > >> >> > > >> >> > > at the error: > >> >> > > > >> >> > > result = numpy.empty(shape=nrows, dtype=dtypeField) > >> >> > > > >> >> > > nrows = 4620 and dtypeField is ('bool', (17, 9600)) > >> >> > > > >> >> > > I'm not sure what that means as a dtype, but thats what it is. > >> >> > > > >> >> > > Forgive me if I'm being totally naive, but I thought the whole > >> point > >> >> of > >> >> > > __iter__ with pyttables was to do iteration on the fly, so there > >> is no > >> >> > > preallocation. > >> >> > > > >> >> > > >> >> > Nope you are not being naive at all. That is the point. > >> >> > > >> >> > > >> >> > > If you have any ideas on this I'm all ears. > >> >> > > > >> >> > > >> >> > If you could send a minimal script which reproduces this error, > that > >> >> would > >> >> > help a lot. > >> >> > > >> >> > Be Well > >> >> > Anthony > >> >> > > >> >> > > >> >> > > > >> >> > > > >> >> > > Thanks again. > >> >> > > > >> >> > > Dave > >> >> > > > >> >> > > > >> >> > > On Fri, Feb 1, 2013 at 3:45 PM, < > >> >> > > pyt...@li...> wrote: > >> >> > > > >> >> > >> Send Pytables-users mailing list submissions to > >> >> > >> pyt...@li... > >> >> > >> > >> >> > >> To subscribe or unsubscribe via the World Wide Web, visit > >> >> > >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> > >> or, via email, send a message with subject or body 'help' to > >> >> > >> pyt...@li... > >> >> > >> > >> >> > >> You can reach the person managing the list at > >> >> > >> pyt...@li... > >> >> > >> > >> >> > >> When replying, please edit your Subject line so it is more > >> specific > >> >> > >> than "Re: Contents of Pytables-users digest..." > >> >> > >> > >> >> > >> > >> >> > >> Today's Topics: > >> >> > >> > >> >> > >> 1. 
Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony > Scopatz) > >> >> > >> > >> >> > >> > >> >> > >> > >> >> > ---------------------------------------------------------------------- > >> >> > >> > >> >> > >> Message: 1 > >> >> > >> Date: Fri, 1 Feb 2013 14:44:40 -0600 > >> >> > >> From: Anthony Scopatz <sc...@gm...> > >> >> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, > >> Issue 2 > >> >> > >> To: Discussion list for PyTables > >> >> > >> <pyt...@li...> > >> >> > >> Message-ID: > >> >> > >> < > >> >> > >> > >> CAP...@ma...> > >> >> > >> Content-Type: text/plain; charset="iso-8859-1" > >> >> > >> > >> >> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed < > >> dav...@gm...> > >> >> > >> wrote: > >> >> > >> > >> >> > >> > Hi Anthony, > >> >> > >> > > >> >> > >> > Thanks for the reply. > >> >> > >> > > >> >> > >> > I honestly don't know how to monitor my Python memory usage, > but > >> >> I'm > >> >> > >> sure > >> >> > >> > that its caused by out of memory. > >> >> > >> > > >> >> > >> > >> >> > >> Well, I would just run top or process monitor or something while > >> >> running > >> >> > >> the python script to see what happens to memory usage as the > >> script > >> >> > chugs > >> >> > >> along... > >> >> > >> > >> >> > >> > >> >> > >> > I'm just trying to find out how to fix it. My HDF5 table has > >> 4620 > >> >> > rows > >> >> > >> > and the column I'm iterating over is a 17x9600 boolean matrix. > >> The > >> >> > >> > __iter__ method is preallocating an array that is this size > >> which > >> >> > >> appears > >> >> > >> > to be root of the error. I was hoping there is a fix > somewhere > >> in > >> >> > here > >> >> > >> to > >> >> > >> > not have to do this preallocation. > >> >> > >> > > >> >> > >> > >> >> > >> So a 17x9600 boolean matrix should only be 0.155 MB in space. > >> 4620 > >> >> of > >> >> > >> these is ~760 MB. 
If you have 2 GB of memory and you are > >> iterating > >> >> > over 2 > >> >> > >> of these (templates & masks) it is conceivable that you are just > >> >> running > >> >> > >> out of memory. Maybe there is a way that __iter__ could not > >> >> preallocate > >> >> > >> something that is basically a temporary. What is the dtype of > the > >> >> > >> templates array? > >> >> > >> > >> >> > >> Be Well > >> >> > >> Anthony > >> >> > >> > >> >> > >> > >> >> > >> > > >> >> > >> > Thanks again. > >> >> > >> > > >> >> > >> > > >> >> > >> > > >> >> > >> > > >> >> > >> > On Fri, Feb 1, 2013 at 11:12 AM, < > >> >> > >> > pyt...@li...> wrote: > >> >> > >> > > >> >> > >> >> Send Pytables-users mailing list submissions to > >> >> > >> >> pyt...@li... > >> >> > >> >> > >> >> > >> >> To subscribe or unsubscribe via the World Wide Web, visit > >> >> > >> >> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> > >> >> or, via email, send a message with subject or body 'help' to > >> >> > >> >> pyt...@li... > >> >> > >> >> > >> >> > >> >> You can reach the person managing the list at > >> >> > >> >> pyt...@li... > >> >> > >> >> > >> >> > >> >> When replying, please edit your Subject line so it is more > >> >> specific > >> >> > >> >> than "Re: Contents of Pytables-users digest..." > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> Today's Topics: > >> >> > >> >> > >> >> > >> >> 1. 
Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony > >> Scopatz) > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > > >> ---------------------------------------------------------------------- > >> >> > >> >> > >> >> > >> >> Message: 1 > >> >> > >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600 > >> >> > >> >> From: Anthony Scopatz <sc...@gm...> > >> >> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, > >> >> Issue 9 > >> >> > >> >> To: Discussion list for PyTables > >> >> > >> >> <pyt...@li...> > >> >> > >> >> Message-ID: > >> >> > >> >> < > >> >> > >> >> > >> >> CAP...@ma...> > >> >> > >> >> Content-Type: text/plain; charset="iso-8859-1" > >> >> > >> >> > >> >> > >> >> Hi David, > >> >> > >> >> > >> >> > >> >> Sorry, I haven't had a ton of time recently. You seem to be > >> >> getting > >> >> > a > >> >> > >> >> memory error on creating a numpy array. This kind of thing > >> >> typically > >> >> > >> >> happens when you are out of memory. Does this seem to be the > >> case > >> >> > with > >> >> > >> >> you? When this dies, is your memory usage at 100%? If so, > >> this > >> >> > >> algorithm > >> >> > >> >> might require a little tweaking... > >> >> > >> >> > >> >> > >> >> Be Well > >> >> > >> >> Anthony > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed < > >> >> dav...@gm...> > >> >> > >> >> wrote: > >> >> > >> >> > >> >> > >> >> > I'm still having problems with this one. I can't tell if > >> this > >> >> > >> something > >> >> > >> >> > dumb Im doing with itertools, or if its something in > >> pytables. > >> >> > >> >> > > >> >> > >> >> > Would appreciate any help. > >> >> > >> >> > > >> >> > >> >> > Thanks > >> >> > >> >> > > >> >> > >> >> > > >> >> > >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed < > >> >> > dav...@gm... > >> >> > >> >> >wrote: > >> >> > >> >> > > >> >> > >> >> >> I think I have to reopen this issue. 
I have been running > >> fine > >> >> for > >> >> > >> >> awhile > >> >> > >> >> >> using the combinations method from itertools, but have > >> recently > >> >> > run > >> >> > >> >> into a > >> >> > >> >> >> memory since I have recently quadrupled the size of the > hdf > >> >> file. > >> >> > >> >> >> > >> >> > >> >> >> Here is my code again: > >> >> > >> >> >> > >> >> > >> >> >> from itertools import combinations, izip > >> >> > >> >> >> with tb.openFile(h5_all, 'r') as f: > >> >> > >> >> >> irises = f.root.irises > >> >> > >> >> >> > >> >> > >> >> >> templates = f.root.irises.cols.templates > >> >> > >> >> >> masks = f.root.irises.cols.masks1 > >> >> > >> >> >> > >> >> > >> >> >> N_irises = len(irises) > >> >> > >> >> >> index = np.ones((20 * 480), np.bool) > >> >> > >> >> >> > >> >> > >> >> >> print '%i Comparisons' % (N_irises*(N_irises - 1)/2) > >> >> > >> >> >> D = np.empty((N_irises, N_irises)) > >> >> > >> >> >> for (t1, m1, ii), (t2, m2, jj) in > >> combinations(izip(templates, > >> >> > >> masks, > >> >> > >> >> >> range(N_irises)), 2): > >> >> > >> >> >> # print ii > >> >> > >> >> >> D[ii, jj] = ham_dist( > >> >> > >> >> >> t1[8, index], > >> >> > >> >> >> t2[:, index], > >> >> > >> >> >> m1[8, index], > >> >> > >> >> >> m2[:, index], > >> >> > >> >> >> ) > >> >> > >> >> >> > >> >> > >> >> >> And here is the error: > >> >> > >> >> >> > >> >> > >> >> >> In [10]: get_hd3() > >> >> > >> >> >> 10669890 Comparisons > >> >> > >> >> >> > >> >> > >> >> >> > >> >> > >> >> > >> >> > >> > >> >> > > >> >> > >> > --------------------------------------------------------------------------- > >> >> > >> >> >> MemoryError Traceback (most > >> >> recent > >> >> > >> call > >> >> > >> >> >> last) > >> >> > >> >> >> <ipython-input-10-cfb255ce7bd1> in <module>() > >> >> > >> >> >> ----> 1 get_hd3() > >> >> > >> >> >> > >> >> > >> >> >> > >> >> > >> >> >> 118 print '%i Comparisons' % > >> >> > >> (N_irises*(N_irises - > >> >> > >> >> >> 1)/2) > >> >> > >> >> >> 119 D = 
np.empty((N_irises, N_irises)) > >> >> > >> >> >> --> 120 for (t1, m1, ii), (t2, m2, jj) in > >> >> > >> >> >> combinations(izip(temp > >> >> > >> >> >> lates, masks, range(N_irises)), 2): > >> >> > >> >> >> 121 # print ii > >> >> > >> >> >> 122 D[ii, jj] = ham_dist( > >> >> > >> >> >> > >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in > >> >> __iter__(self) > >> >> > >> >> >> 3274 for start_row in xrange(0, len(self), > >> >> nrowsinbuf): > >> >> > >> >> >> 3275 end_row = min([start_row + nrowsinbuf, > >> >> > max_row]) > >> >> > >> >> >> -> 3276 buf = table.read(start_row, end_row, > 1, > >> >> > >> >> >> field=self.pathname) > >> >> > >> >> >> > >> >> > >> >> >> 3277 for row in buf: > >> >> > >> >> >> 3278 yield row > >> >> > >> >> >> > >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in > read(self, > >> >> > start, > >> >> > >> >> stop, > >> >> > >> >> >> step, > >> >> > >> >> >> field) > >> >> > >> >> >> 1772 (start, stop, step) = > >> >> > self._processRangeRead(start, > >> >> > >> >> stop, > >> >> > >> >> >> step) > >> >> > >> >> >> 1773 > >> >> > >> >> >> -> 1774 arr = self._read(start, stop, step, field) > >> >> > >> >> >> 1775 return internal_to_flavor(arr, > self.flavor) > >> >> > >> >> >> 1776 > >> >> > >> >> >> > >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in > >> _read(self, > >> >> > start, > >> >> > >> >> >> stop, step, > >> >> > >> >> >> field) > >> >> > >> >> >> 1719 if field: > >> >> > >> >> >> 1720 # Create a container for the results > >> >> > >> >> >> -> 1721 result = numpy.empty(shape=nrows, > >> >> > >> dtype=dtypeField) > >> >> > >> >> >> 1722 else: > >> >> > >> >> >> 1723 # Recarray case > >> >> > >> >> >> > >> >> > >> >> >> MemoryError: > >> >> > >> >> >> > > c:\python27\lib\site-packages\tables\table.py(1721)_read() > >> >> > >> >> >> 1720 # Create a container for the results > >> >> > >> >> >> -> 1721 result = numpy.empty(shape=nrows, > >> >> > >> dtype=dtypeField) > >> >> > >> >> >> 
1722 else: > >> >> > >> >> >> > >> >> > >> >> >> Also, if you guys see any performance problems in my code, > >> >> please > >> >> > >> let > >> >> > >> >> me > >> >> > >> >> >> know. > >> >> > >> >> >> > >> >> > >> >> >> Thank you so much for the help. > >> >> > >> >> >> > >> >> > >> >> >> -Dave > >> >> > >> >> >> > >> >> > >> >> >> > >> >> > >> >> >> On Fri, Jan 4, 2013 at 8:57 AM, < > >> >> > >> >> >> pyt...@li...> wrote: > >> >> > >> >> >> > >> >> > >> >> >>> Send Pytables-users mailing list submissions to > >> >> > >> >> >>> pyt...@li... > >> >> > >> >> >>> > >> >> > >> >> >>> To subscribe or unsubscribe via the World Wide Web, visit > >> >> > >> >> >>> > >> >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> > >> >> >>> or, via email, send a message with subject or body 'help' > >> to > >> >> > >> >> >>> pyt...@li... > >> >> > >> >> >>> > >> >> > >> >> >>> You can reach the person managing the list at > >> >> > >> >> >>> pyt...@li... > >> >> > >> >> >>> > >> >> > >> >> >>> When replying, please edit your Subject line so it is > more > >> >> > specific > >> >> > >> >> >>> than "Re: Contents of Pytables-users digest..." > >> >> > >> >> >>> > >> >> > >> >> >>> > >> >> > >> >> >>> Today's Topics: > >> >> > >> >> >>> > >> >> > >> >> >>> 1. Re: Pytables-users Digest, Vol 80, Issue 8 (David > >> Reed) > >> >> > >> >> >>> > >> >> > >> >> >>> > >> >> > >> >> >>> > >> >> > >> > >> >> > ---------------------------------------------------------------------- > >> >> > >> >> >>> > >> >> > >> >> >>> Message: 1 > >> >> > >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500 > >> >> > >> >> >>> From: David Reed <dav...@gm...> > >> >> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol > >> 80, > >> >> > Issue > >> >> > >> 8 > >> >> > >> >> >>> To: pyt...@li... > >> >> > >> >> >>> Message-ID: > >> >> > >> >> >>> < > >> >> > >> >> >>> > >> >> > CAM...@ma... 
> >> >> > >> >> >>> >
> >> >> > >> >> >>> Content-Type: text/plain; charset="iso-8859-1"
> >> >> > >> >> >>>
> >> >> > >> >> >>> I can't thank you guys enough for the help. I was able to add the __iter__ function to the table.py file and everything seems to be working great! I'm not quite as fast as I was when iterating right off a matrix, but pretty close. I was at 555 comparisons per second, and now I'm at 420.
> >> >> > >> >> >>>
> >> >> > >> >> >>> I handled the problem I mentioned earlier by doing this, and it seems to work great:
> >> >> > >> >> >>>
> >> >> > >> >> >>> A = f.root.data.cols.A
> >> >> > >> >> >>> B = f.root.data.cols.B
> >> >> > >> >> >>>
> >> >> > >> >> >>> D = np.empty((len(A), len(A)))
> >> >> > >> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
> >> >> > >> >> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
> >> >> > >> >> >>>
> >> >> > >> >> >>> Again, thanks a lot.
> >> >> > >> >> >>>
> >> >> > >> >> >>> -Dave
> >> >> > >> >> >>>
> >> >> > >> >> >>>
> >> >> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
> >> >> > >> >> >>>
> >> >> > >> >> >>> > Send Pytables-users mailing list submissions to
> >> >> > >> >> >>> > pyt...@li... 
> >> >> > >> >> >>> > > >> >> > >> >> >>> > When replying, please edit your Subject line so it is > >> more > >> >> > >> specific > >> >> > >> >> >>> > than "Re: Contents of Pytables-users digest..." > >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > Today's Topics: > >> >> > >> >> >>> > > >> >> > >> >> >>> > 1. Re: Pytables-users Digest, Vol 80, Issue 3 > (Anthony > >> >> > >> Scopatz) > >> >> > >> >> >>> > 2. Re: Pytables-users Digest, Vol 80, Issue 4 > (Anthony > >> >> > >> Scopatz) > >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> > >> >> > > >> ---------------------------------------------------------------------- > >> >> > >> >> >>> > > >> >> > >> >> >>> > Message: 1 > >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600 > >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...> > >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, > Vol > >> 80, > >> >> > >> Issue 3 > >> >> > >> >> >>> > To: Discussion list for PyTables > >> >> > >> >> >>> > <pyt...@li...> > >> >> > >> >> >>> > Message-ID: > >> >> > >> >> >>> > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ= > >> >> > >> >> >>> > Gz...@ma...> > >> >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" > >> >> > >> >> >>> > > >> >> > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed < > >> >> > >> dav...@gm...> > >> >> > >> >> >>> wrote: > >> >> > >> >> >>> > > >> >> > >> >> >>> > > Thanks a lot for the help so far guys! > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > Looking at itertools, I found what I believe to be > the > >> >> > perfect > >> >> > >> >> >>> function > >> >> > >> >> >>> > > for what I need, itertools.combinations. This appears > >> to > >> >> be a > >> >> > >> >> valid > >> >> > >> >> >>> > > replacement to the method proposed. > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > >> >> > >> >> >>> > Yes, combinations is awesome! 
> > There is a small problem that I didn't mention is that my compare
> > function actually takes as inputs 2 columns from the table. Like so:
> >
> >     D = np.empty((N_irises, N_irises))
> >     for ii in xrange(N_elements):
> >         for jj in xrange(ii+1, N_elements):
> >             D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> >                                 data['element2'][ii], data['element2'][jj])
> >
> > Is there an efficient way of using itertools with this structure?
>
> You can always make two other iterators for each column. Since you have
> two columns you would have 4 iterators. I am not sure how fast this is
> going to be but I am confident that there is definitely a way to do this
> in one for-loop, which is going to be way faster than nested loops.
>
> Be Well
> Anthony
>
> > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> >
> >> Today's Topics:
> >>
> >>    1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)
> >>
> >> Message: 1
> >> Date: Thu, 3 Jan 2013 10:29:33 -0800
> >> From: Josh Ayers <jos...@gm...>
> >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >>
> >> David,
> >>
> >> The change in issue 27 was only for iteration over a tables.Column
> >> instance. To use it, tweak Anthony's code as follows. This will iterate
> >> over the "element" column, as in your original example.
> >>
> >> Note also that this will only work with the development version of
> >> PyTables available on github. It will be very slow using the released
> >> v2.4.0.
> >>
> >>     from itertools import izip
> >>
> >>     with tb.openFile(...) as f:
> >>         data = f.root.data.cols.element
> >>         data_i = iter(data)
> >>         data_j = iter(data)
> >>         data_i.next() # throw the first value away
> >>         for i, j in izip(data_i, data_j):
> >>             compare(i, j)
> >>
> >> Hope that helps,
> >> Josh
> >>
> >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
> >>
> >> > HI David,
> >> >
> >> > Tables and table column iteration have been overhauled fairly recently
> >> > [1]. So you might try creating two iterators, offset by one, and then
> >> > doing the comparison.
> >> > I am hacking this out super quick so please forgive me:
> >> >
> >> >     from itertools import izip
> >> >
> >> >     with tb.openFile(...) as f:
> >> >         data = f.root.data
> >> >         data_i = iter(data)
> >> >         data_j = iter(data)
> >> >         data_i.next() # throw the first value away
> >> >         for i, j in izip(data_i, data_j):
> >> >             compare(i, j)
> >> >
> >> > You get the idea ;)
> >> >
> >> > Be Well
> >> > Anthony
> >> >
> >> > 1. https://github.com/PyTables/PyTables/issues/27
> >> >
> >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >> >
> >> >> I was hoping someone could help me out here.
> >> >>
> >> >> This is from a post I put up on StackOverflow,
> >> >>
> >> >> I am have a fairly large dataset that I store in HDF5 and access using
> >> >> PyTables. One operation I need to do on this dataset are pairwise
> >> >> comparisons between each of the elements. This requires 2 loops, one to
> >> >> iterate over each element, and an inner loop to iterate over every other
> >> >> element. This operation thus looks at N(N-1)/2 comparisons.
> >> >>
> >> >> For fairly small sets I found it to be faster to dump the contents into a
> >> >> multdimensional numpy array and then do my iteration. I run into problems
> >> >> with large sets because of memory issues and need to access each element
> >> >> of the dataset at run time.
> >> >>
> >> >> Putting the elements into an array gives me about 600 comparisons per
> >> >> second, while operating on hdf5 data itself gives me about 300
> >> >> comparisons per second.
> >> >>
> >> >> Is there a way to speed this process up?
> >> >>
> >> >> Example follows (this is not my real code, just an example):
> >> >>
> >> >> *Small Set*:
> >> >>
> >> >>     with tb.openFile(h5_file, 'r') as f:
> >> >>         data = f.root.data
> >> >>
> >> >>         N_elements = len(data)
> >> >>         elements = np.empty((N_irises, 1e5))
> >> >>
> >> >>         for ii, d in enumerate(data):
> >> >>             elements[ii] = data['element']
> >> >>
> >> >>     D = np.empty((N_irises, N_irises))
> >> >>     for ii in xrange(N_elements):
> >> >>         for jj in xrange(ii+1, N_elements):
> >> >>             D[ii, jj] = compare(elements[ii], elements[jj])
> >> >>
> >> >> *Large Set*:
> >> >>
> >> >>     with tb.openFile(h5_file, 'r') as f:
> >> >>         data = f.root.data
> >> >>
> >> >>         N_elements = len(data)
> >> >>
> >> >>         D = np.empty((N_irises, N_irises))
> >> >>         for ii in xrange(N_elements):
> >> >>             for jj in xrange(ii+1, N_elements):
> >> >>                 D[ii, jj] = compare(data['element'][ii], data['element'][jj])
> >> >>
> >> >> _______________________________________________
> >> >> Pytables-users mailing list
> >> >> Pyt...@li...
> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
> Message: 2
> Date: Thu, 3 Jan 2013 17:30:59 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
>
> Josh is right that you can just edit the code by hand (which works but
> sucks).
>
> However, on Windows -- on the rare occasion when I also have to develop on
> it -- I typically use a distribution that includes a compiler, cython,
> hdf5, and pytables already and then I install my development version from
> github OVER this. I recommend either EPD or Anaconda, though other
> distributions listed here [1] might also work.
>
> Be well
> Anthony
>
> 1. http://numfocus.org/projects-2/software-distributions/
>
> On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:
>
> > The change was in pure Python code, so you should be able to just paste
> > in the changes to your local copy.
> > Start with the table.Column.__iter__ method (lines 3296-3310) here.
> >
> > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
> >
> > It needs to be modified slightly because it uses some additional features
> > that aren't available in the released version (the out=buf_slice argument
> > to table.read). The following should work.
> >
> >     def __iter__(self):
> >         table = self.table
> >         itemsize = self.dtype.itemsize
> >         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> >         max_row = len(self)
> >         for start_row in xrange(0, len(self), nrowsinbuf):
> >             end_row = min([start_row + nrowsinbuf, max_row])
> >             buf = table.read(start_row, end_row, 1, field=self.pathname)
> >             for row in buf:
> >                 yield row
> >
> > I haven't tested this, but I think it will work.
> >
> > Josh
> >
> > On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
> >
> >> I apologize if I'm starting to sound helpless, but I'm forced to work on
> >> Windows 7 at work and have never had luck compiling python source
> >> successfully. I have had to rely on precompiled binaries and now its
> >> biting me in the butt.
> >>
> >> Is there any quick fix I can do to improve this iteration using v2.4.0?
> >>
> >> On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:
> >>
> >>> Today's Topics:
> >>>
> >>>    1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
> >>>    2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)
> >>>
> >>> Message: 1
> >>> Date: Thu, 3 Jan 2013 13:44:29 -0500
> >>> From: David Reed <dav...@gm...>
> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
> >>>
> >>> Thanks Anthony, but unless Im missing something I don't think that
> >>> method will work since this will only be comparing the ith element with
> >>> ith+1 element. I still need 2 for loops right?
> >>>
> >>> Using itertools might speed things up though, I've never used them so I
> >>> will give it a shot and let you know how it goes. Looks like I need to
> >>> download the latest release before I do that too. Thanks for the help.
> >>>
> >>> -Dave
> >>>
> >>> On Thu, Jan 3, 2013 at 12:12 PM, <pyt...@li...> wrote:
> >>>
> >>> > Today's Topics:
> >>> >
> >>> >    1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)
> >>> >
> >>> > Message: 1
> >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600
> >>> > From: Anthony Scopatz <sc...@gm...>
> >>> > Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >>> >
> >>> > HI David,
> >>> >
> >>> > Tables and

...

[Message clipped]

[truncated message content] |
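On the two-column question quoted in the thread above (whether the nested loops can become a single for-loop with itertools), here is a minimal sketch of one way to do it. It is an illustration only, not code from the thread: `compare` is a hypothetical stand-in for the poster's function, `pairwise_distances` is a made-up helper name, and both columns are assumed to fit in memory as plain sequences, which sidesteps the HDF5 access that the original question is really about.

```python
from itertools import combinations


def compare(a1, b1, a2, b2):
    """Hypothetical stand-in for the poster's compare(); not from the thread."""
    return abs(a1 - b1) + abs(a2 - b2)


def pairwise_distances(element1, element2, compare_fn=compare):
    """Fill the upper triangle of an N x N matrix using a single for-loop.

    element1 and element2 are assumed to be equal-length, in-memory
    sequences (for example, two columns already read out of a table).
    """
    rows = list(zip(element1, element2))
    n = len(rows)
    # Unlike np.empty in the thread, unvisited cells are initialised to 0.0.
    D = [[0.0] * n for _ in range(n)]
    # combinations() yields every (ii, jj) pair with ii < jj exactly once,
    # the same set of pairs the nested xrange loops visit.
    for (ii, (a1, a2)), (jj, (b1, b2)) in combinations(enumerate(rows), 2):
        # Argument order mirrors the thread: element1[ii], element1[jj],
        # element2[ii], element2[jj].
        D[ii][jj] = compare_fn(a1, b1, a2, b2)
    return D


D = pairwise_distances([1, 2, 4], [10, 20, 40])
```

The number of `compare` calls is still N(N-1)/2, so any speed-up over the nested loops would come from avoiding repeated per-element lookups rather than from doing less work. For tables too large for memory, the same loop could in principle be fed from buffered reads, but that is not shown here.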
From: Jon R. <row...@gm...> - 2013-02-15 17:13:04
|
Thanks for confirming - I'll keep an eye out for the fix.

On Fri, Feb 15, 2013 at 4:12 PM, Francesc Alted <fa...@gm...> wrote:
> Hi Jon and Anthony,
>
> I can confirm that this is a package error of PyTables in Anaconda CE 64
> for Windows. We have filed a ticket in Anaconda for fixing this. Sorry
> for the inconveniences.
>
> Francesc Alted |
From: Francesc A. <fa...@gm...> - 2013-02-15 16:12:52
|
Hi Jon and Anthony,

I can confirm that this is a package error of PyTables in Anaconda CE 64
for Windows. We have filed a ticket in Anaconda for fixing this. Sorry
for the inconveniences.

Francesc Alted

On 2/15/13 4:56 PM, Anthony Scopatz wrote:
> Hi Jon,
>
> Unfortunately, I have no way of testing this out. I will say that I
> have had problems with HDF5 and Anaconda on windows before since they
> only ship the static *.lib hdf5 libraries. So it may be the case that
> the pandas -> pytables / hdf5 interface hasn't been properly linked.
> Barring someone on this list who can test things out for you, you
> might try grabbing the PyTables source from github and building it on
> top of your install of Anaconda. Sorry...
>
> Be Well
> Anthony

--
Francesc Alted |
From: Anthony S. <sc...@gm...> - 2013-02-15 15:57:16
|
Hi Jon,

Unfortunately, I have no way of testing this out. I will say that I have
had problems with HDF5 and Anaconda on windows before since they only ship
the static *.lib hdf5 libraries. So it may be the case that the pandas ->
pytables / hdf5 interface hasn't been properly linked. Barring someone on
this list who can test things out for you, you might try grabbing the
PyTables source from github and building it on top of your install of
Anaconda. Sorry...

Be Well
Anthony

On Fri, Feb 15, 2013 at 3:29 AM, Jon Rowland <row...@gm...> wrote:
> Hi - apologies if this is a duplicate, I had an error sending the
> first time and wasn't sure if it made it through.
>
> I have an issue using pandas/HDFStore/pytables in the Anaconda CE
> distribution on Windows 64-bit.
>
> After a little troubleshooting with the Anaconda/pandas lists, it's
> been suggested that it might be a pytables issue (or at least some
> kind of package mismatch causing pytables not to work).
>
> I have a clean install of Anaconda 1.3.1 64-bit CE edition on a
> Windows 64-bit machine.
>
> Running the pytables self-test gives the following output:
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> PyTables version:  2.4.0
> HDF5 version:      1.8.9
> NumPy version:     1.6.2
> Numexpr version:   2.0.1 (not using Intel's VML/MKL)
> Zlib version:      1.2.3 (in Python interpreter)
> Blosc version:     1.1.3 (2010-11-16)
> Cython version:    0.17.4
> Python version:    2.7.3 |AnacondaCE 1.3.1 (64-bit)| (default, Jan 7 2013, 09:47:12) [MSC v.1500 64 bit (AMD64)]
> Byte-ordering:     little
> Detected cores:    4
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
> Then I get a *lot* of output to standard error - pages and pages of it
> - that looks something like this:
>
> C:\Anaconda\lib\site-packages\tables\filters.py:253: FiltersWarning:
> compression library ``bzip2`` is not available; using ``zlib`` instead
>   % (complib, default_complib), FiltersWarning )
> C:\Anaconda\lib\site-packages\tables\filters.py:253: FiltersWarning:
> compression library ``lzo`` is not available; using ``zlib`` instead
>   % (complib, default_complib), FiltersWarning )
> HDF5-DIAG: Error detected in HDF5 (1.8.9) thread 0:
>   #000: ..\..\src\H5A.c line 241 in H5Acreate2(): not a type
>     major: Invalid arguments to routine
>     minor: Inappropriate type
> HDF5-DIAG: Error detected in HDF5 (1.8.9) thread 0:
>   #000: ..\..\src\H5A.c line 920 in H5Awrite(): not an attribute
>     major: Invalid arguments to routine
>     minor: Inappropriate type
> EHDF5-DIAG: Error detected in HDF5 (1.8.9) thread 0:
>   #000: ..\..\src\H5A.c line 241 in H5Acreate2(): not a type
>     major: Invalid arguments to routine
>     minor: Inappropriate type
>
> Is this something I'm doing wrong or is there something wrong with the
> package?
>
> Any help would be appreciated.
>
> Thanks,
> Jon |
From: Jon R. <row...@gm...> - 2013-02-15 09:29:38
|
Hi - apologies if this is a duplicate, I had an error sending the first time and wasn't sure if it made it through. I have an issue using pandas/HDFStore/pytables in the Anaconda CE distribution on Windows 64-bit. After a little troubleshooting with the Anaconda/pandas lists, it's been suggested that it might be a pytables issue (or at least some kind of package mismatch causing pytables not to work). I have a clean install of Anaconda 1.3.1 64-bit CE edition on a Windows 64-bit machine. Running the pytables self-test gives the following output: -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= PyTables version: 2.4.0 HDF5 version: 1.8.9 NumPy version: 1.6.2 Numexpr version: 2.0.1 (not using Intel's VML/MKL) Zlib version: 1.2.3 (in Python interpreter) Blosc version: 1.1.3 (2010-11-16) Cython version: 0.17.4 Python version: 2.7.3 |AnacondaCE 1.3.1 (64-bit)| (default, Jan 7 2013, 09:47:12) [MSC v.1500 64 bit (AMD64)] Byte-ordering: little Detected cores: 4 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Then I get a *lot* of output to standard error - pages and pages of it - that looks something like this: C:\Anaconda\lib\site-packages\tables\filters.py:253: FiltersWarning: compression library ``bzip2`` is not available; using ``zlib`` instead % (complib, default_complib), FiltersWarning ) C:\Anaconda\lib\site-packages\tables\filters.py:253: FiltersWarning: compression library ``lzo`` is not available; using ``zlib`` instead % (complib, default_complib), FiltersWarning ) HDF5-DIAG: Error detected in HDF5 (1.8.9) thread 0: #000: ..\..\src\H5A.c line 241 in H5Acreate2(): not a type major: Invalid arguments to routine minor: Inappropriate type HDF5-DIAG: Error detected in HDF5 (1.8.9) thread 0: #000: ..\..\src\H5A.c line 920 in H5Awrite(): not an attribute major: Invalid arguments to routine minor: Inappropriate type EHDF5-DIAG: Error detected in HDF5 (1.8.9) thread 0: #000: ..\..\src\H5A.c line 241 in 
H5Acreate2(): not a type major: Invalid arguments to routine minor: Inappropriate type Is this something I'm doing wrong or is there something wrong with the package? Any help would be appreciated. Thanks, Jon |
From: Anthony S. <sc...@gm...> - 2013-02-10 17:57:40
|
Thanks for hunting this down Michka, It was a pretty simple change, so I went ahead and merged it in. Be Well Anthony On Sun, Feb 10, 2013 at 9:29 AM, Michka Popoff <mic...@gm...>wrote: > After some (loooong) code browsing I sent a pull request for my problem : > https://github.com/PyTables/PyTables/pull/208 > > Thanks, I was not sure if it was a bug or an intended functionality > preventing me to rename nodes with children. > > Michka > > Le 10 févr. 2013 à 09:08, Anthony Scopatz a écrit : > > Hey Michka, > > This seems like a bug. Please open an issue on github or submit a pull > request if you figure out a fix. Thanks! > > Be Well > Anthony > > > On Sat, Feb 9, 2013 at 4:44 AM, Michka Popoff <mic...@gm...>wrote: > >> Hello >> >> I am not able to rename a node which has parent nodes. The doc doesn't >> specify any restriction to the usage of the renameNode method. >> Here is a small example script to show what I want to achieve : >> >> import tables >> >> # Create file and groups >> file = tables.openFile("test.hdf5", "w") >> file.createGroup("/", "data", "Data") >> file.createGroup("/data", "id", "Single Data") >> file.createGroup("/data/id/", "curves1", "Curve 1") >> file.createGroup("/data/id/", "curves2", "Curve 2") >> >> # Rename (works) >> file.renameNode("/data/id/curves1", "newcurve1") >> >> # Rename (doesn't work) >> file.renameNode("/data/id", "newid") >> >> The first rename will work and rename "/data/id/curves1" to >> "/data/id/newcurve1" >> The second rename will fail with the following traceback : >> >> Traceback (most recent call last): >> File "Rename.py", line 14, in <module> >> file.renameNode("/data/id", "newid") >> File >> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", >> line 1157, in renameNode >> obj._f_rename(newname, overwrite) >> File >> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", >> line 590, in _f_rename >> 
self._f_move(newname=newname, overwrite=overwrite) >> File >> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", >> line 674, in _f_move >> self._g_move(newparent, newname) >> File >> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/group.py", >> line 565, in _g_move >> self._v_file._updateNodeLocations(oldPath, newPath) >> File >> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", >> line 2368, in _updateNodeLocations >> descendentNode._g_updateLocation(newNodePPath) >> File >> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", >> line 414, in _g_updateLocation >> file_._refNode(self, newPath) >> File >> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", >> line 2287, in _refNode >> "file already has a node with path ``%s``" % nodePath >> AssertionError: file already has a node with path ``/data`` >> Closing remaining open files: test.hdf5... done >> Exception AttributeError: "'File' object has no attribute '_aliveNodes'" >> in ignored >> >> Perhaps I can not do what I want to do here, or is there another method I >> should use ? >> >> Thanks in advance >> >> Michka Popoff >> >> >> ------------------------------------------------------------------------------ >> Free Next-Gen Firewall Hardware Offer >> Buy your Sophos next-gen firewall before the end March 2013 >> and get the hardware for free! Learn more. >> http://p.sf.net/sfu/sophos-d2d-feb >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... 
>> https://lists.sourceforge.net/lists/listinfo/pytables-users |
From: Michka P. <mic...@gm...> - 2013-02-10 17:29:39
|
After some (loooong) code browsing I sent a pull request for my problem : https://github.com/PyTables/PyTables/pull/208 Thanks, I was not sure if it was a bug or an intended functionality preventing me to rename nodes with children. Michka Le 10 févr. 2013 à 09:08, Anthony Scopatz a écrit : > Hey Michka, > > This seems like a bug. Please open an issue on github or submit a pull request if you figure out a fix. Thanks! > > Be Well > Anthony > > > On Sat, Feb 9, 2013 at 4:44 AM, Michka Popoff <mic...@gm...> wrote: > Hello > > I am not able to rename a node which has parent nodes. The doc doesn't specify any restriction to the usage of the renameNode method. > Here is a small example script to show what I want to achieve : > > import tables > > # Create file and groups > file = tables.openFile("test.hdf5", "w") > file.createGroup("/", "data", "Data") > file.createGroup("/data", "id", "Single Data") > file.createGroup("/data/id/", "curves1", "Curve 1") > file.createGroup("/data/id/", "curves2", "Curve 2") > > # Rename (works) > file.renameNode("/data/id/curves1", "newcurve1") > > # Rename (doesn't work) > file.renameNode("/data/id", "newid") > > The first rename will work and rename "/data/id/curves1" to "/data/id/newcurve1" > The second rename will fail with the following traceback : > > Traceback (most recent call last): > File "Rename.py", line 14, in <module> > file.renameNode("/data/id", "newid") > File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", line 1157, in renameNode > obj._f_rename(newname, overwrite) > File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", line 590, in _f_rename > self._f_move(newname=newname, overwrite=overwrite) > File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", line 674, in _f_move > self._g_move(newparent, newname) > File 
"/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/group.py", line 565, in _g_move > self._v_file._updateNodeLocations(oldPath, newPath) > File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", line 2368, in _updateNodeLocations > descendentNode._g_updateLocation(newNodePPath) > File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", line 414, in _g_updateLocation > file_._refNode(self, newPath) > File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", line 2287, in _refNode > "file already has a node with path ``%s``" % nodePath > AssertionError: file already has a node with path ``/data`` > Closing remaining open files: test.hdf5... done > Exception AttributeError: "'File' object has no attribute '_aliveNodes'" in ignored > > Perhaps I can not do what I want to do here, or is there another method I should use ? > > Thanks in advance > > Michka Popoff > > ------------------------------------------------------------------------------ > Free Next-Gen Firewall Hardware Offer > Buy your Sophos next-gen firewall before the end March 2013 > and get the hardware for free! Learn more. > http://p.sf.net/sfu/sophos-d2d-feb > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > ------------------------------------------------------------------------------ > Free Next-Gen Firewall Hardware Offer > Buy your Sophos next-gen firewall before the end March 2013 > and get the hardware for free! Learn more. > http://p.sf.net/sfu/sophos-d2d-feb_______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users |
From: Anthony S. <sc...@gm...> - 2013-02-10 08:09:17
|
Hey Michka, This seems like a bug. Please open an issue on github or submit a pull request if you figure out a fix. Thanks! Be Well Anthony On Sat, Feb 9, 2013 at 4:44 AM, Michka Popoff <mic...@gm...>wrote: > Hello > > I am not able to rename a node which has parent nodes. The doc doesn't > specify any restriction to the usage of the renameNode method. > Here is a small example script to show what I want to achieve : > > import tables > > # Create file and groups > file = tables.openFile("test.hdf5", "w") > file.createGroup("/", "data", "Data") > file.createGroup("/data", "id", "Single Data") > file.createGroup("/data/id/", "curves1", "Curve 1") > file.createGroup("/data/id/", "curves2", "Curve 2") > > # Rename (works) > file.renameNode("/data/id/curves1", "newcurve1") > > # Rename (doesn't work) > file.renameNode("/data/id", "newid") > > The first rename will work and rename "/data/id/curves1" to > "/data/id/newcurve1" > The second rename will fail with the following traceback : > > Traceback (most recent call last): > File "Rename.py", line 14, in <module> > file.renameNode("/data/id", "newid") > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", > line 1157, in renameNode > obj._f_rename(newname, overwrite) > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", > line 590, in _f_rename > self._f_move(newname=newname, overwrite=overwrite) > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", > line 674, in _f_move > self._g_move(newparent, newname) > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/group.py", > line 565, in _g_move > self._v_file._updateNodeLocations(oldPath, newPath) > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", > line 2368, in 
_updateNodeLocations > descendentNode._g_updateLocation(newNodePPath) > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", > line 414, in _g_updateLocation > file_._refNode(self, newPath) > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", > line 2287, in _refNode > "file already has a node with path ``%s``" % nodePath > AssertionError: file already has a node with path ``/data`` > Closing remaining open files: test.hdf5... done > Exception AttributeError: "'File' object has no attribute '_aliveNodes'" > in ignored > > Perhaps I can not do what I want to do here, or is there another method I > should use ? > > Thanks in advance > > Michka Popoff > > > ------------------------------------------------------------------------------ > Free Next-Gen Firewall Hardware Offer > Buy your Sophos next-gen firewall before the end March 2013 > and get the hardware for free! Learn more. > http://p.sf.net/sfu/sophos-d2d-feb > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
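For readers following the report, the behaviour Michka expected from renaming a group that has children can be modelled in a few lines. This is only a toy sketch over a plain list of path strings; the function name rename_subtree is hypothetical, and nothing here is PyTables' actual implementation or the change made in the pull request.

```python
def rename_subtree(paths, old_path, new_name):
    """Return a copy of `paths` with `old_path` and all of its descendants renamed.

    Toy model only: `paths` is a plain list of absolute node paths,
    not a PyTables file.
    """
    parent = old_path.rsplit("/", 1)[0]      # "/data/id" -> "/data"
    new_path = parent + "/" + new_name       # "/data" + "/" + "newid"
    if new_path in paths:
        raise ValueError("a node already exists at %s" % new_path)
    prefix = old_path + "/"
    renamed = []
    for path in paths:
        if path == old_path:
            renamed.append(new_path)
        elif path.startswith(prefix):
            # Descendant: swap only the leading old_path for new_path.
            renamed.append(new_path + path[len(old_path):])
        else:
            renamed.append(path)
    return renamed


nodes = ["/data", "/data/id", "/data/id/curves1", "/data/id/curves2"]
print(rename_subtree(nodes, "/data/id", "newid"))
```

In this model, renaming "/data/id" to "newid" leaves "/data" untouched and moves both curve groups under "/data/newid", which is the outcome the example script in the report was after.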
From: Michka P. <mic...@gm...> - 2013-02-09 12:45:07
|
Hello I am not able to rename a node which has parent nodes. The doc doesn't specify any restriction to the usage of the renameNode method. Here is a small example script to show what I want to achieve : import tables # Create file and groups file = tables.openFile("test.hdf5", "w") file.createGroup("/", "data", "Data") file.createGroup("/data", "id", "Single Data") file.createGroup("/data/id/", "curves1", "Curve 1") file.createGroup("/data/id/", "curves2", "Curve 2") # Rename (works) file.renameNode("/data/id/curves1", "newcurve1") # Rename (doesn't work) file.renameNode("/data/id", "newid") The first rename will work and rename "/data/id/curves1" to "/data/id/newcurve1" The second rename will fail with the following traceback : Traceback (most recent call last): File "Rename.py", line 14, in <module> file.renameNode("/data/id", "newid") File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", line 1157, in renameNode obj._f_rename(newname, overwrite) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", line 590, in _f_rename self._f_move(newname=newname, overwrite=overwrite) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", line 674, in _f_move self._g_move(newparent, newname) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/group.py", line 565, in _g_move self._v_file._updateNodeLocations(oldPath, newPath) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", line 2368, in _updateNodeLocations descendentNode._g_updateLocation(newNodePPath) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py", line 414, in _g_updateLocation file_._refNode(self, newPath) File 
"/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py", line 2287, in _refNode "file already has a node with path ``%s``" % nodePath AssertionError: file already has a node with path ``/data`` Closing remaining open files: test.hdf5... done Exception AttributeError: "'File' object has no attribute '_aliveNodes'" in ignored Perhaps I can not do what I want to do here, or is there another method I should use ? Thanks in advance Michka Popoff |
From: Gaëtan de M. <gde...@gm...> - 2013-02-08 14:04:56
|
On Wed, Jan 23, 2013 at 5:33 PM, Jeff Reback <jr...@ya...> wrote: > It seems there is a limit to the condition syntax when using readWhere > > I get various exceptions when passing increasing number of terms > > is this some kind of hard coded limit? > This is a limitation of numexpr. FWIW, this is due to the fact that the number of numexpr internal "registers" (including temporary ones) is implicitly limited to 256 because each one is coded as a single character in its internal representation of your expression (internally a string called "program"). In your case, Numexpr could theoretically do a much better job of allocating temporary registers, so you could add a feature request at https://code.google.com/p/numexpr/issues/ (but don't hold your breath on it). In the meantime, the best workaround I know of is to read chunks out of pytables and call numexpr manually on them (because in that case you can simply split your expression into multiple smaller expressions): filter1 = ne.evaluate("c1 | c2 | c3 | ... | cx") filter2 = ne.evaluate("filter1 | cy | ...") -- Gaëtan de Menten |
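The splitting workaround above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not part of numexpr or PyTables: the helper names split_or_expression and evaluate_in_steps, the accumulator name acc, and the batch size are all made up for illustration. The evaluator is passed in as a callable so the splitting logic can be checked with plain Python; with numexpr one would presumably pass something like lambda expr, env: ne.evaluate(expr, local_dict=env), where env holds the column arrays of the chunk just read.

```python
def split_or_expression(terms, max_terms=30, acc_name="acc"):
    """Split a long OR of condition strings into a list of shorter expressions.

    The first expression ORs the first batch of terms. Each later
    expression ORs the running result (named `acc_name`) with the
    next batch, so no single expression has more than max_terms + 1 operands.
    """
    if not terms:
        raise ValueError("need at least one term")
    if max_terms < 1:
        raise ValueError("max_terms must be >= 1")
    exprs = []
    for start in range(0, len(terms), max_terms):
        batch = ["(%s)" % term for term in terms[start:start + max_terms]]
        if exprs:
            # Not the first batch: carry the previous result forward.
            batch.insert(0, acc_name)
        exprs.append(" | ".join(batch))
    return exprs


def evaluate_in_steps(exprs, variables, evaluate, acc_name="acc"):
    """Evaluate the expressions in order, feeding each result into the next."""
    env = dict(variables)
    result = None
    for expr in exprs:
        result = evaluate(expr, env)
        env[acc_name] = result
    return result


# Demo with plain Python standing in for the real evaluator (an assumption
# made only so the sketch runs without numexpr installed).
terms = ["x == %d" % i for i in range(7)]
exprs = split_or_expression(terms, max_terms=3)
for expr in exprs:
    print(expr)
print(evaluate_in_steps(exprs, {"x": 4}, lambda e, env: eval(e, {}, env)))
```

Keeping the batches small trades a few extra evaluation calls for expressions that stay well under the register limit; the right batch size depends on how many temporaries each term needs, so it is left as a parameter.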
From: David R. <dav...@gm...> - 2013-02-04 20:49:39
|
That's the error I was getting after modifying my tables.py file, so it's good to see that you are getting it too. I'll dig into it more. On Mon, Feb 4, 2013 at 3:44 PM, < pyt...@li...> wrote: > Send Pytables-users mailing list submissions to > pyt...@li... > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.sourceforge.net/lists/listinfo/pytables-users > or, via email, send a message with subject or body 'help' to > pyt...@li... > > You can reach the person managing the list at > pyt...@li... > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Pytables-users digest..." > > > Today's Topics: > > 1. Re: Pytables-users Digest, Vol 81, Issue 9 (Anthony Scopatz) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 4 Feb 2013 14:43:37 -0600 > From: Anthony Scopatz <sc...@gm...> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 9 > To: Discussion list for PyTables > <pyt...@li...> > Message-ID: > < > CAP...@ma...> > Content-Type: text/plain; charset="iso-8859-1" > > Hey David, > > I am getting the following error now: > > scopatz@ares ~ $ python t.py > 10669890 Comparisons > Traceback (most recent call last): > File "t.py", line 61, in <module> > get_hd() > File "t.py", line 54, in get_hd > for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, > range(N_irises)), 2): > File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", > line 3308, in __iter__ > out=buf_slice) > File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", > line 1807, in read > arr = self._read(start, stop, step, field, out) > File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", > line 1732, in _read > bytes_required)) > ValueError: output array size invalid, got 4620 bytes, need 753984000 bytes > > And I had to change the phasors line to the following: > > r['phasors'] = np.empty((17, 20*240), 
complex) > > Thanks. > Be Well > Anthony > > > > On Mon, Feb 4, 2013 at 1:41 PM, David Reed <dav...@gm...> wrote: > > > I didn't have any luck. I replaced that __iter__ function which led to > me > > replacing the read function which lead to me replaceing the _read > function > > and I eventually got another error. > > > > Below are 2 functions and my HDF5 Table class declaration. They should > be > > self explanatory. I wasn't sure if attachments would go through and this > > is pretty small, so I figured it would be ok just to post. I apologize > if > > this is a bit cluttered. I would also appreciate any comments on how I > > assign the results to the matrix D, this does not seem very pythonic at > all > > and could use some advice there if its easy. (the ii*jj is just a place > > holder for a more sophisticated measure). Thanks again! > > > > import numpy as np > > import tables as tb > > > > class Iris(tb.IsDescription): > > subject_id = tb.IntCol() > > iris_id = tb.IntCol() > > database = tb.StringCol(5) > > is_left = tb.BoolCol() > > is_flipped = tb.BoolCol() > > templates = tb.BoolCol(shape=(17, 20*480)) > > masks1 = tb.BoolCol(shape=(17, 20*480)) > > phasors = tb.ComplexCol(itemsize=8, shape=(17, 20*240)) > > masks2 = tb.BoolCol(shape=(17, 20*240)) > > > > > > def create_hdf5(): > > """ > > """ > > with tb.openFile('test.h5', 'w') as f: > > > > # Create and fill the table of irises", > > irises = f.createTable(f.root, 'irises', Iris, 'Irises', > > filters=tb.Filters(1)) > > for ii in range(4620): > > > > r = irises.row > > r['subject_id'] = ii > > r['iris_id'] = 0 > > r['database'] = 'test' > > r['is_left'] = True > > r['is_flipped'] = False > > r['templates'] = np.empty((17, 20*480), np.bool8) > > r['masks1'] = np.empty((17, 20*480), np.bool8) > > r['phasors'] = np.empty((17, 20*240)) + 1j*np.empty((17, 20*240)) > > r['masks2'] = np.empty((17, 20*240), np.bool8) > > r.append() > > > > irises.flush() > > > > def get_hd(): > > """ > > """ > > from itertools 
import combinations, izip > > with tb.openFile('test.h5') as f: > > irises = f.root.irises > > > > templates = f.root.irises.cols.templates > > masks = f.root.irises.cols.masks1 > > > > N_irises = len(irises) > > > > print '%i Comparisons' % (N_irises*(N_irises - 1)/2) > > D = np.empty((N_irises, N_irises)) > > for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, > > range(N_irises)), 2): > > D[ii, jj] = ii*jj > > > > np.save('test', D) > > > > > > > > > > On Mon, Feb 4, 2013 at 11:16 AM, < > > pyt...@li...> wrote: > > > >> Send Pytables-users mailing list submissions to > >> pyt...@li... > >> > >> To subscribe or unsubscribe via the World Wide Web, visit > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> or, via email, send a message with subject or body 'help' to > >> pyt...@li... > >> > >> You can reach the person managing the list at > >> pyt...@li... > >> > >> When replying, please edit your Subject line so it is more specific > >> than "Re: Contents of Pytables-users digest..." > >> > >> > >> Today's Topics: > >> > >> 1. 
Re: Pytables-users Digest, Vol 81, Issue 7 (Anthony Scopatz) > >> > >> > >> ---------------------------------------------------------------------- > >> > >> Message: 1 > >> Date: Mon, 4 Feb 2013 10:16:24 -0600 > >> From: Anthony Scopatz <sc...@gm...> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 7 > >> To: Discussion list for PyTables > >> <pyt...@li...> > >> Message-ID: > >> < > >> CAP...@ma...> > >> Content-Type: text/plain; charset="iso-8859-1" > >> > >> On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> > >> wrote: > >> > >> > Hi Josh, > >> > > >> > Here is my __iter__ code: > >> > > >> > def __iter__(self): > >> > table = self.table > >> > itemsize = self.dtype.itemsize > >> > nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // > itemsize > >> > max_row = len(self) > >> > for start_row in xrange(0, len(self), nrowsinbuf): > >> > end_row = min([start_row + nrowsinbuf, max_row]) > >> > buf = table.read(start_row, end_row, 1, > field=self.pathname) > >> > for row in buf: > >> > yield row > >> > > >> > It does look different, I will try swapping in the code from github > and > >> > see what happens. > >> > > >> > >> Yes, please let us know how that goes! Otherwise send the list both the > >> test data generator script and the script that fails. > >> > >> Be Well > >> Anthony > >> > >> > >> > > >> > > >> > On Mon, Feb 4, 2013 at 9:59 AM, < > >> > pyt...@li...> wrote: > >> > > >> >> Send Pytables-users mailing list submissions to > >> >> pyt...@li... > >> >> > >> >> To subscribe or unsubscribe via the World Wide Web, visit > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> or, via email, send a message with subject or body 'help' to > >> >> pyt...@li... > >> >> > >> >> You can reach the person managing the list at > >> >> pyt...@li... > >> >> > >> >> When replying, please edit your Subject line so it is more specific > >> >> than "Re: Contents of Pytables-users digest..." 
> >> >> > >> >> > >> >> Today's Topics: > >> >> > >> >> 1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers) > >> >> 2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed) > >> >> > >> >> > >> >> > ---------------------------------------------------------------------- > >> >> > >> >> Message: 1 > >> >> Date: Fri, 1 Feb 2013 14:08:47 -0800 > >> >> From: Josh Ayers <jos...@gm...> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4 > >> >> To: Discussion list for PyTables > >> >> <pyt...@li...> > >> >> Message-ID: > >> >> <CACOB4aPG4NZ6b2a3v= > >> >> 1Ue...@ma...> > >> >> Content-Type: text/plain; charset="iso-8859-1" > >> >> > >> >> David, > >> >> > >> >> You added a custom version of table.Column.__iter__, correct? Could > >> you > >> >> also include that along with the script to reproduce the error? > >> >> > >> >> It seems like the problem may be in the 'nrowsinbuf' calculation - > see > >> >> [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're using > >> the > >> >> default 1MB value for IO_BUFFER_SIZE, it should be reading in rows > of 6 > >> >> chunks. Instead, it's reading the entire table. > >> >> > >> >> [1]: > >> >> > >> https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296 > >> >> > >> >> > >> >> > >> >> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> > >> >> wrote: > >> >> > >> >> > > >> >> > > >> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm... > > > >> >> wrote: > >> >> > > >> >> >> at the error: > >> >> >> > >> >> >> result = numpy.empty(shape=nrows, dtype=dtypeField) > >> >> >> > >> >> >> nrows = 4620 and dtypeField is ('bool', (17, 9600)) > >> >> >> > >> >> >> I'm not sure what that means as a dtype, but thats what it is. > >> >> >> > >> >> >> Forgive me if I'm being totally naive, but I thought the whole > >> point of > >> >> >> __iter__ with pyttables was to do iteration on the fly, so there > is > >> no > >> >> >> preallocation. 
> >> >> >> > >> >> > > >> >> > Nope you are not being naive at all. That is the point. > >> >> > > >> >> > > >> >> >> If you have any ideas on this I'm all ears. > >> >> >> > >> >> > > >> >> > If you could send a minimal script which reproduces this error, > that > >> >> would > >> >> > help a lot. > >> >> > > >> >> > Be Well > >> >> > Anthony > >> >> > > >> >> > > >> >> >> > >> >> >> > >> >> >> Thanks again. > >> >> >> > >> >> >> Dave > >> >> >> > >> >> >> > >> >> >> On Fri, Feb 1, 2013 at 3:45 PM, < > >> >> >> pyt...@li...> wrote: > >> >> >> > >> >> >>> Send Pytables-users mailing list submissions to > >> >> >>> pyt...@li... > >> >> >>> > >> >> >>> To subscribe or unsubscribe via the World Wide Web, visit > >> >> >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> >>> or, via email, send a message with subject or body 'help' to > >> >> >>> pyt...@li... > >> >> >>> > >> >> >>> You can reach the person managing the list at > >> >> >>> pyt...@li... > >> >> >>> > >> >> >>> When replying, please edit your Subject line so it is more > specific > >> >> >>> than "Re: Contents of Pytables-users digest..." > >> >> >>> > >> >> >>> > >> >> >>> Today's Topics: > >> >> >>> > >> >> >>> 1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony > Scopatz) > >> >> >>> > >> >> >>> > >> >> >>> > >> ---------------------------------------------------------------------- > >> >> >>> > >> >> >>> Message: 1 > >> >> >>> Date: Fri, 1 Feb 2013 14:44:40 -0600 > >> >> >>> From: Anthony Scopatz <sc...@gm...> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, > Issue > >> 2 > >> >> >>> To: Discussion list for PyTables > >> >> >>> <pyt...@li...> > >> >> >>> Message-ID: > >> >> >>> < > >> >> >>> > CAP...@ma... 
> >> > > >> >> >>> Content-Type: text/plain; charset="iso-8859-1" > >> >> >>> > >> >> >>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed < > >> dav...@gm...> > >> >> >>> wrote: > >> >> >>> > >> >> >>> > Hi Anthony, > >> >> >>> > > >> >> >>> > Thanks for the reply. > >> >> >>> > > >> >> >>> > I honestly don't know how to monitor my Python memory usage, > but > >> I'm > >> >> >>> sure > >> >> >>> > that its caused by out of memory. > >> >> >>> > > >> >> >>> > >> >> >>> Well, I would just run top or process monitor or something while > >> >> running > >> >> >>> the python script to see what happens to memory usage as the > script > >> >> chugs > >> >> >>> along... > >> >> >>> > >> >> >>> > >> >> >>> > I'm just trying to find out how to fix it. My HDF5 table has > >> 4620 > >> >> >>> rows > >> >> >>> > and the column I'm iterating over is a 17x9600 boolean matrix. > >> The > >> >> >>> > __iter__ method is preallocating an array that is this size > which > >> >> >>> appears > >> >> >>> > to be root of the error. I was hoping there is a fix somewhere > >> in > >> >> >>> here to > >> >> >>> > not have to do this preallocation. > >> >> >>> > > >> >> >>> > >> >> >>> So a 17x9600 boolean matrix should only be 0.155 MB in space. > >> 4620 of > >> >> >>> these is ~760 MB. If you have 2 GB of memory and you are > iterating > >> >> over > >> >> >>> 2 > >> >> >>> of these (templates & masks) it is conceivable that you are just > >> >> running > >> >> >>> out of memory. Maybe there is a way that __iter__ could not > >> >> preallocate > >> >> >>> something that is basically a temporary. What is the dtype of > the > >> >> >>> templates array? > >> >> >>> > >> >> >>> Be Well > >> >> >>> Anthony > >> >> >>> > >> >> >>> > >> >> >>> > > >> >> >>> > Thanks again. > >> >> >>> > >> >> >>> > >> >> -------------- next part -------------- > >> >> An HTML attachment was scrubbed... 
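The sizes discussed in the quoted messages can be checked with a little arithmetic. The sketch below is an illustration only: the function name buffer_plan is hypothetical, the 1 MiB buffer is the "default 1MB value for IO_BUFFER_SIZE" mentioned in the thread taken as 1024 * 1024 bytes, and one byte per boolean element is assumed.

```python
def buffer_plan(nrows, cell_shape, itemsize, io_buffer_size):
    """Return (bytes per row, rows that fit in one buffer, bytes for the whole column)."""
    cell_items = 1
    for dim in cell_shape:
        cell_items *= dim
    row_bytes = cell_items * itemsize
    # At least one row has to be read at a time, even if it exceeds the buffer.
    rows_per_buffer = max(1, io_buffer_size // row_bytes)
    column_bytes = nrows * row_bytes
    return row_bytes, rows_per_buffer, column_bytes


row_bytes, rows_per_buffer, column_bytes = buffer_plan(
    nrows=4620,                     # rows in the irises table
    cell_shape=(17, 9600),          # one 'templates' cell
    itemsize=1,                     # assumed: one byte per bool
    io_buffer_size=1024 * 1024,     # assumed: 1 MiB buffer
)
print("bytes per row:     %d" % row_bytes)
print("rows per buffer:   %d" % rows_per_buffer)
print("whole column (MB): %.1f" % (column_bytes / 1e6))
```

Under these assumptions each row is 163200 bytes, six rows fit in one buffer, and the full column is 753984000 bytes, which is the figure the ValueError in this thread asks for. That is consistent with the whole column being read at once rather than six rows at a time.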
> >> >>
> >> >> ------------------------------
> >> >>
> >> >> Message: 2
> >> >> Date: Mon, 4 Feb 2013 09:58:53 -0500
> >> >> From: David Reed <dav...@gm...>
> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6
> >> >> To: pyt...@li...
> >> >> Message-ID: <CAM6XA7=h50...@ma...>
> >> >> Content-Type: text/plain; charset="iso-8859-1"
> >> >>
> >> >> Hi Anthony,
> >> >>
> >> >> Sorry to just get back to you. I can send a script. Should I send a script
> >> >> that creates some fake data as well?
> >> >>
> >> >> -Dave
> >> >>
> >> >> On Fri, Feb 1, 2013 at 4:50 PM, <pyt...@li...> wrote:
> >> >>
> >> >> > Send Pytables-users mailing list submissions to
> >> >> >         pyt...@li...
> >> >> >
> >> >> > To subscribe or unsubscribe via the World Wide Web, visit
> >> >> >         https://lists.sourceforge.net/lists/listinfo/pytables-users
> >> >> > or, via email, send a message with subject or body 'help' to
> >> >> >         pyt...@li...
> >> >> >
> >> >> > You can reach the person managing the list at
> >> >> >         pyt...@li...
> >> >> >
> >> >> > When replying, please edit your Subject line so it is more specific
> >> >> > than "Re: Contents of Pytables-users digest..."
> >> >> >
> >> >> > Today's Topics:
> >> >> >
> >> >> >    1. Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony Scopatz)
> >> >> >
> >> >> > ----------------------------------------------------------------------
> >> >> >
> >> >> > Message: 1
> >> >> > Date: Fri, 1 Feb 2013 15:50:11 -0600
> >> >> > From: Anthony Scopatz <sc...@gm...>
> >> >> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
> >> >> > To: Discussion list for PyTables <pyt...@li...>
> >> >> > Message-ID: <CAP...@ma...>
> >> >> > Content-Type: text/plain; charset="iso-8859-1"
> >> >> >
> >> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
> >> >> >
> >> >> > > at the error:
> >> >> > >
> >> >> > > result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> >> > >
> >> >> > > nrows = 4620 and dtypeField is ('bool', (17, 9600))
> >> >> > >
> >> >> > > I'm not sure what that means as a dtype, but that's what it is.
> >> >> > >
> >> >> > > Forgive me if I'm being totally naive, but I thought the whole point of
> >> >> > > __iter__ with PyTables was to do iteration on the fly, so there is no
> >> >> > > preallocation.
> >> >> >
> >> >> > Nope, you are not being naive at all. That is the point.
> >> >> >
> >> >> > > If you have any ideas on this I'm all ears.
> >> >> >
> >> >> > If you could send a minimal script which reproduces this error, that would
> >> >> > help a lot.
> >> >> >
> >> >> > Be Well
> >> >> > Anthony
> >> >> >
> >> >> > > Thanks again.
> >> >> > >
> >> >> > > Dave
> >> >> > >
> >> >> > > On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
> >> >> > >
> >> >> > >> Message: 1
> >> >> > >> Date: Fri, 1 Feb 2013 14:44:40 -0600
> >> >> > >> From: Anthony Scopatz <sc...@gm...>
> >> >> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
> >> >> > >> To: Discussion list for PyTables <pyt...@li...>
> >> >> > >> Message-ID: <CAP...@ma...>
> >> >> > >> Content-Type: text/plain; charset="iso-8859-1"
> >> >> > >>
> >> >> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
> >> >> > >>
> >> >> > >> > Hi Anthony,
> >> >> > >> >
> >> >> > >> > Thanks for the reply.
> >> >> > >> >
> >> >> > >> > I honestly don't know how to monitor my Python memory usage, but I'm sure
> >> >> > >> > that it's caused by running out of memory.
> >> >> > >>
> >> >> > >> Well, I would just run top or process monitor or something while running
> >> >> > >> the python script to see what happens to memory usage as the script chugs
> >> >> > >> along...
> >> >> > >>
> >> >> > >> > I'm just trying to find out how to fix it. My HDF5 table has 4620 rows
> >> >> > >> > and the column I'm iterating over is a 17x9600 boolean matrix. The
> >> >> > >> > __iter__ method is preallocating an array that is this size, which appears
> >> >> > >> > to be the root of the error. I was hoping there is a fix somewhere in here
> >> >> > >> > to not have to do this preallocation.
> >> >> > >>
> >> >> > >> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of
> >> >> > >> these is ~760 MB. If you have 2 GB of memory and you are iterating over 2
> >> >> > >> of these (templates & masks) it is conceivable that you are just running
> >> >> > >> out of memory. Maybe there is a way that __iter__ could not preallocate
> >> >> > >> something that is basically a temporary. What is the dtype of the
> >> >> > >> templates array?
> >> >> > >>
> >> >> > >> Be Well
> >> >> > >> Anthony
> >> >> > >>
> >> >> > >> > Thanks again.
> >> >> > >> >
> >> >> > >> > On Fri, Feb 1, 2013 at 11:12 AM, <pyt...@li...> wrote:
> >> >> > >> >
> >> >> > >> >> Message: 1
> >> >> > >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600
> >> >> > >> >> From: Anthony Scopatz <sc...@gm...>
> >> >> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9
> >> >> > >> >> To: Discussion list for PyTables <pyt...@li...>
> >> >> > >> >> Message-ID: <CAP...@ma...>
> >> >> > >> >> Content-Type: text/plain; charset="iso-8859-1"
> >> >> > >> >>
> >> >> > >> >> Hi David,
> >> >> > >> >>
> >> >> > >> >> Sorry, I haven't had a ton of time recently. You seem to be getting a
> >> >> > >> >> memory error on creating a numpy array. This kind of thing typically
> >> >> > >> >> happens when you are out of memory. Does this seem to be the case with
> >> >> > >> >> you? When this dies, is your memory usage at 100%? If so, this algorithm
> >> >> > >> >> might require a little tweaking...
> >> >> > >> >>
> >> >> > >> >> Be Well
> >> >> > >> >> Anthony
> >> >> > >> >>
> >> >> > >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:
> >> >> > >> >>
> >> >> > >> >> > I'm still having problems with this one. I can't tell if this is something
> >> >> > >> >> > dumb I'm doing with itertools, or if it's something in pytables.
> >> >> > >> >> >
> >> >> > >> >> > Would appreciate any help.
> >> >> > >> >> >
> >> >> > >> >> > Thanks
> >> >> > >> >> >
> >> >> > >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:
> >> >> > >> >> >
> >> >> > >> >> >> I think I have to reopen this issue.
> >> >> > >> >> >> I have been running fine for a while using the combinations method from
> >> >> > >> >> >> itertools, but have recently run into a memory error since I quadrupled
> >> >> > >> >> >> the size of the hdf file.
> >> >> > >> >> >>
> >> >> > >> >> >> Here is my code again:
> >> >> > >> >> >>
> >> >> > >> >> >> from itertools import combinations, izip
> >> >> > >> >> >> with tb.openFile(h5_all, 'r') as f:
> >> >> > >> >> >>     irises = f.root.irises
> >> >> > >> >> >>
> >> >> > >> >> >>     templates = f.root.irises.cols.templates
> >> >> > >> >> >>     masks = f.root.irises.cols.masks1
> >> >> > >> >> >>
> >> >> > >> >> >>     N_irises = len(irises)
> >> >> > >> >> >>     index = np.ones((20 * 480), np.bool)
> >> >> > >> >> >>
> >> >> > >> >> >>     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >> >> > >> >> >>     D = np.empty((N_irises, N_irises))
> >> >> > >> >> >>     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> >> >> > >> >> >>         # print ii
> >> >> > >> >> >>         D[ii, jj] = ham_dist(
> >> >> > >> >> >>             t1[8, index],
> >> >> > >> >> >>             t2[:, index],
> >> >> > >> >> >>             m1[8, index],
> >> >> > >> >> >>             m2[:, index],
> >> >> > >> >> >>         )
> >> >> > >> >> >>
> >> >> > >> >> >> And here is the error:
> >> >> > >> >> >>
> >> >> > >> >> >> In [10]: get_hd3()
> >> >> > >> >> >> 10669890 Comparisons
> >> >> > >> >> >> ---------------------------------------------------------------------------
> >> >> > >> >> >> MemoryError                               Traceback (most recent call last)
> >> >> > >> >> >> <ipython-input-10-cfb255ce7bd1> in <module>()
> >> >> > >> >> >> ----> 1 get_hd3()
> >> >> > >> >> >>
> >> >> > >> >> >>     118     print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >> >> > >> >> >>     119     D = np.empty((N_irises, N_irises))
> >> >> > >> >> >> --> 120     for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> >> >> > >> >> >>     121         # print ii
> >> >> > >> >> >>     122         D[ii, jj] = ham_dist(
> >> >> > >> >> >>
> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
> >> >> > >> >> >>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
> >> >> > >> >> >>    3275             end_row = min([start_row + nrowsinbuf, max_row])
> >> >> > >> >> >> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
> >> >> > >> >> >>    3277             for row in buf:
> >> >> > >> >> >>    3278                 yield row
> >> >> > >> >> >>
> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
> >> >> > >> >> >>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
> >> >> > >> >> >>    1773
> >> >> > >> >> >> -> 1774         arr = self._read(start, stop, step, field)
> >> >> > >> >> >>    1775         return internal_to_flavor(arr, self.flavor)
> >> >> > >> >> >>    1776
> >> >> > >> >> >>
> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
> >> >> > >> >> >>    1719         if field:
> >> >> > >> >> >>    1720             # Create a container for the results
> >> >> > >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> >> > >> >> >>    1722         else:
> >> >> > >> >> >>    1723             # Recarray case
> >> >> > >> >> >>
> >> >> > >> >> >> MemoryError:
> >> >> > >> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
> >> >> > >> >> >>    1720             # Create a container for the results
> >> >> > >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> >> > >> >> >>    1722         else:
> >> >> > >> >> >>
> >> >> > >> >> >> Also, if you guys see any performance problems in my code, please let me
> >> >> > >> >> >> know.
> >> >> > >> >> >>
> >> >> > >> >> >> Thank you so much for the help.
> >> >> > >> >> >>
> >> >> > >> >> >> -Dave
> >> >> > >> >> >>
> >> >> > >> >> >> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
> >> >> > >> >> >>
> >> >> > >> >> >>> Message: 1
> >> >> > >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500
> >> >> > >> >> >>> From: David Reed <dav...@gm...>
> >> >> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
> >> >> > >> >> >>> To: pyt...@li...
> >> >> > >> >> >>> Message-ID: <CAM...@ma...>
> >> >> > >> >> >>> Content-Type: text/plain; charset="iso-8859-1"
> >> >> > >> >> >>>
> >> >> > >> >> >>> I can't thank you guys enough for the help. I was able to add the __iter__
> >> >> > >> >> >>> function to the table.py file and everything seems to be working great!
> >> >> > >> >> >>> I'm not quite as fast as I was with iterating right off a matrix, but pretty
> >> >> > >> >> >>> close. I was at 555 comparisons per second, and now I'm at 420.
> >> >> > >> >> >>>
> >> >> > >> >> >>> I handled the problem I mentioned earlier by doing this, and it seems to
> >> >> > >> >> >>> work great:
> >> >> > >> >> >>>
> >> >> > >> >> >>> A = f.root.data.cols.A
> >> >> > >> >> >>> B = f.root.data.cols.B
> >> >> > >> >> >>>
> >> >> > >> >> >>> D = np.empty((len(A), len(A)))
> >> >> > >> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
> >> >> > >> >> >>>     D[ii, jj] = compare(a1, a2, b1, b2)
> >> >> > >> >> >>>
> >> >> > >> >> >>> Again, thanks a lot.
> >> >> > >> >> >>>
> >> >> > >> >> >>> -Dave
> >> >> > >> >> >>>
> >> >> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
> >> >> > >> >> >>>
> >> >> > >> >> >>> > Send Pytables-users mailing list submissions to
> >> >> > >> >> >>> >         pyt...@li...
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > To subscribe or unsubscribe via the World Wide Web, visit
> >> >> > >> >> >>> >         https://lists.sourceforge.net/lists/listinfo/pytables-users
> >> >> > >> >> >>> > or, via email, send a message with subject or body 'help' to
> >> >> > >> >> >>> >         pyt...@li...
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > You can reach the person managing the list at
> >> >> > >> >> >>> >         pyt...@li...
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > When replying, please edit your Subject line so it is more specific
> >> >> > >> >> >>> > than "Re: Contents of Pytables-users digest..."
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > Today's Topics:
> >> >> > >> >> >>> >
> >> >> > >> >> >>> >    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
> >> >> > >> >> >>> >    2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz)
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > ----------------------------------------------------------------------
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > Message: 1
> >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600
> >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...>
> >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
> >> >> > >> >> >>> > To: Discussion list for PyTables <pyt...@li...>
> >> >> > >> >> >>> > Message-ID: <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=Gz...@ma...>
> >> >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1"
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > > Thanks a lot for the help so far guys!
> >> >> > >> >> >>> > >
> >> >> > >> >> >>> > > Looking at itertools, I found what I believe to be the perfect function
> >> >> > >> >> >>> > > for what I need, itertools.combinations. This appears to be a valid
> >> >> > >> >> >>> > > replacement for the method proposed.
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > Yes, combinations is awesome!
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > > One small problem that I didn't mention is that my compare function
> >> >> > >> >> >>> > > actually takes as inputs 2 columns from the table. Like so:
> >> >> > >> >> >>> > >
> >> >> > >> >> >>> > > D = np.empty((N_elements, N_elements))
> >> >> > >> >> >>> > > for ii in xrange(N_elements):
> >> >> > >> >> >>> > >     for jj in xrange(ii+1, N_elements):
> >> >> > >> >> >>> > >         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
> >> >> > >> >> >>> > >                             data['element2'][ii], data['element2'][jj])
> >> >> > >> >> >>> > >
> >> >> > >> >> >>> > > Is there an efficient way of using itertools with this structure?
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > You can always make two other iterators for each column. Since you have
> >> >> > >> >> >>> > two columns you would have 4 iterators. I am not sure how fast this is
> >> >> > >> >> >>> > going to be, but I am confident that there is a way to do this in one
> >> >> > >> >> >>> > for-loop, which is going to be way faster than nested loops.
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > Be Well
> >> >> > >> >> >>> > Anthony
> >> >> > >> >> >>> >
> >> >> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:
> >> >> > >> >> >>> > >
> >> >> > >> >> >>> > >> Message: 1
> >> >> > >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800
> >> >> > >> >> >>> > >> From: Josh Ayers <jos...@gm...>
> >> >> > >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> >> >> > >> >> >>> > >> To: Discussion list for PyTables <pyt...@li...>
> >> >> > >> >> >>> > >> Message-ID: <CAC...@ma...>
> >> >> > >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1"
> >> >> > >> >> >>> > >>
> >> >> > >> >> >>> > >> David,
> >> >> > >> >> >>> > >>
> >> >> > >> >> >>> > >> The change in issue 27 was only for iteration over a tables.Column
> >> >> > >> >> >>> > >> instance. To use it, tweak Anthony's code as follows.
> >> >> > >> >> >>> > >> This will iterate over the "element" column, as in your original example.
> >> >> > >> >> >>> > >>
> >> >> > >> >> >>> > >> Note also that this will only work with the development version of
> >> >> > >> >> >>> > >> PyTables available on github. It will be very slow using the released
> >> >> > >> >> >>> > >> v2.4.0.
> >> >> > >> >> >>> > >>
> >> >> > >> >> >>> > >> from itertools import izip
> >> >> > >> >> >>> > >>
> >> >> > >> >> >>> > >> with tb.openFile(...) as f:
> >> >> > >> >> >>> > >>     data = f.root.data.cols.element
> >> >> > >> >> >>> > >>     data_i = iter(data)
> >> >> > >> >> >>> > >>     data_j = iter(data)
> >> >> > >> >> >>> > >>     data_i.next()  # throw the first value away
> >> >> > >> >> >>> > >>     for i, j in izip(data_i, data_j):
> >> >> > >> >> >>> > >>         compare(i, j)
> >> >> > >> >> >>> > >>
> >> >> > >> >> >>> > >> Hope that helps,
> >> >> > >> >> >>> > >> Josh
> >> >> > >> >> >>> > >>
> >> >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:
> >> >> > >> >> >>> > >>
> >> >> > >> >> >>> > >> > Hi David,
> >> >> > >> >> >>> > >> >
> >> >> > >> >> >>> > >> > Tables and table column iteration have been overhauled fairly recently
> >> >> > >> >> >>> > >> > [1]. So you might try creating two iterators, offset by one, and then
> >> >> > >> >> >>> > >> > doing the comparison. I am hacking this out super quick so please forgive
> >> >> > >> >> >>> > >> > me:
> >> >> > >> >> >>> > >> >
> >> >> > >> >> >>> > >> > from itertools import izip
> >> >> > >> >> >>> > >> >
> >> >> > >> >> >>> > >> > with tb.openFile(...) as f:
> >> >> > >> >> >>> > >> >     data = f.root.data
> >> >> > >> >> >>> > >> >     data_i = iter(data)
> >> >> > >> >> >>> > >> >     data_j = iter(data)
> >> >> > >> >> >>> > >> >     data_i.next()  # throw the first value away
> >> >> > >> >> >>> > >> >     for i, j in izip(data_i, data_j):
> >> >> > >> >> >>> > >> >         compare(i, j)
> >> >> > >> >> >>> > >> >
> >> >> > >> >> >>> > >> > You get the idea ;)
> >> >> > >> >> >>> > >> >
> >> >> > >> >> >>> > >> > Be Well
> >> >> > >> >> >>> > >> > Anthony
> >> >> > >> >> >>> > >> >
> >> >> > >> >> >>> > >> > 1. https://github.com/PyTables/PyTables/issues/27
> >> >> > >> >> >>> > >> >
> >> >> > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
> >> >> > >> >> >>> > >> >
> >> >> > >> >> >>> > >> >> I was hoping someone could help me out here.
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> This is from a post I put up on StackOverflow,
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> I have a fairly large dataset that I store in HDF5 and access using
> >> >> > >> >> >>> > >> >> PyTables. One operation I need to do on this dataset is pairwise
> >> >> > >> >> >>> > >> >> comparisons between each of the elements. This requires 2 loops, one to
> >> >> > >> >> >>> > >> >> iterate over each element, and an inner loop to iterate over every other
> >> >> > >> >> >>> > >> >> element. This operation thus looks at N(N-1)/2 comparisons.
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> For fairly small sets I found it to be faster to dump the contents into a
> >> >> > >> >> >>> > >> >> multidimensional numpy array and then do my iteration. I run into problems
> >> >> > >> >> >>> > >> >> with large sets because of memory issues and need to access each element of
> >> >> > >> >> >>> > >> >> the dataset at run time.
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> Putting the elements into an array gives me about 600 comparisons per
> >> >> > >> >> >>> > >> >> second, while operating on hdf5 data itself gives me about 300 comparisons
> >> >> > >> >> >>> > >> >> per second.
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> Is there a way to speed this process up?
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> Example follows (this is not my real code, just an example):
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> *Small Set*:
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> import numpy as np
> >> >> > >> >> >>> > >> >> import tables as tb
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> >> >> > >> >> >>> > >> >>     data = f.root.data
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >>     N_elements = len(data)
> >> >> > >> >> >>> > >> >>     elements = np.empty((N_elements, int(1e5)))
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >>     # copy every row's 'element' field into memory
> >> >> > >> >> >>> > >> >>     for ii, d in enumerate(data):
> >> >> > >> >> >>> > >> >>         elements[ii] = d['element']
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> D = np.empty((N_elements, N_elements))
> >> >> > >> >> >>> > >> >> for ii in xrange(N_elements):
> >> >> > >> >> >>> > >> >>     for jj in xrange(ii+1, N_elements):
> >> >> > >> >> >>> > >> >>         D[ii, jj] = compare(elements[ii], elements[jj])
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> *Large Set*:
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f:
> >> >> > >> >> >>> > >> >>     data = f.root.data
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >>     N_elements = len(data)
> >> >> > >> >> >>> > >> >>
> >> >> > >> >> >>> > >> >>     D = np.empty((N_elements, N_elements))
> >> >> > >> >> >>> > >> >>     for ii in xrange(N_elements):
> >> >> > >> >> >>> > >> >>         for jj in xrange(ii+1, N_elements):
> >> >> > >> >> >>> > >> >>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
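The N(N-1)/2 pair count from the original question, and the way `itertools.combinations` replaces the two nested loops, can be sketched on plain indices. This is a minimal illustration only; the function names `pairwise_indices` and `nested_loop_indices` are made up here, and no HDF5 data is involved.

```python
from itertools import combinations


def pairwise_indices(n):
    """Return every (i, j) with i < j, in nested-loop order."""
    return list(combinations(range(n), 2))


def nested_loop_indices(n):
    """The same pairs produced by two explicit loops, as in the question."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


n = 5
pairs = pairwise_indices(n)
expected_count = n * (n - 1) // 2

# Both approaches visit the same pairs in the same order.
print(pairs == nested_loop_indices(n), len(pairs), expected_count)
```

For the 4620-row table discussed in this thread the same formula gives 4620 * 4619 / 2 = 10669890 comparisons, which matches the count printed before the traceback.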
> >> >> > >> >> >>> > > >> >> > >> >> >>> > ------------------------------ > >> >> > >> >> >>> > > >> >> > >> >> >>> > Message: 2 > >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600 > >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...> > >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, > Vol > >> 80, > >> >> > >> Issue 4 > >> >> > >> >> >>> > To: Discussion list for PyTables > >> >> > >> >> >>> > <pyt...@li...> > >> >> > >> >> >>> > Message-ID: > >> >> > >> >> >>> > < > >> >> > >> >> >>> > > >> >> > >> > >> CAP...@ma...> > >> >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" > >> >> > >> >> >>> > > >> >> > >> >> >>> > Josh is right that you can just edit the code by hand > >> (which > >> >> > >> works > >> >> > >> >> but > >> >> > >> >> >>> > sucks). > >> >> > >> >> >>> > > >> >> > >> >> >>> > However, on Windows -- on the rare occasion when I also > >> >> have to > >> >> > >> >> >>> develop on > >> >> > >> >> >>> > it -- I typically use a distribution that includes a > >> >> compiler, > >> >> > >> >> cython, > >> >> > >> >> >>> > hdf5, and pytables already and then I install my > >> development > >> >> > >> version > >> >> > >> >> >>> from > >> >> > >> >> >>> > github OVER this. I recommend either EPD or Anaconda, > >> >> though > >> >> > >> other > >> >> > >> >> >>> > distributions listed here [1] might also work. > >> >> > >> >> >>> > > >> >> > >> >> >>> > Be well > >> >> > >> >> >>> > Anthony > >> >> > >> >> >>> > > >> >> > >> >> >>> > 1. > >> http://numfocus.org/projects-2/software-distributions/ > >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers < > >> >> > jos...@gm... > >> >> > >> > > >> >> > >> >> >>> wrote: > >> >> > >> >> >>> > > >> >> > >> >> >>> > > The change was in pure Python code, so you should be > >> able > >> >> to > >> >> > >> just > >> >> > >> >> >>> paste > >> >> > >> >> >>> > in > >> >> > >> >> >>> > > the changes to your local copy. 
Start with the > >> >> > >> >> table.Column.__iter__ > >> >> > >> >> >>> > > method (lines 3296-3310) here. > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > >> >> > >> >> >>> > >> >> > >> >> > >> >> > >> > >> >> > > >> >> > >> > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > It needs to be modified slightly because it uses some > >> >> > >> additional > >> >> > >> >> >>> features > >> >> > >> >> >>> > > that aren't available in the released version (the > >> >> > >> out=buf_slice > >> >> > >> >> >>> argument > >> >> > >> >> >>> > > to table.read). The following should work. > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > def __iter__(self): > >> >> > >> >> >>> > > table = self.table > >> >> > >> >> >>> > > itemsize = self.dtype.itemsize > >> >> > >> >> >>> > > nrowsinbuf = > >> >> table._v_file.params['IO_BUFFER_SIZE'] > >> >> > // > >> >> > >> >> >>> itemsize > >> >> > >> >> >>> > > max_row = len(self) > >> >> > >> >> >>> > > for start_row in xrange(0, len(self), > >> nrowsinbuf): > >> >> > >> >> >>> > > end_row = min([start_row + nrowsinbuf, > >> >> max_row]) > >> >> > >> >> >>> > > buf = table.read(start_row, end_row, 1, > >> >> > >> >> >>> field=self.pathname) > >> >> > >> >> >>> > > for row in buf: > >> >> > >> >> >>> > > yield row > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > I haven't tested this, but I think it will work. 
> >> >> > >> >> >>> > > > >> >> > >> >> >>> > > Josh > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > > >> >> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed < > >> >> > >> >> dav...@gm...> > >> >> > >> >> >>> > wrote: > >> >> > >> >> >>> > > > >> >> > >> >> >>> > >> I apologize if I'm starting to sound helpless, but > I'm > >> >> > forced > >> >> > >> to > >> >> > >> >> >>> work on > >> >> > >> >> >>> > >> Windows 7 at work and have never had luck compiling > >> >> python > >> >> > >> source > >> >> > >> >> >>> > >> successfully. I have had to rely on precompiled > >> binaries > >> >> > and > >> >> > >> now > >> >> > >> >> >>> its > >> >> > >> >> >>> > >> biting me in the butt. > >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> Is there any quick fix I can do to improve this > >> iteration > >> >> > >> using > >> >> > >> >> >>> v2.4.0? > >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, < > >> >> > >> >> >>> > >> pyt...@li...> > wrote: > >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >>> Send Pytables-users mailing list submissions to > >> >> > >> >> >>> > >>> pyt...@li... > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> To subscribe or unsubscribe via the World Wide Web, > >> >> visit > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> > >> >> >>> > >>> or, via email, send a message with subject or body > >> >> 'help' > >> >> > to > >> >> > >> >> >>> > >>> > pyt...@li... > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> You can reach the person managing the list at > >> >> > >> >> >>> > >>> pyt...@li... > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> When replying, please edit your Subject line so it > is > >> >> more > >> >> > >> >> specific > >> >> > >> >> >>> > >>> than "Re: Contents of Pytables-users digest..." 
> >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> Today's Topics: > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 > >> (David > >> >> > Reed) > >> >> > >> >> >>> > >>> 2. Re: Pytables-users Digest, Vol 80, Issue 3 > >> (David > >> >> > Reed) > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >> >> > >> > >> >> > ---------------------------------------------------------------------- > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> Message: 1 > >> >> > >> >> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 > >> >> > >> >> >>> > >>> From: David Reed <dav...@gm...> > >> >> > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users > Digest, > >> Vol > >> >> > 80, > >> >> > >> >> Issue > >> >> > >> >> >>> 2 > >> >> > >> >> >>> > >>> To: pyt...@li... > >> >> > >> >> >>> > >>> Message-ID: > >> >> > >> >> >>> > >>> > >> <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha= > >> >> > >> >> >>> > >>> ev...@ma...> > >> >> > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1" > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> Thanks Anthony, but unless Im missing something I > >> don't > >> >> > think > >> >> > >> >> that > >> >> > >> >> >>> > method > >> >> > >> >> >>> > >>> will work since this will only be comparing the ith > >> >> element > >> >> > >> with > >> >> > >> >> >>> ith+1 > >> >> > >> >> >>> > >>> element. I still need 2 for loops right? > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> Using itertools might speed things up though, I've > >> never > >> >> > used > >> >> > >> >> them > >> >> > >> >> >>> so I > >> >> > >> >> >>> > >>> will give it a shot and let you know how it goes. > >> Looks > >> >> > >> like I > >> >> > >> >> >>> need to > >> >> > >> >> >>> > >>> download the latest release before I do that too. > >> >> Thanks > >> >> > for > >> >> > >> >> the > >> >> > >> >> >>> help. 
> >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> -Dave > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, < > >> >> > >> >> >>> > >>> pyt...@li...> > wrote: > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > Send Pytables-users mailing list submissions to > >> >> > >> >> >>> > >>> > pyt...@li... > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >>> > To subscribe or unsubscribe via the World Wide > Web, > >> >> visit > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >> > >> >> >>> > >>> > or, via email, send a message with subject or > body > >> >> 'help' > >> >> > >> to > >> >> > >> >> >>> > >>> > > >> pyt...@li... > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >>> > You can reach the person managing the list at > >> >> > >> >> >>> > >>> > > pyt...@li... > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >>> > When replying, please edit your Subject line so > it > >> is > >> >> > more > >> >> > >> >> >>> specific > >> >> > >> >> >>> > >>> > than "Re: Contents of Pytables-users digest..." > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >>> > Today's Topics: > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >>> > 1. 
Re: Nested Iteration of HDF5 using PyTables > >> >> > (Anthony > >> >> > >> >> >>> Scopatz) > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > > >> >> > >> >> > >> >> > > >> ---------------------------------------------------------------------- > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >>> > Message: 1 > >> >> > >> >> >>> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 > >> >> > >> >> >>> > >>> > From: Anthony Scopatz <sc...@gm...> > >> >> > >> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of > >> HDF5 > >> >> > >> using > >> >> > >> >> >>> PyTables > >> >> > >> >> >>> > >>> > To: Discussion list for PyTables > >> >> > >> >> >>> > >>> > <pyt...@li...> > >> >> > >> >> >>> > >>> > Message-ID: > >> >> > >> >> >>> > >>> > <CAPk-6T5b= > >> >> > >> >> >>> > >>> > > >> >> 1EG...@ma... > >> >> > > > >> >> > >> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1" > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >>> > HI David, > >> >> > >> >> >>> > >>> > > >> >> > >> >> >>> > >>> > Tables and > > > > ... > > > > [Message clipped] > > > > > ------------------------------------------------------------------------------ > > Everyone hates slow websites. So do we. > > Make your web apps faster with AppDynamics > > Download AppDynamics Lite for free today: > > http://p.sf.net/s... [truncated message content] |
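The buffered __iter__ quoted above sizes each read as IO_BUFFER_SIZE // itemsize, so the number of rows per read depends entirely on which itemsize is used. Below is a minimal sketch of that arithmetic in plain NumPy. It assumes a 1 MB buffer and a column whose rows are 17x9600 booleans, the shape discussed elsewhere in this thread. The helpers rows_per_buffer and iter_buffered are hypothetical names for illustration and are not PyTables API. The sketch targets Python 3, unlike the Python 2 snippets in the thread.

```python
import numpy as np

# Assumption: 1 MB I/O buffer, the default value mentioned in the thread.
IO_BUFFER_SIZE = 1024 * 1024


def rows_per_buffer(itemsize, buffer_size=IO_BUFFER_SIZE):
    """Number of rows that fit in one buffer (never fewer than one)."""
    return max(1, buffer_size // itemsize)


def iter_buffered(read, nrows, nrowsinbuf):
    """Yield rows one at a time, calling read(start, stop) per block.

    `read` is any callable returning the rows in [start, stop); here it
    stands in for a table read restricted to a single column.
    """
    for start in range(0, nrows, nrowsinbuf):
        stop = min(start + nrowsinbuf, nrows)
        for row in read(start, stop):
            yield row


# One row of the column: a 17 x 9600 array of booleans.
row_dtype = np.dtype((np.bool_, (17, 9600)))
row_itemsize = row_dtype.itemsize        # bytes per full row
base_itemsize = row_dtype.base.itemsize  # bytes per single boolean

print("bytes per row:", row_itemsize)
print("rows per read (row itemsize): ", rows_per_buffer(row_itemsize))
print("rows per read (base itemsize):", rows_per_buffer(base_itemsize))
print("bytes for 4620 rows:", 4620 * row_itemsize)

# Small demonstration of block-wise iteration over an in-memory array.
demo = np.arange(10)
rows = list(iter_buffered(lambda a, b: demo[a:b], len(demo), 3))
```

With the per-row itemsize a read covers 6 rows. With the 1-byte base itemsize the quotient is larger than the 4620 rows in the table, so a single read would span the whole column, which is 4620 * 163200 = 753984000 bytes. Whether a given PyTables version uses the base or the per-row itemsize here is not confirmed by this sketch.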
From: Anthony S. <sc...@gm...> - 2013-02-04 20:44:09
|
Hey David,

I am getting the following error now:

scopatz@ares ~ $ python t.py
10669890 Comparisons
Traceback (most recent call last):
  File "t.py", line 61, in <module>
    get_hd()
  File "t.py", line 54, in get_hd
    for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
  File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 3308, in __iter__
    out=buf_slice)
  File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1807, in read
    arr = self._read(start, stop, step, field, out)
  File "/home/scopatz/.local/lib/python2.7/site-packages/tables/table.py", line 1732, in _read
    bytes_required))
ValueError: output array size invalid, got 4620 bytes, need 753984000 bytes

And I had to change the phasors line to the following:

    r['phasors'] = np.empty((17, 20*240), complex)

Thanks.

Be Well
Anthony

On Mon, Feb 4, 2013 at 1:41 PM, David Reed <dav...@gm...> wrote:

> I didn't have any luck. I replaced that __iter__ function, which led to me
> replacing the read function, which led to me replacing the _read function,
> and I eventually got another error.
>
> Below are 2 functions and my HDF5 Table class declaration. They should be
> self-explanatory. I wasn't sure if attachments would go through and this
> is pretty small, so I figured it would be OK just to post. I apologize if
> this is a bit cluttered. I would also appreciate any comments on how I
> assign the results to the matrix D; this does not seem very Pythonic at all
> and I could use some advice there if it's easy. (The ii*jj is just a
> placeholder for a more sophisticated measure.) Thanks again!
>
> import numpy as np
> import tables as tb
>
> class Iris(tb.IsDescription):
>     subject_id = tb.IntCol()
>     iris_id = tb.IntCol()
>     database = tb.StringCol(5)
>     is_left = tb.BoolCol()
>     is_flipped = tb.BoolCol()
>     templates = tb.BoolCol(shape=(17, 20*480))
>     masks1 = tb.BoolCol(shape=(17, 20*480))
>     phasors = tb.ComplexCol(itemsize=8, shape=(17, 20*240))
>     masks2 = tb.BoolCol(shape=(17, 20*240))
>
>
> def create_hdf5():
>     """
>     """
>     with tb.openFile('test.h5', 'w') as f:
>
>         # Create and fill the table of irises
>         irises = f.createTable(f.root, 'irises', Iris, 'Irises',
>                                filters=tb.Filters(1))
>         for ii in range(4620):
>
>             r = irises.row
>             r['subject_id'] = ii
>             r['iris_id'] = 0
>             r['database'] = 'test'
>             r['is_left'] = True
>             r['is_flipped'] = False
>             r['templates'] = np.empty((17, 20*480), np.bool8)
>             r['masks1'] = np.empty((17, 20*480), np.bool8)
>             r['phasors'] = np.empty((17, 20*240)) + 1j*np.empty((17, 20*240))
>             r['masks2'] = np.empty((17, 20*240), np.bool8)
>             r.append()
>
>         irises.flush()
>
> def get_hd():
>     """
>     """
>     from itertools import combinations, izip
>     with tb.openFile('test.h5') as f:
>         irises = f.root.irises
>
>         templates = f.root.irises.cols.templates
>         masks = f.root.irises.cols.masks1
>
>         N_irises = len(irises)
>
>         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
>         D = np.empty((N_irises, N_irises))
>         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks,
>                                                             range(N_irises)), 2):
>             D[ii, jj] = ii*jj
>
>         np.save('test', D)
>
>
> On Mon, Feb 4, 2013 at 11:16 AM, <pyt...@li...> wrote:
>
>> Send Pytables-users mailing list submissions to
>>         pyt...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>> or, via email, send a message with subject or body 'help' to
>>         pyt...@li...
>>
>> You can reach the person managing the list at
>>         pyt...@li...
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Pytables-users digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: Pytables-users Digest, Vol 81, Issue 7 (Anthony Scopatz)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Mon, 4 Feb 2013 10:16:24 -0600
>> From: Anthony Scopatz <sc...@gm...>
>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 7
>> To: Discussion list for PyTables
>>         <pyt...@li...>
>> Message-ID:
>>         <CAP...@ma...>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> wrote:
>>
>> > Hi Josh,
>> >
>> > Here is my __iter__ code:
>> >
>> >     def __iter__(self):
>> >         table = self.table
>> >         itemsize = self.dtype.itemsize
>> >         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>> >         max_row = len(self)
>> >         for start_row in xrange(0, len(self), nrowsinbuf):
>> >             end_row = min([start_row + nrowsinbuf, max_row])
>> >             buf = table.read(start_row, end_row, 1, field=self.pathname)
>> >             for row in buf:
>> >                 yield row
>> >
>> > It does look different, I will try swapping in the code from github and
>> > see what happens.
>> >
>>
>> Yes, please let us know how that goes! Otherwise send the list both the
>> test data generator script and the script that fails.
>>
>> Be Well
>> Anthony
>>
>> >
>> > On Mon, Feb 4, 2013 at 9:59 AM, <pyt...@li...> wrote:
>> >
>> >> Send Pytables-users mailing list submissions to
>> >>         pyt...@li...
>> >>
>> >> To subscribe or unsubscribe via the World Wide Web, visit
>> >>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>> >> or, via email, send a message with subject or body 'help' to
>> >>         pyt...@li...
>> >>
>> >> You can reach the person managing the list at
>> >>         pyt...@li...
>> >>
>> >> When replying, please edit your Subject line so it is more specific
>> >> than "Re: Contents of Pytables-users digest..."
>> >>
>> >>
>> >> Today's Topics:
>> >>
>> >>    1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers)
>> >>    2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed)
>> >>
>> >>
>> >> ----------------------------------------------------------------------
>> >>
>> >> Message: 1
>> >> Date: Fri, 1 Feb 2013 14:08:47 -0800
>> >> From: Josh Ayers <jos...@gm...>
>> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4
>> >> To: Discussion list for PyTables
>> >>         <pyt...@li...>
>> >> Message-ID:
>> >>         <CACOB4aPG4NZ6b2a3v=
>> >>         1Ue...@ma...>
>> >> Content-Type: text/plain; charset="iso-8859-1"
>> >>
>> >> David,
>> >>
>> >> You added a custom version of table.Column.__iter__, correct? Could you
>> >> also include that along with the script to reproduce the error?
>> >>
>> >> It seems like the problem may be in the 'nrowsinbuf' calculation - see
>> >> [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're using the
>> >> default 1MB value for IO_BUFFER_SIZE, it should be reading in chunks of
>> >> 6 rows. Instead, it's reading the entire table.
>> >>
>> >> [1]:
>> >> https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296
>> >>
>> >>
>> >>
>> >> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...>
>> >> wrote:
>> >>
>> >> >
>> >> >
>> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...>
>> >> > wrote:
>> >> >
>> >> >> at the error:
>> >> >>
>> >> >>     result = numpy.empty(shape=nrows, dtype=dtypeField)
>> >> >>
>> >> >> nrows = 4620 and dtypeField is ('bool', (17, 9600))
>> >> >>
>> >> >> I'm not sure what that means as a dtype, but that's what it is.
>> >> >>
>> >> >> Forgive me if I'm being totally naive, but I thought the whole point
>> >> >> of __iter__ with PyTables was to do iteration on the fly, so there is
>> >> >> no preallocation.
>> >> >>
>> >> >
>> >> > Nope you are not being naive at all. That is the point.
>> >> >
>> >> >
>> >> >> If you have any ideas on this I'm all ears.
>> >> >> >> >> > >> >> > If you could send a minimal script which reproduces this error, that >> >> would >> >> > help a lot. >> >> > >> >> > Be Well >> >> > Anthony >> >> > >> >> > >> >> >> >> >> >> >> >> >> Thanks again. >> >> >> >> >> >> Dave >> >> >> >> >> >> >> >> >> On Fri, Feb 1, 2013 at 3:45 PM, < >> >> >> pyt...@li...> wrote: >> >> >> >> >> >>> Send Pytables-users mailing list submissions to >> >> >>> pyt...@li... >> >> >>> >> >> >>> To subscribe or unsubscribe via the World Wide Web, visit >> >> >>> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >>> or, via email, send a message with subject or body 'help' to >> >> >>> pyt...@li... >> >> >>> >> >> >>> You can reach the person managing the list at >> >> >>> pyt...@li... >> >> >>> >> >> >>> When replying, please edit your Subject line so it is more specific >> >> >>> than "Re: Contents of Pytables-users digest..." >> >> >>> >> >> >>> >> >> >>> Today's Topics: >> >> >>> >> >> >>> 1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz) >> >> >>> >> >> >>> >> >> >>> >> ---------------------------------------------------------------------- >> >> >>> >> >> >>> Message: 1 >> >> >>> Date: Fri, 1 Feb 2013 14:44:40 -0600 >> >> >>> From: Anthony Scopatz <sc...@gm...> >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue >> 2 >> >> >>> To: Discussion list for PyTables >> >> >>> <pyt...@li...> >> >> >>> Message-ID: >> >> >>> < >> >> >>> CAP...@ma... >> > >> >> >>> Content-Type: text/plain; charset="iso-8859-1" >> >> >>> >> >> >>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed < >> dav...@gm...> >> >> >>> wrote: >> >> >>> >> >> >>> > Hi Anthony, >> >> >>> > >> >> >>> > Thanks for the reply. >> >> >>> > >> >> >>> > I honestly don't know how to monitor my Python memory usage, but >> I'm >> >> >>> sure >> >> >>> > that its caused by out of memory. 
>> >> >>> > >> >> >>> >> >> >>> Well, I would just run top or process monitor or something while >> >> running >> >> >>> the python script to see what happens to memory usage as the script >> >> chugs >> >> >>> along... >> >> >>> >> >> >>> >> >> >>> > I'm just trying to find out how to fix it. My HDF5 table has >> 4620 >> >> >>> rows >> >> >>> > and the column I'm iterating over is a 17x9600 boolean matrix. >> The >> >> >>> > __iter__ method is preallocating an array that is this size which >> >> >>> appears >> >> >>> > to be root of the error. I was hoping there is a fix somewhere >> in >> >> >>> here to >> >> >>> > not have to do this preallocation. >> >> >>> > >> >> >>> >> >> >>> So a 17x9600 boolean matrix should only be 0.155 MB in space. >> 4620 of >> >> >>> these is ~760 MB. If you have 2 GB of memory and you are iterating >> >> over >> >> >>> 2 >> >> >>> of these (templates & masks) it is conceivable that you are just >> >> running >> >> >>> out of memory. Maybe there is a way that __iter__ could not >> >> preallocate >> >> >>> something that is basically a temporary. What is the dtype of the >> >> >>> templates array? >> >> >>> >> >> >>> Be Well >> >> >>> Anthony >> >> >>> >> >> >>> >> >> >>> > >> >> >>> > Thanks again. >> >> >>> >> >> >>> >> >> -------------- next part -------------- >> >> An HTML attachment was scrubbed... >> >> >> >> ------------------------------ >> >> >> >> Message: 2 >> >> Date: Mon, 4 Feb 2013 09:58:53 -0500 >> >> From: David Reed <dav...@gm...> >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6 >> >> To: pyt...@li... >> >> Message-ID: >> >> <CAM6XA7= >> >> h50...@ma...> >> >> Content-Type: text/plain; charset="iso-8859-1" >> >> >> >> Hi Anthony, >> >> >> >> Sorry to just get back to you. I can send a script, should I send a >> script >> >> that creates some fake data as well? 
>> >> >> >> -Dave >> >> >> >> >> >> On Fri, Feb 1, 2013 at 4:50 PM, < >> >> pyt...@li...> wrote: >> >> >> >> > Send Pytables-users mailing list submissions to >> >> > pyt...@li... >> >> > >> >> > To subscribe or unsubscribe via the World Wide Web, visit >> >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > or, via email, send a message with subject or body 'help' to >> >> > pyt...@li... >> >> > >> >> > You can reach the person managing the list at >> >> > pyt...@li... >> >> > >> >> > When replying, please edit your Subject line so it is more specific >> >> > than "Re: Contents of Pytables-users digest..." >> >> > >> >> > >> >> > Today's Topics: >> >> > >> >> > 1. Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony Scopatz) >> >> > >> >> > >> >> > >> ---------------------------------------------------------------------- >> >> > >> >> > Message: 1 >> >> > Date: Fri, 1 Feb 2013 15:50:11 -0600 >> >> > From: Anthony Scopatz <sc...@gm...> >> >> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4 >> >> > To: Discussion list for PyTables >> >> > <pyt...@li...> >> >> > Message-ID: >> >> > < >> >> > CAP...@ma...> >> >> > Content-Type: text/plain; charset="iso-8859-1" >> >> > >> >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> >> >> wrote: >> >> > >> >> > > at the error: >> >> > > >> >> > > result = numpy.empty(shape=nrows, dtype=dtypeField) >> >> > > >> >> > > nrows = 4620 and dtypeField is ('bool', (17, 9600)) >> >> > > >> >> > > I'm not sure what that means as a dtype, but thats what it is. >> >> > > >> >> > > Forgive me if I'm being totally naive, but I thought the whole >> point >> >> of >> >> > > __iter__ with pyttables was to do iteration on the fly, so there >> is no >> >> > > preallocation. >> >> > > >> >> > >> >> > Nope you are not being naive at all. That is the point. >> >> > >> >> > >> >> > > If you have any ideas on this I'm all ears. 
>> >> > > >> >> > >> >> > If you could send a minimal script which reproduces this error, that >> >> would >> >> > help a lot. >> >> > >> >> > Be Well >> >> > Anthony >> >> > >> >> > >> >> > > >> >> > > >> >> > > Thanks again. >> >> > > >> >> > > Dave >> >> > > >> >> > > >> >> > > On Fri, Feb 1, 2013 at 3:45 PM, < >> >> > > pyt...@li...> wrote: >> >> > > >> >> > >> Send Pytables-users mailing list submissions to >> >> > >> pyt...@li... >> >> > >> >> >> > >> To subscribe or unsubscribe via the World Wide Web, visit >> >> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> or, via email, send a message with subject or body 'help' to >> >> > >> pyt...@li... >> >> > >> >> >> > >> You can reach the person managing the list at >> >> > >> pyt...@li... >> >> > >> >> >> > >> When replying, please edit your Subject line so it is more >> specific >> >> > >> than "Re: Contents of Pytables-users digest..." >> >> > >> >> >> > >> >> >> > >> Today's Topics: >> >> > >> >> >> > >> 1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz) >> >> > >> >> >> > >> >> >> > >> >> >> ---------------------------------------------------------------------- >> >> > >> >> >> > >> Message: 1 >> >> > >> Date: Fri, 1 Feb 2013 14:44:40 -0600 >> >> > >> From: Anthony Scopatz <sc...@gm...> >> >> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, >> Issue 2 >> >> > >> To: Discussion list for PyTables >> >> > >> <pyt...@li...> >> >> > >> Message-ID: >> >> > >> < >> >> > >> >> CAP...@ma...> >> >> > >> Content-Type: text/plain; charset="iso-8859-1" >> >> > >> >> >> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed < >> dav...@gm...> >> >> > >> wrote: >> >> > >> >> >> > >> > Hi Anthony, >> >> > >> > >> >> > >> > Thanks for the reply. >> >> > >> > >> >> > >> > I honestly don't know how to monitor my Python memory usage, but >> >> I'm >> >> > >> sure >> >> > >> > that its caused by out of memory. 
>> >> > >> > >> >> > >> >> >> > >> Well, I would just run top or process monitor or something while >> >> running >> >> > >> the python script to see what happens to memory usage as the >> script >> >> > chugs >> >> > >> along... >> >> > >> >> >> > >> >> >> > >> > I'm just trying to find out how to fix it. My HDF5 table has >> 4620 >> >> > rows >> >> > >> > and the column I'm iterating over is a 17x9600 boolean matrix. >> The >> >> > >> > __iter__ method is preallocating an array that is this size >> which >> >> > >> appears >> >> > >> > to be root of the error. I was hoping there is a fix somewhere >> in >> >> > here >> >> > >> to >> >> > >> > not have to do this preallocation. >> >> > >> > >> >> > >> >> >> > >> So a 17x9600 boolean matrix should only be 0.155 MB in space. >> 4620 >> >> of >> >> > >> these is ~760 MB. If you have 2 GB of memory and you are >> iterating >> >> > over 2 >> >> > >> of these (templates & masks) it is conceivable that you are just >> >> running >> >> > >> out of memory. Maybe there is a way that __iter__ could not >> >> preallocate >> >> > >> something that is basically a temporary. What is the dtype of the >> >> > >> templates array? >> >> > >> >> >> > >> Be Well >> >> > >> Anthony >> >> > >> >> >> > >> >> >> > >> > >> >> > >> > Thanks again. >> >> > >> > >> >> > >> > >> >> > >> > >> >> > >> > >> >> > >> > On Fri, Feb 1, 2013 at 11:12 AM, < >> >> > >> > pyt...@li...> wrote: >> >> > >> > >> >> > >> >> Send Pytables-users mailing list submissions to >> >> > >> >> pyt...@li... >> >> > >> >> >> >> > >> >> To subscribe or unsubscribe via the World Wide Web, visit >> >> > >> >> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> or, via email, send a message with subject or body 'help' to >> >> > >> >> pyt...@li... >> >> > >> >> >> >> > >> >> You can reach the person managing the list at >> >> > >> >> pyt...@li... 
>> >> > >> >> >> >> > >> >> When replying, please edit your Subject line so it is more >> >> specific >> >> > >> >> than "Re: Contents of Pytables-users digest..." >> >> > >> >> >> >> > >> >> >> >> > >> >> Today's Topics: >> >> > >> >> >> >> > >> >> 1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony >> Scopatz) >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> ---------------------------------------------------------------------- >> >> > >> >> >> >> > >> >> Message: 1 >> >> > >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600 >> >> > >> >> From: Anthony Scopatz <sc...@gm...> >> >> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, >> >> Issue 9 >> >> > >> >> To: Discussion list for PyTables >> >> > >> >> <pyt...@li...> >> >> > >> >> Message-ID: >> >> > >> >> < >> >> > >> >> >> >> CAP...@ma...> >> >> > >> >> Content-Type: text/plain; charset="iso-8859-1" >> >> > >> >> >> >> > >> >> Hi David, >> >> > >> >> >> >> > >> >> Sorry, I haven't had a ton of time recently. You seem to be >> >> getting >> >> > a >> >> > >> >> memory error on creating a numpy array. This kind of thing >> >> typically >> >> > >> >> happens when you are out of memory. Does this seem to be the >> case >> >> > with >> >> > >> >> you? When this dies, is your memory usage at 100%? If so, >> this >> >> > >> algorithm >> >> > >> >> might require a little tweaking... >> >> > >> >> >> >> > >> >> Be Well >> >> > >> >> Anthony >> >> > >> >> >> >> > >> >> >> >> > >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed < >> >> dav...@gm...> >> >> > >> >> wrote: >> >> > >> >> >> >> > >> >> > I'm still having problems with this one. I can't tell if >> this >> >> > >> something >> >> > >> >> > dumb Im doing with itertools, or if its something in >> pytables. >> >> > >> >> > >> >> > >> >> > Would appreciate any help. >> >> > >> >> > >> >> > >> >> > Thanks >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed < >> >> > dav...@gm... 
>> >> > >> >> >wrote: >> >> > >> >> > >> >> > >> >> >> I think I have to reopen this issue. I have been running >> fine >> >> for >> >> > >> >> awhile >> >> > >> >> >> using the combinations method from itertools, but have >> recently >> >> > run >> >> > >> >> into a >> >> > >> >> >> memory since I have recently quadrupled the size of the hdf >> >> file. >> >> > >> >> >> >> >> > >> >> >> Here is my code again: >> >> > >> >> >> >> >> > >> >> >> from itertools import combinations, izip >> >> > >> >> >> with tb.openFile(h5_all, 'r') as f: >> >> > >> >> >> irises = f.root.irises >> >> > >> >> >> >> >> > >> >> >> templates = f.root.irises.cols.templates >> >> > >> >> >> masks = f.root.irises.cols.masks1 >> >> > >> >> >> >> >> > >> >> >> N_irises = len(irises) >> >> > >> >> >> index = np.ones((20 * 480), np.bool) >> >> > >> >> >> >> >> > >> >> >> print '%i Comparisons' % (N_irises*(N_irises - 1)/2) >> >> > >> >> >> D = np.empty((N_irises, N_irises)) >> >> > >> >> >> for (t1, m1, ii), (t2, m2, jj) in >> combinations(izip(templates, >> >> > >> masks, >> >> > >> >> >> range(N_irises)), 2): >> >> > >> >> >> # print ii >> >> > >> >> >> D[ii, jj] = ham_dist( >> >> > >> >> >> t1[8, index], >> >> > >> >> >> t2[:, index], >> >> > >> >> >> m1[8, index], >> >> > >> >> >> m2[:, index], >> >> > >> >> >> ) >> >> > >> >> >> >> >> > >> >> >> And here is the error: >> >> > >> >> >> >> >> > >> >> >> In [10]: get_hd3() >> >> > >> >> >> 10669890 Comparisons >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> > >> >> >> > >> >> >> --------------------------------------------------------------------------- >> >> > >> >> >> MemoryError Traceback (most >> >> recent >> >> > >> call >> >> > >> >> >> last) >> >> > >> >> >> <ipython-input-10-cfb255ce7bd1> in <module>() >> >> > >> >> >> ----> 1 get_hd3() >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> 118 print '%i Comparisons' % >> >> > >> (N_irises*(N_irises - >> >> > >> >> >> 1)/2) >> >> > >> >> >> 119 D = np.empty((N_irises, N_irises)) >> 
>> > >> >> >> --> 120 for (t1, m1, ii), (t2, m2, jj) in >> >> > >> >> >> combinations(izip(temp >> >> > >> >> >> lates, masks, range(N_irises)), 2): >> >> > >> >> >> 121 # print ii >> >> > >> >> >> 122 D[ii, jj] = ham_dist( >> >> > >> >> >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in >> >> __iter__(self) >> >> > >> >> >> 3274 for start_row in xrange(0, len(self), >> >> nrowsinbuf): >> >> > >> >> >> 3275 end_row = min([start_row + nrowsinbuf, >> >> > max_row]) >> >> > >> >> >> -> 3276 buf = table.read(start_row, end_row, 1, >> >> > >> >> >> field=self.pathname) >> >> > >> >> >> >> >> > >> >> >> 3277 for row in buf: >> >> > >> >> >> 3278 yield row >> >> > >> >> >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, >> >> > start, >> >> > >> >> stop, >> >> > >> >> >> step, >> >> > >> >> >> field) >> >> > >> >> >> 1772 (start, stop, step) = >> >> > self._processRangeRead(start, >> >> > >> >> stop, >> >> > >> >> >> step) >> >> > >> >> >> 1773 >> >> > >> >> >> -> 1774 arr = self._read(start, stop, step, field) >> >> > >> >> >> 1775 return internal_to_flavor(arr, self.flavor) >> >> > >> >> >> 1776 >> >> > >> >> >> >> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in >> _read(self, >> >> > start, >> >> > >> >> >> stop, step, >> >> > >> >> >> field) >> >> > >> >> >> 1719 if field: >> >> > >> >> >> 1720 # Create a container for the results >> >> > >> >> >> -> 1721 result = numpy.empty(shape=nrows, >> >> > >> dtype=dtypeField) >> >> > >> >> >> 1722 else: >> >> > >> >> >> 1723 # Recarray case >> >> > >> >> >> >> >> > >> >> >> MemoryError: >> >> > >> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read() >> >> > >> >> >> 1720 # Create a container for the results >> >> > >> >> >> -> 1721 result = numpy.empty(shape=nrows, >> >> > >> dtype=dtypeField) >> >> > >> >> >> 1722 else: >> >> > >> >> >> >> >> > >> >> >> Also, if you guys see any performance problems in my code, >> >> please >> >> > >> let >> >> > 
>> >> me >> >> > >> >> >> know. >> >> > >> >> >> >> >> > >> >> >> Thank you so much for the help. >> >> > >> >> >> >> >> > >> >> >> -Dave >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> On Fri, Jan 4, 2013 at 8:57 AM, < >> >> > >> >> >> pyt...@li...> wrote: >> >> > >> >> >> >> >> > >> >> >>> Send Pytables-users mailing list submissions to >> >> > >> >> >>> pyt...@li... >> >> > >> >> >>> >> >> > >> >> >>> To subscribe or unsubscribe via the World Wide Web, visit >> >> > >> >> >>> >> >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> >>> or, via email, send a message with subject or body 'help' >> to >> >> > >> >> >>> pyt...@li... >> >> > >> >> >>> >> >> > >> >> >>> You can reach the person managing the list at >> >> > >> >> >>> pyt...@li... >> >> > >> >> >>> >> >> > >> >> >>> When replying, please edit your Subject line so it is more >> >> > specific >> >> > >> >> >>> than "Re: Contents of Pytables-users digest..." >> >> > >> >> >>> >> >> > >> >> >>> >> >> > >> >> >>> Today's Topics: >> >> > >> >> >>> >> >> > >> >> >>> 1. Re: Pytables-users Digest, Vol 80, Issue 8 (David >> Reed) >> >> > >> >> >>> >> >> > >> >> >>> >> >> > >> >> >>> >> >> > >> >> >> ---------------------------------------------------------------------- >> >> > >> >> >>> >> >> > >> >> >>> Message: 1 >> >> > >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500 >> >> > >> >> >>> From: David Reed <dav...@gm...> >> >> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol >> 80, >> >> > Issue >> >> > >> 8 >> >> > >> >> >>> To: pyt...@li... >> >> > >> >> >>> Message-ID: >> >> > >> >> >>> < >> >> > >> >> >>> >> >> > CAM...@ma... >> >> > >> > >> >> > >> >> >>> Content-Type: text/plain; charset="iso-8859-1" >> >> > >> >> >>> >> >> > >> >> >>> I can't thank you guys enough for the help. I was able to >> add >> >> > the >> >> > >> >> >>> __iter__ >> >> > >> >> >>> function to the table.py file and everything seems to be >> >> working >> >> > >> >> great! 
>> >> > >> >> >>> I'm not quite as fast as I was with iterating right of a >> >> matrix >> >> > >> but >> >> > >> >> >>> pretty >> >> > >> >> >>> close. I was at 555 comparisons per second, and now im at >> >> 420. >> >> > >> >> >>> >> >> > >> >> >>> I handled the problem I mentioned earlier by doing this, >> and >> >> it >> >> > >> seems >> >> > >> >> to >> >> > >> >> >>> work great: >> >> > >> >> >>> >> >> > >> >> >>> A = f.root.data.cols.A >> >> > >> >> >>> B = f.root.data.cols.B >> >> > >> >> >>> >> >> > >> >> >>> D = np.empty((len(A), len(A)) >> >> > >> >> >>> for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, >> >> > >> >> range(len(A))), >> >> > >> >> >>> 2): >> >> > >> >> >>> D[ii, jj] = compare(a1, a2, b1, b2) >> >> > >> >> >>> >> >> > >> >> >>> Again, thanks a lot. >> >> > >> >> >>> >> >> > >> >> >>> -Dave >> >> > >> >> >>> >> >> > >> >> >>> >> >> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, < >> >> > >> >> >>> pyt...@li...> wrote: >> >> > >> >> >>> >> >> > >> >> >>> > Send Pytables-users mailing list submissions to >> >> > >> >> >>> > pyt...@li... >> >> > >> >> >>> > >> >> > >> >> >>> > To subscribe or unsubscribe via the World Wide Web, visit >> >> > >> >> >>> > >> >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> >>> > or, via email, send a message with subject or body >> 'help' to >> >> > >> >> >>> > pyt...@li... >> >> > >> >> >>> > >> >> > >> >> >>> > You can reach the person managing the list at >> >> > >> >> >>> > pyt...@li... >> >> > >> >> >>> > >> >> > >> >> >>> > When replying, please edit your Subject line so it is >> more >> >> > >> specific >> >> > >> >> >>> > than "Re: Contents of Pytables-users digest..." >> >> > >> >> >>> > >> >> > >> >> >>> > >> >> > >> >> >>> > Today's Topics: >> >> > >> >> >>> > >> >> > >> >> >>> > 1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony >> >> > >> Scopatz) >> >> > >> >> >>> > 2. 
Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony >> >> > >> Scopatz) >> >> > >> >> >>> > >> >> > >> >> >>> > >> >> > >> >> >>> > >> >> > >> >> >> >> > >> ---------------------------------------------------------------------- >> >> > >> >> >>> > >> >> > >> >> >>> > Message: 1 >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600 >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...> >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol >> 80, >> >> > >> Issue 3 >> >> > >> >> >>> > To: Discussion list for PyTables >> >> > >> >> >>> > <pyt...@li...> >> >> > >> >> >>> > Message-ID: >> >> > >> >> >>> > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ= >> >> > >> >> >>> > Gz...@ma...> >> >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" >> >> > >> >> >>> > >> >> > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed < >> >> > >> dav...@gm...> >> >> > >> >> >>> wrote: >> >> > >> >> >>> > >> >> > >> >> >>> > > Thanks a lot for the help so far guys! >> >> > >> >> >>> > > >> >> > >> >> >>> > > Looking at itertools, I found what I believe to be the >> >> > perfect >> >> > >> >> >>> function >> >> > >> >> >>> > > for what I need, itertools.combinations. This appears >> to >> >> be a >> >> > >> >> valid >> >> > >> >> >>> > > replacement to the method proposed. >> >> > >> >> >>> > > >> >> > >> >> >>> > >> >> > >> >> >>> > Yes, combinations is awesome! >> >> > >> >> >>> > >> >> > >> >> >>> > >> >> > >> >> >>> > > >> >> > >> >> >>> > > There is a small problem that I didn't mention is that >> my >> >> > >> compare >> >> > >> >> >>> > function >> >> > >> >> >>> > > actually takes as inputs 2 columns from the table. 
Like >> >> so: >> >> > >> >> >>> > > >> >> > >> >> >>> > > D = np.empty((N_irises, N_irises)) >> >> > >> >> >>> > > for ii in xrange(N_elements): >> >> > >> >> >>> > > for jj in xrange(ii+1, N_elements): >> >> > >> >> >>> > > D[ii, jj] = compare(data['element1'][ii], >> >> > >> >> >>> > data['element1'][jj],data['element2'][ii], >> >> > >> >> >>> > > data['element2'][jj]) >> >> > >> >> >>> > > >> >> > >> >> >>> > > Is there an efficient way of using itertools with this >> >> > >> structure? >> >> > >> >> >>> > > >> >> > >> >> >>> > >> >> > >> >> >>> > You can always make two other iterators for each column. >> >> Since >> >> > >> you >> >> > >> >> >>> have >> >> > >> >> >>> > two columns you would have 4 iterators. I am not sure >> how >> >> fast >> >> > >> >> this is >> >> > >> >> >>> > going to be but I am confident that there is definitely a >> >> way >> >> > to >> >> > >> do >> >> > >> >> >>> this in >> >> > >> >> >>> > one for-loop, which is going to be way faster than nested >> >> > loops. >> >> > >> >> >>> > >> >> > >> >> >>> > Be Well >> >> > >> >> >>> > Anthony >> >> > >> >> >>> > >> >> > >> >> >>> > >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, < >> >> > >> >> >>> > > pyt...@li...> wrote: >> >> > >> >> >>> > > >> >> > >> >> >>> > >> Send Pytables-users mailing list submissions to >> >> > >> >> >>> > >> pyt...@li... >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> To subscribe or unsubscribe via the World Wide Web, >> visit >> >> > >> >> >>> > >> >> >> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> >>> > >> or, via email, send a message with subject or body >> >> 'help' to >> >> > >> >> >>> > >> pyt...@li... >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> You can reach the person managing the list at >> >> > >> >> >>> > >> pyt...@li... 
>> >> > >> >> >>> > >> >> >> > >> >> >>> > >> When replying, please edit your Subject line so it is >> >> more >> >> > >> >> specific >> >> > >> >> >>> > >> than "Re: Contents of Pytables-users digest..." >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> Today's Topics: >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> 1. Re: Nested Iteration of HDF5 using PyTables >> (Josh >> >> > Ayers) >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> >> >> > >> >> >> ---------------------------------------------------------------------- >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> Message: 1 >> >> > >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800 >> >> > >> >> >>> > >> From: Josh Ayers <jos...@gm...> >> >> > >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 >> >> using >> >> > >> >> >>> PyTables >> >> > >> >> >>> > >> To: Discussion list for PyTables >> >> > >> >> >>> > >> <pyt...@li...> >> >> > >> >> >>> > >> Message-ID: >> >> > >> >> >>> > >> < >> >> > >> >> >>> > >> >> >> > >> >> >> >> CAC...@ma...> >> >> > >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1" >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> David, >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> The change in issue 27 was only for iteration over a >> >> > >> >> tables.Column >> >> > >> >> >>> > >> instance. To use it, tweak Anthony's code as follows. >> >> This >> >> > >> will >> >> > >> >> >>> > iterate >> >> > >> >> >>> > >> over the "element" column, as in your original >> example. >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> Note also that this will only work with the >> development >> >> > >> version >> >> > >> >> of >> >> > >> >> >>> > >> PyTables >> >> > >> >> >>> > >> available on github. It will be very slow using the >> >> > released >> >> > >> >> >>> v2.4.0. 
>> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> from itertools import izip >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> with tb.openFile(...) as f: >> >> > >> >> >>> > >> data = f.root.data.cols.element >> >> > >> >> >>> > >> data_i = iter(data) >> >> > >> >> >>> > >> data_j = iter(data) >> >> > >> >> >>> > >> data_i.next() # throw the first value away >> >> > >> >> >>> > >> for i, j in izip(data_i, data_j): >> >> > >> >> >>> > >> compare(i, j) >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> Hope that helps, >> >> > >> >> >>> > >> Josh >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz < >> >> > >> >> sc...@gm...> >> >> > >> >> >>> > >> wrote: >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> > HI David, >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > Tables and table column iteration have been >> overhauled >> >> > >> fairly >> >> > >> >> >>> recently >> >> > >> >> >>> > >> > [1]. So you might try creating two iterators, >> offset >> >> by >> >> > >> one, >> >> > >> >> and >> >> > >> >> >>> then >> >> > >> >> >>> > >> > doing the comparison. I am hacking this out super >> >> quick >> >> > so >> >> > >> >> please >> >> > >> >> >>> > >> forgive >> >> > >> >> >>> > >> > me: >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > from itertools import izip >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > with tb.openFile(...) as f: >> >> > >> >> >>> > >> > data = f.root.data >> >> > >> >> >>> > >> > data_i = iter(data) >> >> > >> >> >>> > >> > data_j = iter(data) >> >> > >> >> >>> > >> > data_i.next() # throw the first value away >> >> > >> >> >>> > >> > for i, j in izip(data_i, data_j): >> >> > >> >> >>> > >> > compare(i, j) >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > You get the idea ;) >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > Be Well >> >> > >> >> >>> > >> > Anthony >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > 1. 
https://github.com/PyTables/PyTables/issues/27 >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < >> >> > >> >> >>> dav...@gm...> >> >> > >> >> >>> > >> wrote: >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> >> I was hoping someone could help me out here. >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> This is from a post I put up on StackOverflow, >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> I am have a fairly large dataset that I store in >> HDF5 >> >> and >> >> > >> >> access >> >> > >> >> >>> > using >> >> > >> >> >>> > >> >> PyTables. One operation I need to do on this >> dataset >> >> are >> >> > >> >> pairwise >> >> > >> >> >>> > >> >> comparisons between each of the elements. This >> >> requires 2 >> >> > >> >> loops, >> >> > >> >> >>> one >> >> > >> >> >>> > to >> >> > >> >> >>> > >> >> iterate over each element, and an inner loop to >> >> iterate >> >> > >> over >> >> > >> >> >>> every >> >> > >> >> >>> > >> other >> >> > >> >> >>> > >> >> element. This operation thus looks at N(N-1)/2 >> >> > comparisons. >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> For fairly small sets I found it to be faster to >> dump >> >> the >> >> > >> >> >>> contents >> >> > >> >> >>> > >> into a >> >> > >> >> >>> > >> >> multdimensional numpy array and then do my >> iteration. >> >> I >> >> > run >> >> > >> >> into >> >> > >> >> >>> > >> problems >> >> > >> >> >>> > >> >> with large sets because of memory issues and need >> to >> >> > access >> >> > >> >> each >> >> > >> >> >>> > >> element of >> >> > >> >> >>> > >> >> the dataset at run time. >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> Putting the elements into an array gives me about >> 600 >> >> > >> >> >>> comparisons per >> >> > >> >> >>> > >> >> second, while operating on hdf5 data itself gives >> me >> >> > about >> >> > >> 300 >> >> > >> >> >>> > >> comparisons >> >> > >> >> >>> > >> >> per second. 
>> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> Is there a way to speed this process up? >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> Example follows (this is not my real code, just an >> >> > >> example): >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> *Small Set*: >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f: >> >> > >> >> >>> > >> >> data = f.root.data >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> N_elements = len(data) >> >> > >> >> >>> > >> >> elements = np.empty((N_irises, 1e5)) >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> for ii, d in enumerate(data): >> >> > >> >> >>> > >> >> elements[ii] = data['element'] >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> D = np.empty((N_irises, N_irises)) for ii in >> >> > >> >> xrange(N_elements): >> >> > >> >> >>> > >> >> for jj in xrange(ii+1, N_elements): >> >> > >> >> >>> > >> >> D[ii, jj] = compare(elements[ii], >> >> elements[jj]) >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> *Large Set*: >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> with tb.openFile(h5_file, 'r') as f: >> >> > >> >> >>> > >> >> data = f.root.data >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> N_elements = len(data) >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> D = np.empty((N_irises, N_irises)) >> >> > >> >> >>> > >> >> for ii in xrange(N_elements): >> >> > >> >> >>> > >> >> for jj in xrange(ii+1, N_elements): >> >> > >> >> >>> > >> >> D[ii, jj] = >> compare(data['element'][ii], >> >> > >> >> >>> > >> data['element'][jj]) >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> > >> >> >>> >> >> > >> >> >> >> > >> >> >> > >> >> >> ------------------------------------------------------------------------------ >> >> > >> >> >>> > >> >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# >> >> 2012, >> >> > >> >> 
HTML5, >> >> > >> >> >>> CSS, >> >> > >> >> >>> > >> >> MVC, Windows 8 Apps, JavaScript and much more. Keep >> >> your >> >> > >> >> skills >> >> > >> >> >>> > current >> >> > >> >> >>> > >> >> with LearnDevNow - 3,200 step-by-step video >> tutorials >> >> by >> >> > >> >> >>> Microsoft >> >> > >> >> >>> > >> >> MVPs and experts. ON SALE this month only -- learn >> >> more >> >> > at: >> >> > >> >> >>> > >> >> http://p.sf.net/sfu/learnmore_122712 >> >> > >> >> >>> > >> >> _______________________________________________ >> >> > >> >> >>> > >> >> Pytables-users mailing list >> >> > >> >> >>> > >> >> Pyt...@li... >> >> > >> >> >>> > >> >> >> >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> >> >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> > >> >> >>> >> >> > >> >> >> >> > >> >> >> > >> >> >> ------------------------------------------------------------------------------ >> >> > >> >> >>> > >> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# >> >> 2012, >> >> > >> >> HTML5, >> >> > >> >> >>> CSS, >> >> > >> >> >>> > >> > MVC, Windows 8 Apps, JavaScript and much more. Keep >> >> your >> >> > >> skills >> >> > >> >> >>> > current >> >> > >> >> >>> > >> > with LearnDevNow - 3,200 step-by-step video >> tutorials >> >> by >> >> > >> >> Microsoft >> >> > >> >> >>> > >> > MVPs and experts. ON SALE this month only -- learn >> more >> >> > at: >> >> > >> >> >>> > >> > http://p.sf.net/sfu/learnmore_122712 >> >> > >> >> >>> > >> > _______________________________________________ >> >> > >> >> >>> > >> > Pytables-users mailing list >> >> > >> >> >>> > >> > Pyt...@li... >> >> > >> >> >>> > >> > >> >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> > >> >> > >> >> >>> > >> -------------- next part -------------- >> >> > >> >> >>> > >> An HTML attachment was scrubbed... 
>> >> > >> >> >>> > >> >> >> > >> >> >>> > >> ------------------------------ >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> > >> >> >>> >> >> > >> >> >> >> > >> >> >> > >> >> >> ------------------------------------------------------------------------------ >> >> > >> >> >>> > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# >> 2012, >> >> > >> HTML5, >> >> > >> >> >>> CSS, >> >> > >> >> >>> > >> MVC, Windows 8 Apps, JavaScript and much more. Keep >> your >> >> > >> skills >> >> > >> >> >>> current >> >> > >> >> >>> > >> with LearnDevNow - 3,200 step-by-step video tutorials >> by >> >> > >> >> Microsoft >> >> > >> >> >>> > >> MVPs and experts. ON SALE this month only -- learn >> more >> >> at: >> >> > >> >> >>> > >> http://p.sf.net/sfu/learnmore_122712 >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> ------------------------------ >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> _______________________________________________ >> >> > >> >> >>> > >> Pytables-users mailing list >> >> > >> >> >>> > >> Pyt...@li... >> >> > >> >> >>> > >> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> End of Pytables-users Digest, Vol 80, Issue 3 >> >> > >> >> >>> > >> ********************************************* >> >> > >> >> >>> > >> >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > >> >> > >> >> >>> >> >> > >> >> >> >> > >> >> >> > >> >> >> ------------------------------------------------------------------------------ >> >> > >> >> >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# >> 2012, >> >> > >> HTML5, >> >> > >> >> CSS, >> >> > >> >> >>> > > MVC, Windows 8 Apps, JavaScript and much more. 
Keep >> your >> >> > skills >> >> > >> >> >>> current >> >> > >> >> >>> > > with LearnDevNow - 3,200 step-by-step video tutorials >> by >> >> > >> Microsoft >> >> > >> >> >>> > > MVPs and experts. ON SALE this month only -- learn more >> >> at: >> >> > >> >> >>> > > http://p.sf.net/sfu/learnmore_122712 >> >> > >> >> >>> > > _______________________________________________ >> >> > >> >> >>> > > Pytables-users mailing list >> >> > >> >> >>> > > Pyt...@li... >> >> > >> >> >>> > > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > -------------- next part -------------- >> >> > >> >> >>> > An HTML attachment was scrubbed... >> >> > >> >> >>> > >> >> > >> >> >>> > ------------------------------ >> >> > >> >> >>> > >> >> > >> >> >>> > Message: 2 >> >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600 >> >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...> >> >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol >> 80, >> >> > >> Issue 4 >> >> > >> >> >>> > To: Discussion list for PyTables >> >> > >> >> >>> > <pyt...@li...> >> >> > >> >> >>> > Message-ID: >> >> > >> >> >>> > < >> >> > >> >> >>> > >> >> > >> >> CAP...@ma...> >> >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" >> >> > >> >> >>> > >> >> > >> >> >>> > Josh is right that you can just edit the code by hand >> (which >> >> > >> works >> >> > >> >> but >> >> > >> >> >>> > sucks). >> >> > >> >> >>> > >> >> > >> >> >>> > However, on Windows -- on the rare occasion when I also >> >> have to >> >> > >> >> >>> develop on >> >> > >> >> >>> > it -- I typically use a distribution that includes a >> >> compiler, >> >> > >> >> cython, >> >> > >> >> >>> > hdf5, and pytables already and then I install my >> development >> >> > >> version >> >> > >> >> >>> from >> >> > >> >> >>> > github OVER this. 
I recommend either EPD or Anaconda, >> >> though >> >> > >> other >> >> > >> >> >>> > distributions listed here [1] might also work. >> >> > >> >> >>> > >> >> > >> >> >>> > Be well >> >> > >> >> >>> > Anthony >> >> > >> >> >>> > >> >> > >> >> >>> > 1. >> http://numfocus.org/projects-2/software-distributions/ >> >> > >> >> >>> > >> >> > >> >> >>> > >> >> > >> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers < >> >> > jos...@gm... >> >> > >> > >> >> > >> >> >>> wrote: >> >> > >> >> >>> > >> >> > >> >> >>> > > The change was in pure Python code, so you should be >> able >> >> to >> >> > >> just >> >> > >> >> >>> paste >> >> > >> >> >>> > in >> >> > >> >> >>> > > the changes to your local copy. Start with the >> >> > >> >> table.Column.__iter__ >> >> > >> >> >>> > > method (lines 3296-3310) here. >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > >> >> > >> >> >>> >> >> > >> >> >> >> > >> >> >> > >> >> >> https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py >> >> > >> >> >>> > > >> >> > >> >> >>> > > It needs to be modified slightly because it uses some >> >> > >> additional >> >> > >> >> >>> features >> >> > >> >> >>> > > that aren't available in the released version (the >> >> > >> out=buf_slice >> >> > >> >> >>> argument >> >> > >> >> >>> > > to table.read). The following should work. 
>> >> > >> >> >>> > > >> >> > >> >> >>> > > def __iter__(self): >> >> > >> >> >>> > > table = self.table >> >> > >> >> >>> > > itemsize = self.dtype.itemsize >> >> > >> >> >>> > > nrowsinbuf = >> >> table._v_file.params['IO_BUFFER_SIZE'] >> >> > // >> >> > >> >> >>> itemsize >> >> > >> >> >>> > > max_row = len(self) >> >> > >> >> >>> > > for start_row in xrange(0, len(self), >> nrowsinbuf): >> >> > >> >> >>> > > end_row = min([start_row + nrowsinbuf, >> >> max_row]) >> >> > >> >> >>> > > buf = table.read(start_row, end_row, 1, >> >> > >> >> >>> field=self.pathname) >> >> > >> >> >>> > > for row in buf: >> >> > >> >> >>> > > yield row >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > > I haven't tested this, but I think it will work. >> >> > >> >> >>> > > >> >> > >> >> >>> > > Josh >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > > >> >> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed < >> >> > >> >> dav...@gm...> >> >> > >> >> >>> > wrote: >> >> > >> >> >>> > > >> >> > >> >> >>> > >> I apologize if I'm starting to sound helpless, but I'm >> >> > forced >> >> > >> to >> >> > >> >> >>> work on >> >> > >> >> >>> > >> Windows 7 at work and have never had luck compiling >> >> python >> >> > >> source >> >> > >> >> >>> > >> successfully. I have had to rely on precompiled >> binaries >> >> > and >> >> > >> now >> >> > >> >> >>> its >> >> > >> >> >>> > >> biting me in the butt. >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> Is there any quick fix I can do to improve this >> iteration >> >> > >> using >> >> > >> >> >>> v2.4.0? >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> >> >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, < >> >> > >> >> >>> > >> pyt...@li...> wrote: >> >> > >> >> >>> > >> >> >> > >> >> >>> > >>> Send Pytables-users mailing list submissions to >> >> > >> >> >>> > >>> pyt...@li... 
>> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> To subscribe or unsubscribe via the World Wide Web, >> >> visit >> >> > >> >> >>> > >>> >> >> > >> >> >>> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> >>> > >>> or, via email, send a message with subject or body >> >> 'help' >> >> > to >> >> > >> >> >>> > >>> pyt...@li... >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> You can reach the person managing the list at >> >> > >> >> >>> > >>> pyt...@li... >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> When replying, please edit your Subject line so it is >> >> more >> >> > >> >> specific >> >> > >> >> >>> > >>> than "Re: Contents of Pytables-users digest..." >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> Today's Topics: >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 >> (David >> >> > Reed) >> >> > >> >> >>> > >>> 2. Re: Pytables-users Digest, Vol 80, Issue 3 >> (David >> >> > Reed) >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> >> >> > >> >> >>> >> >> > >> >> >> ---------------------------------------------------------------------- >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> Message: 1 >> >> > >> >> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 >> >> > >> >> >>> > >>> From: David Reed <dav...@gm...> >> >> > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, >> Vol >> >> > 80, >> >> > >> >> Issue >> >> > >> >> >>> 2 >> >> > >> >> >>> > >>> To: pyt...@li... 
>> >> > >> >> >>> > >>> Message-ID: >> >> > >> >> >>> > >>> >> <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha= >> >> > >> >> >>> > >>> ev...@ma...> >> >> > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1" >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> Thanks Anthony, but unless Im missing something I >> don't >> >> > think >> >> > >> >> that >> >> > >> >> >>> > method >> >> > >> >> >>> > >>> will work since this will only be comparing the ith >> >> element >> >> > >> with >> >> > >> >> >>> ith+1 >> >> > >> >> >>> > >>> element. I still need 2 for loops right? >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> Using itertools might speed things up though, I've >> never >> >> > used >> >> > >> >> them >> >> > >> >> >>> so I >> >> > >> >> >>> > >>> will give it a shot and let you know how it goes. >> Looks >> >> > >> like I >> >> > >> >> >>> need to >> >> > >> >> >>> > >>> download the latest release before I do that too. >> >> Thanks >> >> > for >> >> > >> >> the >> >> > >> >> >>> help. >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> -Dave >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, < >> >> > >> >> >>> > >>> pyt...@li...> wrote: >> >> > >> >> >>> > >>> >> >> > >> >> >>> > >>> > Send Pytables-users mailing list submissions to >> >> > >> >> >>> > >>> > pyt...@li... >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > To subscribe or unsubscribe via the World Wide Web, >> >> visit >> >> > >> >> >>> > >>> > >> >> > >> >> >>> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > >> >> >>> > >>> > or, via email, send a message with subject or body >> >> 'help' >> >> > >> to >> >> > >> >> >>> > >>> > >> pyt...@li... >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > You can reach the person managing the list at >> >> > >> >> >>> > >>> > pyt...@li... 
>> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > When replying, please edit your Subject line so it >> is >> >> > more >> >> > >> >> >>> specific >> >> > >> >> >>> > >>> > than "Re: Contents of Pytables-users digest..." >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > Today's Topics: >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > 1. Re: Nested Iteration of HDF5 using PyTables >> >> > (Anthony >> >> > >> >> >>> Scopatz) >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >> >> > >> >> >> >> > >> ---------------------------------------------------------------------- >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > Message: 1 >> >> > >> >> >>> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 >> >> > >> >> >>> > >>> > From: Anthony Scopatz <sc...@gm...> >> >> > >> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of >> HDF5 >> >> > >> using >> >> > >> >> >>> PyTables >> >> > >> >> >>> > >>> > To: Discussion list for PyTables >> >> > >> >> >>> > >>> > <pyt...@li...> >> >> > >> >> >>> > >>> > Message-ID: >> >> > >> >> >>> > >>> > <CAPk-6T5b= >> >> > >> >> >>> > >>> > >> >> 1EG...@ma... >> >> > > >> >> > >> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1" >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > HI David, >> >> > >> >> >>> > >>> > >> >> > >> >> >>> > >>> > Tables and > > ... > > [Message clipped] > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_jan > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
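A note on the loop quoted above: itertools.combinations copies its whole input into a tuple before it yields the first pair (the documented pure-Python equivalent starts with pool = tuple(iterable)). Every template and mask row therefore ends up in memory at once, even if the column iterator itself reads in small buffers. One possible way around that is to walk the upper triangle block by block so that only two blocks of rows are held at a time. The sketch below is written for Python 3 and has not been tested against PyTables. The names block_ranges, pairwise_blockwise and hamming are made up for this illustration, and a plain NumPy array stands in for the table column; the assumption is that slicing a column (col[start:stop]) returns just those rows.

```python
import numpy as np


def block_ranges(n_rows, block_size):
    """Yield (start, stop) row ranges that together cover range(n_rows)."""
    for start in range(0, n_rows, block_size):
        yield start, min(start + block_size, n_rows)


def pairwise_blockwise(rows, compare, block_size):
    """Fill the strict upper triangle of D with compare(rows[i], rows[j]).

    Only two blocks of rows are held in memory at any time. `rows` needs
    len() and slice support (a NumPy array here; a PyTables column is
    assumed to behave the same way when sliced).
    """
    n = len(rows)
    D = np.zeros((n, n))
    for a_start, a_stop in block_ranges(n, block_size):
        block_a = np.asarray(rows[a_start:a_stop])
        for b_start, b_stop in block_ranges(n, block_size):
            if b_stop <= a_start:
                continue  # this block lies entirely below the diagonal
            if b_start == a_start:
                block_b = block_a  # diagonal block: reuse the rows already read
            else:
                block_b = np.asarray(rows[b_start:b_stop])
            for i, row_i in enumerate(block_a, a_start):
                for j, row_j in enumerate(block_b, b_start):
                    if j > i:
                        D[i, j] = compare(row_i, row_j)
    return D


def hamming(a, b):
    """Hypothetical stand-in for the real comparison: count differing bits."""
    return np.count_nonzero(a != b)


# Small in-memory stand-in for a boolean table column.
rng = np.random.RandomState(0)
rows = rng.rand(10, 16) > 0.5
D = pairwise_blockwise(rows, hamming, block_size=4)
```

The block size is the knob here: at 163200 bytes per row, a block of 100 rows is roughly 16 MB per column, so two blocks of two columns stay well under 100 MB. The cost is that each block is read from disk once for every block that precedes it.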
From: David R. <dav...@gm...> - 2013-02-04 19:41:54
|
I didn't have any luck. I replaced that __iter__ function, which led to me replacing the read function, which led to me replacing the _read function, and I eventually got another error.

Below are 2 functions and my HDF5 Table class declaration. They should be self-explanatory. I wasn't sure if attachments would go through, and this is pretty small, so I figured it would be OK just to post. I apologize if this is a bit cluttered.

I would also appreciate any comments on how I assign the results to the matrix D. This does not seem very Pythonic at all, and I could use some advice there if it's easy. (The ii*jj is just a placeholder for a more sophisticated measure.)

Thanks again!

    import numpy as np
    import tables as tb


    class Iris(tb.IsDescription):
        subject_id = tb.IntCol()
        iris_id = tb.IntCol()
        database = tb.StringCol(5)
        is_left = tb.BoolCol()
        is_flipped = tb.BoolCol()
        templates = tb.BoolCol(shape=(17, 20*480))
        masks1 = tb.BoolCol(shape=(17, 20*480))
        phasors = tb.ComplexCol(itemsize=8, shape=(17, 20*240))
        masks2 = tb.BoolCol(shape=(17, 20*240))


    def create_hdf5():
        """Write test.h5 with 4620 rows of uninitialized placeholder data."""
        with tb.openFile('test.h5', 'w') as f:

            # Create and fill the table of irises
            irises = f.createTable(f.root, 'irises', Iris, 'Irises',
                                   filters=tb.Filters(1))

            for ii in range(4620):
                r = irises.row
                r['subject_id'] = ii
                r['iris_id'] = 0
                r['database'] = 'test'
                r['is_left'] = True
                r['is_flipped'] = False
                r['templates'] = np.empty((17, 20*480), np.bool8)
                r['masks1'] = np.empty((17, 20*480), np.bool8)
                r['phasors'] = np.empty((17, 20*240)) + 1j*np.empty((17, 20*240))
                r['masks2'] = np.empty((17, 20*240), np.bool8)
                r.append()

            irises.flush()


    def get_hd():
        """Iterate over every pair of rows and store a placeholder distance."""
        from itertools import combinations, izip

        with tb.openFile('test.h5') as f:
            irises = f.root.irises

            templates = f.root.irises.cols.templates
            masks = f.root.irises.cols.masks1

            N_irises = len(irises)

            print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
            D = np.empty((N_irises, N_irises))
            for (t1, m1, ii), (t2, m2, jj) in combinations(
                    izip(templates, masks, range(N_irises)), 2):
                D[ii, jj] = ii*jj

        np.save('test', D)


On Mon, Feb 4, 2013 at 11:16 AM, <pyt...@li...> wrote:

> Send Pytables-users mailing list submissions to
>         pyt...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.sourceforge.net/lists/listinfo/pytables-users
> or, via email, send a message with subject or body 'help' to
>         pyt...@li...
>
> You can reach the person managing the list at
>         pyt...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pytables-users digest..."
>
>
> Today's Topics:
>
>    1. Re: Pytables-users Digest, Vol 81, Issue 7 (Anthony Scopatz)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 4 Feb 2013 10:16:24 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 7
> To: Discussion list for PyTables
>         <pyt...@li...>
> Message-ID:
>         <CAP...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> wrote:
>
> > Hi Josh,
> >
> > Here is my __iter__ code:
> >
> >     def __iter__(self):
> >         table = self.table
> >         itemsize = self.dtype.itemsize
> >         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
> >         max_row = len(self)
> >         for start_row in xrange(0, len(self), nrowsinbuf):
> >             end_row = min([start_row + nrowsinbuf, max_row])
> >             buf = table.read(start_row, end_row, 1, field=self.pathname)
> >             for row in buf:
> >                 yield row
> >
> > It does look different, I will try swapping in the code from github and
> > see what happens.
> >
>
> Yes, please let us know how that goes! Otherwise send the list both the
> test data generator script and the script that fails.
>
> Be Well
> Anthony
>
>
> >
> >
> > On Mon, Feb 4, 2013 at 9:59 AM, <pyt...@li...> wrote:
> >
> >> Send Pytables-users mailing list submissions to
> >>         pyt...@li...
> >> > >> To subscribe or unsubscribe via the World Wide Web, visit > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> or, via email, send a message with subject or body 'help' to > >> pyt...@li... > >> > >> You can reach the person managing the list at > >> pyt...@li... > >> > >> When replying, please edit your Subject line so it is more specific > >> than "Re: Contents of Pytables-users digest..." > >> > >> > >> Today's Topics: > >> > >> 1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers) > >> 2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed) > >> > >> > >> ---------------------------------------------------------------------- > >> > >> Message: 1 > >> Date: Fri, 1 Feb 2013 14:08:47 -0800 > >> From: Josh Ayers <jos...@gm...> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4 > >> To: Discussion list for PyTables > >> <pyt...@li...> > >> Message-ID: > >> <CACOB4aPG4NZ6b2a3v= > >> 1Ue...@ma...> > >> Content-Type: text/plain; charset="iso-8859-1" > >> > >> David, > >> > >> You added a custom version of table.Column.__iter__, correct? Could you > >> also include that along with the script to reproduce the error? > >> > >> It seems like the problem may be in the 'nrowsinbuf' calculation - see > >> [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're using > the > >> default 1MB value for IO_BUFFER_SIZE, it should be reading in rows of 6 > >> chunks. Instead, it's reading the entire table. 
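Josh's arithmetic above can be checked in a few lines. This is only a sketch with NumPy, not PyTables code: it assumes the default IO_BUFFER_SIZE is 1 MiB (1048576 bytes) as stated in the message, and the names `row_dtype`, `rows_per_buffer` and `rows_if_base_used` are made up for illustration. The last part shows one possible way the whole table could end up in a single read, namely if the size of a single bool were used instead of the size of a full row. That is an assumption, not something confirmed in this thread.

```python
import numpy as np

# Assumed default buffer size of 1 MiB, as mentioned in the thread.
IO_BUFFER_SIZE = 1024 * 1024

# One row of the 'templates' column: a 17 x 9600 boolean sub-array.
row_dtype = np.dtype((np.bool_, (17, 9600)))
row_bytes = row_dtype.itemsize            # bytes in one full row
rows_per_buffer = IO_BUFFER_SIZE // row_bytes

# Hypothetical failure mode: dividing by the size of one bool instead.
base_bytes = row_dtype.base.itemsize      # bytes in a single element
rows_if_base_used = IO_BUFFER_SIZE // base_bytes

n_table_rows = 4620
print("bytes per row:", row_bytes)
print("rows per buffer:", rows_per_buffer)
print("buffer would cover whole table:", rows_if_base_used >= n_table_rows)
```

With the full row size the buffer holds a handful of rows; with the base item size it would nominally hold more rows than the table contains, which would match the "reading the entire table" symptom.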
> >> > >> [1]: > >> https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296 > >> > >> > >> > >> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> > >> wrote: > >> > >> > > >> > > >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> > >> wrote: > >> > > >> >> at the error: > >> >> > >> >> result = numpy.empty(shape=nrows, dtype=dtypeField) > >> >> > >> >> nrows = 4620 and dtypeField is ('bool', (17, 9600)) > >> >> > >> >> I'm not sure what that means as a dtype, but thats what it is. > >> >> > >> >> Forgive me if I'm being totally naive, but I thought the whole point > of > >> >> __iter__ with pyttables was to do iteration on the fly, so there is > no > >> >> preallocation. > >> >> > >> > > >> > Nope you are not being naive at all. That is the point. > >> > > >> > > >> >> If you have any ideas on this I'm all ears. > >> >> > >> > > >> > If you could send a minimal script which reproduces this error, that > >> would > >> > help a lot. > >> > > >> > Be Well > >> > Anthony > >> > > >> > > >> >> > >> >> > >> >> Thanks again. > >> >> > >> >> Dave > >> >> > >> >> > >> >> On Fri, Feb 1, 2013 at 3:45 PM, < > >> >> pyt...@li...> wrote: > >> >> > >> >>> Send Pytables-users mailing list submissions to > >> >>> pyt...@li... > >> >>> > >> >>> To subscribe or unsubscribe via the World Wide Web, visit > >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> >>> or, via email, send a message with subject or body 'help' to > >> >>> pyt...@li... > >> >>> > >> >>> You can reach the person managing the list at > >> >>> pyt...@li... > >> >>> > >> >>> When replying, please edit your Subject line so it is more specific > >> >>> than "Re: Contents of Pytables-users digest..." > >> >>> > >> >>> > >> >>> Today's Topics: > >> >>> > >> >>> 1. 
Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz) > >> >>> > >> >>> > >> >>> > ---------------------------------------------------------------------- > >> >>> > >> >>> Message: 1 > >> >>> Date: Fri, 1 Feb 2013 14:44:40 -0600 > >> >>> From: Anthony Scopatz <sc...@gm...> > >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2 > >> >>> To: Discussion list for PyTables > >> >>> <pyt...@li...> > >> >>> Message-ID: > >> >>> < > >> >>> CAP...@ma...> > >> >>> Content-Type: text/plain; charset="iso-8859-1" > >> >>> > >> >>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm... > > > >> >>> wrote: > >> >>> > >> >>> > Hi Anthony, > >> >>> > > >> >>> > Thanks for the reply. > >> >>> > > >> >>> > I honestly don't know how to monitor my Python memory usage, but > I'm > >> >>> sure > >> >>> > that its caused by out of memory. > >> >>> > > >> >>> > >> >>> Well, I would just run top or process monitor or something while > >> running > >> >>> the python script to see what happens to memory usage as the script > >> chugs > >> >>> along... > >> >>> > >> >>> > >> >>> > I'm just trying to find out how to fix it. My HDF5 table has > 4620 > >> >>> rows > >> >>> > and the column I'm iterating over is a 17x9600 boolean matrix. > The > >> >>> > __iter__ method is preallocating an array that is this size which > >> >>> appears > >> >>> > to be root of the error. I was hoping there is a fix somewhere in > >> >>> here to > >> >>> > not have to do this preallocation. > >> >>> > > >> >>> > >> >>> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 > of > >> >>> these is ~760 MB. If you have 2 GB of memory and you are iterating > >> over > >> >>> 2 > >> >>> of these (templates & masks) it is conceivable that you are just > >> running > >> >>> out of memory. Maybe there is a way that __iter__ could not > >> preallocate > >> >>> something that is basically a temporary. What is the dtype of the > >> >>> templates array? 
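The estimate above is easy to redo by hand. The following is a rough sketch, assuming one byte per boolean element and ignoring any Python or NumPy overhead; the constant names are invented for this example. It gives the size of one row, of one full column held in memory, and of two such columns (templates and masks).

```python
ROWS = 4620
ROW_SHAPE = (17, 9600)
BYTES_PER_BOOL = 1  # assumption: NumPy stores one bool per byte

row_bytes = ROW_SHAPE[0] * ROW_SHAPE[1] * BYTES_PER_BOOL
column_bytes = ROWS * row_bytes
two_columns_bytes = 2 * column_bytes

MIB = 1024.0 ** 2
print("one row:     %.3f MiB" % (row_bytes / MIB))
print("one column:  %.1f MiB" % (column_bytes / MIB))
print("two columns: %.1f MiB" % (two_columns_bytes / MIB))
```

One column comes to roughly three quarters of a gigabyte, in line with the figure quoted above, so two columns held at once would be tight on a 2 GB machine.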
> >> >>> > >> >>> Be Well > >> >>> Anthony > >> >>> > >> >>> > >> >>> > > >> >>> > Thanks again. > >> >>> > >> >>> > >> -------------- next part -------------- > >> An HTML attachment was scrubbed... > >> > >> ------------------------------ > >> > >> Message: 2 > >> Date: Mon, 4 Feb 2013 09:58:53 -0500 > >> From: David Reed <dav...@gm...> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6 > >> To: pyt...@li... > >> Message-ID: > >> <CAM6XA7= > >> h50...@ma...> > >> Content-Type: text/plain; charset="iso-8859-1" > >> > >> Hi Anthony, > >> > >> Sorry to just get back to you. I can send a script, should I send a > script > >> that creates some fake data as well? > >> > >> -Dave > >> > >> > >> On Fri, Feb 1, 2013 at 4:50 PM, < > >> pyt...@li...> wrote: > >> > >> > Send Pytables-users mailing list submissions to > >> > pyt...@li... > >> > > >> > To subscribe or unsubscribe via the World Wide Web, visit > >> > https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > or, via email, send a message with subject or body 'help' to > >> > pyt...@li... > >> > > >> > You can reach the person managing the list at > >> > pyt...@li... > >> > > >> > When replying, please edit your Subject line so it is more specific > >> > than "Re: Contents of Pytables-users digest..." > >> > > >> > > >> > Today's Topics: > >> > > >> > 1. 
Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony Scopatz) > >> > > >> > > >> > ---------------------------------------------------------------------- > >> > > >> > Message: 1 > >> > Date: Fri, 1 Feb 2013 15:50:11 -0600 > >> > From: Anthony Scopatz <sc...@gm...> > >> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4 > >> > To: Discussion list for PyTables > >> > <pyt...@li...> > >> > Message-ID: > >> > < > >> > CAP...@ma...> > >> > Content-Type: text/plain; charset="iso-8859-1" > >> > > >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> > >> wrote: > >> > > >> > > at the error: > >> > > > >> > > result = numpy.empty(shape=nrows, dtype=dtypeField) > >> > > > >> > > nrows = 4620 and dtypeField is ('bool', (17, 9600)) > >> > > > >> > > I'm not sure what that means as a dtype, but thats what it is. > >> > > > >> > > Forgive me if I'm being totally naive, but I thought the whole point > >> of > >> > > __iter__ with pyttables was to do iteration on the fly, so there is > no > >> > > preallocation. > >> > > > >> > > >> > Nope you are not being naive at all. That is the point. > >> > > >> > > >> > > If you have any ideas on this I'm all ears. > >> > > > >> > > >> > If you could send a minimal script which reproduces this error, that > >> would > >> > help a lot. > >> > > >> > Be Well > >> > Anthony > >> > > >> > > >> > > > >> > > > >> > > Thanks again. > >> > > > >> > > Dave > >> > > > >> > > > >> > > On Fri, Feb 1, 2013 at 3:45 PM, < > >> > > pyt...@li...> wrote: > >> > > > >> > >> Send Pytables-users mailing list submissions to > >> > >> pyt...@li... > >> > >> > >> > >> To subscribe or unsubscribe via the World Wide Web, visit > >> > >> > https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> or, via email, send a message with subject or body 'help' to > >> > >> pyt...@li... > >> > >> > >> > >> You can reach the person managing the list at > >> > >> pyt...@li... 
> >> > >> > >> > >> When replying, please edit your Subject line so it is more specific > >> > >> than "Re: Contents of Pytables-users digest..." > >> > >> > >> > >> > >> > >> Today's Topics: > >> > >> > >> > >> 1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz) > >> > >> > >> > >> > >> > >> > >> ---------------------------------------------------------------------- > >> > >> > >> > >> Message: 1 > >> > >> Date: Fri, 1 Feb 2013 14:44:40 -0600 > >> > >> From: Anthony Scopatz <sc...@gm...> > >> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue > 2 > >> > >> To: Discussion list for PyTables > >> > >> <pyt...@li...> > >> > >> Message-ID: > >> > >> < > >> > >> CAP...@ma... > > > >> > >> Content-Type: text/plain; charset="iso-8859-1" > >> > >> > >> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed < > dav...@gm...> > >> > >> wrote: > >> > >> > >> > >> > Hi Anthony, > >> > >> > > >> > >> > Thanks for the reply. > >> > >> > > >> > >> > I honestly don't know how to monitor my Python memory usage, but > >> I'm > >> > >> sure > >> > >> > that its caused by out of memory. > >> > >> > > >> > >> > >> > >> Well, I would just run top or process monitor or something while > >> running > >> > >> the python script to see what happens to memory usage as the script > >> > chugs > >> > >> along... > >> > >> > >> > >> > >> > >> > I'm just trying to find out how to fix it. My HDF5 table has > 4620 > >> > rows > >> > >> > and the column I'm iterating over is a 17x9600 boolean matrix. > The > >> > >> > __iter__ method is preallocating an array that is this size which > >> > >> appears > >> > >> > to be root of the error. I was hoping there is a fix somewhere > in > >> > here > >> > >> to > >> > >> > not have to do this preallocation. > >> > >> > > >> > >> > >> > >> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 > >> of > >> > >> these is ~760 MB. 
If you have 2 GB of memory and you are iterating > >> > over 2 > >> > >> of these (templates & masks) it is conceivable that you are just > >> running > >> > >> out of memory. Maybe there is a way that __iter__ could not > >> preallocate > >> > >> something that is basically a temporary. What is the dtype of the > >> > >> templates array? > >> > >> > >> > >> Be Well > >> > >> Anthony > >> > >> > >> > >> > >> > >> > > >> > >> > Thanks again. > >> > >> > > >> > >> > > >> > >> > > >> > >> > > >> > >> > On Fri, Feb 1, 2013 at 11:12 AM, < > >> > >> > pyt...@li...> wrote: > >> > >> > > >> > >> >> Send Pytables-users mailing list submissions to > >> > >> >> pyt...@li... > >> > >> >> > >> > >> >> To subscribe or unsubscribe via the World Wide Web, visit > >> > >> >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> >> or, via email, send a message with subject or body 'help' to > >> > >> >> pyt...@li... > >> > >> >> > >> > >> >> You can reach the person managing the list at > >> > >> >> pyt...@li... > >> > >> >> > >> > >> >> When replying, please edit your Subject line so it is more > >> specific > >> > >> >> than "Re: Contents of Pytables-users digest..." > >> > >> >> > >> > >> >> > >> > >> >> Today's Topics: > >> > >> >> > >> > >> >> 1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony > Scopatz) > >> > >> >> > >> > >> >> > >> > >> >> > >> > ---------------------------------------------------------------------- > >> > >> >> > >> > >> >> Message: 1 > >> > >> >> Date: Fri, 1 Feb 2013 10:11:47 -0600 > >> > >> >> From: Anthony Scopatz <sc...@gm...> > >> > >> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, > >> Issue 9 > >> > >> >> To: Discussion list for PyTables > >> > >> >> <pyt...@li...> > >> > >> >> Message-ID: > >> > >> >> < > >> > >> >> > >> CAP...@ma...> > >> > >> >> Content-Type: text/plain; charset="iso-8859-1" > >> > >> >> > >> > >> >> Hi David, > >> > >> >> > >> > >> >> Sorry, I haven't had a ton of time recently. 
You seem to be > >> getting > >> > a > >> > >> >> memory error on creating a numpy array. This kind of thing > >> typically > >> > >> >> happens when you are out of memory. Does this seem to be the > case > >> > with > >> > >> >> you? When this dies, is your memory usage at 100%? If so, this > >> > >> algorithm > >> > >> >> might require a little tweaking... > >> > >> >> > >> > >> >> Be Well > >> > >> >> Anthony > >> > >> >> > >> > >> >> > >> > >> >> On Fri, Feb 1, 2013 at 6:15 AM, David Reed < > >> dav...@gm...> > >> > >> >> wrote: > >> > >> >> > >> > >> >> > I'm still having problems with this one. I can't tell if this > >> > >> something > >> > >> >> > dumb Im doing with itertools, or if its something in pytables. > >> > >> >> > > >> > >> >> > Would appreciate any help. > >> > >> >> > > >> > >> >> > Thanks > >> > >> >> > > >> > >> >> > > >> > >> >> > On Wed, Jan 30, 2013 at 5:00 PM, David Reed < > >> > dav...@gm... > >> > >> >> >wrote: > >> > >> >> > > >> > >> >> >> I think I have to reopen this issue. I have been running > fine > >> for > >> > >> >> awhile > >> > >> >> >> using the combinations method from itertools, but have > recently > >> > run > >> > >> >> into a > >> > >> >> >> memory since I have recently quadrupled the size of the hdf > >> file. 
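On the itertools question: in CPython, itertools.combinations copies its whole input into a tuple before it yields the first pair, so every item produced by the input iterator is held in memory at the same time. Whether that fully explains the MemoryError in this thread is not established here. The sketch below only demonstrates the behaviour. `rows`, `get_row` and `streamed_pairs` are hypothetical stand-ins, not PyTables API.

```python
from itertools import combinations

consumed = []

def rows(n):
    """Stand-in for a column iterator; records each row it hands out."""
    for i in range(n):
        consumed.append(i)
        yield i

pairs = combinations(rows(5), 2)
first = next(pairs)
# By now the whole generator has been drained into an internal tuple.
print(first, consumed)

def streamed_pairs(get_row, n):
    """Yield (i, j, row_i, row_j) while holding only two rows at a time."""
    for i in range(n):
        row_i = get_row(i)
        for j in range(i + 1, n):
            yield i, j, row_i, get_row(j)

# Count the pairs produced for a toy 'get_row' that needs no file.
n_pairs = sum(1 for _ in streamed_pairs(lambda k: k * 10, 5))
print(n_pairs)
```

The streamed version keeps memory flat but reads row j again for every i, so it trades memory for I/O. Reading the inner rows in blocks would be a middle ground.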
> >> > >> >> >>
> >> > >> >> >> Here is my code again:
> >> > >> >> >>
> >> > >> >> >>     from itertools import combinations, izip
> >> > >> >> >>     with tb.openFile(h5_all, 'r') as f:
> >> > >> >> >>         irises = f.root.irises
> >> > >> >> >>
> >> > >> >> >>         templates = f.root.irises.cols.templates
> >> > >> >> >>         masks = f.root.irises.cols.masks1
> >> > >> >> >>
> >> > >> >> >>         N_irises = len(irises)
> >> > >> >> >>         index = np.ones((20 * 480), np.bool)
> >> > >> >> >>
> >> > >> >> >>         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >> > >> >> >>         D = np.empty((N_irises, N_irises))
> >> > >> >> >>         for (t1, m1, ii), (t2, m2, jj) in combinations(
> >> > >> >> >>                 izip(templates, masks, range(N_irises)), 2):
> >> > >> >> >>             # print ii
> >> > >> >> >>             D[ii, jj] = ham_dist(
> >> > >> >> >>                 t1[8, index],
> >> > >> >> >>                 t2[:, index],
> >> > >> >> >>                 m1[8, index],
> >> > >> >> >>                 m2[:, index],
> >> > >> >> >>             )
> >> > >> >> >>
> >> > >> >> >> And here is the error:
> >> > >> >> >>
> >> > >> >> >> In [10]: get_hd3()
> >> > >> >> >> 10669890 Comparisons
> >> > >> >> >>
> >> > >> >> >> ---------------------------------------------------------------------------
> >> > >> >> >> MemoryError                     Traceback (most recent call last)
> >> > >> >> >> <ipython-input-10-cfb255ce7bd1> in <module>()
> >> > >> >> >> ----> 1 get_hd3()
> >> > >> >> >>
> >> > >> >> >>     118         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >> > >> >> >>     119         D = np.empty((N_irises, N_irises))
> >> > >> >> >> --> 120         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> >> > >> >> >>     121             # print ii
> >> > >> >> >>     122             D[ii, jj] = ham_dist(
> >> > >> >> >>
> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
> >> > >> >> >>    3274         for start_row in xrange(0, len(self), nrowsinbuf):
> >> > >> >> >>    3275             end_row = min([start_row + nrowsinbuf, max_row])
> >> > >> >> >> -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
> >> > >> >> >>    3277             for row in buf:
> >> > >> >> >>    3278                 yield row
> >> > >> >> >>
> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
> >> > >> >> >>    1772         (start, stop, step) = self._processRangeRead(start, stop, step)
> >> > >> >> >>    1773
> >> > >> >> >> -> 1774         arr = self._read(start, stop, step, field)
> >> > >> >> >>    1775         return internal_to_flavor(arr, self.flavor)
> >> > >> >> >>    1776
> >> > >> >> >>
> >> > >> >> >> c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
> >> > >> >> >>    1719         if field:
> >> > >> >> >>    1720             # Create a container for the results
> >> > >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> > >> >> >>    1722         else:
> >> > >> >> >>    1723             # Recarray case
> >> > >> >> >>
> >> > >> >> >> MemoryError:
> >> > >> >> >> > c:\python27\lib\site-packages\tables\table.py(1721)_read()
> >> > >> >> >>    1720             # Create a container for the results
> >> > >> >> >> -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >> > >> >> >>    1722         else:
> >> > >> >> >>
> >> > >> >> >> Also, if you guys see any performance problems in my code, please let me
> >> > >> >> >> know.
> >> > >> >> >>
> >> > >> >> >> Thank you so much for the help.
> >> > >> >> >>
> >> > >> >> >> -Dave
> >> > >> >> >>
> >> > >> >> >>
> >> > >> >> >> On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
> >> > >> >> >>
> >> > >> >> >>> Send Pytables-users mailing list submissions to
> >> > >> >> >>>         pyt...@li...
> >> > >> >> >>> > >> > >> >> >>> To subscribe or unsubscribe via the World Wide Web, visit > >> > >> >> >>> > >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> >> >>> or, via email, send a message with subject or body 'help' to > >> > >> >> >>> pyt...@li... > >> > >> >> >>> > >> > >> >> >>> You can reach the person managing the list at > >> > >> >> >>> pyt...@li... > >> > >> >> >>> > >> > >> >> >>> When replying, please edit your Subject line so it is more > >> > specific > >> > >> >> >>> than "Re: Contents of Pytables-users digest..." > >> > >> >> >>> > >> > >> >> >>> > >> > >> >> >>> Today's Topics: > >> > >> >> >>> > >> > >> >> >>> 1. Re: Pytables-users Digest, Vol 80, Issue 8 (David > Reed) > >> > >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> > >> > >> ---------------------------------------------------------------------- > >> > >> >> >>> > >> > >> >> >>> Message: 1 > >> > >> >> >>> Date: Fri, 4 Jan 2013 08:56:28 -0500 > >> > >> >> >>> From: David Reed <dav...@gm...> > >> > >> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, > >> > Issue > >> > >> 8 > >> > >> >> >>> To: pyt...@li... > >> > >> >> >>> Message-ID: > >> > >> >> >>> < > >> > >> >> >>> > >> > CAM...@ma... > >> > >> > > >> > >> >> >>> Content-Type: text/plain; charset="iso-8859-1" > >> > >> >> >>> > >> > >> >> >>> I can't thank you guys enough for the help. I was able to > add > >> > the > >> > >> >> >>> __iter__ > >> > >> >> >>> function to the table.py file and everything seems to be > >> working > >> > >> >> great! > >> > >> >> >>> I'm not quite as fast as I was with iterating right of a > >> matrix > >> > >> but > >> > >> >> >>> pretty > >> > >> >> >>> close. I was at 555 comparisons per second, and now im at > >> 420. 
> >> > >> >> >>>
> >> > >> >> >>> I handled the problem I mentioned earlier by doing this, and it seems to
> >> > >> >> >>> work great:
> >> > >> >> >>>
> >> > >> >> >>>     A = f.root.data.cols.A
> >> > >> >> >>>     B = f.root.data.cols.B
> >> > >> >> >>>
> >> > >> >> >>>     D = np.empty((len(A), len(A)))
> >> > >> >> >>>     for (a1, b1, ii), (a2, b2, jj) in combinations(
> >> > >> >> >>>             izip(A, B, range(len(A))), 2):
> >> > >> >> >>>         D[ii, jj] = compare(a1, a2, b1, b2)
> >> > >> >> >>>
> >> > >> >> >>> Again, thanks a lot.
> >> > >> >> >>>
> >> > >> >> >>> -Dave
> >> > >> >> >>>
> >> > >> >> >>>
> >> > >> >> >>> On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:
> >> > >> >> >>>
> >> > >> >> >>> > Send Pytables-users mailing list submissions to
> >> > >> >> >>> >         pyt...@li...
> >> > >> >> >>> >
> >> > >> >> >>> > To subscribe or unsubscribe via the World Wide Web, visit
> >> > >> >> >>> >         https://lists.sourceforge.net/lists/listinfo/pytables-users
> >> > >> >> >>> > or, via email, send a message with subject or body 'help' to
> >> > >> >> >>> >         pyt...@li...
> >> > >> >> >>> >
> >> > >> >> >>> > You can reach the person managing the list at
> >> > >> >> >>> >         pyt...@li...
> >> > >> >> >>> >
> >> > >> >> >>> > When replying, please edit your Subject line so it is more specific
> >> > >> >> >>> > than "Re: Contents of Pytables-users digest..."
> >> > >> >> >>> >
> >> > >> >> >>> >
> >> > >> >> >>> > Today's Topics:
> >> > >> >> >>> >
> >> > >> >> >>> >    1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
> >> > >> >> >>> >    2.
Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony > >> > >> Scopatz) > >> > >> >> >>> > > >> > >> >> >>> > > >> > >> >> >>> > > >> > >> >> > >> > ---------------------------------------------------------------------- > >> > >> >> >>> > > >> > >> >> >>> > Message: 1 > >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:26:55 -0600 > >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...> > >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol > 80, > >> > >> Issue 3 > >> > >> >> >>> > To: Discussion list for PyTables > >> > >> >> >>> > <pyt...@li...> > >> > >> >> >>> > Message-ID: > >> > >> >> >>> > <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ= > >> > >> >> >>> > Gz...@ma...> > >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" > >> > >> >> >>> > > >> > >> >> >>> > On Thu, Jan 3, 2013 at 2:17 PM, David Reed < > >> > >> dav...@gm...> > >> > >> >> >>> wrote: > >> > >> >> >>> > > >> > >> >> >>> > > Thanks a lot for the help so far guys! > >> > >> >> >>> > > > >> > >> >> >>> > > Looking at itertools, I found what I believe to be the > >> > perfect > >> > >> >> >>> function > >> > >> >> >>> > > for what I need, itertools.combinations. This appears to > >> be a > >> > >> >> valid > >> > >> >> >>> > > replacement to the method proposed. > >> > >> >> >>> > > > >> > >> >> >>> > > >> > >> >> >>> > Yes, combinations is awesome! > >> > >> >> >>> > > >> > >> >> >>> > > >> > >> >> >>> > > > >> > >> >> >>> > > There is a small problem that I didn't mention is that > my > >> > >> compare > >> > >> >> >>> > function > >> > >> >> >>> > > actually takes as inputs 2 columns from the table. 
Like > >> so: > >> > >> >> >>> > > > >> > >> >> >>> > > D = np.empty((N_irises, N_irises)) > >> > >> >> >>> > > for ii in xrange(N_elements): > >> > >> >> >>> > > for jj in xrange(ii+1, N_elements): > >> > >> >> >>> > > D[ii, jj] = compare(data['element1'][ii], > >> > >> >> >>> > data['element1'][jj],data['element2'][ii], > >> > >> >> >>> > > data['element2'][jj]) > >> > >> >> >>> > > > >> > >> >> >>> > > Is there an efficient way of using itertools with this > >> > >> structure? > >> > >> >> >>> > > > >> > >> >> >>> > > >> > >> >> >>> > You can always make two other iterators for each column. > >> Since > >> > >> you > >> > >> >> >>> have > >> > >> >> >>> > two columns you would have 4 iterators. I am not sure how > >> fast > >> > >> >> this is > >> > >> >> >>> > going to be but I am confident that there is definitely a > >> way > >> > to > >> > >> do > >> > >> >> >>> this in > >> > >> >> >>> > one for-loop, which is going to be way faster than nested > >> > loops. > >> > >> >> >>> > > >> > >> >> >>> > Be Well > >> > >> >> >>> > Anthony > >> > >> >> >>> > > >> > >> >> >>> > > >> > >> >> >>> > > > >> > >> >> >>> > > > >> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:29 PM, < > >> > >> >> >>> > > pyt...@li...> wrote: > >> > >> >> >>> > > > >> > >> >> >>> > >> Send Pytables-users mailing list submissions to > >> > >> >> >>> > >> pyt...@li... > >> > >> >> >>> > >> > >> > >> >> >>> > >> To subscribe or unsubscribe via the World Wide Web, > visit > >> > >> >> >>> > >> > >> > >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> >> >>> > >> or, via email, send a message with subject or body > >> 'help' to > >> > >> >> >>> > >> pyt...@li... > >> > >> >> >>> > >> > >> > >> >> >>> > >> You can reach the person managing the list at > >> > >> >> >>> > >> pyt...@li... 
> >> > >> >> >>> > >> > >> > >> >> >>> > >> When replying, please edit your Subject line so it is > >> more > >> > >> >> specific > >> > >> >> >>> > >> than "Re: Contents of Pytables-users digest..." > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> >> >>> > >> Today's Topics: > >> > >> >> >>> > >> > >> > >> >> >>> > >> 1. Re: Nested Iteration of HDF5 using PyTables (Josh > >> > Ayers) > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> ---------------------------------------------------------------------- > >> > >> >> >>> > >> > >> > >> >> >>> > >> Message: 1 > >> > >> >> >>> > >> Date: Thu, 3 Jan 2013 10:29:33 -0800 > >> > >> >> >>> > >> From: Josh Ayers <jos...@gm...> > >> > >> >> >>> > >> Subject: Re: [Pytables-users] Nested Iteration of HDF5 > >> using > >> > >> >> >>> PyTables > >> > >> >> >>> > >> To: Discussion list for PyTables > >> > >> >> >>> > >> <pyt...@li...> > >> > >> >> >>> > >> Message-ID: > >> > >> >> >>> > >> < > >> > >> >> >>> > >> > >> > >> >> > >> CAC...@ma...> > >> > >> >> >>> > >> Content-Type: text/plain; charset="iso-8859-1" > >> > >> >> >>> > >> > >> > >> >> >>> > >> David, > >> > >> >> >>> > >> > >> > >> >> >>> > >> The change in issue 27 was only for iteration over a > >> > >> >> tables.Column > >> > >> >> >>> > >> instance. To use it, tweak Anthony's code as follows. > >> This > >> > >> will > >> > >> >> >>> > iterate > >> > >> >> >>> > >> over the "element" column, as in your original example. > >> > >> >> >>> > >> > >> > >> >> >>> > >> Note also that this will only work with the development > >> > >> version > >> > >> >> of > >> > >> >> >>> > >> PyTables > >> > >> >> >>> > >> available on github. It will be very slow using the > >> > released > >> > >> >> >>> v2.4.0. > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> >> >>> > >> from itertools import izip > >> > >> >> >>> > >> > >> > >> >> >>> > >> with tb.openFile(...) 
as f: > >> > >> >> >>> > >> data = f.root.data.cols.element > >> > >> >> >>> > >> data_i = iter(data) > >> > >> >> >>> > >> data_j = iter(data) > >> > >> >> >>> > >> data_i.next() # throw the first value away > >> > >> >> >>> > >> for i, j in izip(data_i, data_j): > >> > >> >> >>> > >> compare(i, j) > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> >> >>> > >> Hope that helps, > >> > >> >> >>> > >> Josh > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz < > >> > >> >> sc...@gm...> > >> > >> >> >>> > >> wrote: > >> > >> >> >>> > >> > >> > >> >> >>> > >> > HI David, > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > Tables and table column iteration have been > overhauled > >> > >> fairly > >> > >> >> >>> recently > >> > >> >> >>> > >> > [1]. So you might try creating two iterators, offset > >> by > >> > >> one, > >> > >> >> and > >> > >> >> >>> then > >> > >> >> >>> > >> > doing the comparison. I am hacking this out super > >> quick > >> > so > >> > >> >> please > >> > >> >> >>> > >> forgive > >> > >> >> >>> > >> > me: > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > from itertools import izip > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > with tb.openFile(...) as f: > >> > >> >> >>> > >> > data = f.root.data > >> > >> >> >>> > >> > data_i = iter(data) > >> > >> >> >>> > >> > data_j = iter(data) > >> > >> >> >>> > >> > data_i.next() # throw the first value away > >> > >> >> >>> > >> > for i, j in izip(data_i, data_j): > >> > >> >> >>> > >> > compare(i, j) > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > You get the idea ;) > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > Be Well > >> > >> >> >>> > >> > Anthony > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > 1. 
https://github.com/PyTables/PyTables/issues/27 > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < > >> > >> >> >>> dav...@gm...> > >> > >> >> >>> > >> wrote: > >> > >> >> >>> > >> > > >> > >> >> >>> > >> >> I was hoping someone could help me out here. > >> > >> >> >>> > >> >> > >> > >> >> >>> > >> >> This is from a post I put up on StackOverflow, > >> > >> >> >>> > >> >> > >> > >> >> >>> > >> >> I am have a fairly large dataset that I store in > HDF5 > >> and > >> > >> >> access > >> > >> >> >>> > using > >> > >> >> >>> > >> >> PyTables. One operation I need to do on this dataset > >> are > >> > >> >> pairwise > >> > >> >> >>> > >> >> comparisons between each of the elements. This > >> requires 2 > >> > >> >> loops, > >> > >> >> >>> one > >> > >> >> >>> > to > >> > >> >> >>> > >> >> iterate over each element, and an inner loop to > >> iterate > >> > >> over > >> > >> >> >>> every > >> > >> >> >>> > >> other > >> > >> >> >>> > >> >> element. This operation thus looks at N(N-1)/2 > >> > comparisons. > >> > >> >> >>> > >> >> > >> > >> >> >>> > >> >> For fairly small sets I found it to be faster to > dump > >> the > >> > >> >> >>> contents > >> > >> >> >>> > >> into a > >> > >> >> >>> > >> >> multdimensional numpy array and then do my > iteration. > >> I > >> > run > >> > >> >> into > >> > >> >> >>> > >> problems > >> > >> >> >>> > >> >> with large sets because of memory issues and need to > >> > access > >> > >> >> each > >> > >> >> >>> > >> element of > >> > >> >> >>> > >> >> the dataset at run time. > >> > >> >> >>> > >> >> > >> > >> >> >>> > >> >> Putting the elements into an array gives me about > 600 > >> > >> >> >>> comparisons per > >> > >> >> >>> > >> >> second, while operating on hdf5 data itself gives me > >> > about > >> > >> 300 > >> > >> >> >>> > >> comparisons > >> > >> >> >>> > >> >> per second. > >> > >> >> >>> > >> >> > >> > >> >> >>> > >> >> Is there a way to speed this process up? 
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >> Example follows (this is not my real code, just an example):
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >> *Small Set*:
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >>     with tb.openFile(h5_file, 'r') as f:
> >> > >> >> >>> > >> >>         data = f.root.data
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >>         N_elements = len(data)
> >> > >> >> >>> > >> >>         elements = np.empty((N_elements, 100000))
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >>         # Copy every row's 'element' field into memory
> >> > >> >> >>> > >> >>         for ii, d in enumerate(data):
> >> > >> >> >>> > >> >>             elements[ii] = d['element']
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >>     D = np.empty((N_elements, N_elements))
> >> > >> >> >>> > >> >>     for ii in xrange(N_elements):
> >> > >> >> >>> > >> >>         for jj in xrange(ii+1, N_elements):
> >> > >> >> >>> > >> >>             D[ii, jj] = compare(elements[ii], elements[jj])
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >> *Large Set*:
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >>     with tb.openFile(h5_file, 'r') as f:
> >> > >> >> >>> > >> >>         data = f.root.data
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >>         N_elements = len(data)
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >>         D = np.empty((N_elements, N_elements))
> >> > >> >> >>> > >> >>         for ii in xrange(N_elements):
> >> > >> >> >>> > >> >>             for jj in xrange(ii+1, N_elements):
> >> > >> >> >>> > >> >>                 D[ii, jj] = compare(data['element'][ii],
> >> > >> >> >>> > >> >>                                     data['element'][jj])
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >>
> >> > >> >> >>> > >> >> ------------------------------------------------------------------------------
> >> > >> >> >>> > >> >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
> >> > >> >> >>> > >> >> MVC, Windows 8 Apps, JavaScript and much more.
Keep > >> your > >> > >> >> skills > >> > >> >> >>> > current > >> > >> >> >>> > >> >> with LearnDevNow - 3,200 step-by-step video > tutorials > >> by > >> > >> >> >>> Microsoft > >> > >> >> >>> > >> >> MVPs and experts. ON SALE this month only -- learn > >> more > >> > at: > >> > >> >> >>> > >> >> http://p.sf.net/sfu/learnmore_122712 > >> > >> >> >>> > >> >> _______________________________________________ > >> > >> >> >>> > >> >> Pytables-users mailing list > >> > >> >> >>> > >> >> Pyt...@li... > >> > >> >> >>> > >> >> > >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> >> >>> > >> >> > >> > >> >> >>> > >> >> > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > >> > >> >> >>> > > >> > >> >> >>> > >> > >> >> > >> > >> > >> > > >> > ------------------------------------------------------------------------------ > >> > >> >> >>> > >> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# > >> 2012, > >> > >> >> HTML5, > >> > >> >> >>> CSS, > >> > >> >> >>> > >> > MVC, Windows 8 Apps, JavaScript and much more. Keep > >> your > >> > >> skills > >> > >> >> >>> > current > >> > >> >> >>> > >> > with LearnDevNow - 3,200 step-by-step video tutorials > >> by > >> > >> >> Microsoft > >> > >> >> >>> > >> > MVPs and experts. ON SALE this month only -- learn > more > >> > at: > >> > >> >> >>> > >> > http://p.sf.net/sfu/learnmore_122712 > >> > >> >> >>> > >> > _______________________________________________ > >> > >> >> >>> > >> > Pytables-users mailing list > >> > >> >> >>> > >> > Pyt...@li... > >> > >> >> >>> > >> > > >> > https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> >> >>> > >> > > >> > >> >> >>> > >> > > >> > >> >> >>> > >> -------------- next part -------------- > >> > >> >> >>> > >> An HTML attachment was scrubbed... 
> >> > >> >> >>> > >> > >> > >> >> >>> > >> ------------------------------ > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> >> >>> > > >> > >> >> >>> > >> > >> >> > >> > >> > >> > > >> > ------------------------------------------------------------------------------ > >> > >> >> >>> > >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# > 2012, > >> > >> HTML5, > >> > >> >> >>> CSS, > >> > >> >> >>> > >> MVC, Windows 8 Apps, JavaScript and much more. Keep > your > >> > >> skills > >> > >> >> >>> current > >> > >> >> >>> > >> with LearnDevNow - 3,200 step-by-step video tutorials > by > >> > >> >> Microsoft > >> > >> >> >>> > >> MVPs and experts. ON SALE this month only -- learn more > >> at: > >> > >> >> >>> > >> http://p.sf.net/sfu/learnmore_122712 > >> > >> >> >>> > >> > >> > >> >> >>> > >> ------------------------------ > >> > >> >> >>> > >> > >> > >> >> >>> > >> _______________________________________________ > >> > >> >> >>> > >> Pytables-users mailing list > >> > >> >> >>> > >> Pyt...@li... > >> > >> >> >>> > >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> >> >>> > >> End of Pytables-users Digest, Vol 80, Issue 3 > >> > >> >> >>> > >> ********************************************* > >> > >> >> >>> > >> > >> > >> >> >>> > > > >> > >> >> >>> > > > >> > >> >> >>> > > > >> > >> >> >>> > > > >> > >> >> >>> > > >> > >> >> >>> > >> > >> >> > >> > >> > >> > > >> > ------------------------------------------------------------------------------ > >> > >> >> >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, C# > 2012, > >> > >> HTML5, > >> > >> >> CSS, > >> > >> >> >>> > > MVC, Windows 8 Apps, JavaScript and much more. Keep your > >> > skills > >> > >> >> >>> current > >> > >> >> >>> > > with LearnDevNow - 3,200 step-by-step video tutorials by > >> > >> Microsoft > >> > >> >> >>> > > MVPs and experts. 
ON SALE this month only -- learn more > >> at: > >> > >> >> >>> > > http://p.sf.net/sfu/learnmore_122712 > >> > >> >> >>> > > _______________________________________________ > >> > >> >> >>> > > Pytables-users mailing list > >> > >> >> >>> > > Pyt...@li... > >> > >> >> >>> > > > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> >> >>> > > > >> > >> >> >>> > > > >> > >> >> >>> > -------------- next part -------------- > >> > >> >> >>> > An HTML attachment was scrubbed... > >> > >> >> >>> > > >> > >> >> >>> > ------------------------------ > >> > >> >> >>> > > >> > >> >> >>> > Message: 2 > >> > >> >> >>> > Date: Thu, 3 Jan 2013 17:30:59 -0600 > >> > >> >> >>> > From: Anthony Scopatz <sc...@gm...> > >> > >> >> >>> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol > 80, > >> > >> Issue 4 > >> > >> >> >>> > To: Discussion list for PyTables > >> > >> >> >>> > <pyt...@li...> > >> > >> >> >>> > Message-ID: > >> > >> >> >>> > < > >> > >> >> >>> > > >> > >> CAP...@ma... > > > >> > >> >> >>> > Content-Type: text/plain; charset="iso-8859-1" > >> > >> >> >>> > > >> > >> >> >>> > Josh is right that you can just edit the code by hand > (which > >> > >> works > >> > >> >> but > >> > >> >> >>> > sucks). > >> > >> >> >>> > > >> > >> >> >>> > However, on Windows -- on the rare occasion when I also > >> have to > >> > >> >> >>> develop on > >> > >> >> >>> > it -- I typically use a distribution that includes a > >> compiler, > >> > >> >> cython, > >> > >> >> >>> > hdf5, and pytables already and then I install my > development > >> > >> version > >> > >> >> >>> from > >> > >> >> >>> > github OVER this. I recommend either EPD or Anaconda, > >> though > >> > >> other > >> > >> >> >>> > distributions listed here [1] might also work. > >> > >> >> >>> > > >> > >> >> >>> > Be well > >> > >> >> >>> > Anthony > >> > >> >> >>> > > >> > >> >> >>> > 1. 
http://numfocus.org/projects-2/software-distributions/ > >> > >> >> >>> > > >> > >> >> >>> > > >> > >> >> >>> > On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers < > >> > jos...@gm... > >> > >> > > >> > >> >> >>> wrote: > >> > >> >> >>> > > >> > >> >> >>> > > The change was in pure Python code, so you should be > able > >> to > >> > >> just > >> > >> >> >>> paste > >> > >> >> >>> > in > >> > >> >> >>> > > the changes to your local copy. Start with the > >> > >> >> table.Column.__iter__ > >> > >> >> >>> > > method (lines 3296-3310) here. > >> > >> >> >>> > > > >> > >> >> >>> > > > >> > >> >> >>> > > > >> > >> >> >>> > > >> > >> >> >>> > >> > >> >> > >> > >> > >> > > >> > https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py > >> > >> >> >>> > > > >> > >> >> >>> > > It needs to be modified slightly because it uses some > >> > >> additional > >> > >> >> >>> features > >> > >> >> >>> > > that aren't available in the released version (the > >> > >> out=buf_slice > >> > >> >> >>> argument > >> > >> >> >>> > > to table.read). The following should work. > >> > >> >> >>> > > > >> > >> >> >>> > > def __iter__(self): > >> > >> >> >>> > > table = self.table > >> > >> >> >>> > > itemsize = self.dtype.itemsize > >> > >> >> >>> > > nrowsinbuf = > >> table._v_file.params['IO_BUFFER_SIZE'] > >> > // > >> > >> >> >>> itemsize > >> > >> >> >>> > > max_row = len(self) > >> > >> >> >>> > > for start_row in xrange(0, len(self), > nrowsinbuf): > >> > >> >> >>> > > end_row = min([start_row + nrowsinbuf, > >> max_row]) > >> > >> >> >>> > > buf = table.read(start_row, end_row, 1, > >> > >> >> >>> field=self.pathname) > >> > >> >> >>> > > for row in buf: > >> > >> >> >>> > > yield row > >> > >> >> >>> > > > >> > >> >> >>> > > > >> > >> >> >>> > > I haven't tested this, but I think it will work. 
> >> > >> >> >>> > > > >> > >> >> >>> > > Josh > >> > >> >> >>> > > > >> > >> >> >>> > > > >> > >> >> >>> > > > >> > >> >> >>> > > On Thu, Jan 3, 2013 at 1:25 PM, David Reed < > >> > >> >> dav...@gm...> > >> > >> >> >>> > wrote: > >> > >> >> >>> > > > >> > >> >> >>> > >> I apologize if I'm starting to sound helpless, but I'm > >> > forced > >> > >> to > >> > >> >> >>> work on > >> > >> >> >>> > >> Windows 7 at work and have never had luck compiling > >> python > >> > >> source > >> > >> >> >>> > >> successfully. I have had to rely on precompiled > binaries > >> > and > >> > >> now > >> > >> >> >>> its > >> > >> >> >>> > >> biting me in the butt. > >> > >> >> >>> > >> > >> > >> >> >>> > >> Is there any quick fix I can do to improve this > iteration > >> > >> using > >> > >> >> >>> v2.4.0? > >> > >> >> >>> > >> > >> > >> >> >>> > >> > >> > >> >> >>> > >> On Thu, Jan 3, 2013 at 3:17 PM, < > >> > >> >> >>> > >> pyt...@li...> wrote: > >> > >> >> >>> > >> > >> > >> >> >>> > >>> Send Pytables-users mailing list submissions to > >> > >> >> >>> > >>> pyt...@li... > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> To subscribe or unsubscribe via the World Wide Web, > >> visit > >> > >> >> >>> > >>> > >> > >> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> >> >>> > >>> or, via email, send a message with subject or body > >> 'help' > >> > to > >> > >> >> >>> > >>> pyt...@li... > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> You can reach the person managing the list at > >> > >> >> >>> > >>> pyt...@li... > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> When replying, please edit your Subject line so it is > >> more > >> > >> >> specific > >> > >> >> >>> > >>> than "Re: Contents of Pytables-users digest..." > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> Today's Topics: > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> 1. Re: Pytables-users Digest, Vol 80, Issue 2 > (David > >> > Reed) > >> > >> >> >>> > >>> 2. 
Re: Pytables-users Digest, Vol 80, Issue 3 > (David > >> > Reed) > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > >> > >> >> >>> > >> > >> > >> ---------------------------------------------------------------------- > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> Message: 1 > >> > >> >> >>> > >>> Date: Thu, 3 Jan 2013 13:44:29 -0500 > >> > >> >> >>> > >>> From: David Reed <dav...@gm...> > >> > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, > Vol > >> > 80, > >> > >> >> Issue > >> > >> >> >>> 2 > >> > >> >> >>> > >>> To: pyt...@li... > >> > >> >> >>> > >>> Message-ID: > >> > >> >> >>> > >>> > <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha= > >> > >> >> >>> > >>> ev...@ma...> > >> > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1" > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> Thanks Anthony, but unless Im missing something I > don't > >> > think > >> > >> >> that > >> > >> >> >>> > method > >> > >> >> >>> > >>> will work since this will only be comparing the ith > >> element > >> > >> with > >> > >> >> >>> ith+1 > >> > >> >> >>> > >>> element. I still need 2 for loops right? > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> Using itertools might speed things up though, I've > never > >> > used > >> > >> >> them > >> > >> >> >>> so I > >> > >> >> >>> > >>> will give it a shot and let you know how it goes. > Looks > >> > >> like I > >> > >> >> >>> need to > >> > >> >> >>> > >>> download the latest release before I do that too. > >> Thanks > >> > for > >> > >> >> the > >> > >> >> >>> help. > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> -Dave > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> On Thu, Jan 3, 2013 at 12:12 PM, < > >> > >> >> >>> > >>> pyt...@li...> wrote: > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > Send Pytables-users mailing list submissions to > >> > >> >> >>> > >>> > pyt...@li... 
> >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > To subscribe or unsubscribe via the World Wide Web, > >> visit > >> > >> >> >>> > >>> > > >> > >> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> >> >>> > >>> > or, via email, send a message with subject or body > >> 'help' > >> > >> to > >> > >> >> >>> > >>> > > pyt...@li... > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > You can reach the person managing the list at > >> > >> >> >>> > >>> > pyt...@li... > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > When replying, please edit your Subject line so it > is > >> > more > >> > >> >> >>> specific > >> > >> >> >>> > >>> > than "Re: Contents of Pytables-users digest..." > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > Today's Topics: > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > 1. Re: Nested Iteration of HDF5 using PyTables > >> > (Anthony > >> > >> >> >>> Scopatz) > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > > >> > >> >> >>> > > >> > >> >> > >> > ---------------------------------------------------------------------- > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > Message: 1 > >> > >> >> >>> > >>> > Date: Thu, 3 Jan 2013 11:11:47 -0600 > >> > >> >> >>> > >>> > From: Anthony Scopatz <sc...@gm...> > >> > >> >> >>> > >>> > Subject: Re: [Pytables-users] Nested Iteration of > HDF5 > >> > >> using > >> > >> >> >>> PyTables > >> > >> >> >>> > >>> > To: Discussion list for PyTables > >> > >> >> >>> > >>> > <pyt...@li...> > >> > >> >> >>> > >>> > Message-ID: > >> > >> >> >>> > >>> > <CAPk-6T5b= > >> > >> >> >>> > >>> > > >> 1EG...@ma... > >> > > > >> > >> >> >>> > >>> > Content-Type: text/plain; charset="iso-8859-1" > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > HI David, > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > Tables and table column iteration have been > overhauled > >> > >> fairly > >> > >> >> >>> > recently > >> > >> >> >>> > >>> [1]. 
> >> > >> >> >>> > >>> > So you might try creating two iterators, offset by > >> one, > >> > >> and > >> > >> >> then > >> > >> >> >>> > >>> doing the > >> > >> >> >>> > >>> > comparison. I am hacking this out super quick so > >> please > >> > >> >> forgive > >> > >> >> >>> me: > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > from itertools import izip > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > with tb.openFile(...) as f: > >> > >> >> >>> > >>> > data = f.root.data > >> > >> >> >>> > >>> > data_i = iter(data) > >> > >> >> >>> > >>> > data_j = iter(data) > >> > >> >> >>> > >>> > data_i.next() # throw the first value away > >> > >> >> >>> > >>> > for i, j in izip(data_i, data_j): > >> > >> >> >>> > >>> > compare(i, j) > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > You get the idea ;) > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > Be Well > >> > >> >> >>> > >>> > Anthony > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > 1. https://github.com/PyTables/PyTables/issues/27 > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed < > >> > >> >> >>> dav...@gm...> > >> > >> >> >>> > >>> wrote: > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > > I was hoping someone could help me out here. > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > This is from a post I put up on StackOverflow, > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > I am have a fairly large dataset that I store in > >> HDF5 > >> > and > >> > >> >> >>> access > >> > >> >> >>> > >>> using > >> > >> >> >>> > >>> > > PyTables. One operation I need to do on this > dataset > >> > are > >> > >> >> >>> pairwise > >> > >> >> >>> > >>> > > comparisons between each of the elements. 
This > >> > requires 2 > >> > >> >> >>> loops, > >> > >> >> >>> > one > >> > >> >> >>> > >>> to > >> > >> >> >>> > >>> > > iterate over each element, and an inner loop to > >> iterate > >> > >> over > >> > >> >> >>> every > >> > >> >> >>> > >>> other > >> > >> >> >>> > >>> > > element. This operation thus looks at N(N-1)/2 > >> > >> comparisons. > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > For fairly small sets I found it to be faster to > >> dump > >> > the > >> > >> >> >>> contents > >> > >> >> >>> > >>> into a > >> > >> >> >>> > >>> > > multdimensional numpy array and then do my > >> iteration. I > >> > >> run > >> > >> >> >>> into > >> > >> >> >>> > >>> problems > >> > >> >> >>> > >>> > > with large sets because of memory issues and need > to > >> > >> access > >> > >> >> >>> each > >> > >> >> >>> > >>> element > >> > >> >> >>> > >>> > of > >> > >> >> >>> > >>> > > the dataset at run time. > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > Putting the elements into an array gives me about > >> 600 > >> > >> >> >>> comparisons > >> > >> >> >>> > per > >> > >> >> >>> > >>> > > second, while operating on hdf5 data itself gives > me > >> > >> about > >> > >> >> 300 > >> > >> >> >>> > >>> > comparisons > >> > >> >> >>> > >>> > > per second. > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > Is there a way to speed this process up? 
> >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > Example follows (this is not my real code, just an > >> > >> example): > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > *Small Set*: > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f: > >> > >> >> >>> > >>> > > data = f.root.data > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > N_elements = len(data) > >> > >> >> >>> > >>> > > elements = np.empty((N_irises, 1e5)) > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > for ii, d in enumerate(data): > >> > >> >> >>> > >>> > > elements[ii] = data['element'] > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > D = np.empty((N_irises, N_irises)) for ii in > >> > >> >> >>> xrange(N_elements): > >> > >> >> >>> > >>> > > for jj in xrange(ii+1, N_elements): > >> > >> >> >>> > >>> > > D[ii, jj] = compare(elements[ii], > >> elements[jj]) > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > *Large Set*: > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > with tb.openFile(h5_file, 'r') as f: > >> > >> >> >>> > >>> > > data = f.root.data > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > N_elements = len(data) > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > D = np.empty((N_irises, N_irises)) > >> > >> >> >>> > >>> > > for ii in xrange(N_elements): > >> > >> >> >>> > >>> > > for jj in xrange(ii+1, N_elements): > >> > >> >> >>> > >>> > > D[ii, jj] = > >> compare(data['element'][ii], > >> > >> >> >>> > >>> > data['element'][jj]) > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > >> > >> >> >>> > > >> > >> >> >>> > >> > >> >> > >> > >> > >> > > >> > ------------------------------------------------------------------------------ > >> > >> >> >>> > >>> > > Master Visual Studio, SharePoint, SQL, ASP.NET, > C# > >> > 2012, > >> > >> >> >>> HTML5, > >> > >> >> >>> > 
CSS, > >> > >> >> >>> > >>> > > MVC, Windows 8 Apps, JavaScript and much more. > Keep > >> > your > >> > >> >> skills > >> > >> >> >>> > >>> current > >> > >> >> >>> > >>> > > with LearnDevNow - 3,200 step-by-step video > >> tutorials > >> > by > >> > >> >> >>> Microsoft > >> > >> >> >>> > >>> > > MVPs and experts. ON SALE this month only -- learn > >> more > >> > >> at: > >> > >> >> >>> > >>> > > http://p.sf.net/sfu/learnmore_122712 > >> > >> >> >>> > >>> > > _______________________________________________ > >> > >> >> >>> > >>> > > Pytables-users mailing list > >> > >> >> >>> > >>> > > Pyt...@li... > >> > >> >> >>> > >>> > > > >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > > > >> > >> >> >>> > >>> > -------------- next part -------------- > >> > >> >> >>> > >>> > An HTML attachment was scrubbed... > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > ------------------------------ > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > >> > >> >> >>> > > >> > >> >> >>> > >> > >> >> > >> > >> > >> > > >> > ------------------------------------------------------------------------------ > >> > >> >> >>> > >>> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# > >> 2012, > >> > >> >> HTML5, > >> > >> >> >>> CSS, > >> > >> >> >>> > >>> > MVC, Windows 8 Apps, JavaScript and much more. Keep > >> your > >> > >> >> skills > >> > >> >> >>> > current > >> > >> >> >>> > >>> > with LearnDevNow - 3,200 step-by-step video > tutorials > >> by > >> > >> >> >>> Microsoft > >> > >> >> >>> > >>> > MVPs and experts. 
ON SALE this month only -- learn > >> more > >> > at: > >> > >> >> >>> > >>> > http://p.sf.net/sfu/learnmore_122712 > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > ------------------------------ > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > _______________________________________________ > >> > >> >> >>> > >>> > Pytables-users mailing list > >> > >> >> >>> > >>> > Pyt...@li... > >> > >> >> >>> > >>> > > >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > > >> > >> >> >>> > >>> > End of Pytables-users Digest, Vol 80, Issue 2 > >> > >> >> >>> > >>> > ***************... [truncated message content] |
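The buffered column iteration quoted in the message above can be tried out without PyTables. What follows is only a rough sketch under stated assumptions, not the library's implementation: `read(start, stop)` is a hypothetical stand-in for a call like `table.read(start, stop, 1, field=...)`, the row size and row count are the figures mentioned elsewhere in this thread (4620 rows of 17 x 9600 booleans), the 1 MB buffer size is assumed, and the code is written for Python 3 rather than the Python 2 used in the thread.

```python
def rows_per_buffer(buffer_bytes, row_bytes):
    """Whole rows that fit in one I/O buffer (never fewer than one)."""
    return max(1, buffer_bytes // row_bytes)


def iter_buffered(read, nrows, nrowsinbuf):
    """Yield rows one by one, fetching at most `nrowsinbuf` rows per read.

    `read(start, stop)` is a hypothetical stand-in for a call such as
    table.read(start, stop, 1, field=...); it returns rows start..stop-1.
    """
    for start in range(0, nrows, nrowsinbuf):
        stop = min(start + nrowsinbuf, nrows)
        for row in read(start, stop):
            yield row


# Figures taken from the thread: 4620 rows, each a 17 x 9600 boolean
# matrix (one byte per element), and an assumed 1 MB I/O buffer.
row_bytes = 17 * 9600
total_bytes = 4620 * row_bytes
nrowsinbuf = rows_per_buffer(1024 * 1024, row_bytes)

# Toy in-memory "column" so the sketch runs without an HDF5 file.
column = list(range(10))
calls = []


def read(start, stop):
    calls.append((start, stop))  # record each simulated disk read
    return column[start:stop]


rows = list(iter_buffered(read, len(column), nrowsinbuf))
print(row_bytes, total_bytes, nrowsinbuf, calls)
```

With these numbers each read would fetch six rows (just under 1 MB) instead of all 4620 rows (roughly 754 MB), which appears to be the behaviour the buffered `__iter__` is meant to give.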
From: Anthony S. <sc...@gm...> - 2013-02-04 16:16:55
|
On Mon, Feb 4, 2013 at 9:53 AM, David Reed <dav...@gm...> wrote: > Hi Josh, > > Here is my __iter__ code: > > def __iter__(self): > table = self.table > itemsize = self.dtype.itemsize > nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize > max_row = len(self) > for start_row in xrange(0, len(self), nrowsinbuf): > end_row = min([start_row + nrowsinbuf, max_row]) > buf = table.read(start_row, end_row, 1, field=self.pathname) > for row in buf: > yield row > > It does look different, I will try swapping in the code from github and > see what happens. > Yes, please let us know how that goes! Otherwise send the list both the test data generator script and the script that fails. Be Well Anthony > > > On Mon, Feb 4, 2013 at 9:59 AM, < > pyt...@li...> wrote: > >> Send Pytables-users mailing list submissions to >> pyt...@li... >> >> To subscribe or unsubscribe via the World Wide Web, visit >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> or, via email, send a message with subject or body 'help' to >> pyt...@li... >> >> You can reach the person managing the list at >> pyt...@li... >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of Pytables-users digest..." >> >> >> Today's Topics: >> >> 1. Re: Pytables-users Digest, Vol 81, Issue 4 (Josh Ayers) >> 2. Re: Pytables-users Digest, Vol 81, Issue 6 (David Reed) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Fri, 1 Feb 2013 14:08:47 -0800 >> From: Josh Ayers <jos...@gm...> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4 >> To: Discussion list for PyTables >> <pyt...@li...> >> Message-ID: >> <CACOB4aPG4NZ6b2a3v= >> 1Ue...@ma...> >> Content-Type: text/plain; charset="iso-8859-1" >> >> David, >> >> You added a custom version of table.Column.__iter__, correct? Could you >> also include that along with the script to reproduce the error? 
>> >> It seems like the problem may be in the 'nrowsinbuf' calculation - see >> [1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're using the >> default 1MB value for IO_BUFFER_SIZE, it should be reading in rows of 6 >> chunks. Instead, it's reading the entire table. >> >> [1]: >> https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296 >> >> >> >> On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> >> wrote: >> >> > >> > >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> >> wrote: >> > >> >> at the error: >> >> >> >> result = numpy.empty(shape=nrows, dtype=dtypeField) >> >> >> >> nrows = 4620 and dtypeField is ('bool', (17, 9600)) >> >> >> >> I'm not sure what that means as a dtype, but thats what it is. >> >> >> >> Forgive me if I'm being totally naive, but I thought the whole point of >> >> __iter__ with pyttables was to do iteration on the fly, so there is no >> >> preallocation. >> >> >> > >> > Nope you are not being naive at all. That is the point. >> > >> > >> >> If you have any ideas on this I'm all ears. >> >> >> > >> > If you could send a minimal script which reproduces this error, that >> would >> > help a lot. >> > >> > Be Well >> > Anthony >> > >> > >> >> >> >> >> >> Thanks again. >> >> >> >> Dave >> >> >> >> >> >> On Fri, Feb 1, 2013 at 3:45 PM, < >> >> pyt...@li...> wrote: >> >> >> >>> Send Pytables-users mailing list submissions to >> >>> pyt...@li... >> >>> >> >>> To subscribe or unsubscribe via the World Wide Web, visit >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >>> or, via email, send a message with subject or body 'help' to >> >>> pyt...@li... >> >>> >> >>> You can reach the person managing the list at >> >>> pyt...@li... >> >>> >> >>> When replying, please edit your Subject line so it is more specific >> >>> than "Re: Contents of Pytables-users digest..." >> >>> >> >>> >> >>> Today's Topics: >> >>> >> >>> 1. 
Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz) >> >>> >> >>> >> >>> ---------------------------------------------------------------------- >> >>> >> >>> Message: 1 >> >>> Date: Fri, 1 Feb 2013 14:44:40 -0600 >> >>> From: Anthony Scopatz <sc...@gm...> >> >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2 >> >>> To: Discussion list for PyTables >> >>> <pyt...@li...> >> >>> Message-ID: >> >>> < >> >>> CAP...@ma...> >> >>> Content-Type: text/plain; charset="iso-8859-1" >> >>> >> >>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> >> >>> wrote: >> >>> >> >>> > Hi Anthony, >> >>> > >> >>> > Thanks for the reply. >> >>> > >> >>> > I honestly don't know how to monitor my Python memory usage, but I'm >> >>> sure >> >>> > that its caused by out of memory. >> >>> > >> >>> >> >>> Well, I would just run top or process monitor or something while >> running >> >>> the python script to see what happens to memory usage as the script >> chugs >> >>> along... >> >>> >> >>> >> >>> > I'm just trying to find out how to fix it. My HDF5 table has 4620 >> >>> rows >> >>> > and the column I'm iterating over is a 17x9600 boolean matrix. The >> >>> > __iter__ method is preallocating an array that is this size which >> >>> appears >> >>> > to be root of the error. I was hoping there is a fix somewhere in >> >>> here to >> >>> > not have to do this preallocation. >> >>> > >> >>> >> >>> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 of >> >>> these is ~760 MB. If you have 2 GB of memory and you are iterating >> over >> >>> 2 >> >>> of these (templates & masks) it is conceivable that you are just >> running >> >>> out of memory. Maybe there is a way that __iter__ could not >> preallocate >> >>> something that is basically a temporary. What is the dtype of the >> >>> templates array? >> >>> >> >>> Be Well >> >>> Anthony >> >>> >> >>> >> >>> > >> >>> > Thanks again. 
>> >>> >> >>> >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> >> ------------------------------ >> >> Message: 2 >> Date: Mon, 4 Feb 2013 09:58:53 -0500 >> From: David Reed <dav...@gm...> >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 6 >> To: pyt...@li... >> Message-ID: >> <CAM6XA7= >> h50...@ma...> >> Content-Type: text/plain; charset="iso-8859-1" >> >> Hi Anthony, >> >> Sorry to just get back to you. I can send a script, should I send a script >> that creates some fake data as well? >> >> -Dave >> >> >> On Fri, Feb 1, 2013 at 4:50 PM, < >> pyt...@li...> wrote: >> >> > Send Pytables-users mailing list submissions to >> > pyt...@li... >> > >> > To subscribe or unsubscribe via the World Wide Web, visit >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > or, via email, send a message with subject or body 'help' to >> > pyt...@li... >> > >> > You can reach the person managing the list at >> > pyt...@li... >> > >> > When replying, please edit your Subject line so it is more specific >> > than "Re: Contents of Pytables-users digest..." >> > >> > >> > Today's Topics: >> > >> > 1. Re: Pytables-users Digest, Vol 81, Issue 4 (Anthony Scopatz) >> > >> > >> > ---------------------------------------------------------------------- >> > >> > Message: 1 >> > Date: Fri, 1 Feb 2013 15:50:11 -0600 >> > From: Anthony Scopatz <sc...@gm...> >> > Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4 >> > To: Discussion list for PyTables >> > <pyt...@li...> >> > Message-ID: >> > < >> > CAP...@ma...> >> > Content-Type: text/plain; charset="iso-8859-1" >> > >> > On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> >> wrote: >> > >> > > at the error: >> > > >> > > result = numpy.empty(shape=nrows, dtype=dtypeField) >> > > >> > > nrows = 4620 and dtypeField is ('bool', (17, 9600)) >> > > >> > > I'm not sure what that means as a dtype, but thats what it is. 
>> > > >> > > Forgive me if I'm being totally naive, but I thought the whole point >> of >> > > __iter__ with pyttables was to do iteration on the fly, so there is no >> > > preallocation. >> > > >> > >> > Nope you are not being naive at all. That is the point. >> > >> > >> > > If you have any ideas on this I'm all ears. >> > > >> > >> > If you could send a minimal script which reproduces this error, that >> would >> > help a lot. >> > >> > Be Well >> > Anthony >> > >> > >> > > >> > > >> > > Thanks again. >> > > >> > > Dave >> > > >> > > >> > > On Fri, Feb 1, 2013 at 3:45 PM, < >> > > pyt...@li...> wrote: >> > > >> > >> Send Pytables-users mailing list submissions to >> > >> pyt...@li... >> > >> >> > >> To subscribe or unsubscribe via the World Wide Web, visit >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> or, via email, send a message with subject or body 'help' to >> > >> pyt...@li... >> > >> >> > >> You can reach the person managing the list at >> > >> pyt...@li... >> > >> >> > >> When replying, please edit your Subject line so it is more specific >> > >> than "Re: Contents of Pytables-users digest..." >> > >> >> > >> >> > >> Today's Topics: >> > >> >> > >> 1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz) >> > >> >> > >> >> > >> >> ---------------------------------------------------------------------- >> > >> >> > >> Message: 1 >> > >> Date: Fri, 1 Feb 2013 14:44:40 -0600 >> > >> From: Anthony Scopatz <sc...@gm...> >> > >> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2 >> > >> To: Discussion list for PyTables >> > >> <pyt...@li...> >> > >> Message-ID: >> > >> < >> > >> CAP...@ma...> >> > >> Content-Type: text/plain; charset="iso-8859-1" >> > >> >> > >> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> >> > >> wrote: >> > >> >> > >> > Hi Anthony, >> > >> > >> > >> > Thanks for the reply. 
>> > >> > >> > >> > I honestly don't know how to monitor my Python memory usage, but >> I'm >> > >> sure >> > >> > that its caused by out of memory. >> > >> > >> > >> >> > >> Well, I would just run top or process monitor or something while >> running >> > >> the python script to see what happens to memory usage as the script >> > chugs >> > >> along... >> > >> >> > >> >> > >> > I'm just trying to find out how to fix it. My HDF5 table has 4620 >> > rows >> > >> > and the column I'm iterating over is a 17x9600 boolean matrix. The >> > >> > __iter__ method is preallocating an array that is this size which >> > >> appears >> > >> > to be root of the error. I was hoping there is a fix somewhere in >> > here >> > >> to >> > >> > not have to do this preallocation. >> > >> > >> > >> >> > >> So a 17x9600 boolean matrix should only be 0.155 MB in space. 4620 >> of >> > >> these is ~760 MB. If you have 2 GB of memory and you are iterating >> > over 2 >> > >> of these (templates & masks) it is conceivable that you are just >> running >> > >> out of memory. Maybe there is a way that __iter__ could not >> preallocate >> > >> something that is basically a temporary. What is the dtype of the >> > >> templates array? >> > >> >> > >> Be Well >> > >> Anthony >> > >> >> > >> >> > >> > >> > >> > Thanks again. >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > On Fri, Feb 1, 2013 at 11:12 AM, < >> > >> > pyt...@li...> wrote: >> > >> > >> > >> >> Send Pytables-users mailing list submissions to >> > >> >> pyt...@li... >> > >> >> >> > >> >> To subscribe or unsubscribe via the World Wide Web, visit >> > >> >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> >> or, via email, send a message with subject or body 'help' to >> > >> >> pyt...@li... >> > >> >> >> > >> >> You can reach the person managing the list at >> > >> >> pyt...@li... 
Today's Topics:

   1. Re: Pytables-users Digest, Vol 80, Issue 9 (Anthony Scopatz)

----------------------------------------------------------------------

Message: 1
Date: Fri, 1 Feb 2013 10:11:47 -0600
From: Anthony Scopatz <sc...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 9
To: Discussion list for PyTables <pyt...@li...>
Message-ID: <CAP...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

Hi David,

Sorry, I haven't had a ton of time recently. You seem to be getting a
memory error on creating a numpy array. This kind of thing typically
happens when you are out of memory. Does this seem to be the case with
you? When this dies, is your memory usage at 100%? If so, this algorithm
might require a little tweaking...

Be Well
Anthony

On Fri, Feb 1, 2013 at 6:15 AM, David Reed <dav...@gm...> wrote:

> I'm still having problems with this one. I can't tell if this is something
> dumb I'm doing with itertools, or if it's something in PyTables.
>
> Would appreciate any help.
>
> Thanks
>
> On Wed, Jan 30, 2013 at 5:00 PM, David Reed <dav...@gm...> wrote:
>
> > I think I have to reopen this issue.
> > I have been running fine for a while using the combinations method from
> > itertools, but have recently run into a memory error since I have
> > recently quadrupled the size of the HDF file.
> >
> > Here is my code again:
> >
> >     import numpy as np
> >     import tables as tb
> >     from itertools import combinations, izip
> >
> >     with tb.openFile(h5_all, 'r') as f:
> >         irises = f.root.irises
> >
> >         templates = f.root.irises.cols.templates
> >         masks = f.root.irises.cols.masks1
> >
> >         N_irises = len(irises)
> >         index = np.ones((20 * 480), np.bool)
> >
> >         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >         D = np.empty((N_irises, N_irises))
> >         for (t1, m1, ii), (t2, m2, jj) in combinations(
> >                 izip(templates, masks, range(N_irises)), 2):
> >             # print ii
> >             D[ii, jj] = ham_dist(
> >                 t1[8, index],
> >                 t2[:, index],
> >                 m1[8, index],
> >                 m2[:, index],
> >                 )
> >
> > And here is the error:
> >
> >     In [10]: get_hd3()
> >     10669890 Comparisons
> >     ---------------------------------------------------------------------------
> >     MemoryError                               Traceback (most recent call last)
> >     <ipython-input-10-cfb255ce7bd1> in <module>()
> >     ----> 1 get_hd3()
> >
> >         118         print '%i Comparisons' % (N_irises*(N_irises - 1)/2)
> >         119         D = np.empty((N_irises, N_irises))
> >     --> 120         for (t1, m1, ii), (t2, m2, jj) in combinations(izip(templates, masks, range(N_irises)), 2):
> >         121             # print ii
> >         122             D[ii, jj] = ham_dist(
> >
> >     c:\python27\lib\site-packages\tables\table.pyc in __iter__(self)
> >        3274         for start_row in xrange(0, len(self), nrowsinbuf):
> >        3275             end_row = min([start_row + nrowsinbuf, max_row])
> >     -> 3276             buf = table.read(start_row, end_row, 1, field=self.pathname)
> >        3277             for row in buf:
> >        3278                 yield row
> >
> >     c:\python27\lib\site-packages\tables\table.pyc in read(self, start, stop, step, field)
> >        1772         (start, stop, step) = self._processRangeRead(start, stop, step)
> >        1773
> >     -> 1774         arr = self._read(start, stop, step, field)
> >        1775         return internal_to_flavor(arr, self.flavor)
> >        1776
> >
> >     c:\python27\lib\site-packages\tables\table.pyc in _read(self, start, stop, step, field)
> >        1719         if field:
> >        1720             # Create a container for the results
> >     -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >        1722         else:
> >        1723             # Recarray case
> >
> >     MemoryError:
> >     > c:\python27\lib\site-packages\tables\table.py(1721)_read()
> >        1720             # Create a container for the results
> >     -> 1721             result = numpy.empty(shape=nrows, dtype=dtypeField)
> >        1722         else:
> >
> > Also, if you guys see any performance problems in my code, please let me
> > know.
> >
> > Thank you so much for the help.
> >
> > -Dave

On Fri, Jan 4, 2013 at 8:57 AM, <pyt...@li...> wrote:
Today's Topics:

   1. Re: Pytables-users Digest, Vol 80, Issue 8 (David Reed)

----------------------------------------------------------------------

Message: 1
Date: Fri, 4 Jan 2013 08:56:28 -0500
From: David Reed <dav...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 8
To: pyt...@li...
Message-ID: <CAM...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

I can't thank you guys enough for the help. I was able to add the __iter__
function to the table.py file and everything seems to be working great!
I'm not quite as fast as I was with iterating right off a matrix, but pretty
close. I was at 555 comparisons per second, and now I'm at 420.
I handled the problem I mentioned earlier by doing this, and it seems to
work great:

    A = f.root.data.cols.A
    B = f.root.data.cols.B

    D = np.empty((len(A), len(A)))
    for (a1, b1, ii), (a2, b2, jj) in combinations(izip(A, B, range(len(A))), 2):
        D[ii, jj] = compare(a1, a2, b1, b2)

Again, thanks a lot.

-Dave

On Thu, Jan 3, 2013 at 6:31 PM, <pyt...@li...> wrote:

Today's Topics:

   1. Re: Pytables-users Digest, Vol 80, Issue 3 (Anthony Scopatz)
   2. Re: Pytables-users Digest, Vol 80, Issue 4 (Anthony Scopatz)

----------------------------------------------------------------------

Message: 1
Date: Thu, 3 Jan 2013 17:26:55 -0600
From: Anthony Scopatz <sc...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3
To: Discussion list for PyTables <pyt...@li...>
Message-ID: <CAPk-6T6sz=J5ay_a9YGLPe_yBLGa9c+XgxG0CRNs6fJ=Gz...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

On Thu, Jan 3, 2013 at 2:17 PM, David Reed <dav...@gm...> wrote:

> Thanks a lot for the help so far guys!
>
> Looking at itertools, I found what I believe to be the perfect function
> for what I need, itertools.combinations. This appears to be a valid
> replacement to the method proposed.

Yes, combinations is awesome!

> There is a small problem that I didn't mention, which is that my compare
> function actually takes as inputs 2 columns from the table. Like so:
>
>     D = np.empty((N_elements, N_elements))
>     for ii in xrange(N_elements):
>         for jj in xrange(ii+1, N_elements):
>             D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
>                                 data['element2'][ii], data['element2'][jj])
>
> Is there an efficient way of using itertools with this structure?
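One possible shape for the two-column case asked about above, sketched with plain Python lists standing in for the two table columns. The names `col_a`, `col_b`, `compare`, and `pairwise_two_columns` are placeholders made up for this sketch, and `zip` is the Python 3 spelling of the `izip` used in this thread. Note that `itertools.combinations` copies its whole input into a tuple before yielding pairs, so every row handed to it stays in memory for the duration of the loop.

```python
from itertools import combinations


def compare(a1, a2, b1, b2):
    # Placeholder metric: sum of absolute differences over both columns.
    return abs(a1 - a2) + abs(b1 - b2)


def pairwise_two_columns(col_a, col_b):
    """Compare every unordered pair of rows, reading two columns in step."""
    n = len(col_a)
    # n x n result; only the upper triangle (ii < jj) is filled in.
    dist = [[0.0] * n for _ in range(n)]
    rows = zip(col_a, col_b, range(n))
    # combinations() materializes `rows` into a tuple internally.
    for (a1, b1, ii), (a2, b2, jj) in combinations(rows, 2):
        dist[ii][jj] = compare(a1, a2, b1, b2)
    return dist


col_a = [1, 4, 9]
col_b = [2, 2, 5]
dist = pairwise_two_columns(col_a, col_b)
print(dist[0][2])
```

Carrying the row index through `zip` is what lets a single loop write into the right cell of the result, at the cost of holding all rows at once.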
You can always make two other iterators for each column. Since you have
two columns you would have 4 iterators. I am not sure how fast this is
going to be but I am confident that there is definitely a way to do this in
one for-loop, which is going to be way faster than nested loops.

Be Well
Anthony

On Thu, Jan 3, 2013 at 1:29 PM, <pyt...@li...> wrote:

Today's Topics:

   1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)

----------------------------------------------------------------------

Message: 1
Date: Thu, 3 Jan 2013 10:29:33 -0800
From: Josh Ayers <jos...@gm...>
Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
To: Discussion list for PyTables <pyt...@li...>
Message-ID: <CAC...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

David,

The change in issue 27 was only for iteration over a tables.Column
instance. To use it, tweak Anthony's code as follows. This will iterate
over the "element" column, as in your original example.

Note also that this will only work with the development version of PyTables
available on github. It will be very slow using the released v2.4.0.

    from itertools import izip

    with tb.openFile(...) as f:
        data = f.root.data.cols.element
        data_i = iter(data)
        data_j = iter(data)
        data_i.next()  # throw the first value away
        for i, j in izip(data_i, data_j):
            compare(i, j)

Hope that helps,
Josh

On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <sc...@gm...> wrote:

> Hi David,
>
> Tables and table column iteration have been overhauled fairly recently
> [1]. So you might try creating two iterators, offset by one, and then
> doing the comparison. I am hacking this out super quick so please forgive
> me:
>
>     from itertools import izip
>
>     with tb.openFile(...) as f:
>         data = f.root.data
>         data_i = iter(data)
>         data_j = iter(data)
>         data_i.next()  # throw the first value away
>         for i, j in izip(data_i, data_j):
>             compare(i, j)
>
> You get the idea ;)
>
> Be Well
> Anthony
>
> 1. https://github.com/PyTables/PyTables/issues/27
>
> On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:
>
> > I was hoping someone could help me out here.
> > This is from a post I put up on StackOverflow,
> >
> > I have a fairly large dataset that I store in HDF5 and access using
> > PyTables. One operation I need to do on this dataset is pairwise
> > comparisons between each of the elements. This requires 2 loops, one to
> > iterate over each element, and an inner loop to iterate over every other
> > element. This operation thus looks at N(N-1)/2 comparisons.
> >
> > For fairly small sets I found it to be faster to dump the contents into a
> > multidimensional numpy array and then do my iteration. I run into
> > problems with large sets because of memory issues and need to access each
> > element of the dataset at run time.
> >
> > Putting the elements into an array gives me about 600 comparisons per
> > second, while operating on hdf5 data itself gives me about 300
> > comparisons per second.
> >
> > Is there a way to speed this process up?
> > Example follows (this is not my real code, just an example):
> >
> > *Small Set*:
> >
> >     import numpy as np
> >     import tables as tb
> >
> >     with tb.openFile(h5_file, 'r') as f:
> >         data = f.root.data
> >
> >         N_elements = len(data)
> >         elements = np.empty((N_elements, 100000))
> >
> >         # copy each row's 'element' field into memory
> >         for ii, d in enumerate(data):
> >             elements[ii] = d['element']
> >
> >     D = np.empty((N_elements, N_elements))
> >     for ii in xrange(N_elements):
> >         for jj in xrange(ii+1, N_elements):
> >             D[ii, jj] = compare(elements[ii], elements[jj])
> >
> > *Large Set*:
> >
> >     with tb.openFile(h5_file, 'r') as f:
> >         data = f.root.data
> >
> >         N_elements = len(data)
> >
> >         D = np.empty((N_elements, N_elements))
> >         for ii in xrange(N_elements):
> >             for jj in xrange(ii+1, N_elements):
> >                 D[ii, jj] = compare(data['element'][ii], data['element'][jj])
> >
> > ------------------------------------------------------------------------------
> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
> > MVPs and experts. ON SALE this month only -- learn more at:
> > http://p.sf.net/sfu/learnmore_122712
> > _______________________________________________
> > Pytables-users mailing list
> > Pyt...@li...
> > https://lists.sourceforge.net/lists/listinfo/pytables-users

-------------- next part --------------
An HTML attachment was scrubbed...
------------------------------

End of Pytables-users Digest, Vol 80, Issue 3
*********************************************

-------------- next part --------------
An HTML attachment was scrubbed...

------------------------------

Message: 2
Date: Thu, 3 Jan 2013 17:30:59 -0600
From: Anthony Scopatz <sc...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4
To: Discussion list for PyTables <pyt...@li...>
Message-ID: <CAP...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

Josh is right that you can just edit the code by hand (which works but
sucks).

However, on Windows -- on the rare occasion when I also have to develop on
it -- I typically use a distribution that includes a compiler, cython,
hdf5, and pytables already and then I install my development version from
github OVER this. I recommend either EPD or Anaconda, though other
distributions listed here [1] might also work.

Be well
Anthony

1. http://numfocus.org/projects-2/software-distributions/

On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers <jos...@gm...> wrote:

> The change was in pure Python code, so you should be able to just paste in
> the changes to your local copy. Start with the table.Column.__iter__
> method (lines 3296-3310) here.
>
> https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py
>
> It needs to be modified slightly because it uses some additional features
> that aren't available in the released version (the out=buf_slice argument
> to table.read). The following should work.
>
>     def __iter__(self):
>         table = self.table
>         itemsize = self.dtype.itemsize
>         nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
>         max_row = len(self)
>         for start_row in xrange(0, len(self), nrowsinbuf):
>             end_row = min([start_row + nrowsinbuf, max_row])
>             buf = table.read(start_row, end_row, 1, field=self.pathname)
>             for row in buf:
>                 yield row
>
> I haven't tested this, but I think it will work.
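The buffered read pattern in the method above can be tried without PyTables. This is a minimal sketch in Python 3, assuming a hypothetical `read_rows(start, stop)` callable in place of `table.read` and made-up buffer and item sizes; it is not the PyTables implementation.

```python
def iter_buffered(read_rows, n_rows, buffer_bytes, itemsize):
    """Yield rows one at a time while fetching them in fixed-size batches."""
    # Rows per batch; at least one so that oversized rows still make progress.
    rows_per_buf = max(1, buffer_bytes // itemsize)
    for start in range(0, n_rows, rows_per_buf):
        stop = min(start + rows_per_buf, n_rows)
        # This generator holds only one batch at a time.
        for row in read_rows(start, stop):
            yield row


data = list(range(10))   # stand-in for a table column
calls = []               # records each batch request


def read_rows(start, stop):
    # Hypothetical reader: returns the rows in [start, stop).
    calls.append((start, stop))
    return data[start:stop]


rows = list(iter_buffered(read_rows, len(data), buffer_bytes=32, itemsize=8))
print(calls)
```

The `max(1, ...)` guard is an addition in this sketch: in the method above, an item larger than the buffer would give a step of zero, which `xrange` rejects.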
> Josh
>
> On Thu, Jan 3, 2013 at 1:25 PM, David Reed <dav...@gm...> wrote:
>
> > I apologize if I'm starting to sound helpless, but I'm forced to work on
> > Windows 7 at work and have never had luck compiling python source
> > successfully. I have had to rely on precompiled binaries and now it's
> > biting me in the butt.
> >
> > Is there any quick fix I can do to improve this iteration using v2.4.0?

On Thu, Jan 3, 2013 at 3:17 PM, <pyt...@li...> wrote:

Today's Topics:

   1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
   2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)

----------------------------------------------------------------------

Message: 1
Date: Thu, 3 Jan 2013 13:44:29 -0500
From: David Reed <dav...@gm...>
Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
To: pyt...@li...
Message-ID: <CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=ev...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

Thanks Anthony, but unless I'm missing something I don't think that method
will work, since this will only be comparing the ith element with the
(i+1)th element. I still need 2 for loops, right?

Using itertools might speed things up though. I've never used them, so I
will give it a shot and let you know how it goes. Looks like I need to
download the latest release before I do that too. Thanks for the help.

-Dave

On Thu, Jan 3, 2013 at 12:12 PM, <pyt...@li...> wrote:
Today's Topics:

   1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)

----------------------------------------------------------------------

Message: 1
Date: Thu, 3 Jan 2013 11:11:47 -0600
From: Anthony Scopatz <sc...@gm...>
Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
To: Discussion list for PyTables <pyt...@li...>
Message-ID: <CAPk-6T5b=1EG...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

Hi David,

Tables and table column iteration have been overhauled fairly recently [1].
So you might try creating two iterators, offset by one, and then doing the
comparison. I am hacking this out super quick so please forgive me:

    from itertools import izip

    with tb.openFile(...) as f:
        data = f.root.data
        data_i = iter(data)
        data_j = iter(data)
        data_i.next()  # throw the first value away
        for i, j in izip(data_i, data_j):
            compare(i, j)

You get the idea ;)

Be Well
Anthony

1. https://github.com/PyTables/PyTables/issues/27

On Thu, Jan 3, 2013 at 9:25 AM, David Reed <dav...@gm...> wrote:

> I was hoping someone could help me out here.
>
> This is from a post I put up on StackOverflow,
>
> I have a fairly large dataset that I store in HDF5 and access using
> PyTables. One operation I need to do on this dataset is pairwise
> comparisons between each of the elements. This requires 2 loops, one to
> iterate over each element, and an inner loop to iterate over every other
> element. This operation thus looks at N(N-1)/2 comparisons.
>
> For fairly small sets I found it to be faster to dump the contents into a
> multidimensional numpy array and then do my iteration. I run into problems
> with large sets because of memory issues and need to access each element
> of the dataset at run time.
>
> Putting the elements into an array gives me about 600 comparisons per
> second, while operating on hdf5 data itself gives me about 300 comparisons
> per second.
>
> Is there a way to speed this process up?
>
> Example follows (this is not my real code, just an example):
>
> *Small Set*:
>
>     with tb.openFile(h5_file, 'r') as f:
>         data = f.root.data
>
>         N_elements = len(data)
>         elements = np.empty((N_elements, 100000))
>
>         # copy each row's 'element' field into memory
>         for ii, d in enumerate(data):
>             elements[ii] = d['element']
>
>     D = np.empty((N_elements, N_elements))
>     for ii in xrange(N_elements):
>         for jj in xrange(ii+1, N_elements):
>             D[ii, jj] = compare(elements[ii], elements[jj])
>
> *Large Set*:
>
>     with tb.openFile(h5_file, 'r') as f:
>         data = f.root.data
>
>         N_elements = len(data)
>
>         D = np.empty((N_elements, N_elements))
>         for ii in xrange(N_elements):
>             for jj in xrange(ii+1, N_elements):
>                 D[ii, jj] = compare(data['element'][ii], data['element'][jj])

-------------- next part --------------
An HTML attachment was scrubbed...
>> > >> >> >>> > >>> > >> > >> >> >>> > >>> > ------------------------------ >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> >> > >> >> >>> > >> > >> >> >>> >> > >> >> >> > >> >> > >> ------------------------------------------------------------------------------ >> > >> >> >>> > >>> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# >> 2012, >> > >> >> HTML5, >> > >> >> >>> CSS, >> > >> >> >>> > >>> > MVC, Windows 8 Apps, JavaScript and much more. Keep >> your >> > >> >> skills >> > >> >> >>> > current >> > >> >> >>> > >>> > with LearnDevNow - 3,200 step-by-step video tutorials >> by >> > >> >> >>> Microsoft >> > >> >> >>> > >>> > MVPs and experts. ON SALE this month only -- learn >> more >> > at: >> > >> >> >>> > >>> > http://p.sf.net/sfu/learnmore_122712 >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > ------------------------------ >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > _______________________________________________ >> > >> >> >>> > >>> > Pytables-users mailing list >> > >> >> >>> > >>> > Pyt...@li... >> > >> >> >>> > >>> > >> > >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > End of Pytables-users Digest, Vol 80, Issue 2 >> > >> >> >>> > >>> > ********************************************* >> > >> >> >>> > >>> > >> > >> >> >>> > >>> -------------- next part -------------- >> > >> >> >>> > >>> An HTML attachment was scrubbed... >> > >> >> >>> > >>> >> > >> >> >>> > >>> ------------------------------ >> > >> >> >>> > >>> >> > >> >> >>> > >>> Message: 2 >> > >> >> >>> > >>> Date: Thu, 3 Jan 2013 15:17:01 -0500 >> > >> >> >>> > >>> From: David Reed <dav...@gm...> >> > >> >> >>> > >>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol >> > 80, >> > >> >> Issue >> > >> >> >>> 3 >> > >> >> >>> > >>> To: pyt...@li... >> > >> >> >>> > >>> Message-ID: >> > >> >> >>> > >>> < >> > >> >> >>> > >>> >> > >> >> >> CAM...@ma... 
>> > >> >> >>> > >> > >> >> >>> > >>> Content-Type: text/plain; charset="iso-8859-1" >> > >> >> >>> > >>> >> > >> >> >>> > >>> Thanks a lot for the help so far guys! >> > >> >> >>> > >>> >> > >> >> >>> > >>> Looking at itertools, I found what I believe to be the >> > >> perfect >> > >> >> >>> function >> > >> >> >>> > >>> for >> > >> >> >>> > >>> what I need, itertools.combinations. This appears to be >> a >> > >> valid >> > >> >> >>> > >>> replacement >> > >> >> >>> > >>> to the method proposed. >> > >> >> >>> > >>> >> > >> >> >>> > >>> There is a small problem that I didn't mention is that >> my >> > >> >> compare >> > >> >> >>> > >>> function >> > >> >> >>> > >>> actually takes as inputs 2 columns from the table. Like >> so: >> > >> >> >>> > >>> >> > >> >> >>> > >>> D = np.empty((N_irises, N_irises)) >> > >> >> >>> > >>> for ii in xrange(N_elements): >> > >> >> >>> > >>> for jj in xrange(ii+1, N_elements): >> > >> >> >>> > >>> D[ii, jj] = compare(data['element1'][ii], >> > >> >> >>> > >>> data['element1'][jj],data['element2'][ii], >> > >> >> >>> > >>> data['element2'][jj]) >> > >> >> >>> > >>> >> > >> >> >>> > >>> Is there an efficient way of using itertools with this >> > >> >> structure? >> > >> >> >>> > >>> >> > >> >> >>> > >>> >> > >> >> >>> > >>> On Thu, Jan 3, 2013 at 1:29 PM, < >> > >> >> >>> > >>> pyt...@li...> wrote: >> > >> >> >>> > >>> >> > >> >> >>> > >>> > Send Pytables-users mailing list submissions to >> > >> >> >>> > >>> > pyt...@li... >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > To subscribe or unsubscribe via the World Wide Web, >> visit >> > >> >> >>> > >>> > >> > >> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> >> >>> > >>> > or, via email, send a message with subject or body >> 'help' >> > >> to >> > >> >> >>> > >>> > pyt...@li... >> > >> >> >>> > >>> > >> > >> >> >>> > >>> > You can reach the person managing the list at >> > >> >> >>> > >>> > pyt...@li... 
>> > >> >> >>> > >>> > >> > >> >> >>> > >>> > When replying, please edit your Subject line so it is >> > more >> > >> >> >>> specific >> > >> >> >>> > >>> > than "Re: Contents of Pytables-users digest..." >> > >> > > ... > > [Message clipped] > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_jan > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
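For reference, the quoted question about using itertools.combinations with a compare function that takes two columns can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the four-argument compare() below is a hypothetical stand-in, and the two plain Python lists stand in for the 'element1' and 'element2' columns, which would have to be read from the table first. combinations(range(n), 2) yields each index pair (ii, jj) with ii < jj exactly once, which is the same N(N-1)/2 set of pairs as the nested xrange loops.

```python
from itertools import combinations


def compare(a1, b1, a2, b2):
    # Hypothetical stand-in for the real four-argument compare function.
    return abs(a1 - b1) + abs(a2 - b2)


def pairwise_distances(element1, element2):
    """Fill the upper triangle of an N x N matrix, one compare() per unordered pair."""
    n = len(element1)
    dist = [[0.0] * n for _ in range(n)]
    # combinations(range(n), 2) yields (ii, jj) with ii < jj: N(N-1)/2 pairs.
    for ii, jj in combinations(range(n), 2):
        dist[ii][jj] = compare(element1[ii], element1[jj],
                               element2[ii], element2[jj])
    return dist


# Assumed sample data standing in for the two table columns.
element1 = [1.0, 4.0, 6.0]
element2 = [0.0, 2.0, 5.0]
D = pairwise_distances(element1, element2)
```

Iterating over index pairs rather than over the rows themselves keeps both columns addressable inside the loop, so the same pattern works whether the columns are held in memory or indexed on demand.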