From: Josh A. <jos...@gm...> - 2013-02-01 22:08:53
David,

You added a custom version of table.Column.__iter__, correct? Could you
also include that along with the script to reproduce the error?

It seems like the problem may be in the 'nrowsinbuf' calculation - see
[1]. Each of your rows is 17 x 9600 = 163200 bytes. If you're using the
default 1 MB value for IO_BUFFER_SIZE, it should be reading in chunks of
about 6 rows. Instead, it's reading the entire table.

[1]: https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296

On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:

> On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
>
>> at the error:
>>
>>     result = numpy.empty(shape=nrows, dtype=dtypeField)
>>
>> nrows = 4620 and dtypeField is ('bool', (17, 9600))
>>
>> I'm not sure what that means as a dtype, but that's what it is.
>>
>> Forgive me if I'm being totally naive, but I thought the whole point of
>> __iter__ with PyTables was to do iteration on the fly, so there is no
>> preallocation.
>
> Nope, you are not being naive at all. That is the point.
>
>> If you have any ideas on this I'm all ears.
>
> If you could send a minimal script which reproduces this error, that
> would help a lot.
>
> Be Well
> Anthony
>
>> Thanks again.
>>
>> Dave
>>
>> On Fri, Feb 1, 2013 at 3:45 PM, <pyt...@li...> wrote:
>>
>>> Today's Topics:
>>>
>>>   1.
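[Editor's note: Josh's buffer arithmetic can be sanity-checked with the short NumPy sketch below. This is an editorial illustration, not code from the thread; the actual nrowsinbuf computation lives in tables/table.py (see [1] above) and may apply additional tuning.]

```python
import numpy as np

# Row size for a single column of dtype ('bool', (17, 9600)):
row_dtype = np.dtype(('bool', (17, 9600)))
row_nbytes = row_dtype.itemsize          # 17 * 9600 * 1 byte = 163200

# With the default 1 MiB I/O buffer, the expected rows-per-buffer is:
IO_BUFFER_SIZE = 1024 * 1024             # 1 MiB, in bytes
nrowsinbuf = IO_BUFFER_SIZE // row_nbytes

print(row_nbytes)    # 163200
print(nrowsinbuf)    # 6
```

So a correctly sized buffer should hold only about 6 of David's rows at a time, which is why reading all 4620 at once points at the nrowsinbuf calculation.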
Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Fri, 1 Feb 2013 14:44:40 -0600
>>> From: Anthony Scopatz <sc...@gm...>
>>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>>> To: Discussion list for PyTables <pyt...@li...>
>>> Message-ID: <CAP...@ma...>
>>> Content-Type: text/plain; charset="iso-8859-1"
>>>
>>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...> wrote:
>>>
>>> > Hi Anthony,
>>> >
>>> > Thanks for the reply.
>>> >
>>> > I honestly don't know how to monitor my Python memory usage, but I'm
>>> > sure that it's caused by running out of memory.
>>>
>>> Well, I would just run top or a process monitor or something while
>>> running the Python script, to see what happens to memory usage as the
>>> script chugs along...
>>>
>>> > I'm just trying to find out how to fix it. My HDF5 table has 4620
>>> > rows, and the column I'm iterating over is a 17x9600 boolean matrix.
>>> > The __iter__ method is preallocating an array of this size, which
>>> > appears to be the root of the error. I was hoping there is a fix
>>> > somewhere in here to avoid this preallocation.
>>>
>>> So a 17x9600 boolean matrix should only be ~0.155 MB in space. 4620 of
>>> these is ~720 MB. If you have 2 GB of memory and you are iterating
>>> over 2 of these (templates & masks), it is conceivable that you are
>>> just running out of memory. Maybe there is a way that __iter__ could
>>> avoid preallocating something that is basically a temporary. What is
>>> the dtype of the templates array?
>>>
>>> Be Well
>>> Anthony
>>>
>>> > Thanks again.
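[Editor's note: Anthony's size estimate, and a way to watch memory from inside Python rather than with top, can be sketched as below. This is an editorial illustration: it simulates the numpy.empty preallocation from the traceback at a tenth of the real nrows to keep the demo light, and the stdlib resource module is Unix-only with platform-dependent ru_maxrss units.]

```python
import resource  # Unix-only stdlib module
import numpy as np

def peak_rss():
    # ru_maxrss is the process's peak resident set size: kilobytes on
    # Linux, bytes on macOS -- so treat it as a rough gauge, not MB.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Back-of-envelope version of Anthony's arithmetic:
row_bytes = np.dtype(('bool', (17, 9600))).itemsize   # 163200 per row
total_mb = 4620 * row_bytes / 2**20                   # ~719 MiB for the column

before = peak_rss()
# Simulate the preallocation from the traceback
#   result = numpy.empty(shape=nrows, dtype=dtypeField)
# at a tenth of the real nrows (462 instead of 4620):
result = np.empty(shape=462, dtype=('bool', (17, 9600)))
result[:] = False          # touch the pages so they show up in RSS
after = peak_rss()

print(f"full column would need ~{total_mb:.0f} MiB")
print(f"peak RSS grew by {after - before} (raw ru_maxrss units)")
```

The subarray dtype expands automatically, so `result.shape` is `(462, 17, 9600)`; the full-table version would materialize roughly 720 MiB in one allocation, consistent with an out-of-memory failure on a 2 GB machine once a second copy is in flight.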