Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 4

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

David,

You added a custom version of table.Column.__iter__, correct?  Could you
also include that along with the script to reproduce the error?

It seems like the problem may be in the 'nrowsinbuf' calculation - see
[1].  Each of your rows is 17 x 9600 = 163200 bytes.  If you're using the
default 1MB value for IO_BUFFER_SIZE, it should be reading in rows of 6
chunks.  Instead, it's reading the entire table.

[1]: https://github.com/PyTables/PyTables/blob/develop/tables/table.py#L3296

On Fri, Feb 1, 2013 at 1:50 PM, Anthony Scopatz <sc...@gm...> wrote:

>
>
> On Fri, Feb 1, 2013 at 3:27 PM, David Reed <dav...@gm...> wrote:
>
>> at the error:
>>
>> result = numpy.empty(shape=nrows, dtype=dtypeField)
>>
>> nrows = 4620 and dtypeField is ('bool', (17, 9600))
>>
>> I'm not sure what that means as a dtype, but thats what it is.
>>
>> Forgive me if I'm being totally naive, but I thought the whole point of
>> __iter__ with pyttables was to do iteration on the fly, so there is no
>> preallocation.
>>
>
> Nope you are not being naive at all.  That is the point.
>
>
>>  If you have any ideas on this I'm all ears.
>>
>
> If you could send a minimal script which reproduces this error, that would
> help a lot.
>
> Be Well
> Anthony
>
>
>>
>>
>>  Thanks again.
>>
>> Dave
>>
>>
>> On Fri, Feb 1, 2013 at 3:45 PM, <
>> pyt...@li...> wrote:
>>
>>> Send Pytables-users mailing list submissions to
>>>         pyt...@li...
>>>
>>> To subscribe or unsubscribe via the World Wide Web, visit
>>>         https://lists.sourceforge.net/lists/listinfo/pytables-users
>>> or, via email, send a message with subject or body 'help' to
>>>         pyt...@li...
>>>
>>> You can reach the person managing the list at
>>>         pyt...@li...
>>>
>>> When replying, please edit your Subject line so it is more specific
>>> than "Re: Contents of Pytables-users digest..."
>>>
>>>
>>> Today's Topics:
>>>
>>>    1. Re: Pytables-users Digest, Vol 81, Issue 2 (Anthony Scopatz)
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Fri, 1 Feb 2013 14:44:40 -0600
>>> From: Anthony Scopatz <sc...@gm...>
>>> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 81, Issue 2
>>> To: Discussion list for PyTables
>>>         <pyt...@li...>
>>> Message-ID:
>>>         <
>>> CAP...@ma...>
>>> Content-Type: text/plain; charset="iso-8859-1"
>>>
>>> On Fri, Feb 1, 2013 at 12:43 PM, David Reed <dav...@gm...>
>>> wrote:
>>>
>>> > Hi Anthony,
>>> >
>>> > Thanks for the reply.
>>> >
>>> > I honestly don't know how to monitor my Python memory usage, but I'm
>>> sure
>>> > that its caused by out of memory.
>>> >
>>>
>>> Well, I would just run top or process monitor or something while running
>>> the python script to see what happens to memory usage as the script chugs
>>> along...
>>>
>>>
>>> >  I'm just trying to find out how to fix it.  My HDF5 table has 4620
>>> rows
>>> > and the column I'm iterating over is a 17x9600 boolean matrix.  The
>>> > __iter__ method is preallocating an array that is this size which
>>> appears
>>> > to be root of the error.  I was hoping there is a fix somewhere in
>>> here to
>>> > not have to do this preallocation.
>>> >
>>>
>>> So a 17x9600 boolean matrix should only be 0.155 MB in space.  4620 of
>>> these is ~760 MB.  If you have 2 GB of memory and you are iterating over
>>> 2
>>> of these (templates & masks) it is conceivable that you are just running
>>> out of memory.  Maybe there is a way that __iter__ could not preallocate
>>> something that is basically a temporary.  What is the dtype of the
>>> templates array?
>>>
>>> Be Well
>>> Anthony
>>>
>>>
>>> >
>>> > Thanks again.
>>>
>>>