From: David W. <so...@av...> - 2007-08-12 05:56:19
Hola Francesc,

I appreciate your reply, thanks. The code is embedded in other
processes, so I will extract the relevant bits and make up a working
model.

In the meantime, I understand the point you make about block
row-flushes. However, the task in which this is taking place is
designed to accept real-time data which arrives row-at-a-time, and I
need to do multiple computations on parts of the database which
include the new row. Given that I might need to compute over a few
thousand rows (matrix outer-product, for example), and that I don't
know which of the 10 subgroups of the 3500+ groups a new row is to be
a leaf of until it arrives, I'm not sure it is more time-efficient to
pull that data in on receipt of a single new row than to write a
single row.

In any event, speed is not likely to be causing a SegFault. Are you
suggesting that even after a row is flushed, a pointer or whatever is
still retained (a la Zipf's law)? If so, is there a way of iterating
over something else, perhaps?

I'll get a working model up and post asap.

David

On 11/08/2007, at 6:25 PM, Francesc Altet wrote:

> Hello David,
>
> On Saturday 11 August 2007, David Worrall wrote:
>> Hello All,
>> I was glad to have found PyTables. Thanks to all involved. I hope
>> someone more experienced than I can offer some advice.
>> My setup: OS X 10.4.10, PyTables 2.0, Python 2.4.
>>
>> I'm getting a Segmentation Fault. I'll outline the structure FYI and
>> perhaps someone can suggest a better approach:
>> DB structure: parallel (3500+) group nodes off the root node, each
>> with 10 sub-group nodes from which hang data leaves defined with a
>> class (tables.IsDescription) as per the tutorial.
>>
>> The data (time-ordered & multiplexed) is read sequentially from a
>> flat file. The 3500+ nodes are created on-the-fly, before any leaf
>> data is appended. These nodes are created without error.
>>
>> The Seg. Fault occurs at some point in the leaf-creation process.
>> I'm using table = self.h5fileID.getNode() and then
>> table.row['xxx'] = data. table.row.append() and table.flush() are
>> executed after the addition of each row.
>
> Although your description is pretty accurate, in order to ease the
> life of us poor developers, it is always better if you can send a
> minimal script that reproduces the issue. That way we can focus
> quickly on the problem.
>
> Having said that, why are you saving just one row per table flush?
> This is very inefficient and will consume a lot of resources (not
> only memory and I/O but also CPU). It is always better to write rows
> in bunches and then do a flush. When doing this, it is also more
> efficient to use:
>
>     row = table.row
>     for i in xrange(1000):
>         row.append()
>     table.flush()
>
> than:
>
>     for i in xrange(1000):
>         table.row.append()
>     table.flush()
>
> because in the latter a new Row object is instantiated on each
> iteration, but in the former only once (see the tutorials).
>
> Cheers,
>
> --
> >0,0<   Francesc Altet     http://www.carabos.com/
> V   V   Cárabos Coop. V.   Enjoy Data
>  "-"
>
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users

_________________________________________________
experimental polymedia: www.avatar.com.au
Sonic Communications Research Group,
University of Canberra: www.canberra.edu.au/vc-forum/scrg
vip = Verbal Interactivity Project
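The batched-append pattern Francesc recommends can be sketched without
PyTables installed at all. The `ToyTable` class below is a made-up
stand-in (not the real `tables.Table` API) that only counts flushes,
to show why appending many rows and flushing once does the same work
as the per-row-flush loop from the original report, with a thousandth
of the flush calls:

```python
class ToyTable:
    """Made-up stand-in for a PyTables Table: append() buffers a
    record in memory; flush() moves the buffer to "disk"."""

    def __init__(self):
        self._buffer = []
        self.on_disk = []
        self.flush_count = 0

    def append(self, record):
        self._buffer.append(record)

    def flush(self):
        self.on_disk.extend(self._buffer)
        self._buffer = []
        self.flush_count += 1


# Per-row flushing, as in the original report: one flush per record.
slow = ToyTable()
for i in range(1000):
    slow.append(i)
    slow.flush()

# Batched writing, as Francesc suggests: append everything, flush once.
fast = ToyTable()
for i in range(1000):
    fast.append(i)
fast.flush()

print(slow.flush_count)  # 1000
print(fast.flush_count)  # 1
assert slow.on_disk == fast.on_disk  # same data ends up on "disk"
```

Both tables end up with identical data; only the number of flush
(and, in real PyTables, I/O and Row-instantiation) operations differs.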