From: Francesc A. <fa...@ca...> - 2005-12-09 11:16:11
Hi Alain,

I know what is happening with your code. The problem is that, for each
row in a table, you reference a table and then unreference it. This is a
bad policy in terms of efficiency, especially with the advent of PyTables
1.2, which cleans the buffers of unbound (unreferenced) tables. I think
this cleaning is a good thing to happen, in order to save memory when you
deal with a large number of tables in the same PyTables session.

A quick-and-dirty workaround is to avoid unbinding the table by assigning
it to a slot in Entity1:

--- hdf_lib.py	2005-12-09 12:02:20.766303965 +0100
+++ hdf_lib.py.modif	2005-12-09 12:02:00.645095912 +0100
@@ -75,7 +75,7 @@
 
 ##-----------------------------------------------------------------------------
 
 ##-----------------------------------------------------------------------------
 class Entity1(object):
-    __slots__ = ('_HDFId',
+    __slots__ = ('_HDFId', 'table'
                  )
 
 ##-----------------------------------------------------------------------------
@@ -90,6 +90,7 @@
         entity_g._v_attrs.lastHDFId += 1
         self._HDFId = entity_g._v_attrs.lastHDFId
         table = entity_g.att
+        self.table = table
         an_entity = table.row
         an_entity['HDFId'] = self._HDFId
         an_entity['value'] = att

In that way, the table never gets unbound, and the flushing of the
buffers does not take place after each Entity1 creation.

However, the most elegant solution would be to keep all the tables that
you are going to fill simultaneously bound in a list or a dictionary, and
loop over them. If you do this, you will notice much better performance
in your application, both with PyTables 1.1.1 and 1.2+patch.

Cheers,

On Thursday 08 December 2005 18:05, you wrote:
> Hi Francesc,
>
> //--------------------------------------------------------------------------
> I succeeded, with some difficulty, in reproducing the problem in the
> smallest program I was able to create.
> Our current software architecture is quite complex ;-)
>
> My computer configuration is:
> - Windows XP
> - Python 2.4.1
> - HDF5 1.6.5
> - PyTables 1.2
> - numarray 1.4.1
>
> The problem didn't occur in previous versions of PyTables and the
> associated HDF5 libraries.
>
> //--------------------------------------------------------------------------
> This is the result:
> - test_pytables_1_2.py : the main program
> - hdf_lib : a library handling
>   + the hdf5 file, through the HDF_dataset class
>   + row creation, through the Entity1 class
> - Table.py : with the patch you gave me.
>
> It seems to me that the problem is due to the fact that the hdf5 file is
> handled through the hdf_lib:HDF_dataset class.
>
> In Table.py line 1861 I added a print after your patch to visualize when
> the method is called:
>
>     def _g_cleanIOBuf(self):
>         """Clean the I/O buffers."""
>         # Flush the buffers before cleaning them up
>         self.flush()
>         print "flushing in pyTables"
>
> In test_pytables_1_2.py line 18 you can change the value of the xrange
> (currently 10 rows are created):
>
>     for i in xrange(10):
>         value = "entity Number "+str(i)
>         E1 = Entity1(value)
>
> The creation of an Entity1 instance (hdf_lib.Entity1.__init__) simply
> creates the row and appends it to the table without flushing.
>
> However, with this architecture, _g_cleanIOBuf() seems to be called at
> each Entity1 creation. Hereafter the output on the console:
>
> Creation of Dataset
> flushing in pyTables
> Creation of Dataset done
> -------------
> Creating rows
> In hdf_lib.Entity1.__init__ : att = entity Number 0
> End of hdf_lib.Entity1.__init__
> flushing in pyTables
> . . .
> In hdf_lib.Entity1.__init__ : att = entity Number 9
> End of hdf_lib.Entity1.__init__
> flushing in pyTables
>
> Ellapsed_time 0.145311485119
> In hdf_lib.Entity1.flush
> flushing in pyTables
> Creating rows done
> -------------
> Closing hdf5 file
> Closing hdf5 file done
>
> Thanks for your support.
>
> Best regards,
>
> Alain Fagot
>
> ----- Original Message -----
> From: "Francesc Altet" <fa...@ca...>
> To: "Alain Fagot" <ala...@fr...>
> Cc: <pyt...@li...>
> Sent: Wednesday, December 07, 2005 7:29 PM
> Subject: Re: [Pytables-users] Problem flushing table with pytables 1.2
>
> On Wednesday 30 November 2005 11:41, Alain Fagot wrote:
> > Unfortunately, due to the software architecture, I cannot apply the
> > first method. The user can create rows from the high-level API and
> > then decide to flush after a number of rows have been created.
> >
> > I therefore applied the patch you proposed; the software runs OK now,
> > but it is slowed down. It seems that the effect of the patch is that a
> > flush is automatically done after each row creation (or something
> > like this).
> > Ex. Create 100 000 rows and flush every 10 000:
> > - runs in 50 seconds with PyTables 1.1.1
> > - runs in 1055 seconds with the patch in PyTables 1.2
>
> Mmmm, I don't quite understand why this is happening. Could you please
> send me a small example showing this slowdown in PyTables 1.2+patch?
> I'd like to investigate this further.
>
> Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V V     Cárabos Coop. V.   Enjoy Data
 "-"
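PS: the effect of the "keep the tables bound" advice can be sketched with
plain Python, without PyTables at all. FakeTable below is a hypothetical
stand-in (not the real tables.Table API) that models the PyTables 1.2
behaviour: when the last reference to a table is dropped, its buffers are
flushed. Rebinding the table on every row then costs one flush per row,
while keeping it in a dictionary costs a single flush at the end:

```python
# Hypothetical model of the buffer-cleaning behaviour described above.
# FLUSH_COUNT tallies how many buffer flushes actually happened.
FLUSH_COUNT = 0

class FakeTable:
    """Stand-in for a PyTables table whose buffers are flushed when
    it becomes unreferenced (as in PyTables 1.2)."""
    def __init__(self, name):
        self.name = name
        self.pending = 0            # rows buffered, not yet written

    def append_row(self):
        self.pending += 1

    def flush(self):
        global FLUSH_COUNT
        if self.pending:
            FLUSH_COUNT += 1
            self.pending = 0

    def __del__(self):
        # buffers are cleaned when the last reference goes away
        self.flush()

def naive(n):
    """Re-fetch the table for every row (Alain's original pattern):
    each rebind drops the previous reference and triggers a flush."""
    global FLUSH_COUNT
    FLUSH_COUNT = 0
    for _ in range(n):
        table = FakeTable('att')    # stands in for the entity_g.att lookup
        table.append_row()
    del table                       # drop the final reference too
    return FLUSH_COUNT

def batched(n):
    """Keep the table bound in a dict (Francesc's suggestion) and
    flush explicitly once at the end."""
    global FLUSH_COUNT
    FLUSH_COUNT = 0
    tables = {'att': FakeTable('att')}
    for _ in range(n):
        tables['att'].append_row()
    for t in tables.values():
        t.flush()
    return FLUSH_COUNT

print(naive(10))     # 10 flushes: one per row
print(batched(10))   # 1 flush: one per batch
```

This relies on CPython's reference counting, which finalizes an object as
soon as its last reference disappears; with the real library the same
per-row flushes are what turned the 50-second run into a 1055-second one.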