From: Alain F. <ala...@fr...> - 2005-11-29 19:01:18
|
"""Small but quite comprehensive example showing the use of PyTables.=0A= =0A= The program creates an output file, 'tutorial1.h5'. You can view it=0A= with any HDF5 generic utility.=0A= =0A= """=0A= =0A= =0A= import sys=0A= from numarray import *=0A= from tables import *=0A= =0A= # Define a user record to characterize some kind of particles=0A= class Particle(IsDescription):=0A= name =3D StringCol(16) # 16-character String=0A= idnumber =3D Int64Col() # Signed 64-bit integer=0A= ADCcount =3D UInt16Col() # Unsigned short integer=0A= TDCcount =3D UInt8Col() # unsigned byte=0A= grid_i =3D Int32Col() # integer=0A= grid_j =3D IntCol() # integer (equivalent to Int32Col)=0A= pressure =3D Float32Col() # float (single-precision)=0A= energy =3D FloatCol() # double (double-precision)=0A= =0A= def create_file():=0A= =0A= print=0A= print '-**-**-**-**-**-**- file creation -**-**-**-**-**-**-**-'=0A= =0A= # The name of our HDF5 filename=0A= filename =3D "tutorial1.h5"=0A= =0A= print "Creating file:", filename=0A= =0A= # Open a file in "w"rite mode=0A= h5file =3D openFile(filename, mode =3D "w", title =3D "Test file")=0A= =0A= print=0A= print '-**-**-**-**-**- group and table creation = -**-**-**-**-**-**-**-'=0A= =0A= # Create a new group under "/" (root)=0A= group =3D h5file.createGroup("/", 'detector', 'Detector information')=0A= print "Group '/detector' created"=0A= =0A= # Create one table on it=0A= table =3D h5file.createTable(group, 'readout', Particle, "Readout = example")=0A= print "Table '/detector/readout' created"=0A= =0A= return h5file=0A= =0A= def fill_10(h5file):=0A= for group in h5file.walkGroups("/detector"):=0A= entity_g =3D group=0A= =0A= table =3D entity_g.readout=0A= # Get a shortcut to the record object in table=0A= particle =3D table.row=0A= =0A= # Fill the table with 10 particles=0A= for i in xrange(10):=0A= particle['name'] =3D 'Particle: %6d' % (i)=0A= particle['TDCcount'] =3D i % 256=0A= particle['ADCcount'] =3D (i * 256) % (1 << 16)=0A= particle['grid_i'] =3D i=0A= particle['grid_j'] =3D 10 - i=0A= particle['pressure'] =3D float(i*i)=0A= particle['energy'] =3D float(particle['pressure'] ** 4)=0A= particle['idnumber'] =3D i * (2 ** 34)=0A= # Insert a new particle record=0A= particle.append()=0A= =0A= =0A= def _unittest():=0A= h5file =3D create_file()=0A= =0A= fill_10(h5file)=0A= =0A= # Flush the buffers for table=0A= for group in h5file.walkGroups("/detector"):=0A= entity_g =3D group=0A= =0A= table =3D entity_g.readout=0A= table.flush()# Close the file=0A= h5file.close()=0A= print "File '"+filename+"' created"=0A= =0A= if __name__ =3D=3D '__main__':=0A= _unittest()=0A= =0A= |
From: Waldemar O. <wal...@gm...> - 2005-11-29 19:30:14
Attachments:
test_tables.py
|
I have a very similar problem. After the upgrade, my program started failing during the flush() operation with an almost identical traceback to Alain's. The test suite passes OK:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  1.2
HDF5 version:      1.6.5
numarray version:  1.4.1
Zlib version:      1.2.3
Python version:    2.4.2 (#67, Sep 28 2005, 12:41:11)
                   [MSC v.1310 32 bit (Intel)]
Byte-ordering:     little
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Performing the complete test suite!
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Numeric (version 24.2) is present. Adding the Numeric test suite.
Scientific.IO.NetCDF not found. Skipping HDF5 <--> NetCDF conversion tests.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

But the attached program fails with:

C:\WINDOWS\system32\cmd.exe /c test_tables.py
Traceback (most recent call last):
  File "E:\work\BatchJobs\regii\test_tables.py", line 98, in ?
    main()
  File "E:\work\BatchJobs\regii\test_tables.py", line 94, in main
    walkDB(fileh, grp, 'db')
  File "E:\work\BatchJobs\regii\test_tables.py", line 86, in walkDB
    fileh.flush()
  File "C:\python\Python24\Lib\site-packages\tables\File.py", line 1843, in flush
    leaf.flush()
  File "C:\python\Python24\Lib\site-packages\tables\Table.py", line 1880, in flush
    self._saveBufferedRows()
  File "C:\python\Python24\Lib\site-packages\tables\Table.py", line 1413, in _saveBufferedRows
    self._open_append(self._v_wbuffer)
  File "TableExtension.pyx", line 361, in TableExtension.Table._open_append
AttributeError: 'NoneType' object has no attribute '_data'
shell returned 1
|
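The failure pattern common to both reports boils down to appending rows through one reference to a table, letting that reference go out of scope, and flushing later through the file handle. A minimal sketch of the pattern (hypothetical file and table names; PyTables 1.2 / numarray-era API; whether a bare del is enough to unbind the node may depend on other live references):

from tables import openFile, IsDescription, Int32Col

class Record(IsDescription):
    value = Int32Col()

h5file = openFile("repro.h5", mode="w")
table = h5file.createTable("/", "data", Record, "repro table")

row = table.row
for i in xrange(10):
    row['value'] = i
    row.append()   # rows are only buffered here, not yet written

del row, table     # the Table node becomes unreferenced, and
                   # PyTables 1.2 cleans its I/O buffers

h5file.flush()     # AttributeError: 'NoneType' object has no
                   # attribute '_data'
h5file.close()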
From: Francesc A. <fa...@ca...> - 2005-11-29 20:02:49
|
Hello,

Yes, I was aware of this problem shortly after releasing 1.2 :-(

Fortunately there is an easy workaround, which is flushing the table
immediately after ending the append loop:

for i in range(...):
    ....
    row.append()
table.flush()   # Add this after append loops

Alternatively, you can apply the next patch:

--- pytables-1.2/tables/Table.py  2005-11-07 17:30:41.000000000 +0100
+++ tables/Table.py  2005-11-29 20:47:42.357539540 +0100
@@ -1861,6 +1861,8 @@

     def _g_cleanIOBuf(self):
         """Clean the I/O buffers."""
+        # Flush the buffers before cleaning them up
+        self.flush()
         if 'row' in self.__dict__:
             # Decrement pointers to I/O buffers in Row instance
             self.row._cleanIOBuf()

Cheers,

On Tuesday 29 November 2005 20:00, Alain Fagot wrote:
> Hello,
> I switched from PyTables 1.1.1 to PyTables 1.2 and have a problem when
> flushing tables. I tried to reproduce the problem I have in my
> application with tutorial1-1.py. (I have attached the modified
> tutorial1-1.py)
>
> I created separate functions:
> - create_file: creates an HDF5 file with a "detector" group and a
>   "readout" table
> - fill_10: puts data into the "readout" table
> - _unittest: calls the two previous ones, then tries to flush the
>   "readout" table and close the HDF5 file
>
> When running I get the following error:
>
> Traceback (most recent call last):
>   File "tutorial1-1.py", line 87, in ?
>     _unittest()
>   File "tutorial1-1.py", line 82, in _unittest
>     table.flush()  # Close the file
>   File "C:\Logiciels\Devt\Python241\Lib\site-packages\tables\Table.py", line 1880, in flush
>     self._saveBufferedRows()
>   File "C:\Logiciels\Devt\Python241\Lib\site-packages\tables\Table.py", line 1413, in _saveBufferedRows
>     self._open_append(self._v_wbuffer)
>   File "TableExtension.pyx", line 361, in TableExtension.Table._open_append
> AttributeError: 'NoneType' object has no attribute '_data'

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
|
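Applied to the tutorial script above, the workaround amounts to flushing through the same reference that did the appending, before that reference can go away. A sketch (abbreviated to two columns):

def fill_10(h5file):
    table = h5file.getNode("/detector/readout")
    particle = table.row
    for i in xrange(10):
        particle['name'] = 'Particle: %6d' % (i)
        particle['grid_i'] = i
        particle.append()
    # Flush while this reference is still alive, so the buffered
    # rows reach disk before the node's I/O buffers can be cleaned.
    table.flush()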
From: Waldemar O. <wal...@gm...> - 2005-11-30 04:05:23
|
Thanks for the patch. It fixed the error. Waldemar |
From: Alain F. <ala...@fr...> - 2005-11-30 10:41:10
|
Hi Francesc,

Thanks for your quick answer. Unfortunately, due to the software
architecture, I cannot apply the first method: users can create rows
from a high-level API and then decide to flush after a number of rows
have been created.

So I applied the patch you proposed. The software runs OK now, but it
has slowed down. It seems that the effect of the patch is that a flush
is done automatically after each row creation (or something like that).

Example: create 100,000 rows and flush every 10,000.
- runs in 50 seconds with PyTables 1.1.1
- runs in 1055 seconds with the patch in PyTables 1.2

For the time being I am switching back to PyTables 1.1.1, and I will
follow the evolution of PyTables 1.2.

Thanks for your help,
Best Regards

----- Original Message -----
From: "Francesc Altet" <fa...@ca...>
To: <pyt...@li...>
Cc: "Alain Fagot" <ala...@fr...>; "Waldemar Osuch" <wal...@gm...>
Sent: Tuesday, November 29, 2005 8:55 PM
Subject: Re: [Pytables-users] Problem flushing table with pytables 1.2

> Yes, I was aware of this problem shortly after releasing 1.2 :-(
> Fortunately there is an easy workaround, which is flushing the table
> immediately after ending the append loop. [...] Alternatively, you can
> apply the next patch. [...]
|
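For anyone wanting to reproduce this kind of measurement, a sketch of the timing loop Alain describes (hypothetical schema and file name; the 50 s / 1055 s figures above are his, not re-measured here):

import time
from tables import openFile, IsDescription, Int64Col

class Record(IsDescription):
    value = Int64Col()

h5file = openFile("bench.h5", mode="w")
table = h5file.createTable("/", "bench", Record, "timing table")

start = time.time()
row = table.row
for i in xrange(100000):
    row['value'] = i
    row.append()
    if (i + 1) % 10000 == 0:
        table.flush()   # flush every 10,000 rows, as in Alain's test
h5file.close()
print "Elapsed:", time.time() - start, "seconds"

Note that, as it turns out later in the thread, the slowdown depends on how the table reference is held, so a loop like this one, which keeps a single bound reference, may well not reproduce it.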
From: Francesc A. <fa...@ca...> - 2005-12-07 18:29:58
|
On Wednesday 30 November 2005 11:41, Alain Fagot wrote:
> Unfortunately, due to the software architecture, I cannot apply the
> first method: users can create rows from a high-level API and then
> decide to flush after a number of rows have been created.
>
> So I applied the patch you proposed. The software runs OK now, but it
> has slowed down. It seems that the effect of the patch is that a flush
> is done automatically after each row creation (or something like that).
> Example: create 100,000 rows and flush every 10,000.
> - runs in 50 seconds with PyTables 1.1.1
> - runs in 1055 seconds with the patch in PyTables 1.2

Mmmm, I don't quite understand why this is happening. Could you please
send me a small example showing this slowdown in PyTables 1.2+patch?
I'd like to investigate this further.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
|
From: Francesc A. <fa...@ca...> - 2005-12-09 11:16:11
|
Hi Alain,

I know what is happening with your code. The problem is that, for each
row added to a table, you reference the table and then unreference it.
This is a bad policy in terms of efficiency, especially with the advent
of PyTables 1.2, which cleans the buffers of unbound (unreferenced)
tables. I think this cleaning is a good thing to happen, in order to
save memory when you deal with a large number of tables in the same
PyTables session.

A quick-and-dirty workaround is to avoid the table becoming unbound by
assigning the table to a slot in Entity1:

--- hdf_lib.py        2005-12-09 12:02:20.766303965 +0100
+++ hdf_lib.py.modif  2005-12-09 12:02:00.645095912 +0100
@@ -75,7 +75,7 @@
 ##----------------------------------------------------------------------------
 ##----------------------------------------------------------------------------
 class Entity1(object):
-    __slots__ = ('_HDFId',
+    __slots__ = ('_HDFId', 'table'
                  )
 ##----------------------------------------------------------------------------
@@ -90,6 +90,7 @@
         entity_g._v_attrs.lastHDFId += 1
         self._HDFId = entity_g._v_attrs.lastHDFId
         table = entity_g.att
+        self.table = table
         an_entity = table.row
         an_entity['HDFId'] = self._HDFId
         an_entity['value'] = att

In that way, the table never gets unbound, and the flushing of the
buffers does not take place after each Entity1 creation.

However, the most elegant solution would be to keep all the tables that
you are going to fill simultaneously bound in a list or a dictionary,
and loop over them. If you do this, you will notice much better
performance in your application, both with PyTables 1.1.1 and
1.2+patch.

Cheers,

On Thursday 08 December 2005 18:05, you wrote:
> Hi Francesc,
>
> It took some effort, but I succeeded in reproducing the problem with
> the smallest program I was able to create. Our current software
> architecture is quite complex ;-)
>
> My computer configuration is:
> - Windows XP
> - Python 2.4.1
> - HDF5 1.6.5
> - PyTables 1.2
> - numarray 1.4.1
>
> The problem didn't occur with previous versions of PyTables and the
> associated HDF5 libraries.
>
> This is the result:
> - test_pytables_1_2.py: the main program
> - hdf_lib: a library handling
>   + the HDF5 file, through the HDF_dataset class
>   + row creation, through the Entity1 class
> - Table.py: with the patch you gave me.
>
> It seems to me that the problem is due to the fact that the HDF5 file
> is handled through the hdf_lib:HDF_dataset class.
>
> In Table.py, line 1861, I added a print after your patch to visualize
> when the method is called:
>
>     def _g_cleanIOBuf(self):
>         """Clean the I/O buffers."""
>         # Flush the buffers before cleaning them up
>         self.flush()
>         print "flushing in pyTables"
>
> In test_pytables_1_2, line 18, you can change the value of the xrange
> (currently 10 rows are created):
>
>     for i in xrange(10):
>         value = "entity Number " + str(i)
>         E1 = Entity1(value)
>
> The creation of an Entity1 instance (hdf_lib.Entity1.__init__) simply
> creates the row and appends it to the table, without a flush.
>
> However, with this architecture, _g_cleanIOBuf() seems to be called at
> each Entity1 creation.
> Hereafter is the output on the console:
>
> Creation of Dataset
> flushing in pyTables
> Creation of Dataset done
> -------------
> Creating rows
> In hdf_lib.Entity1.__init__ : att = entity Number 0
> End of hdf_lib.Entity1.__init__
> flushing in pyTables
> . . .
> In hdf_lib.Entity1.__init__ : att = entity Number 9
> End of hdf_lib.Entity1.__init__
> flushing in pyTables
>
> Ellapsed_time 0.145311485119
> In hdf_lib.Entity1.flush
> flushing in pyTables
> Creating rows done
> -------------
> Closing hdf5 file
> Closing hdf5 file done
>
> Thanks for your support.
>
> Best regards,
>
> Alain Fagot

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
|
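The distinction Francesc draws, a table reference that dies after every row versus one held across the whole loop, can be sketched as follows (hypothetical node path, patterned on the Entity1 code above):

# Anti-pattern: the Table node is re-referenced and dropped once per
# row, so PyTables 1.2 cleans (and, with the patch, flushes) its
# buffers on every single append.
def add_row_rebinding(h5file, value):
    table = h5file.getNode("/entities/att")
    row = table.row
    row['value'] = value
    row.append()
    # 'table' dies on return, triggering _g_cleanIOBuf()

# Better: hold one bound reference for the whole loop and flush once.
def add_rows_bound(h5file, values):
    table = h5file.getNode("/entities/att")
    row = table.row
    for value in values:
        row['value'] = value
        row.append()
    table.flush()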
From: Alain F. <ala...@fr...> - 2005-12-09 13:54:16
Attachments:
hdf_lib.py
test_pytables_1_2.py
|
Hi Francesc,

It works perfectly again now, following your recommendations: 100,000
rows created in one minute. I put a dictionary of tables in a slot of
the HDF_dataset class. I have attached the corrected source code.

Thanks a lot for your help,
Kind Regards,

Alain Fagot

----- Original Message -----
From: "Francesc Altet" <fa...@ca...>
To: "Alain Fagot" <ala...@fr...>
Cc: <pyt...@li...>
Sent: Friday, December 09, 2005 12:15 PM
Subject: Re: [Pytables-users] Problem flushing table with pytables 1.2

> However, the most elegant solution would be to keep all the tables
> that you are going to fill simultaneously bound in a list or a
> dictionary, and loop over them. If you do this, you will notice much
> better performance in your application, both with PyTables 1.1.1 and
> 1.2+patch.
|
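For completeness, a sketch of the approach Alain describes: keeping every table bound in a dictionary held by the dataset class (hypothetical names patterned on the hdf_lib classes in this thread, not the actual attachment):

from tables import openFile, IsDescription, Int64Col, StringCol

class Entity(IsDescription):
    HDFId = Int64Col()
    value = StringCol(64)

class HDF_dataset(object):
    """Owns the file and keeps every table bound in a dictionary,
    so no Table node becomes unreferenced in mid-session."""
    __slots__ = ('h5file', 'tables')

    def __init__(self, filename):
        self.h5file = openFile(filename, mode="w")
        group = self.h5file.createGroup("/", 'entities', 'Entities')
        self.tables = {'att': self.h5file.createTable(group, 'att',
                                                      Entity, "Entity1 table")}

    def add_entity(self, hdf_id, value):
        row = self.tables['att'].row   # always through the bound table
        row['HDFId'] = hdf_id
        row['value'] = value
        row.append()

    def close(self):
        for table in self.tables.itervalues():
            table.flush()              # one flush per table, at the end
        self.h5file.close()

# Usage sketch:
# ds = HDF_dataset("entities.h5")
# for i in xrange(100000):
#     ds.add_entity(i, "entity Number %d" % i)
# ds.close()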