From: Francesc A. <fa...@ca...> - 2005-09-02 14:58:33
Hi Francesco,

This problem is related to the slowness of element-by-element assignment in numarray objects. If you want to achieve high write performance with PyTables, it is better to use the Table.append method (instead of Row.append). I normally use code like the following:

def fill_arrays(self, start, stop):
    "Some generic filling function"
    arr_f8 = numarray.arange(start, stop, type=numarray.Float64)
    arr_i4 = numarray.arange(start, stop, type=numarray.Int32)
    if self.userandom:
        arr_f8 += random_array.normal(0, stop*self.scale,
                                      shape=[stop-start])
        arr_i4 = numarray.array(arr_f8, type=numarray.Int32)
    return arr_i4, arr_f8

def fill_table(self, con):
    "Fills the table"
    table = con.root.table
    j = 0
    for i in xrange(0, self.nrows, self.step):
        stop = (j+1)*self.step
        if stop > self.nrows:
            stop = self.nrows
        arr_i4, arr_f8 = self.fill_arrays(i, stop)
        recarr = records.fromarrays([arr_i4, arr_f8])
        table.append(recarr)
        j += 1
    table.flush()

in order to fill a table with two columns (Int32 and Float64). If you try this, I'm sure you will get much better results.
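The same chunked bulk-append pattern can be sketched with present-day NumPy (numarray's successor) structured record arrays. This is an illustrative standalone sketch, not PyTables code: the names `fill_arrays`, `fill_records`, `CHUNK`, and the column names are invented here, and `parts.append`/`np.concatenate` stands in for what `table.append(recarr)` would do against an on-disk table:

```python
import numpy as np

CHUNK = 10_000   # illustrative chunk size
NROWS = 100_000  # illustrative total row count

def fill_arrays(start, stop):
    """Build one chunk of column data (vectorized, no Python-level loop)."""
    arr_f8 = np.arange(start, stop, dtype=np.float64)
    arr_i4 = np.arange(start, stop, dtype=np.int32)
    return arr_i4, arr_f8

def fill_records(nrows, chunk):
    """Assemble the table contents chunk by chunk, bulk-append style."""
    parts = []
    for start in range(0, nrows, chunk):
        stop = min(start + chunk, nrows)
        arr_i4, arr_f8 = fill_arrays(start, stop)
        # One record array per chunk -- this is the object that would be
        # handed to table.append(recarr) in PyTables.
        recarr = np.rec.fromarrays([arr_i4, arr_f8], names='i4col,f8col')
        parts.append(recarr)
    return np.concatenate(parts)

data = fill_records(NROWS, CHUNK)
```

The point of the pattern is that each chunk is built with vectorized operations and handed over in one call, so the per-row Python overhead disappears.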
Cheers,

On Friday, 2 September 2005 at 16:09 +0200, Francesco Del Degan wrote:
> Hi, I have an issue with PyTables performance:
>
> This is my Python code for testing:
>
> ---SNIP---
> from tables import *
>
> class PytTest(IsDescription):
>     string = Col('CharType', 16)
>     id = Col('Int32', 1)
>     float = Col('Float64', 1)
>
> h5file = openFile('probe.h5', 'a')
>
> try:
>     testGroup = h5file.root.testGroup
> except NoSuchNodeError:
>     testGroup = h5file.createGroup(
>         "/", "testGroup", "Test Group")
> try:
>     tbTest = testGroup.test
> except NoSuchNodeError:
>     tbTest = h5file.createTable(
>         testGroup,
>         'test',
>         PytTest,
>         'Test table')
>
> import time
>
> maxRows = 10**6
>
> ### TEST1 ###
> startTime = time.time()
> row = tbTest.row
> for i in range(0, maxRows):
>     row['string'] = '1234567890123456'
>     row['id'] = 1
>     row['float'] = 1.0/3.0
>     row.append()
> tbTest.flush()
> diffTime = time.time() - startTime
> print 'test1: %d rows in %s seconds (%s/s)' % (maxRows, diffTime,
>                                                maxRows/diffTime)
>
> ### TEST2 ###
> startTime = time.time()
> row = tbTest.row
> for i in range(0, maxRows):
>     row['string'] = '1234567890123456'
>     row['id'] = 1
>     row['float'] = 1.0/3.0
> diffTime = time.time() - startTime
> print 'test2: %d rows in %s seconds (%s/s)' % (maxRows, diffTime,
>                                                maxRows/diffTime)
>
> ### TEST3 ###
> startTime = time.time()
> row = tbTest.row
> row['string'] = '1234567890123456'
> row['id'] = 1
> row['float'] = 1.0/3.0
> for i in range(0, maxRows):
>     row.append()
> tbTest.flush()
> diffTime = time.time() - startTime
> print 'test3: %d rows in %s seconds (%s/s)' % (maxRows, diffTime,
>                                                maxRows/diffTime)
> h5file.close()
>
> ---SNIP---
>
> This code tries to insert maxRows (10**6) rows into a table.
> The table is similar to the table in:
> http://pytables.sourceforge.net/doc/PyCon.html#section4 (small table)
> used for benchmarking:
>
> class Small(IsDescription):
>     var1 = Col("CharType", 16)
>     var2 = Col("Int32", 1)
>     var3 = Col("Float64", 1)
>
> As you'll notice, there are 3 possible tests:
> TEST 1: creation of rows and append() in loop
> TEST 2: creation of rows in loop, no append (no disk use)
> TEST 3: creation of row before loop, and append() in loop
>
> flush is always outside the loop, at the end.
>
> The testbed is an AMD Athlon(tm) 64 Processor 2800+, 1 GB RAM, and a
> 5400 rpm disk.
> I've seen the same results on a dual Xeon machine, 1 GB RAM, SCSI disk.
>
> testbed:~# python test.py
>
> test1: 1000000 rows in 22.7905650139 seconds (43877.8064252/s)
> test2: 1000000 rows in 20.3718218803 seconds (49087.4113211/s)
> test3: 1000000 rows in 2.01304578781 seconds (496759.68925/s)
>
> That throughput (40-50 krows/s) is about 10 times less than that in
> http://pytables.sourceforge.net/doc/PyCon.html#section4 (small table).
>
> It seems that the row assignment:
>
> row[fieldName] = value
>
> takes a huge amount of time, and that the time for writing to disk is
> 10 times smaller than the assignment.
> Am I doing something wrong?
>
> I've made some tests on the source code, and I've realized that, in
> TableExtension.pyx, in __setitem__ of Row (called when I do
> row[...] = value), the line:
>
> self._wfields[fieldName][self._unsavednrows] = value
>
> is responsible for that slowness.
>
> self._wfields[fieldName] is a numarray.array, isn't it? Does assignment
> really take so much time compared to disk writes?
>
> I can do a strace of the process if you need.
>
> I've tried with pytables 1.1 and 1.2-b1 compiled from source,
> and numarray 1.1.1, 1.3.2, 1.3.3 compiled from source, with the same
> results.
>
> Is this normal behaviour, in your opinion?
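The asymmetry the three tests show (assignment dominating actual disk I/O) can be reproduced without PyTables at all: filling an array element by element from Python pays an interpreter-level cost on every element, while a vectorized fill pays it once. A minimal sketch of that comparison with NumPy (the array size and variable names here are illustrative, and only correctness, not the exact timings, is guaranteed):

```python
import time
import numpy as np

N = 200_000  # illustrative element count

# Element-by-element assignment: one Python-level __setitem__ call per
# value, which is essentially what Row.__setitem__ amounts to per field.
a = np.empty(N, dtype=np.float64)
t0 = time.time()
for i in range(N):
    a[i] = 1.0 / 3.0
elementwise = time.time() - t0

# Vectorized assignment: the loop runs in C, one Python call in total.
b = np.empty(N, dtype=np.float64)
t0 = time.time()
b[:] = 1.0 / 3.0
vectorized = time.time() - t0

print('elementwise: %.4fs  vectorized: %.4fs' % (elementwise, vectorized))
```

On typical hardware the element-by-element loop is orders of magnitude slower, which matches the observation that `self._wfields[fieldName][self._unsavednrows] = value` dominates the row-by-row benchmarks.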
>
> Thanks in advance,
> kesko78

--
>0,0<  Francesc Altet     http://www.carabos.com/
 V V   Cárabos Coop. V.   Enjoy Data
 "-"