From: Francesc A. <fa...@ca...> - 2005-09-02 14:58:33
Hi Francesco,
This problem is related to the slowness of element-by-element assignment
in numarray objects. If you want high write performance with PyTables,
it is better to use the Table.append method (instead of Row.append).
I normally use code like the following:
    def fill_arrays(self, start, stop):
        "Some generic filling function"
        arr_f8 = numarray.arange(start, stop, type=numarray.Float64)
        arr_i4 = numarray.arange(start, stop, type=numarray.Int32)
        if self.userandom:
            arr_f8 += random_array.normal(0, stop*self.scale,
                                          shape=[stop-start])
            arr_i4 = numarray.array(arr_f8, type=numarray.Int32)
        return arr_i4, arr_f8

    def fill_table(self, con):
        "Fills the table"
        table = con.root.table
        j = 0
        for i in xrange(0, self.nrows, self.step):
            stop = (j+1)*self.step
            if stop > self.nrows:
                stop = self.nrows
            arr_i4, arr_f8 = self.fill_arrays(i, stop)
            recarr = records.fromarrays([arr_i4, arr_f8])
            table.append(recarr)
            j += 1
        table.flush()
in order to fill a table with two columns (Int32 and Float64).
If you try this, I'm sure you will get much better results.
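In case numarray is not at hand, the same chunked-append idea can be
sketched with NumPy structured arrays (used here only as a stand-in for
numarray.records; `append_chunk` is a hypothetical callback playing the
role of Table.append, and the `var2`/`var3` column names follow the
Small table below):

```python
import numpy as np

def fill_arrays(start, stop):
    """Build one chunk of column data (vectorized, no per-element loop)."""
    arr_f8 = np.arange(start, stop, dtype=np.float64)
    arr_i4 = np.arange(start, stop, dtype=np.int32)
    return arr_i4, arr_f8

def fill_table(append_chunk, nrows, step):
    """Fill a table in chunks of `step` rows, one append per chunk."""
    for start in range(0, nrows, step):
        stop = min(start + step, nrows)
        arr_i4, arr_f8 = fill_arrays(start, stop)
        # pack both columns into a single record (structured) array
        recarr = np.empty(stop - start, dtype=[('var2', np.int32),
                                               ('var3', np.float64)])
        recarr['var2'] = arr_i4
        recarr['var3'] = arr_f8
        append_chunk(recarr)

# usage, with a plain list standing in for the on-disk table
chunks = []
fill_table(chunks.append, nrows=10, step=4)
total = sum(len(c) for c in chunks)
print(total)  # 10 rows appended in 3 chunks
```

The point is the same as in the numarray version: the per-element work
happens inside vectorized array operations, and the table sees one
append call per chunk instead of one per row.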
Cheers,
On Fri, 2005-09-02 at 16:09 +0200, Francesco Del Degan wrote:
> Hi, I have an issue with PyTables performance.
>
> This is my Python code for testing:
>
> ---SNIP---
> from tables import *
>
> class PytTest(IsDescription):
>     string = Col('CharType', 16)
>     id = Col('Int32', 1)
>     float = Col('Float64', 1)
>
> h5file = openFile('probe.h5', 'a')
>
> try:
>     testGroup = h5file.root.testGroup
> except NoSuchNodeError:
>     testGroup = h5file.createGroup(
>         "/", "testGroup", "Test Group")
> try:
>     tbTest = testGroup.test
> except NoSuchNodeError:
>     tbTest = h5file.createTable(
>         testGroup,
>         'test',
>         PytTest,
>         'Test table')
>
> import time
>
> maxRows = 10**6
>
> ### TEST1 ###
> startTime = time.time()
> row = tbTest.row
> for i in range(0, maxRows):
>     row['string'] = '1234567890123456'
>     row['id'] = 1
>     row['float'] = 1.0/3.0
>     row.append()
> tbTest.flush()
> diffTime = time.time() - startTime
> print 'test1: %d rows in %s seconds (%s/s)' % (maxRows, diffTime,
>                                                maxRows/diffTime)
>
> ### TEST2 ###
> startTime = time.time()
> row = tbTest.row
> for i in range(0, maxRows):
>     row['string'] = '1234567890123456'
>     row['id'] = 1
>     row['float'] = 1.0/3.0
> diffTime = time.time() - startTime
> print 'test2: %d rows in %s seconds (%s/s)' % (maxRows, diffTime,
>                                                maxRows/diffTime)
>
> ### TEST3 ###
> startTime = time.time()
> row = tbTest.row
> row['string'] = '1234567890123456'
> row['id'] = 1
> row['float'] = 1.0/3.0
> for i in range(0, maxRows):
>     row.append()
> tbTest.flush()
> diffTime = time.time() - startTime
> print 'test3: %d rows in %s seconds (%s/s)' % (maxRows, diffTime,
>                                                maxRows/diffTime)
> h5file.close()
>
> ---SNIP---
>
> This code tries to insert maxRows (10**6) rows into a table. The table
> is similar to the one in
> http://pytables.sourceforge.net/doc/PyCon.html#section4 (small table)
> used for benchmarking:
>
> class Small(IsDescription):
>     var1 = Col("CharType", 16)
>     var2 = Col("Int32", 1)
>     var3 = Col("Float64", 1)
>
> As you'll notice, there are 3 possible tests:
> TEST 1: creation of rows and append() in the loop
> TEST 2: creation of rows in the loop, no append (no disk use)
> TEST 3: creation of the row before the loop, append() in the loop
>
> flush() is always outside the loop, at the end.
>
> The testbed is an AMD Athlon(tm) 64 Processor 2800+, 1 GB RAM, and a
> 5400 rpm disk.
> I've seen the same results on a dual Xeon machine, 1 GB RAM, SCSI disk.
>
> testbed:~# python test.py
>
> test1: 1000000 rows in 22.7905650139 seconds (43877.8064252/s)
> test2: 1000000 rows in 20.3718218803 seconds (49087.4113211/s)
> test3: 1000000 rows in 2.01304578781 seconds (496759.68925/s)
>
> That throughput (40-50 krows/s) is roughly 10 times lower than the one
> reported in
> http://pytables.sourceforge.net/doc/PyCon.html#section4 (small table).
>
> It seems that the row assignment:
>
> row[fieldName] = value
>
> takes a huge amount of time, and that the time spent writing to disk
> is 10 times smaller than the assignment.
> Am I doing something wrong?
>
> I've made some tests on the source code, and I've realized that in
> TableExtension.pyx, in __setitem__ of Row (called when I do a
> row[...] = value), the line:
>
> self._wfields[fieldName][self._unsavednrows] = value
>
> is responsible for that slowness.
>
> self._wfields[fieldName] is a numarray.array, isn't it? Does the
> assignment really take so much time compared to the disk write?
>
> I can do a strace of the process if you need it.
>
> I've tried PyTables 1.1 and 1.2-b1 compiled from source, and numarray
> 1.1.1, 1.3.2, and 1.3.3 compiled from source, with the same results.
>
> Is this normal behaviour, in your opinion?
>
> Thanks in advance,
> kesko78
>
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
-- 
>0,0< Francesc Altet     http://www.carabos.com/
V V   Cárabos Coop. V.   Enjoy Data
 "-"