From: Francesco D. D. <ke...@li...> - 2005-09-02 14:09:10
Hi, I have an issue with PyTables performance. This is my Python test code:

---SNIP---
from tables import *

class PytTest(IsDescription):
    string = Col('CharType', 16)
    id = Col('Int32', 1)
    float = Col('Float64', 1)

h5file = openFile('probe.h5', 'a')
try:
    testGroup = h5file.root.testGroup
except NoSuchNodeError:
    testGroup = h5file.createGroup("/", "testGroup", "Test Group")
try:
    tbTest = testGroup.test
except NoSuchNodeError:
    tbTest = h5file.createTable(testGroup, 'test', PytTest, 'Test table')

import time
maxRows = 10**6

### TEST1 ###
startTime = time.time()
row = tbTest.row
for i in range(0, maxRows):
    row['string'] = '1234567890123456'
    row['id'] = 1
    row['float'] = 1.0/3.0
    row.append()
tbTest.flush()
diffTime = time.time() - startTime
print 'test1: %d rows in %s seconds (%s/s)' % (maxRows, diffTime, maxRows/diffTime)

### TEST2 ###
startTime = time.time()
row = tbTest.row
for i in range(0, maxRows):
    row['string'] = '1234567890123456'
    row['id'] = 1
    row['float'] = 1.0/3.0
diffTime = time.time() - startTime
print 'test2: %d rows in %s seconds (%s/s)' % (maxRows, diffTime, maxRows/diffTime)

### TEST3 ###
startTime = time.time()
row = tbTest.row
row['string'] = '1234567890123456'
row['id'] = 1
row['float'] = 1.0/3.0
for i in range(0, maxRows):
    row.append()
tbTest.flush()
diffTime = time.time() - startTime
print 'test3: %d rows in %s seconds (%s/s)' % (maxRows, diffTime, maxRows/diffTime)

h5file.close()
---SNIP---

This code tries to insert maxRows (10**6) rows into a table. The table is similar to the one in http://pytables.sourceforge.net/doc/PyCon.html#section4 (small table) used for benchmarking:

    class Small(IsDescription):
        var1 = Col("CharType", 16)
        var2 = Col("Int32", 1)
        var3 = Col("Float64", 1)

As you'll notice, there are 3 tests:

TEST 1: creation of rows and append() in the loop
TEST 2: creation of rows in the loop, no append (no disk use)
TEST 3: creation of the row before the loop, append() in the loop

flush() is always outside the loop, at the end.

The testbed is an AMD Athlon(tm) 64 Processor 2800+, 1 GB RAM, and a 5400 rpm disk. I've seen the same results on a dual Xeon machine, 1 GB RAM, SCSI disk.

testbed:~# python test.py
test1: 1000000 rows in 22.7905650139 seconds (43877.8064252/s)
test2: 1000000 rows in 20.3718218803 seconds (49087.4113211/s)
test3: 1000000 rows in 2.01304578781 seconds (496759.68925/s)

That throughput (40-50 krows/s) is about 10 times less than the one in http://pytables.sourceforge.net/doc/PyCon.html#section4 (small table). It seems that the row assignment

    row[fieldName] = value

takes a huge amount of time, and that the time for actually writing to disk is 10 times smaller than the assignment. Am I doing something wrong?

I've looked at the source code, and I've realized that in TableExtension.pyx, in __setitem__ of Row (called when I do row[...] = value), the line

    self._wfields[fieldName][self._unsavednrows] = value

is responsible for that slowness. self._wfields[fieldName] is a numarray.array, isn't it? Does assignment really take that much time compared to the disk? I can do a strace of the process if you need it.

I've tried with PyTables 1.1 and 1.2-b1 compiled from source, and numarray 1.1.1, 1.3.2 and 1.3.3 compiled from source, with the same results. Is this normal behaviour, in your opinion?

Thanks in advance,
kesko78
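A side note on the measurement above: since TEST2 (assignment only) is nearly as slow as TEST1 (assignment plus append), the per-field `row[name] = value` calls dominate, not the disk writes. One general way around that cost is to build finished records first and hand them over in a single bulk call, in the spirit of passing whole columns to Table.append(). The following is a pure-Python sketch of that pattern only — the names and containers are hypothetical, not the PyTables API:

```python
# Per-record field assignment: three __setitem__-style operations per row,
# mirroring the slow path measured in TEST1/TEST2.
def fill_per_field(dest, nrows):
    for _ in range(nrows):
        rec = {}
        rec['string'] = '1234567890123456'
        rec['id'] = 1
        rec['float'] = 1.0 / 3.0
        dest.append(rec)
    return dest

# Bulk path: build complete records up front, append them in one call.
def fill_bulk(dest, nrows):
    template = {'string': '1234567890123456', 'id': 1, 'float': 1.0 / 3.0}
    dest.extend(dict(template) for _ in range(nrows))
    return dest

a = fill_per_field([], 1000)
b = fill_bulk([], 1000)
assert a == b  # identical rows either way; only the call pattern differs
```

The rows produced are the same; the point is that the bulk path replaces a million small per-field operations with one pass over prepared records.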
From: Francesc A. <fa...@py...> - 2005-08-30 17:14:48
Hi Nicolas,

On Thursday 25 August 2005 at 18:52, you wrote:
> Hi Francesco,
>
> (Although a subscriber of the pytables-users mailing list, I couldn't
> post to it because the smtp server of my ISP is listed in a spam list,
> if I understood correctly. I hope you'll receive this mail directly...)

Most probably your ISP has assigned you an IP that is on a black list (i.e. a list of IPs that sent spam recently). The solution is not easy; the best option is to ask your ISP for a fixed IP (one that is not on a black list!) and send your messages from there.

> I need to store attributes (integers, floats, arrays of floats) via
> pytables, that must be read from a fortran program.
> Let's say I need to store/read from node n the following attributes:
[stripped out...]
> Is it possible to store all my attributes using the simplest code, such
> as "code version 1", and to be able to read them in C or fortran? Could
> you tell me how?

Yes, this has been discussed earlier on the list; see for example:

https://sourceforge.net/mailarchive/message.php?msg_id=12493677

In particular, read the node labeled "Caveat Emptor" in the AttributeSet section of the reference chapter of the PyTables manual:

http://pytables.sourceforge.net/html-doc/usersguide4.html#section4.15

HTH,

--
Francesc Altet
From: Francesc A. <fa...@ca...> - 2005-08-30 17:00:52
Hi John,

On Sunday 14 August 2005 at 03:18, John Pitney wrote:
> Hi,
>
> I'm having trouble installing PyTables 1.1 from the source tarball on my
> Fedora Core 3 x86_64 machine with Python 2.3.4.
>
> To get the build to find my HDF5 libs, I had to change 'lib/lib' in
> setup.py to 'lib64/lib' to reflect the location of my HDF5, zlib, etc.
> libs. The build seems to go OK, but when I try running the tests, I get
> the following:

Mmm, I think it's time to add more directories to the search path list in setup.py. Would you mind sending me your modifications to setup.py so that I can figure out how to check for the new library dirs?

> $ python test_all.py
> Traceback (most recent call last):
>   File "test_all.py", line 166, in ?
>     import tables
>   File "/home/johnp/Desktop/pytables-1.1/tables/__init__.py", line 33, in ?
>     from tables.utilsExtension import \
> ImportError: /usr/lib64/libhdf5.so.0: undefined symbol: inflate
>
> I tried running h5ls and h5dump installed from a binary HDF5 RPM on the
> example HDF5 files, and they work OK. According to ldd, they are linked
> to /usr/lib64/libhdf5.so.0.

inflate is a function from the zlib library. Please run ldd over libhdf5.so and check that all the shared libraries it depends on are accessible.

> Maybe unrelated:
>
> If I do this in the tests directory:
>
> $ ( for f in *.h5 ; do echo $f ; h5dump $f 2>&1 ; done ) | less
>
> I see a message saying "h5dump error: unable to print data" after the
> "DATA {" line on every dataset with a name starting with "tuple". Is
> that normal?

Yes, this is unrelated. The problem here is that some of the files in the test directory are compressed with the LZO and UCL compressors, which are unsupported by native HDF5 as of now. However, PyTables can take care of them (try using ptdump, for example), so don't worry.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
From: Francesc A. <fa...@ca...> - 2005-08-30 16:49:47
Hi Tim,

On Sunday 14 August 2005 at 03:14, Tim Churches wrote:
> The following code used to work with PyTables 0.9 but now fails with
> PyTables 1.1, and I don't understand why. Can anyone provide some clues?

This is a bug in PyTables 1.1 (as well as in PyTables 1.2-b1). The attached patch offers a *very preliminary* cure; however, performance is very low for the kind of operations you are trying to do. This is mainly due to some naive code in the factory functions for the NestedRecord object introduced in PyTables 1.1. If you don't need nested records, please use PyTables 1.0 for benchmarking. PyTables 1.1 and higher will eventually achieve similar performance (if not better), once this issue is properly addressed.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
From: Francesc A. <fa...@ca...> - 2005-08-30 12:45:41
Hi Peter,

On Thursday 11 August 2005 at 21:09, Peter Dobcsanyi wrote:
> While installing pytables some of the numarray related tests failed on
> an Opteron based machine. I installed both 1.1 and 1.2-b1 with the same
> result. On the i686 platform all tests succeeded for both versions.

You have discovered a bug on 64-bit platforms. Please apply the attached patch (it is for 1.2-b1, but it should work just fine with 1.1). This will be included in forthcoming releases of PyTables.

Thanks for noting this!

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
From: Francesc A. <fa...@ca...> - 2005-08-30 10:11:48
Hi Philippe (and others),

[I'm just back from vacation, so I'll try to address the questions that have been posed on the pytables list.]

On Thursday 04 August 2005 at 11:37, phi...@ho... wrote:
> Is there a lot of work done when we execute
> tables.openFile(filename, mode)?

Well, this mainly depends on how many nodes your file has. Up to version 1.1, PyTables was able to open nodes at a speed of roughly 1000/second (on modern CPUs). If your files have far fewer nodes than, say, 1000, then you can close and reopen your file without too much latency. On the contrary, if you have a lot of nodes in your files, then it's better not to close/reopen your file, if you can afford that.

In the forthcoming PyTables 1.2, the opening of files has been accelerated quite a bit by not opening all the nodes in the file but just the ones that are being used (in fact, a completely new object tree cache has been implemented). The new cache also improves memory consumption. See the following report if this process is critical for you:

http://pytables.sourceforge.net/doc/NewObjectTreeCache.pdf

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
From: John P. <jo...@pi...> - 2005-08-14 01:19:07
Hi,

I'm having trouble installing PyTables 1.1 from the source tarball on my Fedora Core 3 x86_64 machine with Python 2.3.4.

To get the build to find my HDF5 libs, I had to change 'lib/lib' in setup.py to 'lib64/lib' to reflect the location of my HDF5, zlib, etc. libs. The build seems to go OK, but when I try running the tests, I get the following:

$ python test_all.py
Traceback (most recent call last):
  File "test_all.py", line 166, in ?
    import tables
  File "/home/johnp/Desktop/pytables-1.1/tables/__init__.py", line 33, in ?
    from tables.utilsExtension import \
ImportError: /usr/lib64/libhdf5.so.0: undefined symbol: inflate

I tried running h5ls and h5dump installed from a binary HDF5 RPM on the example HDF5 files, and they work OK. According to ldd, they are linked to /usr/lib64/libhdf5.so.0.

Any suggestions for how to fix this? Is it a problem with my HDF5 libraries?

Maybe unrelated: if I do this in the tests directory:

$ ( for f in *.h5 ; do echo $f ; h5dump $f 2>&1 ; done ) | less

I see a message saying "h5dump error: unable to print data" after the "DATA {" line on every dataset with a name starting with "tuple". Is that normal?

Thanks!

John
From: Tim C. <tc...@op...> - 2005-08-14 01:14:09
The following code used to work with PyTables 0.9 but now fails with PyTables 1.1, and I don't understand why. Can anyone provide some clues?

Tim C

#####################################
import time
import sys
import numarray
import numarray.random_array as ra
from numarray import memmap
from tables import *

nor = 100000
#bnor = nor*100
bnor = nor*300
filters = None

# Create a Table
starttime = time.time()
fileh = openFile("array1.h5", mode="w")
# Get the root group
root = fileh.root

# define table class
class TestTable(IsDescription):
    mycol = FloatCol(pos=1)

# create table
mytable = fileh.createTable(root, 'testtable', TestTable,
                            "Very Big Test table", filters=filters)
rowsinbuf = mytable._v_expectedrows

# load table with data
for i in xrange(0, nor, rowsinbuf):
    mytable.append([numarray.arange(i, i+rowsinbuf, type="Float64")])
fileh.flush()

print "Creating a %s element table and saving in PyTables took %.3f seconds" % (nor, time.time() - starttime)
print
fileh.close()
########################################

When run, it gives:

Traceback (most recent call last):
  File "test2.py", line 28, in ?
    mytable.append([numarray.arange(i, i+rowsinbuf, type="Float64")])
  File "/usr/local/lib/python2.4/site-packages/tables/Table.py", line 1252, in append
    raise ValueError, \
ValueError: rows parameter cannot be converted into a recarray object compliant with table '/testtable (Table(0,)) 'Very Big Test table''. The error was: <The row structure doesn't match that provided by the format specification>
From: Peter D. <pe...@cs...> - 2005-08-11 19:09:32
Hi,

I just started playing with pytables. Nice product, thank you for it. I would like to use it for storing a huge and ever growing collection of combinatorial/statistical designs. At the moment they are in bzip2-ed XML files at http://designtheory.org.

While installing pytables some of the numarray related tests failed on an Opteron based machine. I installed both 1.1 and 1.2-b1 with the same result. On the i686 platform all tests succeeded for both versions. Here is the failed test output:

---> % python test_all.py
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:    1.2-beta1
Extension version:   $Id: utilsExtension.pyx 1135 2005-07-29 11:49:51Z ivilata $
HDF5 version:        1.6.2
numarray version:    1.3.3
Zlib version:        1.2.2
LZO version:         1.08 (Jul 12 2002)
UCL version:         1.03 (Jul 20 2004)
BZIP2 version:       1.0.2 (30-Dec-2001)
Python version:      2.4.1 (#2, Mar 30 2005, 20:41:35)
                     [GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)]
Platform:            linux2-x86_64
Byte-ordering:       little
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Performing only a light (yet comprehensive) subset of the test suite.
If you have a big system and lots of CPU to waste and want to do a more
complete test, try passing the --heavy flag to this script. The whole
suite will take more than 10 minutes to complete on a relatively modern
CPU and around 100 MB of main memory.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Numeric (version 23.7) is present. Adding the Numeric test suite.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
...................................FF...........................................
[progress dots elided]

======================================================================
FAIL: None (test_attributes.CloseTypesTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/peter/src/languages/python/pytables/pytables-1.2-b1/test/test_attributes.py", line 522, in test01c_setIntAttributes
    numarray.array([1,2], type=stype))
AssertionError

======================================================================
FAIL: None (test_attributes.CloseTypesTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/peter/src/languages/python/pytables/pytables-1.2-b1/test/test_attributes.py", line 550, in test01d_setIntAttributes
    numarray.array([[1,2],[2,3]], type=stype))
AssertionError

----------------------------------------------------------------------
Ran 1991 tests in 77.615s

FAILED (failures=2)
<---

Regards,

Peter
From: <phi...@ho...> - 2005-08-04 09:37:18
Hi list,

Is there a lot of work done when we execute tables.openFile(filename, mode)?

In my program, I need to use the file object at the beginning and at the end of its use. For the moment, I open the file, retrieve attributes and close the file object. When the user needs to modify data, I open a new file object, save the data and close the file. Finally, I need to retrieve all the data, so I open the file one more time and close it afterwards. That was to avoid wasting memory.

Do you think it's better to open the file at the beginning, keep the file object, and reuse it for saving data and retrieving all the data?

Thanks a lot for your suggestions,

Philippe Collet
From: Francesc A. <fa...@ca...> - 2005-07-29 17:42:04
Hi List,

Before leaving on vacation (don't worry, we will be back in September), the Carabos crew has made available preliminary versions of PyTables 1.1.1 and PyTables 1.2-beta1. You can find them at:

http://www.carabos.com/downloads/pytables/preliminary/

PyTables 1.1.1 is just like 1.1, but with several optimizations included, so that the opening of files with a lot of nodes is between 1.5x and more than 2x faster.

PyTables 1.2-beta1 is a new beast wearing a complete replacement of the classic object tree: a new object tree featuring an LRU (Least Recently Used) cache. While keeping full backward compatibility (you will still be able to use the object tree as you are used to, including, for example, name completion through the TAB key), this allows files with large numbers of nodes to be opened more than 100x faster (typically) than with PyTables 1.1. The new object tree with the LRU cache will typically save you memory as well, especially for files with lots of nodes.

Although PyTables 1.2-beta1 passes all the tests flawlessly on Linux platforms, it still has some problems on Windows and, unfortunately, they are not 100% reproducible (that means a certain test can pass in one run and fail in the next!). If anybody with enough knowledge of Windows issues is willing to have a look at that, the PyTables community will be very grateful.

Enjoy your summer time!

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
From: Francesc A. <fa...@ca...> - 2005-07-28 11:52:34
On Wednesday 27 July 2005 at 13:51, travlr wrote:
> Although my coding experience is still fairly green, and time is pretty
> tight, it'd be a pleasure to contribute to pytables. I'll try to see if I
> can muster up the know-how to implement this for us. Much is appreciated
> of the Carabos crew.

Yes, no problem. My personal advice is that you look at how readCoordinates makes use of H5Sselect_elements, and apply the same to arrays. Read the excellent HDF5 docs available at:

http://hdf.ncsa.uiuc.edu/HDF5/doc/

Do not hesitate to ask on this list if you get stuck. The Cárabos crew will be on vacation for most of the month of August, but some of us will check e-mail from time to time. Also, I know there are quite a few other people on the list with the necessary skills to answer your questions.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
From: Francesc A. <fa...@ca...> - 2005-07-28 11:44:47
On Wednesday 27 July 2005 at 17:37, Marcus Mendenhall wrote:
> Please note that this message will contain a full copy of the comment
> thread, including the initial issue submission, for this request,
> not just the latest update.
>
> Category: None
> Group: None
> Status: Open
> Resolution: None
> Priority: 5
> Submitted By: Marcus Mendenhall (mendenhall)
> Assigned to: Nobody/Anonymous (nobody)
> Summary: numeric attribute format is painful
>
> Initial Comment:
> Somewhere along the line, pytables has changed from writing
> numeric attributes as real HDF5 numbers to what appear to be
> pickled strings. Although this is very convenient for the python
> community, it makes the tables written by pytables very hard to use
> by other HDF5-reading software, since numbers are not stored
> numerically.
>
> I'm not sure if this is a bug, since I assume the behavior is
> intentional, but it seems sufficiently idiosyncratic that I would like to
> see it reverted (if possible) to writing numbers in native HDF5 format.

Yes, the new behaviour was introduced in PyTables 1.1 and is completely intentional. This is a consequence of supporting native HDF5 multidimensional arrays as attributes and the desire to map numarray objects directly to native attributes. That includes mapping a numarray scalar (and not a Python scalar) to an HDF5 scalar.

> Pytables provides a lot of nice extensions to the HDF5 format to
> make it more pythonish, but it seems that the goal should be to only
> use special python constructs when an object is written which really
> cannot be converted to native HDF5. Then, if the user is careful in
> selecting reasonable base types, the resulting HDF5 files are highly
> portable, which is the goal of HDF5.

If you want to continue writing attributes as native HDF5 scalars, please use numarray scalars to do that. For example:

In [3]:f = tables.openFile("/tmp/test.h5", "w")
In [5]:import numarray
In [6]:f.root._v_attrs.test1 = numarray.array(1)
In [8]:f.root._v_attrs.test2 = numarray.array(2, type=numarray.Float64)
In [9]:f.root._v_attrs.test3 = numarray.array(3, type=numarray.Int8)
In [10]:f.close()

$ h5dump /tmp/test.h5
[...]
   ATTRIBUTE "test1" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SCALAR
      DATA {
         1
      }
   }
   ATTRIBUTE "test2" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SCALAR
      DATA {
         2
      }
   }
   ATTRIBUTE "test3" {
      DATATYPE  H5T_STD_I8LE
      DATASPACE  SCALAR
      DATA {
         3
      }
   }

As you can see, this new method has the advantage of being able to completely specify the type of the native HDF5 attribute.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
From: travlr <vel...@gm...> - 2005-07-27 11:51:47
On 7/26/05, Francesc Altet <fa...@ca...> wrote:
>
> On Monday 25 July 2005 at 16:44, Bernard KAPLAN wrote:
> > In hdf5 you can define progressively a selected region (adding elements
> > one by one) by use of either "H5Sselect_elements" with H5S_SELECT_APPEND
> > or "H5Sselect_hyperslab" with H5S_SELECT_OR. Then you can read or write
> > the selected elements at once with "H5Dread" or "H5Dwrite".
>
> That's correct. In fact, H5Sselect_elements is already used by
> Table.readCoordinates. Applying it to support indexing in your
> suggested way was already proposed by Pete on this list a few days
> ago, but just for Table objects; expanding its use to *Array
> objects would be nice (although not high priority for us right now).
>
> Cheers,

Although my coding experience is still fairly green, and time is pretty tight, it'd be a pleasure to contribute to pytables. I'll try to see if I can muster up the know-how to implement this for us. Much is appreciated of the Carabos crew.

regards,
Pete
From: Francesc A. <fa...@ca...> - 2005-07-26 10:21:05
On Monday 25 July 2005 at 16:44, Bernard KAPLAN wrote:
> In hdf5 you can define progressively a selected region (adding elements
> one by one) by use of either "H5Sselect_elements" with H5S_SELECT_APPEND
> or "H5Sselect_hyperslab" with H5S_SELECT_OR. Then you can read or write
> the selected elements at once with "H5Dread" or "H5Dwrite".

That's correct. In fact, H5Sselect_elements is already used by Table.readCoordinates. Applying it to support indexing in the way you suggest was already proposed by Pete on this list a few days ago, but just for Table objects; expanding its use to *Array objects would be nice (although not high priority for us right now).

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
From: Antonio V. <val...@co...> - 2005-07-25 15:33:00
At 16:44 on Monday 25 July 2005, Bernard KAPLAN wrote:
> Dear Antonio,
>
> In hdf5 you can define progressively a selected region (adding elements
> one by one) by use of either "H5Sselect_elements" with H5S_SELECT_APPEND
> or "H5Sselect_hyperslab" with H5S_SELECT_OR. Then you can read or write
> the selected elements at once with "H5Dread" or "H5Dwrite".

Yes, you are right. The selection operation is not as immediate as in your example, but after selection you can read all the data at once.

> Is there a function in pytables that uses this functionality? How would
> you do to extract a submatrix in pytables?

I think that at the moment this functionality is not available. Surely Francesc or Ivan can give you more explanations.

Usually I get an entire block of data from arrays:

data = h5f.myarray[0:100, 0:100]

I never encountered a situation like yours, sorry :((

> Antonio Valentino wrote:
> >At 11:36 on Monday 25 July 2005, Bernard KAPLAN wrote:
> >>Dear all,
> >
> >hi Bernard
> >
> >[...]
> >
> >>x=array([1,2,5,8])  # row index
> >>y=array([0,3,7])    # column index
> >>a=data[x,y]   # returns an array of size [4,3], 'data' is a pytables 2-d array
> >>data[x,y]=0   # set the submatrix defined by x rows and y columns to 0
> >>data[x,y]=rand([4,3])  # update of a submatrix
> >>data[x,y]=data2[x,y]   # copy of a submatrix from one array into another
[...]

--
Antonio Valentino
From: Bernard K. <ber...@be...> - 2005-07-25 14:44:50
Dear Antonio,

In hdf5 you can define progressively a selected region (adding elements one by one) by use of either "H5Sselect_elements" with H5S_SELECT_APPEND or "H5Sselect_hyperslab" with H5S_SELECT_OR. Then you can read or write the selected elements at once with "H5Dread" or "H5Dwrite".

Is there a function in pytables that uses this functionality? How would you do to extract a submatrix in pytables?

Bernard

Antonio Valentino wrote:
>At 11:36 on Monday 25 July 2005, Bernard KAPLAN wrote:
>>Dear all,
>
>hi Bernard
>
>[...]
>
>>x=array([1,2,5,8])  # row index
>>y=array([0,3,7])    # column index
>>a=data[x,y]   # returns an array of size [4,3], 'data' is a pytables 2-d array
>>data[x,y]=0   # set the submatrix defined by x rows and y columns to 0
>>data[x,y]=rand([4,3])  # update of a submatrix
>>data[x,y]=data2[x,y]   # copy of a submatrix from one array into another
>
>I think that what you are asking for can't be done in an "elegant" way.
>HDF5 supports hyperslab selection, which is powerful and elegant, but it
>requires regular spacing in the selection; see
>
>http://hdf.ncsa.uiuc.edu/HDF5/doc/RM_H5S.html#Dataspace-SelectHyperslab
>
>In order to use hyperslabs you have to be able to express the sub-matrix
>selection in terms of "start, stride, count, and block".
>It seems to me that it is not your case.
From: Antonio V. <val...@co...> - 2005-07-25 10:57:39
At 11:36 on Monday 25 July 2005, Bernard KAPLAN wrote:
> Dear all,

hi Bernard

[...]

> x=array([1,2,5,8])  # row index
> y=array([0,3,7])    # column index
> a=data[x,y]   # returns an array of size [4,3], 'data' is a pytables 2-d array
> data[x,y]=0   # set the submatrix defined by x rows and y columns to 0
> data[x,y]=rand([4,3])  # update of a submatrix
> data[x,y]=data2[x,y]   # copy of a submatrix from one array into another

I think that what you are asking for can't be done in an "elegant" way. HDF5 supports hyperslab selection, which is powerful and elegant, but it requires regular spacing in the selection; see

http://hdf.ncsa.uiuc.edu/HDF5/doc/RM_H5S.html#Dataspace-SelectHyperslab

In order to use hyperslabs you have to be able to express the sub-matrix selection in terms of "start, stride, count, and block". It seems to me that this is not your case.

ciao

--
Antonio Valentino
From: Bernard K. <ber...@be...> - 2005-07-25 09:36:37
Dear all,

I am using pytables mostly to store data in the form of simple two-dimensional arrays. For my calculations I often need to extract and update submatrices of these arrays. Unfortunately the row and column indexes I am using cannot be described as slices, because they can be quite random. Can anyone teach me a "good" (meaning memory efficient, fast and elegant) way to code submatrix extraction and update? The ideal for me would be code close to this:

x=array([1,2,5,8])  # row index
y=array([0,3,7])    # column index
a=data[x,y]   # returns an array of size [4,3], 'data' is a pytables 2-d array
data[x,y]=0   # set the submatrix defined by x rows and y columns to 0
data[x,y]=rand([4,3])  # update of a submatrix
data[x,y]=data2[x,y]   # copy of a submatrix from one array into another

Sincerely,

Bernard KAPLAN
From: travlr <vel...@gm...> - 2005-07-19 04:19:03
The patch worked fine, Francesc, and I'm going to work on the other parts as you suggested. I'll get back to you soon.

Pete

On 7/18/05, Francesc Altet <fa...@ca...> wrote:
>
> On Monday 18 July 2005 at 16:39, travlr wrote:
> > > However, your suggestion is quite good, and implementing it is a matter
> > > of adding a new case in the Column.__getitem__() special method.
> > > Something like:
> > >
> > > [...]
> > > elif isinstance(key, numarray):
> > >     return self.table.readCoordinates(key, self.name)
> >
> > This is terrific. I'd also like to mention two things... setting the attr
> > vals via file.root.group.table.cols.blah[idx] = array ...would also be
> > great. I believe this syntax congruency (a la numarray) should also be
> > extended to (py)tables.tables and (py)tables.array objects.
>
> Good suggestion. We will see what we can do. Nevertheless, it would be
> great if you could contribute the code (and docs) yourself.
>
> > I got an Error tossing in your patch:
> >
> > [...]
> > elif isinstance(key, numarray.numarraycore.NumArray):
> >     return self.table.readCoordinates(key, self.name)
> > [...]
>
> Yes. This is a bug in readCoordinates. Try applying the attached patch
> as well.
>
> Cheers,
From: Francesc A. <fa...@ca...> - 2005-07-18 18:05:51
|
On Monday 18 July 2005 16:39, travlr wrote:
> > However, your suggestion is quite good, and implementing it is a matter
> > of adding a new case in the Column.__getitem__() special method.
> > Something like:
> >
> > [...]
> > elif isinstance(key, numarray):
> >     return self.table.readCoordinates(key, self.name)
>
> This is terrific. I'd also like to mention two things... setting the attr
> vals via file.root.group.table.cols.blah[idx] = array ...would also be
> great. I believe this syntax congruency (a la numarray) should also be
> extended to (py)tables.tables and (py)tables.array objects.

Good suggestion. We will see what we can do. Nevertheless, it would be
great if you could contribute the code (and docs) yourself.

> I got an error tossing in your patch:
>
> [...]
> elif isinstance(key, numarray.numarraycore.NumArray):
>     return self.table.readCoordinates(key, self.name)
> [...]

Yes. This is a bug in readCoordinates. Try applying the attached patch
as well.

Cheers,

--
>0,0<  Francesc Altet     http://www.carabos.com/
 V V   Cárabos Coop. V.   Enjoy Data
  "-"
|
From: Francesc A. <fa...@ca...> - 2005-07-18 12:43:27
|
On Monday, 18 July 2005 at 08:07 -0400, travlr wrote:
> Actually I'm aware of the "under the hood" indexing, but frankly I
> don't use it because it shows a nice improvement for some search result
> sizes and is quite latent for others.

Yes, that's perfectly possible. This kind of indexing is mainly aimed
at large tables, while with small ones (< 10**6 entries) your mileage
may vary.

> What I'm referring to, though, with slice indexing, is being able to
> use an array as the index, as numarray does:
> file.table.cols.blah[array], where the array can be non-sequential, as
> is produced by array = numarray.where(x > y)[0].

Ooops, I see. Well, this is in fact already implemented through the
Table.readCoordinates() method:

file.table.readCoordinates(array, field="blah")

However, your suggestion is quite good, and implementing it is a matter
of adding a new case in the Column.__getitem__() special method.
Something like:

[...]
elif isinstance(key, numarray):
    return self.table.readCoordinates(key, self.name)

would be enough.

> This type of behavior, if possible, would be less cumbersome than
> iterating, and less memory-intensive when brought in and bound to an
> actual numarray type in order to facilitate this behavior.

Definitely. I'll try to add such a feature for the next release.

Cheers,

Francesc
|
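The isinstance-based dispatch Francesc sketches can be mimicked outside PyTables; the toy class below (all names hypothetical, written with NumPy rather than the old numarray) shows the branching in __getitem__, with an in-memory array standing in for the on-disk column and fancy indexing standing in for readCoordinates():

```python
import numpy as np

class ToyColumn:
    """Illustrative stand-in for a PyTables Column; not the real API."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def __getitem__(self, key):
        if isinstance(key, (int, np.integer)):
            return self._values[key]       # single row
        elif isinstance(key, slice):
            return self._values[key]       # slice read, a la table[2:300:3]
        elif isinstance(key, np.ndarray):
            # The case proposed in the thread: a coordinate array, such as
            # where() returns; readCoordinates() would do this against disk.
            return self._values[key]
        raise TypeError("unsupported index type: %r" % type(key))

col = ToyColumn(np.arange(10) * 10)
coords = np.where(np.arange(10) % 3 == 0)[0]   # non-sequential coordinates
picked = col[coords]
```

The point of the dispatch is that the caller gets one uniform `col[...]` syntax while each branch can use a very different read path underneath.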
From: Francesc A. <fa...@ca...> - 2005-07-18 11:06:04
|
Hi Peter,

On Monday, 18 July 2005 at 03:06 -0400, travlr wrote:
> Having received the 1.1 release notification, I also noticed the
> "support for indexes" in the 1.0 release. Does this happen to include
> slicing with an index array (a la numarray)? An example would be the
> non-sequential return of index = numarray.where(...)[0], then applied
> to further pytables retrieval. Also, I didn't notice any mention of
> the index improvements in the documentation.

Let me explain some points first. Unfortunately, the verb "indexing"
carries many meanings in computer science. When I first announced the
PyTables support for indexing, I meant that it can sort the columns of
tables in order to do faster searches (i.e. find the values that fulfill
some condition). This kind of indexation was actually implemented back
in 0.9, although it was improved in 1.0 to allow the indexation of
tables with more than 2**31 rows. In fact, from 1.0 on, all the objects
in PyTables are supposed to support up to 2**62 entries. This kind of
support is mentioned in the User's Manual as the first item in the
"Main Features" section:

http://pytables.sourceforge.net/html-doc/usersguide1.html#section1.1

The other kind of indexation used on a regular basis in PyTables is
what you called "slicing" indexation (e.g. table[2:300:3], i.e. a la
numarray, with the exception that negative values for the step are not
supported). This kind of index support has been in PyTables since well
before 0.9.

Hope I have clarified this issue somewhat,

Francesc
|
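The slice semantics Francesc describes (start:stop:step, normalized against the table length, with no negative steps) can be sketched with Python's own slice.indices(); the helper name below is hypothetical:

```python
def slice_to_coords(s, nrows):
    """Expand a slice into explicit row numbers, roughly as a
    table[start:stop:step] read would, rejecting negative steps."""
    start, stop, step = s.indices(nrows)   # clamp to [0, nrows)
    if step < 0:
        raise ValueError("negative step values are not supported")
    return list(range(start, stop, step))

# Rows that a table[2:300:3] read would touch on a 1000-row table.
rows = slice_to_coords(slice(2, 300, 3), 1000)
```

slice.indices() is what makes negative start/stop values ("count from the end") work without any extra code, while the explicit step check mirrors the restriction mentioned above.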
From: Francesc A. <fa...@ca...> - 2005-07-18 10:28:32
|
Hola Antonio,

On Saturday, 16 July 2005 at 12:40 +0200, Antonio Valentino wrote:
> RPM build errors:
> Bad exit status from /var/tmp/rpm-tmp.16679 (%build)
> error: command 'rpmbuild' failed with exit status 1
>
> $
>
> I solved it by adding VERSION to the MANIFEST.in file, but I'm not an
> expert in this kind of question, so maybe this is not the best way to
> fix the problem.

Yes, you are right. This is fixed now in SVN trunk, along with a few
additional embellishments. I'm attaching the new MANIFEST.in.

Thanks,

Francesc
|
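For context, the fix Antonio found amounts to one line in MANIFEST.in so that the source distribution ships the VERSION file the build reads; only the `include VERSION` line is taken from the thread, the surrounding entries are hypothetical examples of what such a file might contain:

```
include VERSION
include MANIFEST.in
include LICENSE.txt
recursive-include src *.c *.h
```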
From: travlr <vel...@gm...> - 2005-07-18 07:07:09
|
Hi Francesc and company :)

Having received the 1.1 release notification, I also noticed the
"support for indexes" in the 1.0 release. Does this happen to include
slicing with an index array (a la numarray)? An example would be the
non-sequential return of index = numarray.where(...)[0], then applied
to further pytables retrieval. Also, I didn't notice any mention of
the index improvements in the documentation.

Thank you guys,

Peter
|