From: Shyam P. K. <sp...@ny...> - 2013-04-11 17:18:05
Hello,

I am writing a lot of data (close to 122 GB) to an HDF5 file using PyTables. The execution time for writing the query result to the file is close to 10 hours, which includes querying the database and then writing to the file. When I timed the entire execution, I found that getting the data from the database takes about as much time as writing it to the HDF5 file. Here is a small snippet (P.S.: the execution times noted below are not for the 122 GB data set, but for a small subset of close to 10 GB):

    import tables as tb
    from datetime import datetime

    class ContactClass(tb.IsDescription):
        name      = tb.StringCol(4200)
        address   = tb.StringCol(4200)
        emailAddr = tb.StringCol(180)
        phone     = tb.StringCol(256)

    h5File = tb.openFile(<file name>, mode="a", title="Contacts")
    t = h5File.createTable(h5File.root, 'ContactClass', ContactClass,
                           filters=tb.Filters(5, 'blosc'),
                           expectedrows=77806938)

    resultSet = ...  # get data from the database

    currRow = t.row
    print("Before appending data: %s" % str(datetime.now()))
    for attributes in resultSet:
        currRow['name'] = attributes[0]
        currRow['address'] = attributes[1]
        currRow['emailAddr'] = attributes[2]
        currRow['phone'] = attributes[3]
        currRow.append()
    print("After done appending: %s" % str(datetime.now()))
    t.flush()
    print("After done flushing: %s" % str(datetime.now()))

.. which gives me:

    Before appending data: 2013-04-11 10:42:39.903713
    After done appending:  2013-04-11 11:04:10.002712
    After done flushing:   2013-04-11 11:05:50.059893

It seems like append() takes a lot of time. Any suggestions on how to improve this?

Thanks,
Shyam
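
[Editor's note: one pattern often suggested for bulk loads like this is to buffer rows in a Python list and pass whole batches to Table.append(), instead of filling the Row object one record at a time. A minimal sketch follows, assuming t and resultSet as in the snippet above; CHUNK is an illustrative batch size, not a tuned value. Note that PyTables orders columns alphabetically when no pos= is given in the description, so the tuples must follow that order.]

    CHUNK = 50000  # illustrative batch size; tune against your row width

    # With no pos= in ContactClass, the table's column order is
    # alphabetical: address, emailAddr, name, phone.  Each buffered
    # tuple must follow that order for Table.append() to map fields
    # correctly; t.description._v_names shows the actual order.
    buf = []
    for attributes in resultSet:
        name, address, emailAddr, phone = attributes[:4]
        buf.append((address, emailAddr, name, phone))
        if len(buf) >= CHUNK:
            t.append(buf)   # one call writes the whole batch
            buf = []
    if buf:
        t.append(buf)       # write the remainder
    t.flush()

Whether this helps in a given setup is worth profiling, since the post notes the database fetch takes about as long as the HDF5 write; batching only amortizes the per-row Python overhead on the writing side.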