From: chuck c. <cc...@zi...> - 2004-04-28 17:15:43
I just started using PyTables last week, and I'd not come across HDF5 before, so I'm still getting up to speed. However, I think using PyTables to complement a system I already have is going to work out well.

I am responsible for the backend of a large website. This backend uses Jini, is distributed across 20 machines, and could be classified as a service-oriented architecture. Each machine runs at least one VM (possibly up to four), and each VM has anywhere from 15-30 services running. Each VM prints out performance statistics (total number of requests; number of successful, failed, or active requests; processing time for each request; time spent waiting for other backend systems to respond; etc.) for every running service at one-minute intervals. As you can imagine, this is a lot of data, but we have a strict SLA which, in addition to specifying uptime requirements, has response-time requirements.

Each month I generate a report based on these files. Initially this was done entirely in Python. Then I moved to loading the files into MySQL, and later into PostgreSQL. Now I've decided to store the actual log files in HDF5 format and use PyTables to compute hourly, daily, weekly, and monthly "roll-ups" of averages and standard deviations of response times. These roll-ups will be stored in PostgreSQL, since many people query them; relatively few people query the log entry for a specific minute.

Currently I'm converting the log files from pipe-delimited text to HDF5 using PyTables. Eventually I'd like the application to generate the files in HDF5 format to avoid the transform step. Then I plan to use PyTables to compute the "roll-ups". So far everything is working out well, but I can't say that I've used PyTables enough yet to make any suggestions. As I get deeper into it, I'll be sure to post.

Also, I'd like to express my thanks for the great documentation that goes along with this project.
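The roll-up step described above can be sketched in plain Python. The pipe-delimited layout, field names like the response-time column, and the sample values here are hypothetical illustrations, not taken from the actual system:

```python
import statistics
from collections import defaultdict

# Hypothetical pipe-delimited log format:
# timestamp|service|total|success|failed|active|resp_ms
SAMPLE = """\
2004-04-28 17:01|auth|120|118|2|5|41.0
2004-04-28 17:02|auth|130|129|1|6|39.5
2004-04-28 18:01|auth|110|110|0|4|44.0
"""

def hourly_rollup(lines):
    """Group response times by (hour, service); return mean and std dev."""
    buckets = defaultdict(list)
    for line in lines:
        ts, service, *_counts, resp = line.split("|")
        hour = ts[:13]                      # "YYYY-MM-DD HH"
        buckets[(hour, service)].append(float(resp))
    return {
        key: (statistics.mean(vals),
              statistics.pstdev(vals))      # population std dev
        for key, vals in buckets.items()
    }

rollup = hourly_rollup(SAMPLE.splitlines())
```

The same grouping generalizes to daily, weekly, and monthly windows by widening the key slice; in practice the per-minute rows would come out of the HDF5 table rather than a text sample.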
It is responsible for me getting this far without having to post to the list. However, I do have a few questions:

1. Is there a way to get a Column of a table returned as a numarray, so I can compute means and standard deviations with NumPy? Or do I just read the column as a Python list and then create a numarray out of it?

2. One of my fields is a timestamp, but I don't see such a datatype in Appendix A. Upon digging further, it appears that timestamps are supported by the HDF5 spec but have not yet been implemented. Is this correct? If so, how are other people getting around this? I use the excellent mx.DateTime library and am heading down the path of calling .ticks() on any timestamp fields and storing the result as a Float32Col.

Thanks,
chuck
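One caveat worth noting on the Float32Col plan (my observation, not something from the thread): a 32-bit float carries only 24 significant bits, so epoch ticks near 1.08e9 cannot be stored to one-second precision; a Float64 (or integer) column round-trips them exactly. A quick stdlib check of the rounding:

```python
import struct

ticks = 1083172543.0  # roughly this post's date as epoch seconds

# Round-trip through 32-bit storage, as a Float32Col would do:
as_f32 = struct.unpack("<f", struct.pack("<f", ticks))[0]
error = abs(as_f32 - ticks)
# Near 1e9 the spacing between adjacent float32 values is
# 2**(30 - 23) = 128 seconds, so the stored value can be off
# by as much as ~64 seconds.

# A 64-bit column stores the same value exactly:
as_f64 = struct.unpack("<d", struct.pack("<d", ticks))[0]
```

So if minute-level (or better) precision matters for the SLA reports, Float64Col or an integer column is the safer home for .ticks() values.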