Hi,
While optimizing the PyTables support for numarray and recarray
objects, I have been playing with the recarray module and ended up with a
somewhat improved version of it. Roughly, the modifications are:
- Addition of a cache to quickly access the columns (numarrays) in
recarrays. This cache is a map (dictionary) whose keys are the field
names and whose values are references to the columns, regarded as
numarray entities. The dictionary is accessible through the new
attribute "_fields".
- Addition of an attribute for recarray objects named "_record", which
points to a special object ("Record2" class) that is aware of
the "_fields" cache. It can be used to access the different
rows in recarray objects in an efficient way.
- The "_record" object is callable (it defines the "__call__" method)
so as to select the recarray row that is active during access to the
different fields.
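
A minimal sketch of the design described above, using plain Python lists
in place of numarray columns. The class names "TinyRecArray" and
"RowCursor" are hypothetical stand-ins and not the code in the attached
module; only the names "_fields", "_record" and "__call__" come from the
description.

```python
class RowCursor:
    """Hypothetical stand-in for the "Record2" class: one reusable row view."""

    def __init__(self, fields):
        # Use object.__setattr__ so internal state bypasses our own __setattr__.
        object.__setattr__(self, "_fields", fields)
        object.__setattr__(self, "_row", 0)

    def __call__(self, row):
        # Select the active row and hand back the same object.
        object.__setattr__(self, "_row", row)
        return self

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for field names.
        try:
            return self._fields[name][self._row]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        if name not in self._fields:
            raise AttributeError(name)
        self._fields[name][self._row] = value


class TinyRecArray:
    """Hypothetical container: a column cache plus a single row cursor."""

    def __init__(self, **columns):
        self._fields = dict(columns)            # field name -> column
        self._record = RowCursor(self._fields)  # shares the same cache


table = TinyRecArray(x=[1, 2, 3], y=[10.0, 20.0, 30.0])
rec = table._record
rec(1).y = 25.0                                 # write field "y" of row 1
total_x = sum(rec(i).x for i in range(3))       # read field "x" row by row
```

The point of the sketch is that no new object is created per row: the
loop reuses one cursor and one dictionary lookup per field access.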
Advantages
- Access to rows and columns (fields) in recarray objects is one
order of magnitude faster (!).
- The new "_fields" and "_record" attributes provide convenient and
intuitive ways to access the information in recarrays.
- The "_record" attribute supports the "__getattr__" and "__setattr__"
methods, which are very convenient for accessing fields in a row.
Drawbacks
- The "_record" attribute always points to the same object, and you must
pass it the row on which you want to operate. So, if you want to
have two different objects pointing to different rows, you can't use
the "_record" attribute to get them (but you can still use the
existing Record class by calling the "__getitem__" method
of a recarray object).
- Two new attributes are added to the already large number of recarray
variables. However, these new variables have no special space
requirements, as the "_record" object has only three scalar variables
and "_fields" is a dictionary with as many entries as there are fields
in the recarray, which should not be a large number.
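
The first drawback can be illustrated with a small sketch. Both classes
below are hypothetical and operate on a plain list; "SharedCursor" plays
the role of the "_record" attribute and "RowView" the role of a per-row
object such as the one returned by "__getitem__".

```python
class SharedCursor:
    """Hypothetical single reusable row view, like the "_record" attribute."""

    def __init__(self, column):
        self.column = column
        self.row = 0

    def __call__(self, row):
        self.row = row          # re-targets the one shared object
        return self

    @property
    def value(self):
        return self.column[self.row]


class RowView:
    """Hypothetical independent per-row object, one allocation per row."""

    def __init__(self, column, row):
        self.column = column
        self.row = row

    @property
    def value(self):
        return self.column[self.row]


column = [10, 20, 30]
cursor = SharedCursor(column)

first = cursor(0)
last = cursor(2)                # also moves `first`: it is the same object
shared_values = (first.value, last.value)

# Independent objects each keep their own row.
first_view, last_view = RowView(column, 0), RowView(column, 2)
independent_values = (first_view.value, last_view.value)
```

Here "first" and "last" end up on the same row, while the two "RowView"
objects stay on rows 0 and 2 at the cost of one extra object each.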
I'm attaching this modified version as well as a testbed program to
test the new access methods and the improved performance. The output of
this program, run on a Pentium 4 @ 2 GHz machine, is also included.
Feel free to play with it and/or take/adapt the parts you consider best
suited to the recarray module.
--
Francesc Alted PGP KeyID: 0x61C8C11F