From: Francesc A. <fa...@op...> - 2003-07-29 19:26:41
Hi everybody,

I'm about to release PyTables 0.7, and I would be grateful if somebody could
test this version on their platform and tell me whether the tests pass or not.
Please do not forget to tell me the platform where you have tested the
package! You can download it from the SourceForge site:

http://sourceforge.net/project/showfiles.php?group_id=63486

If all goes well, I'll make the official announcement by the end of this
week. I'm attaching the announcement. Hope you will enjoy the new
features ;-)

--
Francesc Alted


Announcing PyTables 0.7
-----------------------

This is the third public beta release. Version 0.6 was internal and will
never be released. In this release you will find:

- a new AttributeSet class
- a 25% I/O speed improvement
- full support for multidimensional table cells
- new column descriptors
- row deletion in tables is finally here
- much more!

In more detail:

What's new
----------

- A new AttributeSet class has been added. It allows the addition and
  deletion of generic attributes (any scalar type, plus any Python object
  supported by pickle) as easily as this:

      table.attrs.date = "2003/07/28 10:32"  # Attach a string to table
      group._v_attrs.tempShift = 1.2         # Attach a float to group
      array.attrs.detectorList = [1,2,3,4]   # Attach a list to array
      del array.attrs.detectorList           # Detach detectorList attr from array

- PyTables now fully supports multidimensional table cells. This has been
  possible in part thanks to the implementation of multidimensional cells
  in the numarray.records.RecArray object. Thanks to the numarray crew for
  their hard work!

- New column descriptors have been added: IntCol, Int8Col, UInt8Col,
  Int16Col, UInt16Col, Int32Col, UInt32Col, Int64Col, UInt64Col, FloatCol,
  Float32Col, Float64Col and StringCol. I think they are more explicit and
  easier to use than the now deprecated (but still supported!) Col()
  descriptor. All the examples and the user's manual have been updated
  accordingly.
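  As an aside, the arbitrary-object attributes in the AttributeSet item
  above rest on Python's standard pickle serialization. A minimal sketch of
  that round trip, in plain (modern) Python rather than PyTables code:

```python
import pickle

# Any picklable Python object can be flattened to a byte string, stored
# persistently (an HDF5 attribute, in PyTables' case), and restored later.
detector_list = [1, 2, 3, 4]
serialized = pickle.dumps(detector_list)

restored = pickle.loads(serialized)
assert restored == detector_list       # an equal copy comes back
assert restored is not detector_list   # ...but a distinct object
```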
- The new Table.removeRows(start, stop) allows you to remove rows from
  tables! This was a long-requested feature. There is still a limitation,
  though: you cannot delete rows in extremely large Tables (because the
  rows after the stop parameter need to be kept in memory). Also, in this
  release, the performance is not optimized. These issues will hopefully be
  addressed in future releases.

- Iterators have been added to File, Group and Table (they now support the
  __iter__() special method). They make the objects much more usable,
  especially in interactive mode. See the documentation for examples of
  use.

- A __getitem__() method has been added to Table. It works more or less
  like read(), but with extended slice support.

- As a consequence of rewriting the table iterators in C (with the help of
  Pyrex, of course), table read performance has improved by between 20% and
  30%. With that, data selections in PyTables are starting to beat powerful
  relational databases like SQLite, even for the in-core case (!). I think
  there is still room for another 20% to 30% improvement, so stay tuned.

- A checksum is now added automatically when using LZO (not with UCL, where
  I'm having some difficulties implementing that capability). The Adler32
  algorithm has been chosen because of its speed. With that, the
  compression/decompression speed has dropped by 1% or 2%, which is hardly
  noticeable. I think this addition will allow the cautious user to be a
  bit more confident about this excellent compressor. Code has been added
  so that files created without this checksum can still be read (so you can
  be confident that you will be able to read your existing files compressed
  with LZO and UCL).

- Recursion has been removed from PyTables. Before, this limited the
  maximum tree depth to less than the Python recursion limit (which depends
  on the implementation, but is around 900, at least on Linux). Now, the
  limit has been set (somewhat arbitrarily) to 2048.
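  Removing the recursion boils down to a standard trick: walk the tree with
  an explicit stack instead of recursive calls, so the depth is bounded
  only by a chosen limit (2048 here), not by the interpreter's recursion
  limit. A generic sketch in modern Python, with an invented nested-dict
  node layout -- this is not the actual PyTables code:

```python
def iter_tree(root, max_depth=2048):
    """Yield (depth, node) pairs, walking a nested-dict tree iteratively.

    Uses an explicit stack, so the traversal depth is bounded only by
    max_depth (and memory), not by Python's recursion limit.
    """
    stack = [(0, root)]
    while stack:
        depth, node = stack.pop()
        if depth > max_depth:
            raise RuntimeError("maximum tree depth exceeded")
        yield depth, node
        # Children are dict values here; a real tree would use node objects.
        if isinstance(node, dict):
            for child in node.values():
                stack.append((depth + 1, child))

# Build a chain 1500 levels deep -- deeper than the default Python
# recursion limit (~1000) -- and walk it without any recursive calls.
tree = leaf = {}
for _ in range(1500):
    leaf["child"] = {}
    leaf = leaf["child"]

assert sum(1 for _ in iter_tree(tree)) == 1501  # root + 1500 descendants
```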
  Thanks to John Nielsen for implementing the new iterative method!

- A new rootUEP parameter has been added to openFile(). You can now define
  the root from which you want to start building the object tree. Thanks to
  John Nielsen for the suggestion and a first implementation of it.

- Some (non-serious) bugs were discovered and fixed.

- The documentation has been updated, so that you can learn more about how
  to use all these new bells and whistles. It is also available on the web:
  http://pytables.sourceforge.net/html-doc/usersguide-html.html

- More unit tests have been added (up to 353 now!).

- PyTables 0.7 *needs* the newest numarray 0.6 and HDF5 1.6.0 to compile
  and work. It has been tested with Python 2.2.3 and Python 2.3c2 and
  should work fine with both versions.

What it is
----------

In short, PyTables provides a powerful and very Pythonic interface to
process and organize your table and array data on disk. Its goal is to
enable the end user to easily manipulate scientific data tables and
Numerical and numarray Python objects in a persistent hierarchical
structure. The foundation of the underlying hierarchical data organization
is the excellent HDF5 library (http://hdf.ncsa.uiuc.edu/HDF5).

A table is defined as a collection of records whose values are stored in
fixed-length fields. All records have the same structure, and all values in
each field have the same data type. The terms "fixed-length" and "strict
data types" may seem like strange requirements for a language like Python,
which supports dynamic data types, but they serve a useful purpose when the
goal is to save very large quantities of data (such as is generated by many
scientific applications) in an efficient manner that reduces demand on CPU
time and I/O resources.

Quite a bit of effort has been invested in making browsing the hierarchical
data structure a pleasant experience. PyTables implements two (orthogonal)
easy-to-use methods for browsing.

What is HDF5?
-------------

For those who know nothing about HDF5, it is a general-purpose library and
file format for storing scientific data, made at NCSA. HDF5 can store two
primary kinds of objects: datasets and groups. A dataset is essentially a
multidimensional array of data elements, and a group is a structure for
organizing objects in an HDF5 file. Using these two basic constructs, one
can create and store almost any kind of scientific data structure, such as
images, arrays of vectors, and structured and unstructured grids. You can
also mix and match them in HDF5 files according to your needs.

Platforms
---------

I'm using Linux as the main development platform, but PyTables should be
easy to compile/install on other UNIX machines. This package has also
passed all the tests on an UltraSparc platform with Solaris 7 and Solaris
8. It also compiles and passes all the tests on an SGI Origin2000 with MIPS
R12000 processors running IRIX 6.5.

Regarding Windows platforms, PyTables has been tested with Windows 2000 and
Windows XP, but it should also work with other flavors.

An example?
-----------

For online code examples, have a look at

http://pytables.sourceforge.net/tut/tutorial1-1.html

and

http://pytables.sourceforge.net/tut/tutorial1-2.html

There is also a small one attached at the end of this message.

Web site
--------

Go to the PyTables web site for more details:
http://pytables.sourceforge.net/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Have fun!

--
Francesc Alted
fa...@op...
--------------- Example of use --------------------------------------

from numarray import *
from tables import *

class Particle(IsDescription):
    name        = StringCol(length=16)     # 16-character string
    lati        = Int16Col()               # short integer
    longi       = IntCol()                 # integer
    pressure    = Float32Col(shape=(2,3))  # 2-D float array (single-precision)
    temperature = Float64Col(shape=(2,3))  # 2-D float array (double-precision)

# Open a file in "w"rite mode
fileh = openFile("table-simple.h5", mode = "w")
# Create a new table in root
table = fileh.createTable(fileh.root, 'table', Particle, "Title example")
particle = table.row

# Fill the table with 10 particles
for i in xrange(10):
    # First, assign the values to the Particle record
    particle['name'] = 'Particle: %6d' % (i)
    particle['lati'] = i
    particle['longi'] = 10 - i
    particle['pressure'] = array(i*arange(2*3), shape=(2,3))
    particle['temperature'] = float(i**2)
    # This injects the row values.
    particle.append()

# We need to flush the buffers in table in order to get an
# accurate number of records in it.
table.flush()

# Delete the third and fourth rows
table.removeRows(3, 5)

print "Table metadata:"
print repr(table)
print "Table contents:"
for row in table:
    print row
print "name column of the 5th row:"
print table[4].field("name")

# Finally, close the file
fileh.close()
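
As a footnote to the example above: the __iter__() and __getitem__()
additions described in the "What's new" section follow Python's ordinary
container protocols. A toy in-memory analogue, in modern Python
(illustrative only; TinyTable is an invented name, not part of PyTables):

```python
class TinyTable:
    """A toy in-memory table supporting iteration and extended slicing,
    analogous in spirit to the protocols Table gained in 0.7."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        # Yields rows one by one, as `for row in table:` expects.
        return iter(self._rows)

    def __getitem__(self, key):
        # Accepts both single indices and extended slices like t[1:8:2].
        return self._rows[key]

t = TinyTable({"lati": i, "longi": 10 - i} for i in range(10))
assert [row["lati"] for row in t] == list(range(10))   # iteration
assert t[4] == {"lati": 4, "longi": 6}                 # single index
assert [r["lati"] for r in t[1:8:2]] == [1, 3, 5, 7]   # extended slice
```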