From: Francesc A. <fa...@op...> - 2003-07-29 19:26:41
Hi everybody,

I'm about to release PyTables 0.7, and I would be grateful if somebody could
test this version on their platform and tell me whether the tests pass or not.
Please do not forget to tell me the platform where you have tested the
package! You can download it from the SourceForge site:

http://sourceforge.net/project/showfiles.php?group_id=63486

If all goes well, I'll make the official announcement by the end of this
week. I'm attaching the announcement. Hope you will enjoy the new
features ;-)

--
Francesc Alted


Announcing PyTables 0.7
-----------------------

This is the third public beta release. Version 0.6 was internal and will
never be released. In this release you will find:

- a new AttributeSet class
- a 25% I/O speed improvement
- full support for multidimensional table cells
- new column descriptors
- row deletion in tables is finally here
- much more!

In more detail:

What's new
----------

- A new AttributeSet class has been added. It allows the addition and
  deletion of generic attributes (any scalar type, plus any Python object
  supported by pickle) as easily as this:

      table.attrs.date = "2003/07/28 10:32"  # Attach a string to table
      group._v_attrs.tempShift = 1.2         # Attach a float to group
      array.attrs.detectorList = [1,2,3,4]   # Attach a list to array
      del array.attrs.detectorList           # Detach detectorList attr from array

- PyTables now fully supports multidimensional table cells. This has been
  possible in part thanks to the implementation of multidimensional cells
  in the numarray.records.RecArray object. Thanks to the numarray crew for
  their hard work!

- New column descriptors have been added: IntCol, Int8Col, UInt8Col,
  Int16Col, UInt16Col, Int32Col, UInt32Col, Int64Col, UInt64Col, FloatCol,
  Float32Col, Float64Col and StringCol. I think they are more explicit and
  easier to use than the now deprecated (but still supported!) Col()
  descriptor. All the examples and the user's manual have been updated
  accordingly.
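  As an aside, the arbitrary-object attributes in the AttributeSet item
  above rest on Python's standard pickle serialization. A minimal sketch of
  that round trip, in plain (modern) Python rather than PyTables code:

```python
import pickle

# Any picklable Python object can be flattened to a byte string, stored
# persistently (an HDF5 attribute, in PyTables' case), and restored later.
detector_list = [1, 2, 3, 4]
serialized = pickle.dumps(detector_list)

restored = pickle.loads(serialized)
assert restored == detector_list       # an equal copy comes back
assert restored is not detector_list   # ...but a distinct object
```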
- The new Table.removeRows(start, stop) allows you to remove rows from
  tables! This was a long-requested feature. There is still a limitation,
  though: you cannot delete rows in extremely large Tables (because the
  rows after the stop parameter need to be kept in memory). Also, in this
  release, the performance is not optimized. These issues will hopefully be
  addressed in future releases.

- Iterators have been added to File, Group and Table (they now support the
  __iter__() special method). They make the objects much more usable,
  especially in interactive mode. See the documentation for examples of
  use.

- A __getitem__() method has been added to Table. It works more or less
  like read(), but with extended slice support.

- As a consequence of rewriting the table iterators in C (with the help of
  Pyrex, of course), table read performance has improved by between 20% and
  30%. With that, data selections in PyTables are starting to beat powerful
  relational databases like SQLite, even for the in-core case (!). I think
  there is still room for another 20% to 30% improvement, so stay tuned.

- A checksum is now added automatically when using LZO (not with UCL, where
  I'm having some difficulties implementing that capability). The Adler32
  algorithm has been chosen because of its speed. With that, the
  compression/decompression speed has dropped by 1% or 2%, which is hardly
  noticeable. I think this addition will allow the cautious user to be a
  bit more confident about this excellent compressor. Code has been added
  so that files created without this checksum can still be read (so you can
  be confident that you will be able to read your existing files compressed
  with LZO and UCL).

- Recursion has been removed from PyTables. Before, this limited the
  maximum tree depth to less than the Python recursion limit (which depends
  on the implementation, but is around 900, at least on Linux). Now, the
  limit has been set (somewhat arbitrarily) to 2048.
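  Removing the recursion boils down to a standard trick: walk the tree with
  an explicit stack instead of recursive calls, so the depth is bounded
  only by a chosen limit (2048 here), not by the interpreter's recursion
  limit. A generic sketch in modern Python, with an invented nested-dict
  node layout -- this is not the actual PyTables code:

```python
def iter_tree(root, max_depth=2048):
    """Yield (depth, node) pairs, walking a nested-dict tree iteratively.

    Uses an explicit stack, so the traversal depth is bounded only by
    max_depth (and memory), not by Python's recursion limit.
    """
    stack = [(0, root)]
    while stack:
        depth, node = stack.pop()
        if depth > max_depth:
            raise RuntimeError("maximum tree depth exceeded")
        yield depth, node
        # Children are dict values here; a real tree would use node objects.
        if isinstance(node, dict):
            for child in node.values():
                stack.append((depth + 1, child))

# Build a chain 1500 levels deep -- deeper than the default Python
# recursion limit (~1000) -- and walk it without any recursive calls.
tree = leaf = {}
for _ in range(1500):
    leaf["child"] = {}
    leaf = leaf["child"]

assert sum(1 for _ in iter_tree(tree)) == 1501  # root + 1500 descendants
```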
  Thanks to John Nielsen for implementing the new iterative method!

- A new rootUEP parameter has been added to openFile(). You can now define
  the root from which you want to start building the object tree. Thanks to
  John Nielsen for the suggestion and a first implementation of it.

- Some (non-serious) bugs were discovered and fixed.

- The documentation has been updated, so that you can learn more about how
  to use all these new bells and whistles. It is also available on the web:
  http://pytables.sourceforge.net/html-doc/usersguide-html.html

- More unit tests have been added (up to 353 now!).

- PyTables 0.7 *needs* the newest numarray 0.6 and HDF5 1.6.0 to compile
  and work. It has been tested with Python 2.2.3 and Python 2.3c2 and
  should work fine with both versions.

What it is
----------

In short, PyTables provides a powerful and very Pythonic interface to
process and organize your table and array data on disk. Its goal is to
enable the end user to easily manipulate scientific data tables and
Numerical and numarray Python objects in a persistent hierarchical
structure. The foundation of the underlying hierarchical data organization
is the excellent HDF5 library (http://hdf.ncsa.uiuc.edu/HDF5).

A table is defined as a collection of records whose values are stored in
fixed-length fields. All records have the same structure, and all values in
each field have the same data type. The terms "fixed-length" and "strict
data types" may seem like strange requirements for a language like Python,
which supports dynamic data types, but they serve a useful purpose when the
goal is to save very large quantities of data (such as is generated by many
scientific applications) in an efficient manner that reduces demand on CPU
time and I/O resources.

Quite a bit of effort has been invested in making browsing the hierarchical
data structure a pleasant experience. PyTables implements two (orthogonal)
easy-to-use methods for browsing.

What is HDF5?
-------------

For those who know nothing about HDF5, it is a general-purpose library and
file format for storing scientific data, made at NCSA. HDF5 can store two
primary kinds of objects: datasets and groups. A dataset is essentially a
multidimensional array of data elements, and a group is a structure for
organizing objects in an HDF5 file. Using these two basic constructs, one
can create and store almost any kind of scientific data structure, such as
images, arrays of vectors, and structured and unstructured grids. You can
also mix and match them in HDF5 files according to your needs.

Platforms
---------

I'm using Linux as the main development platform, but PyTables should be
easy to compile/install on other UNIX machines. This package has also
passed all the tests on an UltraSparc platform with Solaris 7 and Solaris
8. It also compiles and passes all the tests on an SGI Origin2000 with MIPS
R12000 processors running IRIX 6.5.

Regarding Windows platforms, PyTables has been tested with Windows 2000 and
Windows XP, but it should also work with other flavors.

An example?
-----------

For online code examples, have a look at

http://pytables.sourceforge.net/tut/tutorial1-1.html

and

http://pytables.sourceforge.net/tut/tutorial1-2.html

There is also a small one attached at the end of this message.

Web site
--------

Go to the PyTables web site for more details:
http://pytables.sourceforge.net/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Have fun!

--
Francesc Alted
fa...@op...
--------------- Example of use --------------------------------------

from numarray import *
from tables import *

class Particle(IsDescription):
    name        = StringCol(length=16)     # 16-character string
    lati        = Int16Col()               # short integer
    longi       = IntCol()                 # integer
    pressure    = Float32Col(shape=(2,3))  # 2-D float array (single-precision)
    temperature = Float64Col(shape=(2,3))  # 2-D float array (double-precision)

# Open a file in "w"rite mode
fileh = openFile("table-simple.h5", mode = "w")
# Create a new table in root
table = fileh.createTable(fileh.root, 'table', Particle, "Title example")
particle = table.row

# Fill the table with 10 particles
for i in xrange(10):
    # First, assign the values to the Particle record
    particle['name'] = 'Particle: %6d' % (i)
    particle['lati'] = i
    particle['longi'] = 10 - i
    particle['pressure'] = array(i*arange(2*3), shape=(2,3))
    particle['temperature'] = float(i**2)
    # This injects the row values.
    particle.append()

# We need to flush the buffers in table in order to get an
# accurate number of records in it.
table.flush()

# Delete the third and fourth rows
table.removeRows(3, 5)

print "Table metadata:"
print repr(table)
print "Table contents:"
for row in table:
    print row
print "name column of the 5th row:"
print table[4].field("name")

# Finally, close the file
fileh.close()
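
As a footnote to the example above: the __iter__() and __getitem__()
additions described in the "What's new" section follow Python's ordinary
container protocols. A toy in-memory analogue, in modern Python
(illustrative only; TinyTable is an invented name, not part of PyTables):

```python
class TinyTable:
    """A toy in-memory table supporting iteration and extended slicing,
    analogous in spirit to the protocols Table gained in 0.7."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        # Yields rows one by one, as `for row in table:` expects.
        return iter(self._rows)

    def __getitem__(self, key):
        # Accepts both single indices and extended slices like t[1:8:2].
        return self._rows[key]

t = TinyTable({"lati": i, "longi": 10 - i} for i in range(10))
assert [row["lati"] for row in t] == list(range(10))   # iteration
assert t[4] == {"lati": 4, "longi": 6}                 # single index
assert [r["lati"] for r in t[1:8:2]] == [1, 3, 5, 7]   # extended slice
```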