From: Francesc A. <fa...@ca...> - 2005-12-21 11:51:48

Hi List,

Some of you may find interesting my presentation at the latest HDF Workshop, which took place in San Francisco, USA, a few weeks ago:

http://pytables.sourceforge.net/doc/HDF_IX_Workshop.pdf

It gives a brief description of Python and why it is well suited to scientific/technical work. It also introduces PyTables, summarizes the new capabilities in PyTables 1.2 and the capabilities planned for PyTables Pro, and finally describes the current state of progress on CSTables.

That's all folks,

--
>0,0<   Francesc Altet     http://www.carabos.com/
 V V    Cárabos Coop. V.   Enjoy Data
  "-"
From: Francesc A. <fa...@ca...> - 2005-11-22 20:00:53

Hi List,

Below is the official announcement for the new PyTables 1.2. Thanks to everybody who has contributed in one way or another to make this release a reality.

The release has been checked very thoroughly (in fact, so thoroughly that the mantra "release early/release often" is starting to suffer a bit :-/). Even so, I'm sure that some of you will soon come up with some nasty bug that slipped through. Your feedback is still encouraged and, what's more, appreciated ;-)

Enjoy!

--
>0,0<   Francesc Altet     http://www.carabos.com/
 V V    Cárabos Coop. V.   Enjoy Data
  "-"

=========================
 Announcing PyTables 1.2
=========================

The PyTables development team is happy to announce the availability of a new major version of the PyTables package.

This version sports a completely new in-memory tree implementation built around a *node cache system*. This system loads nodes only when needed and unloads them when they are rarely used. The new feature allows opening and creating HDF5 files with large hierarchies very quickly and with low memory consumption (the object tree is no longer completely loaded in memory), while retaining all the powerful browsing capabilities of the previous implementation of the object tree. You can read more about the bells and whistles of the new cache system in:

http://www.carabos.com/downloads/pytables/NewObjectTreeCache.pdf

Also, Jeff Whitaker has kindly contributed a new module called tables.NetCDF. It is designed to be used as a drop-in replacement for Scientific.IO.NetCDF, with only minor changes to existing code. In addition, if you have the Scientific.IO.NetCDF module installed, it lets you convert between the HDF5 <--> NetCDF3 formats.

Go to the PyTables web site for downloading the beast:

http://pytables.sourceforge.net/

If you want more info about this release, please check out the more comprehensive announcement message available at:

http://www.carabos.com/downloads/pytables/ANNOUNCE-1.2.html

Acknowledgments
===============

Thanks to the users who provided feature improvements, patches, bug reports, support and suggestions. See the THANKS file in the distribution package for an (incomplete) list of contributors. Many thanks also to SourceForge, who have helped to make and distribute this package! And last but not least, a big thank you to THG (http://www.hdfgroup.org/) for sponsoring many of the new features recently introduced in PyTables.

---

**Enjoy data!**

-- The PyTables Team
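[Editor's note: the node cache described in the announcement above loads nodes lazily and evicts the ones that are rarely used. As a rough pure-Python sketch of that idea only (the actual PyTables machinery is implemented in C/Pyrex with its own eviction policy; the `loader` callable and cache size here are made up for illustration):]

```python
from collections import OrderedDict

class NodeCache:
    """Toy LRU cache illustrating the idea behind the PyTables 1.2
    node cache: nodes are loaded on first access only, and the least
    recently used ones are evicted once the cache is full."""

    def __init__(self, loader, maxnodes=256):
        self.loader = loader          # hypothetical callable: path -> node
        self.maxnodes = maxnodes
        self._cache = OrderedDict()   # path -> node, in LRU order
        self.loads = 0                # how many real loads happened

    def get(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)    # mark as recently used
            return self._cache[path]
        node = self.loader(path)             # load only when needed
        self.loads += 1
        self._cache[path] = node
        if len(self._cache) > self.maxnodes:
            self._cache.popitem(last=False)  # evict the LRU node
        return node

# Stand-in loader simulating reading a node from an HDF5 file.
cache = NodeCache(loader=lambda path: {"path": path}, maxnodes=2)
cache.get("/a"); cache.get("/b"); cache.get("/a")
cache.get("/c")       # evicts "/b", the least recently used node
cache.get("/a")       # still cached: no new load
print(cache.loads)    # -> 3 real loads for 5 accesses
```

This is how a small, bounded cache can make opening a file with a huge hierarchy cheap: only the handful of nodes you actually touch are ever materialized.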
From: Francesc A. <fa...@ca...> - 2005-09-13 18:44:45

Hi List,

Unfortunately, I made a mistake and forgot to fix an issue in the PyTables 1.1.1 released yesterday. I've just uploaded new versions of the 1.1.1 files to the usual PyTables file repository:

http://sourceforge.net/project/showfiles.php?group_id=63486

The problem was related to the persistence of index properties for tables on disk. If you don't have indexed tables, then you are not affected by this and you don't need to upgrade. Bear in mind, though, that if you do not upgrade, some tests in heavy mode will not pass. This is not serious anyway.

The new packages have been extensively checked (Linux on Intel Pentium, Windows on Intel Pentium, Linux on Intel Itanium, FreeBSD on AMD Opteron and MacOSX on IBM PowerPC, all in --heavy mode), so chances are that this time 1.1.1 is as stable as it is supposed to be.

Sorry for the inconvenience!

--
>0,0<   Francesc Altet     http://www.carabos.com/
 V V    Cárabos Coop. V.   Enjoy Data
  "-"
From: Francesc A. <fa...@ca...> - 2005-09-12 19:18:52

===========================
 Announcing PyTables 1.1.1
===========================

This is a maintenance release of PyTables. In it, several optimizations and bug fixes have been made. As some of the fixed bugs were quite important, users are strongly recommended to upgrade.

Go to the PyTables web site for downloading the beast:

http://pytables.sourceforge.net/

or keep reading for more info about the improvements and bugs fixed.

Changes more in depth
=====================

Improvements:

- Optimized the opening of files with a large number of objects. Now, files with table objects open about 50% faster, and files with arrays open more than twice as fast (up to 2000 objects/s on a Pentium 4 @ 2 GHz). Hence, a file with a combination of both kinds of objects opens between 50% and 100% faster than with 1.1.

- Optimized the creation of NestedRecArray objects using NumArray objects as columns, so that filling a table with the Table.append() method achieves performance similar to PyTables pre-1.1 releases.

Bug fixes:

- ``Table.readCoordinates()`` now converts the coords parameter into ``Int64`` indices automatically.

- Fixed a bug that prevented appending to tables (through Table.append) using a list of numarray objects.

- Solved a small bug when creating indexes for the first time in tables, so that the filter properties are retained for later use.

- Int32 attributes are now handled correctly on 64-bit platforms.

- Correction for accepting lists of numarrays as input for NestedRecArrays.

- Fixed a problem creating rank-1 multidimensional string columns in ``Table`` objects. Closes SF bug #1269023.

- Avoid errors when unpickling objects stored in attributes. See the section ``AttributeSet`` in the reference chapter of the User's Manual for more information. Closes SF bug #1254636.

- Assignment to *Array slices has been improved in order to solve some issues with shapes. Closes SF bug #1288792.

Known bugs:

- Classes inheriting from IsDescription subclasses do not inherit columns defined in the superclass. See SF bug #1207732 for more info.

- Time datatypes are non-portable between big-endian and little-endian architectures. This is ultimately a consequence of an HDF5 limitation. See SF bug #1234709 for more info.

Backward-incompatible changes:

- None.

Important note for MacOSX users
===============================

The UCL compressor seems to work badly on MacOSX platforms. Until the problem is isolated and eventually solved, UCL will not be compiled by default on MacOSX platforms, even if the installer finds it on the system. However, if you still want UCL support on MacOSX, you can use the --force-ucl flag in setup.py.

Important note for Python 2.4 and Windows users
===============================================

If you want to use PyTables with Python 2.4 on Windows platforms, you will need to get the HDF5 library compiled for MSVC 7.1, aka .NET 2003. It can be found at:

ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-164-win-net.ZIP

Users of Python 2.3 on Windows will have to download the version of HDF5 compiled with MSVC 6.0, available at:

ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-164-win.ZIP

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Team

--
>0,0<   Francesc Altet     http://www.carabos.com/
 V V    Cárabos Coop. V.   Enjoy Data
  "-"
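[Editor's note: regarding the first bug fix above, the point of the automatic coercion is that callers may pass plain lists, tuples, or arrays of mixed integer widths as row coordinates. The real conversion targets numarray ``Int64`` arrays; as a purely illustrative stdlib sketch of that kind of normalization, with made-up function and parameter names:]

```python
def normalize_coords(coords, nrows):
    """Coerce a sequence of row coordinates to plain Python ints,
    mimicking the automatic Int64 conversion described above.
    Negative values index from the end; out-of-range values raise."""
    out = []
    for c in coords:
        i = int(c)            # accepts any integer-like value
        if i < 0:
            i += nrows        # support negative indexing
        if not 0 <= i < nrows:
            raise IndexError("coordinate %d out of range" % i)
        out.append(i)
    return out

print(normalize_coords([0, 2, -1], nrows=5))   # -> [0, 2, 4]
```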
From: Francesc A. <fa...@ca...> - 2005-09-09 16:01:21

Hi List,

The next public release of PyTables, namely 1.1.1, has been made available at:

http://www.carabos.com/downloads/pytables/preliminary/

(no binary versions for Windows available yet)

Look at the announcement notes below. If I don't hear about problems with it for a while, I'll announce it more widely.

Also, a new beta of the 1.2 release (beta 2) is available in case anybody wants to give it a try. It seems that the Windows issues have been resolved. It is also available at the URL stated above.

Enjoy!

--
>0,0<   Francesc Altet     http://www.carabos.com/
 V V    Cárabos Coop. V.   Enjoy Data
  "-"

===========================
 Announcing PyTables 1.1.1
===========================

This is a maintenance release of PyTables. In it, several optimizations and bug fixes have been made. As some of the fixed bugs are quite important, users are strongly recommended to upgrade.

Go to the PyTables web site for downloading the beast:

http://pytables.sourceforge.net/

or keep reading for more info about the improvements and bugs fixed.

Changes more in depth
=====================

Improvements:

- Optimized the opening of files with a large number of objects. Now, files with table objects open about 50% faster, and files with arrays open more than twice as fast (up to 2000 objects/s on a Pentium 4 @ 2 GHz). Hence, a file with a combination of both kinds of objects opens between 50% and 100% faster than with 1.1.

- Optimized the creation of NestedRecArray objects using NumArray objects as columns, so that filling a table with the Table.append() method achieves performance similar to PyTables pre-1.1 releases.

Backward-incompatible changes:

- None.

Bug fixes:

- ``Table.readCoordinates()`` now converts the coords parameter into ``Int64`` indices automatically.

- Fixed a bug that prevented appending to tables (through Table.append) using a list of numarray objects.

- Solved a small bug when creating indexes for the first time in tables, so that the filter properties are retained for later use.

- Int32 attributes are now handled correctly on 64-bit platforms.

- Correction for accepting lists of numarrays as input for NestedRecArrays.

- Fixed a problem creating rank-1 multidimensional string columns in ``Table`` objects. Closes SF bug #1269023.

- Avoid errors when unpickling objects stored in attributes. See the section ``AttributeSet`` in the reference chapter of the User's Manual for more information. Closes SF bug #1254636.

Known bugs:

- Classes inheriting from IsDescription subclasses do not inherit columns defined in the superclass. See SF bug #1207732 for more info.

- Time datatypes are non-portable between big-endian and little-endian architectures. This is ultimately a consequence of an HDF5 limitation. See SF bug #1234709 for more info.

----

**Enjoy data!**

-- The PyTables Team
From: Francesc A. <fa...@ca...> - 2005-07-14 13:07:53

Hi List,

After quite a few testing iterations, I'm happy to announce the immediate availability of the newest PyTables 1.1. Many thanks to the people on this list for contributing not only bug reports, fixes and feedback but also code implementing new features (most especially Antonio Valentino). Here follows the official announcement.

Enjoy!

--
>0,0<   Francesc Altet     http://www.carabos.com/
 V V    Cárabos Coop. V.   Enjoy Data
  "-"

=========================
 Announcing PyTables 1.1
=========================

In this version you will find support for a nice set of new features: nested datatypes, enumerated datatypes, nested iterators, support for native multidimensional attributes, a new object for dealing with compressed arrays (CArray), bzip2 compression support and more. Many bugs have been addressed as well.

What it is
==========

**PyTables** is a package for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data (with support for full 64-bit file addressing). It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code, makes it a very easy-to-use tool for high-performance data storage and retrieval.

Perhaps its most interesting feature is that it optimizes memory and disk resources so that data take much less space (between a factor of 3 and 5, and more if the data is compressible) than other solutions, such as relational or object-oriented databases. Besides, PyTables I/O for table objects is buffered, implemented in C and carefully tuned, so that you can reach much better performance with PyTables than with your own home-grown wrappings to the HDF5 library.

Changes more in depth
=====================

Improvements:

- ``Table``, ``EArray`` and ``VLArray`` objects now support enumerated types, and ``Array`` objects support opening existing HDF5 enumerated arrays. Enumerated types are restricted sets of ``(name, value)`` pairs. Use the ``Enum`` class to easily define new enumerations that will be saved along with your data.

- Now, the HDF5 library is responsible for doing data conversions when datasets are written on a machine with a different byte order than the machine that reads them. With this, all the data is converted on the fly and you always get native datatypes in memory. I consider this approach more convenient in terms of CPU consumption when using such datasets. Right now, this only works for tables, though.

- Support for nested datatypes is in place. You can now make table columns that host other columns, to an unlimited depth (well, theoretically; in practice until the Python recursion limit is reached). Convenient NestedRecArray objects have been implemented as data containers. The Cols and Description accessors have been improved so you can navigate the type hierarchy very easily (natural naming has been implemented for the task).

- Added support for native HDF5 multidimensional attributes. Now you can load native HDF5 files that contain fully multidimensional attributes; these attributes will be mapped to NumArray objects. Also, when you save NumArray objects as attributes, they get saved as native HDF5 attributes (before, NumArray attributes were pickled).

- A brand-new class, called CArray, has been introduced. It's essentially like the Array class (i.e. non-enlargeable), but with compression capabilities enabled. The existence of CArray also allows PyTables to read native HDF5 chunked, non-enlargeable datasets.

- The bzip2 compressor is supported. This support was already in PyTables 1.0, but we forgot to announce it.
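[Editor's note: the enumerated types mentioned above are restricted sets of ``(name, value)`` pairs. The following is only a minimal pure-Python sketch of that concept, not the real ``tables.Enum`` class, whose exact construction rules and value assignment are not reproduced here:]

```python
class Enum:
    """Minimal sketch of an enumerated type: a restricted set of
    (name, value) pairs with lookups in both directions."""

    def __init__(self, names):
        # Accept either a dict of name -> value, or a list of names
        # to which sequential values are assigned.
        if isinstance(names, dict):
            pairs = list(names.items())
        else:
            pairs = [(name, i) for i, name in enumerate(names)]
        self._names = dict(pairs)                        # name -> value
        self._values = {v: k for k, v in self._names.items()}
        if len(self._values) != len(self._names):
            raise ValueError("enum values must be unique")

    def __getattr__(self, name):      # colors.red -> concrete value
        try:
            return self._names[name]
        except KeyError:
            raise AttributeError(name)

    def __call__(self, value):        # colors(0) -> 'red'
        return self._values[value]

colors = Enum(['red', 'green', 'blue'])
print(colors.green)   # -> 1
print(colors(2))      # -> 'blue'
```

The two-way mapping is the essential property: columns store the compact concrete values on disk, while code reads and writes the symbolic names.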
- The new LZO2 (http://www.oberhumer.com/opensource/lzo/lzonews.php) compressor is supported. The installer now recognizes whether LZO1 or LZO2 is installed and adapts automatically. If both are installed on your system, LZO2 is chosen. LZO2 claims to be fully compatible (both backward and forward) with LZO1, so you should not experience any problem during this transition.

- The old limit of 256 columns in a table has been lifted. Now you can have tables with any number of columns, although if you try to use too high a number (i.e. > 1024), you will start to consume a lot of system resources. You have been warned!

- The limit on the length of column names has been lifted as well.

- Nested iterators for reading tables are now supported.

- A new tutorial section about how to modify values in tables and arrays has been added to the User's Manual.

Backward-incompatible changes:

- None.

Bug fixes:

- VLArray now correctly updates its internal number-of-rows counter when opening an existing VLArray object. Now you can add new rows to existing VLAs without problems.

- The tuple flavor for VLArrays now works as intended, i.e. reading VLArray objects will always return tuples, even in the case of multidimensional Atoms. Before, these operations returned a mix of tuples and lists.

- If a column cannot be indexed because it has too few entries, _whereInRange is called instead of _whereIndexed. Fixes #1203202.

- You can now call Row.append() in the middle of Table iterators without resetting loop counters. Fixes #1205588.

- PyTables used to give a segmentation fault when removing the last row of a table with the table.removeRows() method. This is due to a limitation in the HDF5 library. Until this gets fixed in HDF5, a NotImplemented error is raised when trying to do that. Addresses #1201023.

- You can safely break out of a loop over an iterator returned by Table.where(). Fixes #1234637.

- When removing a Group with hidden child groups, those are now effectively closed.

- Now there is a distinction between shapes 1 and (1,) in tables. The former represents a scalar, and the latter a 1-D array with just one element. That follows the numarray convention for records and makes more sense as well. Before 1.1, shapes 1 and (1,) were both represented by a scalar on disk.

Known bugs:

- Classes inheriting from IsDescription subclasses do not inherit columns defined in the superclass. See SF bug #1207732 for more info.

- Time datatypes are non-portable between big-endian and little-endian architectures. This is ultimately a consequence of an HDF5 limitation. See SF bug #1234709 for more info.

Important note for MacOSX users
===============================

The UCL compressor seems to work badly on MacOSX platforms. Until the problem is isolated and eventually solved, UCL will not be compiled by default on MacOSX platforms, even if the installer finds it on the system. However, if you still want UCL support on MacOSX, you can use the --force-ucl flag in setup.py.

Important note for Python 2.4 and Windows users
===============================================

If you want to use PyTables with Python 2.4 on Windows platforms, you will need to get the HDF5 library compiled for MSVC 7.1, aka .NET 2003. It can be found at:

ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-164-win-net.ZIP

Users of Python 2.3 on Windows will have to download the version of HDF5 compiled with MSVC 6.0, available at:

ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-164-win.ZIP

Where can PyTables be applied?
==============================

PyTables is not designed to work as a relational database competitor, but rather as a teammate. If you want to work with large datasets of multidimensional data (for example, for multidimensional analysis), or just provide a categorized structure for some portions of your cluttered RDBS, then give PyTables a try. It works well for storing data from data acquisition systems (DAS), simulation software, network data monitoring systems (for example, traffic measurements of IP packets on routers), very large XML files, or for creating a centralized repository for system logs, to name only a few possible uses.

What is a table?
================

A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure, and all values in each field have the same data type. The terms "fixed-length" and "strict data types" may seem a strange requirement for a language like Python that supports dynamic data types, but they serve a useful function if the goal is to save very large quantities of data (such as is generated by many scientific applications, for example) in an efficient manner that reduces demand on CPU time and I/O resources.

What is HDF5?
=============

For those who know nothing about HDF5, it is a general-purpose library and file format for storing scientific data, made at NCSA. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.

Platforms
=========

We are using Linux on top of Intel32 as the main development platform, but PyTables should be easy to compile/install on other UNIX machines. This package has also been successfully compiled and tested on FreeBSD 5.4 with Opteron64 processors, an UltraSparc platform with Solaris 7 and Solaris 8, an SGI Origin3000 with Itanium processors running IRIX 6.5 (using the gcc compiler), Microsoft Windows and MacOSX (10.2, although 10.3 should work fine as well). In particular, it has been thoroughly tested on 64-bit platforms, like Linux-64 on top of an Intel Itanium, AMD Opteron (in 64-bit mode) or PowerPC G5 (in 64-bit mode), where all the tests pass successfully. Regarding Windows platforms, PyTables has been tested with Windows 2000 and Windows XP (using the Microsoft Visual C compiler), but it should work with other flavors as well.

Web site
========

Go to the PyTables web site for more details:

http://pytables.sourceforge.net/

To know more about the company behind the PyTables development, see:

http://www.carabos.com/

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Team
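[Editor's note: the "fixed-length fields" storage model described under "What is a table?" above is easy to demonstrate with nothing but the standard library: once every record has the same binary layout, the i-th record lives at a fixed offset and can be fetched without scanning. This illustrates the storage model only, not the HDF5 on-disk format; the record layout and row values are invented for the example.]

```python
import struct

# One record: an 8-byte name, a 4-byte signed int, an 8-byte float.
# '<' = little-endian, no padding, so every record is exactly 20 bytes.
RECORD = struct.Struct('<8sid')

rows = [(b'sphere', 3, 0.5), (b'cube', 6, 1.25), (b'cone', 2, 2.0)]
# struct pads the 8s field with null bytes when the name is shorter.
buf = b''.join(RECORD.pack(name, n, x) for name, n, x in rows)

def read_row(buf, i):
    """Random access: row i starts at byte i * RECORD.size."""
    name, n, x = RECORD.unpack_from(buf, i * RECORD.size)
    return name.rstrip(b'\0').decode(), n, x

print(read_row(buf, 1))   # -> ('cube', 6, 1.25)
```

The constant record size is exactly what buys the CPU and I/O efficiency the announcement talks about: no per-row parsing, and the offset of any row is a single multiplication.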
From: Andreu A. <aa...@cs...> - 2005-05-23 13:45:18

Announcing CSTables 1.0b
------------------------

This is a beta release.

What it is
----------

CSTables is a Python client-server database layer library that provides concurrency, a client metadata cache, a lock system, authentication and other features. CSTables is built on top of the HDF5 and PyTables libraries (which provide the data API and on-disk data management). The union of the aforementioned libraries, plus the Twisted framework (in charge of the communication) and NumArray (as a provider of containers for presenting and transporting data), results in a complete client-server database with very interesting features.

Features
--------

Client-server

CSTables allows remote HDF5 files to be accessed from clients using the PyTables API. Some classes and new methods are added for dealing with client-server execution and other features. The Twisted framework has been used to implement the client-server communication.

Client metadata cache

CSTables internally uses a client cache for storing metadata of the remote file objects. What is metadata?

* The attributes of the remote file objects (File, Group, Leaf, AttributeSet, etc.).
* The remote file tree hierarchy. Relations of ascendants to descendants are cached when they are used for the first time, allowing smooth navigation in subsequent accesses.

Client metadata cache refresh

Attributes that are cached on the clients are refreshed when they are changed on the server. A timestamp mechanism is provided to discard out-of-phase modifications, guaranteeing that older modifications which arrive at a client after a newer modification will be discarded. The cache refresh is very useful for inter-client communication, because clients do not need to poll the database for messages from other clients.

Concurrency

Multiple clients can run against the same CSTables server. No transactional support is added; however, the lock system can be used to implement applications with transactional features.
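[Editor's note: the timestamp rule described above, where a stale modification arriving after a newer one is discarded, can be sketched in a few lines of pure Python. This is illustrative only, not the CSTables implementation; the class and method names are invented:]

```python
class MetadataCache:
    """Sketch of last-writer-wins by timestamp: an update is applied
    only if it is newer than what the cache already holds, so
    out-of-phase (late, stale) changes are silently discarded."""

    def __init__(self):
        self._entries = {}   # key -> (timestamp, value)

    def apply_update(self, key, timestamp, value):
        old = self._entries.get(key)
        if old is not None and old[0] >= timestamp:
            return False                   # stale update: discard it
        self._entries[key] = (timestamp, value)
        return True

    def get(self, key):
        return self._entries[key][1]

cache = MetadataCache()
cache.apply_update('/table1/title', timestamp=5, value='old title')
cache.apply_update('/table1/title', timestamp=9, value='new title')
# A delayed, older update arrives after the newer one:
accepted = cache.apply_update('/table1/title', timestamp=7, value='stale')
print(accepted, cache.get('/table1/title'))   # -> False new title
```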
All remote operations are guaranteed to be atomic, including those, such as iterrows(), that can require several communications with the server.

Authentication

CSTables allows client authentication. The server can be configured to accept anonymous logins, or to allow only connections that are authenticated against its user database. Authentication is done through a challenge/response protocol with double-hashed MD5 passwords.

Remote file system management

Methods for creating, opening, removing and listing files on the server are supported.

Lock system

In multi-client applications with clients accessing the same database, a mechanism should be provided to execute portions of code while preventing other clients from making changes to the objects in question until the modifications are over. CSTables provides a lock mechanism, allowing applications to put a lock on a single node or on a full subtree. When a lock is set on a node and another client attempts to modify that node in an incompatible way, the client, depending on a switch variable that can be set programmatically, either becomes blocked until the client holding the lock releases it, or raises a BlockedException.

Full compatibility with the PyTables API

There are several ways to program with CSTables. One of them allows you to wrap monolithic PyTables scripts without modifying a single line and execute them against a remote server hosting the files the script expects to find.

What is HDF5?
-------------

For those who know nothing about HDF5, it is a general-purpose library and file format for storing scientific data, made at NCSA. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids.

What is PyTables?
-----------------

PyTables is a hierarchical database package designed to efficiently manage very large amounts of data. PyTables is built on top of the HDF5 library and the numarray package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Pyrex), makes it a fast yet extremely easy-to-use tool for interactively saving and retrieving very large amounts of data. One important feature of PyTables is that it optimizes memory and disk resources so that data take much less space (between a factor of 3 and 5, and more if the data is compressible) than other solutions, such as relational or object-oriented databases.

Platforms
---------

Currently tested and cross-tested on Windows XP and Red Hat Linux 9.0.

Web site
--------

Go to the CSTables page for more details:

http://www.cstables.org

-- Andreu Alted
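[Editor's note: one plausible reading of the "challenge/response protocol with double-hashed MD5 passwords" mentioned above is sketched below with the standard library; the actual CSTables wire protocol is not documented here, so function names and message layout are assumptions. MD5 is also long obsolete for security purposes today; this illustrates the protocol shape only.]

```python
import hashlib
import hmac
import os

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

# The server stores only the hashed password, never the plaintext.
stored = md5_hex(b'secret')

def make_challenge() -> bytes:
    """Server side: a fresh random nonce per login attempt."""
    return os.urandom(16)

def client_response(challenge: bytes, password: bytes) -> str:
    # "Double hashing": hash the already-hashed password together with
    # the nonce, so the plaintext never crosses the wire and a captured
    # response cannot be replayed against a different challenge.
    return md5_hex(challenge + md5_hex(password).encode())

def server_check(challenge: bytes, response: str) -> bool:
    expected = md5_hex(challenge + stored.encode())
    return hmac.compare_digest(expected, response)

nonce = make_challenge()
print(server_check(nonce, client_response(nonce, b'secret')))   # -> True
print(server_check(nonce, client_response(nonce, b'wrong')))    # -> False
```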
From: Waldemar O. <wal...@gm...> - 2005-05-12 14:55:48

On 5/9/05, Francesc Altet <fa...@ca...> wrote:
> Hi List,
>
> Following is the announcement for PyTables 1.0. It would be very nice
> if anybody would like to test the package on her system before a wider
> announcement is made.
>
> Your feedback is very welcome!

[snip]

It passes tests on WinXP with Python 2.4.

Waldemar
From: Francesc A. <fa...@ca...> - 2005-05-09 19:09:35

Hi List,

Following is the announcement for PyTables 1.0. It would be very nice if anybody would like to test the package on her system before a wider announcement is made.

Your feedback is very welcome!

=========================
 Announcing PyTables 1.0
=========================

The Carabos crew is very proud to announce the immediate availability of **PyTables release 1.0**. In this release you will find a series of exciting new features, the most important being the Undo/Redo capabilities, support for objects (and indexes!) with more than 2**31 rows, better I/O performance for Numeric objects, new time datatypes (useful for time-stamping fields), support for Octave HDF5 files and improved support for native HDF5 files.

What it is
==========

**PyTables** is a package for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data (with support for full 64-bit file addressing). It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code, makes it a very easy-to-use tool for high-performance data storage and retrieval.

It is built on top of the HDF5 library and the numarray package, and provides containers for both heterogeneous data (``Table``) and homogeneous data (``Array``, ``EArray``), as well as containers for keeping lists of objects of variable length (like Unicode strings or general Python objects) in a very efficient way (``VLArray``). It also sports a series of filters allowing you to compress your data on the fly using different compressors and compression enablers.

But perhaps the most interesting features are its powerful browsing and searching capabilities, which allow you to perform data selections over heterogeneous datasets exceeding gigabytes of data in just tenths of a second.
Besides, the PyTables I/O is buffered, implemented in C and carefully tuned so that you can reach much better performance with PyTables than with your own home-grown wrappings to the HDF5 library. Changes more in depth =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D Improvements: =2D New Undo/Redo feature (i.e. integrated support for undoing and/or redoing actions). This functionality lets you to put marks in specific places of your data operations, so that you can make your HDF5 file pop back (undo) to a specific mark (for example for inspecting how your data looked at that point). You can also go forward to a more recent marker (redo). You can even do jumps to the marker you want using just one instruction. =2D Reading Numeric objects from ``*Array`` and ``Table`` (Numeric columns) objects have a 50-100x speedup. With that, Louis Wicker reported that a speed of 350 MB/s can be achieved with Numeric objects (on a SGI Altix with a Raid 5 disk array) while with numarrays, this speed approaches 900 MB/s. This improvement has been possible mainly due to a nice recipe from Todd Miller. Thanks Todd! =2D With PyTables 1.0 you can finally create Tables, EArrays and VLArrays with more than 2**31 (~ 2 thousand millions) rows, as well as retrieve them. Before PyTables 1.0, retrieving data on these beasts was not well supported, in part due to limitations in some slicing functions in Python (that rely on 32-bit adressing). So, we have made the necessary modifications in these functions to support 64-bit indexes and integrated them into PyTables. As a result, our tests shows that this feature works just fine. =2D As a consequence of the above, you can now index columns of tables with more than 2**31 rows. For instance, indexes have been created for integer columns with 10**10 (yeah, 10 thousand million) rows in less than 1 hour using an Opteron @ 1.6 GHz system (~ 1 hour and half with a Xeon Intel32 @ 2.5 GHz platform). Enjoy! 
- PyTables now supports the native HDF5 time types, both 32-bit signed integer and 64-bit fixed-point timestamps. They are mapped to ``Int32`` and ``Float64`` values for easy manipulation. See the documentation for the ``Time32Col`` and ``Time64Col`` classes.

- The opening and copying of files with a large number of objects has been made faster by correcting a typo in ``Table._open()``. Thanks to Ashley Walsh for sending a patch for this.

- Rank-0 (scalar) ``EArray`` datasets can now be modified. Thanks to Norbert Nemec for providing a patch for this.

- From this version on, you are allowed to use node or attribute names that are not valid for natural naming. A warning is issued to warn (but not forbid) you in such a case. Of course, you have to use the ``getattr()`` function to access such nodes.

- The indexes of ``Table`` and ``*Array`` datasets can be of long type as well as integer type. However, indexes in slices are still restricted to the regular integer type.

- The concept of ``READ_ONLY`` system attributes has disappeared. You can change them now, at your own risk! However, you still cannot remove or rename system attributes.

- One can now interleave reads with write loops using ``table.row`` instances. This is thanks to a decoupling in I/O buffering: there is now one buffer for reading and another for writing, so that collisions no longer take place. Fixes #1186892.

- Support for the Octave HDF5 output format. Even complex arrays are supported. Thanks to Edward C. Jones for reporting this format.

Backward-incompatible changes:

- The format of indexes has been changed, and indexes in files created with pre-1.0 versions of PyTables are now ignored. However, ``ptrepack`` can still save your life, because it is able to convert your old files into the new indexing format. Also, if you copy the affected tables to other locations (using ``Leaf.copy()``), your indexes will be regenerated in the new format for you.
- The API has changed a little bit (nothing serious) for some methods. See ``RELEASE-NOTES.txt`` for more details.

Bug fixes:

- Added partial support for native HDF5 chunked datasets. They can now be read, and even extended, but only along the first extensible dimension. This limitation may be removed when multiple extensible dimensions are supported in PyTables.

- Formerly, when the name of a column in a table was contained in another column name, PyTables crashed while retrieving information about the former column. That has been fixed.

- A bug prevented the use of indexed columns of tables that were at a hierarchical level other than root. This is solved now.

- When a ``Group`` was renamed, you were not able to modify its attributes. This has been fixed.

- When either ``Table.modifyColumns()`` or ``Table.modifyRows()`` was called, a subsequent call to ``Table.flush()`` didn't really flush the modified data to disk. This works as intended now.

- Fixed some issues when iterating over ``*Array`` objects using the ``List`` or ``Tuple`` flavor.

Important note for Python 2.4 and Windows users
===============================================

If you want to use PyTables with Python 2.4 on Windows platforms, you will need to get the HDF5 library compiled for MSVC 7.1, aka .NET 2003. It can be found at:

ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-164-win-net.ZIP

Users of Python 2.3 on Windows will have to download the version of HDF5 compiled with MSVC 6.0, available at:

ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-164-win.ZIP

Where can PyTables be applied?
==============================

PyTables is not designed to work as a relational database competitor, but rather as a teammate.
If you want to work with large datasets of multidimensional data (for example, for multidimensional analysis), or just provide a categorized structure for some portions of your cluttered RDBS, then give PyTables a try. It works well for storing data from data acquisition systems (DAS), simulation software, network data monitoring systems (for example, traffic measurements of IP packets on routers), very large XML files, or for creating a centralized repository for system logs, to name only a few possible uses.

What is a table?
================

A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. The terms "fixed-length" and "strict data types" may seem a strange requirement for a language like Python that supports dynamic data types, but they serve a useful function if the goal is to save very large quantities of data (such as is generated by many scientific applications) in an efficient manner that reduces demand on CPU time and I/O resources.

What is HDF5?
=============

For those who know nothing about HDF5, it is a general-purpose library and file format for storing scientific data, made at NCSA. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.

Platforms
=========

We are using Linux on top of Intel32 as the main development platform, but PyTables should be easy to compile/install on other UNIX machines.
This package has also been successfully compiled and tested on FreeBSD 5.4 with Opteron64 processors, an UltraSparc platform with Solaris 7 and Solaris 8, an SGI Origin3000 with Itanium processors running IRIX 6.5 (using the gcc compiler), and Microsoft Windows. In particular, it has been thoroughly tested on 64-bit platforms, like Linux-64 on top of an Intel Itanium or AMD Opteron (in 64-bit mode). Regarding Windows platforms, PyTables has been tested with Windows 2000 and Windows XP (using the Microsoft Visual C compiler), but it should work with other flavors as well.

Web site
========

Go to the PyTables web site for more details:

http://pytables.sourceforge.net/

To know more about the company behind the PyTables development, see:

http://www.carabos.com/

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Team

-- 
>0,0< Francesc Altet    http://www.carabos.com/
 V V  Cárabos Coop. V.  Enjoy Data
 "-" |
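The Undo/Redo marks described in the announcement can be pictured with a small stdlib-only model. PyTables itself exposes this feature through methods on the ``File`` object (the announcement mentions marks, undo, redo and direct jumps); the ``MarkLog`` class below is only a toy sketch of those semantics, not the PyTables implementation, and its names are illustrative.

```python
# Toy model of mark/undo/redo semantics: snapshots of a state are kept
# at named marks, and the cursor can move backwards (undo), forwards
# (redo) or jump straight to any mark. This is NOT PyTables code; it is
# a stdlib sketch of the concept described in the announcement.

class MarkLog:
    """Keep snapshots of a state at marks and jump between them."""

    def __init__(self, state):
        self.snapshots = [dict(state)]   # mark 0 is the initial state
        self.current = 0

    def mark(self, state):
        # Creating a new mark discards any "redo" marks beyond the cursor.
        del self.snapshots[self.current + 1:]
        self.snapshots.append(dict(state))
        self.current += 1
        return self.current

    def undo(self):
        self.current = max(self.current - 1, 0)
        return dict(self.snapshots[self.current])

    def redo(self):
        self.current = min(self.current + 1, len(self.snapshots) - 1)
        return dict(self.snapshots[self.current])

    def goto(self, mark_id):
        # Jump to any mark with a single instruction.
        self.current = mark_id
        return dict(self.snapshots[self.current])


state = {"x": 1}
log = MarkLog(state)
state["x"] = 2
m1 = log.mark(state)          # mark 1 snapshots {"x": 2}
state["x"] = 3
log.mark(state)               # mark 2 snapshots {"x": 3}
state = log.undo()            # back to mark 1: {"x": 2}
state = log.goto(0)           # jump straight to the initial mark: {"x": 1}
```

The key design point mirrored here is that a mark is a point you can return to at will, in either direction, rather than a one-shot undo stack.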
|
From: Vicent M. (V+) <vm...@ca...> - 2005-02-25 17:08:24
|
[This is to correct a couple of funny mistakes in my previous announcement. Now you are told where to get the software. And, who knows, perhaps in the near future a ViTables release will be available *on Mars* too ;)]

Announcing ViTables 1.0 beta
----------------------------

I'm happy to announce the availability of ViTables 1.0b, the new member of the PyTables family. It is a graphical tool for browsing and editing files in both PyTables and HDF5 format. As with the entire PyTables family, the main strength of ViTables is its ability to manage really large datasets in a fast and comfortable manner. For example, with ViTables you can open a table with a thousand million rows in a few tenths of a second, with very low memory requirements.

In this release you will find, among others, the following features:

- Display of the data hierarchy as a fully browsable object tree.
- Open several files simultaneously.
- Reorganize your existing files in a graphical way.
- Display file and node (group or leaf) properties, including metadata and attributes.
- Display heterogeneous entities, i.e. tables.
- Display homogeneous (numeric or textual) entities, i.e. arrays.
- Zoom into multidimensional table cells.
- Editing capabilities for nodes and attributes: creation/deletion, copy/paste, rename...
- Fully integrated documentation browser.

Moreover, once CSTables (the client-server version of PyTables) is out, ViTables will be able to manage remote PyTables/HDF5 files as if they were local ones.

Downloads
---------

Go to the ViTables web site for more details and downloads:

http://www.carabos.com/products/vitables

Platforms
---------

At the moment, ViTables has been fully tested only on Linux platforms, but as it is built on top of Python, Qt, PyQt and PyTables, its portability should be really good, and it should work just fine on other Unices (like MacOSX) and Windows.
Note for Windows users: due to license issues, commercial versions of Qt and PyQt are needed to run ViTables on Windows platforms. Furthermore, those libraries must be packaged in a special manner to fulfill some special license requirements. An installer that handles these issues properly is being developed. A Windows version of ViTables will be published as soon as the installer development finishes.

Current development state
-------------------------

This is a beta version. The first stable, commercial version will be available late in March.

What is in the package
----------------------

In the package you will find the program sources, some info files such as README, INSTALL and LICENSE, and the documentation directory. The documentation includes the User's Guide in HTML4 and also its XML source file, so you can format it as you want. Finally, those of you interested in the internals of ViTables can find the documentation of all its modules in HTML4 format.

Legal notice
------------

Please remember that this is commercial software. The beta version is made publicly available so that beta testers can work with it, but the terms of the license must be respected. Basically, this means that the software or its modifications cannot be distributed to anybody in any way without Cárabos' explicit permission. See the LICENSE file for detailed information.

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy Data with ViTables, the troll of the PyTables family!

-- 
Share what you know, learn what you don't |
|
From: Vicent M. (V+) <vm...@ca...> - 2005-02-25 12:51:46
|
Announcing ViTables-1.0b
------------------------

I'm happy to announce the availability of ViTables-1.0b, the new member of the PyTables family. It is a graphical tool for browsing and editing files in both PyTables and HDF5 format. As with the entire PyTables family, the main strength of ViTables is its ability to manage really large datasets in a fast and comfortable manner. For example, with ViTables you can open a table with a thousand million rows in a few tenths of a second, with very low memory requirements.

In this release you will find, among others, the following features:

- Display of the data hierarchy as a fully browsable object tree.
- Open several files simultaneously.
- Reorganize your existing files in a graphical way.
- Display file and node (group or leaf) properties, including metadata and attributes.
- Display heterogeneous entities, i.e. tables.
- Display homogeneous (numeric or textual) entities, i.e. arrays.
- Zoom into multidimensional table cells.
- Editing capabilities for nodes and attributes: creation/deletion, copy/paste, rename...
- Fully integrated documentation browser.

Moreover, once CSTables (the client-server version of PyTables) is out, ViTables will be able to manage remote PyTables/HDF5 files as if they were local ones.

Platforms
---------

At the moment, ViTables has been fully tested only on Linux platforms, but as it is built on top of Python, Qt, PyQt and PyTables, its portability should be really good, and it should work just fine on other Unices (like MacOSX) and Windows.

Note for Windows users: due to license issues, commercial versions of Qt and PyQt are needed to run ViTables on Windows platforms. Furthermore, those libraries must be packaged in a special manner to fulfill some special license requirements. An installer that handles these issues properly is being developed. A Windows version of ViTables will be published as soon as the installer development finishes.
Current development state
-------------------------

This is a beta version. The first stable, commercial version will be available late on Mars.

What is in the package
----------------------

In the package you will find the program sources, some info files such as README, INSTALL and LICENSE, and the documentation directory. The documentation includes the User's Guide in HTML4 and also its XML source file, so you can format it as you want. Finally, those of you interested in the internals of ViTables can find the documentation of all its modules in HTML4 format.

Legal notice
------------

Please remember that this is commercial software. The beta version is made publicly available so that beta testers can work with it, but the terms of the license must be respected. Basically, this means that the software or its modifications cannot be distributed to anybody in any way without Cárabos' explicit permission. See the LICENSE file for detailed information.

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy Data with ViTables, the troll of the PyTables family!

Vicent Mas

-- 
Share what you know, learn what you don't |
|
From: Francesc A. <fa...@ca...> - 2004-12-02 18:24:37
|
Announcing PyTables 0.9.1
-------------------------

This release is mainly a maintenance version. In it, some bugs have been fixed and a few improvements have been made. One important thing is that the chunk sizes in EArrays have been re-tuned to get much better performance and compression ratios. Besides, it has been tested against the latest Python 2.4, and all unit tests seem to pass fine.

Changes more in depth
---------------------

Improvements:

- The chunk size computation for EArrays has been re-tuned to allow better performance and *much* better compression ratios.

- New --unpackshort and --quantize flags have been added to the nctoh5 script. --unpackshort unpacks short integer variables into float variables using the scale_factor and add_offset netCDF variable attributes. --quantize quantizes data to improve compression using the least_significant_digit netCDF variable attribute (not active by default). See http://www.cdc.noaa.gov/cdc/conventions/cdc_netcdf_standard.shtml for further explanation of what this attribute means. Thanks to Jeff Whitaker for providing this.

- Table.itersequence has received a new parameter called "sort". This allows disabling the sorting of the sequence in case the user wants to.

Backward-incompatible changes:

- The AttributeSet class now throws an AttributeError on __getattr__ for nonexistent attributes. Formerly, the routine returned None, which is pretty much against convention in Python and breaks the built-in hasattr() function. Thanks to Robert Nemec for noting this and offering a patch.

- VLArray.read() has changed its behaviour. Now it always returns a list, as stated in the documentation, even when the number of elements to return is 0 or 1. This is much more consistent when representing the actual number of elements in a certain VLArray row.

API additions:

- A Row.getTable() method has been added. It is an accessor for the associated Table object.

- A File.copyAttrs() method has been added.
It allows copying attributes from one leaf to another. Properly speaking, this was already there, but not documented :-/

Bug fixes:

- Now the copy of hierarchies works even when there are scalar Arrays (i.e. Arrays whose shape is ()) in them. Thanks to Robert Nemec for providing a patch.

- Solved a memory leak regarding the Filters instance associated with the File object, which was not released after closing the file. Now there are no known leaks in PyTables itself.

- Fixed a bug in Table.append() when the table was indexed. The problem was that if the table was in auto-indexing mode, some rows were lost in the indexing process and hence not indexed correctly.

- Improved security of node name checking. Closes #1074335.

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- 
Francesc Altet
Who's your data daddy?  PyTables |
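The quantization behind the --quantize flag can be sketched in a few lines. The netCDF least_significant_digit convention (which the announcement points to) rounds data so that it stays accurate to a given decimal digit, by scaling with a power of two; the zeroed-out trailing bits then compress very well. The function below is a stdlib illustration of that idea under the usual bits = ceil(log2(10**digits)) formulation, not the nctoh5 implementation itself.

```python
import math

def quantize(values, least_significant_digit):
    """Round values so they stay accurate to the given decimal digit.

    Sketch of the netCDF least_significant_digit convention: scale by a
    power of two large enough to preserve the requested decimal digit,
    round to an integer, and scale back. The rounding leaves long runs
    of zero bits in the mantissa, which compressors exploit well.
    """
    bits = math.ceil(math.log2(10 ** least_significant_digit))
    scale = 2.0 ** bits
    return [round(v * scale) / scale for v in values]

data = [1.23456789, 2.98765432, 3.14159265]
q2 = quantize(data, 2)   # accurate to 2 decimal digits
```

With least_significant_digit=2 the scale is 2**7 = 128, so every value is rounded to the nearest 1/128, well within half of 0.01 of the original.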
|
From: Francesc A. <fa...@py...> - 2004-11-08 12:11:42
|
Dear PyTables users,

Unfortunately, PyTables 0.9 received its first bug report shortly after the release. The good news is that it has been fixed now. The problem was that the new setup.py wouldn't install PyTables if either lzo or ucl was not installed on the system, even though they are optional libraries. This problem only happens on Unix platforms, though.

A new version of pytables-0.9.tar.gz with a cure for this has been uploaded to:

http://sourceforge.net/project/showfiles.php?group_id=63486

For those who already downloaded the tar package and don't want to download it again, it is enough to apply the following patch:

--- ../exports/pytables-0.9/setup.py	2004-11-05 16:33:58.000000000 +0100
+++ setup.py	2004-11-08 11:23:21.000000000 +0100
@@ -94,6 +94,7 @@
     else:
         if not incdir or not libdir:
             print "Optional %s libraries or include files not found. Disabling support for them." % (libname,)
+            return
     else:
         # Necessary to include code for optional libs
         def_macros.append(("HAVE_"+libname.upper()+"_LIB", 1))

Sorry for the inconvenience,

-- 
Francesc Altet |
|
From: Francesc A. <fa...@py...> - 2004-11-05 16:47:56
|
Announcing PyTables 0.9
-----------------------

I'm very proud to announce the latest and most powerful flavor of PyTables ever. In this release you will find a series of quite exciting new features, the most important being the indexing capabilities, in-kernel selections, support for complex datatypes and the possibility to modify values in both tables *and* arrays (yeah, finally :).

What is
-------

PyTables is a hierarchical database package designed to efficiently manage extremely large amounts of data (it supports full 64-bit file addressing). It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code, makes it a very easy-to-use tool for high-performance data saving and retrieval. It is built on top of the HDF5 library and the numarray package, and provides containers for both heterogeneous data (Tables) and homogeneous data (Array, EArray). It also sports a container for keeping lists of objects of variable length in a very efficient way (VLArray). Flexible support for filters allows you to compress your data on-the-fly using different compressors and compression enablers. Moreover, its powerful browsing and searching capabilities allow you to make data selections over tables exceeding gigabytes of data in just tenths of a second.

Changes more in depth
---------------------

New features:

- Indexing of columns in tables. This allows making data selections on tables up to 500 times faster than standard selections (for example, doing a selection along an indexed column of 100 million rows takes less than 1 second on a modern CPU). Perhaps the most interesting thing about the indexing algorithm implemented by PyTables is that the time taken to index grows *linearly* with the length of the data, making the indexing process *scalable* (quite differently from many relational databases). This means that it can index, in a relatively quick way, arbitrarily large table columns (for example,
indexing a column of 100 million rows takes just 100 seconds, i.e. at a rate of 1 Mrow/s). See more detailed info about this in http://pytables.sourceforge.net/doc/SciPy04.pdf.

- In-kernel selections. This feature allows making data selections on tables up to 5 times faster than standard selections (i.e. pre-0.9 selections), without the need to create an index. As a hint of how fast these selections can be, they are up to 10 times faster than a traditional relational database. Again, see http://pytables.sourceforge.net/doc/SciPy04.pdf for some experiments on that matter.

- Support for complex datatypes in all the data objects (i.e. Table, Array, EArray and VLArray). With that, the complete set of datatypes of the Numeric and numarray packages is supported. Thanks to Tom Hedley for providing the patches for the Array, EArray and VLArray objects, as well as updating the User's Manual and adding unit tests for the new functionality.

- Modification of values. You can modify Table, Array, EArray and VLArray values. See Table.modifyRows(), Table.modifyColumns() and the newly introduced __setitem__() method for the Table, Array, EArray and VLArray entities in the Library Reference of the User's Manual.

- A new sub-package called "nodes" has been added. It will contain different modules to make it easier to work with different entities (like images, files, ...). The first module added to this sub-package is "FileNode", whose mission is to enable the creation of a database of nodes which can be used like regular opened files in Python. In other words, you can store a set of files in a PyTables database, and read and write them as you would do with any other file in Python. Thanks to Ivan Vilata i Balaguer for contributing this.

Improvements:

- New __len__() methods added to Arrays, Tables and Columns. This, in combination with __getitem__(), allows sequences to be emulated better.

- Better capabilities to import generic HDF5 files.
In particular, Table objects (in the HDF5_HL naming schema) with "holes" in their compound type definition are supported. That allows reading certain files produced by NASA (thanks to Stephen Walton for reporting this).

- Much improved unit tests. More than 2000 different tests have been implemented, accounting for more than 13000 lines of code (twice the size of the PyTables library code itself!).

Backward-incompatible API changes:

- The __call__ special method has been removed from the File, Group, Table, Array, EArray and VLArray objects. Now you should use walkNodes() in File and Group, and iterrows() in Table, Array, EArray and VLArray, to get the same functionality. This provides better compatibility with IPython as well.

'nctoh5', a new importing utility:

- Jeff Whitaker has contributed a script to easily convert NetCDF files into HDF5 files using Scientific Python and PyTables. It has been included and documented as a new utility.

Bug fixes:

- A call to File.flush() now invokes a call to H5Fflush(), so as to effectively flush all the file contents to disk. Thanks to Shack Toms for reporting this and providing a patch.

- SF #1054683: Security hole in utils.checkNameValidity(). Reported 2004-10-26 by ivilata.
- SF #1049297: Suggestion: new method File.delAttrNode(). Reported 2004-10-18 by ivilata.
- SF #1049285: Leak in AttributeSet.__delattr__(). Reported 2004-10-18 by ivilata.
- SF #1014298: Wrong method call in examples/tutorial1-2.py. Reported 2004-08-23 by ivilata.
- SF #1013202: Cryptic error appending to EArray on RO file. Reported 2004-08-21 by ivilata.
- SF #991715: Table.read(field="var1", flavor="List") fails. Reported 2004-07-15 by falted.
- SF #988547: Wrong file type assumption in File.__new__. Reported 2004-07-10 by ivilata.

Where can PyTables be applied?
------------------------------

PyTables is not designed to work as a relational database competitor, but rather as a teammate.
If you want to work with large datasets of multidimensional data (for example, for multidimensional analysis), or just provide a categorized structure for some portions of your cluttered RDBS, then give PyTables a try. It works well for storing data from data acquisition systems (DAS), simulation software, network data monitoring systems (for example, traffic measurements of IP packets on routers), very large XML files, or for creating a centralized repository for system logs, to name only a few possible uses.

What is a table?
----------------

A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. The terms "fixed-length" and "strict data types" may seem a strange requirement for a language like Python that supports dynamic data types, but they serve a useful function if the goal is to save very large quantities of data (such as is generated by many scientific applications) in an efficient manner that reduces demand on CPU time and I/O resources.

What is HDF5?
-------------

For those who know nothing about HDF5, it is a general-purpose library and file format for storing scientific data, made at NCSA. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.

Platforms
---------

I'm using Linux (Intel 32-bit) as the main development platform, but PyTables should be easy to compile/install on many other UNIX machines. This package has also passed all the tests on an UltraSparc platform with Solaris 7 and Solaris 8.
It also compiles and passes all the tests on an SGI Origin2000 with MIPS R12000 processors, with the MIPSPro compiler and running IRIX 6.5. It also runs fine on Linux 64-bit platforms, like an AMD Opteron running GNU/Linux 2.4.21 Server, an Intel Itanium (IA64) running GNU/Linux 2.4.21, or a PowerPC G5 with Linux 2.6.x in 64-bit mode. It has also been tested on MacOSX platforms (10.2, but it should also work on newer versions). Regarding Windows platforms, PyTables has been tested with Windows 2000 and Windows XP (using the Microsoft Visual C compiler), but it should work with other flavors as well.

Web site
--------

Go to the PyTables web site for more details:

http://pytables.sourceforge.net/

To know more about the company behind the PyTables development, see:

http://www.carabos.com/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Bon profit!

-- 
Francesc Altet |
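The core idea behind the column indexing announced above, replacing a full scan with a search over a sorted copy of the column, can be sketched with the stdlib bisect module. PyTables' real indexing engine is far more elaborate (chunked, disk-based, and reported above to scale to enormous columns); this is only a minimal in-memory illustration of why an index turns an O(n) scan into an O(log n) lookup, with illustrative function names.

```python
# Minimal sketch of column indexing: keep a sorted copy of the column
# together with the original row numbers, then answer range selections
# with binary search instead of scanning every row. Not PyTables code.
import bisect

def build_index(column):
    # Sorted (value, row_number) pairs. A plain sort is O(n log n);
    # PyTables' own algorithm is reported to scale linearly in practice.
    return sorted((value, row) for row, value in enumerate(column))

def select_range(index, lo, hi):
    """Return row numbers whose value v satisfies lo <= v <= hi."""
    keys = [value for value, _ in index]
    start = bisect.bisect_left(keys, lo)
    stop = bisect.bisect_right(keys, hi)
    return sorted(row for _, row in index[start:stop])

column = [42, 7, 19, 7, 88, 23]
idx = build_index(column)
rows = select_range(idx, 7, 23)   # rows whose value lies between 7 and 23
```

The two bisect calls touch only O(log n) keys, which is the same reason an indexed PyTables selection avoids reading every row of the column.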
|
From: Francesc A. <fa...@py...> - 2004-07-13 08:04:50
|
The primary purpose of this release is to incorporate updates related to the newly released numarray 1.0. I've taken the opportunity to backport some improvements added in PyTables 0.9 (in alpha stage), as well as to fix the known problems.

Improvements:

- The logic for computing the buffer sizes has been revamped. As a consequence, the performance of writing/reading tables with large row sizes has improved by a factor of ten or more, now exceeding 70 MB/s for writing and 130 MB/s for reading (using compression). See http://sf.net/mailarchive/forum.php?thread_id=4963045&forum_id=13760 for more info.

- The maximum row size for tables has been raised to 512 KB (before it was 8 KB, due to some internal limitations).

- The documentation has been improved in minor details. As a result of a fix in the underlying documentation system (tbook), chapters now start at odd pages instead of even ones, so those of you who want to print double-sided will probably have better luck aligning pages now ;). The HTML documentation has improved its look as well.

Bug fixes:

- Indexing of Arrays with list or tuple flavors (#968131). When retrieving single elements from an array with a 'List' or 'Tuple' flavor, an error occurred. This has been corrected, and now you can retrieve fileh.root.array[2] without problems for 'List' or 'Tuple' flavored (E, VL)Arrays.

- Iterators on Arrays with list or tuple flavors fail (#968132). When using iterators with Array objects with 'List' or 'Tuple' flavors, an error occurred. This has been corrected.

- Last index (-1) of Arrays doesn't work (#968149). When accessing the last element of an Array using the -1 notation, an empty list (or tuple or array) was returned instead of the proper value. This happened in general with all negative indices. Fixed.
- Table.read(flavor="List") should return pure lists (#972534). However, it used to return a pointer to numarray.records.Record instances, as in:

>>> fileh.root.table.read(1,2,flavor="List")
[<numarray.records.Record instance at 0x4128352c>]
>>> fileh.root.table.read(1,3,flavor="List")
[<numarray.records.Record instance at 0x4128396c>,
 <numarray.records.Record instance at 0x41283a8c>]

Now the following records are returned:

>>> fileh.root.table.read(1,2, flavor="List")
[(' ', 1, 1.0)]
>>> fileh.root.table.read(1,3, flavor="List")
[(' ', 1, 1.0), (' ', 2, 2.0)]

In addition, when reading a single row of a table, a numarray.records.Record pointer was returned:

>>> fileh.root.table[1]
<numarray.records.Record instance at 0x4128398c>

Now it returns a tuple:

>>> fileh.root.table[1]
(' ', 1, 1.0)

which I think is more consistent, and more Pythonic.

- Copy of leaves fails... (#973370). Attempting to copy leaves (Table or Array with different flavors) on top of themselves caused an internal error in PyTables. This has been corrected by silently avoiding the copy and returning the original Leaf as a result.

Minor changes:

- When assigning a value to a non-existing field in a table row, a KeyError is now raised instead of the AttributeError that was issued before. I think this is more consistent with the type of error.

- Tests have been improved so that the whole suite passes when compiled in 64-bit mode on a Linux/PowerPC machine (namely a dual-G5 Powermac running a 64-bit 2.6.4 Linux kernel and the preview YDL distribution for G5, with a 64-bit GCC toolchain). Thanks to Ciro Cattuto for testing and reporting the modifications that were needed.

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Have fun!

-- 
Francesc Alted |
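The (' ', 1, 1.0) rows shown in the examples above are fixed-length records: one character, one 32-bit integer, one 64-bit float. Python's stdlib struct module makes the "fixed-length fields, strict data types" idea concrete: every row packs to exactly the same number of bytes, which is what lets a table library compute buffer sizes up front and seek straight to row i at offset i * rowsize. This is only an illustration of the concept, not how PyTables lays rows out on disk.

```python
# Fixed-length records in pure Python: pack rows shaped like the
# (' ', 1, 1.0) examples above into a flat byte buffer and fetch one
# back by offset arithmetic. Illustrative only; not PyTables' storage.
import struct

ROW_FORMAT = "=c i d"                 # char, Int32, Float64; no padding
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # identical for every row

rows = [(b" ", 1, 1.0), (b" ", 2, 2.0)]
buf = b"".join(struct.pack(ROW_FORMAT, *row) for row in rows)

# Random access: row i lives at byte offset i * ROW_SIZE, so no row
# before it needs to be parsed to find it.
i = 1
row_i = struct.unpack_from(ROW_FORMAT, buf, i * ROW_SIZE)
```

Because the row size is a compile-time constant of the record description, buffered I/O can move whole blocks of rows at once, which is exactly what the buffer-size tuning above exploits.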
|
From: Francesc A. <fa...@py...> - 2004-07-02 09:37:02
|
Hi all,

After almost 3 months of waiting, I'm happy to announce that PyTables 0.8 has finally entered Debian testing (sarge). Besides, a satisfactory build has been made for the following architectures:

alpha arm hppa i386 ia64 m68k mips mipsel powerpc s390 sparc

which represent the complete set of architectures supported by Debian.

Enjoy!,

-- 
Francesc Alted |
|
From: Francesc A. <fa...@py...> - 2004-06-15 17:08:36
|
Maybe this info would be useful to somebody. As you can see, PyTables has
been included in the latest release of Quantian.
Regards,
Francesc

---------- Forwarded message ----------
Subject: New Quantian release 0.5.9.1
Date: Tuesday, 15 June 2004 04:45
From: Dirk Eddelbuettel <ed...@de...>
To: qua...@li...
Cc: qua...@ed...
[ This email is sent to those whose email addresses are in my quantian
mail folder due to prior emails, plus LWN and DWN who had run previous
announcements, and, as suggested, the openmosix-general, clusterknoppix
and debian-knoppix lists. Anybody who considers this unwanted is kindly
asked to send me a private mail and I will immediately remove the
corresponding alias entry. --edd ]
Announcing Quantian release 0.5.9.1
===================================
I What is it?
Quantian is a remastering of Knoppix, the self-configuring and directly
bootable cdrom that turns any pc or laptop into a full-featured Linux
workstation, and clusterKnoppix, which adds support for openMosix.
However, Quantian differs from (cluster)Knoppix by adding a large set
of programs of interest to applied or theoretical workers in
quantitative or data-driven fields.
See http://dirk.eddelbuettel.com/quantian.html for more details.

II What is new?
o First release based on Knoppix 3.4 with many changes such as improved
hardware detection and support for new hardware, the addition of
captive-ntfs which may allow write support for ntfs partitions, KDE
3.2.2 with kdevelop and more;
o Based on Knoppix 3.4, the clusterKnoppix release from May 10 adds
kernel 2.4.26 with the 'testing status' openMosix patch as well as
a non-openMosix kernel 2.6.6; updated openMosix goodies gomd, chpox,
tyd; support for atheros wireless, cisco mpi 350 wireless, prism54,
host-ap
o New Quantian software includes many new R / CRAN packages such as
gregmisc, sm, quadprog and the entire snow suite: snow, rmpi, rpvm,
rsprng for 'Simple Network of Workstations' distributed computing
using either one of pvm or mpi; the Axiom computer-algebra system;
python-tables and cernlib which pulls in a large number of packages
from the CERN particle physics lab
o 'kitchen sink' size -- this release comes in at 1.1gb and will /not/
fit onto a cdrom. It runs really well from disk, see the 'lilo howto'
http://dirk.eddelbuettel.com/quantian/howto_liloboot.html for details
on how to boot the iso from hard disks. Burning a dvd with the larger
iso image should also work.
o Last but not least, custom artwork thanks to a contributed background
image provided by Ed Pegg Jr.
o Mailing lists for Quantian up and running
Following the 0.4.9.3 release, a Quantian project was opened on
alioth.debian.org. So far the only use of the alioth infrastructure
has been the creation of two mailing lists
quantian-announce for announcements, intended to be low volume
quantian-general for general discussions about Quantian
Please go to
http://alioth.debian.org/mail/?group_id=1425
for subscription info etc., and start using the quantian-general list
for general questions, comments, suggestions or discussions about
Quantian.
Quantian-general is subscribed to quantian-announce, so you only need
to subscribe to one (but can of course subscribe to both).
o See http://dirk.eddelbuettel.com/quantian/changelog.html for details.
III Where do I get it?
Downloads are available from the two main hosts both of which also
provide rsync:
o U of Washington:
- http://www.analytics.washington.edu/downloads/quantian
- rsync://www.analytics.washington.edu::quantian
o European mirrors, bittorrent site and cdrom vendors will hopefully
catch up over the next few days. See
http://dirk.eddelbuettel.com/quantian.html
for download info.
IV Known Bugs
o None right now -- so please test away!
V Other items
o Mailing lists have been created, see above.
o Feedback / poll on package additions or removal
As always, I welcome comments and suggestions about programs to be
added or removed. Existing Debian packages get pushed to the front of
the line.
Please send feedback, questions, comments, ... to the
qua...@li...
list to maximise the number of eyes glancing at any one question.
Best regards, Dirk
--
FEATURE: VW Beetle license plate in California
-------------------------------------------------------
---------- Forwarded message ----------
Subject: Re: New Quantian release 0.5.9.1 available
Date: Tuesday, 15 June 2004 13:12
From: Dirk Eddelbuettel <ed...@de...>
To: Francesc Alted <fa...@py...>
Francesc,
On Tue, Jun 15, 2004 at 12:29:46PM +0200, Francesc Alted wrote:
> Hi Dirk,
>
> Thanks for including PyTables in Quantian. I hope such inclusion will be
> useful to people with a need to deal with large amounts of scientific data.
>
> By the way, for future reference, the name of the package should be stated
> as PyTables (lower-case writing is fine as well), and not python-tables,
> which follows the Debian naming guidelines but is not the package name.
My bad -- sorry. I just corrected it in the changelog.wml file from which
the .html is created. It was copy&paste from there to the announcement.
Regards, Dirk
>
> Cheers,
>
> --
> Francesc Alted
--
FEATURE: VW Beetle license plate in California
-------------------------------------------------------
---------- Forwarded message ----------
Subject: Fwd: New Quantian release 0.5.9.1 available
Date: Tuesday, 15 June 2004 18:07
From: Francesc Alted <fa...@py...>
To: ge...@ca...
They have just included pytables in a scientific-oriented distribution
based on Knoppix. It will be good publicity :)
See you,
---------- Forwarded message ----------
Subject: New Quantian release 0.5.9.1 available
Date: Tuesday, 15 June 2004 04:45
From: Dirk Eddelbuettel <ed...@de...>
To: qua...@li...
Cc: qua...@ed...
--
Francesc Alted
-------------------------------------------------------
--
Francesc Alted
|
|
From: Francesc A. <fa...@py...> - 2004-03-04 11:28:00
|
I'm happy to announce the availability of PyTables 0.8.
PyTables is a hierarchical database package designed to efficiently
manage very large amounts of data. PyTables is built on top of the
HDF5 library and the numarray package. It features an object-oriented
interface that, combined with natural naming and C-code generated from
Pyrex sources, makes it a fast, yet extremely easy-to-use tool for
interactively saving and retrieving very large amounts of data. It also
provides flexible indexed access on disk to anywhere in the data.
PyTables is not designed to work as a relational database competitor,
but rather as a teammate. If you want to work with large datasets of
multidimensional data (for example, for multidimensional analysis), or
just provide a categorized structure for some portions of your cluttered
RDBS, then give PyTables a try. It works well for storing data from data
acquisition systems (DAS), simulation software, network data monitoring
systems (for example, traffic measurements of IP packets on routers),
working with very large XML files or as a centralized repository for
system logs, to name only a few possible uses.
In this release you will find:
- Variable Length Arrays (VLA's) for saving collections of
variable-length elements in each row of a dataset.
- Enlargeable Arrays (EA's) for enlarging homogeneous
datasets on disk.
- Powerful replication capabilities, ranging from single leaves
up to complete hierarchies.
- With the introduction of the UnImplemented class, greatly
improved HDF5 native file import capabilities.
- Two new useful utilities: ptdump & ptrepack.
- Improved documentation (with the help of Scott Prater).
- New record on data size achieved: 5.6 TB (~ 1000 DVD's!) in
one single file.
- Enhanced platform support. New platforms: MacOSX, FreeBSD,
Linux64, IRIX64 (yes, a clean 64-bit port is there) and
probably more.
- More test units (now exceeding 800).
- Many other minor improvements.
More in detail:
What's new
----------
- The new VLArray class enables you to store large lists of rows
containing variable numbers of elements. The elements can
be scalars or fully multidimensional objects, in the PyTables
tradition. This class supports two special objects as rows:
Unicode strings (UTF-8 codification is used internally) and
generic Python objects (through the use of cPickle).
- The new EArray class allows you to enlarge already existing
multidimensional homogeneous data objects. Consider it
an extension of the already existing Array class, but
with more functionality. Online compression or other filters
can be applied to EArray instances, for example.
Another nice feature of EA's is their support for fully
multidimensional data selection with extended slices. You can
write "earray[1,2:3,...,4:200]", for example, to get the
desired dataset slice from the disk. This is implemented using
the powerful selection capabilities of the HDF5 library, which
results in very highly efficient I/O operations. The same
functionality has been added to Array objects as well.
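For readers new to extended slices, the following minimal sketch (plain Python, independent of PyTables; SliceEcho is a made-up helper class) shows what an expression like "earray[1,2:3,...,4:200]" actually hands to an object's __getitem__ method:

```python
# Python packs the comma-separated indices into a tuple; Ellipsis (...)
# stands for "all the remaining axes", and each a:b becomes a slice.
class SliceEcho:
    def __getitem__(self, key):
        return key

s = SliceEcho()
print(s[1, 2:3, ..., 4:200])
# -> (1, slice(2, 3, None), Ellipsis, slice(4, 200, None))
```

A container such as EArray only has to interpret that tuple against its own shape, which is what lets HDF5 fetch exactly the requested hyperslab from disk.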
- New UnImplemented class. If a dataset contains unsupported
datatypes, it will be associated with an UnImplemented
instance, then inserted into to the object tree as usual.
This allows you to continue to work with supported objects
while retaining access to attributes of unsupported
datasets. This has changed from previous versions, where a
RuntimeError occurred when an unsupported object was
encountered.
The combination of the new UnImplemented class with the
support for new datatypes will enable PyTables to greatly
increase the number of types of native HDF5 files that can
be read and modified.
- Boolean support has been added for all the Leaf objects.
- The Table class has now an append() method that allows you
to save large buffers of data in one go (i.e. bypassing the
Row accessor). This can greatly improve data gathering
speed.
- The standard HDF5 shuffle filter (to further enhance the
compression level) is supported.
- The standard HDF5 fletcher32 checksum filter is supported.
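The shuffle filter mentioned above regroups the bytes of fixed-size elements (all first bytes together, then all second bytes, and so on), which often makes the stream far more compressible. A toy sketch of the idea (the shuffle function below is our own illustration, not the actual HDF5 filter implementation):

```python
import struct
import zlib

def shuffle(data: bytes, itemsize: int) -> bytes:
    # Regroup bytes: byte 0 of every element, then byte 1 of every
    # element, ... so nearly-constant high bytes form long runs.
    return bytes(data[j] for i in range(itemsize)
                 for j in range(i, len(data), itemsize))

# Slowly varying 4-byte integers: the two high bytes are constant,
# so after shuffling zlib sees long repeated runs.
raw = b"".join(struct.pack("<i", 1000000 + n) for n in range(4096))
assert len(zlib.compress(shuffle(raw, 4))) < len(zlib.compress(raw))
```

This is why shuffle is advertised as a compression enhancer rather than a compressor: it changes the byte order, not the size, and a real filter also needs the inverse transform on read.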
- As the supported number of filters is growing (and may be
further increased in the future), a Filters() class has been
introduced to handle filters more easily. In order to add
support for this class, it was necessary to make a change in
the createTable() method that is not backwards compatible:
the "compress" and "complib" parameters are deprecated now
and the "filters" parameter should be used in their
place. You will be able to continue using the old parameters
(only a Deprecation warning will be issued) for the next few
releases, but you should migrate to the new version as soon
as possible. In general, you can easily migrate old code by
substituting code in its place:
      table = fileh.createTable(group, 'table', Test, '',
                                complevel, complib)
  should be replaced by
      table = fileh.createTable(group, 'table', Test, '',
                                Filters(complevel, complib))
- A copy() method that supports slicing and modification of
filtering capabilities has been added for all the Leaf
objects. See the User's Manual for more information.
- A couple of new methods, namely copyFile() and copyChilds(),
have been added to the File class, to permit easy replication
of complete hierarchies or sub-hierarchies, even to
other files. You can change filters during the copy
process as well.
- Two new utilities have been added: ptdump and
ptrepack. The utility ptdump allows the user to examine
the contents of PyTables files (both metadata and actual
data). The powerful ptrepack utility lets you
selectively copy (portions of) hierarchies to specific
locations in other files. It can also be used as an
importer for generic HDF5 files.
- The meaning of the stop parameter in read() methods has
changed. Now a value of 'None' means the last row, and a
value of 0 (zero) means the first row. This is more
consistent with the range() function in Python and the
__getitem__() special method in numarray.
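The new stop convention maps directly onto ordinary slice arithmetic; a tiny helper (hypothetical, not the PyTables implementation) makes the mapping explicit:

```python
def resolve_stop(stop, nrows):
    # Hypothetical helper mirroring the 0.8 convention:
    # stop=None -> read through the last row; stop=0 -> stop before row 0,
    # exactly as with range()/slice semantics.
    return nrows if stop is None else stop

rows = list(range(10))
assert rows[1:resolve_stop(None, len(rows))] == [1, 2, 3, 4, 5, 6, 7, 8, 9]
assert rows[0:resolve_stop(0, len(rows))] == []   # empty, like range(0, 0)
```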
- The method Table.removeRows() is no longer limited by table
size. You can now delete rows regardless of the size of the
table.
- The "numarray" value has been added to the flavor parameter
in the Table.read() method for completeness.
- The attributes (.attr instance variable) are Python
properties now. Access to their values is no longer
lazy, i.e. you will be able to see both system and user
attributes from the command line using the tab-completion
capability of your Python console (if enabled).
- Documentation has been greatly improved to explain all the
new functionality. In particular, the internal format of
PyTables is now fully described. You can now build
"native" PyTables files using any generic HDF5 software
by just duplicating their format.
- Many new tests have been added, not only to check new
functionality but also to more stringently check
existing functionality. There are more than 800 different
tests now (and the number is increasing :).
- PyTables has a new record in the data size that fits in one
single file: more than 5 TB (yeah, more than 5000 GB), that
accounts for 11 GB compressed, has been created on an AMD
Opteron machine running GNU/Linux-64 (the 64 bits version of
the Linux kernel). See the gory details in:
http://pytables.sf.net/html/StressTests.html.
- New platforms supported: PyTables has been compiled and tested
under GNU/Linux32 (Intel), GNU/Linux64 (AMD Opteron and
Alpha), Win32 (Intel), MacOSX (PowerPC), FreeBSD (Intel),
Solaris (6, 7, 8 and 9 with UltraSparc), IRIX64 (IRIX 6.5 with
R12000) and it probably works in many more architectures. In
particular, release 0.8 is the first one that provides a
relatively clean porting to 64-bit platforms.
- As always, some bugs have been solved (especially bugs that
occur when deleting and/or overwriting attributes).
- And last, but definitely not least, a new donations section
has been added to the PyTables web site
(http://sourceforge.net/projects/pytables, then follow the
"Donations" tag). If you like PyTables and want this effort
to continue, please, donate!
What is a table?
----------------
A table is defined as a collection of records whose values are stored in
fixed-length fields. All records have the same structure and all values
in each field have the same data type. The terms "fixed-length" and
"strict data types" seem to be quite a strange requirement for a
language like Python that supports dynamic data types, but they serve a
useful function if the goal is to save very large quantities of data
(such as is generated by many scientific applications, for example) in
an efficient manner that reduces demand on CPU time and I/O resources.
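The fixed-length, strictly-typed record idea can be sketched with Python's standard struct module (an illustration only, not how PyTables stores data; FMT and pack_record are our own names):

```python
import struct

# Hypothetical record layout: a 16-byte string, a 4-byte int and an
# 8-byte float -- every record occupies exactly the same number of bytes.
FMT = "<16sid"
RECORD_SIZE = struct.calcsize(FMT)   # 28 bytes with this layout

def pack_record(name, ivalue, fvalue):
    return struct.pack(FMT, name.encode("utf-8"), ivalue, fvalue)

rows = [pack_record("particle%d" % i, i, i * 2.0) for i in range(3)]
data = b"".join(rows)
# Fixed record size means row i starts at byte i * RECORD_SIZE -- the
# property that makes fast random access on disk possible.
name, ival, fval = struct.unpack_from(FMT, data, 2 * RECORD_SIZE)
print(name.rstrip(b"\x00"), ival, fval)   # -> b'particle2' 2 4.0
```

Because every row has the same size and layout, reading row i is a single seek-and-unpack rather than a scan, which is the efficiency the paragraph above refers to.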
What is HDF5?
-------------
For those people who know nothing about HDF5, it is a general purpose
library and file format for storing scientific data made at NCSA. HDF5
can store two primary objects: datasets and groups. A dataset is
essentially a multidimensional array of data elements, and a group is a
structure for organizing objects in an HDF5 file. Using these two basic
constructs, one can create and store almost any kind of scientific data
structure, such as images, arrays of vectors, and structured and
unstructured grids. You can also mix and match them in HDF5 files
according to your needs.
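As a rough mental model (ours, not HDF5 code), the group/dataset pairing behaves like nested dictionaries whose leaves are arrays:

```python
# Groups act like dictionaries (they hold named children); datasets act
# like arrays (they hold the actual numbers). Names here are invented.
hdf5_like = {
    "/": {
        "detector": {                      # a group
            "readings": [1.5, 2.0, 3.25],  # a dataset (1-D array)
        },
        "metadata": {                      # another group
            "run_id": [42],
        },
    },
}
assert hdf5_like["/"]["detector"]["readings"][2] == 3.25
```

Real HDF5 adds typed, multidimensional storage, attributes and chunked I/O on top of this shape, but the containment structure is the same.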
Platforms
---------
I'm using Linux (Intel 32-bit) as the main development platform, but
PyTables should be easy to compile/install on many other UNIX
machines. This package has also passed all the tests on a UltraSparc
platform with Solaris 7 and Solaris 8. It also compiles and passes all
the tests on a SGI Origin2000 with MIPS R12000 processors, with the
MIPSPro compiler and running IRIX 6.5. It also runs fine on Linux 64-bit
platforms, like an AMD Opteron running SuSe Linux Enterprise Server. It
has also been tested in MacOSX platforms (10.2 but should also work on
newer versions).
Regarding Windows platforms, PyTables has been tested with Windows
2000 and Windows XP (using the Microsoft Visual C compiler), but it
should also work with other flavors as well.
An example?
-----------
For online code examples, have a look at
http://pytables.sourceforge.net/html/tut/tutorial1-1.html
and, for newly introduced Variable Length Arrays:
http://pytables.sourceforge.net/html/tut/vlarray2.html
Web site
--------
Go to the PyTables web site for more details:
http://pytables.sourceforge.net/
Share your experience
---------------------
Let me know of any bugs, suggestions, gripes, kudos, etc. you may
have.
Have fun!
-- Francesc Alted
fa...@py...
|
|
From: Francesc A. <fa...@py...> - 2004-03-03 13:20:36
|
Announcing PyTables 0.8
-----------------------
I'm happy to announce the availability of PyTables 0.8.
PyTables is a hierarchical database package designed to efficiently
manage very large amounts of data. PyTables is built on top of the
HDF5 library and the numarray package. It features an object-oriented
interface that, combined with natural naming and C-code generated from
Pyrex sources, makes it a fast, yet extremely easy-to-use tool for
interactively saving and retrieving very large amounts of data. It also
provides flexible indexed access on disk to anywhere in the data.
PyTables is not designed to work as a relational database competitor,
but rather as a teammate. If you want to work with large datasets of
multidimensional data (for example, for multidimensional analysis), or
just provide a categorized structure for some portions of your cluttered
RDBS, then give PyTables a try. It works well for storing data from data
acquisition systems (DAS), simulation software, network data monitoring
systems (for example, traffic measurements of IP packets on routers),
working with very large XML files or as a centralized repository for
system logs, to name only a few possible uses.
In this release you will find:
- Variable Length Arrays (VLA's) for saving collections of
variable-length elements in each row of an array.
- Extensible Arrays (EA's) for extending homogeneous
datasets on disk.
- Powerful replication capabilities, ranging from single leaves
up to complete hierarchies.
- With the introduction of the UnImplemented class, greatly
improved HDF5 native file import capabilities.
- Two new useful utilities: ptdump & ptrepack.
- Improved documentation (with the help of Scott Prater).
- New record on data size achieved: 5.6 TB (!) in one single
file.
- Enhanced platform support. New platforms: MacOSX, FreeBSD,
Linux64, IRIX64 (yes, a clean 64-bit port is there) and
probably more.
- More test units (now exceeding 800).
- Many other minor improvements.
More in detail:
What's new
----------
- The new VLArray class enables you to store large lists of rows
containing variable numbers of elements. The elements can
be scalars or fully multidimensional objects, in the PyTables
tradition. This class supports two special objects as rows:
Unicode strings (UTF-8 codification is used internally) and
generic Python objects (through the use of cPickle).
- The new EArray class allows you to enlarge already existing
multidimensional homogeneous data objects. Consider it
an extension of the already existing Array class, but
with more functionality. Online compression or other filters
can be applied to EArray instances, for example.
Another nice feature of EA's is their support for fully
multidimensional data selection with extended slices. You
can write "earray[1,2:3,...,4:200]", for example, to get the
desired dataset slice from the disk. This is implemented
using the powerful selection capabilities of the HDF5
library, which results in very highly efficient I/O
operations. The same functionality has been added to Array
objects as well.
- New UnImplemented class. If a dataset contains unsupported
datatypes, it will be associated with an UnImplemented
instance, then inserted into to the object tree as usual.
This allows you to continue to work with supported objects
while retaining access to attributes of unsupported
datasets. This has changed from previous versions, where a
RuntimeError occurred when an unsupported object was
encountered.
The combination of the new UnImplemented class with the
support for new datatypes will enable PyTables to greatly
increase the number of types of native HDF5 files that can
be read and modified.
- Boolean support has been added for all the Leaf objects.
- The Table class has now an append() method that allows you
to save large buffers of data in one go (i.e. bypassing the
Row accessor). This can greatly improve data gathering
speed.
- The standard HDF5 shuffle filter (to further enhance the
compression level) is supported.
- The standard HDF5 fletcher32 checksum filter is supported.
- As the supported number of filters is growing (and may be
further increased in the future), a Filters() class has been
introduced to handle filters more easily. In order to add
support for this class, it was necessary to make a change in
the createTable() method that is not backwards compatible:
the "compress" and "complib" parameters are deprecated now
and the "filters" parameter should be used in their
place. You will be able to continue using the old parameters
(only a Deprecation warning will be issued) for the next few
releases, but you should migrate to the new version as soon
as possible. In general, you can easily migrate old code by
substituting code in its place:
      table = fileh.createTable(group, 'table', Test, '',
                                complevel, complib)
  should be replaced by
      table = fileh.createTable(group, 'table', Test, '',
                                Filters(complevel, complib))
- A copy() method that supports slicing and modification of
filtering capabilities has been added for all the Leaf
objects. See the User's Manual for more information.
- A couple of new methods, namely copyFile() and copyChilds(),
have been added to the File class, to permit easy replication
of complete hierarchies or sub-hierarchies, even to
other files. You can change filters during the copy
process as well.
- Two new utilities have been added: ptdump and
ptrepack. The utility ptdump allows the user to examine
the contents of PyTables files (both metadata and actual
data). The powerful ptrepack utility lets you
selectively copy (portions of) hierarchies to specific
locations in other files. It can also be used as an
importer for generic HDF5 files.
- The meaning of the stop parameter in read() methods has
changed. Now a value of 'None' means the last row, and a
value of 0 (zero) means the first row. This is more
consistent with the range() function in Python and the
__getitem__() special method in numarray.
- The method Table.removeRows() is no longer limited by table
  size. You can now delete rows regardless of the size of the
  table.
- The "numarray" value has been added to the flavor parameter
  in the Table.read() method for completeness.
- The attributes (the .attrs instance variable) are Python
  properties now. Access to their values is no longer
  lazy, i.e. you will be able to see both system and user
  attributes from the command line using the tab-completion
  capability of your Python console (if enabled).
- Documentation has been greatly improved to explain all the
  new functionality. In particular, the internal format of
  PyTables is now fully described. You can now build
  "native" PyTables files using any generic HDF5 software
  by just duplicating their format.
- Many new tests have been added, not only to check new
  functionality but also to more stringently check
  existing functionality. There are more than 800 different
  tests now (and the number is increasing :).
- PyTables has set a new record for the amount of data that
  fits in a single file: a dataset of more than 5 TB (yes, more
  than 5000 GB), taking up just 11 GB compressed, has been
  created on an AMD Opteron machine running Linux-64 (the
  64-bit version of the Linux kernel). See the gory details at:
  http://pytables.sf.net/html/HowFast.html
- New platforms supported: PyTables has been compiled and tested
under Linux32 (Intel), Linux64 (AMD Opteron and Alpha), Win32
(Intel), MacOSX (PowerPC), FreeBSD (Intel), Solaris (6, 7, 8
and 9 with UltraSparc), IRIX64 (IRIX 6.5 with R12000) and it
probably works in many more architectures. In particular,
release 0.8 is the first one that provides a relatively clean
porting to 64-bit platforms.
- As always, some bugs have been solved (especially bugs that
  occur when deleting and/or overwriting attributes).
- And last, but definitely not least, a new donations section
  has been added to the PyTables web site
  (http://sourceforge.net/projects/pytables, then follow the
  "Donations" tag). If you like PyTables and want this effort
  to continue, please donate!
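Among the changes above, the new stop convention is easy to sketch in
plain Python; read_slice and its parameter names below are
illustrative only, not the actual PyTables signature:

```python
def read_slice(rows, start=0, stop=None, step=1):
    # stop=None selects through the last row, just as a missing stop
    # bound behaves in range() and numarray's __getitem__().
    if stop is None:
        stop = len(rows)
    return rows[start:stop:step]

rows = list(range(10))
print(read_slice(rows))                           # every row, through the last
print(read_slice(rows, start=2, stop=8, step=2))  # → [2, 4, 6]
```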
What is a table?
----------------
A table is defined as a collection of records whose values are stored
in fixed-length fields. All records have the same structure and all
values in each field have the same data type. The terms
"fixed-length" and "strict data types" may seem to be quite a strange
requirement for a language like Python that supports dynamic data
types, but they serve a useful function if the goal is to save very
large quantities of data (such as is generated by many scientific
applications, for example) in an efficient manner that reduces demand
on CPU time and I/O resources.
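The payoff of fixed-length records can be sketched with Python's
standard struct module; the record layout below is invented for
illustration and is not PyTables' on-disk format. Because every
record occupies the same number of bytes, record i sits at offset
i * record_size and can be read without scanning anything before it:

```python
import struct

# Hypothetical record: 16-byte name, 32-bit integer id, 64-bit float.
record_fmt = "<16sid"
record_size = struct.calcsize(record_fmt)

# Write 100 fixed-length records into one buffer.
buf = b"".join(struct.pack(record_fmt, b"particle-%02d" % i, i, i * 2.0)
               for i in range(100))

# Random access to record 42 by pure offset arithmetic:
name, ident, speed = struct.unpack_from(record_fmt, buf, 42 * record_size)
print(ident, speed)  # → 42 84.0
```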
What is HDF5?
-------------
For those people who know nothing about HDF5, it is a general
purpose library and file format for storing scientific data, made at
NCSA. HDF5 can store two primary objects: datasets and groups. A
dataset is essentially a multidimensional array of data elements, and
a group is a structure for organizing objects in an HDF5 file. Using
these two basic constructs, one can create and store almost any kind of
scientific data structure, such as images, arrays of vectors, and
structured and unstructured grids. You can also mix and match them in
HDF5 files according to your needs.
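A toy model of that two-construct design, using plain Python
containers as a stand-in for the real HDF5 API:

```python
# Groups are named containers, datasets are (here) nested lists;
# together they form a filesystem-like tree.
class Group(dict):
    pass

root = Group()
root["detector"] = Group()
root["detector"]["readings"] = [[1.0, 2.0], [3.0, 4.0]]  # a 2-D "dataset"

def walk(group, path="/"):
    # Yield the path of every node, depth-first.
    for name, node in sorted(group.items()):
        yield path + name
        if isinstance(node, Group):
            for sub in walk(node, path + name + "/"):
                yield sub

print(list(walk(root)))  # → ['/detector', '/detector/readings']
```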
Platforms
---------
I'm using Linux (Intel 32-bit) as the main development platform, but
PyTables should be easy to compile/install on many other UNIX
machines. This package has also passed all the tests on an UltraSparc
platform with Solaris 7 and Solaris 8. It also compiles and passes all
the tests on an SGI Origin2000 with MIPS R12000 processors, with the
MIPSPro compiler and running IRIX 6.5. It also runs fine on Linux 64-bit
platforms, like an AMD Opteron running SuSE Linux Enterprise Server. It
has also been tested on MacOSX platforms (10.2, but should also work on
newer versions).
Regarding Windows platforms, PyTables has been tested with Windows
2000 and Windows XP (using the Microsoft Visual C compiler), but it
should work with other flavors as well.
An example?
-----------
For online code examples, have a look at
http://pytables.sourceforge.net/html/tut/tutorial1-1.html
and, for newly introduced Variable Length Arrays:
http://pytables.sourceforge.net/html/tut/vlarray2.html
Web site
--------
Go to the PyTables web site for more details:
http://pytables.sourceforge.net/
Share your experience
---------------------
Let me know of any bugs, suggestions, gripes, kudos, etc. you may
have.
Have fun!
-- Francesc Alted
fa...@py...
|
|
From: Francesc A. <fa...@op...> - 2003-09-22 17:36:08
|
Announcing PyTables 0.7.2
-------------------------
As promised, here you have the latest and coolest PyTables incarnation!
On this release you will not find any exciting new features. It is
mainly a maintenance release where the following issues have been addressed:
- a memory leak was fixed
- memory consumption is being addressed and lowered
- much faster opening of files
- Some important index-pattern cases in table reads have been
  optimized
More in detail:
What's new
-----------
- Fixed a nasty memory leak located in the C libraries (it was
  happening during HDF5 attribute writes). After that, the
  memory consumption when using large object trees has dropped
  quite a bit. However, there remain some small leaks that
  have been tracked down to the underlying numarray
  library. These leaks have been reported, and hopefully they
  will be fixed sooner rather than later.
- Table buffers are built dynamically now, so if Tables are
  not accessed for reading or writing, this memory will not be
  reserved. This helps to reduce memory consumption.
- The opening of files with lots of nodes has been accelerated
  by a factor of 2 to 3. For example, a file with 10 groups
  and 3000 tables that takes 9.3 seconds to open in 0.7.1 now
  takes only 2.8 seconds.
- The Table.read() method has been refactored and optimized,
  and some parts of its code have been moved to Pyrex. In
  particular, in the special case of step=1, a speedup of up to
  a factor of 5 (reaching 160 MB/s on a Pentium4 @ 2 GHz) when
  reading table contents can be achieved now.
- Some cosmetic changes have been made in the user manual, but, as no
  new features have been added, you won't need to read the
  manual again :-)
Enjoy!,
--
Francesc Alted
|
|
From: Francesc A. <fa...@op...> - 2003-07-31 23:33:40
|
Hello everybody!,
After one week of intensive testing of the new version of PyTables, it is
finally here!:
Announcing PyTables 0.7
-----------------------
PyTables is a hierarchical database package designed to efficiently
manage very large amounts of data. PyTables is built on top of the
HDF5 library and the numarray package and features an object-oriented
interface that, combined with C code generated from Pyrex sources,
makes it a fast, yet extremely easy to use tool for interactively
saving and retrieving large amounts of data.
Release 0.7 is the third public beta release. The version 0.6 was
internal and will never be released.
On this release you will find:
- new AttributeSet class
- 25% I/O speed improvement
- fully multidimensional table cells support
- new column descriptors
- row deletion in tables is finally here
- much more!
More in detail:
What's new
-----------
- A new AttributeSet class has been added. This will allow the
addition and deletion of generic attributes (any scalar type plus
any Python object supported by Pickle) as easy as this:
table.attrs.date = "2003/07/28 10:32" # Attach a string to table
group._v_attrs.tempShift = 1.2 # Attach a float to group
array.attrs.detectorList = [1,2,3,4] # Attach a list to array
del array.attrs.detectorList # Detach detectorList attr from array
- PyTables now has support for fully multidimensional table cells. This
  has been made possible in part by the implementation of multidimensional
  cells in the numarray.records.RecArray object. Thanks to the numarray
  crew, and especially to Jin-chung Hsu, for willingly agreeing to do
  that, and also for including some cache improvements in RecArray.
- New column descriptors added: IntCol, Int8Col, UInt8Col, Int16Col,
  UInt16Col, Int32Col, UInt32Col, Int64Col, UInt64Col, FloatCol,
  Float32Col, Float64Col and StringCol. I think they are more explicit
  and easier to use than the now deprecated (but still supported)
  Col() descriptor. All the examples and the user's manual have been
  updated accordingly.
- The new Table.removeRows(start, stop) function allows you to remove
rows from tables. This feature was requested a long time ago. There
are still limitations, however: you cannot delete rows in extremely
large Tables (as the remaining rows after the stop parameter
are stored in memory). Nor is the performance optimized. These issues
will hopefully be addressed in future releases.
- Added iterators to File, Group and Table (they now support the special
__iter__() method). They make the object much more user-friendly,
especially in interactive mode. See documentation for usage examples.
- Added a __getitem__() method to Table that works more or less like
read(), but with extended slices support.
- As a consequence of rewriting table iterators in C (with the help of
Pyrex, of course) the table read performance has been improved
between 20% and 30%. Data selections in PyTables are now starting to
beat powerful relational databases like SQLite, even compared to
in-core selects (!). I think there is still room for another 20% or
30% speed improvement, so stay tuned.
- A checksum is now added automatically when using LZO (not with UCL
where I'm having some difficulties implementing that
capability). The Adler32 algorithm has been chosen because of its
speed. With that, the compressing/decompressing speed has dropped 1%
or 2%, which is hardly noticeable. I think this addition will allow
the cautious user to be a bit more confident about this excellent
compressor. Code has been added to be able to read files created
without this checksum (so you can be confident that you will be able
to read your existing files compressed with LZO and UCL).
- Recursion has been removed from PyTables. Before, this limited the
  maximum tree depth to less than the Python recursion limit (which
  depends on the implementation, but is around 900, at least on
  Linux). Now, the limit has been set (somewhat arbitrarily) at
  2048. Thanks to John Nielsen for implementing the new iterative
  method!
- A new rootUEP parameter to openFile() has been added. You can now
define the root from which you want to start to build the object tree.
Thanks to John Nielsen for the suggestion and a first implementation.
- A small bug was fixed when dealing with non-native PyTables files that
  prevented the use of the "classname" filter during a listNodes()
  call. Thanks to Jeff Robbins for reporting that.
- Some (non-serious) bugs were discovered and fixed.
- Updated documentation to explain all these new bells and whistles. It
is also available on the web:
http://pytables.sourceforge.net/html-doc/usersguide-html.html
- Added more unit tests (more than 350 now!)
- PyTables 0.7 *needs* numarray 0.6 or higher and HDF-1.6.0 or higher
to compile and work. It has been tested with Python 2.2 and 2.3 and
should work fine on both versions.
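The Table.removeRows() limitation described above follows from the
deletion strategy as I understand it; here is a naive sketch on a
plain Python list (not PyTables' actual code):

```python
def remove_rows(rows, start, stop):
    # Keep the tail (everything from 'stop' onwards) in memory,
    # truncate at 'start', then append the tail back -- which is why
    # extremely large tables were a problem in this release.
    tail = rows[stop:]
    del rows[start:]
    rows.extend(tail)
    return rows

print(remove_rows(list(range(10)), 2, 5))  # → [0, 1, 5, 6, 7, 8, 9]
```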
What is a table?
----------------
A table is defined as a collection of records whose values are stored
in fixed-length fields. All records have the same structure and all
values in each field have the same data type. The terms
"fixed-length" and "strict data types" may seem to be quite a strange
requirement for a language like Python, which supports dynamic data
types, but they serve a useful function if the goal is to save very
large quantities of data (such as is generated by many scientific
applications, for example) in an efficient manner that reduces demand
on CPU time and I/O resources.
What is HDF5?
-------------
For those people who know nothing about HDF5, it is a general
purpose library and file format for storing scientific data made at
NCSA. HDF5 can store two primary objects: datasets and groups. A
dataset is essentially a multidimensional array of data elements, and
a group is a structure for organizing objects in an HDF5 file. Using
these two basic constructs, one can create and store almost any kind of
scientific data structure, such as images, arrays of vectors, and
structured and unstructured grids. You can also mix and match them in
HDF5 files according to your needs.
Platforms
---------
I'm using Linux as the main development platform, but PyTables should
be easy to compile/install on other UNIX machines. This package has
also passed all the tests on an UltraSparc platform with Solaris 7 and
Solaris 8. It also compiles and passes all the tests on an SGI
Origin2000 with MIPS R12000 processors running IRIX 6.5.
Regarding Windows platforms, PyTables has been tested with Windows
2000 and Windows XP, but it should also work with other flavors.
An example?
-----------
For online code examples, have a look at
http://pytables.sourceforge.net/tut/tutorial1-1.html
and
http://pytables.sourceforge.net/tut/tutorial1-2.html
Web site
--------
Go to the PyTables web site for more details:
http://pytables.sourceforge.net/
Share your experience
---------------------
Let me know of any bugs, suggestions, gripes, kudos, etc. you may
have.
Have fun!
-- Francesc Alted
fa...@op...
|
|
From: Francesc A. <fa...@op...> - 2003-05-14 08:15:43
|
PyTables 0.5.1
--------------

This is a maintenance release of PyTables. Due to a problem with an
optimization in PyTables 0.5, it does not work with numarray 0.4
(although it works just fine with numarray 0.5). Because Todd Miller
has already warned me that this optimization is not safe, I am
*disabling* it in this release. The consequence is that the 20%
improvement when reading tables has almost evaporated to a rather
small 4%, but that's life!

If you have already installed PyTables 0.5, I strongly suggest you
upgrade to 0.5.1, even if you are already using numarray 0.5. I will
try to further investigate the problem, and, if a good solution is
found, I will re-activate the optimization in a future release.

Another new thing you can find in 0.5.1 is that the use of the "UInt64"
data type has been removed (it has been replaced by the "Int64" type)
in the tutorial chapters of the User's Manual, because the Windows
version does not support such a type (due to MSVC compiler
limitations). Now, the tutorial section should run fine on all the
supported platforms (even Windows).

My apologies for being a sinner and trying to optimize too soon ;-)

Web site
--------

Go to the PyTables web site for more details:
http://pytables.sourceforge.net/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Have fun!

-- Francesc Alted
|
|
From: Francesc A. <fa...@op...> - 2003-05-10 16:31:20
|
Announcing PyTables 0.5
-----------------------

This is the second public beta release. In this release you will find
a 20% I/O speed improvement over the previous one (0.4), some bugs
have been fixed, support for a couple of compression libraries (LZO
and UCL) has been added, and... a long-awaited Windows version is
finally available!

More in detail:

What's new
----------

- As a consequence of some tweaking, the write/read performance has
  been improved by 20% overall. One particular case where performance
  has largely increased (0.5 is up to 6 times faster than 0.4) is when
  column elements are unidimensional arrays. This impressive speed-up
  is mainly because of the recent improvements in numarray 0.5
  performance (good work, folks!). With that, the reading speed is
  reaching its theoretical maximum (at least when using the current
  data access schema).
- When reading a Table object, if the user wants to fetch column
  elements which are unidimensional arrays, a copy of the array from
  the I/O buffer is delivered automatically, so there is no need to
  call the .copy() method of the numarray arrays anymore. I think this
  is more comfortable for the user.
- Compression was enabled by default in version 0.4, despite what was
  stated in the documentation. Now, this has been corrected and
  compression is *disabled* by default.
- Support for two new compression libraries: LZO and UCL
  (http://www.oberhumer.com/opensource/). These libraries are made by
  Markus F.X.J. Oberhumer, and they are notable for allowing *very*
  fast decompression. Now, if your data is compressible, you can
  obtain better reading speed than when not using compression at all!
  The improvement is even more noticeable if you are dealing with
  extremely large (and compressible) data sets. Read the online
  documentation for more info about that:
  http://pytables.sourceforge.net/html-doc/usersguide-html3.html#subsection3.4.1
- A couple of memory leaks have been isolated and fixed (it was hard,
  but I finally did it!).
- A bug with the column ordering of tables that happens in some
  special situations has been fixed (thanks to Stan Heckman for
  reporting this and suggesting the patch).
- The File class now has an 'isopen' attribute in order to check
  whether a file is open or not.
- Updated documentation, especially for giving advice about the use of
  the new compression libraries. See the "Compression issues"
  subsection (also on the web:
  http://pytables.sourceforge.net/html-doc/usersguide-html.html).
- Added more unit tests (up to 218 now!)
- PyTables has been tested against the newest numarray 0.5 and it
  works just fine. It even works well with Python 2.3b1.
- And last, but not least, a Windows version is available! Thanks to
  Alan McIntyre for the porting! There is even a binary ready to click
  and install.

What it is
----------

In short, PyTables provides a powerful and very Pythonic interface to
process and organize your table and array data on disk. Its goal is to
enable the end user to easily manipulate scientific data tables and
Numerical and numarray Python objects in a persistent hierarchical
structure. The foundation of the underlying hierarchical data
organization is the excellent HDF5 library
(http://hdf.ncsa.uiuc.edu/HDF5).

A table is defined as a collection of records whose values are stored
in fixed-length fields. All records have the same structure and all
values in each field have the same data type. The terms "fixed-length"
and strict "data types" may seem to be quite a strange requirement for
an interpreted language like Python, but they serve a useful function
if the goal is to save very large quantities of data (such as is
generated by many scientific applications, for example) in an
efficient manner that reduces demand on CPU time and I/O resources.

Quite a bit of effort has been invested to make browsing the
hierarchical data structure a pleasant experience. PyTables implements
just two (orthogonal) easy-to-use methods for browsing.

What is HDF5?
-------------

For those people who know nothing about HDF5, it is a general purpose
library and file format for storing scientific data, made at NCSA.
HDF5 can store two primary objects: datasets and groups. A dataset is
essentially a multidimensional array of data elements, and a group is
a structure for organizing objects in an HDF5 file. Using these two
basic constructs, one can create and store almost any kind of
scientific data structure, such as images, arrays of vectors, and
structured and unstructured grids. You can also mix and match them in
HDF5 files according to your needs.

Platforms
---------

I'm using Linux as the main development platform, but PyTables should
be easy to compile/install on other UNIX machines. This package has
also passed all the tests on an UltraSparc platform with Solaris 7 and
Solaris 8. It also compiles and passes all the tests on an SGI
Origin2000 with MIPS R12000 processors running IRIX 6.5. On Windows,
PyTables has been tested with Windows 2000 Professional, Service Pack
1, and Windows XP, but it should also work with other flavors.

An example?
-----------

For online code examples, have a look at
http://pytables.sourceforge.net/tut/tutorial1-1.html
and
http://pytables.sourceforge.net/tut/tutorial1-2.html

Web site
--------

Go to the PyTables web site for more details:
http://pytables.sourceforge.net/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Have fun!

-- Francesc Alted
fa...@op...
|
|
From: Francesc A. <fa...@op...> - 2003-03-19 12:32:48
|
Announcing PyTables 0.4
-----------------------

I'm happy to announce the first beta release of PyTables. It is
labelled beta because it has been thoroughly tested, even in
production environments, and it is getting fairly mature.

Besides, from now on, the API will remain mostly stable, so you can
start using PyTables now with the guarantee that your code will also
work (well, mostly ;-) in the future. The large number of unit tests
included in PyTables will also ensure that backward compatibility as
well as the quality of future releases will remain at least as good as
it is now (although hopefully it should increase!).

What's new
----------

- numarray objects (NumArray, CharArray and RecArray) supported
- As a consequence of a large internal code redesign (numarray is at
  the core of PyTables now), performance has been improved by a factor
  of 10 (see the "How fast is it?" section)
- It consumes far less memory than the previous version
- Support for reading generic HDF5 files added (!)
- Some bugs and memory leaks existing in 0.2 solved
- Updated documentation
- Added more unit tests (more than 200 now!)

What it is
----------

In short, PyTables provides a powerful and very Pythonic interface to
process table and array data. Its goal is to enable the end user to
easily manipulate scientific data tables and Numerical and numarray
Python objects in a persistent hierarchical structure. The foundation
of the underlying hierarchical data organization is the excellent HDF5
library (http://hdf.ncsa.uiuc.edu/HDF5).

A table is defined as a collection of records whose values are stored
in fixed-length fields. All records have the same structure and all
values in each field have the same data type. The terms "fixed-length"
and strict "data types" may seem to be quite a strange requirement for
an interpreted language like Python, but they serve a useful function
if the goal is to save very large quantities of data (such as is
generated by many scientific applications, for example) in an
efficient manner that reduces demand on CPU time and I/O resources.

Quite a bit of effort has been invested to make browsing the
hierarchical data structure a pleasant experience. PyTables implements
just two (orthogonal) easy-to-use methods for browsing.

What is HDF5?
-------------

For those people who know nothing about HDF5, it is a general purpose
library and file format for storing scientific data, made at NCSA.
HDF5 can store two primary objects: datasets and groups. A dataset is
essentially a multidimensional array of data elements, and a group is
a structure for organizing objects in an HDF5 file. Using these two
basic constructs, one can create and store almost any kind of
scientific data structure, such as images, arrays of vectors, and
structured and unstructured grids. You can also mix and match them in
HDF5 files according to your needs.

How fast is it?
---------------

PyTables can write table records between 20 and 30 times faster than
cPickle and between 3 and 10 times faster than struct (a module in the
Standard Library); and it retrieves information around 100 times
faster than cPickle and between 8 and 10 times faster than struct.

When compared with SQLite (http://www.sqlite.org/), one of the fastest
(free) relational databases available, PyTables achieves between 60%
and 80% of the speed of SQLite during selects of dataset sizes that
fit in the OS filesystem memory cache. However, when those sizes do
not fit in the cache (i.e. when dealing with large amounts of data),
PyTables beats SQLite by a factor of 2 or even more (depending on the
kind of record selected), and its performance in this case is only
limited by the I/O speed of the disk subsystem.

Go to http://pytables.sourceforge.net/doc/PyCon.html#section4 for a
detailed description of the conducted benchmarks.

Platforms
---------

I'm using Linux as the main development platform, but PyTables should
be easy to compile/install on other UNIX machines. This package has
also passed all the tests on an UltraSparc platform with Solaris 7 and
Solaris 8. It also compiles and passes all the tests on an SGI
Origin2000 with MIPS R12000 processors running IRIX 6.5. If you are
using Windows and you get the library to work, please let me know.

An example?
-----------

At the bottom of this message there is some code that shows the basic
capabilities of PyTables. You may also look at
http://pytables.sourceforge.net/tut/tutorial1-1.html
and
http://pytables.sourceforge.net/tut/tutorial1-2.html
for online code.

Web site
--------

Go to the PyTables web site for downloading and more details:
http://pytables.sf.net/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Have fun!

-- Francesc Alted
fa...@op...
*-*-*-**-*-*-**-*-*-**-*-*- Small code example *-*-*-**-*-*-**-*-*-**-*-*-*

from tables import *

class Particle(IsDescription):
    identity = Col("CharType", 22, " ", pos = 0)  # character string
    idnumber = Col("Int16", 1, pos = 1)           # short integer
    speed    = Col("Float32", 1, pos = 2)         # single-precision

# Open a file in "w"rite mode
fileh = openFile("objecttree.h5", mode = "w")
# Get the HDF5 root group
root = fileh.root

# Create the groups:
group1 = fileh.createGroup(root, "group1")
group2 = fileh.createGroup(root, "group2")

# Now, create an array in the root group
array1 = fileh.createArray(root, "array1", ["string", "array"],
                           "String array")

# Create 2 new tables, one in each group
table1 = fileh.createTable(group1, "table1", Particle)
table2 = fileh.createTable("/group2", "table2", Particle)

# Create the last array in group1
array2 = fileh.createArray("/group1", "array2", [1,2,3,4])

# Now, fill the tables:
for table in (table1, table2):
    # Get the record object associated with the table:
    row = table.row
    # Fill the table with 10 records
    for i in xrange(10):
        # First, assign the values to the Particle record
        row['identity'] = 'This is particle: %2d' % (i)
        row['idnumber'] = i
        row['speed'] = i * 2.
        # This injects the record values
        row.append()
    # Flush the table buffers
    table.flush()

# Select actual data from the table: entries where idnumber is
# greater than 3 and speed is between 4 and 10
out = [ x['identity'] for x in table.iterrows()
        if x['idnumber'] > 3 and 4 < x['speed'] < 10 ]
print out

# Finally, close the file (this will also flush all the remaining buffers!)
fileh.close()
|
|
From: Francesc A. <fa...@op...> - 2002-11-21 12:31:15
|
Ignore it!

-- Francesc Alted
PGP KeyID: 0x61C8C11F
Scientific applications developer
Public PGP key available: http://www.openlc.org/falted_at_openlc.asc
Key fingerprint = 1518 38FE 3A3D 8BE8 24A0 3E5B 1328 32CC 61C8 C11F