From: Andreu A. A. <aa...@ar...> - 2005-05-23 13:37:18
Announcing CSTables 1.0b
-------------------------

This is a beta release.

What it is
----------

CSTables is a Python client-server database layer library that provides
concurrency, a client metadata cache, a lock system, authentication and
other features. CSTables is built on top of the HDF5 and PyTables
libraries, which provide the data API and on-disk data management.
Combining these libraries with the Twisted Framework (in charge of
communication) and NumArray (which provides the containers for
presenting and transporting data) results in a complete client-server
database with the features listed below.

Features
--------

Client-Server

CSTables allows remote HDF5 files to be accessed from clients using the
PyTables API. Some classes and methods are added to deal with
client-server execution and other features. The Twisted Framework is
used to implement the client-server communication.

Client metadata cache

CSTables internally uses a client-side cache to store metadata about the
remote file objects. What is metadata?

* The attributes of the remote file objects (File, Group, Leaf,
  AttributeSet, etc.).
* The remote file tree hierarchy. Relations between ancestors and
  descendants are cached the first time they are used, allowing smooth
  navigation in subsequent accesses.

Client metadata cache refresh

Attributes cached on the clients are refreshed when they change on the
server. A timestamp mechanism discards out-of-order modifications: it
guarantees that an older modification that reaches the client after a
newer one is discarded. The cache refresh is very useful for
inter-client communication, because clients do not need to poll the
database for messages from other clients.

Concurrency

Multiple clients can run against the same CSTables server. No
transactional support is provided. However, the Lock System can be used
to implement applications with transactional features.
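
The timestamp rule described under "Client metadata cache refresh" above
can be illustrated with a minimal sketch. This is plain Python, not
CSTables code: the MetadataCache class, its methods and the integer
timestamps are hypothetical names chosen for the illustration, and the
real library may implement the rule differently.

```python
class MetadataCache:
    """Toy client-side cache: keeps the newest value seen per attribute.

    Hypothetical illustration only; this is not the CSTables API.
    """

    def __init__(self):
        # Maps attribute name -> (timestamp, value).
        self._entries = {}

    def apply_update(self, name, value, timestamp):
        """Apply a server notification; return True if it was accepted."""
        current = self._entries.get(name)
        if current is not None and timestamp <= current[0]:
            # A newer (or equally recent) modification is already cached,
            # so this late arrival is discarded.
            return False
        self._entries[name] = (timestamp, value)
        return True

    def get(self, name):
        """Return the cached value for name, or None if nothing is cached."""
        entry = self._entries.get(name)
        return None if entry is None else entry[1]


cache = MetadataCache()
cache.apply_update("title", "run 2", timestamp=20)  # newest change arrives first
cache.apply_update("title", "run 1", timestamp=10)  # older change arrives late
print(cache.get("title"))  # prints: run 2
```

In this sketch the cache keeps the timestamp next to each value, so a
late notification can be rejected by a single comparison.
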
All remote operations are guaranteed to be atomic, including those, such
as iterrows(), that can require several exchanges with the server.

Authentication

CSTables supports client authentication. The server can be configured to
accept anonymous logins, or to allow only connections that are
authenticated against its user database. Authentication is performed
through a challenge/authentication protocol with double-hashed MD5
passwords.

Remote file system management

Methods for creating, opening, removing and listing files on the server
are supported.

Lock System

In multi-client applications where clients access the same database, a
mechanism is needed to let a client execute a portion of code while
preventing other clients from changing the objects involved until its
modifications are complete. CSTables provides a lock mechanism that lets
applications explicitly lock a single node or a full subtree. When a
node is locked and another client attempts to modify it in an
incompatible way, that client either blocks until the client holding the
lock releases it, or raises a BlockedException, depending on a switch
variable that can be set programmatically.

Full compatibility with PyTables API

There are several ways to program with CSTables. One of them allows
monolithic PyTables scripts to be wrapped without modifying a single
line and executed against a remote server where the files the script
expects to find reside.

What is HDF5?
-------------

For those who know nothing about HDF5, it is a general-purpose library
and file format for storing scientific data, developed at NCSA. HDF5 can
store two primary objects: datasets and groups. A dataset is essentially
a multidimensional array of data elements, and a group is a structure
for organizing objects in an HDF5 file.
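
As a rough picture of these two object kinds, here is a toy in-memory
model. It is plain Python and uses neither HDF5 nor PyTables: the
Dataset and Group classes and the lookup() helper are hypothetical names
invented for this illustration.

```python
class Dataset:
    """Toy dataset: a rectangular multidimensional array as nested lists."""

    def __init__(self, data):
        self.data = data

    @property
    def shape(self):
        # Follow the first element of each nesting level to get the dimensions.
        dims, level = [], self.data
        while isinstance(level, list):
            dims.append(len(level))
            level = level[0] if level else None
        return tuple(dims)


class Group:
    """Toy group: a named container for datasets and other groups."""

    def __init__(self):
        self.children = {}

    def add(self, name, node):
        self.children[name] = node
        return node

    def lookup(self, path):
        """Resolve a slash-separated path such as '/images/frame0'."""
        node = self
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node


root = Group()
images = root.add("images", Group())
images.add("frame0", Dataset([[0, 1, 2], [3, 4, 5]]))  # a 2 x 3 "image"
print(root.lookup("/images/frame0").shape)  # prints: (2, 3)
```
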
Using these two basic constructs, one can create and store almost any
kind of scientific data structure, such as images, arrays of vectors,
and structured and unstructured grids.

What is PyTables?
-----------------

PyTables is a hierarchical database package designed to efficiently
manage very large amounts of data. PyTables is built on top of the HDF5
library and the numarray package. It features an object-oriented
interface that, combined with C extensions for the performance-critical
parts of the code (generated using Pyrex), makes it a fast yet extremely
easy-to-use tool for interactively saving and retrieving very large
amounts of data.

One important feature of PyTables is that it optimizes memory and disk
resources so that data take much less space (by a factor of 3 to 5, and
more if the data is compressible) than with other solutions such as
relational or object-oriented databases.

Platforms
---------

Currently tested and cross-tested on Windows XP and Red Hat Linux 9.0.

Web site
--------

Go to the CSTables page for more details:

http://www.cstables.org

--
Andreu Alted