This is preliminary documentation and is subject to change.
Each database file contains a set of 8 KB pages stored in arbitrary order and linked together in a tree structure to form a set of block-oriented data stores. Every new page added to the file is associated with both a version and a transaction. Several versions of the same page can be stored in the database file. A background garbage-collector thread updates permanent data and removes outdated versions of pages.
The tree structure is intended to reduce the number of I/O operations needed to locate pages. Errors in the tree structure are automatically detected and repaired by the storage engine. Free-space bitmaps are placed at every 16 MB boundary and allow unused pages in the database file to be reused.
Each block-oriented data store can hold a different kind of data. The engine can store table rows, objects, tree indices, bitmap indices, statistical data, views, and more. Data is stored as a set of tuples, where every tuple is an arbitrary set of data fields. The engine allows each tuple in a collection to have its own set of fields, which allows non-relational data and objects to be stored in the database file.
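The idea of a collection whose tuples each carry their own field set can be sketched as follows. This is a hypothetical illustration using plain field-name-to-value maps, not the Whitebear API; the `TupleDemo` class and `tuple` helper are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a tuple as an ordered field-name -> value map,
// so two tuples in the same collection may have different field sets.
public class TupleDemo {
    static Map<String, Object> tuple(Object... kv) {
        Map<String, Object> t = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            t.put((String) kv[i], kv[i + 1]);
        }
        return t;
    }

    public static void main(String[] args) {
        // Same collection, different field sets per tuple.
        Map<String, Object> person  = tuple("name", "Ada", "age", 36);
        Map<String, Object> company = tuple("name", "Acme", "employees", 120);
        System.out.println(person.keySet());   // prints [name, age]
        System.out.println(company.keySet());  // prints [name, employees]
    }
}
```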
Dense indices, bitmap indices, and statistical data related to the stored values can be kept in the database file as B+ tree structures stored in data collections.
ISAM (Indexed Sequential Access Method) is a principle, a library, and a file format first implemented by IBM in 1973. The Whitebear storage engine provides the functionality of IBM's ISAM library, as well as ACID transaction management, on a completely different file format based on B-tree structures.
In order to reduce physical I/O operations, every page read from the database file is retained in memory until one of the following events occurs:
This causes the disk-caching component to allocate a large amount of memory under heavy system use. There is no built-in limit on the amount of memory allocated for caching; the maximum amount of memory that can be used is governed by the Java virtual machine's -Xmx parameter. Disk-cache parameters can also be tuned to control how long a page is retained in memory.
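For illustration, a page cache that retains recently used pages and evicts the least recently used one can be built on `LinkedHashMap`'s access-order mode. This is a generic sketch, not Whitebear's cache: the engine itself imposes no page limit, whereas the `maxPages` bound here is an assumption made purely for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a bounded, access-ordered (LRU) page cache keyed by
// page number. Whitebear has no built-in size limit; its heap use is
// bounded by the JVM's -Xmx setting instead.
public class PageCache extends LinkedHashMap<Long, byte[]> {
    private final int maxPages;

    public PageCache(int maxPages) {
        super(16, 0.75f, true); // access-order: a read refreshes the page's position
        this.maxPages = maxPages;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxPages; // evict the least-recently-used page
    }
}
```

A page stays cached as long as it keeps being read; once the cache is full, the page that has gone unread the longest is dropped.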
Data is written to the physical file asynchronously. The disk cache and the physical file are synchronized on transaction commit.
The database engine allows several versions of the same data to be stored in the database file. Every new transaction causes a new version number to be generated.
During a transaction, when a data page is updated, a temporary shadow copy of the page is created and stored in the transaction state.
A vacuum-cleaner process checks closed transactions, looks for temporary copies, and makes the changes permanent by copying the content of each shadow copy back to the original page. The vacuum cleaner also checks rolled-back transactions, removes unused shadow copies, and reclaims free space.
A version conflict may occur when trying to commit a transaction that contains outdated shadow copies, i.e. when a concurrent transaction has already committed a more recent version of the same pages.
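The shadow-copy and conflict-detection scheme described above can be sketched as follows. This is a simplified model, not the Whitebear API: the `VersionedStore` and `Transaction` classes, and the per-page version counters, are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of optimistic page versioning: a transaction records
// the version of each page when it makes a shadow copy; commit fails if any
// of those pages has since been committed at a newer version.
public class VersionedStore {
    private final Map<Long, Long> committedVersion = new HashMap<>();

    public static class Transaction {
        final Map<Long, Long> versionAtCopy = new HashMap<>();   // pageNo -> version seen
        final Map<Long, byte[]> shadowCopies = new HashMap<>();  // pageNo -> new content
    }

    /** Updating a page stores a shadow copy in the transaction state. */
    public void update(Transaction tx, long pageNo, byte[] newContent) {
        tx.versionAtCopy.put(pageNo, committedVersion.getOrDefault(pageNo, 0L));
        tx.shadowCopies.put(pageNo, newContent);
    }

    /** Returns false on a version conflict (the shadow copy is outdated). */
    public boolean commit(Transaction tx) {
        for (Map.Entry<Long, Long> e : tx.versionAtCopy.entrySet()) {
            long current = committedVersion.getOrDefault(e.getKey(), 0L);
            if (current != e.getValue()) {
                return false; // a concurrent transaction committed first
            }
        }
        // Make the shadow copies permanent by bumping each page's version.
        for (Long pageNo : tx.shadowCopies.keySet()) {
            committedVersion.merge(pageNo, 1L, Long::sum);
        }
        return true;
    }
}
```

Two transactions that shadow-copy the same page both start from the same version; whichever commits first wins, and the other fails with a version conflict.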
The SERIALIZABLE transaction mode forces transactions to run in sequence when needed to avoid version conflicts. In this mode, time-outs prevent a transaction from being frozen by the serialization process.
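The general idea of serializing work through a lock while using a time-out so that no waiter is frozen forever can be sketched with the standard `ReentrantLock.tryLock` API. This is an illustration of the principle, not Whitebear's implementation; the `SerializedRunner` class is invented for the example.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: transactions run one at a time behind a fair lock,
// and a transaction that cannot acquire the lock within the time-out
// gives up instead of waiting indefinitely.
public class SerializedRunner {
    private final ReentrantLock serializer = new ReentrantLock(true); // fair: FIFO order

    /** Runs the work alone, or returns false if the time-out expires first. */
    public boolean runSerialized(Runnable work, long timeoutMillis)
            throws InterruptedException {
        if (!serializer.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // timed out waiting for the serialization lock
        }
        try {
            work.run();
            return true;
        } finally {
            serializer.unlock();
        }
    }
}
```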
Temporary data consists of shadow copies stored in a transaction state that will never be committed.
The API reference can be found in the javadoc-style documents included in the /javadoc folder of the package.
Online API reference is available at <http://whitebear.sourceforge.net/javadoc>.