WhiteBear Wiki

a set of foundation classes for database engines

Status: Beta

Brought to you by: silex6

Runtime_Implemented

Runtime package

The runtime package implements a set of classes that represent the different SQL language constructs, such as statements, functions, expressions and queries:

Result is the base class of all classes used in expressions. There are about 30 sub-classes of Result for SQL functions, operators and literals
Statement is the base class for SQL statements, with prepare() and execute() methods. There are about 10 sub-classes that implement SQL statements
There is also a set of classes that describe SQL queries, covering the FROM, JOIN and SELECT keywords. The runtime package uses this query description to generate the pipelines needed to execute the query

All classes implement a prepare() method that covers name resolution and data type validation. Name resolution means, for example, retrieving the data definition of a table whose name is used in a SQL statement. The classes can also be saved to and loaded from the database file.
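As an illustration of what name resolution in prepare() might look like, here is a minimal sketch. Only the class name Result and the method name prepare() come from this page; SketchCatalog, ColumnRef, the prepare() signature and all fields are hypothetical, and the real classes in org.whitebear.runtime may look quite different. Data type validation is not shown.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the catalog: maps a table name to its column names.
class SketchCatalog {
    private final Map<String, List<String>> tables = new HashMap<>();

    void define(String table, String... columns) {
        tables.put(table, Arrays.asList(columns));
    }

    List<String> columnsOf(String table) {
        List<String> columns = tables.get(table);
        if (columns == null) {
            throw new IllegalStateException("unknown table: " + table);
        }
        return columns;
    }
}

// Class name taken from the text; the signature of prepare() is an assumption.
abstract class Result {
    abstract void prepare(SketchCatalog catalog);
}

// Hypothetical expression node: a reference to table.column.
class ColumnRef extends Result {
    final String table;
    final String column;
    int index = -1; // position of the column in its table, set by prepare()

    ColumnRef(String table, String column) {
        this.table = table;
        this.column = column;
    }

    @Override
    void prepare(SketchCatalog catalog) {
        // Name resolution: look up the table definition, then the column in it.
        index = catalog.columnsOf(table).indexOf(column);
        if (index < 0) {
            throw new IllegalStateException("unknown column: " + table + "." + column);
        }
    }
}

class PrepareDemo {
    public static void main(String[] args) {
        SketchCatalog catalog = new SketchCatalog();
        catalog.define("orders", "id", "total");
        ColumnRef ref = new ColumnRef("orders", "total");
        ref.prepare(catalog);
        System.out.println(ref.index); // prints 1
    }
}
```

In this sketch a failed lookup raises an exception during prepare(), so an invalid name is rejected before anything is executed.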

The SQL parser creates instances of these classes to represent its understanding of the SQL source code.

The runtime package lives in the org.whitebear.runtime package and its sub-packages. It is currently under development.

Pipeline and filters

Query results are produced by a tree of filters connected together: a pipeline. There are filters that implement the different query operations, such as select, join, grouping and aggregate.

Each filter is implemented as a state machine and provides the following features:

calculation algorithms. Filters typically implement more than one calculation algorithm. The filter checks which one can be used depending, for example, on whether there are indexes on the tables
cost calculation. On request, each filter returns the list of possible algorithms and the calculation cost (estimated number of I/O operations) of executing the query with each algorithm
row retrieval. The nextRows() method of a filter is called to retrieve the next records, typically a small subset of the rows to select
purchase order. A consumer filter can inform other filters in the pipeline of which data it wishes to receive. The source filter may then update its internal state to ensure that the following nextRows() calls return the purchased records
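The interplay between nextRows() and a purchase order can be sketched as follows. This is only an illustration: the method name nextRows() comes from this page, while RowFilter, SortedScan, purchase() and the use of plain integer keys are assumptions made for the example. The source keeps a cursor as its internal state, and a purchase order moves that cursor so that the following nextRows() call starts at the requested key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical filter contract; only the name nextRows() comes from the text.
interface RowFilter {
    List<Integer> nextRows();   // next small batch; an empty list means exhausted
    void purchase(int fromKey); // "purchase order": ask for rows with key >= fromKey
}

// Hypothetical source filter over keys sorted in ascending order (assumed unique).
class SortedScan implements RowFilter {
    private final List<Integer> keys;
    private final int batchSize;
    private int position = 0; // internal state of the filter

    SortedScan(List<Integer> sortedKeys, int batchSize) {
        this.keys = sortedKeys;
        this.batchSize = batchSize;
    }

    @Override
    public List<Integer> nextRows() {
        int end = Math.min(position + batchSize, keys.size());
        List<Integer> batch = new ArrayList<>(keys.subList(position, end));
        position = end; // move on, so the next call continues after this batch
        return batch;
    }

    @Override
    public void purchase(int fromKey) {
        // binarySearch returns the index if found, else -(insertionPoint) - 1.
        int i = Collections.binarySearch(keys, fromKey);
        position = (i >= 0) ? i : -i - 1;
    }
}

class PurchaseDemo {
    public static void main(String[] args) {
        RowFilter scan = new SortedScan(Arrays.asList(1, 3, 5, 7, 9), 2);
        System.out.println(scan.nextRows()); // prints [1, 3]
        scan.purchase(6);                    // consumer only wants keys >= 6
        System.out.println(scan.nextRows()); // prints [7, 9]
        System.out.println(scan.nextRows()); // prints []
    }
}
```

A join filter could use such a call to skip parts of its input that cannot match, instead of reading and discarding them.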

The PipelineBuilder class generates the pipelines that can be used to execute a query. The builder returns all possible pipelines, taking into account that join operations are commutative and associative. The builder favors filtering table content as early as possible, to limit the number of I/O operations. For this purpose it prepares the pipeline to apply the predicates of the WHERE clause at the first step, if possible.

Once the pipelines are created, the builder requests a cost calculation for every generated pipeline and then chooses the cheapest one, that is, the one with the lowest execution cost.
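The selection step can be illustrated with a toy example for a single two-table join. Everything here is an assumption made for the sketch: TableStats, Candidate, CostSketch, the statistics and the cost formulas (simple textbook estimates for a nested-loop join and an index-lookup join) are not taken from the WhiteBear code. The sketch enumerates both operand orders, since a join is commutative, lists the algorithms available for each order, and keeps the candidate with the lowest estimated I/O.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical table statistics used by the toy cost formulas below.
class TableStats {
    final String name;
    final long pages;
    final long rows;
    final boolean indexed; // true if the join column has an index

    TableStats(String name, long pages, long rows, boolean indexed) {
        this.name = name;
        this.pages = pages;
        this.rows = rows;
        this.indexed = indexed;
    }
}

// One possible way to run the join, with its estimated number of I/O operations.
class Candidate {
    final String plan;
    final long estimatedIo;

    Candidate(String plan, long estimatedIo) {
        this.plan = plan;
        this.estimatedIo = estimatedIo;
    }
}

class CostSketch {
    static final long INDEX_LOOKUP_IO = 3; // assumed cost of one index probe

    // Algorithms usable for "outer JOIN inner", each with a toy cost estimate.
    static List<Candidate> algorithms(TableStats outer, TableStats inner) {
        List<Candidate> out = new ArrayList<>();
        out.add(new Candidate(outer.name + " nested-loop " + inner.name,
                outer.pages + outer.pages * inner.pages));
        if (inner.indexed) { // only possible when the inner table has an index
            out.add(new Candidate(outer.name + " index-lookup " + inner.name,
                    outer.pages + outer.rows * INDEX_LOOKUP_IO));
        }
        return out;
    }

    // Joins are commutative, so both operand orders are enumerated.
    static List<Candidate> allCandidates(TableStats a, TableStats b) {
        List<Candidate> all = new ArrayList<>(algorithms(a, b));
        all.addAll(algorithms(b, a));
        return all;
    }

    static Candidate cheapest(List<Candidate> candidates) {
        Candidate best = candidates.get(0);
        for (Candidate c : candidates) {
            if (c.estimatedIo < best.estimatedIo) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TableStats small = new TableStats("small", 10, 100, false);
        TableStats big = new TableStats("big", 1000, 10000, true);
        Candidate best = cheapest(allCandidates(small, big));
        System.out.println(best.plan + " " + best.estimatedIo); // prints small index-lookup big 310
    }
}
```

With these made-up statistics, three candidates are produced (10010, 310 and 11000 estimated I/O operations), and the index lookup driven by the small table wins.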

The compiled code of a view is actually the output of the PipelineBuilder, serialized as XML and stored as a BLOB in the catalog. When the client runs a CREATE VIEW statement, the SQL parser generates objects that represent the query expression, the runtime package then generates the pipelines needed to run the query expression, and finally the catalog features are used to store the pipelines as the view's object code.

Current implementation

The current pipeline implementation is 100% Java code and doesn't use any threading or messaging API, unlike what the specification describes. The specification describes a push-mode pipeline, where source filters push messages to the destination. The actual implementation is a pull-mode pipeline, where a destination filter calls the nextRows() method to request messages from the source. The specification describes filters as micro-threads running a polling loop. The actual implementation of a filter is a state machine: on nextRows() the filter returns the current record and moves to the next one. The runtime specification also mentions a set of types. These types are actually implemented by the data type framework.
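The pull mode described above can be sketched with two chained filters. Only the method name nextRows() comes from this page; PullFilter, ListSource, WhereFilter and the integer rows are hypothetical and chosen for brevity. No thread or message queue is involved: the consumer drives the pipeline by calling nextRows() on the top filter, which in turn calls nextRows() on its source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical pull-mode contract; only the name nextRows() comes from the text.
interface PullFilter {
    List<Integer> nextRows(); // an empty list means the source is exhausted
}

// Source filter as a state machine: return the current record, move to the next.
class ListSource implements PullFilter {
    private final List<Integer> rows;
    private int cursor = 0;

    ListSource(List<Integer> rows) {
        this.rows = rows;
    }

    @Override
    public List<Integer> nextRows() {
        List<Integer> out = new ArrayList<>();
        if (cursor < rows.size()) {
            out.add(rows.get(cursor++));
        }
        return out;
    }
}

// Destination filter: pulls from its source until some row passes the predicate.
class WhereFilter implements PullFilter {
    private final PullFilter source;
    private final IntPredicate predicate;

    WhereFilter(PullFilter source, IntPredicate predicate) {
        this.source = source;
        this.predicate = predicate;
    }

    @Override
    public List<Integer> nextRows() {
        while (true) {
            List<Integer> batch = source.nextRows();
            if (batch.isEmpty()) {
                return batch; // source exhausted, so this filter is exhausted too
            }
            List<Integer> kept = new ArrayList<>();
            for (int value : batch) {
                if (predicate.test(value)) {
                    kept.add(value);
                }
            }
            if (!kept.isEmpty()) {
                return kept;
            }
        }
    }
}

class PullPipelineDemo {
    // The consumer drives the pipeline by calling nextRows() until it is empty.
    static List<Integer> drain(PullFilter top) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> batch = top.nextRows(); !batch.isEmpty(); batch = top.nextRows()) {
            all.addAll(batch);
        }
        return all;
    }

    public static void main(String[] args) {
        PullFilter pipeline = new WhereFilter(
                new ListSource(Arrays.asList(1, 2, 3, 4, 5, 6)), v -> v % 2 == 0);
        System.out.println(drain(pipeline)); // prints [2, 4, 6]
    }
}
```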