WhiteBear Wiki

a set of foundation classes for database engines

Status: Beta

Brought to you by: silex6

Set_engine

The Pipeline engine is the one of the runtime components of the database engine. Goal of the pipeline engine is to execute DML operations expressed as SQL queries. The engine should automatically detect the fastest algorithms to execute a given query, and execute the query accordingly. Selected algorithm depends on physical structures added by the user and overall database content; scenario detection is a fully automated process. The pipeline engine implement a tree of state machines that exchange messages. Any record that has been calculated should be sent to the user as soon as possible even if query is still running.

Background
- Pipeline builder and filters
Relational algebra and SQL
Topics
Filters reference
- The filters
See also

Background

Pipeline engine component goal is to prepare and execute DML operations expressed as SQL queries. This component should use physical engine functions for low-level access to data, including reading and writing records, index seeks, and transaction management. The queries are first translated from SQL code to an internal representation made of a set of objects from the runtime framework.

When the database engine is used as an embedded component every thread of the host application can run its own copy of the pipeline engine. When running as a server, the server runtime should create a worker thread attached to every client connection and every worker thread runs its own copy of the pipeline engine.

The pipeline engine should have two sub-components: the filters set and the pipeline builder.

Set calculations should be done by a set of filters linked together into a pipe-and-filter architecture. Every filter is a specific set operator (filter, project, union, Cartesian product).

The filters can be implemented as state machines that do an operation, and store their current state on a private structure in order to continue operation on next call. Algorithms in this specification are based on continuation programming style, and more complicated algorithms have to be applied if filters are implemented as state machines.

If possible (in most case), calculation should be done as a streaming process: every single record that has been calculated should be sent to the client while filters continue to process subsequent records.

Pipeline builder and filters

The pipeline builder is the component that should create filters instances and link them together. The pipeline builder should try several ways to connect components together, and estimate cost of the calculation in terms of I/O calls. Cost calculation should be done using the filters. The builder should then choose the most cost-effective way, run set calculation that matches current SQL statement and collect results.

The pipeline builder should link filters together on a tree-like structure. Producer filters will be placed on top, and consumer filter will be placed on bottom, while pass filters will be placed in the middle. Join and Union filters are binary operators; they have two input connectors and one output. Filters and filtering predicate are compound objects constructed by the SQL runtime.

A general sequence of the filters in tree like - WHERE - JOIN - UNION - GROUP BY - HAVING - INSERT INTO – could be used by the pipeline as default calculation strategy.

Filters should use internal ids of the tables and columns. Matching between objects name of table / view / field and their internal ids should be done using the catalog component, during the preparation phase of runtime classes.

The pipeline should use expression related classes of the runtime framework in order to calculate result of expressions used in queries. The expression compound objects should first be built by the SQL parser according to SQL source text.

Relational algebra and SQL

The relational algebra theory defines 7 set operators. A set is a collection of elements, where every element have same list of attributes. Sets are called relations and are very similar to a record set that can be extracted from database table or view.

The relational algebra operators are:

Selection (also called filter). This operator creates a record set where every record matches the specified criteria.
Projection: this operator creates a record set where every record has the specified list of attributes.

The relational runtime should contain a filter component that calculates selection and projection at the same time. These operations are requested in SQL using the SELECT statement.

union: operator create a record set C from two record sets A and B, where C will contains records found in A or in B.
intersect: operator create a set C from two record sets A and B, where C will contains records found in both A and B.
difference: operator create a set C from two record sets A and B, where C will contains records found in A but not in B.
product: Cartesian product, create a set C from two records sets A and B, where records of C contains all possible ways to combine attributes of records from A and B.
divide: Cartesian divide, create a set C from two records sets A and B, where records of C are those that combined with B will result on A. this is the opposite of the Cartesian product.

The set engine should contain filter components that calculate union and difference. These operations are requested in SQL using the UNION and EXCEPT statements.

The intersect and product operations are used during SQL joins and can be requested in SQL using the JOIN and WHERE statements. The relational runtime should contain a component that calculates joins.

Cartesian divide does not need to be implemented as this operation can not be requested in SQL (no corresponding statement).

Topics

Filters reference

The pages below are a detailed reference of the filter algorithms that should be implemented, including cost calculation rules and general calculation algorithms. Most filters have several possible algorithms, and cost calculation is different for each algorithm.

The general algorithms explained are a shrink-wrapped form of the algorithms that should be implemented. For purpose of readability of the algorithms the following implementation details have been omitted:

Error handling and rules related to request messages.
There could be several predicate operators, as '=', '>', '=>' ... only one predicate is taken in account in the sample algorithm.
On filters that have 2 inputs, the A and B input stream can be swapped, and swapping could impact calculation cost.