Menu

Select_filter

silex6

The select filter should extract from a database table or view all records that match specified criteria. This filter should have no input connector. Filter will be used by the SELECT and WHERE statements.

The filter should do filtering and projection at the same time and extract specified fields of records that match the specified criteria. Results should be streamed to the output connector.

Algorithms

Filter should have 2 algorithms:

A

A. indexed algorithm: this algorithm could be used only on indexed fields and support '=', 'between', '>' and '

record = index.getfirstkey(criteria)
loop
    if not record.eof then
        if record.field(tosearch)  = criteria then
            output.put_message(record)
            record = index.getnext()
        else
            exit loop
        end if
    else
        exit loop
    end if
end loop
output.put_message(eof)

This algorithm will move to the first index entry matching criteria, and then loop reading next index entries until them no more match criteria. If criteria is '<' then algorithm should first move to the entry matching criteria and then loop reading all previous index entries in descending order, until EOF. If criteria is '>' or '=' then the algorithm should read index entries in ascending order.

When a seek message is received, the filter should move again to the index node specified in message.

B

B. Non-indexed algorithm: this algorithm does not require index. It support '=', '>','<' and '!='. Calculation using to this algorithm will be more expensive than A.

record = table.getfirstrecord()
loop while not record.eof
    if record.field(tosearch) = criteria then
         output.put_message(record)
    end if

    record = table.readnextrecord()
end loop
output.putmessage(eof)

This algorithm (table scan) will sequentially read all records and send to the output connector only those that match criteria.

Cost calculation

A

Algorithm A:

The estimated number of records is:
For '=' operator: cardinality of criteria according to statistical data. (= number of records / number of values)
For '>' and '<' operators: (number of records / number of values) * operation factor

If criteria are character based, then they should first be converted to a hash prior to calculations.

When there is a unique index, and criteria is '=' then the expected number of records is 1.

Operation factor is 1 for '=' operation.
Operation factor for '>' operator is: (max - min) / (max - value)
Operation factor for '<' operator is: (max - min) / (value - min)

Min and max can be extracted from statistical data.

Operation cost = expected number of records * 4 / 3

B

Algorithm B:

Operation cost = number of records in table.

If statistical data are not present (in most case), then the engine should randomly select 50 records, and extract the following data: min value, max value, number of values matching criteria. The expected number of records will then be calculated the same way as algorithm A.

Algorithm A will generate an ordered record set; records will be sorted in ascending or descending order depending on operator. Algorithm B will generate an unordered record set.


Related

Wiki: Filtering_filter
Wiki: Filters
Wiki: Set_engine

MongoDB Logo MongoDB