The select filter should extract from a database table or view all records that match specified criteria. This filter should have no input connector. Filter will be used by the SELECT and WHERE statements.
The filter should do filtering and projection at the same time and extract specified fields of records that match the specified criteria. Results should be streamed to the output connector.
Filter should have 2 algorithms:
A. indexed algorithm: this algorithm could be used only on indexed fields and support '=', 'between', '>' and '
record = index.getfirstkey(criteria)
loop
if not record.eof then
if record.field(tosearch) = criteria then
output.put_message(record)
record = index.getnext()
else
exit loop
end if
else
exit loop
end if
end loop
output.put_message(eof)
This algorithm will move to the first index entry matching criteria, and then loop reading next index entries until them no more match criteria. If criteria is '<' then algorithm should first move to the entry matching criteria and then loop reading all previous index entries in descending order, until EOF. If criteria is '>' or '=' then the algorithm should read index entries in ascending order.
When a seek message is received, the filter should move again to the index node specified in message.
B. Non-indexed algorithm: this algorithm does not require index. It support '=', '>','<' and '!='. Calculation using to this algorithm will be more expensive than A.
record = table.getfirstrecord()
loop while not record.eof
if record.field(tosearch) = criteria then
output.put_message(record)
end if
record = table.readnextrecord()
end loop
output.putmessage(eof)
This algorithm (table scan) will sequentially read all records and send to the output connector only those that match criteria.
Algorithm A:
The estimated number of records is:
For '=' operator: cardinality of criteria according to statistical data. (= number of records / number of values)
For '>' and '<' operators: (number of records / number of values) * operation factor
If criteria are character based, then they should first be converted to a hash prior to calculations.
When there is a unique index, and criteria is '=' then the expected number of records is 1.
Operation factor is 1 for '=' operation.
Operation factor for '>' operator is: (max - min) / (max - value)
Operation factor for '<' operator is: (max - min) / (value - min)
Min and max can be extracted from statistical data.
Operation cost = expected number of records * 4 / 3
Algorithm B:
Operation cost = number of records in table.
If statistical data are not present (in most case), then the engine should randomly select 50 records, and extract the following data: min value, max value, number of values matching criteria. The expected number of records will then be calculated the same way as algorithm A.
Algorithm A will generate an ordered record set; records will be sorted in ascending or descending order depending on operator. Algorithm B will generate an unordered record set.