Menu

Union_filter

silex6

This filter should read records from 2 input connectors, and generate a result according to the relational algebra Union operator. This filter should generate a temporary index if needed. It will be used for the UNION and OR statement.

Algorithms

This filter should have 2 algorithms:

A

Algorithm A: "sorted merge union" is

recordA = indexA.getfirstkey()
recordB = indexB.getfirstkey()

loop while not (recordA.eof and recordB.eof)

    if recordA > recordB then
        output.putmessage(recordB)
        recordB = indexB.getnextkey()
    elseif recordA

this algorithm require both input record sets to be sorted. Result record set will be sorted. this algorithm allows streaming.

B

algorithm B: "differential scan" is

tmpIdx = new index()

recordA = inputA.getfirstrecord()

loop
    if recordA.eof then exit loop

    output.putmessage(recordA)
    tmpIdx.add(recordA)
    recordA = inputA.getnextrecord()

end loop

recordB = inputB.getfirstrecord()
loop
    if recordB.eof then exit loop

    if not tmpIdx.exists(recordB) then
        output.putmessage(recordB)
    end if

    recordB = inputB.getnextrecord()

end loop

This algorithm should be selected only if input record sets are not sorted. The temporary index should be created for the smallest record set, in order to minimize I/O operations.

Cost calculation

During cost calculation, the filter should wait on cost calculation results from both input streams, and add them and add unit cost of the filter to the total cost.

Unit cost using algorithm A is 0, as there is no I/O operations. Unit cost using algorithm B is: number of records in A * 4 / 3 + number of records in B * 4 / 3.

Expected number of records should be calculated as: number of records in A + number of records in B

See also

[Set_engine]


Related

Wiki: Filters
Wiki: Set_engine

MongoDB Logo MongoDB