Ticket #478 (closed defect: fixed)

Opened 15 months ago

Last modified 15 months ago

Cluster does not map input solution(s) across shards

Reported by: thompsonbry Owned by: thompsonbry
Priority: major Milestone: Query
Component: Bigdata Federation Version: BIGDATA_RELEASE_1_1_0
Keywords: Cc:

Description (last modified by thompsonbry)

The cluster is not mapping the first access path across the data services when the first operator in the query is a sharded join.

Query evaluation normally begins by injecting an empty solution into ChunkedRunningQuery#startQuery(msg). However, due to an oversight, the initial solution(s) are not mapped across the shards when the first operator is a sharded join. As a result, the query controller uses a global view of the index for the first access path, so the data for that access path flows through the query controller. The query still produces the correct solutions.

Change History

Changed 15 months ago by thompsonbry

  • status changed from new to accepted
  • description modified

QueryEngine#startEval(...) has the following code:

        // notify query start
        runningQuery.startQuery(msg);
        
        // tell query to consume the initial chunk.
        acceptChunk(msg);

In fact, the problem is not with startQuery(msg), as that just sets up the RunState of the query. (It does increment the number of available messages, which might in fact be a problem if we turn one message into several messages mapped across the cluster.)

The problem is that the code directly invokes acceptChunk(msg) rather than mapping the initial chunk across the predicate for the next operator (assuming that the first operator in the query plan is a sharded join).
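The difference between the two behaviours can be illustrated with a minimal sketch. It is not bigdata code: the class, the method names, and the shard labels are all hypothetical, and it assumes the simplest case, where the first join's predicate spans every listed shard, so each shard receives a copy of the initial solution(s).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names are bigdata APIs. */
final class InitialChunkRouting {

    /** Label for the node that starts the query. */
    static final String CONTROLLER = "controller";

    /**
     * Behaviour described in this ticket: the initial chunk is accepted
     * locally, so the first join is evaluated on the controller.
     */
    static Map<String, List<String>> acceptDirectly(List<String> chunk) {
        Map<String, List<String>> routed = new LinkedHashMap<>();
        routed.put(CONTROLLER, new ArrayList<>(chunk));
        return routed;
    }

    /**
     * Intended behaviour, under the assumption that the predicate spans
     * all of the given shards: every shard gets its own copy of the chunk.
     */
    static Map<String, List<String>> mapAcrossShards(List<String> chunk, List<String> shards) {
        Map<String, List<String>> routed = new LinkedHashMap<>();
        for (String shard : shards) {
            routed.put(shard, new ArrayList<>(chunk));
        }
        return routed;
    }

    public static void main(String[] args) {
        List<String> initial = Arrays.asList("{}"); // one empty solution
        List<String> shards = Arrays.asList("ds1", "ds2", "ds3");
        System.out.println(acceptDirectly(initial));
        System.out.println(mapAcrossShards(initial, shards));
    }
}
```

In the first case all access path data has to be read through the controller's global index view. In the second, each data service evaluates the join against its local shard.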

One way to handle this is to insert a CopyOp as the first operator in the query plan on the cluster. This ensures that the initial solution(s) are mapped, because the output of the CopyOp will be mapped. That would also get around a possible fencepost error in RunState#startQuery().
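A rough sketch of that plan rewrite follows. The Op descriptor and both methods are hypothetical and do not reflect the real operator classes. The sketch also assumes a simplified model in which only operator outputs are mapped across shards, so the operator at the front of the plan is the only one that can see unmapped input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; Op and these methods are not bigdata APIs. */
final class PlanRewriteSketch {

    /** Minimal operator descriptor: a name and whether it reads a sharded index. */
    static final class Op {
        final String name;
        final boolean sharded;

        Op(String name, boolean sharded) {
            this.name = name;
            this.sharded = sharded;
        }
    }

    /** On a cluster, put a pass-through (copy) operator in front of the plan. */
    static List<Op> prependPassThrough(List<Op> plan, boolean cluster) {
        if (!cluster) {
            return plan; // standalone: the plan is left unchanged
        }
        List<Op> rewritten = new ArrayList<>();
        rewritten.add(new Op("copy", false));
        rewritten.addAll(plan);
        return rewritten;
    }

    /**
     * Assumed model: only operator outputs are mapped, so a sharded join
     * sees unmapped input exactly when it is the first operator.
     */
    static boolean shardedJoinSeesUnmappedInput(List<Op> plan) {
        return !plan.isEmpty() && plan.get(0).sharded;
    }

    public static void main(String[] args) {
        List<Op> plan = Arrays.asList(new Op("join1", true), new Op("join2", true));
        System.out.println(shardedJoinSeesUnmappedInput(plan));
        System.out.println(shardedJoinSeesUnmappedInput(prependPassThrough(plan, true)));
    }
}
```

Once the pass-through operator is in front, the sharded join is no longer at index 0, so under this model its input is the mapped output of the copy step.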

Changed 15 months ago by thompsonbry

  • status changed from accepted to closed
  • resolution set to fixed

There was actually some disabled code to add a StartOp to the front of a query plan on a cluster. I enabled the code and documented it with reference to this issue.

Committed revision r6002.
