
OGSA-DAI Distributed Query Processing

OGSA-DAI components are either data access components or data integration components. A Distributed Query Processing (DQP) system is an example of a data integration component; it can potentially provide effective declarative support for service orchestration as well as for data integration. OGSA-DAI's service-based DQP framework, described in (1) and (2), provides an approach that:

  • supports queries over OGSA-DAI data resources and over other services available on the Grid, thereby combining data access with analysis;
  • adapts techniques from parallel databases to provide implicit parallelism for complex data-intensive requests.
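Partitioned parallelism is one of the classic parallel-database techniques of this kind. The following is a minimal, self-contained illustration of the idea, not OGSA-DAI code: both join inputs are hash-partitioned on the join key, and each pair of buckets is joined independently, with worker threads standing in for separate evaluators. The function names and sample data are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def hash_partition(rows, key, n_parts):
    """Split rows into n_parts buckets by hashing the join key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts

def local_hash_join(left, right, key):
    """Equi-join two row lists on key using an in-memory hash table."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]

def partitioned_join(left, right, key, n_evaluators):
    """Join bucket i of both inputs on 'evaluator' i (a worker thread here).
    Rows with equal keys always land in the same bucket, so the union of
    the per-bucket results equals the full join."""
    lparts = hash_partition(left, key, n_evaluators)
    rparts = hash_partition(right, key, n_evaluators)
    with ThreadPoolExecutor(max_workers=n_evaluators) as pool:
        per_bucket = pool.map(local_hash_join, lparts, rparts, [key] * n_evaluators)
        return [row for bucket in per_bucket for row in bucket]

proteins = [{"id": 1, "name": "p1"}, {"id": 2, "name": "p2"}, {"id": 3, "name": "p3"}]
terms = [{"id": 1, "term": "GO:1"}, {"id": 3, "term": "GO:3"}, {"id": 3, "term": "GO:4"}]
joined = partitioned_join(proteins, terms, "id", n_evaluators=2)
print(sorted(r["term"] for r in joined))  # ['GO:1', 'GO:3', 'GO:4']
```

The point of the design is that no bucket needs to see any other bucket's rows, which is what lets the work be spread over independent nodes without the client having to ask for parallelism explicitly.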

OGSA-DAI's service-based DQP framework consists of the following:

The OGSA-DAI DQP Coordinator
The OGSA-DAI DQP Coordinator is the main interaction point for clients. When a coordinator is set up, it obtains the metadata and computational resource information it needs to compile, optimise, partition and schedule distributed query execution plans over multiple execution nodes in the Grid. The coordinator is currently implemented as a set of OGSA-DAI data resources and activities.
Query Evaluation Service (Evaluator)
The Query Evaluation Service (QES), or evaluator, is used by the coordinator to execute the query plans generated by the query compiler, optimiser and scheduler. Each evaluator evaluates the partition of the query execution plan that a coordinator assigns to it. The evaluators participating in a query form a tree: data flows from the leaf evaluators, which interact with Grid data services, up through the tree to its destination.
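The tree-shaped data flow described above can be modelled in a few lines. This is a toy, in-process sketch under stated assumptions: the `Evaluator` class, the lambda "data resources" and the `union_then_filter` operator are all hypothetical, and real evaluators are remote services exchanging tuples over the network rather than Python generators.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional

Row = dict
Stream = Iterator[Row]

@dataclass
class Evaluator:
    """Hypothetical stand-in for one evaluator in a query's evaluator tree."""
    name: str
    source: Optional[Callable[[], Stream]] = None  # leaf: reads a data resource
    combine: Optional[Callable[[List[Stream]], Stream]] = None  # internal node
    children: List["Evaluator"] = field(default_factory=list)

    def evaluate(self) -> Stream:
        if not self.children:
            # Leaf evaluator: obtain tuples directly from its data resource.
            return self.source()
        # Internal evaluator: tuples flow up from the children into this node.
        return self.combine([child.evaluate() for child in self.children])

def union_then_filter(streams: List[Stream]) -> Stream:
    """Merge the child streams and keep rows with score > 0.5."""
    for stream in streams:
        for row in stream:
            if row["score"] > 0.5:
                yield row

db_a = [{"gene": "g1", "score": 0.9}, {"gene": "g2", "score": 0.2}]
db_b = [{"gene": "g3", "score": 0.7}]

root = Evaluator("root", combine=union_then_filter, children=[
    Evaluator("leaf-a", source=lambda: iter(db_a)),
    Evaluator("leaf-b", source=lambda: iter(db_b)),
])
print([row["gene"] for row in root.evaluate()])  # ['g1', 'g3']
```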

As well as using OGSA-DAI data resources, the coordinator is itself implemented as an OGSA-DAI data resource, so it can be invoked in the same way as any other OGSA-DAI data resource. Consequently, OGSA-DAI's DQP functionality offers the Grid facilities for declarative request formulation that complement existing approaches to service orchestration, accessed through the same uniform interfaces and interaction semantics.

Figure 1 provides an overview of the interactions that take place during the instantiation and set-up of an OGSA-DQP coordinator, as well as those that take place when a query is received and processed by a set of evaluators. The components in this figure and the numbered interactions between them are described below. As usual, the three-dot sequence in the figure should be read as "and so on, up to". This description of OGSA-DQP is intended to give a high-level overview of the system.

Figure 1: Setting up and executing queries using OGSA-DAI DQP

  1. An OGSA-DAI DQP Coordinator consists of two types of OGSA-DAI data resources: DQP Factory Data Resources and DQP Data Resources. Initially, an installed coordinator service will expose only a factory data resource. This data resource is then used to create DQP Data Resources which can be used by a client to execute queries.

In this first step of the interaction between a client and OGSA-DAI DQP, the client uses a deployed DQP Factory Data Resource to create a configured DQP Data Resource. The client sends the DQP Factory Data Resource an OGSA-DAI request that invokes a DQPFactory activity, and this activity interacts with the DQP Factory Data Resource to deploy a DQP Data Resource dynamically. The DQPFactory activity is parameterised by an XML document that specifies exactly how the new DQP Data Resource should be configured; configuration parameters include the databases and evaluators that the new data resource may use. The result of this interaction is a newly created and initialised DQP Data Resource, which the DQP Coordinator then exposes and to which OGSA-DAI automatically assigns a resource ID.

  2. During the initialisation of the DQP Data Resource, the schemas of the databases it will use are imported by contacting the OGSA-DAI data resources which wrap these databases.
  3. The client receives the result of the request submitted in step 1. This result contains the resource ID needed by the client to identify the created DQP Data Resource in subsequent interactions with this data resource.

NB: Steps 1-3 need not take place if a DQP Data Resource that imports the databases and analysis services a client requires already exists; in that case, the client should contact the existing DQP Data Resource directly. Each DQP Data Resource can process multiple concurrent queries, and a client does not terminate it at the end of a query session. Steps 1-3 are therefore a setup process that configures a DQP Data Resource for use by one or more clients.
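Steps 1-3 can be mimicked in plain Python to make the sequence concrete. This is a hedged sketch, not the OGSA-DAI API: the XML element and attribute names (`dqpConfig`, `database`, `evaluator`, `resourceID`, `url`), the `DQPFactory` and `DQPDataResource` classes, the example.org URLs, and the in-memory schema lookup that replaces contacting remote OGSA-DAI data resources are all assumptions made for illustration.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical configuration document: the real DQPFactory schema may differ.
CONFIG = """<dqpConfig>
  <database resourceID="ProteinDB" url="http://a.example.org/dai/services"/>
  <database resourceID="GeneDB" url="http://b.example.org/dai/services"/>
  <evaluator url="http://c.example.org/dqp/evaluator"/>
  <evaluator url="http://d.example.org/dqp/evaluator"/>
</dqpConfig>"""

# Stand-in for the schemas the wrapped OGSA-DAI data resources would report.
REMOTE_SCHEMAS = {
    "ProteinDB": {"protein": ["id", "name"]},
    "GeneDB": {"gene": ["id", "symbol"]},
}

@dataclass
class DQPDataResource:
    resource_id: str
    databases: List[str]
    evaluators: List[str]
    schemas: Dict[str, dict] = field(default_factory=dict)

class DQPFactory:
    """Hypothetical model of a DQP Factory Data Resource."""

    def __init__(self, fetch_schema: Callable[[str], dict]):
        self.fetch_schema = fetch_schema
        self.resources: Dict[str, DQPDataResource] = {}  # resources the coordinator exposes

    def create(self, config_xml: str) -> str:
        root = ET.fromstring(config_xml)
        databases = [e.get("resourceID") for e in root.findall("database")]
        evaluators = [e.get("url") for e in root.findall("evaluator")]
        # Step 1: create the new data resource and give it a fresh ID.
        resource = DQPDataResource("DQP-" + uuid.uuid4().hex[:8], databases, evaluators)
        # Step 2: import the schema of every configured database.
        for db in databases:
            resource.schemas[db] = self.fetch_schema(db)
        self.resources[resource.resource_id] = resource
        # Step 3: the client receives the resource ID for later requests.
        return resource.resource_id

factory = DQPFactory(fetch_schema=REMOTE_SCHEMAS.__getitem__)
rid = factory.create(CONFIG)
print(rid, sorted(factory.resources[rid].schemas))
```

Reusing `rid` for later queries, rather than calling `create` again, corresponds to the case in the NB above where an existing DQP Data Resource already serves a client's needs.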

  4. The client submits a request containing a query. Queries are written in a subset of SQL and are executed by the DQPQueryStatement activity. The DQP Data Resource parses, optimises and schedules the query, creating a query plan that consists of a number of partitions. Each partition specifies an individual evaluator's role in the query plan.
  5. Query partitions are sent to the relevant evaluator services.
  6. Some evaluators interact directly with OGSA-DAI data resources to obtain data.
  7. Other evaluators may interact with other evaluators to implement their role in the execution of the query.
  8 and 9. Results propagate back from the evaluators to the coordinator and eventually back to the client.
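Steps 4 and 5 turn a single query plan into per-evaluator partitions. One simple way to picture this, assumed here for illustration rather than taken from the DQP implementation, is to tag each operator of a physical plan with the evaluator that should run it and then cut the plan wherever a child runs on a different evaluator from its parent; each cut is a point where tuples are shipped between evaluators. The operator names and evaluator labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Op:
    """One operator in a (hypothetical) physical plan, tagged with its evaluator."""
    name: str
    evaluator: str
    children: List["Op"] = field(default_factory=list)

@dataclass
class Partition:
    """The operators one evaluator runs, and where it ships its output."""
    evaluator: str
    sends_to: str
    ops: List[str] = field(default_factory=list)

def partition_plan(root: Op) -> List[Partition]:
    """Cut the plan wherever a child runs on a different evaluator from its parent."""
    partitions: List[Partition] = []

    def visit(op: Op, parent: Optional[Partition]) -> None:
        if parent is None or op.evaluator != parent.evaluator:
            # Start a new partition whose results flow up to the parent's
            # evaluator, or to the coordinator for the root of the plan.
            target = parent.evaluator if parent is not None else "coordinator"
            part = Partition(op.evaluator, target)
            partitions.append(part)
        else:
            part = parent  # same evaluator: extend the parent's partition
        part.ops.append(op.name)
        for child in op.children:
            visit(child, part)

    visit(root, None)
    return partitions

plan = Op("project", "eval-1", [
    Op("hash_join", "eval-1", [
        Op("scan(protein)", "eval-2"),
        Op("scan(gene)", "eval-3"),
    ]),
])
for p in partition_plan(plan):
    print(p.evaluator, p.ops, "->", p.sends_to)
```

Here the two scan partitions play the role of the leaf evaluators in step 6, the join partition corresponds to step 7, and the root partition's output is what propagates back in steps 8 and 9.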

NB OGSA-DAI's DQP services are also able to invoke Web services from within queries. This is not illustrated in Figure 1 in order to preserve the clarity of the figure and its associated description. Also omitted from the figure are the resource properties made available by the DQP Data Resource. Following initialisation, the DQP Data Resource provides a resource property enabling the client to obtain a description of the database schemas imported by DQP.
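Calling a Web service from within a query can be thought of as an extra operator that enriches each tuple with the service's response. The sketch below assumes a local Python function in place of a real Web service; the names `operation_call` and `fake_service` are hypothetical, and memoising repeated arguments is an illustrative design choice, not documented DQP behaviour.

```python
from typing import Callable, Dict, Iterable, Iterator

def operation_call(rows: Iterable[dict], service: Callable[[str], object],
                   arg_col: str, out_col: str) -> Iterator[dict]:
    """Yield each row extended with service(row[arg_col]) under out_col.
    Results are memoised so a repeated argument triggers only one call."""
    cache: Dict[str, object] = {}
    for row in rows:
        arg = row[arg_col]
        if arg not in cache:
            cache[arg] = service(arg)  # the (simulated) remote invocation
        yield {**row, out_col: cache[arg]}

calls = []

def fake_service(seq: str) -> int:
    """Stand-in for a Web service; records each invocation."""
    calls.append(seq)
    return len(seq)

rows = [{"id": 1, "seq": "ACGT"}, {"id": 2, "seq": "GG"}, {"id": 3, "seq": "ACGT"}]
out = list(operation_call(rows, fake_service, "seq", "length"))
print([r["length"] for r in out], calls)  # [4, 2, 4] ['ACGT', 'GG']
```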

References

(1) M. N. Alpdemir, A. Mukherjee, N. W. Paton, P. Watson, A. A. A. Fernandes, A. Gounaris, and J. Smith. Service-based distributed querying on the Grid. In Proceedings of the First International Conference on Service Oriented Computing, pages 467-482. Springer, 15-18 December 2003.

(2) M. N. Alpdemir, A. Mukherjee, N. W. Paton, P. Watson, A. A. A. Fernandes, A. Gounaris, and J. Smith. OGSA-DQP: A service-based distributed query processor for the Grid. In S. J. Cox, editor, Proceedings of the UK e-Science All Hands Meeting, Nottingham. EPSRC, 24 September 2003.
