Menu

Distributed_query_processing

Ashutosh Bapat
There is a newer version of this page. You can find it here.

Postgres-XC is capable of distributing or replicating user tables across the datanodes. The query processing engine takes advantage of this distribution or replication to improve query performance. For replicated tables, the operations like join, sorting, grouping etc. are delegated completely to the datanode. By directing different queries to different datanodes, where a relation is replicated, overall throughput can be improved. For distributed tables, these operations are distributed (delegated) across the datanodes in parallel. The delegation of operation is possible if the operations do not involve any unshippable expressions. An expression is unshippable, if it can not be evaluated on the datanode. Following sections describe, distributed processing of these operations.

Coordinator and datanode communicate using SQL + libpq interface, thus the delegation of operations happens by sending queries with corresponding clauses.

Since the discussion applies to tables as well as the results of the queries, we use the word relation, which comprises both the data from a table or result of a query.

Sorting


For a replicated relation, if the sort keys (the expressions by which the result is sorted) are shippable, the sorted result is obtained from one of the datanodes, where the relation is replicated. For distributed relations, if the sort keys are shippable, sorted results are obtained from the datanodes where the relation is distributed. These sorted runs are then merged at the coordinator to get the completely sorted results. In order to delegate sorting, ORDER BY clause with appropriate keys is added to the query being sent to the datanode.


Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.