Some community members need GTM subclustering for the following purpose:
GTM is the single central facility that maintains database integrity across the cluster, and it was considered unsuitable for clustering in a distributed fashion. This article proposes how to cluster GTM itself under a few simple assumptions.
We assume the following about the application above.
The last assumption is not critical, but it is important for providing a specific algorithm that keeps performance reasonable.
We divide a Postgres-XC cluster into two or more subclusters. Each subcluster consists of its own GTM and the other Postgres-XC components.
Now we introduce a new global entity called “G-GTM” (global GTM) on top of the GTMs. G-GTM provides each GTM with a range of GXIDs. Each GTM requests a GXID range from G-GTM and assigns GXID values within this range to its transactions. When the range runs out, the GTM requests the next range from G-GTM to continue its operation.
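The range-based GXID assignment can be sketched in Python as follows. This is a minimal, hypothetical illustration, not Postgres-XC code: the class names, the `range_size` parameter and the half-open `[start, end)` range representation are assumptions, and GXID wraparound and persistence of the G-GTM counter across restarts are ignored.

```python
import threading


class GlobalGTM:
    """Hypothetical G-GTM: hands out disjoint GXID ranges to subcluster GTMs."""

    def __init__(self, first_gxid: int = 3, range_size: int = 1000):
        # 3 mirrors PostgreSQL's first normal transaction ID; range_size is
        # an illustrative tuning knob, not a real Postgres-XC parameter.
        self._next = first_gxid
        self._range_size = range_size
        self._lock = threading.Lock()  # several GTMs may ask concurrently

    def request_range(self) -> tuple[int, int]:
        """Return a half-open range [start, end) that no other GTM receives."""
        with self._lock:
            start = self._next
            self._next += self._range_size
            return start, self._next


class SubclusterGTM:
    """Hypothetical per-subcluster GTM assigning GXIDs from its current range."""

    def __init__(self, name: str, ggtm: GlobalGTM):
        self.name = name
        self._ggtm = ggtm
        self._next, self._end = ggtm.request_range()

    def get_gxid(self) -> int:
        # When the current range is exhausted, ask G-GTM for the next one.
        if self._next >= self._end:
            self._next, self._end = self._ggtm.request_range()
        gxid = self._next
        self._next += 1
        return gxid
```

With `range_size=3`, GTM A receives [3, 6) and GTM B receives [6, 9); after A hands out 3, 4 and 5 it fetches [9, 12) from G-GTM. A larger range means fewer round trips to G-GTM, at the cost of GXIDs that are no longer issued in a single global order across subclusters.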
Each GTM also maintains transaction status and snapshots within its subcluster, except for occasional external transactions as described below.
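The per-subcluster bookkeeping can be pictured with the following sketch. It is hypothetical: for brevity it uses a commit-sequence-number style snapshot (a snapshot is simply the local commit counter at the time it is taken) instead of PostgreSQL's xmin/xmax/in-progress-list snapshot, and the class and method names are assumptions. The point it illustrates is that each GTM decides status and visibility only from transactions registered with it.

```python
class SubclusterTxnTracker:
    """Hypothetical status and snapshot bookkeeping kept by one subcluster's GTM."""

    def __init__(self):
        self._running: set[int] = set()
        self._aborted: set[int] = set()
        self._commit_seq: dict[int, int] = {}  # gxid -> local commit sequence number
        self._seq = 0

    def begin(self, gxid: int) -> None:
        self._running.add(gxid)

    def commit(self, gxid: int) -> None:
        self._running.remove(gxid)  # KeyError if gxid was never begun here
        self._seq += 1
        self._commit_seq[gxid] = self._seq

    def abort(self, gxid: int) -> None:
        self._running.remove(gxid)
        self._aborted.add(gxid)

    def status(self, gxid: int) -> str:
        if gxid in self._running:
            return "in progress"
        if gxid in self._commit_seq:
            return "committed"
        if gxid in self._aborted:
            return "aborted"
        return "unknown"  # e.g. a GXID used only inside another subcluster

    def snapshot(self) -> int:
        # Simplified snapshot: every commit numbered <= this value is visible.
        return self._seq

    def is_visible(self, writer_gxid: int, snap: int) -> bool:
        seq = self._commit_seq.get(writer_gxid)
        return seq is not None and seq <= snap
```

A snapshot taken before a local commit does not see that commit, while a later snapshot does; GXIDs that only ever ran in another subcluster report as "unknown" here.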
Each transaction should be aware of which subclusters it accesses. Within the local subcluster, operation is the same as in the current implementation. If the transaction needs to read or write data outside its subcluster, it should do the following:
In this way, we can maintain database integrity among subclusters. Because most transactions are local to their subcluster, a transaction only occasionally involves data in external subclusters, so this should not impose a serious performance penalty.
GTM should be extended so that it can receive a GXID range from G-GTM and renew it. The coordinator should also be extended so that it recognizes which components are within the local subcluster and which are not, and then connects to the GTM of each external subcluster to begin and end the transaction. Of course, in this case, implicit 2PC must be used.
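The coordinator-side behaviour could look roughly like the sketch below. Everything here is hypothetical: `SubclusterStub` stands in for one subcluster's GTM together with the nodes behind it (so its `prepare` result plays the role of the PREPARE vote), and `node_map` is an assumed catalog mapping node names to subclusters. A purely local transaction keeps today's one-phase path; a transaction touching external subclusters registers the same GXID with each external GTM and then runs implicit 2PC.

```python
from dataclasses import dataclass, field


@dataclass
class SubclusterStub:
    """Hypothetical stand-in for one subcluster's GTM plus its nodes."""
    name: str
    fail_prepare: bool = False
    log: list = field(default_factory=list)

    def begin(self, gxid: int) -> None:
        self.log.append(("begin", gxid))

    def commit(self, gxid: int) -> None:
        self.log.append(("commit", gxid))  # one-phase commit, local case only

    def prepare(self, gxid: int) -> bool:
        self.log.append(("prepare", gxid))
        return not self.fail_prepare

    def commit_prepared(self, gxid: int) -> None:
        self.log.append(("commit_prepared", gxid))

    def abort(self, gxid: int) -> None:
        self.log.append(("abort", gxid))


class Coordinator:
    def __init__(self, local: str, node_map: dict[str, str],
                 subclusters: dict[str, SubclusterStub]):
        self.local = local
        self.node_map = node_map  # node name -> subcluster name (assumed)
        self.subclusters = subclusters

    def run(self, gxid: int, nodes: list[str]) -> bool:
        """Begin and end a transaction touching `nodes`; True if it committed."""
        external = sorted({self.node_map[n] for n in nodes} - {self.local})
        local = self.subclusters[self.local]
        local.begin(gxid)  # the GXID comes from the local GTM's range
        if not external:
            local.commit(gxid)  # unchanged single-subcluster behaviour
            return True
        participants = [local] + [self.subclusters[s] for s in external]
        for sc in participants[1:]:
            sc.begin(gxid)  # register the same GXID with each external GTM
        # Implicit 2PC: commit only if every participant prepares.
        # all() stops at the first failed prepare; everyone is then aborted.
        if all(sc.prepare(gxid) for sc in participants):
            for sc in participants:
                sc.commit_prepared(gxid)
            return True
        for sc in participants:
            sc.abort(gxid)
        return False
```

For example, with `node_map={"dn1": "A", "dn2": "B"}` and local subcluster A, a transaction on `["dn1"]` never contacts B, whereas one on `["dn1", "dn2"]` goes through begin, prepare and commit_prepared on both A and B.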
Because each subcluster's transactions are controlled by its own GTM, the moment at which a commit becomes visible may differ slightly from subcluster to subcluster. Further analysis may be needed on how this affects applications.
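The following toy sketch shows where such a visibility difference could come from. It assumes, hypothetically, that the coordinator reports the final commit of a cross-subcluster transaction to each GTM one after another, and it models each GTM only by a local commit counter. A snapshot taken in subcluster B between the two reports does not yet see a commit that subcluster A already treats as visible.

```python
class GTMView:
    """Hypothetical minimal GTM: tracks only local commit sequence numbers."""

    def __init__(self, name: str):
        self.name = name
        self._seq = 0
        self._commit_seq: dict[int, int] = {}

    def commit(self, gxid: int) -> None:
        self._seq += 1
        self._commit_seq[gxid] = self._seq

    def snapshot(self) -> int:
        return self._seq

    def is_visible(self, gxid: int, snap: int) -> bool:
        seq = self._commit_seq.get(gxid)
        return seq is not None and seq <= snap


def demo() -> tuple[bool, bool]:
    gtm_a, gtm_b = GTMView("A"), GTMView("B")
    gxid = 42
    gtm_a.commit(gxid)  # A learns about the commit first ...
    seen_in_a = gtm_a.is_visible(gxid, gtm_a.snapshot())
    seen_in_b = gtm_b.is_visible(gxid, gtm_b.snapshot())
    gtm_b.commit(gxid)  # ... and B only afterwards
    return seen_in_a, seen_in_b


print(demo())  # prints (True, False)
```

The window lasts only until the last GTM records the commit, which is why the difference is expected to be slight; whether applications can tolerate it is the open question raised above.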
The following capabilities are needed to implement this.
Further study is needed on whether information about the GTMs in other subclusters should be provided by the coordinator, gtm_proxy, or GTM.
Note that there is no restriction on distributing and replicating tables among subclusters. To maintain performance, however, transactions and tables should be designed carefully so that most transactions are local to a subcluster. DDL may need some extension to handle subclusters.
I believe this can serve as a basis for XC subclustering. I hope other community members get involved and provide a more detailed study and design.
A conceptual diagram is shown below.
[[img src=Subcluster.jpg]]