From: Michael P. <mic...@gm...> - 2010-12-13 01:14:31
> So here are the main lines I propose to fix that, with an implementation
> inside portal.
>
> First, since DDL synchronize commit, it is possible for Coordinators to
> interact with each other,
> so the query should be extended to:
> -- EXECUTE DIRECT ON (COORDINATOR num | NODE num, ...) query
> to be compared to what is in the current code:
> -- EXECUTE DIRECT ON (COORDINATOR | NODE num, ...) query
>
> Sounds good. What about
>
> EXECUTE DIRECT ON ([COORDINATOR num[,num...]] [NODE num[,num...]]) query
>
> maybe it is useful to see on all nodes at once with a single command.
> EXECUTE DIRECT ON COORDINATOR * query; may also be possible.

This way of manipulating multiple node numbers at the same time, or even of
including all the nodes at once, is already supported in gram.y. CLEAN
CONNECTION also uses it.

> BTW, in GridSQL we optionally include the source node number in the tuples
> returned. We should add something similar at some point (don't need this now
> though). Similarly, something like a NODE() function would be nice, to even
> be able to do SELECT *,NODE().
>
> Are the coordinator numbers and node numbers separate? That is, can we
> have both coordinator 1 and data node 1?

We can have a Coordinator 1 and a Datanode 1. With the registration features
that will be added soon, nodes are distinguished by their type and their ID.

> Then, we have to modify query analyze in analyze.c.
> There is an API in the code called transformExecDirectStmt that transforms
> the query and changes its shape.
> In the analyze part, you have to check if the query is launched locally or
> not.
> If it is not local, change the node type to Remote Query to make it run in
> ExecRemoteQuery when launching it.
>
> If it is local, you have to parse the query with parse_query and then
> analyze it with parse_analyze.
> After parsing and analyzing, change its node type to Query, to make it
> launch locally.
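To make the local/remote decision of the analysis step concrete, here is a minimal C sketch. All names in it (NodeKind, NodeRef, ExecDirectTarget, ExecDirectMode, classify_exec_direct) are hypothetical and are not the actual Postgres-XC structures. It assumes that a statement counts as local only when its single target is the Coordinator running it, and it treats every other target list, including one that mixes this Coordinator with other nodes, as a remote query. Node identity is the pair (type, ID), so Coordinator 1 and Datanode 1 are different nodes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node kinds: Coordinator and Datanode IDs are separate
 * namespaces, so (COORDINATOR, 1) and (DATANODE, 1) are distinct nodes. */
typedef enum { NODE_COORDINATOR, NODE_DATANODE } NodeKind;

/* Hypothetical node reference: a node is identified by its type and its ID. */
typedef struct {
    NodeKind kind;
    int      id;
} NodeRef;

/* Hypothetical parsed form of EXECUTE DIRECT ON (...) query. */
typedef struct {
    const NodeRef *targets;   /* nodes listed in the ON (...) clause */
    size_t         ntargets;
    const char    *query;     /* query string to run on those nodes */
} ExecDirectTarget;

/* Hypothetical outcome of analysis: a RemoteQuery node (run through
 * ExecRemoteQuery) or a plain Query node run on the local Coordinator. */
typedef enum { EXEC_AS_REMOTE_QUERY, EXEC_AS_LOCAL_QUERY } ExecDirectMode;

/* Two references name the same node only if both type and ID match. */
static bool node_equal(NodeRef a, NodeRef b)
{
    return a.kind == b.kind && a.id == b.id;
}

/* Assumption of this sketch: the statement is local only when its single
 * target is this Coordinator; any other target list is handled remotely. */
static ExecDirectMode classify_exec_direct(const ExecDirectTarget *stmt,
                                           int local_coord_id)
{
    NodeRef self = { NODE_COORDINATOR, local_coord_id };

    if (stmt->ntargets == 1 && node_equal(stmt->targets[0], self))
        return EXEC_AS_LOCAL_QUERY;
    return EXEC_AS_REMOTE_QUERY;
}
```

For example, on Coordinator 1 a statement targeting only COORDINATOR 1 would be classified as local (parse, analyze, then run as a Query), while one targeting NODE 1 would become a RemoteQuery, because Datanode 1 is not the same node as Coordinator 1.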
>
> The difficult part of this implementation does not seem to be the analysis
> and parsing part but the planner.
> The question is:
> Should the query go through pgxc_planner or the normal planner if it is local?
> Here is my proposal:
> pgxc_planner looks better, but we have to put a flag (when analyzing)
> in the query to be sure to launch the query on the correct nodes when
> determining execution nodes in get_plan_nodes.
>
> Yeah, I think we could go either way, but we know that with EXECUTE DIRECT
> it will always be a single step, so I think it is OK to put it in
> pgxc_planner. It should be pretty straightforward though, I think we just
> need to additionally set step->exec_nodes, which we know already from
> parsing. It may be that we need to extend this though to indicate
> execution on specific Coordinators.

I agree. I had a look at the code, and it should not be that complicated to
fix after all. The only difficulty, if there is one, is setting the execution
node list correctly during analysis.
ExecRemoteQuery also needs a small modification so that it can execute on a
single Coordinator or on several Coordinators, which it cannot do yet.

-- 
Michael Paquier
http://michaelpq.users.sourceforge.net