From: Michael P. <mic...@gm...> - 2010-12-10 00:02:53
Hi all,

I had a look at the code to try to understand how EXECUTE DIRECT could be fixed. It is going to be needed to implement some HA features, and I believe users would find this functionality useful once fixed.

I see two ways to fix it. The first one would be to do the implementation outside the Portal, as we did before moving XC query execution there. This was the first implementation of EXECUTE DIRECT. It looks like an easy solution, but in the long term it goes against the goal of moving query execution inside the Portal.

So here are the main lines of the fix I propose, with an implementation inside the Portal.

First, since DDL commits are now synchronized, it is possible for Coordinators to interact with each other, so the query should be extended to:
-- EXECUTE DIRECT ON (COORDINATOR num | NODE num, ...) query
to be compared with what is in the current code:
-- EXECUTE DIRECT ON (COORDINATOR | NODE num, ...) query

Then, we have to modify query analysis in analyze.c. There is an API in the code called transformExecDirectStmt that transforms the query and changes its shape. In the analysis part, you have to check whether the query is launched locally or not. If it is not local, change the node type to RemoteQuery to make it run through ExecRemoteQuery when it is launched.

If it is local, you have to parse the query with parse_query and then analyze it with parse_analyze. After parsing and analyzing, change its node type to Query to make it run locally.

The difficult part of this implementation does not seem to be the parsing and analysis; it is the planner. The question is: should the query go through pgxc_planner or the normal planner when it is local? Here is my proposal: pgxc_planner looks better, but we have to set a flag in the query (at analysis time) to be sure the query is launched on the correct nodes when the execution nodes are determined in get_plan_nodes.

Regards,

------
Michael Paquier
http://michaelpq.users.sourceforge.net
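A hypothetical usage sketch of the proposed extension (the node numbers, the inner catalog query, and the string-literal quoting are assumptions for illustration; the exact rules follow the grammar lines above):

  -- current code: all Coordinators at once, or a list of Datanodes by number
  EXECUTE DIRECT ON (COORDINATOR) 'SELECT * FROM pg_prepared_xacts';
  EXECUTE DIRECT ON (NODE 1, 2) 'SELECT * FROM pg_prepared_xacts';
  -- proposed extension: target individual Coordinators by number as well
  EXECUTE DIRECT ON (COORDINATOR 2) 'SELECT * FROM pg_prepared_xacts';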
From: Mason S. <mas...@en...> - 2010-12-10 15:35:05
On 12/9/10 7:02 PM, Michael Paquier wrote:
> Hi all,
>
> I had a look at the code to try to understand how EXECUTE DIRECT could be
> fixed. It is going to be needed to implement some HA features, and I
> believe users would find this functionality useful once fixed.
>
> I see two ways to fix it. The first one would be to do the implementation
> outside the Portal, as we did before moving XC query execution there. This
> was the first implementation of EXECUTE DIRECT. It looks like an easy
> solution, but in the long term it goes against the goal of moving query
> execution inside the Portal.

I agree - it may be OK to do it outside and grab some old code, but I think we should try to use the current code and make it work.

> So here are the main lines of the fix I propose, with an implementation
> inside the Portal.
>
> First, since DDL commits are now synchronized, it is possible for
> Coordinators to interact with each other, so the query should be extended to:
> -- EXECUTE DIRECT ON (COORDINATOR num | NODE num, ...) query
> to be compared with what is in the current code:
> -- EXECUTE DIRECT ON (COORDINATOR | NODE num, ...) query

Sounds good. What about

EXECUTE DIRECT ON ([COORDINATOR num[,num...]] [NODE num[,num...]]) query

Maybe it is useful to look at all nodes at once with a single command.

BTW, in GridSQL we optionally include the source node number in the tuples returned. We should add something similar at some point (we don't need this now though). Similarly, something like a NODE() function would be nice, so you could even do SELECT *, NODE().

Are the coordinator numbers and node numbers separate? That is, can we have both Coordinator 1 and Datanode 1?

> Then, we have to modify query analysis in analyze.c. There is an API in the
> code called transformExecDirectStmt that transforms the query and changes
> its shape. In the analysis part, you have to check whether the query is
> launched locally or not. If it is not local, change the node type to
> RemoteQuery to make it run through ExecRemoteQuery when it is launched.
>
> If it is local, you have to parse the query with parse_query and then
> analyze it with parse_analyze. After parsing and analyzing, change its node
> type to Query to make it run locally.
>
> The difficult part of this implementation does not seem to be the parsing
> and analysis; it is the planner. The question is: should the query go
> through pgxc_planner or the normal planner when it is local? Here is my
> proposal: pgxc_planner looks better, but we have to set a flag in the query
> (at analysis time) to be sure the query is launched on the correct nodes
> when the execution nodes are determined in get_plan_nodes.

Yeah, I think we could go either way, but we know that with EXECUTE DIRECT it will always be a single step, so I think it is OK to put it in pgxc_planner. It should be pretty straightforward; I think we just need to additionally set step->exec_nodes, which we already know from parsing. It may be that we need to extend this to indicate execution on specific Coordinators.

Overall, I don't think this should be difficult to get working again.

Mason

> Regards,
>
> ------
> Michael Paquier
> http://michaelpq.users.sourceforge.net

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
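A hypothetical sketch of the combined form Mason suggests, together with the NODE() idea (neither exists yet; the node numbers, the table name, the quoting, and the function name are illustrative assumptions only):

  -- run the same statement on Coordinator 1 and on Datanodes 1 and 2 in one command
  EXECUTE DIRECT ON (COORDINATOR 1 NODE 1,2) 'SELECT count(*) FROM orders';
  -- possible future addition, similar to GridSQL: tag each returned row with its source node
  SELECT *, NODE() FROM orders;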
From: Michael P. <mic...@gm...> - 2010-12-13 01:14:31
>> So here are the main lines of the fix I propose, with an implementation
>> inside the Portal.
>>
>> First, since DDL commits are now synchronized, it is possible for
>> Coordinators to interact with each other, so the query should be extended to:
>> -- EXECUTE DIRECT ON (COORDINATOR num | NODE num, ...) query
>> to be compared with what is in the current code:
>> -- EXECUTE DIRECT ON (COORDINATOR | NODE num, ...) query
>
> Sounds good. What about
>
> EXECUTE DIRECT ON ([COORDINATOR num[,num...]] [NODE num[,num...]]) query
>
> Maybe it is useful to look at all nodes at once with a single command.

EXECUTE DIRECT ON COORDINATOR * query may also be possible. This way of handling multiple node numbers at once, or even all the nodes at the same time, is already supported in gram.y; CLEAN CONNECTION also uses it.

> BTW, in GridSQL we optionally include the source node number in the tuples
> returned. We should add something similar at some point (we don't need this
> now though). Similarly, something like a NODE() function would be nice, so
> you could even do SELECT *, NODE().
>
> Are the coordinator numbers and node numbers separate? That is, can we have
> both Coordinator 1 and Datanode 1?

We can have a Coordinator 1 and a Datanode 1. With the node registration features that will be added soon, nodes are differentiated by their type and their ID.

>> Then, we have to modify query analysis in analyze.c. There is an API in the
>> code called transformExecDirectStmt that transforms the query and changes
>> its shape. In the analysis part, you have to check whether the query is
>> launched locally or not. If it is not local, change the node type to
>> RemoteQuery to make it run through ExecRemoteQuery when it is launched.
>>
>> If it is local, you have to parse the query with parse_query and then
>> analyze it with parse_analyze. After parsing and analyzing, change its node
>> type to Query to make it run locally.
>>
>> The difficult part of this implementation does not seem to be the parsing
>> and analysis; it is the planner. The question is: should the query go
>> through pgxc_planner or the normal planner when it is local? Here is my
>> proposal: pgxc_planner looks better, but we have to set a flag in the query
>> (at analysis time) to be sure the query is launched on the correct nodes
>> when the execution nodes are determined in get_plan_nodes.
>
> Yeah, I think we could go either way, but we know that with EXECUTE DIRECT
> it will always be a single step, so I think it is OK to put it in
> pgxc_planner. It should be pretty straightforward; I think we just need to
> additionally set step->exec_nodes, which we already know from parsing. It
> may be that we need to extend this to indicate execution on specific
> Coordinators.

I agree. I had a look at the code, and it should not be that complicated to fix after all. The only difficulty, if it is one, is to set the execution node list correctly when analyzing the query. It is also necessary to modify ExecRemoteQuery a little bit so that it can execute on a single Coordinator or on several of them (which is not the case yet).

--
Michael Paquier
http://michaelpq.users.sourceforge.net
From: Michael P. <mic...@gm...> - 2010-12-15 06:37:49
Attachments:
executedirfix.patch
Please see attached a work-in-progress patch. Avoid applying it to your code, because it does not work yet. I am sending it because I need some feedback.

The patch is split into two parts. First the query is analyzed. For an EXECUTE DIRECT launched on the local Coordinator, the inner query is parsed and analyzed, then returned as a normal Query node. As this query has been analyzed, it can go through the planner. For an EXECUTE DIRECT on a remote node, the query is analyzed to get the command type for pgxc_planner, and the list of nodes is saved in a RemoteQuery node that is returned with the Query result through utilityStmt.

I tried to change the pgxc planner to handle the particular case of EXECUTE DIRECT by keeping in the planner the node list set at analysis time, but it does not seem to be the right way of doing it. I am not really an expert on this part of the code, so feedback would be appreciated, particularly on the following points:

- Is this patch using the correct logic in the planner and in analysis?
- Does the query really need to go through the planner? In that case, is setting the Query as CMD_UTILITY with a RemoteQuery node in utilityStmt enough at analysis time? (The patch currently does NOT do it.)
- Are the Query fields set at analysis time correct? Isn't there something missing that the planner expects to be set?
- We rewrite the statement into a Query at the end of pg_analyze_rewrite in postgres.c, but for EXECUTE DIRECT it is not the same query. Is it correct to change it directly in the XC planner?

--
Michael Paquier
http://michaelpq.users.sourceforge.net
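For reference, these are the two cases the patch distinguishes, sketched in SQL (the node numbers, the catalog query, and the exact syntax, which is still being settled in this thread, are assumptions for illustration):

  -- remote case: kept as a RemoteQuery node and executed through ExecRemoteQuery
  EXECUTE DIRECT ON NODE 2 'SELECT * FROM pg_prepared_xacts';
  -- local case: the inner query is parsed and analyzed, then run as a plain local Query
  EXECUTE DIRECT ON COORDINATOR 1 'SELECT * FROM pg_prepared_xacts';  -- assuming Coordinator 1 is the local one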
From: Michael P. <mic...@gm...> - 2010-12-16 05:11:08
Attachments:
executedirfix2.patch
Please see attached a patch fixing EXECUTE DIRECT. It has been extended to cover Coordinators as well, so the SQL synopsis becomes:

EXECUTE DIRECT ON { COORDINATOR num | NODE num[,num] } query;

I put the following restrictions on this functionality:
1) Only SELECT queries can be used with EXECUTE DIRECT. It would perhaps be better to also allow queries such as COMMIT PREPARED and ABORT PREPARED.
2) It cannot be launched on multiple Coordinators at the same time; it is possible on multiple Datanodes, though. If a query is launched on the local Coordinator and a remote Coordinator at the same time, XC is not able to merge the results correctly.

There is still one bug: when EXECUTE DIRECT is launched on the local Coordinator with a query referencing a non-catalog table, the query is shipped to the Datanodes. I was looking for a fix in allpaths.c, where RemoteQuery paths are set, but a fix there looks a little bit tricky. By the way, this is not really important for the HA features in the short term, as EXECUTE DIRECT is planned to be used to look at catalog tables on remote Coordinators (and perhaps to target nodes with COMMIT/ABORT PREPARED queries).

Thanks,

--
Michael Paquier
http://michaelpq.users.sourceforge.net
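For illustration, under the restrictions above the patch would accept statements of this shape (the node numbers and the catalog query are made up):

  EXECUTE DIRECT ON COORDINATOR 2 'SELECT * FROM pg_prepared_xacts';
  EXECUTE DIRECT ON NODE 1,2 'SELECT * FROM pg_prepared_xacts';
  -- not accepted yet: several Coordinators at once, or anything other than SELECT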
From: Michael P. <mic...@gm...> - 2010-12-16 05:24:58
Attachments:
executedirfix2.patch
I corrected the comments a little bit. Please see the latest version attached.

--
Michael Paquier
http://michaelpq.users.sourceforge.net
From: Michael P. <mic...@gm...> - 2010-12-16 07:20:38
Attachments:
executedirfix3.patch
Hi all,

I extended the patch so as to be able to launch utilities on targeted nodes (Datanodes and Coordinators). EXECUTE DIRECT is still blocked for UPDATE and DELETE. And it is still not possible to launch a query on the local Coordinator without spreading it to the other nodes.

With this patch, when a 2PC transaction is partially committed or partially aborted in the cluster, EXECUTE DIRECT can be used to target the specific nodes to which a COMMIT PREPARED or ABORT PREPARED should be sent.

This is definitely useful for HA features and for recovery as well.

Thanks,

--
Michael Paquier
http://michaelpq.users.sourceforge.net
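For illustration, cleaning up a transaction left prepared on a single Datanode could look like this (the node number and the transaction identifier are made up, and note that PostgreSQL spells the abort command ROLLBACK PREPARED):

  EXECUTE DIRECT ON NODE 2 'COMMIT PREPARED ''xc_tx_1001''';
  -- or, to abort it instead:
  EXECUTE DIRECT ON NODE 2 'ROLLBACK PREPARED ''xc_tx_1001''';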
From: Mason S. <mas...@en...> - 2010-12-16 22:09:49
On 12/16/10 1:51 AM, Michael Paquier wrote:
> Hi all,
>
> I extended the patch so as to be able to launch utilities on targeted
> nodes (Datanodes and Coordinators). EXECUTE DIRECT is still blocked for
> UPDATE and DELETE. And it is still not possible to launch a query on the
> local Coordinator without spreading it to the other nodes.
>
> With this patch, when a 2PC transaction is partially committed or
> partially aborted in the cluster, EXECUTE DIRECT can be used to target
> the specific nodes to which a COMMIT PREPARED or ABORT PREPARED should
> be sent.
>
> This is definitely useful for HA features and for recovery as well.

Michael,

in pgxc_planner(), is that block of code only for the case of executing on a local coordinator? Could it be safely handled above the switch() statement? I mean, if it is EXECUTE DIRECT, we just want to pass down the SQL string and have it executed as is.

I ran some brief tests.

DBT1=# EXECUTE DIRECT on NODE 1 'select count(*) from orders';
 count
-------
  1269
(1 row)

DBT1=# EXECUTE DIRECT on NODE 2 'select count(*) from orders';
 count
-------
  1332
(1 row)

DBT1=# EXECUTE DIRECT on NODE 1,2 'select count(*) from orders';
 count
-------
  2601
(1 row)

For this last one, I expected to see two rows. That is, it should pass down the exact SQL string, then show the results from each node. It looks like it is hooking into our general planning. We don't want the aggregate managed on the coordinator (hmmm, although it may open up interesting ideas in the future...).

Similarly, something is not quite right with GROUP BY:

DBT1=# EXECUTE DIRECT on NODE 1,2 'select o_status, count(*) from orders group by o_status';
ERROR:  unrecognized node type: 656

DBT1=# EXECUTE DIRECT on NODE 2 'select o_status, count(*) from orders group by o_status';
 o_status | count
----------+-------
          |  1332
(1 row)

Here, too, I think we should just get the results as if 'select o_status, count(*) from orders group by o_status' was executed on each node, all thrown together in the results (long term we could add an optional NODE column, like GridSQL).

Perhaps this helps simplify things a bit.

Thanks,

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
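For reference, the behaviour Mason expects for the two-node case would look something like this (a sketch assuming the per-node counts from the single-node runs above; this output shape is not implemented yet):

DBT1=# EXECUTE DIRECT on NODE 1,2 'select count(*) from orders';
 count
-------
  1269
  1332
(2 rows)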
From: Koichi S. <suz...@os...> - 2010-12-17 00:41:01
Hmm... I thought it would be reasonable enough just to allow SELECT (and COMMIT/ABORT) statements in the EXECUTE DIRECT semantics. Also, because we have changed the infrastructure of aggregate functions, I agree it would not be safe enough to run such functions just on the coordinator. We need proper infrastructure for cases like the one Benny pointed out:

SELECT count(*) from A, A;

Because EXECUTE DIRECT is just for housekeeping usage, I think it is also reasonable to put in place restrictions that are sufficient for that dedicated use.

In this case, because 2PC recovery does not need aggregates, I think we can keep this as is.

Regards;
---
Koichi

(2010-12-17 07:09), Mason Sharp wrote:
> On 12/16/10 1:51 AM, Michael Paquier wrote:
>> Hi all,
>>
>> I extended the patch so as to be able to launch utilities on targeted
>> nodes (Datanodes and Coordinators). EXECUTE DIRECT is still blocked for
>> UPDATE and DELETE. And it is still not possible to launch a query on the
>> local Coordinator without spreading it to the other nodes.
>>
>> With this patch, when a 2PC transaction is partially committed or
>> partially aborted in the cluster, EXECUTE DIRECT can be used to target
>> the specific nodes to which a COMMIT PREPARED or ABORT PREPARED should
>> be sent.
>>
>> This is definitely useful for HA features and for recovery as well.
>
> Michael,
>
> in pgxc_planner(), is that block of code only for the case of executing
> on a local coordinator? Could it be safely handled above the switch()
> statement? I mean, if it is EXECUTE DIRECT, we just want to pass down
> the SQL string and have it executed as is.
>
> I ran some brief tests.
>
> DBT1=# EXECUTE DIRECT on NODE 1 'select count(*) from orders';
>  count
> -------
>   1269
> (1 row)
>
> DBT1=# EXECUTE DIRECT on NODE 2 'select count(*) from orders';
>  count
> -------
>   1332
> (1 row)
>
> DBT1=# EXECUTE DIRECT on NODE 1,2 'select count(*) from orders';
>  count
> -------
>   2601
> (1 row)
>
> For this last one, I expected to see two rows. That is, it should pass
> down the exact SQL string, then show the results from each node. It
> looks like it is hooking into our general planning. We don't want the
> aggregate managed on the coordinator (hmmm, although it may open up
> interesting ideas in the future...).
>
> Similarly, something is not quite right with GROUP BY:
>
> DBT1=# EXECUTE DIRECT on NODE 1,2 'select o_status, count(*) from orders
> group by o_status';
> ERROR:  unrecognized node type: 656
>
> DBT1=# EXECUTE DIRECT on NODE 2 'select o_status, count(*) from orders
> group by o_status';
>  o_status | count
> ----------+-------
>           |  1332
> (1 row)
>
> Here, too, I think we should just get the results as if 'select
> o_status, count(*) from orders group by o_status' was executed on each
> node, all thrown together in the results (long term we could add an
> optional NODE column, like GridSQL).
>
> Perhaps this helps simplify things a bit.
>
> Thanks,
>
> Mason
From: Mason S. <mas...@en...> - 2010-12-17 15:25:11
On 12/16/10 7:18 PM, Koichi Suzuki wrote:
> Hmm... I thought it would be reasonable enough just to allow SELECT (and
> COMMIT/ABORT) statements in the EXECUTE DIRECT semantics. Also, because we
> have changed the infrastructure of aggregate functions, I agree it would
> not be safe enough to run such functions just on the coordinator. We need
> proper infrastructure for cases like the one Benny pointed out:
>
> SELECT count(*) from A, A;
>
> Because EXECUTE DIRECT is just for housekeeping usage, I think it is also
> reasonable to put in place restrictions that are sufficient for that
> dedicated use.
>
> In this case, because 2PC recovery does not need aggregates, I think we
> can keep this as is.

Aggregates were an example. I am not sure; there may be other unexpected side effects. I just stumbled upon this one when I started testing.

I think we should really keep this simple and just pass the statement down to the nodes as is. That is intuitive. The results I see are kind of weird: it is not simply passing down the statement, but somehow trying to parallelize it. I don't think that is what we want, and I am worried about unexpected results for other statements. I really think we should change this.

Thanks,

Mason

> Regards;
> ---
> Koichi
>
> (2010-12-17 07:09), Mason Sharp wrote:
>> [...]

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company