From: Matt W. <MW...@XI...> - 2013-08-29 20:31:31
|
Short version: it looks like there's a problem with ExecNestLoop and ExecProcNode recursively calling each other and getting stuck in a loop that never completes.

Details:

I've been experimenting with XC using fairly large tables. In this case it's 4 tables, 2 replicated, 2 distributed by hash. The select statement is a mere 31 lines long and contains a group by on 2 columns of one of the tables. The query never completes, even days later.

I'm using dtrace and a "git clone" version of XC from a few days ago compiled with the debug flag (-g). I see that ExecNestLoop and ExecProcNode appear to be calling each other heavily, as in thousands of times per second. That is, I am seeing stack traces where one calls the other, but also vice versa.

In researching further, I see a note in execRemote.c that seems to indicate that recursively calling ExecProcNode is happening by design:

    /*
     * The current implementation of DMLs with RETURNING when run on replicated
     * tables returns row from one of the datanodes. In order to achieve this
     * ExecProcNode is repeatedly called saving one tuple and rejecting the rest.
     * Do we have a DML on replicated table with RETURNING?
     */

I don't know about the accuracy of the debugging and certainly I'm out of my element when poring through the XC source code, so my guess as to the source of the problem should be questioned.

What additional debugging information can I provide to assist with the correct identification and debugging of this problem?

Regards,

Matt

NOTICE OF CONFIDENTIALITY - This material is intended for the use of the individual or entity to which it is addressed, and may contain information that is privileged, confidential and exempt from disclosure under applicable laws. BE FURTHER ADVISED THAT THIS EMAIL MAY CONTAIN PROTECTED HEALTH INFORMATION (PHI). BY ACCEPTING THIS MESSAGE, YOU ACKNOWLEDGE THE FOREGOING, AND AGREE AS FOLLOWS: YOU AGREE TO NOT DISCLOSE TO ANY THIRD PARTY ANY PHI CONTAINED HEREIN, EXCEPT AS EXPRESSLY PERMITTED AND ONLY TO THE EXTENT NECESSARY TO PERFORM YOUR OBLIGATIONS RELATING TO THE RECEIPT OF THIS MESSAGE. If the reader of this email (and attachments) is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. Please notify the sender of the error and delete the e-mail you received. Thank you. |
From: Abbas B. <abb...@en...> - 2013-08-30 07:36:55
|
Can you share the query and table structure that you are using to perform the test? You can obscure the column/table names if they are part of some proprietary application.

The comment you mentioned in execRemote.c is relevant for DMLs only, but you said that yours is a SELECT, isn't it?

--
Abbas
Architect
www.enterprisedb.com |
From: Michael P. <mic...@gm...> - 2013-08-30 12:17:57
|
On Fri, Aug 30, 2013 at 4:36 PM, Abbas Butt <abb...@en...> wrote:
> Can you share the query and table structure that you are using to perform
> the test? You can obscure the column/table names if they are part of some
> proprietary application.
> The comment you mentioned in execRemote.c is relevant for DMLs only, but you
> said that yours is a SELECT, isn't it?

Adding the output of EXPLAIN VERBOSE could also help to understand the plan your query is using.
--
Michael |
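On the stack traces described at the top of the thread: in a pull-based executor of the kind PostgreSQL uses, ExecProcNode dispatches on the node type, and a nested-loop node in turn calls ExecProcNode on its outer and inner children. With several joins stacked on top of each other, both functions will therefore show up as caller and as callee, and the call depth stays bounded by the depth of the plan tree. The toy model below is only a sketch of that calling pattern. Every name in it (`PlanNode`, `exec_proc_node`, `exec_nest_loop`, `run_plan`, and so on) is hypothetical and none of it is taken from the XC source. It does not explain why this particular query never finishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified plan node; NOT the real PostgreSQL/XC structs. */
typedef enum { NODE_SCAN, NODE_NESTLOOP } NodeTag;

typedef struct PlanNode {
    NodeTag tag;
    int nrows;               /* scan only: rows this scan produces     */
    int pos;                 /* scan only: next row to return          */
    struct PlanNode *outer;  /* join only: outer (driving) child       */
    struct PlanNode *inner;  /* join only: inner (rescanned) child     */
    bool need_outer;         /* join only: fetch a new outer row next  */
} PlanNode;

long proc_node_calls = 0;    /* how often the dispatcher ran           */
long nest_loop_calls = 0;    /* how often a nested-loop node ran       */

bool exec_proc_node(PlanNode *node);

/* Return true while the scan still has rows, advancing one row per call. */
bool exec_scan(PlanNode *node)
{
    if (node->pos >= node->nrows)
        return false;
    node->pos++;
    return true;
}

/* Reset a subtree so it can be read again from its first row. */
void rescan(PlanNode *node)
{
    if (node->tag == NODE_SCAN) {
        node->pos = 0;
    } else {
        rescan(node->outer);
        rescan(node->inner);
        node->need_outer = true;
    }
}

/* Unconditional nested loop: pair every outer row with every inner row. */
bool exec_nest_loop(PlanNode *node)
{
    nest_loop_calls++;
    for (;;) {
        if (node->need_outer) {
            if (!exec_proc_node(node->outer))   /* join calls dispatcher */
                return false;                   /* outer side exhausted  */
            node->need_outer = false;
            rescan(node->inner);
        }
        if (exec_proc_node(node->inner))        /* join calls dispatcher */
            return true;                        /* one joined row        */
        node->need_outer = true;                /* inner side exhausted  */
    }
}

/* Dispatcher: routes each call to the routine for that node type. */
bool exec_proc_node(PlanNode *node)
{
    proc_node_calls++;
    switch (node->tag) {
    case NODE_SCAN:     return exec_scan(node);
    case NODE_NESTLOOP: return exec_nest_loop(node);  /* dispatcher calls join */
    }
    return false;
}

PlanNode make_scan(int nrows)
{
    PlanNode n = { .tag = NODE_SCAN, .nrows = nrows };
    return n;
}

PlanNode make_join(PlanNode *outer, PlanNode *inner)
{
    PlanNode n = { .tag = NODE_NESTLOOP, .outer = outer,
                   .inner = inner, .need_outer = true };
    return n;
}

/* Pull rows from the root until it is exhausted; return the row count. */
long run_plan(PlanNode *root)
{
    long rows = 0;
    assert(root != NULL);
    while (exec_proc_node(root))
        rows++;
    return rows;
}
```

Stacking two joins over scans of 3, 4 and 2 rows and passing the top node to `run_plan` pulls 3 * 4 * 2 = 24 rows, and both counters grow with the product of the input sizes. That is why very high call rates for these two functions can simply reflect a nested-loop plan over large inputs, which is the kind of thing the EXPLAIN VERBOSE output requested above would show.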