From: 鈴木 幸市 <ko...@in...> - 2014-02-14 00:48:56
This would be good to have, but the issue looks somewhat different. Sometimes a cancel sent over a separate connection takes effect only after the call has returned and subsequent statements have been issued. This is why Masataka tried to add a (small) delay between the two. The delay is needed only when a statement is canceled. Any more comments on this?
---
Koichi Suzuki

On 2014/02/14 2:30, Andrei Martsinchyk <and...@gm...> wrote:

Hello,

Postgres establishes a separate connection to deliver the Cancel command to the target session. On a heavily loaded node this may take fairly long. A longer sleep would help, but it means longer recovery after an error. A better solution is to remove the canceled connection from the pool so that it is not used to handle subsequent queries.

2014-02-13 11:10 GMT+02:00 Koichi Suzuki <koi...@gm...>:

I think this hits the point. I tested the patch several times and it seems to work fine. The delay (currently 10 ms) is short enough, and it is applied only when we need to cancel a statement. We should commit this to master and all STABLE branches, replacing the magic number with a meaningful name. Any thoughts?
---
Koichi Suzuki

2014-01-24 18:25 GMT+09:00 Masataka Saito <pg...@gm...>:
> Hello,
>
> Having been exasperated by random test failures, I'm determined to track down the cause of the issue.
>
> This issue is related to canceling the failed query.
> When a datanode reports an error for a query, the coordinator sends a cancel request to the non-idle nodes, waits for them to become ready, and then asks them to roll back the transaction.
>
> Where's the problem? Consider the following case:
> 1. Datanode A (PID 1) reports an error to coordinator A. ([1] 'E' message)
> 2. Coordinator A receives [1] and reports an error to the frontend. ([2] 'E' message)
> 3. Coordinator A starts the abort process and considers datanode A (PID 1) not idle.
> 4. Coordinator A sends a cancel request for PID 1 to datanode A (PID 2). ([3] cancel message)
> 5. Datanode A (PID 1) reports ready to coordinator A. ([4] 'Z' message)
> 6. Coordinator A receives [4] and immediately sends "ROLLBACK TRANSACTION". ([5] 'Q' message)
> 7. Datanode A (PID 1) receives [5] and starts processing the query.
> 8. Datanode A (PID 2) receives [3].
> 9. Datanode A (PID 2) notifies PID 1 of [3].
> 10. Datanode A (PID 1) cancels processing of [5] and reports an error to coordinator A. ([6] 'E' message)
> 11. Coordinator A receives [6] and reports an error to the frontend. ([7] 'E' message)
>
> [7] produces unexpected output, and a test fails.
>
> In the extreme case, even the query that follows [5] could be canceled by [3].
>
> As far as I know, there's no way to know when the cancel request will be processed, so I think we have no choice but to wait an empirically chosen duration after canceling, as the attached patch does.
>
> Does anyone have a better idea for solving this issue?
>
> Regards.
>
> _______________________________________________
> Postgres-xc-developers mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
--
Andrei Martsinchyk

StormDB - http://www.stormdb.com
The Database Cloud
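To make the race and the two proposed remedies concrete, here is a toy timeline model in Python. It is a hypothetical sketch, not Postgres-XC code: the function and constant names, all timings, the spare backend PID 3, and the rule that a cancel aborts only the statement its target is running when the cancel lands are assumptions of the model. It replays steps 4-11 of Masataka's sequence and reports which follow-up statements a late cancel would abort when the coordinator (a) issues ROLLBACK immediately, (b) first sleeps for a named settle delay as in the patch, or (c) drops the canceled connection from the pool as Andrei suggests.

```python
"""Toy timeline model of the cancel race discussed in this thread.

Hypothetical sketch, not Postgres-XC code. Model assumptions:
- the coordinator sends the cancel request at t = 0 over a separate
  connection, and it lands cancel_latency_ms later;
- it aborts whatever statement its target PID is running at that instant
  and has no effect if the target is idle at that moment.
"""
from __future__ import annotations

from dataclasses import dataclass

# Named stand-in for the patch's magic number (10 ms in the thread);
# the constant name is hypothetical.
CANCEL_SETTLE_DELAY_MS = 10
CANCELED_PID = 1  # datanode A, PID 1 in Masataka's sequence


@dataclass
class Statement:
    sql: str
    pid: int          # backend that executes the statement
    start_ms: int
    duration_ms: int

    def aborted_by_cancel_at(self, land_ms: int) -> bool:
        """True if the cancel aimed at CANCELED_PID lands while this statement runs."""
        return (self.pid == CANCELED_PID
                and self.start_ms <= land_ms < self.start_ms + self.duration_ms)


def late_cancel_victims(cancel_latency_ms: int, ready_ms: int, delay_ms: int,
                        discard_connection: bool, stmt_ms: int = 5) -> list[str]:
    """Replay steps 4-11 and return the SQL of statements the late cancel aborts.

    ready_ms is when the coordinator sees 'Z' from PID 1; delay_ms is the
    sleep inserted before it issues anything else on that node.
    """
    if discard_connection:
        # Andrei's proposal: take PID 1 out of the pool instead of reusing it.
        # In this model, closing it stands in for the rollback, and the next
        # query runs on a fresh backend (PID 3, arbitrary) the cancel cannot reach.
        plan = [("next query", 3)]
    else:
        # Behaviour described in the thread: PID 1 is reused for ROLLBACK
        # and then for the following query.
        plan = [("ROLLBACK TRANSACTION", CANCELED_PID), ("next query", CANCELED_PID)]

    t = ready_ms + delay_ms
    statements = []
    for sql, pid in plan:
        statements.append(Statement(sql, pid, t, stmt_ms))
        t += stmt_ms  # statements run back to back
    return [s.sql for s in statements if s.aborted_by_cancel_at(cancel_latency_ms)]


if __name__ == "__main__":
    # Steps 6-10: immediate ROLLBACK, and the cancel lands while it runs.
    print(late_cancel_victims(3, ready_ms=1, delay_ms=0, discard_connection=False))
    # ['ROLLBACK TRANSACTION']
    # The "extreme case": a slower cancel hits the query after ROLLBACK.
    print(late_cancel_victims(8, ready_ms=1, delay_ms=0, discard_connection=False))
    # ['next query']
    # The patch: sleeping first lets the cancel land while PID 1 is idle.
    print(late_cancel_victims(3, ready_ms=1, delay_ms=CANCEL_SETTLE_DELAY_MS,
                              discard_connection=False))
    # []
    # Heavily loaded node: the cancel takes longer than the fixed delay.
    print(late_cancel_victims(20, ready_ms=1, delay_ms=CANCEL_SETTLE_DELAY_MS,
                              discard_connection=False))
    # ['next query']
    # Discarding the canceled connection: nothing is ever hit.
    print(late_cancel_victims(20, ready_ms=1, delay_ms=0, discard_connection=True))
    # []
```

The fourth case mirrors Andrei's objection: with the model's numbers, a cancel that arrives later than the fixed delay still aborts a later statement, while a connection that is never reused cannot be reached by the stale cancel at all.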