From: 鈴木 幸市 <ko...@in...> - 2014-02-17 07:11:06
On 2014/02/14 18:35, Andrei Martsinchyk <and...@gm...> wrote:

> 2014-02-14 9:04 GMT+02:00 Masataka Saito <pg...@gm...>:
>
>> Thank you for your clever suggestion.
>>
>> > - Make Cancel more selective and affect only specific query. That
>> >   means an ID for each query to introduce, that should be known to
>> >   client and way to deliver it.
>> > - Introduce procedure of changing backend key. Old cancel won't
>> >   affect such backend.
>>
>> I prefer the 2nd idea. But these ideas seem to require touching the
>> libpq infrastructure and, if I understand correctly, it is used not
>> only for inter-node communication but also for communication between
>> a coordinator and a frontend. Unless we can separate them, I think it
>> is better not to change it.
>
> XC already extends the PG client-server protocol and uses the
> extension in inter-node communication. The suggested feature does not
> have to be available to external clients and therefore does not need
> to be supported by libpq.
>
>> > - Before starting new query, check if there is pending cancel and
>> >   remove it. It sounds ridiculous "cancel cancel" but may work, if
>> >   queries and cancels are issued synchronously from single source.
>>
>> I'm afraid that hypothesis is wrong. As I suggested at first, the
>> cancel and the subsequent request are not serialized at the target
>> node. This means that even if the query started with no pending
>> cancel, it could still be interrupted by the cancel request.
>
> I am not sure how exactly the Cancel request is handled. If the server
> creates a session and sends back an acknowledgement before PQcancel
> returns, it is synchronous enough. The node sends the next command
> after PQcancel returns, so the respective session has either already
> placed the interrupt request or can be found in the Proc array. Either
> can be cleaned. If the Cancel is not synchronous enough, OK - just
> another bad idea, ignore it.

Unfortunately, that is not what happens. So far, cancel is not
synchronous: it can take effect after the backend has received the next
statement. This is what Masataka's patch improves.

The wait duration, currently 10 milliseconds, could become a new GUC
parameter. At least, this appears to work fine on our buildfarm.

Regards;
---
Koichi Suzuki

>> Regards.
>>
>> On 14 February 2014 14:06, Andrei Martsinchyk <and...@gm...> wrote:
>> >
>> > You are right, the temp objects are a problem.
>> > On the one hand, if we run a long query and there was an error on
>> > one node, we want to cancel it on the others to avoid unnecessary
>> > waiting. On the other hand, the query may be near its natural end
>> > and the cancel may be late and hit the next query.
>> > Just throwing out ideas:
>> > - Make Cancel more selective and affect only specific query. That
>> >   means an ID for each query to introduce, that should be known to
>> >   client and way to deliver it.
>> > - Introduce procedure of changing backend key. Old cancel won't
>> >   affect such backend.
>> > - Before starting new query, check if there is pending cancel and
>> >   remove it. It sounds ridiculous "cancel cancel" but may work, if
>> >   queries and cancels are issued synchronously from single source.
>> >
>> > On 14.02.2014 4:07, "Koichi Suzuki" <koi...@gm...> wrote:
>> >
>> >> I misunderstood the implication. Anyway, the additional wait is
>> >> separate from your suggestion.
>> >>
>> >> Disconnecting the connection as you suggested will bring another
>> >> problem, such as TEMPORARY objects in the subsequent queries. We do
>> >> not support TEMPORARY objects but I believe we should be consistent
>> >> on this for future releases.
>> >>
>> >> Thoughts?
>> >> ---
>> >> Koichi Suzuki
>> >>
>> >> 2014-02-14 2:30 GMT+09:00 Andrei Martsinchyk <and...@gm...>:
>> >> > Hello,
>> >> >
>> >> > Postgres establishes a separate connection to deliver the Cancel
>> >> > command to the target session.
>> >> > On a heavily loaded node it may take fairly long. A longer sleep
>> >> > would help out, but it means longer recovery after an error.
>> >> > A better solution is to remove the canceled connection from the
>> >> > pool and therefore not use it to handle subsequent queries.
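Editor's sketch of the "remove the canceled connection from the pool" idea, in Python rather than the XC pooler's own code. Everything here is hypothetical (`FakeConnection`, `NodePool`, the `cancel_sent` flag); it only illustrates the policy that a connection with a cancel possibly still in flight is closed on release instead of being reused. As noted elsewhere in the thread, dropping the session also drops its state (for example TEMPORARY objects), which this toy model does not address.

```python
from typing import List


class FakeConnection:
    """Stand-in for a coordinator-to-datanode connection (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False
        self.cancel_sent = False  # set once a cancel was issued for this session

    def close(self) -> None:
        self.closed = True


class NodePool:
    """Minimal pool that never reuses a connection with a cancel in flight."""

    def __init__(self, connections: List[FakeConnection]) -> None:
        self._idle = list(connections)

    def acquire(self) -> FakeConnection:
        # Raises IndexError when the pool is empty; enough for a sketch.
        return self._idle.pop()

    def release(self, conn: FakeConnection) -> None:
        if conn.cancel_sent:
            # A late cancel could still hit whatever runs next on this
            # session, so close it instead of returning it to the pool.
            conn.close()
            return
        self._idle.append(conn)

    def idle_count(self) -> int:
        return len(self._idle)
```

With this policy a late cancel can only reach a session that nobody will use again, at the cost of re-establishing a connection after every cancelled statement.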
>> >> >
>> >> > 2014-02-13 11:10 GMT+02:00 Koichi Suzuki <koi...@gm...>:
>> >> >>
>> >> >> I think it hits the point. I tested this patch several times and
>> >> >> it seems to work fine. The delay time (at present 10ms) is short
>> >> >> enough and it is applied only when we need to cancel a statement.
>> >> >>
>> >> >> We should check this into master and all the STABLE branches,
>> >> >> replacing the magic number with some meaningful name.
>> >> >>
>> >> >> Any thoughts?
>> >> >> ---
>> >> >> Koichi Suzuki
>> >> >>
>> >> >> 2014-01-24 18:25 GMT+09:00 Masataka Saito <pg...@gm...>:
>> >> >> > Hello,
>> >> >> >
>> >> >> > As I've been exasperated by random failures, I want to pin down
>> >> >> > the cause of the issue.
>> >> >> >
>> >> >> > This issue is related to cancelling the failed query.
>> >> >> > When a datanode reports an error for a query, a coordinator
>> >> >> > sends a cancel request to the non-idle nodes, waits for the
>> >> >> > nodes to become ready, and requests the nodes to roll back the
>> >> >> > transaction.
>> >> >> >
>> >> >> > Where's the problem? Consider the next case.
>> >> >> > 1. Datanode A (PID 1) reports an error to coordinator A.
>> >> >> >    ([1] 'E' message)
>> >> >> > 2. Coordinator A receives [1] and reports an error to a
>> >> >> >    frontend. ([2] 'E' message)
>> >> >> > 3. Coordinator A starts the abort process and thinks datanode A
>> >> >> >    (PID 1) is not idle.
>> >> >> > 4. Coordinator A sends a cancel request about PID 1 to
>> >> >> >    datanode A (PID 2). ([3] cancel message)
>> >> >> > 5. Datanode A (PID 1) reports ready to coordinator A.
>> >> >> >    ([4] 'Z' message)
>> >> >> > 6. Coordinator A receives [4] and sends "ROLLBACK TRANSACTION"
>> >> >> >    immediately. ([5] 'Q' message)
>> >> >> > 7. Datanode A (PID 1) receives [5] and starts processing the
>> >> >> >    query.
>> >> >> > 8. Datanode A (PID 2) receives [3].
>> >> >> > 9. Datanode A (PID 2) notifies PID 1 of [3].
>> >> >> > 10. Datanode A (PID 1) cancels processing of [5] and reports an
>> >> >> >     error to coordinator A. ([6] 'E' message)
>> >> >> > 11. Coordinator A receives [6] and reports an error to a
>> >> >> >     frontend. ([7] 'E' message)
>> >> >> >
>> >> >> > [7] produces unexpected output and a test fails.
>> >> >> >
>> >> >> > In the extreme case, the query following [5] could even be
>> >> >> > cancelled by [3].
>> >> >> >
>> >> >> > As far as I know, there's no way to know when the cancel
>> >> >> > request gets processed, so I think we have no choice but to
>> >> >> > wait for an empirically chosen duration after cancelling, as
>> >> >> > in the attached patch.
>> >> >> >
>> >> >> > Does anyone have another cool idea to solve this issue?
>> >> >> >
>> >> >> > Regards.
>> >> >> >
>> >> >> > _______________________________________________
>> >> >> > Postgres-xc-developers mailing list
>> >> >> > Pos...@li...
>> >> >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>> >> >
>> >> > --
>> >> > Andrei Martsinchyk
>> >> >
>> >> > StormDB - http://www.stormdb.com
>> >> > The Database Cloud

> --
> Andrei Martsinchyk
> StormDB - http://www.stormdb.com
> The Database Cloud
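Editor's sketch of the race discussed in this thread, replayed on a virtual clock in Python. It is a toy model, not the patch itself: `replay`, `cancel_latency_ms` and `settle_wait_ms` are hypothetical names, the latency values used below are made up, and the model assumes the worst case in which ROLLBACK is still running when a late cancel arrives. The only number taken from the thread is the 10 ms wait.

```python
from typing import List


def replay(cancel_latency_ms: float, settle_wait_ms: float) -> List[str]:
    """Replay the abort sequence and return what the datanode session sees.

    t = 0 is the moment the coordinator sends the cancel and the datanode
    session reports ready (steps 4-5 of the scenario).
    """
    log = ["failing query: error"]           # step 1
    rollback_sent_at = settle_wait_ms        # step 6, delayed by the wait
    cancel_arrives_at = cancel_latency_ms    # steps 8-9

    if cancel_arrives_at < rollback_sent_at:
        # The session is idle when the cancel lands, so nothing is hit.
        log.append("cancel: ignored (session idle)")
        log.append("ROLLBACK TRANSACTION: done")
    else:
        # The cancel carries no statement ID, so it hits whatever runs now.
        log.append("ROLLBACK TRANSACTION: cancelled")  # step 10
    return log


if __name__ == "__main__":
    print(replay(cancel_latency_ms=5, settle_wait_ms=0))    # no wait
    print(replay(cancel_latency_ms=5, settle_wait_ms=10))   # wait covers latency
    print(replay(cancel_latency_ms=50, settle_wait_ms=10))  # loaded node
```

The third call reflects the concern raised about heavily loaded nodes: a fixed wait only helps while cancel delivery is faster than the wait, which is one argument for making the duration configurable.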