From: Andrei M. <and...@gm...> - 2014-02-14 09:35:26
2014-02-14 9:04 GMT+02:00 Masataka Saito <pg...@gm...>:

> Thank you for your clever suggestion.
>
> > - Make Cancel more selective and affect only specific query. That means
> > an ID for each query to introduce, that should be known to client and way
> > to deliver it.
> > - Introduce procedure of changing backend key. Old cancel won't affect
> > such backend.
>
> I prefer the 2nd idea. But these ideas seem to require touching libpq
> infrastructure and if I understand correctly, they are used not only
> the inter node communication but also a coordinator and a frontend
> communication. Unless we can separate them, I think better not to
> change it.

XC already extends the PG client-server protocol and uses the extension in
internode communication. The suggested feature does not have to be available
to external clients and therefore does not need to be supported by libpq.

> > - Before starting new query, check if there is pending cancel and remove
> > it. It sounds ridiculous "cancel cancel" but may work, if queries and
> > cancels are issued synchronously from single source.
>
> I'm afraid of the wrong hypothesis. As I suggested first, cancel and
> subsequent request are not serialized at the target node. It means
> that if the query started with no pending cancel, it could be
> interrupted by cancel request.

I am not sure exactly how the Cancel request is handled. If the server
creates a session and sends back an acknowledgement before PQcancel returns,
it is synchronous enough. The node sends the next command after PQcancel
returns, so the respective session has either already placed the interrupt
request or can be found in the Proc array. Either can be cleaned up.
If the Cancel is not synchronous enough, OK - just another bad idea, ignore it.

> Regards.
>
> On 14 February 2014 14:06, Andrei Martsinchyk <and...@gm...> wrote:
> >
> > You are right, the temp objects are problem.
> > On the one hand if we run a long query and there was an error on one
> > node we want to cancel it on others to avoid unnecessary waiting. On the
> > other hand the query may be near its natural end and the cancel may be late
> > and hit the next query.
> > Just throwing out ideas:
> > - Make Cancel more selective and affect only specific query. That means
> > an ID for each query to introduce, that should be known to client and way
> > to deliver it.
> > - Introduce procedure of changing backend key. Old cancel won't affect
> > such backend.
> > - Before starting new query, check if there is pending cancel and remove
> > it. It sounds ridiculous "cancel cancel" but may work, if queries and
> > cancels are issued synchronously from single source.
> >
> > On 14.02.2014 4:07, "Koichi Suzuki" <koi...@gm...> wrote:
> >
> >> I misunderstand the implication. Anyway additional wait is separate
> >> from your suggestion.
> >>
> >> Disconnecting the connection as you suggested will bring another
> >> problem such as TEMPORARY object in the subsequent queries. We do
> >> not support TEMPORARY object but I believe we should be consistent on
> >> this for future releases.
> >>
> >> Thoughts?
> >> ---
> >> Koichi Suzuki
> >>
> >> 2014-02-14 2:30 GMT+09:00 Andrei Martsinchyk <and...@gm...>:
> >> > Hello,
> >> >
> >> > Postgres establishes separate connection to deliver Cancel command to the
> >> > target session.
> >> > On a heavily loaded node it may take fairly long. Longer sleep would help
> >> > out, but it means longer recovery after an error.
> >> > Better solution is to remove canceled connection from the pool and therefore
> >> > do not use it to handle subsequent queries.
> >> >
> >> > 2014-02-13 11:10 GMT+02:00 Koichi Suzuki <koi...@gm...>:
> >> >>
> >> >> I think it hits the point. I tested this patch several times and it
> >> >> seems to work fine.
> >> >> The delay time (at present 10ms) is short enough
> >> >> and it is applied only when we need to cancel a statement.
> >> >>
> >> >> We should check this into all the master and STABLE branches improving
> >> >> magic number with some meaningful name.
> >> >>
> >> >> Any thoughts?
> >> >> ---
> >> >> Koichi Suzuki
> >> >>
> >> >> 2014-01-24 18:25 GMT+09:00 Masataka Saito <pg...@gm...>:
> >> >> > Hello,
> >> >> >
> >> >> > As I've been exasperated by random failures, I'm willing to whip the
> >> >> > cause of the issue.
> >> >> >
> >> >> > This issue is related to cancel of the failed query.
> >> >> > When a datanode reports an error of a query, a coordinator sends a
> >> >> > cancel request to non-idle nodes, waits the node to get ready and
> >> >> > requests nodes to rollback the transaction.
> >> >> >
> >> >> > Where's the problem? Consider the next case.
> >> >> > 1. Datanode A (PID 1) reports an error to coordinator A. ([1] 'E' message)
> >> >> > 2. Coordinator A receives [1] and reports an error to a frontend. ([2] 'E' message)
> >> >> > 3. Coordinator A starts aborting process and it thinks datanode A (PID 1) is not idle.
> >> >> > 4. Coordinator A sends a cancel request about PID 1 to datanode A (PID 2). ([3] cancel message)
> >> >> > 5. Datanode A (PID 1) reports ready to coordinator A. ([4] 'Z' message)
> >> >> > 6. Coordinator A receives [4] and sends "ROLLBACK TRANSACTION" immediately. ([5] 'Q' message)
> >> >> > 7. Datanode A (PID 1) receives [5] and starts processing the query.
> >> >> > 8. Datanode A (PID 2) receives [3].
> >> >> > 9. Datanode A (PID 2) notify PID 1 of [3].
> >> >> > 10. Datanode A (PID 1) cancel processing [5] and reports an error to Coordinator A. ([6] 'E' message)
> >> >> > 11. Coordinator A receives [6] and reports an error to a frontend.
> >> >> > ([7] 'E' message)
> >> >> >
> >> >> > [7] makes unexpected output and a test fails.
> >> >> >
> >> >> > Saying an extreme thing, it could occur that the next query of [5] is
> >> >> > cancelled by [3].
> >> >> >
> >> >> > As far as I know, there's no way to know when to the cancel request
> >> >> > get to be processed, I think we can't not wait an experimental
> >> >> > duration after cancelling like the attached patch.
> >> >> >
> >> >> > Does anyone have another cool idea to solve this issue?
> >> >> >
> >> >> > Regards.
> >> >> >
> >> >> > _______________________________________________
> >> >> > Postgres-xc-developers mailing list
> >> >> > Pos...@li...
> >> >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
> >> >
> >> > --
> >> > Andrei Martsinchyk
> >> >
> >> > StormDB - http://www.stormdb.com
> >> > The Database Cloud

--
Andrei Martsinchyk

StormDB - http://www.stormdb.com
The Database Cloud
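
P.S. To make the first idea (a selective Cancel that carries a query ID) concrete, here is a minimal toy sketch. It is not Postgres-XC code: the `Backend` class, the `query_seq` counter and the `replay` helper are all hypothetical names invented for illustration, and the real protocol change would look different. It replays the late-cancel case from the numbered steps quoted above, once with today's PID-only cancel and once with a cancel tagged by query ID.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Backend:
    """Toy stand-in for one datanode backend (hypothetical, not XC code)."""
    query_seq: int = 0                  # hypothetical per-backend query counter
    running: Optional[str] = None       # text of the query in progress, if any
    log: List[str] = field(default_factory=list)

    def start(self, sql: str) -> int:
        """Begin a query and return the ID a coordinator would remember."""
        self.query_seq += 1
        self.running = sql
        return self.query_seq

    def finish(self) -> None:
        """The running query ends on its own (backend reports ready)."""
        self.log.append(f"ended: {self.running}")
        self.running = None

    def cancel(self, target_seq: Optional[int] = None) -> bool:
        """Deliver a cancel.

        target_seq=None models the PID-only cancel: it hits whatever is running.
        Otherwise the cancel is dropped unless it names the current query.
        """
        if self.running is None:
            return False
        if target_seq is not None and target_seq != self.query_seq:
            return False                # stale cancel for an older query
        self.log.append(f"cancelled: {self.running}")
        self.running = None
        return True


def replay(selective: bool) -> List[str]:
    """Replay the late-cancel scenario and return what the backend did."""
    backend = Backend()
    failed_id = backend.start("long SELECT")
    backend.finish()                         # backend is ready before the cancel lands
    backend.start("ROLLBACK TRANSACTION")    # coordinator sends the next command
    backend.cancel(failed_id if selective else None)  # the late cancel arrives
    if backend.running is not None:
        backend.finish()
    return backend.log
```

With `replay(False)` the late cancel kills `ROLLBACK TRANSACTION`, which is the failure reported in the thread; with `replay(True)` the stale cancel is ignored and the rollback completes. The cost, as discussed above, is that the ID has to be delivered to whoever issues the cancel.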