From: Koichi S. <koi...@gm...> - 2014-02-14 05:21:18
It seems to be an issue of PG itself, doesn't it?
---
Koichi Suzuki


2014-02-14 14:06 GMT+09:00 Andrei Martsinchyk <and...@gm...>:
> You are right, the temp objects are a problem.
> On the one hand, if we run a long query and there was an error on one
> node, we want to cancel it on the others to avoid unnecessary waiting.
> On the other hand, the query may be near its natural end, and the cancel
> may arrive late and hit the next query.
> Just throwing out ideas:
> - Make Cancel more selective so that it affects only a specific query.
>   That means introducing an ID for each query, which must be known to
>   the client, and a way to deliver it.
> - Introduce a procedure for changing the backend key. An old cancel
>   won't affect such a backend.
> - Before starting a new query, check whether there is a pending cancel
>   and remove it. A "cancel cancel" sounds ridiculous, but it may work if
>   queries and cancels are issued synchronously from a single source.
>
> On 14.02.2014 at 4:07, "Koichi Suzuki" <koi...@gm...> wrote:
>
>> I misunderstood the implication. Anyway, the additional wait is
>> separate from your suggestion.
>>
>> Disconnecting the connection as you suggested will bring another
>> problem, such as TEMPORARY objects in the subsequent queries. We do
>> not support TEMPORARY objects, but I believe we should be consistent
>> on this for future releases.
>>
>> Thoughts?
>> ---
>> Koichi Suzuki
>>
>>
>> 2014-02-14 2:30 GMT+09:00 Andrei Martsinchyk <and...@gm...>:
>> > Hello,
>> >
>> > Postgres establishes a separate connection to deliver the Cancel
>> > command to the target session.
>> > On a heavily loaded node this may take fairly long. A longer sleep
>> > would help, but it means longer recovery after an error.
>> > A better solution is to remove the canceled connection from the pool
>> > and therefore not use it to handle subsequent queries.
>> >
>> >
>> >
>> > 2014-02-13 11:10 GMT+02:00 Koichi Suzuki <koi...@gm...>:
>> >>
>> >> I think it hits the point.
>> >> I tested this patch several times and it seems to work fine. The
>> >> delay time (at present 10ms) is short enough, and it is applied
>> >> only when we need to cancel a statement.
>> >>
>> >> We should check this into the master and all STABLE branches,
>> >> replacing the magic number with a meaningful name.
>> >>
>> >> Any thoughts?
>> >> ---
>> >> Koichi Suzuki
>> >>
>> >>
>> >> 2014-01-24 18:25 GMT+09:00 Masataka Saito <pg...@gm...>:
>> >> > Hello,
>> >> >
>> >> > As I've been exasperated by random failures, I'd like to track
>> >> > down the cause of the issue.
>> >> >
>> >> > This issue is related to the cancellation of a failed query.
>> >> > When a datanode reports an error for a query, the coordinator
>> >> > sends a cancel request to the non-idle nodes, waits for the nodes
>> >> > to become ready, and asks them to roll back the transaction.
>> >> >
>> >> > Where's the problem? Consider the following case.
>> >> > 1. Datanode A (PID 1) reports an error to coordinator A.
>> >> >    ([1] 'E' message)
>> >> > 2. Coordinator A receives [1] and reports an error to a frontend.
>> >> >    ([2] 'E' message)
>> >> > 3. Coordinator A starts the abort process and thinks datanode A
>> >> >    (PID 1) is not idle.
>> >> > 4. Coordinator A sends a cancel request about PID 1 to datanode A
>> >> >    (PID 2). ([3] cancel message)
>> >> > 5. Datanode A (PID 1) reports ready to coordinator A.
>> >> >    ([4] 'Z' message)
>> >> > 6. Coordinator A receives [4] and sends "ROLLBACK TRANSACTION"
>> >> >    immediately. ([5] 'Q' message)
>> >> > 7. Datanode A (PID 1) receives [5] and starts processing the query.
>> >> > 8. Datanode A (PID 2) receives [3].
>> >> > 9. Datanode A (PID 2) notifies PID 1 of [3].
>> >> > 10. Datanode A (PID 1) cancels processing of [5] and reports an
>> >> >     error to coordinator A. ([6] 'E' message)
>> >> > 11. Coordinator A receives [6] and reports an error to a frontend.
>> >> >     ([7] 'E' message)
>> >> >
>> >> > [7] produces unexpected output and a test fails.
>> >> >
>> >> > In an extreme case, even the query that follows [5] could be
>> >> > cancelled by [3].
>> >> >
>> >> > As far as I know, there's no way to know when the cancel request
>> >> > will be processed, so I think we have no choice but to wait for
>> >> > an empirically chosen duration after cancelling, as in the
>> >> > attached patch.
>> >> >
>> >> > Does anyone have another cool idea to solve this issue?
>> >> >
>> >> > Regards.
>> >> >
>> >> > _______________________________________________
>> >> > Postgres-xc-developers mailing list
>> >> > Pos...@li...
>> >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>> >
>> >
>> > --
>> > Andrei Martsinchyk
>> >
>> > StormDB - http://www.stormdb.com
>> > The Database Cloud
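
The race in steps 1-11 of Masataka Saito's message, and the first idea in Andrei Martsinchyk's list (a cancel that targets a specific query), can be replayed with a small deterministic model. This is only a sketch under stated assumptions: `ToyBackend`, `replay`, the per-query ID and the message strings are hypothetical names made up for illustration, and none of this is Postgres-XC or PostgreSQL code. The one thing taken from the thread is the ordering: the cancel is sent while the failed query is running but is delivered after "ROLLBACK TRANSACTION" has started.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyBackend:
    """Hypothetical stand-in for the datanode backend (PID 1 in the thread)."""
    running: Optional[str] = None   # query currently executing, if any
    query_id: int = 0               # hypothetical per-query ID (not a real PG field)
    messages: List[str] = field(default_factory=list)  # what the coordinator sees

    def start(self, sql: str) -> int:
        """Begin executing a query and return its hypothetical ID."""
        self.query_id += 1
        self.running = sql
        return self.query_id

    def fail(self) -> None:
        """The running query errors out by itself: 'E', then ready ('Z')."""
        self.messages += [f"E: {self.running} failed", "Z"]
        self.running = None

    def complete(self) -> None:
        """The running query finishes normally, then ready ('Z')."""
        self.messages += [f"ok: {self.running}", "Z"]
        self.running = None

    def deliver_cancel(self, target_id: Optional[int] = None) -> None:
        """A cancel arrives; target_id=None models a cancel aimed at the backend only."""
        if self.running is None:
            return  # idle backend: nothing to cancel
        if target_id is not None and target_id != self.query_id:
            return  # selective cancel aimed at an older query: dropped
        self.messages += [f"E: {self.running} canceled", "Z"]
        self.running = None


def replay(selective: bool) -> List[str]:
    """Replay the ordering from the thread with or without a query-specific cancel."""
    backend = ToyBackend()
    failed_id = backend.start("user query")
    # Steps 3-4: the coordinator sends the cancel, which is still in flight.
    in_flight_target = failed_id if selective else None
    backend.fail()                            # steps 1 and 5: 'E', then 'Z'
    backend.start("ROLLBACK TRANSACTION")     # steps 6-7
    backend.deliver_cancel(in_flight_target)  # steps 8-10: the late cancel lands
    if backend.running is not None:
        backend.complete()                    # ROLLBACK survived the stale cancel
    return backend.messages


if __name__ == "__main__":
    print(replay(selective=False))
    print(replay(selective=True))
```

With `selective=False` the late cancel hits "ROLLBACK TRANSACTION", which corresponds to the unexpected second error ([6] and [7]) in the thread. With `selective=True` the stale cancel no longer matches the running query and is dropped. The 10ms delay in the attached patch takes a different route: it gives the in-flight cancel time to land before the next query is sent, rather than making the cancel itself more precise.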