Re: [Postgres-xc-developers] Random failures in the regression test.


It seems to be an issue of PG itself, doesn't it?
---
Koichi Suzuki


2014-02-14 14:06 GMT+09:00 Andrei Martsinchyk <and...@gm...>:
> You are right, the temp objects are a problem.
> On the one hand, if we run a long query and an error occurs on one node, we
> want to cancel it on the others to avoid unnecessary waiting. On the other
> hand, the query may be near its natural end, so the cancel may arrive late
> and hit the next query.
> Just throwing out ideas:
> - Make Cancel more selective so that it affects only a specific query. That
> means introducing an ID for each query, which must be known to the client,
> and a way to deliver it.
> - Introduce a procedure for changing the backend key. An old cancel won't
> affect such a backend.
> - Before starting a new query, check whether there is a pending cancel and
> remove it. "Cancelling a cancel" sounds ridiculous, but it may work if
> queries and cancels are issued synchronously from a single source.
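
A minimal sketch of the first idea above (a cancel that names the query it targets), written as a toy Python model rather than Postgres-XC code. The class TaggedBackend, its methods, and the integer query IDs are hypothetical names introduced only for illustration; the existing cancel request identifies a backend, not a query, which is exactly the gap this idea would close.

```python
import itertools


class TaggedBackend:
    """Toy backend that tags every query with an ID (hypothetical model)."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.current_id = None    # ID of the running query, None when idle
        self.current_sql = None

    def start(self, sql):
        """Begin a query and return the ID a client would use to cancel it."""
        self.current_id = next(self._next_id)
        self.current_sql = sql
        return self.current_id

    def finish(self):
        """Mark the backend idle again."""
        self.current_id = None
        self.current_sql = None

    def cancel(self, target_id):
        """Cancel only if target_id names the query running right now."""
        if target_id is None or target_id != self.current_id:
            return False          # stale or misdirected cancel: ignore it
        self.finish()
        return True


backend = TaggedBackend()
failed_id = backend.start("long query that fails elsewhere")
backend.finish()                              # query ends before the cancel lands
rollback_id = backend.start("ROLLBACK TRANSACTION")
late_cancel_hit = backend.cancel(failed_id)   # late cancel aimed at the old query
```

In this model the late cancel is ignored because its ID no longer matches the running query, so the ROLLBACK survives. The open question from the idea itself remains: how the ID would be delivered to whoever issues the cancel.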
>
> On 14.02.2014 4:07, "Koichi Suzuki" <koi...@gm...> wrote:
>
>> I misunderstood the implication.   Anyway, the additional wait is
>> separate from your suggestion.
>>
>> Dropping the connection as you suggested will cause another problem,
>> for example with TEMPORARY objects used in subsequent queries.   We do
>> not support TEMPORARY objects, but I believe we should be consistent
>> on this for future releases.
>>
>> Thoughts?
>> ---
>> Koichi Suzuki
>>
>>
>> 2014-02-14 2:30 GMT+09:00 Andrei Martsinchyk <and...@gm...>:
>> > Hello,
>> >
>> > Postgres establishes a separate connection to deliver the Cancel
>> > command to the target session.
>> > On a heavily loaded node this may take fairly long. A longer sleep
>> > would help, but it means longer recovery after an error.
>> > A better solution is to remove the cancelled connection from the pool
>> > so that it is not used to handle subsequent queries.
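
A rough sketch of the pool-side idea above: a connection that has had a cancel sent against it is closed on release instead of being reused, so a late cancel has nothing left to hit. This is a toy model, not the Postgres-XC pooler; FakeConnection, DiscardingPool, and the cancel_sent flag are hypothetical names used only for illustration.

```python
class FakeConnection:
    """Stand-in for a pooled datanode connection (hypothetical)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class DiscardingPool:
    """Toy pool that never reuses a connection a cancel was sent to."""

    def __init__(self, factory):
        self._factory = factory   # callable that opens a new connection
        self._idle = []

    def acquire(self):
        """Reuse an idle connection if there is one, else open a new one."""
        return self._idle.pop() if self._idle else self._factory()

    def release(self, conn, cancel_sent=False):
        """Return conn to the pool, or close it if a cancel may be in flight."""
        if cancel_sent:
            conn.close()          # a late cancel now finds no session to hit
            return
        self._idle.append(conn)


pool = DiscardingPool(FakeConnection)
tainted = pool.acquire()
pool.release(tainted, cancel_sent=True)   # cancelled: discarded, not pooled
fresh = pool.acquire()                    # a new connection is opened instead
```

The cost is that any session state on the discarded connection is lost with it, which is the TEMPORARY object concern raised in the reply to this message.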
>> >
>> >
>> >
>> > 2014-02-13 11:10 GMT+02:00 Koichi Suzuki <koi...@gm...>:
>> >>
>> >> I think it hits the point.   I tested this patch several times and it
>> >> seems to work fine.   The delay time (at present 10ms) is short enough
>> >> and it is applied only when we need to cancel a statement.
>> >>
>> >> We should check this into master and all the STABLE branches, replacing
>> >> the magic number with a meaningfully named constant.
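
To illustrate what naming the magic number could look like, here is a small Python sketch of the "cancel, then wait" approach. It is not the attached patch, whose contents are not shown in this thread. The 10 ms value comes from the message above; the constant name CANCEL_SETTLE_DELAY_SEC and the function cancel_then_settle are hypothetical.

```python
import time

# 10 ms is the delay mentioned in the thread; the constant name is hypothetical.
CANCEL_SETTLE_DELAY_SEC = 0.010


def cancel_then_settle(send_cancel, sleep=time.sleep):
    """Send a cancel, then pause so a slow cancel can land before the next query."""
    send_cancel()
    sleep(CANCEL_SETTLE_DELAY_SEC)


# Usage with stand-in callables so the order of operations is visible.
events = []
cancel_then_settle(
    send_cancel=lambda: events.append("cancel"),
    sleep=lambda seconds: events.append(("sleep", seconds)),
)
```

The delay is paid only on the cancel path, but a fixed wait can only narrow the race window. A sufficiently loaded node could still deliver the cancel after the wait has expired.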
>> >>
>> >> Any thoughts?
>> >> ---
>> >> Koichi Suzuki
>> >>
>> >>
>> >> 2014-01-24 18:25 GMT+09:00 Masataka Saito <pg...@gm...>:
>> >> > Hello,
>> >> >
>> >> > As I've been exasperated by the random failures, I'd like to
>> >> > eliminate the cause of the issue.
>> >> >
>> >> > This issue is related to cancellation of the failed query.
>> >> > When a datanode reports an error for a query, the coordinator sends a
>> >> > cancel request to the non-idle nodes, waits for those nodes to become
>> >> > ready, and then asks them to roll back the transaction.
>> >> >
>> >> > Where's the problem? Consider the following case.
>> >> > 1. Datanode A (PID 1) reports an error to coordinator A. ([1] 'E' message)
>> >> > 2. Coordinator A receives [1] and reports an error to the frontend. ([2] 'E' message)
>> >> > 3. Coordinator A starts the abort process and believes datanode A (PID 1) is not idle.
>> >> > 4. Coordinator A sends a cancel request for PID 1 to datanode A (PID 2). ([3] cancel message)
>> >> > 5. Datanode A (PID 1) reports ready to coordinator A. ([4] 'Z' message)
>> >> > 6. Coordinator A receives [4] and immediately sends "ROLLBACK TRANSACTION". ([5] 'Q' message)
>> >> > 7. Datanode A (PID 1) receives [5] and starts processing the query.
>> >> > 8. Datanode A (PID 2) receives [3].
>> >> > 9. Datanode A (PID 2) notifies PID 1 of [3].
>> >> > 10. Datanode A (PID 1) cancels processing of [5] and reports an error to coordinator A. ([6] 'E' message)
>> >> > 11. Coordinator A receives [6] and reports an error to the frontend. ([7] 'E' message)
>> >> >
>> >> > [7] produces unexpected output and the test fails.
>> >> >
>> >> > In an extreme case, even the query following [5] could be cancelled
>> >> > by [3].
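
The sequence above can be replayed with a small deterministic model. This is only a sketch in Python, not Postgres-XC code: ToyBackend and replay are hypothetical names, and the model assumes that a cancel reaching an idle backend is simply dropped. The point it shows is that a cancel identifies a backend rather than a query, so a late one hits whatever is running when it arrives.

```python
class ToyBackend:
    """Toy stand-in for datanode A (PID 1); hypothetical, not Postgres-XC code."""

    def __init__(self):
        self.running = None       # query in progress, None when idle
        self.log = []

    def start(self, sql):
        self.running = sql
        self.log.append("start: " + sql)

    def fail(self):
        """The running query errors out and the backend becomes ready."""
        self.log.append("E: " + self.running)
        self.running = None
        self.log.append("Z: ready")

    def deliver_cancel(self):
        """Apply a cancel to whatever is running at the moment it arrives."""
        victim = self.running
        if victim is None:
            self.log.append("cancel dropped (idle)")   # assumption of this model
        else:
            self.log.append("E: cancelled " + victim)
            self.running = None
        return victim


def replay(cancel_is_late):
    """Replay the scenario; return the cancelled query (or None) and the log."""
    backend = ToyBackend()
    backend.start("failed query")
    backend.fail()                              # steps 1 and 5
    victim = None
    if not cancel_is_late:
        victim = backend.deliver_cancel()       # cancel lands while idle
    backend.start("ROLLBACK TRANSACTION")       # steps 6 and 7
    if cancel_is_late:
        victim = backend.deliver_cancel()       # steps 8 to 10
    return victim, backend.log


late_victim, late_log = replay(cancel_is_late=True)
early_victim, early_log = replay(cancel_is_late=False)
```

With a late cancel the victim is the ROLLBACK, which corresponds to the extra error [6]/[7]. With a timely cancel nothing is running and no second error appears.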
>> >> >
>> >> > As far as I know, there's no way to know when the cancel request
>> >> > will actually be processed, so I think we have no choice but to wait
>> >> > for an empirically chosen duration after cancelling, as in the
>> >> > attached patch.
>> >> >
>> >> > Does anyone have another cool idea to solve this issue?
>> >> >
>> >> > Regards.
>> >> >
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > Postgres-xc-developers mailing list
>> >> > Pos...@li...
>> >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>> > --
>> > Andrei Martsinchyk
>> >
>> > StormDB - http://www.stormdb.com
>> > The Database Cloud
>> >