From: Ashutosh B. <ash...@en...> - 2013-04-03 09:25:12
Hi Amit,

Given the magnitude of the change, I think we should break this work into smaller self-sufficient patches and commit them (either on master or in a separate branch for the trigger work). This will allow us to review and commit small amounts of work and set them aside, rather than going over everything in every round.

On Wed, Apr 3, 2013 at 10:46 AM, Amit Khandekar <ami...@en...> wrote:
>
> On 26 March 2013 15:53, Ashutosh Bapat <ash...@en...> wrote:
>>
>> On Tue, Mar 26, 2013 at 8:56 AM, Amit Khandekar <ami...@en...> wrote:
>>>
>>> On 4 March 2013 11:11, Amit Khandekar <ami...@en...> wrote:
>>>> On 1 March 2013 13:53, Nikhil Sontakke <ni...@st...> wrote:
>>>> >> Issue: Whether we should fetch the whole row from the datanode (the OLD row), and not just the ctid, node_id and required columns, and store it at the coordinator for the processing, OR whether we should fetch each row (OLD and NEW variants) while processing each row.
>>>> >>
>>>> >> Both of them have performance impacts - the first one has a disk impact for a large number of rows, whereas the second has a network impact for querying rows. Is it possible to do some analytical assessment as to which of them would be better? If you can come up with something concrete (maybe numbers or formulae) we will be able to judge better as to which one to pick up.
>>>>
>>>> Will check if we can come up with some sensible analysis or figures.
>>>
>>> I have done some analysis on both of these approaches here:
>>>
>>> https://docs.google.com/document/d/10QPPq_go_wHqKqhmOFXjJAokfdLR8OaUyZVNDu47GWk/edit?usp=sharing
>>>
>>> In practical terms, we would anyway need to implement (B).
>>> The reason is that when the trigger has conditional execution (a WHEN clause), we *have* to fetch the rows beforehand, so there is no point in fetching all of them again at the end of the statement when we already have them locally. So it may be too ambitious to have both implementations, at least for this release.
>>
>> I agree here. We can certainly optimize for various cases later, but we should have something which would give all the functionality (albeit at a lower performance for now).
>>>
>>> So I am focusing on (B) right now. We have two options:
>>>
>>> 1. Store all rows in palloc'ed memory, save the HeapTuple pointers in the trigger queue, and directly access the OLD and NEW rows using these pointers when needed. Here we will have no control over how much memory we use for the old and new records, and this might even hamper system performance, let alone XC performance.
>>>
>>> 2. The other option is to use a tuplestore. Here, we need to store the positions of the records in the tuplestore, and for a particular trigger event, fetch by position. From what I understand, a tuplestore can be advanced only sequentially, in either direction. So when the read pointer is at position 6 and we need to fetch the record at position 10, we need to call tuplestore_advance() 4 times, and this call involves palloc/pfree overhead because it calls tuplestore_gettuple(). But the trigger records are not distributed that randomly. In fact, a set of trigger events for a particular event id is accessed in the same order as the order in which the events are queued. So for a particular event id, only the first access will require random access. A tuplestore supports multiple read pointers, so maybe we can make use of that to access the first record using the closest read pointer.
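The read-pointer trick in option 2 can be modelled outside Postgres. Below is a small Python sketch (illustrative only: `SequentialStore` and every name in it are invented for the example, not tuplestore APIs). It models a store whose read pointers can only step one record at a time, and serves each fetch with whichever pointer is closest to the target.

```python
# Toy model of a tuplestore-like store: read pointers advance one record
# at a time (as tuplestore_advance does), and each fetch is served by the
# read pointer currently closest to the target position.

class SequentialStore:
    def __init__(self, records, n_pointers=4):
        self.records = records
        self.positions = [0] * n_pointers  # current position of each read pointer
        self.advances = 0                  # total single-step advances performed

    def fetch(self, target):
        # pick the read pointer with the fewest steps to reach the target
        p = min(range(len(self.positions)),
                key=lambda i: abs(self.positions[i] - target))
        self.advances += abs(self.positions[p] - target)  # cost of stepping there
        self.positions[p] = target
        return self.records[target]

store = SequentialStore(list(range(100)))
# Trigger events for one event id are accessed in queue order, so after the
# first (random) access each later fetch advances a pointer by one position.
for pos in [10, 11, 12, 50, 51, 13, 52]:
    store.fetch(pos)
print(store.advances)  # 65
```

With a single read pointer the same access pattern costs 128 advances; spreading the "random" first accesses across several pointers is exactly the saving the message is after.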
>>
>> Using palloc will be a problem if the size of the data fetched is more than what can fit in memory. Also, pallocing frequently is going to be a performance problem. Let's see how the tuplestore approach goes.
>
> While I am working on the AFTER ROW optimization, here's a patch that has only BEFORE ROW trigger support, so that it can get you started with the first round of review. The regression is not analyzed fully yet. Besides the AR trigger related changes, I have also stripped out the logic of whether to run the trigger on the datanode or the coordinator; this logic depends on both before and after triggers.
>
>>>> > Or we can consider a hybrid approach of getting the rows in batches of 1000 or so, if possible as well. That way they get into coordinator memory in one shot and can be processed in batches. Obviously this should be considered only if it's not going to be a complicated implementation.
>>>>
>>>> It just occurred to me that it would not be that hard to optimize the row-fetching-by-ctid as shown below:
>>>> 1. When it is time to fire the queued triggers at the statement/transaction end, initialize cursors - one cursor per datanode - which would do: SELECT remote_heap_fetch(table_name, '<ctidlist>'); We can form this ctidlist out of the trigger event list.
>>>> 2. For each trigger event entry in the trigger queue, FETCH NEXT using the appropriate cursor name according to the datanode id to which the trigger entry belongs.
>>>>
>>>> >>> Currently we fetch all attributes in the SELECT subplans. I have created another patch to fetch only the required attributes, but have not merged that into this patch.
>>>> >
>>>> > Do we have other places where we unnecessarily fetch all attributes? ISTM, this should be fixed as a performance improvement first, ahead of everything else.
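The two-step plan above can be sketched as follows. This is a hedged Python mock-up, not XC code: `remote_heap_fetch` is the function named in the proposal, while the cursor names and helper functions are invented for the illustration.

```python
# Sketch of the proposed optimization: one cursor per datanode over
# remote_heap_fetch() with that node's ctid list, then one FETCH NEXT per
# queued trigger event against its datanode's cursor.

from collections import defaultdict

def build_cursors(events, table):
    """events: list of (datanode_id, ctid) pairs in trigger-queue order."""
    by_node = defaultdict(list)
    for node, ctid in events:
        by_node[node].append(ctid)  # the ctidlist formed from the trigger event list
    return {
        node: "DECLARE trig_cur_%d CURSOR FOR "
              "SELECT remote_heap_fetch('%s', '%s');" % (node, table, ",".join(ctids))
        for node, ctids in by_node.items()
    }

def fetch_plan(events):
    # each trigger event is served by FETCH NEXT on its datanode's cursor
    return ["FETCH NEXT FROM trig_cur_%d;" % node for node, _ in events]

events = [(1, "(0,1)"), (2, "(0,7)"), (1, "(0,2)")]
print(build_cursors(events, "tab1")[1])
print(fetch_plan(events))
```

Because the events of a given datanode appear in the ctidlist in queue order, each FETCH NEXT on that node's cursor returns exactly the row for the node's next pending event.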
>>>>
>>>> I believe the DML subplan is the only remaining place where we fetch all attributes. And yes, this is a must-have for triggers; otherwise, the other optimizations would be of no use.
>>>>
>>>> >>> 2. One important TODO for BEFORE triggers is this: Just before invoking the trigger functions, in PG, the tuple is row-locked (exclusive) by GetTupleForTrigger() and the locked version is fetched from the table. So it is made sure that while all the triggers for that table are executed, no one can update that particular row.
>>>> >>> In the patch, we haven't locked the row. We need to lock it either by executing:
>>>> >>> 1. SELECT * FROM tab1 WHERE ctid = <ctid_val> FOR UPDATE, and then using the returned row as the OLD row,
>>>> >>> OR
>>>> >>> 2. having the UPDATE subplan itself do SELECT FOR UPDATE so that the row is already locked, and we don't have to lock it again.
>>>> >>> #2 is simple, though it might cause somewhat longer waits in general. Using #1, though the locks would be acquired only when the particular row is updated, the locks would be released only after transaction end, so #1 might not be worth implementing. Also, #1 requires another explicit remote fetch for the lock-and-get-latest-version operation. I am more inclined towards #2.
>>>> >>
>>>> >> Option #2, however, has the problem of locking too many rows if there are coordinator quals in the subplans, IOW when the number of rows finally updated is smaller than the number of rows fetched from the datanode. It can cause unwanted deadlocks. Unless there is a way to release these extra locks, I am afraid this option will be a problem.
>>>>
>>>> True. Regardless of anything else - whether it is deadlocks or longer waits - we should not lock rows that are not to be updated.
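The over-locking objection to option #2 can be illustrated with a toy model (plain Python, not Postgres code; the quals and data are made up): the datanode-side SELECT ... FOR UPDATE locks every fetched row, but only the rows that also pass the coordinator-side qual are actually updated.

```python
# Toy model of option #2's over-locking: rows passing only the shippable
# qual are fetched (and therefore locked), while the coordinator qual then
# filters them further, so some locked rows are never updated.

rows = [{"ctid": "(0,%d)" % i, "a": i, "b": i % 3} for i in range(10)]

shippable_qual = lambda r: r["a"] >= 2     # pushed down to the datanode
coordinator_qual = lambda r: r["b"] == 0   # evaluated on the coordinator

locked = [r for r in rows if shippable_qual(r)]        # option #2 locks all of these
updated = [r for r in locked if coordinator_qual(r)]   # only these are changed

print(len(locked), len(updated))  # 8 locked, 3 updated: 5 rows locked needlessly
```

The needlessly locked rows stay locked until transaction end, which is what makes the extra lock waits (and potential deadlocks among concurrent statements) hard to avoid under option #2.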
>>>>
>>>> There is a more general row-locking issue that we need to solve first: 3606317. I anticipate that solving this will solve the trigger-specific lock issue. So for triggers, this is a must-have, and I am going to solve this issue as part of bug 3606317.
>>>>
>>>> > Deadlocks? ISTM, we can get more lock waits because of this, but I do not see deadlock scenarios.
>>>> >
>>>> > With the FQS shipping work being done by Ashutosh, will we also ship major chunks of subplans to the datanodes? If yes, then row locking will only involve the required tuples (hopefully) from the coordinator's point of view.
>>>> >
>>>> > Also, something radical: can we invent a new type of FOR [NODE] UPDATE lock to minimize the impact of such locking of rows on datanodes?
>>>> >
>>>> > Regards,
>>>> > Nikhils
>>>> >
>>>> >>> 3. The BEFORE trigger function can change the distribution column itself. We need to add a check at the end of the trigger executions.
>>>> >>
>>>> >> Good, you thought about that. Yes, we should check it.
>>>> >>
>>>> >>> 4. Fetching the OLD row for WHEN clause handling.
>>>> >>>
>>>> >>> 5. Testing with a mix of shippable and non-shippable ROW triggers.
>>>> >>>
>>>> >>> 6. Other types of triggers. INSTEAD triggers are anticipated to work without significant changes, but they are yet to be tested. INSERT/DELETE triggers: most of the infrastructure has been done while implementing UPDATE triggers, but some changes specific to INSERT and DELETE are yet to be done. Deferred triggers are yet to be tested.
>>>> >>>
>>>> >>> 7. Regression analysis. There are some new failures. Will post another fair version of the patch after regression analysis and fixing various TODOs.
>>>> >>>
>>>> >>> Comments welcome.
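The check proposed in item 3 (a BEFORE trigger must not change the distribution column) could look roughly like the sketch below; the function and column names are hypothetical, not XC internals.

```python
# Hypothetical post-trigger check: after all BEFORE ROW triggers have run,
# reject the new row if its distribution column value differs from the old
# row's, since that would change which datanode the row belongs to.

def check_distribution_column(old_row, new_row, dist_col):
    if new_row[dist_col] != old_row[dist_col]:
        raise ValueError(
            "BEFORE trigger may not change distribution column %r" % dist_col)
    return new_row

old = {"id": 7, "val": "x"}   # "id" plays the distribution column here
new = {"id": 7, "val": "y"}   # trigger changed only a non-distribution column
print(check_distribution_column(old, new, "id")["val"])  # accepted, prints "y"
```

A real implementation might instead re-route the row to its new datanode, but the simple reject-on-change check is what the thread proposes.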
>>>> >>>
>>>> >>> ------------------------------------------------------------------------------
>>>> >>> Everyone hates slow websites. So do we.
>>>> >>> Make your web apps faster with AppDynamics
>>>> >>> Download AppDynamics Lite for free today:
>>>> >>> http://p.sf.net/sfu/appdyn_d2d_feb
>>>> >>> _______________________________________________
>>>> >>> Postgres-xc-developers mailing list
>>>> >>> Pos...@li...
>>>> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>> >
>>>> > --
>>>> > StormDB - http://www.stormdb.com
>>>> > The Database Cloud
>>>> > Postgres-XC Support and Service

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company