From: Ashutosh B. <ash...@en...> - 2013-04-10 12:16:35
The following commits pose no problems; I won't be sending separate
mails for these:

commit d2a982cc1e188f6701db79affe138e186132f137
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:03:24 2013 +0545

    Prototype change for ExecProcNodeDMLInXC()

commit e65e2daf2e8d9a2eeb2160963c6e04a6b9316df9
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:22:28 2013 +0545

    Add the missing trigger-function header changes.

commit bddc56a2a3961ac52266b22d84e27cdb08e19017
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:24:26 2013 +0545

    Added missing jf_xc_node_id and jf_xc_wholerow fields in the
    JunkFilter structure.

    modified: src/include/nodes/execnodes.h

commit 93dd6f9ff3d535856772181929dd41ef3e25f506
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:28:24 2013 +0545

    Enable trigger support by removing the
    "ROW-trigger-feature-not-supported" message.

On Fri, Apr 5, 2013 at 12:38 PM, Ashutosh Bapat
<ash...@en...> wrote:
> Hi Amit,
> This is a general comment about trigger execution on the coordinator.
>
> Catalog tables are local to the coordinator, and any modifications to
> them happen at the coordinator the same way they would in PostgreSQL.
> I am not sure whether one can create a trigger on a catalog table (I
> think not). If we could create triggers on catalog tables, we would
> need to be careful to make sure that they execute the same way as in
> PostgreSQL.
>
> On Thu, Apr 4, 2013 at 7:48 PM, Ashutosh Bapat
> <ash...@en...> wrote:
>> Hi Amit,
>> Thanks for creating the branch and the commits.
>>
>> I will give my comments on each of the commits in a separate mail. I
>> am starting with 1dc081ebe097e63009bab219231e55679db6fae0. Is that
>> the correct one?
>>
>> On Thu, Apr 4, 2013 at 12:26 PM, Amit Khandekar
>> <ami...@en...> wrote:
>>>
>>> On 3 April 2013 15:10, Ashutosh Bapat
>>> <ash...@en...> wrote:
>>>>
>>>> Hi Amit,
>>>> Given the magnitude of the change, I think we should break this
>>>> work into smaller self-sufficient patches and commit them (either
>>>> on master or in a separate branch for the trigger work). This will
>>>> allow us to review and commit small amounts of work and set them
>>>> aside, rather than going over everything in every round.
>>>
>>> I have created a new branch "rowtriggers" in
>>> postgres-xc.git.sourceforge.net/gitroot/postgres-xc/postgres-xc,
>>> where I have dumped incremental changes.
>>>
>>>> On Wed, Apr 3, 2013 at 10:46 AM, Amit Khandekar
>>>> <ami...@en...> wrote:
>>>>>
>>>>> On 26 March 2013 15:53, Ashutosh Bapat
>>>>> <ash...@en...> wrote:
>>>>>>
>>>>>> On Tue, Mar 26, 2013 at 8:56 AM, Amit Khandekar
>>>>>> <ami...@en...> wrote:
>>>>>>>
>>>>>>> On 4 March 2013 11:11, Amit Khandekar
>>>>>>> <ami...@en...> wrote:
>>>>>>>
>>>>>>>> On 1 March 2013 13:53, Nikhil Sontakke <ni...@st...> wrote:
>>>>>>>> >>
>>>>>>>> >> Issue: whether we should fetch the whole row from the
>>>>>>>> >> datanode (the OLD row), not just the ctid, node_id and
>>>>>>>> >> required columns, and store it at the coordinator for
>>>>>>>> >> processing, OR whether we should fetch each row (OLD and
>>>>>>>> >> NEW variants) while processing each row.
>>>>>>>> >>
>>>>>>>> >> Both of them have performance impacts - the first one has a
>>>>>>>> >> disk impact for a large number of rows, whereas the second
>>>>>>>> >> has a network impact for querying rows. Is it possible to
>>>>>>>> >> do some analytical assessment as to which of them would be
>>>>>>>> >> better? If you can come up with something concrete (maybe
>>>>>>>> >> numbers or formulae) we will be able to judge better as to
>>>>>>>> >> which one to pick up.
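[Editor's note: a back-of-envelope model may help frame the analytical
assessment requested above. The sketch below is purely illustrative:
the constants (DISK_BW, RTT, ROW_WIDTH) and function names are
assumptions made up for this example, not measurements or code from
Postgres-XC; it only weighs the disk cost of approach (A) against the
network cost of approach (B).]

```python
# Illustrative cost model for the two OLD-row strategies debated above.
# Every constant here is an assumption for the sketch, not an XC number.

DISK_BW = 100e6    # bytes/sec of coordinator disk bandwidth (assumed)
RTT = 0.5e-3       # seconds per datanode round trip (assumed)
ROW_WIDTH = 200    # bytes per whole row (assumed)

def cost_store_upfront(nrows):
    """Approach (A): fetch whole OLD rows up front and spill them at
    the coordinator; pay to write and later read nrows * ROW_WIDTH
    bytes."""
    return 2 * nrows * ROW_WIDTH / DISK_BW

def cost_fetch_per_row(nrows):
    """Approach (B): fetch each row by ctid while firing its trigger;
    pay one network round trip per row."""
    return nrows * RTT

# With these assumed numbers the per-row round trips dominate quickly,
# which is what motivates the batch-of-1000 hybrid suggested later in
# this thread (batching divides the round-trip count by the batch size).
print(cost_store_upfront(1_000_000), cost_fetch_per_row(1_000_000))
```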
>>>>>>>>
>>>>>>>> Will check if we can come up with some sensible analysis or
>>>>>>>> figures.
>>>>>>>>
>>>>>>> I have done some analysis on both of these approaches here:
>>>>>>>
>>>>>>> https://docs.google.com/document/d/10QPPq_go_wHqKqhmOFXjJAokfdLR8OaUyZVNDu47GWk/edit?usp=sharing
>>>>>>>
>>>>>>> In practical terms, we would need to implement (B) anyway. The
>>>>>>> reason is that when the trigger has conditional execution (a
>>>>>>> WHEN clause) we *have* to fetch the rows beforehand, so there is
>>>>>>> no point in fetching all of them again at the end of the
>>>>>>> statement when we already have them locally. So it may be too
>>>>>>> ambitious to have both implementations, at least for this
>>>>>>> release.
>>>>>>>
>>>>>> I agree here. We can certainly optimize for various cases later,
>>>>>> but we should have something which gives all the functionality
>>>>>> (albeit at a lower performance for now).
>>>>>>
>>>>>>> So I am focusing on (B) right now. We have two options:
>>>>>>>
>>>>>>> 1. Store all rows in palloc'ed memory, save the HeapTuple
>>>>>>> pointers in the trigger queue, and directly access the OLD and
>>>>>>> NEW rows through these pointers when needed. Here we have no
>>>>>>> control over how much memory we use for the old and new records,
>>>>>>> and this might even hamper system performance, let alone XC
>>>>>>> performance.
>>>>>>> 2. The other option is to use a tuplestore. Here, we need to
>>>>>>> store the positions of the records in the tuplestore, and for a
>>>>>>> particular trigger event, fetch by position. From what I
>>>>>>> understand, a tuplestore can be advanced only sequentially in
>>>>>>> either direction. So when the read pointer is at position 6 and
>>>>>>> we need to fetch the record at position 10, we need to call
>>>>>>> tuplestore_advance() 4 times, and this call involves
>>>>>>> palloc/pfree overhead because it calls tuplestore_gettuple().
>>>>>>> But the trigger records are not distributed so randomly.
>>>>>>> In fact, a set of trigger events for a particular event id are
>>>>>>> accessed in the same order as the order in which they are
>>>>>>> queued. So for a particular event id, only the first access call
>>>>>>> will require random access. A tuplestore supports multiple read
>>>>>>> pointers, so maybe we can make use of that to access the first
>>>>>>> record using the closest read pointer.
>>>>>>>
>>>>>> Using palloc will be a problem if the size of the data fetched is
>>>>>> more than what could fit in memory. Also, palloc'ing frequently
>>>>>> is going to be a performance problem. Let's see how the tuple
>>>>>> store approach goes.
>>>>>
>>>>> While I am working on the AFTER ROW optimization, here's a patch
>>>>> that has only BEFORE ROW trigger support, so that it can get you
>>>>> started with a first round of review. The regression is not
>>>>> analyzed fully yet. Besides the AR-trigger-related changes, I have
>>>>> also stripped out the logic of whether to run the trigger on the
>>>>> datanode or the coordinator; this logic depends on both before and
>>>>> after triggers.
>>>>>
>>>>>>>> >
>>>>>>>> > Or we can consider a hybrid approach of getting the rows in
>>>>>>>> > batches of 1000 or so if possible as well. That way they get
>>>>>>>> > into coordinator memory in one shot and can be processed in
>>>>>>>> > batches. Obviously this should be considered if it's not
>>>>>>>> > going to be a complicated implementation.
>>>>>>>>
>>>>>>>> It just occurred to me that it would not be that hard to
>>>>>>>> optimize the row-fetching-by-ctid as shown below:
>>>>>>>> 1. When it is time to fire the queued triggers at the
>>>>>>>> statement/transaction end, initialize cursors - one cursor per
>>>>>>>> datanode - which would do: SELECT remote_heap_fetch(table_name,
>>>>>>>> '<ctidlist>'); We can form this ctidlist out of the trigger
>>>>>>>> event list.
>>>>>>>> 2.
>>>>>>>> For each trigger event entry in the trigger queue, FETCH NEXT
>>>>>>>> using the appropriate cursor name according to the datanode id
>>>>>>>> to which the trigger entry belongs.
>>>>>>>>
>>>>>>>> > >>> Currently we fetch all attributes in the SELECT
>>>>>>>> > >>> subplans. I have created another patch to fetch only the
>>>>>>>> > >>> required attributes, but have not merged that into this
>>>>>>>> > >>> patch.
>>>>>>>> >
>>>>>>>> > Do we have other places where we unnecessarily fetch all
>>>>>>>> > attributes? ISTM this should be fixed as a performance
>>>>>>>> > improvement first, ahead of everything else.
>>>>>>>>
>>>>>>>> I believe the DML subplan is the only remaining place where we
>>>>>>>> fetch all attributes. And yes, this is a must-have for
>>>>>>>> triggers; otherwise, the other optimizations would be of no
>>>>>>>> use.
>>>>>>>>
>>>>>>>> >>> 2. One important TODO for BEFORE triggers is this: just
>>>>>>>> >>> before invoking the trigger functions, in PG, the tuple is
>>>>>>>> >>> row-locked (exclusive) by GetTupleForTrigger() and the
>>>>>>>> >>> locked version is fetched from the table. So it is made
>>>>>>>> >>> sure that while all the triggers for that table are
>>>>>>>> >>> executed, no one can update that particular row.
>>>>>>>> >>> In the patch, we haven't locked the row. We need to lock
>>>>>>>> >>> it either by executing:
>>>>>>>> >>> 1. SELECT * FROM tab1 WHERE ctid = <ctid_val> FOR UPDATE,
>>>>>>>> >>> and then use the returned row as the OLD row,
>>>>>>>> >>> OR
>>>>>>>> >>> 2. The UPDATE subplan itself should have SELECT FOR UPDATE
>>>>>>>> >>> so that the row is already locked, and we don't have to
>>>>>>>> >>> lock it again.
>>>>>>>> >>> #2 is simple, though it might cause somewhat longer waits
>>>>>>>> >>> in general.
>>>>>>>> >>> Using #1, though the locks would be acquired only when the
>>>>>>>> >>> particular row is updated, the locks would be released
>>>>>>>> >>> only after transaction end, so #1 might not be worth
>>>>>>>> >>> implementing.
>>>>>>>> >>> Also, #1 requires another explicit remote fetch for the
>>>>>>>> >>> lock-and-get-latest-version operation.
>>>>>>>> >>> I am more inclined towards #2.
>>>>>>>> >>>
>>>>>>>> >> Option #2, however, has the problem of locking too many
>>>>>>>> >> rows if there are coordinator quals in the subplans, IOW
>>>>>>>> >> the number of rows finally updated is smaller than the
>>>>>>>> >> number of rows fetched from the datanode. It can cause
>>>>>>>> >> unwanted deadlocks. Unless there is a way to release these
>>>>>>>> >> extra locks, I am afraid this option will be a problem.
>>>>>>>>
>>>>>>>> True. Regardless of anything else - whether it is deadlocks or
>>>>>>>> longer waits - we should not lock rows that are not to be
>>>>>>>> updated.
>>>>>>>>
>>>>>>>> There is a more general row-locking issue that we need to
>>>>>>>> solve first: 3606317. I anticipate that solving it will solve
>>>>>>>> the trigger-specific lock issue. So for triggers, this is a
>>>>>>>> must-have, and I am going to solve this issue as part of bug
>>>>>>>> 3606317.
>>>>>>>>
>>>>>>>> > Deadlocks? ISTM we can get more lock waits because of this,
>>>>>>>> > but I do not see deadlock scenarios.
>>>>>>>> >
>>>>>>>> > With the FQS shipping work being done by Ashutosh, will we
>>>>>>>> > also ship major chunks of subplans to the datanodes? If yes,
>>>>>>>> > then row locking will only involve required tuples
>>>>>>>> > (hopefully) from the coordinator's point of view.
>>>>>>>> >
>>>>>>>> > Also, something radical: can we invent a new type of
>>>>>>>> > FOR [NODE] UPDATE lock to minimize the impact of such
>>>>>>>> > locking of rows on datanodes?
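[Editor's note: the multiple-read-pointer idea discussed up-thread can
be sketched abstractly. The class and method names below are invented
for illustration; this is not the real tuplestore API. It only models
the bookkeeping: each random fetch is served by the closest read
pointer, and the per-fetch cost is counted in one-step advances, so
only the first access for an event id pays a long walk.]

```python
class TupleStoreModel:
    """Toy model of a sequential-access tuple store with several read
    pointers.  Invented API; it only counts tuplestore_advance()-style
    steps, it stores no tuples."""

    def __init__(self, npointers):
        self.pointers = [0] * npointers   # all read pointers at slot 0

    def fetch(self, pos):
        """Fetch the tuple at `pos` via the closest read pointer.
        Returns how many one-step advances were needed."""
        i = min(range(len(self.pointers)),
                key=lambda k: abs(self.pointers[k] - pos))
        steps = abs(self.pointers[i] - pos)
        self.pointers[i] = pos            # pointer is now parked here
        return steps

store = TupleStoreModel(npointers=2)
# The first access for an event id is random...
first = store.fetch(100)                           # long walk
# ...but the rest of that event id's tuples arrive in queue order:
rest = [store.fetch(p) for p in (101, 102, 103)]   # one step each
```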
>>>>>>>> >
>>>>>>>> > Regards,
>>>>>>>> > Nikhils
>>>>>>>> >
>>>>>>>> >>> 3. The BEFORE trigger function can change the distribution
>>>>>>>> >>> column itself. We need to add a check at the end of the
>>>>>>>> >>> trigger executions.
>>>>>>>> >>
>>>>>>>> >> Good, you thought about that. Yes, we should check it.
>>>>>>>> >>
>>>>>>>> >>> 4. Fetching the OLD row for WHEN clause handling.
>>>>>>>> >>>
>>>>>>>> >>> 5. Testing with a mix of shippable and non-shippable ROW
>>>>>>>> >>> triggers.
>>>>>>>> >>>
>>>>>>>> >>> 6. Other types of triggers. INSTEAD triggers are
>>>>>>>> >>> anticipated to work without significant changes, but they
>>>>>>>> >>> are yet to be tested.
>>>>>>>> >>> INSERT/DELETE triggers: most of the infrastructure has
>>>>>>>> >>> been done while implementing UPDATE triggers, but some
>>>>>>>> >>> changes specific to INSERT and DELETE are yet to be done.
>>>>>>>> >>> Deferred triggers to be tested.
>>>>>>>> >>>
>>>>>>>> >>> 7. Regression analysis. There are some new failures. Will
>>>>>>>> >>> post another fair version of the patch after regression
>>>>>>>> >>> analysis and fixing various TODOs.
>>>>>>>> >>>
>>>>>>>> >>> Comments welcome.
>>>>>>>> >>>
>>>>>>>> >>> ------------------------------------------------------------------------------
>>>>>>>> >>> Everyone hates slow websites. So do we.
>>>>>>>> >>> Make your web apps faster with AppDynamics
>>>>>>>> >>> Download AppDynamics Lite for free today:
>>>>>>>> >>> http://p.sf.net/sfu/appdyn_d2d_feb
>>>>>>>> >>> _______________________________________________
>>>>>>>> >>> Postgres-xc-developers mailing list
>>>>>>>> >>> Pos...@li...
>>>>>>>> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Best Wishes,
>>>>>>>> >> Ashutosh Bapat
>>>>>>>> >> EnterpriseDB Corporation
>>>>>>>> >> The Enterprise Postgres Company
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > StormDB - http://www.stormdb.com
>>>>>>>> > The Database Cloud
>>>>>>>> > Postgres-XC Support and Service

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
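[Editor's note: as a closing illustration of the cursor-per-datanode
scheme discussed up-thread (one SELECT remote_heap_fetch(table_name,
'<ctidlist>') cursor per node, then FETCH NEXT per trigger event), the
coordinator-side bookkeeping reduces to grouping queued events by node
id while preserving queue order. The function and variable names below
are invented, and the sketch assumes remote_heap_fetch returns rows in
ctidlist order; it is not XC code.]

```python
from collections import defaultdict

def build_ctid_lists(trigger_queue):
    """Group queued trigger events by datanode, preserving queue order.
    Each per-node list becomes the <ctidlist> for that node's cursor,
    so that (assuming the remote fetch preserves list order) FETCH NEXT
    on a node's cursor yields rows in exactly the order the events are
    replayed.  `trigger_queue` holds (node_id, ctid) pairs."""
    by_node = defaultdict(list)
    for node_id, ctid in trigger_queue:
        by_node[node_id].append(ctid)
    return dict(by_node)

queue = [(1, "(0,1)"), (2, "(0,7)"), (1, "(0,2)"), (2, "(0,9)")]
ctid_lists = build_ctid_lists(queue)
# ctid_lists == {1: ["(0,1)", "(0,2)"], 2: ["(0,7)", "(0,9)"]}
```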