From: Ashutosh B. <ash...@en...> - 2013-04-10 12:16:35
The following commits pose no problems; I won't be sending separate
mails for these:

commit d2a982cc1e188f6701db79affe138e186132f137
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:03:24 2013 +0545

    Prototype change for ExecProcNodeDMLInXC()

commit e65e2daf2e8d9a2eeb2160963c6e04a6b9316df9
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:22:28 2013 +0545

    Add the missing trigger-function header changes.

commit bddc56a2a3961ac52266b22d84e27cdb08e19017
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:24:26 2013 +0545

    Added missing jf_xc_node_id and jf_xc_wholerow fields in the
    JunkFilter structure.

    modified: src/include/nodes/execnodes.h

commit 93dd6f9ff3d535856772181929dd41ef3e25f506
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:28:24 2013 +0545

    Enable trigger support by removing the
    "ROW-trigger-feature-not-supported" message.

On Fri, Apr 5, 2013 at 12:38 PM, Ashutosh Bapat
<ash...@en...> wrote:
> Hi Amit,
> This is a general comment about trigger execution on the coordinator.
>
> Catalog tables are local to the coordinator, and any modifications to
> them happen at the coordinator the same way they would in PostgreSQL.
> I am not sure whether one can create a trigger on a catalog table (I
> think not). If we could create triggers on catalog tables, we would
> need to be careful to make sure that they execute the same way as in
> PostgreSQL.
>
> On Thu, Apr 4, 2013 at 7:48 PM, Ashutosh Bapat
> <ash...@en...> wrote:
>> Hi Amit,
>> Thanks for creating the branch and the commits.
>>
>> I will give my comments on each of the commits in a separate mail. I
>> am starting with 1dc081ebe097e63009bab219231e55679db6fae0. Is that
>> the correct one?
>>
>> On Thu, Apr 4, 2013 at 12:26 PM, Amit Khandekar
>> <ami...@en...> wrote:
>>>
>>> On 3 April 2013 15:10, Ashutosh Bapat
>>> <ash...@en...> wrote:
>>>>
>>>> Hi Amit,
>>>> Given the magnitude of the change, I think we should break this
>>>> work into smaller self-sufficient patches and commit them (either
>>>> on master or in a separate branch for the trigger work). This will
>>>> allow us to review and commit small amounts of work and set them
>>>> aside, rather than going over everything in every round.
>>>
>>> I have created a new branch "rowtriggers" in
>>> postgres-xc.git.sourceforge.net/gitroot/postgres-xc/postgres-xc,
>>> where I have dumped incremental changes.
>>>
>>>> On Wed, Apr 3, 2013 at 10:46 AM, Amit Khandekar
>>>> <ami...@en...> wrote:
>>>>>
>>>>> On 26 March 2013 15:53, Ashutosh Bapat
>>>>> <ash...@en...> wrote:
>>>>>>
>>>>>> On Tue, Mar 26, 2013 at 8:56 AM, Amit Khandekar
>>>>>> <ami...@en...> wrote:
>>>>>>>
>>>>>>> On 4 March 2013 11:11, Amit Khandekar
>>>>>>> <ami...@en...> wrote:
>>>>>>>
>>>>>>>> On 1 March 2013 13:53, Nikhil Sontakke <ni...@st...> wrote:
>>>>>>>> >>
>>>>>>>> >> Issue: whether we should fetch the whole row from the
>>>>>>>> >> datanode (the OLD row), not just the ctid, node_id and
>>>>>>>> >> required columns, and store it at the coordinator for
>>>>>>>> >> processing, OR whether we should fetch each row (OLD and
>>>>>>>> >> NEW variants) while processing each row.
>>>>>>>> >>
>>>>>>>> >> Both of them have performance impacts - the first one has a
>>>>>>>> >> disk impact for a large number of rows, whereas the second
>>>>>>>> >> has a network impact for querying rows. Is it possible to
>>>>>>>> >> do some analytical assessment as to which of them would be
>>>>>>>> >> better? If you can come up with something concrete (maybe
>>>>>>>> >> numbers or formulae) we will be able to judge better as to
>>>>>>>> >> which one to pick up.
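[Editor's note: a back-of-envelope model may help frame the analytical
assessment requested above. The sketch below is purely illustrative:
the constants (DISK_BW, RTT, ROW_WIDTH) and function names are
assumptions made up for this example, not measurements or code from
Postgres-XC; it only weighs the disk cost of approach (A) against the
network cost of approach (B).]

```python
# Illustrative cost model for the two OLD-row strategies debated above.
# Every constant here is an assumption for the sketch, not an XC number.

DISK_BW = 100e6    # bytes/sec of coordinator disk bandwidth (assumed)
RTT = 0.5e-3       # seconds per datanode round trip (assumed)
ROW_WIDTH = 200    # bytes per whole row (assumed)

def cost_store_upfront(nrows):
    """Approach (A): fetch whole OLD rows up front and spill them at
    the coordinator; pay to write and later read nrows * ROW_WIDTH
    bytes."""
    return 2 * nrows * ROW_WIDTH / DISK_BW

def cost_fetch_per_row(nrows):
    """Approach (B): fetch each row by ctid while firing its trigger;
    pay one network round trip per row."""
    return nrows * RTT

# With these assumed numbers the per-row round trips dominate quickly,
# which is what motivates the batch-of-1000 hybrid suggested later in
# this thread (batching divides the round-trip count by the batch size).
print(cost_store_upfront(1_000_000), cost_fetch_per_row(1_000_000))
```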
>>>>>>>>
>>>>>>>> Will check if we can come up with some sensible analysis or
>>>>>>>> figures.
>>>>>>>>
>>>>>>> I have done some analysis on both of these approaches here:
>>>>>>>
>>>>>>> https://docs.google.com/document/d/10QPPq_go_wHqKqhmOFXjJAokfdLR8OaUyZVNDu47GWk/edit?usp=sharing
>>>>>>>
>>>>>>> In practical terms, we would need to implement (B) anyway. The
>>>>>>> reason is that when the trigger has conditional execution (a
>>>>>>> WHEN clause) we *have* to fetch the rows beforehand, so there is
>>>>>>> no point in fetching all of them again at the end of the
>>>>>>> statement when we already have them locally. So it may be too
>>>>>>> ambitious to have both implementations, at least for this
>>>>>>> release.
>>>>>>>
>>>>>> I agree here. We can certainly optimize for various cases later,
>>>>>> but we should have something which gives all the functionality
>>>>>> (albeit at a lower performance for now).
>>>>>>
>>>>>>> So I am focusing on (B) right now. We have two options:
>>>>>>>
>>>>>>> 1. Store all rows in palloc'ed memory, save the HeapTuple
>>>>>>> pointers in the trigger queue, and directly access the OLD and
>>>>>>> NEW rows through these pointers when needed. Here we have no
>>>>>>> control over how much memory we use for the old and new records,
>>>>>>> and this might even hamper system performance, let alone XC
>>>>>>> performance.
>>>>>>> 2. The other option is to use a tuplestore. Here, we need to
>>>>>>> store the positions of the records in the tuplestore, and for a
>>>>>>> particular trigger event, fetch by position. From what I
>>>>>>> understand, a tuplestore can be advanced only sequentially in
>>>>>>> either direction. So when the read pointer is at position 6 and
>>>>>>> we need to fetch the record at position 10, we need to call
>>>>>>> tuplestore_advance() 4 times, and this call involves
>>>>>>> palloc/pfree overhead because it calls tuplestore_gettuple().
>>>>>>> But the trigger records are not distributed so randomly.
>>>>>>> In fact, a set of trigger events for a particular event id are
>>>>>>> accessed in the same order as the order in which they are
>>>>>>> queued. So for a particular event id, only the first access call
>>>>>>> will require random access. A tuplestore supports multiple read
>>>>>>> pointers, so maybe we can make use of that to access the first
>>>>>>> record using the closest read pointer.
>>>>>>>
>>>>>> Using palloc will be a problem if the size of the data fetched is
>>>>>> more than what could fit in memory. Also, palloc'ing frequently
>>>>>> is going to be a performance problem. Let's see how the tuple
>>>>>> store approach goes.
>>>>>
>>>>> While I am working on the AFTER ROW optimization, here's a patch
>>>>> that has only BEFORE ROW trigger support, so that it can get you
>>>>> started with a first round of review. The regression is not
>>>>> analyzed fully yet. Besides the AR-trigger-related changes, I have
>>>>> also stripped out the logic of whether to run the trigger on the
>>>>> datanode or the coordinator; this logic depends on both before and
>>>>> after triggers.
>>>>>
>>>>>>>> >
>>>>>>>> > Or we can consider a hybrid approach of getting the rows in
>>>>>>>> > batches of 1000 or so if possible as well. That way they get
>>>>>>>> > into coordinator memory in one shot and can be processed in
>>>>>>>> > batches. Obviously this should be considered if it's not
>>>>>>>> > going to be a complicated implementation.
>>>>>>>>
>>>>>>>> It just occurred to me that it would not be that hard to
>>>>>>>> optimize the row-fetching-by-ctid as shown below:
>>>>>>>> 1. When it is time to fire the queued triggers at the
>>>>>>>> statement/transaction end, initialize cursors - one cursor per
>>>>>>>> datanode - which would do: SELECT remote_heap_fetch(table_name,
>>>>>>>> '<ctidlist>'); We can form this ctidlist out of the trigger
>>>>>>>> event list.
>>>>>>>> 2.
>>>>>>>> For each trigger event entry in the trigger queue, FETCH NEXT
>>>>>>>> using the appropriate cursor name according to the datanode id
>>>>>>>> to which the trigger entry belongs.
>>>>>>>>
>>>>>>>> > >>> Currently we fetch all attributes in the SELECT
>>>>>>>> > >>> subplans. I have created another patch to fetch only the
>>>>>>>> > >>> required attributes, but have not merged that into this
>>>>>>>> > >>> patch.
>>>>>>>> >
>>>>>>>> > Do we have other places where we unnecessarily fetch all
>>>>>>>> > attributes? ISTM this should be fixed as a performance
>>>>>>>> > improvement first, ahead of everything else.
>>>>>>>>
>>>>>>>> I believe the DML subplan is the only remaining place where we
>>>>>>>> fetch all attributes. And yes, this is a must-have for
>>>>>>>> triggers; otherwise, the other optimizations would be of no
>>>>>>>> use.
>>>>>>>>
>>>>>>>> >>> 2. One important TODO for BEFORE triggers is this: just
>>>>>>>> >>> before invoking the trigger functions, in PG, the tuple is
>>>>>>>> >>> row-locked (exclusive) by GetTupleForTrigger() and the
>>>>>>>> >>> locked version is fetched from the table. So it is made
>>>>>>>> >>> sure that while all the triggers for that table are
>>>>>>>> >>> executed, no one can update that particular row.
>>>>>>>> >>> In the patch, we haven't locked the row. We need to lock
>>>>>>>> >>> it either by executing:
>>>>>>>> >>> 1. SELECT * FROM tab1 WHERE ctid = <ctid_val> FOR UPDATE,
>>>>>>>> >>> and then use the returned row as the OLD row,
>>>>>>>> >>> OR
>>>>>>>> >>> 2. The UPDATE subplan itself should have SELECT FOR UPDATE
>>>>>>>> >>> so that the row is already locked, and we don't have to
>>>>>>>> >>> lock it again.
>>>>>>>> >>> #2 is simple, though it might cause somewhat longer waits
>>>>>>>> >>> in general.
>>>>>>>> >>> Using #1, though the locks would be acquired only when the
>>>>>>>> >>> particular row is updated, the locks would be released
>>>>>>>> >>> only after transaction end, so #1 might not be worth
>>>>>>>> >>> implementing.
>>>>>>>> >>> Also, #1 requires another explicit remote fetch for the
>>>>>>>> >>> lock-and-get-latest-version operation.
>>>>>>>> >>> I am more inclined towards #2.
>>>>>>>> >>>
>>>>>>>> >> Option #2, however, has the problem of locking too many
>>>>>>>> >> rows if there are coordinator quals in the subplans, IOW
>>>>>>>> >> the number of rows finally updated is smaller than the
>>>>>>>> >> number of rows fetched from the datanode. It can cause
>>>>>>>> >> unwanted deadlocks. Unless there is a way to release these
>>>>>>>> >> extra locks, I am afraid this option will be a problem.
>>>>>>>>
>>>>>>>> True. Regardless of anything else - whether it is deadlocks or
>>>>>>>> longer waits - we should not lock rows that are not to be
>>>>>>>> updated.
>>>>>>>>
>>>>>>>> There is a more general row-locking issue that we need to
>>>>>>>> solve first: 3606317. I anticipate that solving it will solve
>>>>>>>> the trigger-specific lock issue. So for triggers, this is a
>>>>>>>> must-have, and I am going to solve this issue as part of bug
>>>>>>>> 3606317.
>>>>>>>>
>>>>>>>> > Deadlocks? ISTM we can get more lock waits because of this,
>>>>>>>> > but I do not see deadlock scenarios.
>>>>>>>> >
>>>>>>>> > With the FQS shipping work being done by Ashutosh, will we
>>>>>>>> > also ship major chunks of subplans to the datanodes? If yes,
>>>>>>>> > then row locking will only involve required tuples
>>>>>>>> > (hopefully) from the coordinator's point of view.
>>>>>>>> >
>>>>>>>> > Also, something radical: can we invent a new type of
>>>>>>>> > FOR [NODE] UPDATE lock to minimize the impact of such
>>>>>>>> > locking of rows on datanodes?
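[Editor's note: the multiple-read-pointer idea discussed up-thread can
be sketched abstractly. The class and method names below are invented
for illustration; this is not the real tuplestore API. It only models
the bookkeeping: each random fetch is served by the closest read
pointer, and the per-fetch cost is counted in one-step advances, so
only the first access for an event id pays a long walk.]

```python
class TupleStoreModel:
    """Toy model of a sequential-access tuple store with several read
    pointers.  Invented API; it only counts tuplestore_advance()-style
    steps, it stores no tuples."""

    def __init__(self, npointers):
        self.pointers = [0] * npointers   # all read pointers at slot 0

    def fetch(self, pos):
        """Fetch the tuple at `pos` via the closest read pointer.
        Returns how many one-step advances were needed."""
        i = min(range(len(self.pointers)),
                key=lambda k: abs(self.pointers[k] - pos))
        steps = abs(self.pointers[i] - pos)
        self.pointers[i] = pos            # pointer is now parked here
        return steps

store = TupleStoreModel(npointers=2)
# The first access for an event id is random...
first = store.fetch(100)                           # long walk
# ...but the rest of that event id's tuples arrive in queue order:
rest = [store.fetch(p) for p in (101, 102, 103)]   # one step each
```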
>>>>>>>> >
>>>>>>>> > Regards,
>>>>>>>> > Nikhils
>>>>>>>> >
>>>>>>>> >>> 3. The BEFORE trigger function can change the distribution
>>>>>>>> >>> column itself. We need to add a check at the end of the
>>>>>>>> >>> trigger executions.
>>>>>>>> >>
>>>>>>>> >> Good, you thought about that. Yes, we should check it.
>>>>>>>> >>
>>>>>>>> >>> 4. Fetching the OLD row for WHEN clause handling.
>>>>>>>> >>>
>>>>>>>> >>> 5. Testing with a mix of shippable and non-shippable ROW
>>>>>>>> >>> triggers.
>>>>>>>> >>>
>>>>>>>> >>> 6. Other types of triggers. INSTEAD triggers are
>>>>>>>> >>> anticipated to work without significant changes, but they
>>>>>>>> >>> are yet to be tested.
>>>>>>>> >>> INSERT/DELETE triggers: most of the infrastructure has
>>>>>>>> >>> been done while implementing UPDATE triggers, but some
>>>>>>>> >>> changes specific to INSERT and DELETE are yet to be done.
>>>>>>>> >>> Deferred triggers to be tested.
>>>>>>>> >>>
>>>>>>>> >>> 7. Regression analysis. There are some new failures. Will
>>>>>>>> >>> post another fair version of the patch after regression
>>>>>>>> >>> analysis and fixing various TODOs.
>>>>>>>> >>>
>>>>>>>> >>> Comments welcome.
>>>>>>>> >>>
>>>>>>>> >>> ------------------------------------------------------------------------------
>>>>>>>> >>> Everyone hates slow websites. So do we.
>>>>>>>> >>> Make your web apps faster with AppDynamics
>>>>>>>> >>> Download AppDynamics Lite for free today:
>>>>>>>> >>> http://p.sf.net/sfu/appdyn_d2d_feb
>>>>>>>> >>> _______________________________________________
>>>>>>>> >>> Postgres-xc-developers mailing list
>>>>>>>> >>> Pos...@li...
>>>>>>>> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Best Wishes,
>>>>>>>> >> Ashutosh Bapat
>>>>>>>> >> EnterpriseDB Corporation
>>>>>>>> >> The Enterprise Postgres Company
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > StormDB - http://www.stormdb.com
>>>>>>>> > The Database Cloud
>>>>>>>> > Postgres-XC Support and Service

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
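[Editor's note: as a closing illustration of the cursor-per-datanode
scheme discussed up-thread (one SELECT remote_heap_fetch(table_name,
'<ctidlist>') cursor per node, then FETCH NEXT per trigger event), the
coordinator-side bookkeeping reduces to grouping queued events by node
id while preserving queue order. The function and variable names below
are invented, and the sketch assumes remote_heap_fetch returns rows in
ctidlist order; it is not XC code.]

```python
from collections import defaultdict

def build_ctid_lists(trigger_queue):
    """Group queued trigger events by datanode, preserving queue order.
    Each per-node list becomes the <ctidlist> for that node's cursor,
    so that (assuming the remote fetch preserves list order) FETCH NEXT
    on a node's cursor yields rows in exactly the order the events are
    replayed.  `trigger_queue` holds (node_id, ctid) pairs."""
    by_node = defaultdict(list)
    for node_id, ctid in trigger_queue:
        by_node[node_id].append(ctid)
    return dict(by_node)

queue = [(1, "(0,1)"), (2, "(0,7)"), (1, "(0,2)"), (2, "(0,9)")]
ctid_lists = build_ctid_lists(queue)
# ctid_lists == {1: ["(0,1)", "(0,2)"], 2: ["(0,7)", "(0,9)"]}
```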