From: Amit K. <ami...@en...> - 2013-04-04 06:57:02
On 3 April 2013 15:10, Ashutosh Bapat <ash...@en...> wrote:

> Hi Amit,
>
> Given the magnitude of the change, I think we should break this work into smaller self-sufficient patches and commit them (either on master or in a separate branch for trigger work). This will allow us to review and commit small amounts of work and set them aside, rather than going over everything in every round.

I have created a new branch "rowtriggers" in postgres-xc.git.sourceforge.net/gitroot/postgres-xc/postgres-xc, where I have dumped incremental changes.

> On Wed, Apr 3, 2013 at 10:46 AM, Amit Khandekar <ami...@en...> wrote:
>
>> On 26 March 2013 15:53, Ashutosh Bapat <ash...@en...> wrote:
>>
>>> On Tue, Mar 26, 2013 at 8:56 AM, Amit Khandekar <ami...@en...> wrote:
>>>
>>>> On 4 March 2013 11:11, Amit Khandekar <ami...@en...> wrote:
>>>>
>>>>> On 1 March 2013 13:53, Nikhil Sontakke <ni...@st...> wrote:
>>>>>
>>>>> >> Issue: Whether we should fetch the whole row from the datanode (the OLD row), and not just the ctid, node_id and required columns, and store it at the coordinator for processing, OR whether we should fetch each row (OLD and NEW variants) while processing each row.
>>>>> >>
>>>>> >> Both of them have performance impacts - the first one has a disk impact for a large number of rows, whereas the second has a network impact for querying rows. Is it possible to do some analytical assessment as to which of them would be better? If you can come up with something concrete (maybe numbers or formulae) we will be able to judge better as to which one to pick up.
>>>>>
>>>>> Will check if we can come up with some sensible analysis or figures.
>>>>
>>>> I have done some analysis of both of these approaches here:
>>>> https://docs.google.com/document/d/10QPPq_go_wHqKqhmOFXjJAokfdLR8OaUyZVNDu47GWk/edit?usp=sharing
>>>>
>>>> In practical terms, we would anyway need to implement (B), because when the trigger has conditional execution (a WHEN clause) we *have* to fetch the rows beforehand, so there is no point in fetching all of them again at the end of the statement when we already have them locally. So it may be too ambitious to have both implementations, at least for this release.
>>>
>>> I agree here. We can certainly optimize for various cases later, but we should have something which gives all the functionality (albeit at lower performance for now).
>>>
>>>> So I am focussing on (B) right now. We have two options:
>>>>
>>>> 1. Store all rows in palloc'ed memory, save the HeapTuple pointers in the trigger queue, and directly access the OLD and NEW rows through these pointers when needed. Here we have no control over how much memory we use for the old and new records, and this might even hamper system performance, let alone XC performance.
>>>>
>>>> 2. The other option is to use a tuplestore. Here, we need to store the positions of the records in the tuplestore, so for a particular trigger event we fetch by position. From what I understand, a tuplestore can be advanced only sequentially in either direction. So when the read pointer is at position 6 and we need to fetch the record at position 10, we need to call tuplestore_advance() 4 times, and this call involves palloc/pfree overhead because it calls tuplestore_gettuple(). But the trigger records are not distributed that randomly: in fact, the set of trigger events for a particular event id is accessed in the same order in which it was queued. So for a particular event id, only the first access will require random access. tuplestore supports multiple read pointers, so maybe we can make use of that and access the first record using the closest read pointer.
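To make option 2 a bit more concrete, here is a minimal, untested sketch of the tuplestore idea. Nothing here is from the patch: the struct and function names are illustrative, and it uses only the default read pointer, rescanning from the start when it has to move backwards.

    #include "postgres.h"
    #include "miscadmin.h"
    #include "executor/tuptable.h"
    #include "utils/tuplestore.h"

    /* Per-statement store of fetched OLD/NEW rows (names are illustrative). */
    typedef struct TrigRowStore
    {
        Tuplestorestate *ts;        /* spills to disk beyond work_mem */
        int              nrows;     /* rows appended so far */
        int              readpos;   /* position the read pointer is at */
    } TrigRowStore;

    static void
    trig_rowstore_init(TrigRowStore *store)
    {
        /* randomAccess = true so the store can be rescanned */
        store->ts = tuplestore_begin_heap(true, false, work_mem);
        store->nrows = 0;
        store->readpos = 0;
    }

    /* Append a row; the returned position is what goes into the trigger queue. */
    static int
    trig_rowstore_add(TrigRowStore *store, TupleTableSlot *slot)
    {
        tuplestore_puttupleslot(store->ts, slot);
        return store->nrows++;
    }

    /* Fetch the row saved at 'pos' into 'slot'. */
    static bool
    trig_rowstore_fetch(TrigRowStore *store, int pos, TupleTableSlot *slot)
    {
        if (pos < store->readpos)
        {
            /* rare case: go back to the start and walk forward again */
            tuplestore_rescan(store->ts);
            store->readpos = 0;
        }
        /* usually a no-op, since events are replayed in queue order */
        while (store->readpos < pos)
        {
            if (!tuplestore_advance(store->ts, true))
                return false;
            store->readpos++;
        }
        if (!tuplestore_gettupleslot(store->ts, true, false, slot))
            return false;
        store->readpos++;
        return true;
    }

On top of this, one extra read pointer per event id could be allocated with tuplestore_alloc_read_pointer() so that the occasional "first access" does not have to rescan from the beginning.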
>>> Using palloc will be a problem if the size of the data fetched is more than what can fit in memory. Also, pallocing frequently is going to be a performance problem. Let's see how the tuplestore approach goes.

>> While I am working on the AFTER ROW optimization, here's a patch that has only BEFORE ROW trigger support, so that it can get you started with the first round of review. The regression results are not fully analyzed yet. Besides the AR trigger related changes, I have also stripped out the logic that decides whether to run a trigger on the datanode or the coordinator; that logic depends on both before and after triggers.

>>>>> > Or we can consider a hybrid approach of getting the rows in batches of 1000 or so, if possible. That way they get into coordinator memory in one shot and can be processed in batches. Obviously this should only be considered if it's not going to be a complicated implementation.

>>>>> It just occurred to me that it would not be that hard to optimize the row-fetching-by-ctid as shown below:

>>>>> 1. When it is time to fire the queued triggers at the statement/transaction end, initialize cursors - one cursor per datanode - which would do: SELECT remote_heap_fetch(table_name, '<ctidlist>'). We can form this ctidlist out of the trigger event list.

>>>>> 2. For each trigger event entry in the trigger queue, FETCH NEXT using the appropriate cursor name according to the datanode id to which the trigger entry belongs.
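Just to make that flow concrete, here is a rough sketch of how the coordinator side could build those per-datanode queries. Everything in it is illustrative: remote_heap_fetch() is the proposed (not yet existing) datanode function, the trig_fetch_%d cursor name is made up, and send_query_to_datanode() is only a placeholder for whatever remote-execution call we end up using.

    #include "postgres.h"
    #include "lib/stringinfo.h"

    /* Placeholder for the actual remote execution machinery. */
    extern void send_query_to_datanode(int nodeid, const char *query);

    /* Open one cursor per datanode over the ctids queued for that node. */
    static void
    open_trigger_fetch_cursor(int nodeid, const char *relname,
                              const char *ctid_list)
    {
        StringInfoData buf;

        initStringInfo(&buf);
        appendStringInfo(&buf,
                         "DECLARE trig_fetch_%d CURSOR FOR "
                         "SELECT remote_heap_fetch('%s', '%s')",
                         nodeid, relname, ctid_list);
        send_query_to_datanode(nodeid, buf.data);
        pfree(buf.data);
    }

    /* Called once per queued trigger event, using that event's node id. */
    static void
    fetch_next_trigger_row(int nodeid)
    {
        StringInfoData buf;

        initStringInfo(&buf);
        appendStringInfo(&buf, "FETCH NEXT FROM trig_fetch_%d", nodeid);
        send_query_to_datanode(nodeid, buf.data);
        pfree(buf.data);
    }

This relies on the ctid list for each datanode being built in the same order in which the events are later replayed, so that FETCH NEXT always returns the row for the current event.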
>>>>> >>> Currently we fetch all attributes in the SELECT subplans. I have created another patch to fetch only the required attributes, but have not merged that into this patch.

>>>>> > Do we have other places where we unnecessarily fetch all attributes? ISTM this should be fixed as a performance improvement first, ahead of everything else.

>>>>> I believe the DML subplan is the only remaining place where we fetch all attributes. And yes, this is a must-have for triggers; otherwise the other optimizations would be of no use.

>>>>> >>> 2. One important TODO for BEFORE triggers is this: just before invoking the trigger functions, PG row-locks the tuple (exclusively) in GetTupleForTrigger() and fetches the locked version from the table. This makes sure that while all the triggers for that table are executed, no one can update that particular row. In the patch, we haven't locked the row. We need to lock it either by executing:
>>>>> >>> 1. SELECT * FROM tab1 WHERE ctid = <ctid_val> FOR UPDATE, and then using the returned row as the OLD row, OR
>>>>> >>> 2. having the UPDATE subplan itself do SELECT FOR UPDATE, so that the row is already locked and we don't have to lock it again.
>>>>> >>> #2 is simple, though it might cause somewhat longer waits in general. With #1, although the locks would be acquired only when the particular row is actually updated, they would be released only at transaction end, so #1 might not be worth implementing. Also, #1 requires another explicit remote fetch for the lock-and-get-latest-version operation. I am more inclined towards #2.

>>>>> >> Option #2, however, has the problem of locking too many rows if there are coordinator quals in the subplans, IOW when the number of rows finally updated is smaller than the number of rows fetched from the datanode. It can cause unwanted deadlocks. Unless there is a way to release these extra locks, I am afraid this option will be a problem.

>>>>> True. Regardless of anything else - whether it is deadlocks or longer waits - we should not lock rows that are not going to be updated.

>>>>> There is a more general row-locking issue that we need to solve first: 3606317. I anticipate that solving it will also solve the trigger-specific lock issue. So for triggers this is a must-have, and I am going to address it as part of that bug (3606317).

>>>>> > Deadlocks? ISTM we can get more lock waits because of this, but I do not see deadlock scenarios.

>>>>> > With the FQS shipping work being done by Ashutosh, will we also ship major chunks of subplans to the datanodes? If yes, then row locking will only involve the required tuples (hopefully) from the coordinator's point of view.

>>>>> > Also, something radical: can we invent a new type of FOR [NODE] UPDATE lock to minimize the impact of such locking of rows on the datanodes?

>>>>> > Regards,
>>>>> > Nikhils

>>>>> >>> 3. The BEFORE trigger function can change the distribution column itself. We need to add a check at the end of the trigger executions.

>>>>> >> Good, you thought about that. Yes, we should check it.

>>>>> >>> 4. Fetching the OLD row for WHEN clause handling.

>>>>> >>> 5. Testing with a mix of shippable and non-shippable ROW triggers.

>>>>> >>> 6. Other types of triggers. INSTEAD OF triggers are anticipated to work without significant changes, but they are yet to be tested. INSERT/DELETE triggers: most of the infrastructure has been done while implementing UPDATE triggers, but some changes specific to INSERT and DELETE are yet to be done. Deferred triggers are to be tested.

>>>>> >>> 7. Regression analysis. There are some new failures. Will post another, cleaner version of the patch after regression analysis and fixing the various TODOs.

>>>>> >>> Comments welcome.
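For TODO item 3 above (a BEFORE ROW trigger changing the distribution column), the check could look roughly like the sketch below. It is only an illustration: the function name is made up, distattno stands for the distribution column's attribute number as obtained from the relation's locator info, and the error code and wording are not final.

    #include "postgres.h"
    #include "access/htup.h"
    #include "utils/datum.h"
    #include "utils/rel.h"

    /*
     * Illustrative only: after the BEFORE ROW triggers have run, verify that
     * the distribution column of 'newtuple' still matches 'oldtuple'.
     */
    static void
    check_distcol_unchanged(Relation rel, AttrNumber distattno,
                            HeapTuple oldtuple, HeapTuple newtuple)
    {
        TupleDesc   tupdesc = RelationGetDescr(rel);
        Form_pg_attribute att = tupdesc->attrs[distattno - 1];
        bool        oldisnull;
        bool        newisnull;
        Datum       oldval;
        Datum       newval;

        oldval = heap_getattr(oldtuple, distattno, tupdesc, &oldisnull);
        newval = heap_getattr(newtuple, distattno, tupdesc, &newisnull);

        if (oldisnull != newisnull ||
            (!oldisnull &&
             !datumIsEqual(oldval, newval, att->attbyval, att->attlen)))
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("BEFORE trigger may not modify the distribution column \"%s\"",
                            NameStr(att->attname))));
    }

Whether we should error out or silently revert the change is a separate decision; the sketch simply errors out.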