From: Amit K. <ami...@en...> - 2013-03-04 05:47:22
On 1 March 2013 13:53, Nikhil Sontakke <ni...@st...> wrote:
>>
>> Issue: Whether we should fetch the whole row from the datanode (OLD row)
>> and not just ctid and node_id and required columns, and store it at the
>> coordinator for the processing, OR whether we should fetch each row (OLD
>> and NEW variants) while processing each row.
>>
>> Both of them have performance impacts - the first one has disk impact for
>> a large number of rows, whereas the second has network impact for querying
>> rows. Is it possible to do some analytical assessment as to which of them
>> would be better? If you can come up with something concrete (maybe numbers
>> or formulae) we will be able to judge better as to which one to pick.

Will check if we can come up with some sensible analysis or figures.

>
> Or we can consider a hybrid approach of getting the rows in batches of
> 1000 or so, if possible. That way they get into coordinator memory in one
> shot and can be processed in batches. Obviously this should be considered
> only if it's not going to be a complicated implementation.

It just occurred to me that it would not be that hard to optimize the
row-fetching-by-ctid as shown below:

1. When it is time to fire the queued triggers at statement/transaction
end, initialize cursors - one cursor per datanode - which would do:
   SELECT remote_heap_fetch(table_name, '<ctidlist>');
We can form this ctidlist out of the trigger event list.

2. For each trigger event entry in the trigger queue, FETCH NEXT using the
appropriate cursor name according to the datanode id to which the trigger
entry belongs.

>>> Currently we fetch all attributes in the SELECT subplans. I have
>>> created another patch to fetch only the required attributes, but have
>>> not merged that into this patch.
>
> Do we have other places where we unnecessarily fetch all attributes?
> ISTM this should be fixed as a performance improvement first, ahead of
> everything else.
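For concreteness, the per-datanode cursor scheme could look roughly like the
SQL below. This is only a sketch: remote_heap_fetch() is the proposed (not yet
existing) function, and the cursor name, table name, and ctid list are made-up
placeholders.

```sql
-- Step 1 (sketch): one cursor per datanode, fetching all queued OLD rows
-- by ctid in a single remote scan. remote_heap_fetch() is the proposed
-- function; 'tab1' and the ctid list are illustrative only.
DECLARE datanode_1_cur CURSOR FOR
    SELECT remote_heap_fetch('tab1', '{"(0,1)","(0,3)","(2,7)"}');

-- Step 2 (sketch): while draining the trigger event queue, pull the next
-- row from the cursor belonging to that event's datanode id.
FETCH NEXT FROM datanode_1_cur;
```

This keeps only one batched round trip per datanode while still letting the
coordinator consume OLD rows one trigger event at a time.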
I believe the DML subplan is the only remaining place where we fetch all
attributes. And yes, this is a must-have for triggers; otherwise, the other
optimizations would be of no use.

>>> 2. One important TODO for BEFORE triggers is this: Just before
>>> invoking the trigger functions, in PG, the tuple is row-locked
>>> (exclusive) by GetTupleForTrigger() and the locked version is fetched
>>> from the table. So it is made sure that while all the triggers for
>>> that table are executed, no one can update that particular row.
>>> In the patch, we haven't locked the row. We need to lock it either by
>>> executing:
>>> 1. SELECT * FROM tab1 WHERE ctid = <ctid_val> FOR UPDATE, and then
>>> using the returned row as the OLD row,
>>> OR
>>> 2. The UPDATE subplan itself should have SELECT ... FOR UPDATE so that
>>> the row is already locked and we don't have to lock it again.
>>> #2 is simple, though it might cause somewhat longer waits in general.
>>> Using #1, though the locks would be acquired only when the particular
>>> row is updated, the locks would be released only after transaction
>>> end, so #1 might not be worth implementing.
>>> Also, #1 requires another explicit remote fetch for the
>>> lock-and-get-latest-version operation.
>>> I am more inclined towards #2.
>>>
>> Option #2, however, has the problem of locking too many rows if there are
>> coordinator quals in the subplans, IOW the number of rows finally updated
>> is smaller than the number of rows fetched from the datanode. It can cause
>> unwanted deadlocks. Unless there is a way to release these extra locks, I
>> am afraid this option will be a problem.

True. Regardless of anything else - whether it is deadlocks or longer waits
- we should not lock rows that are not to be updated. There is a more
general row-locking issue that we need to solve first: 3606317. I
anticipate that solving it will also solve the trigger-specific lock issue.
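For reference, option #1 above amounts to a per-row statement like the
following; the table name and ctid value are placeholders, not taken from the
patch itself.

```sql
-- Option #1 (sketch): lock and re-fetch the latest row version by ctid.
-- The returned tuple would serve as the OLD row for the BEFORE trigger.
-- Note that the row lock acquired here is released only at transaction
-- end, which is the main argument in the thread against this option.
SELECT * FROM tab1 WHERE ctid = '(0,1)' FOR UPDATE;
```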
So for triggers, this is a must-have, and I am going to solve this issue as
part of bug 3606317.

>
> Deadlocks? ISTM we can get more lock waits because of this, but I do
> not see deadlock scenarios.
>
> With the FQS shipping work being done by Ashutosh, will we also ship
> major chunks of subplans to the datanodes? If yes, then row locking
> will only involve the required tuples (hopefully) from the coordinator's
> point of view.
>
> Also, something radical: can we invent a new type of FOR [NODE] UPDATE
> lock to minimize the impact of such locking of rows on datanodes?
>
> Regards,
> Nikhils
>
>>>
>>> 3. The BEFORE trigger function can change the distribution column
>>> itself. We need to add a check at the end of the trigger executions.
>>>
>>
>> Good, you thought about that. Yes, we should check it.
>>
>>>
>>> 4. Fetching the OLD row for WHEN clause handling.
>>>
>>> 5. Testing with a mix of shippable and non-shippable ROW triggers.
>>>
>>> 6. Other types of triggers. INSTEAD triggers are anticipated to work
>>> without significant changes, but they are yet to be tested.
>>> INSERT/DELETE triggers: Most of the infrastructure has been done while
>>> implementing UPDATE triggers, but some changes specific to INSERT and
>>> DELETE are yet to be done.
>>> Deferred triggers are to be tested.
>>>
>>> 7. Regression analysis. There are some new failures. Will post another
>>> fair version of the patch after regression analysis and fixing various
>>> TODOs.
>>>
>>> Comments welcome.
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Everyone hates slow websites. So do we.
>>> Make your web apps faster with AppDynamics
>>> Download AppDynamics Lite for free today:
>>> http://p.sf.net/sfu/appdyn_d2d_feb
>>> _______________________________________________
>>> Postgres-xc-developers mailing list
>>> Pos...@li...
>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EnterpriseDB Corporation
>> The Enterprise Postgres Company
>
> --
> StormDB - http://www.stormdb.com
> The Database Cloud
> Postgres-XC Support and Service