From: Amit K. <ami...@en...> - 2013-02-19 04:38:56
Attachments:
before_after_update_row_WIP.patch
Below are details that cover the various points that need to be considered for the implementation of BEFORE UPDATE and AFTER UPDATE row triggers. Because UPDATE covers issues that might occur for both DELETE and INSERT, I have targeted UPDATE first. Attached is the WIP patch if you need to peek into specifics.

AFTER triggers
--------------

AFTER triggers are executed only at the end of the statement/transaction. The trigger data is saved in a trigger event queue along with the OLD and NEW ctid. Just before the query/transaction end, the OLD and NEW tuples corresponding to the OLD and NEW ctid are fetched using heap_fetch(SnapshotAny), and each of the saved trigger events is executed. For XC (in the patch), we fetch the tuple remotely from the datanode, and, in addition to the ctid, we store the xc_node_id as well in the trigger event data. I have created a function to do the remote fetch; this function would be called instead of heap_fetch().

To get the NEW row ctid, we add the junk attributes ctid and node_id in the RETURNING target list of the UPDATE. To get the OLD row ctid, we already have the ctid in the SELECT subplan.

Currently we fetch all attributes in the SELECT subplans. I have created another patch to fetch only the required attributes, but have not merged that into this patch. Ultimately, when triggers are present, we want the SELECT to look like this:
SELECT ctid, xc_node_id from tab1

Another way to do the AFTER ROW triggers would be to store all the new and old rows in coordinator memory instead of fetching them when needed. This has the potential of disk writes for a large number of updated rows.

When there is a WHEN condition on the OLD and NEW rows for AFTER triggers, the WHEN condition is evaluated immediately, not deferred until the query/transaction end, and the triggers whose conditions evaluate to false are not added to the event queue. So if WHEN is present, we need to make the OLD row available by fetching it using the SELECT subplan. For this we add another junk attribute, wholerow, in the SELECT target list. In this case, the SELECT will look like this:
SELECT ctid, xc_node_id, tab1.*::tab1 from tab1

BEFORE triggers
---------------

These are executed just before the row is inserted/updated. There can be multiple triggers on the same table. The ROW returned by a trigger function is the NEW row of the next trigger, while the OLD row remains the same across all of the triggers. So the NEW row can keep changing, because the trigger function can modify the NEW row before returning. Ultimately the last NEW row is the one that gets into the table.

We need to add the wholerow junk attribute in the SELECT for fetching the OLD row. The NEW row is created by overwriting the OLD row values with the modified column values. Once we have the OLD and NEW tuples, the rest of the work is taken care of by the existing PG trigger implementation.

We have been using the subplan slot to generate the data for BINDing the parameter values for the UPDATE. Now, for BEFORE ROW triggers, the data of the NEW row is the one to be BOUND, and this NEW tuple slot does not have the ctid and node_id junk attribute values. So we need to generate the BIND data using the data from the NEW row and the junk attributes from the source data. Similarly, there are changes in the way we generate the parameter markers for the remote UPDATE statement.
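To make these concrete, here is an illustrative pair of trigger definitions (the table, trigger and function names are made up, not taken from the patch):

CREATE TABLE tab1 (id int, val text) DISTRIBUTE BY HASH (id);

-- BEFORE ROW trigger: its function may return a modified NEW row, which
-- becomes the NEW row seen by the next BEFORE trigger in the chain.
CREATE TRIGGER tab1_before
    BEFORE UPDATE ON tab1
    FOR EACH ROW
    EXECUTE PROCEDURE tab1_before_func();

-- AFTER ROW trigger with a WHEN clause: the WHEN condition is evaluated
-- when the event is queued, not at statement end, so the coordinator must
-- already have the OLD row; hence the wholerow junk attribute in the
-- SELECT subplan.
CREATE TRIGGER tab1_after
    AFTER UPDATE ON tab1
    FOR EACH ROW
    WHEN (OLD.val IS DISTINCT FROM NEW.val)
    EXECUTE PROCEDURE tab1_after_func();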
TODOs:
======

1. The current implementation patch is a very basic working model. There are various TODOs marked in the patch.

2. One important TODO for the BEFORE trigger is this: just before invoking the trigger functions, PG row-locks the tuple (exclusive) in GetTupleForTrigger() and fetches the locked version from the table. This makes sure that while all the triggers for that table are executed, no one can update that particular row. In the patch, we haven't locked the row. We need to lock it either by executing:
1. SELECT * from tab1 where ctid = <ctid_val> FOR UPDATE, and then use the returned ROW as the OLD row.
OR
2. The UPDATE subplan itself should have SELECT FOR UPDATE, so that the row is already locked and we don't have to lock it again.
#2 is simpler, though it might cause somewhat longer waits in general. With #1, though the locks would be acquired only for rows that are actually updated, they would still be released only at transaction end, so #1 might not be worth implementing. Also, #1 requires another explicit remote fetch for the lock-and-get-latest-version operation. I am more inclined towards #2. (Both options are sketched in the PS below.)

3. The BEFORE trigger function can change the distribution column itself. We need to add a check at the end of the trigger executions.

4. Fetching the OLD row for WHEN clause handling.

5. Testing with a mix of shippable and non-shippable ROW triggers.

6. Other types of triggers. INSTEAD triggers are anticipated to work without significant changes, but they are yet to be tested. INSERT/DELETE triggers: most of the infrastructure has been done while implementing UPDATE triggers, but some changes specific to INSERT and DELETE are yet to be done. Deferred triggers are to be tested.

7. Regression analysis. There are some new failures. Will post another fair version of the patch after regression analysis and fixing the various TODOs.

Comments welcome.
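PS: The two locking options from TODO 2, sketched in SQL (the ctid value is illustrative, and the second statement only shows the shape the locking subplan would take):

-- Option #1: explicitly lock and re-fetch the row on the datanode, then
-- use the returned row as the OLD row (costs an extra remote round trip):
SELECT * FROM tab1 WHERE ctid = '(0,1)' FOR UPDATE;

-- Option #2: let the UPDATE's SELECT subplan itself lock whatever rows it
-- fetches, so no separate locking step is needed:
SELECT ctid, xc_node_id FROM tab1 FOR UPDATE;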
From: Ashutosh B. <ash...@en...> - 2013-03-01 07:39:39
Hi Amit,

Your write-up summarizes the requirements well; thanks. Let me comment on the open issues below.

Issue: whether we should fetch the whole row from the datanode (the OLD row), and not just the ctid, node_id and required columns, and store it at the coordinator for processing; OR whether we should fetch each row (OLD and NEW variants) while processing that row. Both of them have performance impacts: the first one has a disk impact for a large number of rows, whereas the second has a network impact for querying rows. Is it possible to do some analytical assessment as to which of them would be better? If you can come up with something concrete (maybe numbers or formulae) we will be able to judge better which one to pick.

See a few other comments inline.

On Tue, Feb 19, 2013 at 10:08 AM, Amit Khandekar <ami...@en...> wrote:
> 2. One important TODO for the BEFORE trigger is this: [...] We need to lock
> it either by executing:
> 1. SELECT * from tab1 where ctid = <ctid_val> FOR UPDATE, and then use the
> returned ROW as the OLD row.
> OR
> 2. The UPDATE subplan itself should have SELECT FOR UPDATE, so that the row
> is already locked and we don't have to lock it again.
> [...] I am more inclined towards #2.

Option #2, however, has the problem of locking too many rows if there are coordinator quals in the subplans, IOW when the number of rows finally updated is smaller than the number of rows fetched from the datanode. It can cause unwanted deadlocks. Unless there is a way to release these extra locks, I am afraid this option will be a problem.
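To illustrate with a hypothetical example (coord_only_qual stands for any qual that cannot be shipped and is therefore evaluated at the coordinator; tab1 is the table from upthread):

-- Under option #2 the datanode-side subplan becomes, roughly,
--   SELECT ctid, xc_node_id, ... FROM tab1 FOR UPDATE
-- so every fetched row of tab1 gets locked, even though only the rows
-- satisfying the coordinator-evaluated qual are actually updated:
UPDATE tab1 SET val = 'x' WHERE coord_only_qual(val);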
> 3. The BEFORE trigger function can change the distribution column
> itself. We need to add a check at the end of the trigger executions.

Good, you thought about that. Yes, we should check it.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Ashutosh B. <ash...@en...> - 2013-03-01 08:56:51
On Fri, Mar 1, 2013 at 1:53 PM, Nikhil Sontakke <ni...@st...> wrote:
> Deadlocks? ISTM, we can get more lock waits because of this, but I do not
> see deadlock scenarios..

:) The simple rule is: if the DBMS takes transaction-long locks that the application doesn't expect, there are bound to be deadlocks; these are application-induced deadlocks. An application would assume that statement A does not lock row rA, because rA is not being updated; but under option #2 it does get locked FOR UPDATE. The same application would assume that statement B does not lock row rB, in the same way.

With this context, consider the following sequence of events:

Statement A updates row rB.
Statement B updates row rA.
Statement A tries to update row rM, but XC tries to lock rA (because of option #2) and waits.
Statement B tries to update row rN, but XC tries to lock rB (because of option #2) and waits.

Neither A nor B can proceed, and thus they deadlock, even though the application doesn't expect them to.
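The same sequence as a two-session sketch (table, column and key values are illustrative; assume that, under option #2, each UPDATE's subplan fetches and locks rows beyond the ones it finally updates, because part of its qual is evaluated at the coordinator):

-- Session A                              -- Session B
BEGIN;                                    BEGIN;
UPDATE t SET v = v + 1 WHERE k = 'rB';    UPDATE t SET v = v + 1 WHERE k = 'rA';
UPDATE t SET v = v + 1 WHERE k = 'rM';    UPDATE t SET v = v + 1 WHERE k = 'rN';
-- subplan also fetches rA and tries      -- subplan also fetches rB and tries
-- to lock it: waits on session B         -- to lock it: waits on session A
-- => deadlock: each session waits on a lock the other already holds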
> With the FQS shipping work being done by Ashutosh, will we also ship major
> chunks of subplans to the datanodes? If yes, then row locking will only
> involve the required tuples (hopefully) from the coordinator's point of
> view.

The push-down will work only when the subplans are shippable, but if they are not ...

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Nikhil S. <ni...@st...> - 2013-03-02 01:51:14
Hmmm, this is going to be a pretty common scenario then. So no option other than #1, I guess.

Regards,
Nikhils

> Neither A nor B can proceed, and thus they deadlock, even though the
> application doesn't expect them to.

--
StormDB - http://www.stormdb.com
The Database Cloud
Postgres-XC Support and Service
From: Nikhil S. <ni...@st...> - 2013-03-01 19:22:38
> Issue: whether we should fetch the whole row from the datanode (the OLD
> row), and not just the ctid, node_id and required columns, and store it at
> the coordinator for processing; OR whether we should fetch each row (OLD
> and NEW variants) while processing that row.
>
> Both of them have performance impacts [...] Is it possible to do some
> analytical assessment as to which of them would be better?

Or we can consider a hybrid approach of getting the rows in batches of 1000 or so, if possible, as well. That way they get into coordinator memory in one shot and can be processed in batches. Obviously this should be considered only if it's not going to be a complicated implementation.

>> Currently we fetch all attributes in the SELECT subplans. I have created
>> another patch to fetch only the required attributes, but have not merged
>> that into this patch.

Do we have other places where we unnecessarily fetch all attributes? ISTM this should be fixed as a performance improvement first, ahead of everything else.

>> 2. One important TODO for the BEFORE trigger is this: [...] I am more
>> inclined towards #2.
>
> Option #2, however, has the problem of locking too many rows if there are
> coordinator quals in the subplans [...] It can cause unwanted deadlocks.
> Unless there is a way to release these extra locks, I am afraid this option
> will be a problem.

Deadlocks? ISTM, we can get more lock waits because of this, but I do not see deadlock scenarios..

With the FQS shipping work being done by Ashutosh, will we also ship major chunks of subplans to the datanodes? If yes, then row locking will only involve the required tuples (hopefully) from the coordinator's point of view.

Also, something radical: can we invent a new type of FOR [NODE] UPDATE lock to minimize the impact of such locking of rows on the datanodes?

Regards,
Nikhils

--
StormDB - http://www.stormdb.com
The Database Cloud
Postgres-XC Support and Service
From: Amit K. <ami...@en...> - 2013-03-04 05:47:22
On 1 March 2013 13:53, Nikhil Sontakke <ni...@st...> wrote:
>> Is it possible to do some analytical assessment as to which of them would
>> be better? If you can come up with something concrete (maybe numbers or
>> formulae) we will be able to judge better which one to pick.

Will check if we can come up with some sensible analysis or figures.

> Or we can consider a hybrid approach of getting the rows in batches of
> 1000 or so, if possible, as well. That way they get into coordinator
> memory in one shot and can be processed in batches.

It just occurred to me that it would not be that hard to optimize the row-fetching-by-ctid as shown below:
1. When it is time to fire the queued triggers at the statement/transaction end, initialize cursors - one cursor per datanode - which would do: SELECT remote_heap_fetch(table_name, '<ctidlist>'); We can form this ctidlist out of the trigger event list.
2. For each trigger event entry in the trigger queue, FETCH NEXT using the appropriate cursor name according to the datanode id to which the trigger entry belongs.
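A sketch of what the per-datanode cursor could look like (remote_heap_fetch is the proposed helper named above, whose exact signature is not settled; the ctid values are illustrative):

-- Built once per datanode, from the ctids of that datanode's queued
-- trigger events:
DECLARE dn1_trig_cur CURSOR FOR
    SELECT remote_heap_fetch('tab1', '{"(0,1)","(0,7)","(2,4)"}');

-- Then, while draining the trigger event queue, for each event that
-- belongs to datanode 1:
FETCH NEXT FROM dn1_trig_cur;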
>>> Currently we fetch all attributes in the SELECT subplans. I have created
>>> another patch to fetch only the required attributes, but have not merged
>>> that into this patch.
>
> Do we have other places where we unnecessarily fetch all attributes? ISTM
> this should be fixed as a performance improvement first, ahead of
> everything else.

I believe the DML subplan is the only remaining place where we fetch all attributes. And yes, this is a must-have for triggers; otherwise, the other optimizations would be of no use.

>> Option #2, however, has the problem of locking too many rows if there are
>> coordinator quals in the subplans [...] It can cause unwanted deadlocks.
>> Unless there is a way to release these extra locks, I am afraid this
>> option will be a problem.

True. Regardless of anything else - whether it is deadlocks or longer waits - we should not lock rows that are not to be updated.

There is a more general row-locking issue that we need to solve first: 3606317. I anticipate that solving it will also solve the trigger-specific lock issue. So for triggers this is a must-have, and I am going to solve it as part of bug 3606317.
From: Amit K. <ami...@en...> - 2013-03-26 03:26:53
On 4 March 2013 11:11, Amit Khandekar <ami...@en...> wrote:
> On 1 March 2013 13:53, Nikhil Sontakke <ni...@st...> wrote:
> >> Is it possible to do some analytical assessment as to which of them
> >> would be better? [...]
>
> Will check if we can come up with some sensible analysis or figures.

I have done some analysis on both of these approaches here:
https://docs.google.com/document/d/10QPPq_go_wHqKqhmOFXjJAokfdLR8OaUyZVNDu47GWk/edit?usp=sharing

In practical terms, we would anyway need to implement (B). The reason is that when the trigger has conditional execution (a WHEN clause), we *have* to fetch the rows beforehand, so there is no point in fetching all of them again at the end of the statement when we already have them locally. So it may be too ambitious to have both implementations, at least for this release. So I am focussing on (B) right now. We have two options:

1. Store all rows in palloc'ed memory, save the HeapTuple pointers in the trigger queue, and directly access the OLD and NEW rows through these pointers when needed. Here we will have no control over how much memory we use for the old and new records, and this might even hamper system performance, let alone XC performance.

2. The other option is to use a tuplestore. Here, we need to store the positions of the records in the tuplestore, and for a particular trigger event, fetch by position. From what I understand, a tuplestore can be advanced only sequentially, in either direction. So when the read pointer is at position 6 and we need to fetch the record at position 10, we need to call tuplestore_advance() 4 times, and this call involves palloc/pfree overhead because it calls tuplestore_gettuple(). But the trigger records are not accessed all that randomly: in fact, the set of trigger events for a particular event id is accessed in the same order in which it was queued. So for a particular event id, only the first access will require random access. A tuplestore supports multiple read pointers, so maybe we can make use of that and access the first record using the closest read pointer.
From: Ashutosh B. <ash...@en...> - 2013-03-26 10:23:23
On Tue, Mar 26, 2013 at 8:56 AM, Amit Khandekar <ami...@en...> wrote:
> In practical terms, we would anyway need to implement (B). [...] So it may
> be too ambitious to have both implementations, at least for this release.

I agree here. We can certainly optimize for various cases later, but we should have something which gives all the functionality (albeit at a lower performance for now).

> So I am focussing on (B) right now. We have two options:
>
> 1. Store all rows in palloc'ed memory [...]
> 2. The other option is to use a tuplestore. [...]

Using palloc will be a problem if the size of the data fetched is more than what can fit in memory. Also, palloc'ing frequently is going to be a performance problem. Let's see how the tuplestore approach goes.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Ashutosh B. <ash...@en...> - 2013-04-03 09:25:12
Hi Amit, Given the magnitude of the change, I think we should break this work into smaller self-sufficient patches and commit them (either on master or in a separate branch for trigger work). This will allow us to review and commit small amount of work and set it aside, rather than going over everything in every round. On Wed, Apr 3, 2013 at 10:46 AM, Amit Khandekar < ami...@en...> wrote: > > > > On 26 March 2013 15:53, Ashutosh Bapat <ash...@en...>wrote: > >> >> >> On Tue, Mar 26, 2013 at 8:56 AM, Amit Khandekar < >> ami...@en...> wrote: >> >>> >>> >>> On 4 March 2013 11:11, Amit Khandekar <ami...@en...>wrote: >>> >>>> On 1 March 2013 13:53, Nikhil Sontakke <ni...@st...> wrote: >>>> >> >>>> >> Issue: Whether we should fetch the whole from the datanode (OLD row) >>>> and not >>>> >> just ctid and node_id and required columns and store it at the >>>> coordinator >>>> >> for the processing OR whether we should fetch each row (OLD and NEW >>>> >> variants) while processing each row. >>>> >> >>>> >> Both of them have performance impacts - the first one has disk >>>> impact for >>>> >> large number of rows whereas the second has network impact for >>>> querying >>>> >> rows. Is it possible to do some analytical assessment as to which of >>>> them >>>> >> would be better? If you can come up with something concrete (may be >>>> numbers >>>> >> or formulae) we will be able to judge better as to which one to pick >>>> up. >>>> >>>> Will check if we can come up with some sensible analysis or figures. >>>> >>>> >>> I have done some analysis on both of these approaches here: >>> >>> https://docs.google.com/document/d/10QPPq_go_wHqKqhmOFXjJAokfdLR8OaUyZVNDu47GWk/edit?usp=sharing >>> >>> In practical terms, we anyways would need to implement (B). The reason >>> is because when the trigger has conditional execution(WHEN clause) we >>> *have* to fetch the rows beforehand, so there is no point in fetching all >>> of them again at the end of the statement when we already have them >>> locally. So may be it would be too ambitious to have have both >>> implementations, at least for this release. >>> >>> >> I agree here. We can certainly optimize for various cases later, but we >> should have something which would give all the functionality (albeit at a >> lower performance for now). >> >> >>> So I am focussing on (B) right now. We have two options: >>> >>> 1. Store all rows in palloced memory, and save the HeapTuple pointers in >>> the trigger queue, and directly access the OLD and NEW rows using these >>> pointers when needed. Here we will have no control over how much memory we >>> should use for the old and new records, and this might even hamper system >>> performance, let alone XC performance. >>> 2. Other option is to use tuplestore. Here, we need to store the >>> positions of the records in the tuplestore. So for a particular tigger >>> event, fetch by the position. From what I understand, tuplestore can be >>> advanced only sequentially in either direction. So when the read pointer is >>> at position 6 and we need to fetch a record at position 10, we need to call >>> tuplestore_advance() 4 times, and this call involves palloc/pfree overhead >>> because it calls tuplestore_gettuple(). But the trigger records are not >>> distributed so randomly. In fact a set of trigger events for a particular >>> event id are accessed in the same order as the order in which they are >>> queued. So for a particular event id, only the first access call will >>> require random access. 
tuplestore supports multiple read pointers, so may >>> be we can make use of that to access the first record using the closest >>> read pointer. >>> >>> >> Using palloc will be a problem if the size of data fetched is more that >> what could fit in memory. Also pallocing frequently is going to be >> performance problem. Let's see how does the tuple store approach go. >> > > While I am working on the AFTER ROW optimization, here's a patch that has > only BEFORE ROW trigger support, so that it can get you started with first > round of review. The regression is not analyzed fully yet. Besides the AR > trigger related changes, I have also stripped the logic of whether to run > the trigger on datanode or coordinator; this logic depends on both before > and after triggers. > > >> >> >>> >>> >>>> >> >>>> > >>>> > Or we can consider a hybrid approach of getting the rows in batches of >>>> > 1000 or so if possible as well. That ways they get into coordinator >>>> > memory in one shot and can be processed in batches. Obviously this >>>> > should be considered if it's not going to be a complicated >>>> > implementation. >>>> >>>> It just occurred to me that it would not be that hard to optimize the >>>> row-fetching-by-ctid as shown below: >>>> 1. When it is time to fire the queued triggers at the >>>> statement/transaction end, initialize cursors - one cursor per >>>> datanode - which would do: SELECT remote_heap_fetch(table_name, >>>> '<ctidlist>'); We can form this ctidlist out of the trigger even list. >>>> 2. For each trigger event entry in the trigger queue, FETCH NEXT using >>>> the appropriate cursor name according to the datanode id to which the >>>> trigger entry belongs. >>>> >>>> > >>>> >>> Currently we fetch all attributes in the SELECT subplans. I have >>>> >>> created another patch to fetch only the required attribtues, but >>>> have >>>> >>> not merged that into this patch. >>>> > >>>> > Do we have other places where we unnecessary fetch all attributes? >>>> > ISTM, this should be fixed as a performance improvement first ahead of >>>> > everything else. >>>> >>>> I believe DML subplan is the only remaining place where we fetch all >>>> attributes. And yes, this is a must-have for triggers, otherwise, the >>>> other optimizations would be of no use. >>>> >>>> > >>>> >>> 2. One important TODO for BEFORE trigger is this: Just before >>>> >>> invoking the trigger functions, in PG, the tuple is row-locked >>>> >>> (exclusive) by GetTupleTrigger() and the locked version is fetched >>>> >>> from the table. So it is made sure that while all the triggers for >>>> >>> that table are executed, no one can update that particular row. >>>> >>> In the patch, we haven't locked the row. We need to lock it either >>>> by >>>> >>> executing : >>>> >>> 1. SELECT * from tab1 where ctid = <ctid_val> FOR UPDATE, and >>>> then >>>> >>> use the returned ROW as the OLD row. >>>> >>> OR >>>> >>> 2. The UPDATE subplan itself should have SELECT for UPDATE so >>>> that >>>> >>> the row is already locked, and we don't have to lock it again. >>>> >>> #2 is simple though it might cause some amount of longer waits in >>>> general. >>>> >>> Using #1, though the locks would be acquired only when the >>>> particular >>>> >>> row is updated, the locks would be released only after transaction >>>> >>> end, so #1 might not be worth implementing. >>>> >>> Also #1 requires another explicit remote fetch for the >>>> >>> lock-and-get-latest-version operation. >>>> >>> I am more inclined towards #2. 
>>>> >>> Currently we fetch all attributes in the SELECT subplans. I have created another patch to fetch only the required attributes, but have not merged that into this patch.

>>>> > Do we have other places where we unnecessarily fetch all attributes? ISTM, this should be fixed as a performance improvement first, ahead of everything else.

>>>> I believe the DML subplan is the only remaining place where we fetch all attributes. And yes, this is a must-have for triggers; otherwise, the other optimizations would be of no use.

>>>> >>> 2. One important TODO for BEFORE triggers is this: Just before invoking the trigger functions, in PG, the tuple is row-locked (exclusive) by GetTupleForTrigger() and the locked version is fetched from the table. So it is made sure that while all the triggers for that table are executed, no one can update that particular row. In the patch, we haven't locked the row. We need to lock it either by executing:
>>>> >>> 1. SELECT * FROM tab1 WHERE ctid = <ctid_val> FOR UPDATE, and then use the returned row as the OLD row; OR
>>>> >>> 2. The UPDATE subplan itself should have SELECT FOR UPDATE, so that the row is already locked and we don't have to lock it again.
>>>> >>> #2 is simple, though it might cause somewhat longer waits in general. Using #1, though the locks would be acquired only when the particular row is updated, the locks would be released only after transaction end, so #1 might not be worth implementing. Also, #1 requires another explicit remote fetch for the lock-and-get-latest-version operation. I am more inclined towards #2.

>>>> >> Option #2, however, has the problem of locking too many rows if there are coordinator quals in the subplans, IOW the number of rows finally updated is smaller than the number of rows fetched from the datanode. It can cause unwanted deadlocks. Unless there is a way to release these extra locks, I am afraid this option will be a problem.

>>>> True. Regardless of anything else - whether it is deadlocks or longer waits - we should not lock rows that are not to be updated.

>>>> There is a more general row-locking issue that we need to solve first: 3606317. I anticipate that solving this will solve the trigger-specific lock issue. So for triggers, this is a must-have, and I am going to solve this issue as part of bug 3606317.

>>>> > Deadlocks? ISTM we can get more lock waits because of this, but I do not see deadlock scenarios..

>>>> > With the FQS shipping work being done by Ashutosh, will we also ship major chunks of subplans to the datanodes? If yes, then row locking will only involve required tuples (hopefully) from the coordinator's point of view.

>>>> > Also, something radical: can we invent a new type of FOR [NODE] UPDATE lock to minimize the impact of such locking of rows on datanodes?

>>>> >>> 3. The BEFORE trigger function can change the distribution column itself. We need to add a check at the end of the trigger executions (a rough sketch of such a check appears after this list).

>>>> >> Good, you thought about that. Yes, we should check it.

>>>> >>> 4. Fetching the OLD row for WHEN clause handling.

>>>> >>> 5. Testing with a mix of shippable and non-shippable ROW triggers.

>>>> >>> 6. Other types of triggers. INSTEAD OF triggers are anticipated to work without significant changes, but they are yet to be tested. INSERT/DELETE triggers: most of the infrastructure has been done while implementing UPDATE triggers, but some changes specific to INSERT and DELETE are yet to be done. Deferred triggers are to be tested.

>>>> >>> 7. Regression analysis. There are some new failures. Will post another fair version of the patch after regression analysis and fixing various TODOs.

>>>> >>> Comments welcome.
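A rough sketch of that distribution-column check (item 3), run once the last BEFORE ROW trigger has returned its tuple; get_dist_attnum() is hypothetical (in XC the distribution column lives in the relation's locator info), and the error wording is illustrative:

    #include "postgres.h"
    #include "access/htup.h"
    #include "utils/datum.h"
    #include "utils/rel.h"

    /* Hypothetical: returns the attribute number of the distribution
     * column, looked up from the relation's locator info. */
    extern AttrNumber get_dist_attnum(Relation rel);

    static void
    check_distcol_unchanged(Relation rel, HeapTuple oldtup, HeapTuple newtup)
    {
        TupleDesc   tupdesc = RelationGetDescr(rel);
        AttrNumber  attnum = get_dist_attnum(rel);
        Form_pg_attribute att = tupdesc->attrs[attnum - 1];
        bool        oldnull;
        bool        newnull;
        Datum       oldval = heap_getattr(oldtup, attnum, tupdesc, &oldnull);
        Datum       newval = heap_getattr(newtup, attnum, tupdesc, &newnull);

        /* Reject the row if the trigger changed the value or its nullness. */
        if (oldnull != newnull ||
            (!oldnull &&
             !datumIsEqual(oldval, newval, att->attbyval, att->attlen)))
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("BEFORE ROW trigger cannot change the distribution column")));
    }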
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
|
From: Amit K. <ami...@en...> - 2013-04-04 06:57:02
|
On 3 April 2013 15:10, Ashutosh Bapat <ash...@en...> wrote:

> Hi Amit,
> Given the magnitude of the change, I think we should break this work into smaller self-sufficient patches and commit them (either on master or in a separate branch for trigger work). This will allow us to review and commit a small amount of work and set it aside, rather than going over everything in every round.

I have created a new branch "rowtriggers" in postgres-xc.git.sourceforge.net/gitroot/postgres-xc/postgres-xc, where I have dumped incremental changes.
|
From: Ashutosh B. <ash...@en...> - 2013-04-04 14:18:09
|
Hi Amit,
Thanks for creating the branch and the commits. I will give my comments on each of the commits in a separate mail. I am starting with 1dc081ebe097e63009bab219231e55679db6fae0. Is that the correct one?

On Thu, Apr 4, 2013 at 12:26 PM, Amit Khandekar <ami...@en...> wrote:

> I have created a new branch "rowtriggers" in postgres-xc.git.sourceforge.net/gitroot/postgres-xc/postgres-xc, where I have dumped incremental changes.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
|
From: Ashutosh B. <ash...@en...> - 2013-04-05 07:09:04
|
Hi Amit,
This is a general comment about trigger execution on the coordinator.

Catalog tables are local to the coordinator, and any modifications to those happen at the coordinator in the same way as they would in PostgreSQL. I am not sure if one can create a trigger on a catalog table (I think not). In case we could create triggers on catalog tables, we would need to be careful to make sure that they execute the same way as in PostgreSQL.
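For what it's worth, stock PostgreSQL already refuses triggers on system catalogs: CreateTrigger() in commands/trigger.c contains a check roughly like the following (paraphrased, not quoted verbatim), so unless allow_system_table_mods is set, the question should not arise in XC either:

    /* Paraphrased from PostgreSQL's CreateTrigger(): triggers on system
     * catalogs are rejected outright. */
    if (!allowSystemTableMods && IsSystemRelation(rel))
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("permission denied: \"%s\" is a system catalog",
                        RelationGetRelationName(rel))));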
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
|
From: Ashutosh B. <ash...@en...> - 2013-04-10 12:16:35
|
Following are no-problem commits; I won't be sending separate mails for these:

commit d2a982cc1e188f6701db79affe138e186132f137
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:03:24 2013 +0545

    Prototype change for ExecProcNodeDMLInXC()

commit e65e2daf2e8d9a2eeb2160963c6e04a6b9316df9
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:22:28 2013 +0545

    Add the missing trigger-function header changes.

commit bddc56a2a3961ac52266b22d84e27cdb08e19017
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:24:26 2013 +0545

    Added missing jf_xc_node_id and jf_xc_wholerow fields in JunkFilter structure.

    modified: src/include/nodes/execnodes.h

commit 93dd6f9ff3d535856772181929dd41ef3e25f506
Author: Amit Khandekar <ami...@en...>
Date:   Thu Apr 4 12:28:24 2013 +0545

    Enable trigger support by removing the "ROW-trigger-feature-not-supported" message.

On Fri, Apr 5, 2013 at 12:38 PM, Ashutosh Bapat <ash...@en...> wrote:

> Hi Amit,
> This is a general comment about trigger execution on the coordinator. Catalog tables are local to the coordinator, and any modifications to those happen at the coordinator in the same way as they would in PostgreSQL.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
|