From: Ashutosh B. <ash...@en...> - 2013-04-04 14:18:09
Hi Amit,

Thanks for creating the branch and the commits. I will give my comments on each of the commits in separate mails. I am starting with 1dc081ebe097e63009bab219231e55679db6fae0. Is that the correct one?

On Thu, Apr 4, 2013 at 12:26 PM, Amit Khandekar <ami...@en...> wrote:

> On 3 April 2013 15:10, Ashutosh Bapat <ash...@en...> wrote:

>> Hi Amit,
>> Given the magnitude of the change, I think we should break this work into smaller self-sufficient patches and commit them (either on master or in a separate branch for the trigger work). This will allow us to review and commit small amounts of work and set them aside, rather than going over everything in every round.

> I have created a new branch "rowtriggers" in postgres-xc.git.sourceforge.net/gitroot/postgres-xc/postgres-xc, where I have dumped incremental changes.

>> On Wed, Apr 3, 2013 at 10:46 AM, Amit Khandekar <ami...@en...> wrote:

>>> On 26 March 2013 15:53, Ashutosh Bapat <ash...@en...> wrote:

>>>> On Tue, Mar 26, 2013 at 8:56 AM, Amit Khandekar <ami...@en...> wrote:

>>>>> On 4 March 2013 11:11, Amit Khandekar <ami...@en...> wrote:

>>>>>> On 1 March 2013 13:53, Nikhil Sontakke <ni...@st...> wrote:

>>>>>>>> Issue: Whether we should fetch the whole row from the datanode (the OLD row), not just the ctid, node_id and required columns, and store it at the coordinator for processing, OR whether we should fetch each row (OLD and NEW variants) while processing each row.

>>>>>>>> Both of them have performance impacts - the first one has a disk impact for a large number of rows, whereas the second has a network impact for querying rows. Is it possible to do some analytical assessment as to which of them would be better? If you can come up with something concrete (maybe numbers or formulae) we will be able to judge better which one to pick.

>>>>>> Will check if we can come up with some sensible analysis or figures.

>>>>> I have done some analysis on both of these approaches here:
>>>>> https://docs.google.com/document/d/10QPPq_go_wHqKqhmOFXjJAokfdLR8OaUyZVNDu47GWk/edit?usp=sharing

>>>>> In practical terms, we would anyway need to implement (B). The reason is that when the trigger has conditional execution (a WHEN clause) we *have* to fetch the rows beforehand, so there is no point in fetching all of them again at the end of the statement when we already have them locally. So it may be too ambitious to have both implementations, at least for this release.

>>>> I agree here. We can certainly optimize for various cases later, but we should have something which gives all the functionality (albeit at lower performance for now).

>>>>> So I am focusing on (B) right now. We have two options:

>>>>> 1. Store all rows in palloc'ed memory, save the HeapTuple pointers in the trigger queue, and directly access the OLD and NEW rows through these pointers when needed. Here we have no control over how much memory we use for the old and new records, and this might even hamper system performance, let alone XC performance.

>>>>> 2. The other option is to use a tuplestore. Here we need to store the positions of the records in the tuplestore, and for a particular trigger event fetch by that position. From what I understand, a tuplestore can be advanced only sequentially in either direction. So when the read pointer is at position 6 and we need to fetch the record at position 10, we need to call tuplestore_advance() 4 times, and this call involves palloc/pfree overhead because it calls tuplestore_gettuple(). But the trigger records are not distributed that randomly: the set of trigger events for a particular event id is accessed in the same order in which they were queued, so for a particular event id only the first access requires random access. tuplestore supports multiple read pointers, so maybe we can make use of that to access the first record using the closest read pointer.
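To make the tuplestore option (2) above concrete, here is a minimal sketch in the style of the backend code. The XcTrigRowStore struct and the xc_trig_* functions are invented names for illustration only, not actual patch code; the sketch assumes each queued trigger event just remembers the ordinal position of its row in a per-statement tuplestore and re-reads it through a dedicated read pointer when the event is fired.

/*
 * Illustrative sketch only (invented names): keep fetched OLD/NEW rows in a
 * tuplestore and re-read them by position when the queued events fire.
 */
#include "postgres.h"
#include "miscadmin.h"              /* work_mem */
#include "executor/tuptable.h"
#include "utils/tuplestore.h"

typedef struct XcTrigRowStore
{
    Tuplestorestate *ts;            /* spills to disk once work_mem is exceeded */
    int              readptr;       /* read pointer reserved for trigger firing */
    long             nstored;       /* number of rows appended so far */
    long             curpos;        /* position the read pointer currently points at */
} XcTrigRowStore;

static void
xc_trig_store_init(XcTrigRowStore *store)
{
    /* randomAccess = true so the store could also be rewound if ever needed */
    store->ts = tuplestore_begin_heap(true, false, work_mem);
    store->readptr = tuplestore_alloc_read_pointer(store->ts, 0);
    store->nstored = 0;
    store->curpos = 0;
}

/* Append a row; the returned position is what the trigger event queue remembers. */
static long
xc_trig_store_row(XcTrigRowStore *store, TupleTableSlot *slot)
{
    tuplestore_puttupleslot(store->ts, slot);
    return store->nstored++;
}

/*
 * Fetch the row stored at "pos" into "slot".  Because events of one statement
 * are fired in queue order, pos is normally equal to or slightly ahead of
 * curpos, so the advance loop is cheap; backward repositioning is omitted
 * here for brevity.
 */
static bool
xc_trig_fetch_row(XcTrigRowStore *store, long pos, TupleTableSlot *slot)
{
    tuplestore_select_read_pointer(store->ts, store->readptr);

    while (store->curpos < pos)
    {
        if (!tuplestore_advance(store->ts, true))
            return false;
        store->curpos++;
    }
    if (!tuplestore_gettupleslot(store->ts, true, false, slot))
        return false;
    store->curpos++;                /* gettupleslot advanced the read pointer */
    return true;
}

Since events are fired in queue order, the advance loop normally moves the read pointer forward by zero or one position, so the tuplestore_advance() overhead only shows up on the first, truly random access per event id.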
>>>> Using palloc will be a problem if the size of the data fetched is more than what fits in memory. Also, pallocing frequently is going to be a performance problem. Let's see how the tuplestore approach goes.

>>> While I am working on the AFTER ROW optimization, here's a patch that has only BEFORE ROW trigger support, so that it can get you started with a first round of review. The regression is not analyzed fully yet. Besides the AR trigger related changes, I have also stripped out the logic of whether to run the trigger on the datanode or the coordinator; that logic depends on both before and after triggers.

>>>>>>> Or we can consider a hybrid approach of getting the rows in batches of 1000 or so, if possible. That way they get into coordinator memory in one shot and can be processed in batches. Obviously this should only be considered if it's not going to be a complicated implementation.

>>>>>> It just occurred to me that it would not be that hard to optimize the row-fetching-by-ctid as shown below:
>>>>>> 1. When it is time to fire the queued triggers at the statement/transaction end, initialize cursors - one cursor per datanode - which would do: SELECT remote_heap_fetch(table_name, '<ctidlist>'); We can form this ctidlist out of the trigger event list.
>>>>>> 2. For each trigger event entry in the trigger queue, FETCH NEXT using the appropriate cursor name according to the datanode id to which the trigger entry belongs.
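A sketch of the per-datanode cursor idea in steps 1 and 2 above, limited to assembling the command strings; remote_heap_fetch() is the proposed (not yet existing) function from the description above, the trig_cur_<nodeid> cursor naming is invented, and dispatching the strings to the datanodes would go through the existing remote-query machinery. Real code would also need proper identifier and literal quoting.

/* Illustrative only: build the per-datanode DECLARE and FETCH statements. */
#include "postgres.h"
#include "lib/stringinfo.h"

/* DECLARE trig_cur_<node> CURSOR FOR SELECT remote_heap_fetch('<table>', '<ctid list>') */
static char *
xc_trig_build_declare(const char *relname, int nodeid, const char *ctidlist)
{
    StringInfoData buf;

    initStringInfo(&buf);
    appendStringInfo(&buf,
                     "DECLARE trig_cur_%d CURSOR FOR "
                     "SELECT remote_heap_fetch('%s', '%s')",
                     nodeid, relname, ctidlist);
    return buf.data;
}

/* FETCH issued once per queued trigger event belonging to that datanode. */
static char *
xc_trig_build_fetch(int nodeid)
{
    StringInfoData buf;

    initStringInfo(&buf);
    appendStringInfo(&buf, "FETCH NEXT FROM trig_cur_%d", nodeid);
    return buf.data;
}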
>>>>>>>>> Currently we fetch all attributes in the SELECT subplans. I have created another patch to fetch only the required attributes, but have not merged that into this patch.

>>>>>>> Do we have other places where we unnecessarily fetch all attributes? ISTM this should be fixed as a performance improvement first, ahead of everything else.

>>>>>> I believe the DML subplan is the only remaining place where we fetch all attributes. And yes, this is a must-have for triggers; otherwise the other optimizations would be of no use.

>>>>>>>>> 2. One important TODO for BEFORE triggers is this: just before invoking the trigger functions, in PG the tuple is row-locked (exclusive) by GetTupleForTrigger() and the locked version is fetched from the table. So it is made sure that while all the triggers for that table are executed, no one can update that particular row. In the patch, we haven't locked the row. We need to lock it by either:
>>>>>>>>> 1. executing SELECT * FROM tab1 WHERE ctid = <ctid_val> FOR UPDATE, and then using the returned row as the OLD row, OR
>>>>>>>>> 2. having the UPDATE subplan itself do SELECT ... FOR UPDATE, so that the row is already locked and we don't have to lock it again.
>>>>>>>>> #2 is simple, though it might cause somewhat longer waits in general. With #1, although the locks would be acquired only when the particular row is updated, they would be released only at transaction end, so #1 might not be worth implementing. Also, #1 requires another explicit remote fetch for the lock-and-get-latest-version operation. I am more inclined towards #2.

>>>>>>>> Option #2, however, has the problem of locking too many rows if there are coordinator quals in the subplans, IOW when the number of rows finally updated is smaller than the number of rows fetched from the datanode. It can cause unwanted deadlocks. Unless there is a way to release these extra locks, I am afraid this option will be a problem.

>>>>>> True. Regardless of anything else - whether it is deadlocks or longer waits - we should not lock rows that are not going to be updated.

>>>>>> There is a more general row-locking issue that we need to solve first: 3606317. I anticipate that solving it will also solve the trigger-specific lock issue. So for triggers this is a must-have, and I am going to address it as part of bug 3606317.

>>>>>>> Deadlocks? ISTM we can get more lock waits because of this, but I do not see deadlock scenarios.

>>>>>>> With the FQS shipping work being done by Ashutosh, will we also ship major chunks of subplans to the datanodes? If yes, then row locking will only involve the required tuples (hopefully) from the coordinator's point of view.

>>>>>>> Also, something radical: can we invent a new type of FOR [NODE] UPDATE lock to minimize the impact of such locking of rows on the datanodes?

>>>>>>> Regards,
>>>>>>> Nikhils

>>>>>>>>> 3. The BEFORE trigger function can change the distribution column itself. We need to add a check at the end of the trigger executions.

>>>>>>>> Good, you thought about that. Yes, we should check it.

>>>>>>>>> 4. Fetching the OLD row for WHEN clause handling.

>>>>>>>>> 5. Testing with a mix of shippable and non-shippable ROW triggers.

>>>>>>>>> 6. Other types of triggers. INSTEAD OF triggers are anticipated to work without significant changes, but they are yet to be tested. INSERT/DELETE triggers: most of the infrastructure was done while implementing UPDATE triggers, but some changes specific to INSERT and DELETE are yet to be done. Deferred triggers are yet to be tested.

>>>>>>>>> 7. Regression analysis. There are some new failures. Will post another fair version of the patch after regression analysis and fixing various TODOs.

>>>>>>>>> Comments welcome.
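Regarding TODO item 3 in the quoted list above (a BEFORE ROW trigger modifying the distribution column), here is a minimal sketch of what the post-trigger check could look like. The function name is invented for illustration, the distribution attribute number is assumed to come from the table's distribution metadata, and the use of datumIsEqual() assumes untoasted values; a real check might instead compare through the type's equality operator.

/*
 * Illustrative only: after the BEFORE ROW triggers have run, reject any
 * change to the distribution column, since the row can no longer be
 * re-routed to a different datanode at that point.
 */
#include "postgres.h"
#include "access/htup.h"
#include "utils/datum.h"
#include "utils/rel.h"

static void
xc_check_distcol_unchanged(Relation rel, AttrNumber distcol,
                           HeapTuple oldtuple, HeapTuple newtuple)
{
    TupleDesc   tupdesc = RelationGetDescr(rel);
    Form_pg_attribute att = tupdesc->attrs[distcol - 1];
    bool        oldisnull;
    bool        newisnull;
    Datum       oldval;
    Datum       newval;

    oldval = heap_getattr(oldtuple, distcol, tupdesc, &oldisnull);
    newval = heap_getattr(newtuple, distcol, tupdesc, &newisnull);

    if (oldisnull != newisnull ||
        (!oldisnull &&
         !datumIsEqual(oldval, newval, att->attbyval, att->attlen)))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("BEFORE trigger is not allowed to change the distribution column of \"%s\"",
                        RelationGetRelationName(rel))));
}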
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company