From: Abbas B. <abb...@en...> - 2014-02-12 12:16:02
On Wed, Feb 12, 2014 at 3:47 PM, Mason Sharp <ms...@tr...> wrote:

> On Wed, Feb 12, 2014 at 1:08 AM, 鈴木 幸市 <ko...@in...> wrote:
>
>> 2014/02/12 15:00, Ashutosh Bapat <ash...@en...> wrote:
>>
>> On Tue, Feb 11, 2014 at 8:03 PM, Abbas Butt <abb...@en...> wrote:
>>
>>> The summary of the discussion so far:
>>>
>>> Approach A: (Suggested by Amit)
>>> In the scan plan, fetch ctid, node_id from all the datanodes.
>>> While scanning, the tuples need to be fetched in the same order,
>>> maybe using order by 1, 2, 3, ...
>>> Use UPDATE where ctid = ? , but use nodeid-based method to
>>> generate the ExecNodes at execute-time (enhance ExecNodes->en_expr
>>> evaluation so as to use the nodeid from source plan, as against
>>> the distribution column that it currently uses for distributed tables).
>>> This method will not work as-is in case of non-shippable row triggers,
>>> because a trigger needs to be fired only once per row, and we are going
>>> to execute UPDATE for all of the ctids of a given row corresponding
>>> to all of the datanodes. So somehow we should fire triggers only once.
>>> This method will also hit performance, because currently we fetch *all*
>>> columns and not just ctid, so it's better to first do that optimization
>>> of fetching only the required columns (there's one pending patch
>>> submitted in the mailing list, which fixes this).
>>>
>>> Approach B: (Suggested by many)
>>> If the replicated table does not have a primary or unique not null key
>>> then error out on a non-shippable update or delete; otherwise use the
>>> patch sent by Mason after some testing and refactoring.
>>
>> This would break backward compatibility. Also, a table which is fairly
>> stable and doesn't have a primary key or unique key will need to be
>> distributed even though it's a perfect candidate for being a replicated
>> table. Also, we have to see if updating a primary or unique key would
>> cause a problem.
>>
>> Then we should keep using the same WHERE clause in the shipped statement too.
>> As pointed out, using ctid in a replicated table is very dangerous.
>
> I agree.
>
> I don't think it is unreasonable at all to require a primary key or unique
> index for replicated tables... normally one would want to do that. If they
> don't have a primary key, they themselves can just add a SERIAL at creation
> time and use that.
>
> As an alternative, all columns could be used as a fake primary key to try
> to find the particular row. In GridSQL we used that approach, but it does
> not seem so clean... I believe that there is a check in there that if
> multiple rows match the criteria, the operation fails since the row is not
> uniquely identifiable. In hindsight, I wish we had not bothered.
>
>>> Approach C: (Suggested by Amit)
>>> Always have some kind of a hidden (or system) column for replicated
>>> tables.
>>> Its type can be serial type, or an int column with default
>>> nextval('sequence_type')
>>> so that it will always be executed on coordinator and use this column
>>> as primary key.
>>
>> This looks a better approach, but also means that the inserts in the
>> replicated table have to be driven through the coordinator. This might not
>> be that stringent a condition, given that the replicated tables are
>> expected to be fairly stable. Any replicated table being inserted so often
>> would anyway get into the performance problem.
>>
>> I'm afraid it takes long effort to fix all the influences of this
>> change. What do you think about this? As I noted, approach C has a good
>> point. The issue is how long it takes. With approach B, we can easily
>> change this handling to approach C. I'd like to have your opinion on this.
>
> It seems unnecessary if the table already has a primary key or unique
> index. Anyway, approach C is the approach that I originally took with
> GridSQL/Stado, adding something called xrowid, but we later disabled it by
> default.
>

Was there any reason for disabling it other than code simplicity and
maintainability?

> In hindsight I would have saved the trouble and not implemented it to keep
> the code simpler and easier to maintain, and just left it up to the user to
> use a key.
>
> To summarize, I would go with B.
>

What is your stance on the fact that going with option B makes us backward
incompatible?

>
>>> My vote is for approach B.
>>
>> Whatever approach is taken, we will have to stick to it in future
>> versions of XC. We cannot keep on changing the user-visible functionality
>> with every version. Till 1.2 we didn't have the restriction that replicated
>> tables should have a primary key. Now introducing that requirement means
>> users have to modify their applications. If we change it again, in the next
>> version, we will break the compatibility again. So, considering the long
>> term benefit, we should go with C.
>
> This is reasonable to require at this stage since how it is done now is
> very dangerous. As previously mentioned, I have seen this cause problems
> with production data. I think people would gladly add a key in exchange
> for not having bad things happen with their data.
>
> Also, I would not release any new version of Postgres-XC without this fix.
> If a release of 1.2 is a ways away, there should be an intermediate 1.1.x
> release that fixes this soon. I would not recommend using Postgres-XC for
> people who will be updating replicated tables without the patch I
> submitted, it is too dangerous.
>
> --
> Mason Sharp
>
> TransLattice - http://www.translattice.com
> Distributed and Clustered Database Solutions
--
Abbas
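
For anyone following the thread, the concern about ctid-based updates on replicated tables can be illustrated with a toy model. This is only a sketch, not Postgres-XC code: each "datanode" is a Python list, the list index stands in for ctid, and every name here (`make_replicas`, `update_by_ctid`, `update_by_key`, `replicas_consistent`) is hypothetical. It assumes the replicas hold the same logical rows in different physical order, which is the situation the thread describes as dangerous. The key-based variant also refuses to act when a row is not uniquely identifiable, similar in spirit to the check Mason mentions.

```python
# Toy model only: list index stands in for ctid; nothing here is Postgres-XC API.

def make_replicas():
    """Two replicas with the same logical rows but different physical order."""
    rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
    node1 = [dict(r) for r in rows]
    node2 = [dict(r) for r in reversed(rows)]  # assumed divergent physical layout
    return [node1, node2]


def update_by_ctid(replicas, ctid, new_name):
    """Ship one physical position to every node (the risky pattern)."""
    for node in replicas:
        node[ctid]["name"] = new_name


def update_by_key(replicas, key, new_name):
    """Locate the row by key on every node first, then apply the change."""
    targets = []
    for node in replicas:
        matches = [r for r in node if r["id"] == key]
        if len(matches) != 1:
            # Fail before touching any node if the row is not unique.
            raise ValueError("row is not uniquely identifiable")
        targets.append(matches[0])
    for row in targets:
        row["name"] = new_name


def replicas_consistent(replicas):
    """True when every node holds the same set of (id, name) pairs."""
    def content(node):
        return sorted((r["id"], r["name"]) for r in node)
    first = content(replicas[0])
    return all(content(n) == first for n in replicas[1:])


if __name__ == "__main__":
    by_ctid = make_replicas()
    update_by_ctid(by_ctid, 0, "x")
    print(replicas_consistent(by_ctid))  # False: different logical rows changed

    by_key = make_replicas()
    update_by_key(by_key, 1, "x")
    print(replicas_consistent(by_key))   # True: same logical row on each node
```

In this model the ctid-style update changes row 1 on one node and row 3 on the other, so the replicas diverge, while the key-based update keeps them identical. It says nothing about how the actual patch is implemented.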