From: Koichi S. <koi...@gm...> - 2014-02-12 04:40:22
I'd vote for approach B. Some more input:

I'm not sure that Approach A works correctly. Approach C is what I suggested as well: essentially an OID, but local to a given table. It looks workable for both replicated and distributed tables. On the other hand, its impact is much bigger than Approach B's, and it could be much more than XC can handle as its own feature.

Approach B looks reasonable. One concern is whether it passes the current regression tests. I hope the fallout is not serious, because most of the tests use distributed tables, not replicated ones.

Regards;
---
Koichi Suzuki


2014-02-11 23:33 GMT+09:00 Abbas Butt <abb...@en...>:
>
> The summary of the discussion so far:
>
> Approach A: (Suggested by Amit)
> In the scan plan, fetch ctid and node_id from all the datanodes.
> While scanning, the tuples need to be fetched in the same order,
> perhaps using ORDER BY 1, 2, 3, ...
> Use UPDATE ... WHERE ctid = ?, but use a nodeid-based method to
> generate the ExecNodes at execute time (enhance ExecNodes->en_expr
> evaluation so that it uses the nodeid from the source plan, as against
> the distribution column that it currently uses for distributed tables).
> This method will not work as-is in the case of non-shippable row triggers,
> because a trigger needs to be fired only once per row, and we are going
> to execute UPDATE for all of the ctids of a given row corresponding
> to all of the datanodes. So somehow we would have to fire triggers only once.
> This method will also hurt performance, because currently we fetch *all*
> columns and not just ctid, so it's better to first do the optimization
> of fetching only the required columns (there is one pending patch on
> the mailing list which fixes this).
>
> Approach B: (Suggested by many)
> If the replicated table does not have a primary key or a unique not-null
> key, then error out on a non-shippable UPDATE or DELETE; otherwise use the
> patch sent by Mason, after some testing and refactoring.
>
> Approach C: (Suggested by Amit)
> Always have some kind of hidden (or system) column for replicated tables.
> Its type can be a serial type, or an int column with default
> nextval('sequence_type'), so that it will always be evaluated on the
> coordinator; use this column as the primary key.
>
> My vote is for approach B.
>
> Comments?
>
> Best Regards
>
>
> On Fri, Nov 8, 2013 at 10:05 AM, Amit Khandekar
> <ami...@en...> wrote:
>>
>> On 7 November 2013 19:50, Mason Sharp <ms...@tr...> wrote:
>>>
>>> On Thu, Nov 7, 2013 at 12:45 AM, Koichi Suzuki (鈴木 幸市) <ko...@in...> wrote:
>>>>
>>>> Yes, we need to focus on such a general solution for replicated tuple
>>>> identification.
>>>>
>>>> I'm afraid it may take much more research and implementation work. I
>>>> believe the submitted patch handles tuple replicas based on the primary
>>>> key or an equivalent, if available. If not, the code falls back to the
>>>> current behavior, local CTID. The latter could lead to inconsistent
>>>> replicas, but it is far better than the current situation.
>>>>
>>>> As a short-term solution, I think Mason's code looks reasonable, if I
>>>> understand the patch correctly.
>>>>
>>>> Mason, do you have any more thoughts/comments?
>>>
>>> I don't think it is unreasonable to require a primary/unique key to
>>> handle this case.
>>>
>>> I did some testing, but it would be nice if someone else gave it a test
>>> as well. I enabled statement logging to make sure it was doing the right
>>> thing.
>>>
>>> I see someone mentioned a patch out there to fetch only the needed
>>> columns. At the moment extra columns may be fetched, but at least the
>>> data remains consistent across the cluster, which is most important here.
>>> Unfortunately, I do not have time at the moment to improve this further.
>>> If someone else has time, that would be great, or it can be done as a
>>> separate commit.
>>>
>>> Anyway, since this is a critical issue, I think it should get committed
>>> to STABLE_1_1 once reviewed.
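The ordering requirement called out in Approach A is what protects against divergent sequence assignment on replicas. A minimal Python simulation of what goes wrong without it (a plain-Python model of the behavior under discussion, not XC code; all names are illustrative):

```python
# If each node applies nextval() to matching rows in its own scan order,
# the replicas end up inconsistent; a common ORDER BY restores determinism.

def apply_sequence(rows_in_scan_order, start=101):
    """Assign consecutive sequence values in the order rows are scanned."""
    return {row: start + i for i, row in enumerate(rows_in_scan_order)}

node1 = apply_sequence(["A", "B"])          # node 1 happens to scan A first
node2 = apply_sequence(["B", "A"])          # node 2 happens to scan B first
assert node1 != node2                       # replicas have diverged

# Forcing a deterministic order (an ORDER BY 1, 2, 3, ... analogue):
fixed1 = apply_sequence(sorted(["A", "B"]))
fixed2 = apply_sequence(sorted(["B", "A"]))
assert fixed1 == fixed2                     # replicas stay consistent
```

This is exactly the 101/102 swap Mason describes further down in the thread.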
>>>
>>>> ---
>>>> Koichi Suzuki
>>>>
>>>> On 2013/11/07, at 14:21, Amit Khandekar
>>>> <ami...@en...> wrote:
>>>>
>>>> On 6 November 2013 18:31, Michael Paquier <mic...@gm...> wrote:
>>>>>
>>>>> On Wed, Nov 6, 2013 at 3:28 PM, Amit Khandekar
>>>>> <ami...@en...> wrote:
>>>>> > What exactly does the PostgreSQL FDW doc say about updates and
>>>>> > primary keys?
>>>>> Having a look here:
>>>>> http://www.postgresql.org/docs/9.3/static/fdw-callbacks.html#FDW-CALLBACKS-UPDATE
>>>>> it is recommended to use some kind of row ID or the primary key columns.
>>>>> In the case of XC, row ID = CTID, and its uniqueness is not guaranteed
>>>>> except if coupled with a node ID, which I think it has... Using a CTID
>>>>> + node ID combination makes the analysis of tuple uniqueness
>>>>> impossible for replicated tables either way, so a primary key would be
>>>>> better IMO.
>>>>>
>>>>> > How does postgres_fdw update a table that has no primary or
>>>>> > unique key?
>>>>> It uses the CTID when scanning remote tuples for UPDATE/DELETE, which
>>>>> guarantees that tuples are unique in this case, since the FDW deals with
>>>>> a single server. Here is an example with 2 nodes listening on
>>>>> ports 5432 and 5433.
>>>>> $ psql -p 5433 -c "CREATE TABLE aa (a int, b int);"
>>>>> CREATE TABLE
>>>>>
>>>>> On the server with port 5432:
>>>>> =# CREATE EXTENSION postgres_fdw;
>>>>> CREATE EXTENSION
>>>>> =# CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw
>>>>> OPTIONS (host 'localhost', port '5433', dbname 'ioltas');
>>>>> CREATE SERVER
>>>>> =# CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS
>>>>> (password '');
>>>>> CREATE USER MAPPING
>>>>> =# CREATE FOREIGN TABLE aa_foreign (a int, b int) SERVER
>>>>> postgres_server OPTIONS (table_name 'aa');
>>>>> CREATE FOREIGN TABLE
>>>>> =# explain verbose update aa_foreign set a = 1, b = 2 where a = 1;
>>>>>                                    QUERY PLAN
>>>>> --------------------------------------------------------------------------------
>>>>>  Update on public.aa_foreign  (cost=100.00..144.40 rows=14 width=6)
>>>>>    Remote SQL: UPDATE public.aa SET a = $2, b = $3 WHERE ctid = $1
>>>>>    ->  Foreign Scan on public.aa_foreign  (cost=100.00..144.40 rows=14 width=6)
>>>>>          Output: 1, 2, ctid
>>>>>          Remote SQL: SELECT ctid FROM public.aa WHERE ((a = 1)) FOR UPDATE
>>>>> (5 rows)
>>>>> And ctid is used for scanning...
>>>>>
>>>>> > In the patch, what do we do when the replicated table has no
>>>>> > unique/primary key?
>>>>> I didn't look at the patch, but I think that replicated tables should
>>>>> also require a primary key. Let's imagine something like this with
>>>>> sessions S1 and S2 on a replicated table and 2 datanodes (each session
>>>>> runs on one Coordinator and each Datanode):
>>>>> S1: INSERT VALUES foo in Dn1
>>>>> S2: INSERT VALUES foo2 in Dn1
>>>>> S2: INSERT VALUES foo2 in Dn2
>>>>> S1: INSERT VALUES foo in Dn2
>>>>> Since I think this interleaving is possible, those tuples will have
>>>>> different CTIDs on each node, so a primary key would be necessary.
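The S1/S2 interleaving can be replayed in a few lines of Python (an illustrative model in which a node assigns ctids in local insertion order; not XC code):

```python
# The same logical row ends up with different ctids on different datanodes
# when concurrent inserts reach the nodes in different orders.

def replay(inserts):
    """Map each inserted value to the ctid a node would assign: (0, n)."""
    return {val: (0, i + 1) for i, val in enumerate(inserts)}

dn1 = replay(["foo", "foo2"])   # S1's insert reaches Dn1 first
dn2 = replay(["foo2", "foo"])   # S2's insert reaches Dn2 first

# 'foo' is (0,1) on Dn1 but (0,2) on Dn2: a local ctid alone cannot
# identify a replicated row cluster-wide.
assert dn1["foo"] != dn2["foo"]
```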
>>>>
>>>> If the patch does not handle the case of a replicated table without a
>>>> unique key, I think we should have a common solution which takes care of
>>>> this case as well. Or else, if this solution can be extended to handle the
>>>> no-unique-key case, that would be good. But I think we would end up having
>>>> two different implementations, one for the unique-key method and another
>>>> for the other method, which does not seem good.
>>>>
>>>> The method I had in mind was:
>>>> In the scan plan, fetch ctid and node_id from all the datanodes. Use
>>>> UPDATE ... WHERE ctid = ?, but use a nodeid-based method to generate the
>>>> ExecNodes at execute time (enhance ExecNodes->en_expr evaluation so that
>>>> it uses the nodeid from the source plan, as against the distribution
>>>> column that it currently uses for distributed tables).
>>>
>>> Would that work in all cases? What if 2 tuples on each node fulfill the
>>> criteria and a sequence value is being assigned? The tuples might be
>>> processed in a different order on each node, so the data ends up
>>> inconsistent (tuple A gets the value 101 and B gets 102 on node 1, while
>>> B gets 101 and A gets 102 on node 2). I am not sure it is worth trying to
>>> handle that case; better to just require a primary key or unique index.
>>
>> Yes, the tuples need to be fetched in the same order, perhaps using ORDER
>> BY 1, 2, 3, ... (I hope that if one column is found to have unique values
>> for a set of rows, sorting is not attempted again for the same set of rows
>> using the remaining columns.)
>>
>> Another approach can be considered where we would always have some kind
>> of hidden (or system) column for replicated tables. Its type can be a
>> serial type, or an int column with default nextval('sequence_type'), so
>> that it will always be evaluated on the coordinator; use this one instead
>> of the primary key. Or maybe create the table WITH OIDS.
>> But I am not sure whether OIDs get incremented/wrapped around in exactly
>> the same way on every node under all circumstances. Also, they are
>> documented as deprecated.
>>
>> These are just some thoughts for a long-term solution. I myself don't
>> have the bandwidth to work on XC at this moment, so I won't be able to
>> review the patch, and I don't want to fall into a situation where the
>> patch is reworked and I don't review it after all. But these are just
>> points to be thought of in case the would-be reviewer or the submitter
>> feels like extending/modifying this patch into a long-term solution that
>> takes care of tables without a primary key.
>>
>>>
>>>> But this method will not work as-is in the case of non-shippable row
>>>> triggers, because a trigger needs to be fired only once per row, and we
>>>> are going to execute UPDATE for all of the ctids of a given row
>>>> corresponding to all of the datanodes. So somehow we should fire triggers
>>>> only once. This method will also hurt performance, because currently we
>>>> fetch *all* columns and not just ctid, so it's better to first do the
>>>> optimization of fetching only the required columns (there is one pending
>>>> patch on the mailing list which fixes this).
>>>>
>>>> This is just one approach; there might be better ones.
>>>>
>>>> Overall, I think if we decide to get this issue solved (and I think we
>>>> really should, as this is a serious issue), sufficient resource time
>>>> needs to be given to think it over and have discussions before we
>>>> finalize the approach.
>>>>
>>>>> --
>>>>> Michael
>>>>
>>>> ------------------------------------------------------------------------------
>>>> November Webinars for C, C++, Fortran Developers
>>>> Accelerate application performance with scalable programming models.
>>>> Explore techniques for threading, error checking, porting, and tuning.
>>>> Get the most from the latest Intel processors and coprocessors.
>>>> See abstracts and register
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Postgres-xc-developers mailing list
>>>> Pos...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>
>>> --
>>> Mason Sharp
>>>
>>> TransLattice - http://www.translattice.com
>>> Distributed and Clustered Database Solutions
>>
>
> --
> Abbas
> Architect
>
> Ph: 92.334.5100153
> Skype ID: gabbasb
> www.enterprisedb.com
>
> Follow us on Twitter
> @EnterpriseDB
>
> Visit EnterpriseDB for tutorials, webinars, whitepapers and more
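As a closing illustration of Approach C from the summary above: because the hidden column's default is evaluated once on the coordinator, every replica of a row stores the same key, which can then identify the row on any node. A minimal Python model of that idea (illustrative names only, not XC code):

```python
from itertools import count

# Coordinator-side sequence (a nextval() analogue): the key is drawn once,
# before the row is sent to the datanodes.
hidden_seq = count(1)

def replicated_insert(nodes, row):
    """Assign the hidden key on the coordinator, then replicate to all nodes."""
    key = next(hidden_seq)
    for node in nodes:
        node[key] = dict(row)   # each node stores an identical copy
    return key

def replicated_update(nodes, key, changes):
    """UPDATE ... WHERE hidden_key = key hits the same logical row everywhere."""
    for node in nodes:
        node[key].update(changes)

dn1, dn2 = {}, {}
k = replicated_insert([dn1, dn2], {"a": 1})
replicated_update([dn1, dn2], k, {"a": 2})
assert dn1 == dn2               # replicas remain consistent
```

Unlike a local ctid, the key here is node-independent by construction, which is what makes non-shippable UPDATE/DELETE safe on replicated tables without a user-declared primary key.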