From: Nikhil S. <ni...@st...> - 2013-11-07 05:57:43
|
Isn't it fair to error out if there are no primary/unique keys on the replicated table? IMO, Mason's approach handles things pretty ok for tables with unique type of keys. Regards, Nikhils On Thu, Nov 7, 2013 at 11:15 AM, 鈴木 幸市 <ko...@in...> wrote: > Yes, we need to focus on such general solution for replicated tuple > identification. > > I'm afraid it may take much more research and implementation work. I > believe the submitted patch handles tuple replica based on the primary key > or other equivalents if available. If not, the code falls into the > current case, local CTID. The latter could lead to inconsistent replica > but it is far better than the current situation. > > For short-term solution, I think Mason's code looks reasonable if I > understand the patch correctly. > > Mason, do you have any more thoughts/comments? > --- > Koichi Suzuki > > On 2013/11/07, at 14:21, Amit Khandekar <ami...@en...> > wrote: > > > > > On 6 November 2013 18:31, Michael Paquier <mic...@gm...>wrote: > >> On Wed, Nov 6, 2013 at 3:28 PM, Amit Khandekar >> <ami...@en...> wrote: >> > What exactly does the PostgreSQL FDW doc say about updates and primary >> key ? >> By having a look here: >> >> http://www.postgresql.org/docs/9.3/static/fdw-callbacks.html#FDW-CALLBACKS-UPDATE >> It is recommended to use a kind of row ID or the primary key columns. >> In the case of XC row ID = CTID, and its uniqueness is not guaranteed >> except if coupled with a node ID, which I think it has... Using a CTID >> + node ID combination makes the analysis of tuple uniqueness >> impossible for replicated tables either way, so a primary key would be >> better IMO. >> >> > How does the postgres_fdw update a table that has no primary or unique >> key ? >> It uses the CTID when scanning remote tuples for UPDATE/DELETE, thing >> guarantying that tuples are unique in this case as the FDW deals with >> a single server, here is for example the case of 2 nodes listening >> ports 5432 and 5433. >> $ psql -p 5433 -c "CREATE TABLE aa (a int, b int);" >> CREATE TABLE >> >> On server with port 5432: >> =# CREATE EXTENSION postgres_fdw; >> CREATE EXTENSION >> =# CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw >> OPTIONS (host 'localhost', port '5432', dbname 'ioltas'); >> CREATE SERVER >> =# CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS >> (password ''); >> CREATE USER MAPPING >> =# CREATE FOREIGN TABLE aa_foreign (a int, b int) SERVER >> postgres_server OPTIONS (table_name 'aa'); >> CREATE FOREIGN TABLE >> =# explain verbose update aa_foreign set a = 1, b=2 where a = 1; >> QUERY PLAN >> >> -------------------------------------------------------------------------------- >> Update on public.aa_foreign (cost=100.00..144.40 rows=14 width=6) >> Remote SQL: UPDATE public.aa SET a = $2, b = $3 WHERE ctid = $1 >> -> Foreign Scan on public.aa_foreign (cost=100.00..144.40 rows=14 >> width=6) >> Output: 1, 2, ctid >> Remote SQL: SELECT ctid FROM public.aa WHERE ((a = 1)) FOR UPDATE >> (5 rows) >> And ctid is used for scanning... >> >> > In the patch, what do we do when the replicated table has no >> unique/primary >> > key ? >> I didn't look at the patch, but I think that replicated tables should >> also need a primary key. Let's imagine something like that with >> sessions S1 and S2 for a replication table, and 2 datanodes (1 session >> runs in common on 1 Coordinator and each Datanode): >> S1: INSERT VALUES foo in Dn1 >> S2: INSERT VALUES foo2 in Dn1 >> S2: INSERT VALUES foo2 in Dn2 >> S1: INSERT VALUES foo in Dn2 >> This will imply that those tuples have a different CTID, so a primary >> key would be necessary as I think that this is possible. >> > > If the patch does not handle the case of replicated table without unique > key, I think we should have a common solution which takes care of this case > also. Or else, if this solution can be extended to handle no-unique-key > case, then that would be good. But I think we would end up in having two > different implementations, one for unique-key method, and another for the > other method, which does not seem good. > > The method I had in mind was : > In the scan plan, fetch ctid, node_id from all the datanodes. Use UPDATE > where ctd = ? , but use nodeid-based method to generate the ExecNodes at > execute-time (enhance ExecNodes->en_expr evaluation so as to use the > nodeid from source plan, as against the distribution column that it > currently uses for distributed tables) . > But this method will not work as-is in case of non-shippable row triggers. > Because trigger needs to be fired only once per row, and we are going to > execute UPDATE for all of the ctids of a given row corresponding to all of > the datanodes. So somehow we should fire triggers only once. This method > will also hit performance, because currently we fetch *all* columns and not > just ctid, so it's better to first do that optimization of fetching only > reqd columns (there's one pending patch submitted in the mailing list, > which fixes this). > > This is just one approach, there might be better approaches.. > > Overall, I think if we decide to get this issue solved (and I think we > should really, this is a serious issue), sufficient resource time needs to > be given to think over and have discussions before we finalize the approach. > > > -- >> Michael >> > > > ------------------------------------------------------------------------------ > November Webinars for C, C++, Fortran Developers > Accelerate application performance with scalable programming models. > Explore > techniques for threading, error checking, porting, and tuning. Get the > most > from the latest Intel processors and coprocessors. See abstracts and > register > > http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk_______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > > > ------------------------------------------------------------------------------ > November Webinars for C, C++, Fortran Developers > Accelerate application performance with scalable programming models. > Explore > techniques for threading, error checking, porting, and tuning. Get the most > from the latest Intel processors and coprocessors. See abstracts and > register > http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- StormDB - http://www.stormdb.com The Database Cloud |