From: Abbas B. <abb...@en...> - 2014-02-11 14:33:11
The summary of the discussion so far:

Approach A (suggested by Amit): In the scan plan, fetch ctid and node_id
from all the datanodes. While scanning, the tuples need to be fetched in
the same order on every node, perhaps using ORDER BY 1, 2, 3, ... Then use
UPDATE ... WHERE ctid = ?, but use a nodeid-based method to generate the
ExecNodes at execute time (enhance ExecNodes->en_expr evaluation so that it
uses the nodeid from the source plan, as against the distribution column
that it currently uses for distributed tables). This method will not work
as-is in the case of non-shippable row triggers: a trigger needs to be
fired only once per row, but we are going to execute the UPDATE for every
ctid of a given row, one per datanode, so somehow we would have to fire the
triggers only once. This method will also hurt performance, because
currently we fetch *all* columns and not just the ctid, so it would be
better to first do the optimization of fetching only the required columns
(there is one pending patch submitted on the mailing list which fixes
this). A rough sketch of the statements this approach would generate is
given below.

Approach B (suggested by many): If the replicated table does not have a
primary key or a unique not-null key, error out on a non-shippable UPDATE
or DELETE; otherwise use the patch sent by Mason, after some testing and
refactoring.

Approach C (suggested by Amit): Always have some kind of hidden (or system)
column for replicated tables. Its type can be a serial type, or an int
column with DEFAULT nextval('sequence_type'), so that the default is always
evaluated on the coordinator; this column would then be used as the primary
key.

My vote is for Approach B. Comments?

Best Regards
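For illustration, here is a minimal sketch of the two statement shapes
Approach A implies for a non-shippable UPDATE on a replicated table. The
table rtab, its columns, and the predicate are hypothetical, and I am
assuming the xc_node_id system column as the source of the node id:

-- Step 1, sent to every datanode: fetch the row identifiers of the
-- qualifying rows, sorted on the user columns so that every node
-- returns matching tuples in the same order:
SELECT ctid, xc_node_id, a, b
    FROM rtab
    WHERE b > 10
    ORDER BY a, b;

-- Step 2, executed once per fetched ctid, targeted at only the
-- datanode that ctid came from (the nodeid-based ExecNodes part):
UPDATE rtab SET a = $2, b = $3 WHERE ctid = $1;

The trigger problem mentioned above follows directly from step 2: it runs
once per datanode for what is logically a single row.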
On Fri, Nov 8, 2013 at 10:05 AM, Amit Khandekar <ami...@en...> wrote:

> On 7 November 2013 19:50, Mason Sharp <ms...@tr...> wrote:
>
>> On Thu, Nov 7, 2013 at 12:45 AM, Koichi Suzuki <ko...@in...> wrote:
>>
>>> Yes, we need to focus on such a general solution for replicated tuple
>>> identification.
>>>
>>> I'm afraid it may take much more research and implementation work. I
>>> believe the submitted patch handles tuple replicas based on the primary
>>> key or another equivalent if available. If not, the code falls back to
>>> the current case, the local CTID. The latter could lead to inconsistent
>>> replicas, but it is far better than the current situation.
>>>
>>> For a short-term solution, I think Mason's code looks reasonable, if I
>>> understand the patch correctly.
>>>
>>> Mason, do you have any more thoughts/comments?
>>
>> I don't think it is unreasonable if a primary/unique key is required to
>> handle this case.
>>
>> I did some testing, but it would be nice if someone else gave it a test
>> as well. I enabled statement logging to make sure it was doing the right
>> thing.
>>
>> I see someone mentioned a patch out there to fetch only the needed
>> columns. At the moment extra columns may be used, but at least the data
>> remains consistent across the cluster, which is what matters most here.
>> Unfortunately, I do not have time at the moment to improve this further.
>> If someone else has time, that would be great, or it can be done as a
>> separate commit.
>>
>> Anyway, since this is a critical issue, I think it should get committed
>> to STABLE_1_1 once reviewed.
>>
>>> ---
>>> Koichi Suzuki
>>>
>>> On 2013/11/07, at 14:21, Amit Khandekar <ami...@en...> wrote:
>>>
>>> On 6 November 2013 18:31, Michael Paquier <mic...@gm...> wrote:
>>>
>>>> On Wed, Nov 6, 2013 at 3:28 PM, Amit Khandekar
>>>> <ami...@en...> wrote:
>>>> > What exactly does the PostgreSQL FDW doc say about updates and
>>>> > primary key ?
>>>> By having a look here:
>>>> http://www.postgresql.org/docs/9.3/static/fdw-callbacks.html#FDW-CALLBACKS-UPDATE
>>>> it is recommended to use a kind of row ID or the primary key columns.
>>>> In the case of XC, row ID = CTID, and its uniqueness is not guaranteed
>>>> except if coupled with a node ID. Even a CTID + node ID combination
>>>> cannot identify a logical tuple across the copies of a replicated
>>>> table, so a primary key would be better IMO.
>>>>
>>>> > How does the postgres_fdw update a table that has no primary or
>>>> > unique key ?
>>>> It uses the CTID when scanning remote tuples for UPDATE/DELETE, which
>>>> guarantees that tuples are unique in this case because the FDW deals
>>>> with a single server. Here is an example with 2 nodes listening on
>>>> ports 5432 and 5433.
>>>> $ psql -p 5433 -c "CREATE TABLE aa (a int, b int);"
>>>> CREATE TABLE
>>>>
>>>> On the server with port 5432:
>>>> =# CREATE EXTENSION postgres_fdw;
>>>> CREATE EXTENSION
>>>> =# CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw
>>>> OPTIONS (host 'localhost', port '5433', dbname 'ioltas');
>>>> CREATE SERVER
>>>> =# CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS
>>>> (password '');
>>>> CREATE USER MAPPING
>>>> =# CREATE FOREIGN TABLE aa_foreign (a int, b int) SERVER
>>>> postgres_server OPTIONS (table_name 'aa');
>>>> CREATE FOREIGN TABLE
>>>> =# explain verbose update aa_foreign set a = 1, b = 2 where a = 1;
>>>>                                    QUERY PLAN
>>>> --------------------------------------------------------------------------------
>>>>  Update on public.aa_foreign  (cost=100.00..144.40 rows=14 width=6)
>>>>    Remote SQL: UPDATE public.aa SET a = $2, b = $3 WHERE ctid = $1
>>>>    ->  Foreign Scan on public.aa_foreign  (cost=100.00..144.40 rows=14 width=6)
>>>>          Output: 1, 2, ctid
>>>>          Remote SQL: SELECT ctid FROM public.aa WHERE ((a = 1)) FOR UPDATE
>>>> (5 rows)
>>>> And ctid is used for scanning.
>>>>
>>>> > In the patch, what do we do when the replicated table has no
>>>> > unique/primary key ?
>>>> I didn't look at the patch, but I think that replicated tables should
>>>> also need a primary key. Imagine something like the following for a
>>>> replicated table with sessions S1 and S2 and 2 datanodes (each session
>>>> runs on one coordinator and each datanode):
>>>> S1: INSERT VALUES foo in Dn1
>>>> S2: INSERT VALUES foo2 in Dn1
>>>> S2: INSERT VALUES foo2 in Dn2
>>>> S1: INSERT VALUES foo in Dn2
>>>> Those tuples would then have different CTIDs on each node, and since I
>>>> think this interleaving is possible, a primary key would be necessary.
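To make this interleaving concrete, the physical outcome it can produce
would look like the following; the table name, values, and ctids are
purely illustrative:

-- S1: INSERT INTO r VALUES ('foo');   -- reaches Dn1 first, Dn2 last
-- S2: INSERT INTO r VALUES ('foo2');  -- reaches Dn1 second, Dn2 first
--
-- Datanode 1:  ctid (0,1) -> 'foo'     ctid (0,2) -> 'foo2'
-- Datanode 2:  ctid (0,1) -> 'foo2'    ctid (0,2) -> 'foo'
--
-- A broadcast DELETE FROM r WHERE ctid = '(0,1)' would now remove 'foo'
-- on Datanode 1 but 'foo2' on Datanode 2, silently diverging the
-- replicas; a ctid is only meaningful per node, never cluster-wide.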
>>> If the patch does not handle the case of a replicated table without a
>>> unique key, I think we should have a common solution which takes care
>>> of this case as well. Or else, if this solution can be extended to
>>> handle the no-unique-key case, that would be good. But I think we
>>> would end up having two different implementations, one for the
>>> unique-key method and another for the other method, which does not
>>> seem good.
>>>
>>> The method I had in mind was: in the scan plan, fetch ctid and node_id
>>> from all the datanodes. Use UPDATE ... WHERE ctid = ?, but use a
>>> nodeid-based method to generate the ExecNodes at execute time (enhance
>>> ExecNodes->en_expr evaluation so that it uses the nodeid from the
>>> source plan, as against the distribution column that it currently uses
>>> for distributed tables).
>>
>> Would that work in all cases? What if 2 tuples on each node fulfill the
>> criteria and a sequence value is being assigned? Might the tuples be
>> processed in a different order on each node, so that the data ends up
>> being inconsistent (tuple A gets the value 101 and B gets the value 102
>> on node 1, while B gets 101 and A gets 102 on node 2)? I am not sure it
>> is worth trying to handle that case; perhaps just require a primary key
>> or unique index.
>
> Yes, the tuples need to be fetched in the same order, maybe using ORDER
> BY 1, 2, 3, ... (I hope that if one column is found to have unique
> values for a set of rows, sorting is not attempted again on the
> remaining columns for that set of rows.)
>
> Another approach can be considered where we would always have some kind
> of hidden (or system) column for replicated tables. Its type can be a
> serial type, or an int column with default nextval('sequence_type'), so
> that the default will always be evaluated on the coordinator, and this
> column would be used in place of the primary key. Or maybe create the
> table WITH OIDS, but I am not sure whether OIDs get incremented and
> wrapped around exactly the same way under all circumstances. Also, they
> are documented as deprecated.
>
> These are just some thoughts for a long-term solution. I myself don't
> have the bandwidth to work on XC at this moment, so I won't be able to
> review the patch, and I don't want to fall into a situation where the
> patch is reworked and I don't review it after all. But these are points
> to be thought about in case the would-be reviewer or the submitter feels
> like extending/modifying this patch into a long-term solution that takes
> care of tables without a primary key.
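As a footnote to the hidden-column idea above (Approach C in my summary),
here is a minimal sketch of the equivalent done with an ordinary visible
column rather than a true system column; the table, column, and sequence
names are hypothetical:

CREATE SEQUENCE rtab_rowid_seq;
CREATE TABLE rtab (
    rowid bigint DEFAULT nextval('rtab_rowid_seq'),  -- default evaluated
                                                     -- on the coordinator
    a     int,
    b     text
) DISTRIBUTE BY REPLICATION;

-- Every copy of a row now carries the same identifier, so a
-- non-shippable UPDATE can target a logical row on all nodes at once:
UPDATE rtab SET b = $2 WHERE rowid = $1;

The real implementation would of course hide the column and enforce its
uniqueness, but the statement shapes would be the same.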
Get the >>> most >>> from the latest Intel processors and coprocessors. See abstracts and >>> register >>> >>> http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >> >> >> -- >> Mason Sharp >> >> TransLattice - http://www.translattice.com >> Distributed and Clustered Database Solutions >> >> >> > > > ------------------------------------------------------------------------------ > November Webinars for C, C++, Fortran Developers > Accelerate application performance with scalable programming models. > Explore > techniques for threading, error checking, porting, and tuning. Get the most > from the latest Intel processors and coprocessors. See abstracts and > register > http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- -- *Abbas* Architect Ph: 92.334.5100153 Skype ID: gabbasb www.enterprisedb.co <http://www.enterprisedb.com/>m<http://www.enterprisedb.com/> *Follow us on Twitter* @EnterpriseDB Visit EnterpriseDB for tutorials, webinars, whitepapers<http://www.enterprisedb.com/resources-community>and more<http://www.enterprisedb.com/resources-community> |