From: Koichi S. <koi...@gm...> - 2014-02-12 04:40:22
I'd vote for approach B. Some more input:

I'm not sure that Approach A works correctly. Approach C is what I suggested as well: essentially an OID, but local to a given table. It looks workable for both replicated and distributed tables. On the other hand, its impact is much bigger than Approach B's, and it could be much more than XC can handle as its own feature.

Approach B looks reasonable. One concern is whether it passes the current regression tests. I hope the fallout is not serious, because most of the tests use distributed tables, not replicated ones.

Regards;
---
Koichi Suzuki


2014-02-11 23:33 GMT+09:00 Abbas Butt <abb...@en...>:
>
> The summary of the discussion so far:
>
> Approach A: (Suggested by Amit)
> In the scan plan, fetch ctid and node_id from all the datanodes.
> While scanning, the tuples need to be fetched in the same order,
> perhaps using ORDER BY 1, 2, 3, ...
> Use UPDATE ... WHERE ctid = ?, but use a nodeid-based method to
> generate the ExecNodes at execute time (enhance ExecNodes->en_expr
> evaluation so that it uses the nodeid from the source plan, as against
> the distribution column that it currently uses for distributed tables).
> This method will not work as-is in the case of non-shippable row triggers,
> because a trigger needs to be fired only once per row, and we are going
> to execute UPDATE for all of the ctids of a given row corresponding
> to all of the datanodes. So somehow we would have to fire triggers only once.
> This method will also hurt performance, because currently we fetch *all*
> columns and not just ctid, so it's better to first do the optimization
> of fetching only the required columns (there is one pending patch on
> the mailing list which fixes this).
>
> Approach B: (Suggested by many)
> If the replicated table does not have a primary key or a unique not-null
> key, then error out on a non-shippable UPDATE or DELETE; otherwise use the
> patch sent by Mason, after some testing and refactoring.
>
> Approach C: (Suggested by Amit)
> Always have some kind of hidden (or system) column for replicated tables.
> Its type can be a serial type, or an int column with default
> nextval('sequence_type'), so that it will always be evaluated on the
> coordinator; use this column as the primary key.
>
> My vote is for approach B.
>
> Comments?
>
> Best Regards
>
>
> On Fri, Nov 8, 2013 at 10:05 AM, Amit Khandekar
> <ami...@en...> wrote:
>>
>> On 7 November 2013 19:50, Mason Sharp <ms...@tr...> wrote:
>>>
>>> On Thu, Nov 7, 2013 at 12:45 AM, Koichi Suzuki (鈴木 幸市) <ko...@in...> wrote:
>>>>
>>>> Yes, we need to focus on such a general solution for replicated tuple
>>>> identification.
>>>>
>>>> I'm afraid it may take much more research and implementation work. I
>>>> believe the submitted patch handles tuple replicas based on the primary
>>>> key or an equivalent, if available. If not, the code falls back to the
>>>> current behavior, local CTID. The latter could lead to inconsistent
>>>> replicas, but it is far better than the current situation.
>>>>
>>>> As a short-term solution, I think Mason's code looks reasonable, if I
>>>> understand the patch correctly.
>>>>
>>>> Mason, do you have any more thoughts/comments?
>>>
>>> I don't think it is unreasonable to require a primary/unique key to
>>> handle this case.
>>>
>>> I did some testing, but it would be nice if someone else gave it a test
>>> as well. I enabled statement logging to make sure it was doing the right
>>> thing.
>>>
>>> I see someone mentioned a patch out there to fetch only the needed
>>> columns. At the moment extra columns may be fetched, but at least the
>>> data remains consistent across the cluster, which is most important here.
>>> Unfortunately, I do not have time at the moment to improve this further.
>>> If someone else has time, that would be great, or it can be done as a
>>> separate commit.
>>>
>>> Anyway, since this is a critical issue, I think it should get committed
>>> to STABLE_1_1 once reviewed.
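The ordering requirement called out in Approach A is what protects against divergent sequence assignment on replicas. A minimal Python simulation of what goes wrong without it (a plain-Python model of the behavior under discussion, not XC code; all names are illustrative):

```python
# If each node applies nextval() to matching rows in its own scan order,
# the replicas end up inconsistent; a common ORDER BY restores determinism.

def apply_sequence(rows_in_scan_order, start=101):
    """Assign consecutive sequence values in the order rows are scanned."""
    return {row: start + i for i, row in enumerate(rows_in_scan_order)}

node1 = apply_sequence(["A", "B"])          # node 1 happens to scan A first
node2 = apply_sequence(["B", "A"])          # node 2 happens to scan B first
assert node1 != node2                       # replicas have diverged

# Forcing a deterministic order (an ORDER BY 1, 2, 3, ... analogue):
fixed1 = apply_sequence(sorted(["A", "B"]))
fixed2 = apply_sequence(sorted(["B", "A"]))
assert fixed1 == fixed2                     # replicas stay consistent
```

This is exactly the 101/102 swap Mason describes further down in the thread.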
>>>
>>>> ---
>>>> Koichi Suzuki
>>>>
>>>> On 2013/11/07, at 14:21, Amit Khandekar
>>>> <ami...@en...> wrote:
>>>>
>>>> On 6 November 2013 18:31, Michael Paquier <mic...@gm...> wrote:
>>>>>
>>>>> On Wed, Nov 6, 2013 at 3:28 PM, Amit Khandekar
>>>>> <ami...@en...> wrote:
>>>>> > What exactly does the PostgreSQL FDW doc say about updates and
>>>>> > primary keys?
>>>>> Having a look here:
>>>>> http://www.postgresql.org/docs/9.3/static/fdw-callbacks.html#FDW-CALLBACKS-UPDATE
>>>>> it is recommended to use some kind of row ID or the primary key columns.
>>>>> In the case of XC, row ID = CTID, and its uniqueness is not guaranteed
>>>>> except if coupled with a node ID, which I think it has... Using a CTID
>>>>> + node ID combination makes the analysis of tuple uniqueness
>>>>> impossible for replicated tables either way, so a primary key would be
>>>>> better IMO.
>>>>>
>>>>> > How does postgres_fdw update a table that has no primary or
>>>>> > unique key?
>>>>> It uses the CTID when scanning remote tuples for UPDATE/DELETE, which
>>>>> guarantees that tuples are unique in this case, since the FDW deals with
>>>>> a single server. Here is an example with 2 nodes listening on
>>>>> ports 5432 and 5433.
>>>>> $ psql -p 5433 -c "CREATE TABLE aa (a int, b int);"
>>>>> CREATE TABLE
>>>>>
>>>>> On the server with port 5432:
>>>>> =# CREATE EXTENSION postgres_fdw;
>>>>> CREATE EXTENSION
>>>>> =# CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw
>>>>> OPTIONS (host 'localhost', port '5433', dbname 'ioltas');
>>>>> CREATE SERVER
>>>>> =# CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS
>>>>> (password '');
>>>>> CREATE USER MAPPING
>>>>> =# CREATE FOREIGN TABLE aa_foreign (a int, b int) SERVER
>>>>> postgres_server OPTIONS (table_name 'aa');
>>>>> CREATE FOREIGN TABLE
>>>>> =# explain verbose update aa_foreign set a = 1, b = 2 where a = 1;
>>>>>                                    QUERY PLAN
>>>>> --------------------------------------------------------------------------------
>>>>>  Update on public.aa_foreign  (cost=100.00..144.40 rows=14 width=6)
>>>>>    Remote SQL: UPDATE public.aa SET a = $2, b = $3 WHERE ctid = $1
>>>>>    ->  Foreign Scan on public.aa_foreign  (cost=100.00..144.40 rows=14 width=6)
>>>>>          Output: 1, 2, ctid
>>>>>          Remote SQL: SELECT ctid FROM public.aa WHERE ((a = 1)) FOR UPDATE
>>>>> (5 rows)
>>>>> And ctid is used for scanning...
>>>>>
>>>>> > In the patch, what do we do when the replicated table has no
>>>>> > unique/primary key?
>>>>> I didn't look at the patch, but I think that replicated tables should
>>>>> also require a primary key. Let's imagine something like this with
>>>>> sessions S1 and S2 on a replicated table and 2 datanodes (each session
>>>>> runs on one Coordinator and each Datanode):
>>>>> S1: INSERT VALUES foo in Dn1
>>>>> S2: INSERT VALUES foo2 in Dn1
>>>>> S2: INSERT VALUES foo2 in Dn2
>>>>> S1: INSERT VALUES foo in Dn2
>>>>> Since I think this interleaving is possible, those tuples will have
>>>>> different CTIDs on each node, so a primary key would be necessary.
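The S1/S2 interleaving can be replayed in a few lines of Python (an illustrative model in which a node assigns ctids in local insertion order; not XC code):

```python
# The same logical row ends up with different ctids on different datanodes
# when concurrent inserts reach the nodes in different orders.

def replay(inserts):
    """Map each inserted value to the ctid a node would assign: (0, n)."""
    return {val: (0, i + 1) for i, val in enumerate(inserts)}

dn1 = replay(["foo", "foo2"])   # S1's insert reaches Dn1 first
dn2 = replay(["foo2", "foo"])   # S2's insert reaches Dn2 first

# 'foo' is (0,1) on Dn1 but (0,2) on Dn2: a local ctid alone cannot
# identify a replicated row cluster-wide.
assert dn1["foo"] != dn2["foo"]
```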
>>>>
>>>> If the patch does not handle the case of a replicated table without a
>>>> unique key, I think we should have a common solution which takes care of
>>>> this case as well. Or else, if this solution can be extended to handle the
>>>> no-unique-key case, that would be good. But I think we would end up having
>>>> two different implementations, one for the unique-key method and another
>>>> for the other method, which does not seem good.
>>>>
>>>> The method I had in mind was:
>>>> In the scan plan, fetch ctid and node_id from all the datanodes. Use
>>>> UPDATE ... WHERE ctid = ?, but use a nodeid-based method to generate the
>>>> ExecNodes at execute time (enhance ExecNodes->en_expr evaluation so that
>>>> it uses the nodeid from the source plan, as against the distribution
>>>> column that it currently uses for distributed tables).
>>>
>>> Would that work in all cases? What if 2 tuples on each node fulfill the
>>> criteria and a sequence value is being assigned? The tuples might be
>>> processed in a different order on each node, so the data ends up
>>> inconsistent (tuple A gets the value 101 and B gets 102 on node 1, while
>>> B gets 101 and A gets 102 on node 2). I am not sure it is worth trying to
>>> handle that case; better to just require a primary key or unique index.
>>
>> Yes, the tuples need to be fetched in the same order, perhaps using ORDER
>> BY 1, 2, 3, ... (I hope that if one column is found to have unique values
>> for a set of rows, sorting is not attempted again for the same set of rows
>> using the remaining columns.)
>>
>> Another approach can be considered where we would always have some kind
>> of hidden (or system) column for replicated tables. Its type can be a
>> serial type, or an int column with default nextval('sequence_type'), so
>> that it will always be evaluated on the coordinator; use this one instead
>> of the primary key. Or maybe create the table WITH OIDS.
>> But I am not sure whether OIDs get incremented/wrapped around in exactly
>> the same way on every node under all circumstances. Also, they are
>> documented as deprecated.
>>
>> These are just some thoughts for a long-term solution. I myself don't
>> have the bandwidth to work on XC at this moment, so I won't be able to
>> review the patch, and I don't want to fall into a situation where the
>> patch is reworked and I don't review it after all. But these are just
>> points to be thought of in case the would-be reviewer or the submitter
>> feels like extending/modifying this patch into a long-term solution that
>> takes care of tables without a primary key.
>>
>>>
>>>> But this method will not work as-is in the case of non-shippable row
>>>> triggers, because a trigger needs to be fired only once per row, and we
>>>> are going to execute UPDATE for all of the ctids of a given row
>>>> corresponding to all of the datanodes. So somehow we should fire triggers
>>>> only once. This method will also hurt performance, because currently we
>>>> fetch *all* columns and not just ctid, so it's better to first do the
>>>> optimization of fetching only the required columns (there is one pending
>>>> patch on the mailing list which fixes this).
>>>>
>>>> This is just one approach; there might be better ones.
>>>>
>>>> Overall, I think if we decide to get this issue solved (and I think we
>>>> really should, as this is a serious issue), sufficient resource time
>>>> needs to be given to think it over and have discussions before we
>>>> finalize the approach.
>>>>
>>>>> --
>>>>> Michael
>>>>
>>>> ------------------------------------------------------------------------------
>>>> November Webinars for C, C++, Fortran Developers
>>>> Accelerate application performance with scalable programming models.
>>>> Explore techniques for threading, error checking, porting, and tuning.
>>>> Get the most from the latest Intel processors and coprocessors.
>>>> See abstracts and register
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Postgres-xc-developers mailing list
>>>> Pos...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>
>>> --
>>> Mason Sharp
>>>
>>> TransLattice - http://www.translattice.com
>>> Distributed and Clustered Database Solutions
>>
>
> --
> Abbas
> Architect
>
> Ph: 92.334.5100153
> Skype ID: gabbasb
> www.enterprisedb.com
>
> Follow us on Twitter
> @EnterpriseDB
>
> Visit EnterpriseDB for tutorials, webinars, whitepapers and more
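As a closing illustration of Approach C from the summary above: because the hidden column's default is evaluated once on the coordinator, every replica of a row stores the same key, which can then identify the row on any node. A minimal Python model of that idea (illustrative names only, not XC code):

```python
from itertools import count

# Coordinator-side sequence (a nextval() analogue): the key is drawn once,
# before the row is sent to the datanodes.
hidden_seq = count(1)

def replicated_insert(nodes, row):
    """Assign the hidden key on the coordinator, then replicate to all nodes."""
    key = next(hidden_seq)
    for node in nodes:
        node[key] = dict(row)   # each node stores an identical copy
    return key

def replicated_update(nodes, key, changes):
    """UPDATE ... WHERE hidden_key = key hits the same logical row everywhere."""
    for node in nodes:
        node[key].update(changes)

dn1, dn2 = {}, {}
k = replicated_insert([dn1, dn2], {"a": 1})
replicated_update([dn1, dn2], k, {"a": 2})
assert dn1 == dn2               # replicas remain consistent
```

Unlike a local ctid, the key here is node-independent by construction, which is what makes non-shippable UPDATE/DELETE safe on replicated tables without a user-declared primary key.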