Re: [Postgres-xc-developers] Replicated update and delete

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

On Wed, Feb 12, 2014 at 3:47 PM, Mason Sharp <ms...@tr...>wrote:

>
>
>
> On Wed, Feb 12, 2014 at 1:08 AM, 鈴木 幸市 <ko...@in...> wrote:
>
>>  2014/02/12 15:00、Ashutosh Bapat <ash...@en...> のメール：
>>
>>
>>
>>
>> On Tue, Feb 11, 2014 at 8:03 PM, Abbas Butt <abb...@en...>wrote:
>>
>>>
>>>  The summary of the discussion so far:
>>>
>>>  Approach A: (Suggested by Amit)
>>> In the scan plan, fetch ctid, node_id from all the datanodes.
>>> While scanning, the tuples need to be fetched in the same order,
>>>  may be using order by 1, 2, 3, ...
>>>  Use UPDATE where ctd = ? , but use nodeid-based method to
>>> generate the ExecNodes at execute-time (enhance ExecNodes->en_expr
>>> evaluation so as to use the nodeid from source plan, as against
>>> the distribution column that it currently uses for distributed tables).
>>>  This method will not work as-is in case of non-shippable row triggers.
>>> Because trigger needs to be fired only once per row, and we are going
>>> to execute UPDATE for all of the ctids of a given row corresponding
>>> to all of the datanodes. So somehow we should fire triggers only once.
>>> This method will also hit performance, because currently we fetch *all*
>>> columns and not just ctid, so it's better to first do that optimization
>>> of fetching only reqd columns (there's one pending patch submitted in
>>> the mailing list, which fixes this).
>>>
>>
>>>  Approach B: (Suggested by many)
>>> If the replicated table does not have primary or unique not null key
>>> then error out on a non-shippable update or delete otherwise use the
>>> patch sent by Mason after some testing and refactoring.
>>>
>>>
>>  This would break backward compatibility. Also, a table which is fairly
>> stable and doesn't have a primary key or unique key, will need to be
>> distributed even though it's a perfect candidate for being a replicated
>> table. Also, we have to see if updating primary or unique key would cause a
>> problem.
>>
>>
>>  Then we should keep using the same WHERE close in shipped statement too.
>>    As pointed out, using ctid in replicated table is very dangerous.
>>
>
> I agree.
>
> I don't think it is unreasonable at all to require a primary key or unique
> index for replicated tables... normally one would want to do that. If they
> don't have a primary key, they themselves can just add a SERIAL at creation
> time and use that.
>
> As an alternative, all columns could be used as a fake primary key to try
> to find the particular row. In GridSQL we used that approach, but does not
> seem so clean... I believe that there is a check in there that if multiple
> rows match the criteria that the operation fails since the row is not
> uniquely identifiable. In hindsight, I wish we had not bothered.
>
>
>
>>
>>
>>   Approach C: (Suggested by Amit)
>>> Always have some kind of a hidden (or system) column for replicated
>>> tables.
>>>  Its type can be serial type, or an int column with default
>>> nextval('sequence_type')
>>>  so that it will always be executed on coordinator and use this colum
>>> as primary key.
>>>
>>>
>>  This looks a better approach, but also means that the inserts in the
>> replicated table have to be driven through the coordinator. This might not
>> be that stringent a condition, given that the replicated tables are
>> expected to be fairly stable. Any replicated table being inserted so often
>> would anyway get into the performance problem.
>>
>>
>>  I’m afraid it takes long effort to fix all the influences of this
>> change.   How do you think about this?    As I noted, approach C has good
>> point.   The issue is how long it takes.   With approach B, we can easily
>> change this handling to approach C.   I’d like to have you opinion on this.
>>
>
> It seems unnecessary if the table already has a primary key or unique
> index. Anyway, approach C is the approach that I originally took with
> GridSQL/Stado, adding something called xrowid, but we later disabled it by
> default.
>

Was there any other reason of disabling it other than code simplicity and
maintainability?

> In hindsight I would have saved the trouble and not implemented it to keep
> the code simpler and easier to maintain, and just left it up to the user to
> use a key.
>
> To summarize, I would go with B.
>

What is your stance on the fact that going with option B makes us backward
in-compatible?

>
>
>
>
>>
>>
>>>  My vote is for approach B.
>>>
>>>
>>  Whatever approach is taken, we will have to stick to it in future
>> versions of XC. We can not keep on changing the user visible functionality
>> with every version. Till 1.2 we didn't have the restriction that replicated
>> tables should have a primary key. Now introducing that requirement, means
>> users have to modify their applications. If we change it again, in the next
>> version, we will break the compatibility again. So, considering the long
>> term benefit, we should go with C.
>>
>>
>> This is reasonable to require at this stage since how it is done now is
> very dangerous. As previously mentioned, I have seen this cause problems
> with production data.  I think people would gladly add a key in exchange
> for not having bad things happen with their data.
>
> Also, I would not release any new version of Postgres-XC without this fix.
> If a release of 1.2 is a ways away, there should be an intermediate 1.1.x
> release that fixes this soon. I would not recommend using Postgres-XC for
> people who will be updating replicated tables without the patch I
> submitted, it is too dangerous.
>
>
> --
> Mason Sharp
>
> TransLattice - http://www.translattice.com
> Distributed and Clustered Database Solutions
>
>
>
>
> ------------------------------------------------------------------------------
> Android apps run on BlackBerry 10
> Introducing the new BlackBerry 10.2.1 Runtime for Android apps.
> Now with support for Jelly Bean, Bluetooth, Mapview and more.
> Get your Android app in front of a whole new audience.  Start now.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
> _______________________________________________
> Postgres-xc-developers mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>
>

-- 
-- 
*Abbas*
 Architect

Ph: 92.334.5100153
 Skype ID: gabbasb
www.enterprisedb.co
<http://www.enterprisedb.com/>m<http://www.enterprisedb.com/>

*Follow us on Twitter*
@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars,
whitepapers<http://www.enterprisedb.com/resources-community>and
more<http://www.enterprisedb.com/resources-community>