From: Ahsan H. <ahs...@en...> - 2013-04-01 08:21:25
Hi Amit,

As I understand it, this is not directly related to the trigger implementation; it is a general optimization for DML queries. How will it affect how triggers function? Secondly, do you want to test your trigger implementation with this optimization in place? I believe we need to have this optimization checked in before the feature freeze date, so I am wondering whether it is worth Ashutosh reviewing this now, or waiting to review it after you have submitted the trigger implementation patch. I am assuming that you will take most/all of this week to submit the triggers patch?

On Mon, Apr 1, 2013 at 11:58 AM, Amit Khandekar <ami...@en...> wrote:

> The reason I wanted to get this checked in is that I wanted to test the
> trigger work with how the final target list would look, i.e. having just
> the ctid and node_id attributes. Currently the code was accidentally
> picking up the OLD row incorrectly from the redundant attributes, and
> then I realized it should not be doing so, although the results were
> correct.
>
> But I am OK if you feel you want to review the trigger work first. I
> have begun with the tuplestore-approach implementation for AFTER
> triggers.

Did you do any benchmarking between the two approaches before you decided to go with the tuplestore approach?

Thanks,
Ahsan

> On 1 April 2013 12:17, Ashutosh Bapat <ash...@en...> wrote:
>
>> On Mon, Apr 1, 2013 at 12:14 PM, Amit Khandekar <ami...@en...> wrote:
>>
>>> It will not block or affect the trigger support as far as correctness
>>> is concerned. But for triggers we need to fetch the complete row, so
>>> we don't want to include both the OLD row and all the unnecessary
>>> columns in the SELECT list.
>>
>> If that's so, I think we will leave it aside for now, complete the
>> minimal trigger work, and then get back to it. Is that fine?
>>
>>> Currently the SELECT is like this:
>>>
>>>     SELECT col1, col2, col3 ..., ctid, xc_node_id from tab1
>>>
>>> For triggers, it would be:
>>>
>>>     SELECT col1, col2, col3 ..., tab1.*, ctid, xc_node_id from tab1
>>>
>>> What we want is:
>>>
>>> Without triggers:
>>>     SELECT ctid, xc_node_id from tab1
>>> With triggers:
>>>     SELECT ctid, xc_node_id, tab1.* from tab1
>>>
>>> On 1 April 2013 10:49, Ashutosh Bapat <ash...@en...> wrote:
>>>
>>>> Hi Amit,
>>>> Does this have any relation with triggers? If so, can you please
>>>> explain the relation?
>>>>
>>>> On Mon, Apr 1, 2013 at 9:57 AM, Amit Khandekar <ami...@en...> wrote:
>>>>
>>>>> Attached is a patch, dml-targetlist.patch, which prevents
>>>>> unnecessary columns from being fetched by the ModifyTable SELECT
>>>>> subplans (bug 3609665
>>>>> <https://sourceforge.net/tracker/?func=detail&aid=3609665&group_id=311227&atid=1310232>).
>>>>>
>>>>> -------
>>>>>
>>>>> In expand_targetlist(), currently Vars of all the table attributes
>>>>> that are absent from the target list are added into it. The patch
>>>>> adds NULL constants instead of Var nodes, so that those attributes
>>>>> will not be fetched by the SELECT. We still want to keep the NULL
>>>>> constants because we need the complete tuple: we don't want to
>>>>> change the PG convention that the NEW-row tuple descriptor of the
>>>>> source plan slot has all the attributes. Other PG features, such as
>>>>> constraint expressions and triggers, also assume this format, so it
>>>>> is convenient not to modify this design.
>>>>>
>>>>> Using NULL constants makes the remote SELECT statement completely
>>>>> exclude all the unnecessary columns; it does not even contain NULLs.
>>>>> Below is how the new plan will look now:
>>>>>
>>>>> # explain verbose update newtab set name5 = f(name5) where name4 = f(name4);
>>>>>  Update on public.newtab  (cost=0.00..250.00 rows=1000 width=234)
>>>>>    Node/s: data_node_1, data_node_2
>>>>>    Node expr: newtab.id
>>>>>    Remote query: UPDATE ONLY public.newtab SET name5 = $6 WHERE ctid = $7 AND xc_node_id = $9
>>>>>    ->  Data Node Scan on newtab "_REMOTE_TABLE_QUERY_"  (cost=0.00..250.00 rows=1000 width=234)
>>>>>          Output: newtab.id, NULL::character varying, NULL::character varying, NULL::character varying, NULL::character varying, f(newtab.name5), newtab.ctid, newtab.*, newtab.xc_node_id
>>>>>          Node/s: data_node_1, data_node_2
>>>>>          Remote query: SELECT id, name5, ctid, newtab.*::newtab, xc_node_id, name4 FROM ONLY newtab WHERE true
>>>>>          Coordinator quals: ((newtab.name4)::text = (f(newtab.name4))::text)
>>>>>
>>>>> Notice the following:
>>>>>
>>>>> 1. The remote query no longer has any of the other columns, namely
>>>>> name1, name2, name3.
>>>>>
>>>>> 2. The column id is still present. This is because it is the
>>>>> distribution column, and currently we use it to decide the target
>>>>> node at execution time. Unfortunately I could not exclude the
>>>>> distribution column from the SELECT subplan target list: doing so
>>>>> requires nodeid-based target-datanode determination for
>>>>> updates/deletes, whereas currently updates/deletes use the
>>>>> distribution column even though we fetch xc_node_id. I had almost
>>>>> completed this nodeid-based implementation, except that EXPLAIN did
>>>>> not correctly show the en_expr expression when that expression is
>>>>> an xc_node_id Var. I have attached another patch
>>>>> (working_nodeid_based.patch) that includes both: selecting only the
>>>>> required columns, plus nodeid-based target-datanode determination.
>>>>> Due to the explain-of-en_expr issue, we need to check in
>>>>> dml-targetlist.patch first, because that optimization is important
>>>>> for the trigger implementation. The nodeid-based implementation
>>>>> does not conflict with any of the changes in this patch.
>>>>>
>>>>> -------
>>>>>
>>>>> In rewriteTargetListUD(), we had added into parse->targetlist the
>>>>> attributes needed for the quals; RemoteQuery->base_tlist anyway
>>>>> does this job of getting those attributes added into the
>>>>> base_tlist, so it is not required to do this in the rewriter. I
>>>>> have removed it.
>>>>>
>>>>> -------
>>>>>
>>>>> Since the SELECT subplan now fetches only the required columns, we
>>>>> could not otherwise fire check and not-null constraints on the
>>>>> coordinator. For this, the patch now also includes those columns
>>>>> that are referenced in the constraint expressions. Also, we keep
>>>>> the not-null constraint check deferred to the datanode. We anyway
>>>>> need the final values when the constraints fire, and on the
>>>>> datanode we know these are the final values. So, in general, we
>>>>> should avoid firing constraints on the coordinator when possible:
>>>>> when all the constraints are shippable, they should be fired on the
>>>>> datanode, else on the coordinator. In fact, we currently have no
>>>>> check for whether a constraint is shippable, so this is a bug; I
>>>>> have added this constraint-shippability check as well.
>>>>>
>>>>> ------
>>>>>
>>>>> In the future, we could also consider optimizing the BIND data row.
>>>>> Currently it includes NULL values for parameters that are not used
>>>>> in the remote statement. We could simply skip those parameters and
>>>>> thus shrink the data row to some extent, but this requires tweaking
>>>>> the parameter numbers ($1, $2, etc.) generated in the deparsed
>>>>> statement.
>>>>>
>>>>> ------
>>>>>
>>>>> Tests
>>>>> ======
>>>>>
>>>>> I have used the xc_constraints test to add new check-constraint
>>>>> related tests.
>>>>> Otherwise, the existing tests are more than enough to test this
>>>>> patch. See the other regression test changes in the patch to get an
>>>>> idea of how this patch affects the plan target list.
>>>>>
>>>>> In xc_FQS.sql, the target lists of FQSed queries are printed. I
>>>>> suspect that the target list does not currently reflect the correct
>>>>> columns: set_plan_references changes the varattnos, but they do not
>>>>> end up correctly reflecting the actual target list. The diff in the
>>>>> test file is because we have fewer columns, but the column names do
>>>>> not show the correct columns anyway.
>>>>>
>>>>> _______________________________________________
>>>>> Postgres-xc-developers mailing list
>>>>> Pos...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>>
>>>> --
>>>> Best Wishes,
>>>> Ashutosh Bapat
>>>> EnterpriseDB Corporation
>>>> The Enterprise Postgres Company
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EnterpriseDB Corporation
>> The Enterprise Postgres Company

--
Ahsan Hadi
Snr Director Product Development
EnterpriseDB Corporation
The Enterprise Postgres Company

Phone: +92-51-8358874
Mobile: +92-333-5162114

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb