Re: [Postgres-xc-developers] Patch for fetching only required columns in case of DMLs - Bug 3609665

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

The reason I wanted to get this checked in is because I wanted to test the
trigger work with how the final target list would look like , i.e. having
just ctid, node_id attributes. Currently the code accidentally was picking
up the OLD row incorrectly from the redundant attributes, and then I
realized it should not be doing so, although the results were correct.

But I am ok if you feel you want to review the trigger work first. I have
begun with the tuplestore-approach implementation for  AFTER triggers.

On 1 April 2013 12:17, Ashutosh Bapat <ash...@en...>wrote:

>
>
> On Mon, Apr 1, 2013 at 12:14 PM, Amit Khandekar <
> ami...@en...> wrote:
>
>> It will not block or affect the trigger support as far as correctness is
>> concerned. But for triggers, we need to fetch the complete row, so we don't
>> want to include both the OLD row and  all the unnecessary columns in the
>> SELECT list.
>>
>>
> If it's so, I think, we will leave it aside for now, complete minimal
> trigger work and then get back to it. Is that fine?
>
>
>> Currently the SELECT is like this :
>> SELECT col1, col2, col3 .... , ctid, xc_node_id from tab1
>> For triggers, it would be :
>> SELECT col1, col2, col3 .... , , tab1.* ctid, xc_node_id from tab1
>>
>> What we want is :
>> Without triggers:
>> SELECT ctid, xc_node_id from tab1
>> With triggers:
>> SELECT ctid, xc_node_id, tab1.*  from tab1
>>
>>
>>
>>
>> On 1 April 2013 10:49, Ashutosh Bapat <ash...@en...>wrote:
>>
>>> Hi Amit,
>>> Does this have any relation with triggers, if so, can you please explain
>>> this relation?
>>>
>>> On Mon, Apr 1, 2013 at 9:57 AM, Amit Khandekar <
>>> ami...@en...> wrote:
>>>
>>>> Attached is a patch dml-targetlist.patch which prevents unnecessary
>>>> columns from being fetched from the ModifyTable SELECT subplans (bug
>>>> 3609665<https://sourceforge.net/tracker/?func=detail&aid=3609665&group_id=311227&atid=1310232>
>>>> )
>>>>
>>>> -------
>>>>
>>>> In expand_targetlist(), currently Vars of all the table attributes that
>>>> are absent are added into the target list.  The patch adds NULL constants
>>>> instead of Var nodes, so that those attributes will not be fetched from the
>>>> SELECT. We still want to keep NULL constants because we need the complete
>>>> tuple. We don't want to change the PG way of having all the attributes of
>>>> the NEW ROW tuple descriptor of the source plan slot. Also other PG things
>>>> like constraint expressions and triggers assume this format, so it is
>>>> convenient to not modify this design.
>>>>
>>>> Using NULL constants will cause the remote SELECT statement to
>>>> completely exclude all the unnecessary columns. It does not even have NULLs.
>>>>
>>>>
>>>> Below is how the new plan will look now:
>>>>
>>>> # explain verbose update newtab *set name5 = f(name5) where name4 =
>>>> f(name4)*;
>>>>  Update on public.newtab  (cost=0.00..250.00 rows=1000 width=234)
>>>>    Node/s: data_node_1, data_node_2
>>>>    Node expr: newtab.id
>>>>    Remote query: UPDATE ONLY public.newtab SET name5 = $6 WHERE ctid =
>>>> $7 AND xc_node_id = $9
>>>>    ->  Data Node Scan on newtab "_REMOTE_TABLE_QUERY_"
>>>> (cost=0.00..250.00 rows=1000 width=234)
>>>>          Output: newtab.id, NULL::character varying, NULL::character
>>>> varying, NULL::character varying, NULL::character varying, f(newtab.name5),
>>>> newtab.ctid, newtab.*, newtab.xc_node_id
>>>>          Node/s: data_node_1, data_node_2
>>>>          Remote query: *SELECT id, name5, ctid, newtab.*::newtab,
>>>> xc_node_id, name4 *FROM ONLY newtab WHERE true
>>>>          Coordinator quals: ((newtab.name4)::text =
>>>> (f(newtab.name4))::text)
>>>>
>>>>
>>>> Notice the following:
>>>>
>>>> 1. Remote query now does not have any other columns namely name1,
>>>> name2, name3.
>>>>
>>>> 2. The column id is still present. This is because it is a
>>>> distribution column, and currently we use this column to decide the
>>>> target node at execution time. Unfortunately I could not exclude the
>>>> distribution column from the SELECT subplan target list. Doing this
>>>> required nodeid-based target datanode determination for updates/deletes.
>>>> Currently update/deletes use distribution column even if we fetch
>>>> xc_node_id. I had almost come up with this nodeid-based implementation
>>>> except that EXPLAIN did not correctly show the en_expr expression if that
>>>> expression is a xc_node_id Var. Have attached another patch
>>>> (working_nodeid_based.patch) that includes both : selecting reqd columns,
>>>> plus node_id based target datanode determination. Due to the
>>>> explain-of-en_expr issue, we need to first checkin dml-targetlist.patch,
>>>> because that optimization is important for trigger implementation. The
>>>> nodeid-based implementation does not compliment any of the changes in this
>>>> patch.
>>>>
>>>> -------
>>>>
>>>> In rewriteTargetListUD(), we had added into the parse->targetlist the
>>>> attributes need for quals; the RemoteQuery->base_tlist anyways does this
>>>> job of getting the attributes added up in the base_tlist so it is not
>>>> required to do this in rewrite. I have removed it.
>>>>
>>>> -------
>>>>
>>>> Since now there are only required columns in the SELECT subplan, we
>>>> cannot fire check and not-null constraints on coordinator. For this, in the
>>>> patch, we now include those columns that are referred in the constraint
>>>> expressions. Also, we keep the not-null constraint check deferred for the
>>>> datanode. We anyways need the final values for the constraints to get
>>>> fired, and on datanode we know these are the final values. So in general
>>>> when possible we should not fire constraints on coordinator. When all the
>>>> constraints are shippable, they should be fired on datanode, else they
>>>> should be fired on coordinator. In fact, we don't currently have a check
>>>> whether the constraint is shippable, so this is a bug. Have added this
>>>> constraint-shippablity check also.
>>>>
>>>> ------
>>>>
>>>> In the future, we could also consider optimization in the BIND data
>>>> row. Currently it includes NULL values for parameters that are not used in
>>>> the remote statement. We could just skip those parameters instead and thus
>>>> reduce the data row to some extent. But this requires tweaking the
>>>> parameter numbers ($1, $2 etc) generated in the deparsed statement.
>>>>
>>>> ------
>>>>
>>>>
>>>> Tests
>>>> ======
>>>>
>>>> I have used xc_constraints test to add new check-constraint related
>>>> tests. Otherwise, the existing tests are more than enough to test this
>>>> patch. See the other regression test changes in the patch to get an idea of
>>>> how this patch affects the plan target list.
>>>>
>>>> In xc_FQS.sql, the target list of FQSed queries are printed. I suspect
>>>> that the target list does not reflect correct columns currently. I suspect
>>>> set_plan_references changes the varattnos but they do not end up  correctly
>>>> reflecting the actual target list. The diff in the test file is because we
>>>> have lesser columns, but the column names anyway do not show the correct
>>>> columns.
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Own the Future-Intel&reg; Level Up Game Demo Contest 2013
>>>> Rise to greatness in Intel's independent game demo contest.
>>>> Compete for recognition, cash, and the chance to get your game
>>>> on Steam. $5K grand prize plus 10 genre and skill prizes.
>>>> Submit your demo by 6/6/13. http://p.sf.net/sfu/intel_levelupd2d
>>>> _______________________________________________
>>>> Postgres-xc-developers mailing list
>>>> Pos...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Wishes,
>>> Ashutosh Bapat
>>> EntepriseDB Corporation
>>> The Enterprise Postgres Company
>>>
>>
>>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EntepriseDB Corporation
> The Enterprise Postgres Company
>