Re: [Open64-devel] one more question about strength reduction and SSA PRE

Is N global?

On Fri, Jun 28, 2013 at 1:14 AM, Yiran Wang <yir...@gm...> wrote:

> Thanks for your comments.
>
> No, the assembly looks the same.
>
> For better or worse, the compiler is able to clean up the temporary
> completely, e.g. through copy propagation and DSE.
>
> Regards,
> Yiran
>
>
>
> On Thu, Jun 27, 2013 at 12:26 AM, Jian-Xin Lai <la...@gm...> wrote:
>
>> Following your description, what if the code is changed a little:
>>
>>   for(i = 0; i< j; i++)
>>   {
>>     int t = N*N;
>>     x += t << 3;
>>     z = x + N;
>>     y = y + *x + *z;
>>   }
>>
>> Will the N*N be hoisted?
>>
>>
>> 2013/6/27 Yiran Wang <yir...@gm...>
>>
>>> Hi All,
>>>
>>> This one looks somewhat similar to the last example, but is different.
>>>
>>> int foo(int N, int j, int *x, int *z)
>>> {
>>>   int y = N;
>>>   N += 7;
>>>   N >>= 3;
>>>   int i;
>>>   for(i = 0; i< j; i++)
>>>   {
>>>     x += N*N << 3;
>>>     z = x + N;
>>>     y = y + *x + *z;
>>>   }
>>>   return y;
>>> }
>>>
>>> Assembly of the loop at -O3.
>>>     .p2align 4,,15
>>> .Lt_0_3586:
>>>  #<loop> Loop body line 7, nesting depth: 1, estimated iterations: 1000
>>>     .loc 1 9 0
>>>  #   8    {
>>>  #   9      x += N*N << 3;
>>>     movl %eax,%ebx                 # [0]
>>>     .loc 1 11 0
>>>  #  10      z = x + N;
>>>  #  11      y = y + *x + *z;
>>>     addl $1,%ebp                   # [0]
>>>     .loc 1 9 0
>>>     imull %eax,%ebx                # [1]
>>>     shll $3,%ebx                   # [4]
>>>     shll $2,%ebx                   # [5]
>>>     addl %ebx,%edi                 # [6]
>>>     addl %ebx,%esi                 # [6]
>>>     .loc 1 11 0
>>>     movl 0(%edi),%ecx              # [7] id:23
>>>     addl 0(%esi),%ecx              # [10]
>>>     addl %ecx,%edx                 # [13]
>>>     cmpl 36(%esp),%ebp             # [13] j
>>>     jl .Lt_0_3586                  # [16]
>>>
>>> As we can see, the imull instruction remains in the loop, followed by
>>> two consecutive shll instructions. (My guess is that CG does not expect
>>> such input from WOPT, so it does not fold the two shifts, even though
>>> that would be simple.)
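
A small sketch of the fold in question. It assumes the second shift is the
scaling of the int offset by sizeof(int) == 4 on this 32-bit target; that is
a reading of the assembly above, not something confirmed in this thread. The
function names are made up for illustration, and unsigned arithmetic is used
so the shifts are well defined for every input.

```c
#include <stdint.h>

/* Shift by 3 (from the source), then by 2 (assumed pointer scaling),
   mirroring the two shll instructions in the loop. */
static uint32_t shift_twice(uint32_t v)
{
  return (v << 3) << 2;
}

/* The single shift the pair could be folded into. */
static uint32_t shift_once(uint32_t v)
{
  return v << 5;
}
```

Both functions return the same value for any 32-bit input, since bits shifted
out by the first form are shifted out by the second as well.
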
>>>
>>> It looks like SSA PRE omitted the RHS of the Iv_update statement
>>> x += N*N<<3, and VNFRE does only one level of CSE, i.e. it moves the
>>> ASHR + LDC 3 out of the loop.
>>>
>>> I am curious why SSA PRE omits the expression here. With this omission
>>> disabled in opt_etable.cxx, the result looks good for this test case.
>>> Could disabling it cause a correctness or performance issue for other
>>> test cases?
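
For reference, this is the source-level shape one would hope SSA PRE arrives
at for the test case. It is a hand-written sketch, not compiler output;
foo_hoisted and step are names invented here for illustration.

```c
/* Hand-hoisted variant of foo: the loop-invariant N*N << 3 is computed
   once before the loop instead of on every iteration. */
static int foo_hoisted(int N, int j, int *x, int *z)
{
  int y = N;
  N += 7;
  N >>= 3;
  const int step = N*N << 3;   /* invariant: N does not change in the loop */
  int i;
  for (i = 0; i < j; i++)
  {
    x += step;
    z = x + N;
    y = y + *x + *z;
  }
  return y;
}
```

With this shape only the pointer updates and the two loads remain in the loop
body, so no multiply is needed per iteration.
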
>>>
>>> Note that one strength reduction transformation is performed for z in
>>> this case. Also, replacing "N>>=3;" with "N*=5;" results in similarly
>>> sub-optimal code.
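
As far as I can tell, the strength reduction on z shows up in the assembly as
the second addl of %ebx (into %esi): z is kept as its own induction variable
instead of being recomputed from x. A source-level sketch of that idea
follows; foo_z_reduced and step are hypothetical names, and this is an
illustration rather than what WOPT literally emits.

```c
/* z = x + N is replaced by an incremental update: z starts at x + N and
   then advances by the same step as x, so z == x + N holds after every
   update. */
static int foo_z_reduced(int N, int j, int *x, int *z)
{
  int y = N;
  N += 7;
  N >>= 3;
  const int step = N*N << 3;
  int i;
  z = x + N;                   /* initial value; formed even when j == 0 */
  for (i = 0; i < j; i++)
  {
    x += step;
    z += step;                 /* replaces z = x + N */
    y = y + *x + *z;
  }
  return y;
}
```
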
>>>
>>>  Best Regards,
>>> Yiran Wang
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> This SF.net email is sponsored by Windows:
>>>
>>> Build for Windows Store.
>>>
>>> http://p.sf.net/sfu/windows-dev2dev
>>> _______________________________________________
>>> Open64-devel mailing list
>>> Ope...@li...
>>> https://lists.sourceforge.net/lists/listinfo/open64-devel
>>>
>>>
>>
>>
>> --
>> Regards,
>> Lai Jian-Xin
>>
>