This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant


On Tue, Jul 25, 2017 at 2:38 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>> Hi,
>>>> This is a follow-up patch that better handles the case below:
>>>>      for (i = 0; i < n; i++)
>>>>        {
>>>>          a[i] = 1;
>>>>          a[i+2] = 2;
>>>>        }
>>>> Instead of generating root variables by loading from memory and propagating with PHI
>>>> nodes, like:
>>>>      t0 = a[0];
>>>>      t1 = a[1];
>>>>      for (i = 0; i < n; i++)
>>>>        {
>>>>          a[i] = 1;
>>>>          t2 = 2;
>>>>          t0 = t1;
>>>>          t1 = t2;
>>>>        }
>>>>      a[n] = t0;
>>>>      a[n+1] = t1;
>>>> we can simply store the loop-invariant values after the loop body if we know
>>>> the loop iterates more than chain->length times, like:
>>>>      for (i = 0; i < n; i++)
>>>>        {
>>>>          a[i] = 1;
>>>>        }
>>>>      a[n] = 2;
>>>>      a[n+1] = 2;
>>>>
>>>> Bootstrapped (O2/O3) as part of the patch series on x86_64 and AArch64.  Is it OK?
>>> Updated patch w.r.t. changes in the previous patch.
>>> Bootstrapped and tested on x86_64 and AArch64.  Is it OK?
>>
>> +      if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
>> +       continue;
>>
>> Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
>> FIXED_CST would be ok as well for example.
>>
>> Ok with that change.  Did we eventually optimize this in followup
>> passes previously?
> Probably not?  Given the test below:
>
> int a[10000], b[10000], c[10000];
> int f(void)
> {
>   int i, n = 100;
>   int t0 = a[0];
>   int t1 = a[1];
>   for (i = 0; i < n; i++)
>     {
>       a[i] = 1;
>       int t2 = 2;
>       t0 = t1;
>       t1 = t2;
>     }
>   a[n] = t0;
>   a[n+1] = t1;
>   return 0;
> }
> The optimized dump is:
>
>   <bb 2> [1.00%] [count: INV]:
>   t1_8 = a[1];
>   ivtmp.9_17 = (unsigned long) &a;
>   _16 = ivtmp.9_17 + 400;
>
>   <bb 3> [99.00%] [count: INV]:
>   # t1_20 = PHI <2(3), t1_8(2)>
>   # ivtmp.9_2 = PHI <ivtmp.9_1(3), ivtmp.9_17(2)>
>   _15 = (void *) ivtmp.9_2;
>   MEM[base: _15, offset: 0B] = 1;
>   ivtmp.9_1 = ivtmp.9_2 + 4;
>   if (ivtmp.9_1 != _16)
>     goto <bb 3>; [98.99%] [count: INV]
>   else
>     goto <bb 4>; [1.01%] [count: INV]
>
>   <bb 4> [1.00%] [count: INV]:
>   a[100] = t1_20;
>   a[101] = 2;
>   return 0;
>
> We now eliminate one phi and leave another behind.  The phi is
> eliminated in vrp1/dce2.

Ok, I see.  Maybe worth filing a missed optimization PR.

Richard.

> Thanks,
> bin

