This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH GCC][4/4]Better handle store-stores chain if eliminated stores only store loop invariant
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: "Bin.Cheng" <amker dot cheng at gmail dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 25 Jul 2017 14:57:59 +0200
On Tue, Jul 25, 2017 at 2:38 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>> Hi,
>>>> This is a follow-up patch that better handles the case below:
>>>>   for (i = 0; i < n; i++)
>>>>     {
>>>>       a[i] = 1;
>>>>       a[i+2] = 2;
>>>>     }
>>>> Instead of generating root variables by loading from memory and propagating with PHI
>>>> nodes, like:
>>>>   t0 = a[0];
>>>>   t1 = a[1];
>>>>   for (i = 0; i < n; i++)
>>>>     {
>>>>       a[i] = 1;
>>>>       t2 = 2;
>>>>       t0 = t1;
>>>>       t1 = t2;
>>>>     }
>>>>   a[n] = t0;
>>>>   a[n+1] = t1;
>>>> we can simply store the loop-invariant values after the loop body if we know the
>>>> loop iterates more than chain->length times, like:
>>>>   for (i = 0; i < n; i++)
>>>>     {
>>>>       a[i] = 1;
>>>>     }
>>>>   a[n] = 2;
>>>>   a[n+1] = 2;
>>>>
>>>> Bootstrapped (O2/O3) as part of the patch series on x86_64 and AArch64. Is it OK?
>>> Updated the patch w.r.t. changes in the previous patch.
>>> Bootstrapped and tested on x86_64 and AArch64. Is it OK?
>>
>> + if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
>> + continue;
>>
>> Please use CONSTANT_CLASS_P (val) instead. I suppose VECTOR_CST or
>> FIXED_CST would be ok as well for example.
>>
>> Ok with that change. Did we eventually optimize this in followup
>> passes previously?
> Probably not? Given the test below:
>
> int a[10000], b[10000], c[10000];
> int f(void)
> {
>   int i, n = 100;
>   int t0 = a[0];
>   int t1 = a[1];
>   for (i = 0; i < n; i++)
>     {
>       a[i] = 1;
>       int t2 = 2;
>       t0 = t1;
>       t1 = t2;
>     }
>   a[n] = t0;
>   a[n+1] = t1;
>   return 0;
> }
> The optimized dump is:
>
> <bb 2> [1.00%] [count: INV]:
>   t1_8 = a[1];
>   ivtmp.9_17 = (unsigned long) &a;
>   _16 = ivtmp.9_17 + 400;
>
> <bb 3> [99.00%] [count: INV]:
>   # t1_20 = PHI <2(3), t1_8(2)>
>   # ivtmp.9_2 = PHI <ivtmp.9_1(3), ivtmp.9_17(2)>
>   _15 = (void *) ivtmp.9_2;
>   MEM[base: _15, offset: 0B] = 1;
>   ivtmp.9_1 = ivtmp.9_2 + 4;
>   if (ivtmp.9_1 != _16)
>     goto <bb 3>; [98.99%] [count: INV]
>   else
>     goto <bb 4>; [1.01%] [count: INV]
>
> <bb 4> [1.00%] [count: INV]:
>   a[100] = t1_20;
>   a[101] = 2;
>   return 0;
>
> We now eliminate one PHI and leave the other behind. The PHI is
> eliminated in vrp1/dce2.
Ok, I see. Maybe worth filing a missed optimization PR.
Richard.
> Thanks,
> bin