[PATCH PR78114] Refine gfortran.dg/vect/fast-math-mgrid-resid.f

Richard Biener richard.guenther@gmail.com
Thu Nov 17 10:54:00 GMT 2016


On Thu, Nov 17, 2016 at 11:26 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Nov 17, 2016 at 8:32 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Nov 16, 2016 at 6:20 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>> Hi,
>>> Currently the test gfortran.dg/vect/fast-math-mgrid-resid.f checks all predictive commoning opportunities for all possible loops.  This makes it fragile, because the vectorizer may peel the loop differently and may choose different vector factors.  For example, on x86-solaris the vectorizer doesn't peel a prologue loop at all; and with -march=haswell the test has been failing for a long time because the vector factor is 4 while the iteration distance of the predictive commoning opportunity is smaller than 4.  This patch refines the test to check only that a predictive commoning variable is created when the vector factor is 2, or that a vectorization variable is created when the factor is 4.  This works because there is only one main loop, so only one vector factor can be in use.
>>> Test results checked on various x64 targets.  Is it OK?
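An either/or check like the one Bin describes can be written as a single dejagnu directive with regex alternation.  The following is only an illustrative sketch: the variable-name patterns and the "optimized" dump name are guesses, not the directives from the committed patch.

```fortran
! Hypothetical sketch: accept either the predictive-commoning
! load-store-motion variable (vf == 2) or a plain vectorization
! variable (vf == 4) via regex alternation.  Patterns are placeholders.
! { dg-final { scan-tree-dump "__lsm.* = PHI|vect_.* = PHI" "optimized" } }
```

A single alternation keeps the test independent of which vector factor the target picks, at the cost of a looser match.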
>>
>> I think that, as you write, the test is somewhat fragile.  But rather
>> than adjusting the scanning the way you do,
>> I'd add --param vect-max-peeling-for-alignment=0 and -mprefer-avx128
> In that case, is it better to add "--param
> vect-max-peeling-for-alignment=0" for all targets?  Otherwise we still
> need to differentiate the test strings to handle different targets.  But I
> have another question: what if a target can't handle unaligned
> accesses and the vectorizer has to peel for alignment?

You'd get versioning for alignment instead.

> Also, do you think it's ok to check for the predictive commoning PHI node, as below?
> # vectp_u.122__lsm0.158_94 = PHI <vectp_u.122__lsm0.158_95(8), _96(6)>
> In this way, we don't need to take possible prologue/epilogue loops
> into consideration.

I had hoped that without peeling we could simply scan for "Executing
predictive commoning".  But with versioning for alignment you'd still get
two loops.

So maybe checking for both "Executing predictive commoning" and looking
for a vect_lsm PHI node is ok...
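Richard's combined suggestion could look roughly like the sketch below.  This is not from the committed patch: the PHI pattern is a guess based on the dump line Bin quoted, and the dump name "pcom" is assumed.

```fortran
! Sketch: require both that commoning ran exactly once...
! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 1 "pcom" } }
! ...and that the vectorized load-store-motion variable shows up as a PHI
! (pattern is hypothetical, modeled on vectp_u.122__lsm0.158_94).
! { dg-final { scan-tree-dump "vectp_u.*__lsm.* = PHI" "pcom" } }
```

Checking the PHI directly sidesteps counting how often the transform message appears across prologue/epilogue or versioned loop copies.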

>> as additional option on x86_64-*-* i?86-*-*.
>>
>> Your new pattern would fail with avx512 if vector(8) real were used.
>>
>> What's the actual change that made the testcase fail btw?
> There are two cases.
> A) After the vect_do_peeling change, the vectorizer may peel only one
> iteration for the prologue loop (if vf == 2); the test string below was
> added for this reason:
> ! { dg-final { scan-tree-dump-times "Loop iterates only 1 time, nothing to do" 1 "pcom" } }
> This fails on x86_64 Solaris because the prologue loop is not peeled at all.
> B) Depending on the ILP model, I think the test strings below have been
> failing for a long time with haswell:
> ! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 1 "pcom" { target lp64 } } }
> ! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 2 "pcom" { target ia32 } } }
> because the vectorizer chooses vf == 4 in this case, and there are no
> predictive commoning opportunities at all.

Yes.  I suggest -mprefer-avx128 for that.
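In dejagnu terms, the extra options Richard proposes earlier in the thread would be added per-target, roughly as follows (a sketch; the exact target selector and placement in the test file may differ):

```fortran
! Sketch: pin the vector factor to 2 and disable peeling for alignment
! on x86, so the pcom scan counts are stable across subtargets.
! { dg-additional-options "--param vect-max-peeling-for-alignment=0 -mprefer-avx128" { target { x86_64-*-* i?86-*-* } } }
```

With the vector factor fixed at 2 and alignment peeling disabled, the original "Executing predictive commoning" counts no longer depend on which AVX level the target defaults to.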

> Also, the newly added test string fails in this case too, because the
> peeled prologue iterates more than once.
>
> Thanks,
> bin
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>>
>>> gcc/testsuite/ChangeLog
>>> 2016-11-16  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         PR testsuite/78114
>>>         * gfortran.dg/vect/fast-math-mgrid-resid.f: Refine test by
>>>         checking predictive commoning variables in the vectorized
>>>         loop wrt the vector factor.
