[PATCH] Fixup nb_iterations_upper_bound adjustment for vectorized loops

Ilya Enkovich enkovich.gnu@gmail.com
Tue Apr 26 12:29:00 GMT 2016


2016-04-22 10:13 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Apr 21, 2016 at 6:09 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> Hi,
>>
>> Currently, when a loop is vectorized, we adjust its nb_iterations_upper_bound
>> by dividing it by VF.  This is incorrect, since nb_iterations_upper_bound
>> is an upper bound for (<number of loop iterations> - 1), and therefore simply
>> dividing it by VF in many cases gives us a bound greater than the real one.
>> The correct value would be ((nb_iterations_upper_bound + 1) / VF - 1).
>
> Yeah, that seems correct.
>
>> Also, the decrement due to peeling for gaps should happen before we scale
>> it by VF, because peeling applies to the scalar loop, not the vectorized one.
>
> That's not true: PEELING_FOR_GAPs is there so that the last _vector_ iteration
> is peeled as scalar operations.  We do not account for the amount
> of known prologue peeling (if peeling for alignment and the misalignment
> is known at compile-time); that would be peeling of scalar iterations.

My initial patch didn't change the PEELING_FOR_GAP handling, and it caused
a runfail in one of the SPEC2006 benchmarks.  My investigation showed that
the computation of the number of vector iterations doesn't match the
nb_iterations_upper_bound adjustment in the way PEELING_FOR_GAP is
accounted for.

Looking into vect_generate_tmps_on_preheader, I see:

/* If epilogue loop is required because of data accesses with gaps, we
   subtract one iteration from the total number of iterations here for
   correct calculation of RATIO.  */

And then we decrement the loop counter before dividing it by VF to compute
ratio and ratio_mult_vf.  This doesn't match the nb_iterations_upper_bound
update, and that's why I fixed it.  The fix resolved the runfail for me.

Thus the ratio_mult_vf computation conflicts with your statement that we
peel a vector iteration.

>
> But it would be interesting to know why we need the != 0 check: static
> cost modelling should have disabled vectorization if the vectorized body
> isn't run.
>
>> This patch modifies nb_iterations_upper_bound computation to resolve
>> these issues.
>
> You do not adjust the ->nb_iterations_estimate accordingly.
>
>> Running regression testing, I got one failure due to an optimized loop.
>> Here is the loop:
>>
>> extern int yy[];  /* declaration not shown in the message; int assumed */
>>
>> void
>> foo (signed char s)
>> {
>>   signed char i;
>>   for (i = 0; i < s; i++)
>>     yy[i] = (signed int) i;
>> }
>>
>> Here we vectorize for AVX512 using VF=64.  The original loop has at most
>> 127 iterations, and therefore the vectorized loop may be executed only once.
>> With the patch applied, the compiler detects this and transforms the loop
>> into a BB with just stores of constant vectors into yy.  The test was
>> adjusted to increase the number of possible iterations, and a copy of the
>> test was added to check that we can optimize out the original loop.
>>
>> Bootstrapped and regtested on x86_64-pc-linux-gnu.  OK for trunk?
>
> I'd like to see testcases covering the corner cases: give them upper
> bound estimates by adjusting known array sizes, and also cover
> the case of peeling for gaps.

OK, I'll make more tests.

Thanks,
Ilya

>
> Richard.
>



More information about the Gcc-patches mailing list