This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fixup nb_iterations_upper_bound adjustment for vectorized loops
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Ilya Enkovich <enkovich dot gnu at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 27 Apr 2016 16:05:17 +0200
- Subject: Re: [PATCH] Fixup nb_iterations_upper_bound adjustment for vectorized loops
- Authentication-results: sourceware.org; auth=none
- References: <20160421160937 dot GB7047 at msticlxl57 dot ims dot intel dot com> <CAFiYyc20YF=c4DK9Vu-_6=7cpQecDzJy82A-vumDk8Nh7Xgvgg at mail dot gmail dot com> <CAMbmDYYfojynA6J1_C8TaiCwOmxkfGz9D=Z1Yga_jGGu=Z3ZoQ at mail dot gmail dot com>
On Tue, Apr 26, 2016 at 2:29 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2016-04-22 10:13 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Thu, Apr 21, 2016 at 6:09 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> Hi,
>>>
>>> Currently when loop is vectorized we adjust its nb_iterations_upper_bound
>>> by dividing it by VF. This is incorrect since nb_iterations_upper_bound
>>> is upper bound for (<number of loop iterations> - 1) and therefore simple
>>> dividing it by VF in many cases gives us bounds greater than a real one.
>>> Correct value would be ((nb_iterations_upper_bound + 1) / VF - 1).
>>
>> Yeah, that seems correct.
>>
>>> Also decrement due to peeling for gaps should happen before we scale it
>>> by VF because peeling applies to a scalar loop, not vectorized one.
>>
>> That's not true - PEELING_FOR_GAPs is so that the last _vector_ iteration
>> is peeled as scalar operations. We do not account for the amount
>> of known prologue peeling (if peeling for alignment and the misalignment
>> is known at compile-time) - that would be peeling of scalar iterations.
>
> My initial patch didn't change anything for PEELING_FOR_GAP and it caused
> a runfail for one of SPEC2006 benchmarks. My investigation showed number
> of vector iterations calculation doesn't match nb_iterations_upper_bound
> adjustment in a way PEELING_FOR_GAP is accounted.
>
> Looking into vect_generate_tmps_on_preheader I see:
>
> /* If epilogue loop is required because of data accesses with gaps, we
> subtract one iteration from the total number of iterations here for
> correct calculation of RATIO. */
>
> And then we decrement loop counter before dividing it by VF to compute
> ratio and ratio_mult_vf. This doesn't match nb_iterations_upper_bound
> update and that's why I fixed it. This resolved runfail for me.
>
> Thus ratio_mult_vf computation conflicts with your statement we peel a
> vector iteration.
Hum. I stand corrected. So yes, we remove the last vector iteration if
there are not already epilogue iterations.
>>
>> But it would be interesting to know why we need the != 0 check - static
>> cost modelling should have disabled vectorization if the vectorized body
>> isn't run.
>>
>>> This patch modifies nb_iterations_upper_bound computation to resolve
>>> these issues.
>>
>> You do not adjust the ->nb_iterations_estimate accordingly.
>>
>>> Running regression testing I got one fail due to optimized loop. Heres
>>> is a loop:
>>>
>>> foo (signed char s)
>>> {
>>> signed char i;
>>> for (i = 0; i < s; i++)
>>> yy[i] = (signed int) i;
>>> }
>>>
>>> Here we vectorize for AVX512 using VF=64. Original loop has max 127
>>> iterations and therefore vectorized loop may be executed only once.
>>> With the patch applied compiler detects it and transforms loop into
>>> BB with just stores of constants vectors into yy. Test was adjusted
>>> to increase number of possible iterations. A copy of test was added
>>> to check we can optimize out the original loop.
>>>
>>> Bootstrapped and regtested on x86_64-pc-linux-gnu. OK for trunk?
>>
>> I'd like to see testcases covering the corner-cases - have them have
>> upper bound estimates by adjusting known array sizes and also cover
>> the case of peeling for gaps.
>
> OK, I'll make more tests.
Thanks,
Richard.
> Thanks,
> Ilya
>
>>
>> Richard.
>>