This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: RFC: [ARM] Disable peeling


On Tue, Oct 1, 2013 at 5:49 PM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> Hi,
>
> I am resuming investigations about disabling peeling for
> alignment (see thread at
> http://gcc.gnu.org/ml/gcc/2012-12/msg00036.html).
>
> As a reminder, I have a simple patch which disables peeling
> unconditionally and gives some improvement in benchmarks.
>
> However, I've noticed a regression, for which a reduced test case is:
> #define SIZE 8
> void func(float *data, float d)
> {
>         int i;
>         for (i=0; i<SIZE; i++)
>                 data[i] = d;
> }
>
> With peeling enabled, the compiler generates:
>         fsts    s0, [r0]
>         fsts    s0, [r0, #4]
>         fsts    s0, [r0, #8]
>         fsts    s0, [r0, #12]
>         fsts    s0, [r0, #16]
>         fsts    s0, [r0, #20]
>         fsts    s0, [r0, #24]
>         fsts    s0, [r0, #28]
>
> with my patch, the compiler generates:
>         vdup.32 q0, d0[0]
>         vst1.32 {q0}, [r0]!
>         vst1.32 {q0}, [r0]
>         bx      lr
>
> The performance regression is mostly caused by the dependency
> between vdup and vst1 (removing the dependency on the r0
> post-increment did not show any performance improvement).
>
> I have tried to modify the vectorizer cost model such that
> scalar->vector stmts have higher cost than currently with the hope
> that the loop prologue would become too expensive; but to reach this
> level, this cost needs to be increased quite a lot, so this approach
> does not seem right.
>
> The vectorizer estimates the cost of the prologue/epilogue/loop body
> with and without vectorization and computes the number of iterations
> needed for profitability. In the present case, keeping reasonable
> costs, this number is very low (2 or 3 typically), while the compiler
> knows we have 8 iterations for sure.
>
> I think we need something to describe the dependency between vdup
> and vst1.
>
> Otherwise, from the vectorizer point of view, this looks like an
> ideal loop.
>
> Do you have suggestions on how to tackle this?
>
> (I've just had a look at the recent vectorizer cost model
> modification, which doesn't seem to handle this case.)

With the new vectorizer cost model hooks (init_cost, add_stmt_cost,
finish_cost) you can set up target-specific data in init_cost and add
to it during add_stmt_cost, so that at finish_cost time you can take
all vectorized stmts into account and model this kind of dependency.
That works, of course, only if the GIMPLE the vectorizer hands you
exposes enough information to guess the final instructions.

PPC uses this to model vector shift resource constraints.
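To illustrate the three-hook pattern described above, here is a
standalone, hypothetical sketch (plain C, not actual GCC backend code):
the struct layout, statement kinds, and the stall penalty of 4 are all
illustrative assumptions, but the shape matches the hooks' division of
labor: allocate per-loop data, accumulate per-stmt, then apply a
whole-loop adjustment at the end.

```c
/* Hypothetical sketch of the init_cost/add_stmt_cost/finish_cost
   pattern.  All names and numbers are illustrative, not GCC's.  */
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the vectorizer's stmt-cost kinds.  */
enum stmt_kind { SCALAR_TO_VEC, VECTOR_STORE, OTHER };

struct cost_data
{
  unsigned total;    /* accumulated per-stmt base costs */
  unsigned n_dup;    /* scalar->vector (vdup-like) stmts seen */
  unsigned n_store;  /* vector stores (vst1-like) seen */
};

/* init_cost: allocate per-loop target-specific data.  */
static struct cost_data *
init_cost (void)
{
  return calloc (1, sizeof (struct cost_data));
}

/* add_stmt_cost: record each stmt together with its base cost.  */
static void
add_stmt_cost (struct cost_data *d, enum stmt_kind kind,
               unsigned base_cost)
{
  d->total += base_cost;
  if (kind == SCALAR_TO_VEC)
    d->n_dup++;
  if (kind == VECTOR_STORE)
    d->n_store++;
}

/* finish_cost: all stmts are now known, so charge an extra penalty
   when a splat feeds vector stores, modeling the vdup->vst1
   dependency stall.  Frees the per-loop data.  */
static unsigned
finish_cost (struct cost_data *d)
{
  unsigned cost = d->total;
  if (d->n_dup > 0 && d->n_store > 0)
    cost += 4 * d->n_store;  /* illustrative stall penalty */
  free (d);
  return cost;
}
```

The key point is that the penalty can only be computed in finish_cost,
once it is known that both a splat and the stores it feeds are present.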

Richard.

> Thanks,
>
> Christophe.
>
> On 13 December 2012 10:42, Richard Biener <richard.guenther@gmail.com> wrote:
>> On Wed, Dec 12, 2012 at 6:50 PM, Andi Kleen <andi@firstfloor.org> wrote:
>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>>
>>>> i386.c has
>>>>
>>>>    {
>>>>       /* When not optimize for size, enable vzeroupper optimization for
>>>>          TARGET_AVX with -fexpensive-optimizations and split 32-byte
>>>>          AVX unaligned load/store.  */
>>>
>>> This is only for the load, not for deciding whether peeling is
>>> worthwhile or not.
>>>
>>> I believe it's unimplemented for x86 at this point. There isn't even a
>>> hook for it. Any hook that is added should ideally work for both ARM64
>>> and x86. This would imply it would need to handle different vector
>>> sizes.
>>
>> There is
>>
>> /* Implement targetm.vectorize.builtin_vectorization_cost.  */
>> static int
>> ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
>>                                  tree vectype,
>>                                  int misalign ATTRIBUTE_UNUSED)
>> {
>> ...
>>       case unaligned_load:
>>       case unaligned_store:
>>         return ix86_cost->vec_unalign_load_cost;
>>
>> which indeed doesn't distinguish between unaligned load and unaligned
>> store cost.  Still, it does distinguish aligned from unaligned
>> load/store cost.
>>
>> Now look at the cost tables and you will see different unaligned
>> vs. aligned costs depending on the target CPU.
>>
>> generic32 and generic64 have:
>>
>>   1,                                    /* vec_align_load_cost.  */
>>   2,                                    /* vec_unalign_load_cost.  */
>>   1,                                    /* vec_store_cost.  */
>>
>> The missing piece in the vectorizer is that peeling for alignment should have the
>> option to turn itself off based on those costs (and analysis).
>>
>> Richard.
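The peel-or-not decision sketched in the quoted thread can be made
concrete with a back-of-the-envelope model. The following is a
hypothetical sketch, not GCC code: the helper names are invented, the
costs mirror the generic32 table quoted above (aligned vector access 1,
unaligned 2, scalar access 1), and the scalar epilogue is ignored for
simplicity.

```c
/* Hypothetical cost comparison: peeling for alignment vs. issuing
   unaligned vector accesses.  All parameters are illustrative.  */

/* Cost of vectorizing N scalar iterations with vector factor VF,
   using unaligned vector accesses (cost UNALIGN each).  */
static unsigned
cost_unaligned (unsigned n, unsigned vf, unsigned unalign)
{
  return (n / vf) * unalign;
}

/* Cost when peeling PEEL scalar iterations (cost SCALAR each) so
   the remaining vector accesses are aligned (cost ALIGN each).
   The scalar epilogue is ignored for simplicity.  */
static unsigned
cost_peeled (unsigned n, unsigned vf, unsigned peel,
             unsigned scalar, unsigned align)
{
  return peel * scalar + ((n - peel) / vf) * align;
}
```

With these numbers, an 8-iteration loop like Christophe's costs the same
either way (4 vs. 4), so peeling buys nothing, while at 64 iterations
peeling wins clearly (18 vs. 32); a cost-driven vectorizer could use
exactly this kind of comparison to turn peeling off for short trip
counts.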

