This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][AArch64] Add vector permute cost


On Wed, Dec 16, 2015 at 10:32 AM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Tue, Dec 15, 2015 at 11:35:45AM +0000, Wilco Dijkstra wrote:
>>
>> Add support for vector permute cost since various permutes can expand into a complex
>> sequence of instructions.  This fixes major performance regressions due to recent changes
>> in the SLP vectorizer (which now vectorizes more aggressively and emits many complex
>> permutes).
>>
>> Set the cost to > 1 for all microarchitectures so that the number of permutes is usually zero
>> and regressions disappear.  An example of the kind of code that might be emitted for
>> VEC_PERM_EXPR {0, 3} where registers happen to be in the wrong order:
>>
>>         adrp    x4, .LC16
>>         ldr     q5, [x4, #:lo12:.LC16]
>>         eor     v1.16b, v1.16b, v0.16b
>>         eor     v0.16b, v1.16b, v0.16b
>>         eor     v1.16b, v1.16b, v0.16b
>>         tbl     v0.16b, {v0.16b - v1.16b}, v5.16b
>>
>> Regression testing passes. This fixes regressions that were introduced recently, so OK for commit?
>>
>>
>> ChangeLog:
>> 2015-12-15  Wilco Dijkstra  <wdijkstr@arm.com>
>>
>>       * gcc/config/aarch64/aarch64.c (generic_vector_cost):
>>       Set vec_permute_cost.
>>       (cortexa57_vector_cost): Likewise.
>>       (exynosm1_vector_cost): Likewise.
>>       (xgene1_vector_cost): Likewise.
>>       (aarch64_builtin_vectorization_cost): Use vec_permute_cost.
>>       * gcc/config/aarch64/aarch64-protos.h (cpu_vector_cost):
>>       Add vec_permute_cost entry.
>>
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 10754c88c0973d8ef3c847195b727f02b193bbd8..2584f16d345b3d015d577dd28c08a73ee3e0b0fb 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -314,6 +314,7 @@ static const struct cpu_vector_cost generic_vector_cost =
>>    1, /* scalar_load_cost  */
>>    1, /* scalar_store_cost  */
>>    1, /* vec_stmt_cost  */
>> +  2, /* vec_permute_cost  */
>>    1, /* vec_to_scalar_cost  */
>>    1, /* scalar_to_vec_cost  */
>>    1, /* vec_align_load_cost  */
>
> Is there any reasoning behind making this 2? Do we now miss vectorization
> for some of the cheaper permutes? Across the cost models/pipeline
> descriptions that have been contributed to GCC I think that this is a
> sensible change to the generic costs, but I just want to check there
> was some reasoning/experimentation behind the number you picked.
>
> As permutes can have such wildly different costs, this all seems like a good
> candidate for some future much more involved hook from the vectorizer to the
> back-end specifying the candidate permute operation and requesting a cost
> (part of the bigger gimple costs framework?).

Yes, the vectorizer side also needs to improve here.  Not sure if it is possible
to represent these kinds of complex cost queries with a single gimple cost hook.
After all we don't really want to generate the full gimple stmt just to query
its cost ...

To better represent permute cost in the short term we'd need another
vectorizer-specific hook, which is not something for stage3 unless we face
some serious regressions on real-world code (that is, not only on
microbenchmarks).

Richard.

> Thanks,
> James
>
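
For readers following the thread, the effect of a separate permute cost can be
sketched in a few lines of standalone C. This is only an illustration under
stated assumptions, not the code from the patch: the struct reproduces just the
fields visible in the quoted hunk, and the names stmt_kind, stmt_cost and
body_cost are hypothetical helpers invented here, not GCC identifiers.

```c
#include <assert.h>

/* Simplified stand-in for the cost table touched by the patch.  Only the
   fields visible in the quoted hunk are reproduced; the real structure
   has more entries.  */
struct cpu_vector_cost
{
  int scalar_load_cost;
  int scalar_store_cost;
  int vec_stmt_cost;
  int vec_permute_cost;
  int vec_to_scalar_cost;
  int scalar_to_vec_cost;
  int vec_align_load_cost;
};

/* Values as shown in the generic_vector_cost hunk of the patch.  */
static const struct cpu_vector_cost generic_vector_cost =
{
  1, /* scalar_load_cost  */
  1, /* scalar_store_cost  */
  1, /* vec_stmt_cost  */
  2, /* vec_permute_cost  */
  1, /* vec_to_scalar_cost  */
  1, /* scalar_to_vec_cost  */
  1  /* vec_align_load_cost  */
};

/* Hypothetical statement kinds, assumed for this sketch only.  */
enum stmt_kind
{
  KIND_VECTOR_STMT,
  KIND_VEC_PERM
};

/* Return the cost of one statement of the given kind.  Permutes get
   their own table entry instead of sharing vec_stmt_cost.  */
static int
stmt_cost (const struct cpu_vector_cost *costs, enum stmt_kind kind)
{
  assert (costs != 0);
  switch (kind)
    {
    case KIND_VEC_PERM:
      return costs->vec_permute_cost;
    case KIND_VECTOR_STMT:
    default:
      return costs->vec_stmt_cost;
    }
}

/* Estimate the cost of a vectorized body made of n_stmts ordinary
   vector statements and n_perms permutes.  */
static int
body_cost (const struct cpu_vector_cost *costs, int n_stmts, int n_perms)
{
  return n_stmts * stmt_cost (costs, KIND_VECTOR_STMT)
	 + n_perms * stmt_cost (costs, KIND_VEC_PERM);
}
```

With these numbers, a body of 4 ordinary vector statements and 3 permutes is
estimated at 4*1 + 3*2 = 10, where it would have been 7 if permutes were charged
as ordinary vector statements. A flat per-permute cost like this cannot
distinguish cheap permutes from ones that expand into a TBL sequence, which is
the limitation discussed above.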

