This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC][PATCH][AArch64] Improve generic branch cost


On Thu, Mar 9, 2017 at 6:42 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Hi,
>
> Recently we've put a lot of effort into improving ifcvt to use CSEL on AArch64.
> In  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01639.html James determined
> the best value for AArch64 code generation.  Although this setting is used when
> explicitly targeting Cortex cores, it is not otherwise used.  This means by
> default GCC will not use (F)CSEL in many common cases.  Most code is built
> without -mcpu= and thus doesn't use CSEL like this example from GLIBC:
>
> strtok:
>     stp    x29, x30, [sp, -48]!
>     add    x29, sp, 0
>     stp    x21, x22, [sp, 32]
>     mov    x21, x1
>     stp    x19, x20, [sp, 16]
>     adrp    x22, .LANCHOR0
>     mov    x19, x0
>     cbz    x0, .L12
> .L2:    ldrb    w0, [x19]
>
> .L12:
>     ldr    x19, [x22, #:lo12:.LANCHOR0]
>     b    .L2
>
> With -mcpu=cortex-a57 GCC generates:
>
>     stp    x29, x30, [sp, -48]!
>     cmp    x0, 0
>     add    x29, sp, 0
>     stp    x21, x22, [sp, 32]
>     adrp    x21, .LANCHOR0
>     stp    x19, x20, [sp, 16]
>     mov    x19, x0
>     ldr    x0, [x21, #:lo12:.LANCHOR0]
>     csel    x19, x0, x19, eq
>     ldrb    w0, [x19]
>
> This is generally faster and smaller.  On one benchmark the new setting fixes a
> regression since GCC6 and improves performance by 49%.  So I propose to change
> generic_branch_cost to be the same as cortexa57_branch_cost so that all supported
> cores benefit equally from CSEL.  Are there any objections to this?

I have no objections.  In fact thunderx2t99's branch_cost is 1,3.  I
had not looked into improving thunderx branch cost yet but that might
be because I have local patches that improve phiopt for doing ifcvt
earlier.  Also my phiopt change does not have a cost model either so
using csel more is good for thunderx 1 and ThunderX 2.

Thanks,
Andrew

>
> Wilco
>
>
> ChangeLog:
> 2017-03-09  Wilco Dijkstra  <wdijkstr@arm.com>
>
>         * config/aarch64/aarch64.c (generic_branch_cost): Copy cortexa57_branch_cost.
> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5870b5e5d7e8e48cf925b3a62030346f041a7fd6..ea16074af86087a6200d9895583e05acf43d90e2 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -377,8 +377,8 @@ static const struct cpu_vector_cost xgene1_vector_cost =
>  /* Generic costs for branch instructions.  */
>  static const struct cpu_branch_cost generic_branch_cost =
>  {
> -  2,  /* Predictable.  */
> -  2   /* Unpredictable.  */
> +  1,  /* Predictable.  */
> +  3   /* Unpredictable.  */
>  };
>
>  /* Branch costs for Cortex-A57.  */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]