This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [RFC][PATCH][AArch64] Improve generic branch cost
- From: Andrew Pinski <pinskia at gmail dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Evandro Menezes <e dot menezes at samsung dot com>, "Andrew dot pinski at cavium dot com" <Andrew dot pinski at cavium dot com>, "jim dot wilson at linaro dot org" <jim dot wilson at linaro dot org>, nd <nd at arm dot com>
- Date: Thu, 9 Mar 2017 14:06:16 -0800
- Subject: Re: [RFC][PATCH][AArch64] Improve generic branch cost
- References: <VI1PR0802MB262138504A5473B8286763C783210@VI1PR0802MB2621.eurprd08.prod.outlook.com>
On Thu, Mar 9, 2017 at 6:42 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Hi,
>
> Recently we've put a lot of effort into improving ifcvt to use CSEL on AArch64.
> In https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01639.html James determined
> the best branch cost for AArch64 code generation. Although this setting is used when
> explicitly targeting Cortex cores, it is not otherwise used. This means by
> default GCC will not use (F)CSEL in many common cases. Most code is built
> without -mcpu= and thus doesn't use CSEL like this example from GLIBC:
>
> strtok:
> stp x29, x30, [sp, -48]!
> add x29, sp, 0
> stp x21, x22, [sp, 32]
> mov x21, x1
> stp x19, x20, [sp, 16]
> adrp x22, .LANCHOR0
> mov x19, x0
> cbz x0, .L12
> .L2: ldrb w0, [x19]
>
> .L12:
> ldr x19, [x22, #:lo12:.LANCHOR0]
> b .L2
>
> With -mcpu=cortex-a57 GCC generates:
>
> stp x29, x30, [sp, -48]!
> cmp x0, 0
> add x29, sp, 0
> stp x21, x22, [sp, 32]
> adrp x21, .LANCHOR0
> stp x19, x20, [sp, 16]
> mov x19, x0
> ldr x0, [x21, #:lo12:.LANCHOR0]
> csel x19, x0, x19, eq
> ldrb w0, [x19]
>
> This is generally faster and smaller. On one benchmark the new setting fixes a
> regression since GCC6 and improves performance by 49%. So I propose to change
> generic_branch_cost to be the same as cortexa57_branch_cost so that all supported
> cores benefit equally from CSEL. Are there any objections to this?
I have no objections. In fact, thunderx2t99's branch_cost is 1,3. I
have not looked into improving the ThunderX branch cost yet, but that
might be because I have local patches that improve phiopt to do
if-conversion earlier. My phiopt change does not have a cost model
either, so using CSEL more is good for both ThunderX 1 and ThunderX 2.
Thanks,
Andrew
>
> Wilco
>
>
> ChangeLog:
> 2017-03-09 Wilco Dijkstra <wdijkstr@arm.com>
>
> * config/aarch64/aarch64.c (generic_branch_cost): Copy cortexa57_branch_cost.
> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5870b5e5d7e8e48cf925b3a62030346f041a7fd6..ea16074af86087a6200d9895583e05acf43d90e2 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -377,8 +377,8 @@ static const struct cpu_vector_cost xgene1_vector_cost =
> /* Generic costs for branch instructions. */
> static const struct cpu_branch_cost generic_branch_cost =
> {
> - 2, /* Predictable. */
> - 2 /* Unpredictable. */
> + 1, /* Predictable. */
> + 3 /* Unpredictable. */
> };
>
> /* Branch costs for Cortex-A57. */
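
To illustrate what the two numbers in the diff mean, here is a minimal standalone sketch. The struct mirrors the two fields shown in the patch. The helper select_branch_cost is hypothetical and is not a GCC function; its selection rule (use the unpredictable cost only when optimizing for speed and the branch is not known to be predictable) is an assumption about how a backend might consume these fields, not code copied from aarch64.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the two-field layout visible in the diff. */
struct cpu_branch_cost
{
  int predictable;    /* Cost assumed for a predictable branch. */
  int unpredictable;  /* Cost assumed for an unpredictable branch. */
};

/* generic_branch_cost before the patch. */
const struct cpu_branch_cost old_generic_branch_cost = { 2, 2 };

/* generic_branch_cost after the patch (per the ChangeLog, a copy of
   cortexa57_branch_cost). */
const struct cpu_branch_cost new_generic_branch_cost = { 1, 3 };

/* Hypothetical helper (an assumption, not GCC's implementation):
   pick which of the two costs applies to a given branch.  When
   optimizing for size, or when the branch is predictable, use the
   predictable cost; otherwise use the unpredictable cost. */
int
select_branch_cost (const struct cpu_branch_cost *costs,
                    bool speed_p, bool predictable_p)
{
  assert (costs != NULL);
  if (!speed_p || predictable_p)
    return costs->predictable;
  return costs->unpredictable;
}
```

Under these assumptions, an unpredictable branch in code optimized for speed is costed at 3 with the new values instead of 2. A higher branch cost makes replacing the branch with a conditional select look cheaper by comparison, which is presumably why if-conversion emits CSEL more often with the new setting.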