This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH/AARCH64] Improve/correct ThunderX 1 cost model for Arith_shift
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Andrew Pinski <pinskia at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Wed, 21 Jun 2017 12:13:13 +0100
- Subject: Re: [PATCH/AARCH64] Improve/correct ThunderX 1 cost model for Arith_shift
On Tue, Jun 20, 2017 at 02:07:22PM -0700, Andrew Pinski wrote:
> On Mon, Jun 19, 2017 at 2:00 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> > On Wed, Jun 7, 2017 at 10:16 AM, James Greenhalgh
> > <james.greenhalgh@arm.com> wrote:
> >> On Fri, Dec 30, 2016 at 10:05:26PM -0800, Andrew Pinski wrote:
> >>> Hi,
> >>> Currently for the following function:
> >>> int f(int a, int b)
> >>> {
> >>> return a + (b << 7);
> >>> }
> >>>
> >>> GCC produces:
> >>> add w0, w0, w1, lsl 7
> >>> But for ThunderX 1 it is better if this instruction is split, allowing
> >>> better scheduling in most cases; the latency is the same. I get a
> >>> small improvement in CoreMark, ~1%.
> >>>
> >>> Currently the code does not take into account Arith_shift even though
> >>> the comment:
> >>> /* Strip any extend, leave shifts behind as we will
> >>> cost them through mult_cost. */
> >>> says it does not strip out the shift; aarch64_strip_extend does, and
> >>> always has since the back-end was added to GCC.
> >>>
> >>> Once I fixed the code around aarch64_strip_extend, I got a regression
> >>> for ThunderX 1 as some shifts/extends (left shifts <=4 and/or zero
> >>> extends) are considered free so I needed to add a new tuning flag.
> >>>
> >>> Note I will get an even larger improvement for ThunderX 2 CN99XX, but I
> >>> have not measured it yet, as I have not made the change to
> >>> aarch64-cost-tables.h; I am waiting for approval of the renaming
> >>> patch before submitting any of the cost table changes. Also I
> >>> noticed this problem with this tuning first and then looked back at
> >>> what I needed to do for ThunderX 1.
> >>>
> >>> OK? Bootstrapped and tested on aarch64-linux-gnu without any
> >>> regressions (both with and without --with-cpu=thunderx).
> >>
> >> This is mostly OK, but I don't like the name "easy"_shift_extend. Cheap
> >> or free seems better. I have some other minor points below.
> >
> >
> > Ok, that seems like a good idea. I used easy since that was the
> > wording our hardware folks had come up with. I am changing the
> > comments to make it clearer when this flag should be used.
> > I should have a new patch out by the end of today.
>
> Due to the LSE ICE which I reported in the other thread, it took me
> longer to send out a new patch.
> Anyway, here is the updated patch with the changes requested.
>
>
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
One grammar fix inline below, otherwise this is OK.
Thanks,
James
> * config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
> Increment Arith_shift and Arith_shift_reg by 1.
> * config/aarch64/aarch64-tuning-flags.def (cheap_shift_extend): New tuning flag.
> * config/aarch64/aarch64.c (thunderx_tunings): Enable
> AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND.
> (aarch64_strip_extend): Add new argument and test for it.
> (aarch64_cheap_mult_shift_p): New function.
> (aarch64_rtx_mult_cost): Call aarch64_cheap_mult_shift_p and don't add
> a cost if it is true.
> Update calls to aarch64_strip_extend.
> (aarch64_rtx_costs): Update calls to aarch64_strip_extend.
>
> +
> +/* Return true iff X is an cheap shift without a sign extend. */
s/an cheap/a cheap/
> +
> +static bool
> +aarch64_cheap_mult_shift_p (rtx x)
> +{
> + rtx op0, op1;
> +
> + op0 = XEXP (x, 0);
> + op1 = XEXP (x, 1);
> +
> + if (!(aarch64_tune_params.extra_tuning_flags
> + & AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND))
> + return false;
> +
> + if (GET_CODE (op0) == SIGN_EXTEND)
> + return false;
> +
> + if (GET_CODE (x) == ASHIFT && CONST_INT_P (op1)
> + && UINTVAL (op1) <= 4)
> + return true;
> +
> + if (GET_CODE (x) != MULT || !CONST_INT_P (op1))
> + return false;
> +
> + HOST_WIDE_INT l2 = exact_log2 (INTVAL (op1));
> +
> + if (l2 > 0 && l2 <= 4)
> + return true;
> +
> + return false;
> +}
> +
> /* Helper function for rtx cost calculation. Calculate the cost of
> a MULT or ASHIFT, which may be part of a compound PLUS/MINUS rtx.
> Return the calculated cost of the expression, recursing manually in to
> @@ -6164,7 +6200,11 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_c
> {
> if (compound_p)
> {
> - if (REG_P (op1))
> + /* If the shift is considered cheap,
> + then don't add any cost. */
> + if (aarch64_cheap_mult_shift_p (x))
> + ;
> + else if (REG_P (op1))
> /* ARITH + shift-by-register. */
> cost += extra_cost->alu.arith_shift_reg;
> else if (is_extend)
> @@ -6182,7 +6222,7 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_c
> }
> /* Strip extends as we will have costed them in the case above. */
> if (is_extend)
> - op0 = aarch64_strip_extend (op0);
> + op0 = aarch64_strip_extend (op0, true);
>
> cost += rtx_cost (op0, VOIDmode, code, 0, speed);
>
> @@ -7026,13 +7066,13 @@ cost_minus:
> if (speed)
> *cost += extra_cost->alu.extend_arith;
>
> - op1 = aarch64_strip_extend (op1);
> + op1 = aarch64_strip_extend (op1, true);
> *cost += rtx_cost (op1, VOIDmode,
> (enum rtx_code) GET_CODE (op1), 0, speed);
> return true;
> }
>
> - rtx new_op1 = aarch64_strip_extend (op1);
> + rtx new_op1 = aarch64_strip_extend (op1, false);
>
> /* Cost this as an FMA-alike operation. */
> if ((GET_CODE (new_op1) == MULT
> @@ -7105,7 +7145,7 @@ cost_plus:
> if (speed)
> *cost += extra_cost->alu.extend_arith;
>
> - op0 = aarch64_strip_extend (op0);
> + op0 = aarch64_strip_extend (op0, true);
> *cost += rtx_cost (op0, VOIDmode,
> (enum rtx_code) GET_CODE (op0), 0, speed);
> return true;
> @@ -7113,7 +7153,7 @@ cost_plus:
>
> /* Strip any extend, leave shifts behind as we will
> cost them through mult_cost. */
> - new_op0 = aarch64_strip_extend (op0);
> + new_op0 = aarch64_strip_extend (op0, false);
>
> if (GET_CODE (new_op0) == MULT
> || aarch64_shift_p (GET_CODE (new_op0)))