This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH/AARCH64] Improve/correct ThunderX 1 cost model for Arith_shift
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Andrew Pinski <pinskia at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, <nd at arm dot com>
- Date: Wed, 7 Jun 2017 18:16:03 +0100
- Subject: Re: [PATCH/AARCH64] Improve/correct ThunderX 1 cost model for Arith_shift
- References: <CA+=Sn1mzb055iDq8SNJ2reoywrjCAf99HV5v4ZHNuxdNuwaMjg@mail.gmail.com>
On Fri, Dec 30, 2016 at 10:05:26PM -0800, Andrew Pinski wrote:
> Currently for the following function:
> int f(int a, int b)
> {
>   return a + (b << 7);
> }
> GCC produces:
> add w0, w0, w1, lsl 7
> But for ThunderX 1, it is better if the instruction is split, which
> allows better scheduling in most cases; the latency is the same. I
> get a small improvement in CoreMark, ~1%.
> Currently the code does not take into account Arith_shift even though
> the comment:
> /* Strip any extend, leave shifts behind as we will
> cost them through mult_cost. */
> says it does not strip out the shift; aarch64_strip_extend does, and
> always has since the back-end was added to GCC.
> Once I fixed the code around aarch64_strip_extend, I got a regression
> for ThunderX 1 as some shifts/extends (left shifts <=4 and/or zero
> extends) are considered free so I needed to add a new tuning flag.
> Note I will get an even bigger improvement for ThunderX 2 CN99XX, but I
> have not measured it yet as I have not made the change to
> aarch64-cost-tables.h yet as I am waiting for approval of the renaming
> patch first before submitting any of the cost table changes. Also I
> noticed this problem with this tuning first and then looked back at
> what I needed to do for ThunderX 1.
> OK? Bootstrapped and tested on aarch64-linux-gnu without any
> regressions (both with and without --with-cpu=thunderx).
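For readers following along, here is a minimal C sketch of the two forms under discussion. The function names f_fused and f_split are hypothetical, and the assembly shown in the comment for the split form is an assumption about what a split sequence would look like, not output quoted from the patch. The C is only meant to show that both forms compute the same value; which instructions the compiler actually emits depends on the cost model.

```c
/* Form GCC currently emits as a single instruction (per the message above):
     add w0, w0, w1, lsl 7  */
int f_fused(int a, int b)
{
    return a + (b << 7);
}

/* Split form: the shift becomes its own instruction, which the scheduler
   can place independently of the add (assumed sequence):
     lsl w1, w1, 7
     add w0, w0, w1  */
int f_split(int a, int b)
{
    int shifted = b << 7;
    return a + shifted;
}
```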
This is mostly OK, but I don't like the name "easy"_shift_extend. Cheap
or free seems better. I have some other minor points below.
> Index: config/aarch64/aarch64-tuning-flags.def
> --- config/aarch64/aarch64-tuning-flags.def (revision 243974)
> +++ config/aarch64/aarch64-tuning-flags.def (working copy)
> @@ -35,4 +35,8 @@ two load/stores are not at least 8 byte
> pairs. */
> AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
> +/* Logical shift left <=4 with/without zero extend are considered easy
> + extended, also zero extends without the shift. */
I'm struggling to parse this comment. "also zero extends without the shift"
is what is getting me. I'm also not certain I follow when I should set this
flag. If all shifts are cheap/free on my platform, should I set this flag?
> +AARCH64_EXTRA_TUNING_OPTION ("easy_shift_extend", EASY_SHIFT_EXTEND)
> #undef AARCH64_EXTRA_TUNING_OPTION
> +/* Return true iff X is an easy shift without a sign extend. */
Again I don't like calling <= 4 "easy", it feels imprecise.
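
To make the question of when the flag applies concrete, here is a standalone C model of the rule as the patch comments describe it: a left shift by at most 4, with or without a zero extend (including a zero extend with no shift), and never a sign extend. The type and function names are hypothetical, and this is not the RTL-based code from the patch, only a sketch of the condition being discussed.

```c
#include <stdbool.h>

/* Hypothetical model of the extend applied to the shifted operand. */
enum extend_kind { EXTEND_NONE, EXTEND_ZERO, EXTEND_SIGN };

/* Hypothetical description of a shifted and/or extended operand. */
struct shift_extend_operand {
    enum extend_kind extend;
    unsigned shift_amount;   /* left shift amount; 0 means no shift */
};

/* Return true if the operand fits the rule from the patch comments:
   no sign extend, and a left shift of at most 4 (0 covers the
   "zero extend without the shift" case). */
bool cheap_shift_extend_p(struct shift_extend_operand op)
{
    if (op.extend == EXTEND_SIGN)
        return false;
    return op.shift_amount <= 4;
}
```

Under this model the shift by 7 from the example at the top of the thread is not in the cheap class, which is consistent with the stated motivation for costing it separately on ThunderX 1.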