This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][AArch64] Support for LDP/STP of Q-registers
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, Siddhesh Poyarekar <siddhesh at sourceware dot org>, Sameera Deshpande <sameera dot deshpande at linaro dot org>, "sellcey at cavium dot com" <sellcey at cavium dot com>, nd <nd at arm dot com>, "philipp dot tomsich at theobroma-systems dot com" <philipp dot tomsich at theobroma-systems dot com>, "e dot menezes at samsung dot com" <e dot menezes at samsung dot com>, "benedikt dot huber at theobroma-systems dot com" <benedikt dot huber at theobroma-systems dot com>
- Date: Tue, 19 Jun 2018 16:52:11 +0100
- Subject: Re: [PATCH][AArch64] Support for LDP/STP of Q-registers
- Nodisclaimer: True
- References: <5B157981.3010408@foss.arm.com> <5B16BB06.3020709@foss.arm.com> <20180605172844.GA7331@arm.com> <5B190FB9.9030208@foss.arm.com>
- Spamdiagnosticmetadata: NSPM
- Spamdiagnosticoutput: 1:99
On Thu, Jun 07, 2018 at 05:58:01AM -0500, Kyrill Tkachov wrote:
>
> On 05/06/18 18:28, James Greenhalgh wrote:
> > On Tue, Jun 05, 2018 at 11:32:06AM -0500, Kyrill Tkachov wrote:
> >> On 04/06/18 18:40, Kyrill Tkachov wrote:
> >>> Hi all,
> >>>
> >>> This patch adds support for generating LDPs and STPs of Q-registers.
> >>> This allows for more compact code generation and makes better use of the ISA.
> >>>
> >>> It's implemented in a straightforward way by allowing 16-byte modes in the
> >>> sched-fusion machinery and adding appropriate peepholes in aarch64-ldpstp.md
> >>> as well as the patterns themselves in aarch64-simd.md.
> >>>
> >>> I didn't see any non-noise performance effect on SPEC2017 on Cortex-A72 and Cortex-A53.
> >>>
> >> Adding some folks who know more about other CPUs as well.
> >> Are you okay with enabling these instructions in AArch64?
> >>
> >> If you could give this a spin on some benchmarks you
> >> care about on your platforms it would be really useful data.
> > From an architecture perspective, I think this is the right thing for us
> > to do. Given the feedback from Andrew and Siddhesh I think we should support
> > this patch, defaulting to on; but behind a tuning flag for those who want
> > to disable it for their -mcpu tuning.
> >
> > If you can respin it behind a tuning parameter and give the community
> > another 48 hours or so to respond, I think we'd have a good patch here.
> >
> > I'm also adding some more contributors to the AArch64 cores file for their
> > thoughts on the proposal.
>
> Ok, here's an updated patch adding a new no_ldp_stp_qregs tuning flag.
> I use it to restrict the peepholes in aarch64-ldpstp.md from merging the
> operations together into PARALLELs. I also use it to restrict the sched fusion
> check that brings such loads and stores together. This is enough to avoid
> forming the pairs.
>
> Given Philipp's comments I've enabled this flag for xgene1_tunings so that -mtune=xgene1
> will not generate these pairs, but every other tuning will by default.
>
> The tests are updated to explicitly pass -moverride=tune=none so that
> they do not depend on a particular default -mcpu option.
> This -moverride option disables all the tuning flags, so it disables
> the "disabling" of LDP/STP-Q.
>
> I've manually confirmed that for -mcpu=xgene1 the pairs are not formed
> and added a test that the new tuning flag does indeed disable the generation.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> What do folks think of this version?
OK for trunk.
We can tune up other cores ready for the GCC 9 release; but no reason not
to have this in now. If, as we get close to GCC 9 the consensus in our
supported cores is to go back to GCC 8 behaviour, we can flick the switch
to no_ldp_stp_qregs in generic, and cores which will benefit from this can
turn it back on in their private tuning structures.
Thanks,
James
> 2018-06-07 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * config/aarch64/aarch64-tuning-flags.def (no_ldp_stp_qregs): New.
> * config/aarch64/aarch64.c (xgene1_tunings): Add
> AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS to tune_flags.
> (aarch64_mode_valid_for_sched_fusion_p):
> Allow 16-byte modes.
> (aarch64_classify_address): Allow 16-byte modes for load_store_pair_p.
> * config/aarch64/aarch64-ldpstp.md: Add peepholes for LDP STP of
> 128-bit modes.
> * config/aarch64/aarch64-simd.md (load_pair<VQ:mode><VQ2:mode>):
> New pattern.
> (vec_store_pair<VQ:mode><VQ2:mode>): Likewise.
> * config/aarch64/iterators.md (VQ2): New mode iterator.
>
> 2018-06-07 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/aarch64/ldp_stp_q.c: New test.
> * gcc.target/aarch64/stp_vec_128_1.c: Likewise.
> * gcc.target/aarch64/ldp_stp_q_disable.c: Likewise.