[PATCH][AArch64] Accept more addressing modes for PRFM
Richard Earnshaw (lists)
Richard.Earnshaw@arm.com
Wed Feb 15 15:03:00 GMT 2017
On 03/02/17 17:12, Kyrill Tkachov wrote:
> Hi all,
>
> While evaluating Maxim's SW prefetch patches [1] I noticed that the
> aarch64 prefetch pattern is
> overly restrictive in its address operand. It only accepts simple
> register addressing modes.
> In fact, the PRFM instruction accepts almost all modes that a normal
> 64-bit LDR supports.
> The restriction in the pattern leads to explicit address calculation
> code to be emitted which we could avoid.
>
> This patch relaxes the restrictions on the prefetch define_insn. It
> creates a predicate and constraint that
> allow the full addressing modes that PRFM allows. Thus for the testcase
> in the patch (adapted from one of the existing
> __builtin_prefetch tests in the testsuite) we can generate a:
> prfm PLDL1STRM, [x1, 8]
>
> instead of the current
> prfm PLDL1STRM, [x1]
> with an explicit increment of x1 by 8 in a separate instruction.
>
> I've removed the %a output modifier in the output template and wrapped
> the address operand into a DImode MEM before
> passing it down to aarch64_print_operand.
>
> This is because operand 0 is an address operand rather than a memory
> operand and thus doesn't have a mode associated
> with it. When processing the 'a' output modifier the code in final.c
> will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
> argument. This will ICE on aarch64 because we need a mode for the
> memory in order for aarch64_classify_address to work
> correctly. Rather than overriding the VOIDmode in
> aarch64_print_operand_address I decided to instead create the DImode
> MEM in the "prefetch" output template and treat it as a normal 64-bit
> memory address, which at the point of assembly output
> is what it is anyway.
>
> With this patch I see a reduction in instruction count in the SPEC2006
> benchmarks when SW prefetching is enabled on top
> of Maxim's patchset because fewer address calculation instructions are
> emitted due to the use of the more expressive
> addressing modes. It also fixes a performance regression that I observed
> in 410.bwaves from Maxim's patches on Cortex-A72.
> I'll be running a full set of benchmarks to evaluate this further, but I
> think this is the right thing to do.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Maxim, do you want to try this on top of your patches on your hardware
> to see if it helps with the regressions you mentioned?
>
> Thanks,
> Kyrill
>
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html
>
> 2016-02-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * config/aarch64/aarch64.md (prefetch); Adjust predicate and
> constraint on operand 0 to allow more general addressing modes.
> Adjust output template.
> * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
> New function.
> * config/aarch64/aarch64-protos.h
> (aarch64_address_valid_for_prefetch_p): Declare prototype.
> * config/aarch64/constraints.md (Dp): New address constraint.
> * config/aarch64/predicates.md (aarch64_prefetch_operand): New
> predicate.
>
> 2016-02-03 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
>
> aarch64-prfm-imm.patch
>
Hmm, I'm not sure about this. rtl.texi says that a prefetch code
contains an address, not a MEM. So it's theoretically possible for
generic code to want to look inside the first operand and find an
address directly. This change would break that assumption.
R.
>
> commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> Date: Thu Feb 2 14:46:11 2017 +0000
>
> [AArch64] Accept more addressing modes for PRFM
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index babc327..61706de 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
>
> HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
> int aarch64_get_condition_code (rtx);
> +bool aarch64_address_valid_for_prefetch_p (rtx, bool);
> bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
> unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
> unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index acc093a..c05eff3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info *info,
> }
> }
>
> +/* Return true if the address X is valid for a PRFM instruction.
> + STRICT_P is true if we should do strict checking with
> + aarch64_classify_address. */
> +
> +bool
> +aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
> +{
> + struct aarch64_address_info addr;
> +
> + /* PRFM accepts the same addresses as DImode... */
> + bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p);
> + if (!res)
> + return false;
> +
> + /* ... except writeback forms. */
> + return addr.type != ADDRESS_REG_WB;
> +}
> +
> bool
> aarch64_symbolic_address_p (rtx x)
> {
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index b9e8edf..c6201a5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -518,27 +518,31 @@ (define_insn "nop"
> )
>
> (define_insn "prefetch"
> - [(prefetch (match_operand:DI 0 "register_operand" "r")
> + [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
> (match_operand:QI 1 "const_int_operand" "")
> (match_operand:QI 2 "const_int_operand" ""))]
> ""
> {
> - const char * pftype[2][4] =
> + const char * pftype[2][4] =
> {
> - {"prfm\\tPLDL1STRM, %a0",
> - "prfm\\tPLDL3KEEP, %a0",
> - "prfm\\tPLDL2KEEP, %a0",
> - "prfm\\tPLDL1KEEP, %a0"},
> - {"prfm\\tPSTL1STRM, %a0",
> - "prfm\\tPSTL3KEEP, %a0",
> - "prfm\\tPSTL2KEEP, %a0",
> - "prfm\\tPSTL1KEEP, %a0"},
> + {"prfm\\tPLDL1STRM, %0",
> + "prfm\\tPLDL3KEEP, %0",
> + "prfm\\tPLDL2KEEP, %0",
> + "prfm\\tPLDL1KEEP, %0"},
> + {"prfm\\tPSTL1STRM, %0",
> + "prfm\\tPSTL3KEEP, %0",
> + "prfm\\tPSTL2KEEP, %0",
> + "prfm\\tPSTL1KEEP, %0"},
> };
>
> int locality = INTVAL (operands[2]);
>
> gcc_assert (IN_RANGE (locality, 0, 3));
>
> + /* PRFM accepts the same addresses as a 64-bit LDR so wrap
> + the address into a DImode MEM so that aarch64_print_operand knows
> + how to print it. */
> + operands[0] = gen_rtx_MEM (DImode, operands[0]);
> return pftype[INTVAL(operands[1])][locality];
> }
> [(set_attr "type" "load1")]
> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
> index 5a252c0..b829337 100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -214,3 +214,8 @@ (define_constraint "Dd"
> A constraint that matches an immediate operand valid for AdvSIMD scalar."
> (and (match_code "const_int")
> (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
> +
> +(define_address_constraint "Dp"
> + "@internal
> + An address valid for a prefetch instruction."
> + (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index e83d45b..8e3ea9b 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand"
> (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), PARALLEL,
> 0)")))
>
> +(define_predicate "aarch64_prefetch_operand"
> + (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
> +
> (define_predicate "aarch64_valid_symref"
> (match_code "const, symbol_ref, label_ref")
> {
> diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
> new file mode 100644
> index 0000000..26ab913
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Check that we can generate the immediate-offset addressing
> + mode for PRFM. */
> +
> +#define ARRSIZE 65
> +int *bad_addr[ARRSIZE];
> +
> +void
> +prefetch_for_read (void)
> +{
> + int i;
> + for (i = 0; i < ARRSIZE; i++)
> + __builtin_prefetch (bad_addr[i] + 2, 0, 0);
> +}
> +
> +/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 } } */
>
More information about the Gcc-patches
mailing list