[PATCH][AArch64] Accept more addressing modes for PRFM

Richard Earnshaw (lists) Richard.Earnshaw@arm.com
Wed Feb 15 15:03:00 GMT 2017


On 03/02/17 17:12, Kyrill Tkachov wrote:
> Hi all,
> 
> While evaluating Maxim's SW prefetch patches [1] I noticed that the
> aarch64 prefetch pattern is
> overly restrictive in its address operand. It only accepts simple
> register addressing modes.
> In fact, the PRFM instruction accepts almost all modes that a normal
> 64-bit LDR supports.
> The restriction in the pattern leads to explicit address calculation
> code to be emitted which we could avoid.
> 
> This patch relaxes the restrictions on the prefetch define_insn. It
> creates a predicate and constraint that
> allow the full addressing modes that PRFM allows. Thus for the testcase
> in the patch (adapted from one of the existing
> __builtin_prefetch tests in the testsuite) we can generate a:
> prfm    PLDL1STRM, [x1, 8]
> 
> instead of the current
> prfm    PLDL1STRM, [x1]
> with an explicit increment of x1 by 8 in a separate instruction.
> 
> I've removed the %a output modifier in the output template and wrapped
> the address operand into a DImode MEM before
> passing it down to aarch64_print_operand.
> 
> This is because operand 0 is an address operand rather than a memory
> operand and thus doesn't have a mode associated
> with it.  When processing the 'a' output modifier the code in final.c
> will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
> argument.  This will ICE on aarch64 because we need a mode for the
> memory in order for aarch64_classify_address to work
> correctly.  Rather than overriding the VOIDmode in
> aarch64_print_operand_address I decided to instead create the DImode
> MEM in the "prefetch" output template and treat it as a normal 64-bit
> memory address, which at the point of assembly output
> is what it is anyway.
> 
> With this patch I see a reduction in instruction count in the SPEC2006
> benchmarks when SW prefetching is enabled on top
> of Maxim's patchset because fewer address calculation instructions are
> emitted due to the use of the more expressive
> addressing modes. It also fixes a performance regression that I observed
> in 410.bwaves from Maxim's patches on Cortex-A72.
> I'll be running a full set of benchmarks to evaluate this further, but I
> think this is the right thing to do.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Maxim, do you want to try this on top of your patches on your hardware
> to see if it helps with the regressions you mentioned?
> 
> Thanks,
> Kyrill
> 
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html
> 
> 2016-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * config/aarch64/aarch64.md (prefetch); Adjust predicate and
>     constraint on operand 0 to allow more general addressing modes.
>     Adjust output template.
>     * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
>     New function.
>     * config/aarch64/aarch64-protos.h
>     (aarch64_address_valid_for_prefetch_p): Declare prototype.
>     * config/aarch64/constraints.md (Dp): New address constraint.
>     * config/aarch64/predicates.md (aarch64_prefetch_operand): New
>     predicate.
> 
> 2016-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
> 
> aarch64-prfm-imm.patch
> 

Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
contains an address, not a MEM.  So it's theoretically possible for
generic code to want to look inside the first operand and find an
address directly.  This change would break that assumption.

R.

> 
> commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> Date:   Thu Feb 2 14:46:11 2017 +0000
> 
>     [AArch64] Accept more addressing modes for PRFM
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index babc327..61706de 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
>  
>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>  int aarch64_get_condition_code (rtx);
> +bool aarch64_address_valid_for_prefetch_p (rtx, bool);
>  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index acc093a..c05eff3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info *info,
>      }
>  }
>  
> +/* Return true if the address X is valid for a PRFM instruction.
> +   STRICT_P is true if we should do strict checking with
> +   aarch64_classify_address.  */
> +
> +bool
> +aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
> +{
> +  struct aarch64_address_info addr;
> +
> +  /* PRFM accepts the same addresses as DImode...  */
> +  bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p);
> +  if (!res)
> +    return false;
> +
> +  /* ... except writeback forms.  */
> +  return addr.type != ADDRESS_REG_WB;
> +}
> +
>  bool
>  aarch64_symbolic_address_p (rtx x)
>  {
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index b9e8edf..c6201a5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -518,27 +518,31 @@ (define_insn "nop"
>  )
>  
>  (define_insn "prefetch"
> -  [(prefetch (match_operand:DI 0 "register_operand" "r")
> +  [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
>              (match_operand:QI 1 "const_int_operand" "")
>              (match_operand:QI 2 "const_int_operand" ""))]
>    ""
>    {
> -    const char * pftype[2][4] = 
> +    const char * pftype[2][4] =
>      {
> -      {"prfm\\tPLDL1STRM, %a0",
> -       "prfm\\tPLDL3KEEP, %a0",
> -       "prfm\\tPLDL2KEEP, %a0",
> -       "prfm\\tPLDL1KEEP, %a0"},
> -      {"prfm\\tPSTL1STRM, %a0",
> -       "prfm\\tPSTL3KEEP, %a0",
> -       "prfm\\tPSTL2KEEP, %a0",
> -       "prfm\\tPSTL1KEEP, %a0"},
> +      {"prfm\\tPLDL1STRM, %0",
> +       "prfm\\tPLDL3KEEP, %0",
> +       "prfm\\tPLDL2KEEP, %0",
> +       "prfm\\tPLDL1KEEP, %0"},
> +      {"prfm\\tPSTL1STRM, %0",
> +       "prfm\\tPSTL3KEEP, %0",
> +       "prfm\\tPSTL2KEEP, %0",
> +       "prfm\\tPSTL1KEEP, %0"},
>      };
>  
>      int locality = INTVAL (operands[2]);
>  
>      gcc_assert (IN_RANGE (locality, 0, 3));
>  
> +    /* PRFM accepts the same addresses as a 64-bit LDR so wrap
> +       the address into a DImode MEM so that aarch64_print_operand knows
> +       how to print it.  */
> +    operands[0] = gen_rtx_MEM (DImode, operands[0]);
>      return pftype[INTVAL(operands[1])][locality];
>    }
>    [(set_attr "type" "load1")]
> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
> index 5a252c0..b829337 100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -214,3 +214,8 @@ (define_constraint "Dd"
>   A constraint that matches an immediate operand valid for AdvSIMD scalar."
>   (and (match_code "const_int")
>        (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
> +
> +(define_address_constraint "Dp"
> +  "@internal
> + An address valid for a prefetch instruction."
> + (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index e83d45b..8e3ea9b 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand"
>         (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), PARALLEL,
>  					       0)")))
>  
> +(define_predicate "aarch64_prefetch_operand"
> +  (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
> +
>  (define_predicate "aarch64_valid_symref"
>    (match_code "const, symbol_ref, label_ref")
>  {
> diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
> new file mode 100644
> index 0000000..26ab913
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Check that we can generate the immediate-offset addressing
> +   mode for PRFM.  */
> +
> +#define ARRSIZE 65
> +int *bad_addr[ARRSIZE];
> +
> +void
> +prefetch_for_read (void)
> +{
> +  int i;
> +  for (i = 0; i < ARRSIZE; i++)
> +    __builtin_prefetch (bad_addr[i] + 2, 0, 0);
> +}
> +
> +/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 } } */
> 



More information about the Gcc-patches mailing list