[PATCH][AArch64] Accept more addressing modes for PRFM

Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Thu May 4 16:15:00 GMT 2017


On 15/02/17 15:30, Richard Earnshaw (lists) wrote:
> On 15/02/17 15:03, Kyrill Tkachov wrote:
>> Hi Richard,
>>
>> On 15/02/17 15:00, Richard Earnshaw (lists) wrote:
>>> On 03/02/17 17:12, Kyrill Tkachov wrote:
>>>> Hi all,
>>>>
>>>> While evaluating Maxim's SW prefetch patches [1] I noticed that the
>>>> aarch64 prefetch pattern is overly restrictive in its address
>>>> operand: it only accepts simple register addressing modes, whereas
>>>> the PRFM instruction accepts almost all of the addressing modes that
>>>> a normal 64-bit LDR supports.  This restriction causes explicit
>>>> address calculation code to be emitted, which we could avoid.
>>>>
>>>> This patch relaxes the restrictions on the prefetch define_insn.  It
>>>> adds a predicate and constraint that accept the full range of
>>>> addressing modes that PRFM supports.  Thus for the testcase in the
>>>> patch (adapted from one of the existing __builtin_prefetch tests in
>>>> the testsuite) we can generate:
>>>> prfm    PLDL1STRM, [x1, 8]
>>>>
>>>> instead of the current
>>>> prfm    PLDL1STRM, [x1]
>>>> with an explicit increment of x1 by 8 in a separate instruction.
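>>>>
>>>> For reference, here is a minimal C sketch of the kind of code that
>>>> benefits (the function name is made up for illustration):
>>>>
>>>>   void
>>>>   prefetch_next (int *p)
>>>>   {
>>>>     /* __builtin_prefetch (addr, rw, locality): rw = 0 requests a
>>>>        prefetch for read and locality = 0 means no temporal locality,
>>>>        which maps to the PLDL1STRM hint.  With the relaxed pattern the
>>>>        +2 element (+8 byte) offset can be folded into the PRFM
>>>>        addressing mode, giving something like
>>>>        prfm PLDL1STRM, [x0, 8].  */
>>>>     __builtin_prefetch (p + 2, 0, 0);
>>>>   }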
>>>>
>>>> I've removed the %a output modifier from the output template and
>>>> wrapped the address operand in a DImode MEM before passing it down
>>>> to aarch64_print_operand.
>>>>
>>>> This is because operand 0 is an address operand rather than a memory
>>>> operand and thus doesn't have a mode associated with it.  When
>>>> processing the 'a' output modifier the code in final.c calls
>>>> TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode argument.  That ICEs on
>>>> aarch64 because aarch64_classify_address needs a mode for the memory
>>>> in order to work correctly.  Rather than overriding the VOIDmode in
>>>> aarch64_print_operand_address, I decided instead to create the DImode
>>>> MEM in the "prefetch" output template and treat it as a normal 64-bit
>>>> memory address, which at the point of assembly output is what it is
>>>> anyway.
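>>>>
>>>> In outline, the output template now does something like the
>>>> following (this is just a sketch; the full hunk is in the aarch64.md
>>>> diff below):
>>>>
>>>>   /* Operand 0 is a bare address, so give it a mode by wrapping it
>>>>      in a DImode MEM; aarch64_print_operand then prints it exactly
>>>>      like a 64-bit LDR address.  */
>>>>   operands[0] = gen_rtx_MEM (DImode, operands[0]);
>>>>   return pftype[INTVAL(operands[1])][locality];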
>>>>
>>>> With this patch I see a reduction in instruction count in the SPEC2006
>>>> benchmarks when SW prefetching is enabled on top
>>>> of Maxim's patchset because fewer address calculation instructions are
>>>> emitted due to the use of the more expressive
>>>> addressing modes. It also fixes a performance regression that I observed
>>>> in 410.bwaves from Maxim's patches on Cortex-A72.
>>>> I'll be running a full set of benchmarks to evaluate this further, but I
>>>> think this is the right thing to do.
>>>>
>>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>>
>>>> Maxim, do you want to try this on top of your patches on your hardware
>>>> to see if it helps with the regressions you mentioned?
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>>
>>>> [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html
>>>>
>>>> 2017-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       * config/aarch64/aarch64.md (prefetch): Adjust predicate and
>>>>       constraint on operand 0 to allow more general addressing modes.
>>>>       Adjust output template.
>>>>       * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
>>>>       New function.
>>>>       * config/aarch64/aarch64-protos.h
>>>>       (aarch64_address_valid_for_prefetch_p): Declare prototype.
>>>>       * config/aarch64/constraints.md (Dp): New address constraint.
>>>>       * config/aarch64/predicates.md (aarch64_prefetch_operand): New
>>>>       predicate.
>>>>
>>>> 2017-02-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
>>>>
>>>> aarch64-prfm-imm.patch
>>>>
>>> Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
>>> contains an address, not a MEM.  So it's theoretically possible for
>>> generic code to want to look inside the first operand and find an
>>> address directly.  This change would break that assumption.
>> With this change the prefetch operand is still an address, not a MEM,
>> during all the optimisation passes.
>> It is wrapped in a MEM only at the very end, when the assembly string
>> is printed during 'final'.
>>
> Ah!  I'd missed that.
>
> This is OK for stage1.

I've bootstrapped and tested the patch against current trunk and committed
it as r247603.

Thanks,
Kyrill

> R.
>
>> Kyrill
>>
>>> R.
>>>
>>>> commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
>>>> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>> Date:   Thu Feb 2 14:46:11 2017 +0000
>>>>
>>>>       [AArch64] Accept more addressing modes for PRFM
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>>>> index babc327..61706de 100644
>>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>>> @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
>>>>      HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>>>>    int aarch64_get_condition_code (rtx);
>>>> +bool aarch64_address_valid_for_prefetch_p (rtx, bool);
>>>>    bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>>>>    unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>>>>    unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>> index acc093a..c05eff3 100644
>>>> --- a/gcc/config/aarch64/aarch64.c
>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>> @@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info *info,
>>>>        }
>>>>    }
>>>>    +/* Return true if the address X is valid for a PRFM instruction.
>>>> +   STRICT_P is true if we should do strict checking with
>>>> +   aarch64_classify_address.  */
>>>> +
>>>> +bool
>>>> +aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
>>>> +{
>>>> +  struct aarch64_address_info addr;
>>>> +
>>>> +  /* PRFM accepts the same addresses as DImode...  */
>>>> +  bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p);
>>>> +  if (!res)
>>>> +    return false;
>>>> +
>>>> +  /* ... except writeback forms.  */
>>>> +  return addr.type != ADDRESS_REG_WB;
>>>> +}
>>>> +
>>>>    bool
>>>>    aarch64_symbolic_address_p (rtx x)
>>>>    {
>>>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>>>> index b9e8edf..c6201a5 100644
>>>> --- a/gcc/config/aarch64/aarch64.md
>>>> +++ b/gcc/config/aarch64/aarch64.md
>>>> @@ -518,27 +518,31 @@ (define_insn "nop"
>>>>    )
>>>>      (define_insn "prefetch"
>>>> -  [(prefetch (match_operand:DI 0 "register_operand" "r")
>>>> +  [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
>>>>                (match_operand:QI 1 "const_int_operand" "")
>>>>                (match_operand:QI 2 "const_int_operand" ""))]
>>>>      ""
>>>>      {
>>>> -    const char * pftype[2][4] =
>>>> +    const char * pftype[2][4] =
>>>>        {
>>>> -      {"prfm\\tPLDL1STRM, %a0",
>>>> -       "prfm\\tPLDL3KEEP, %a0",
>>>> -       "prfm\\tPLDL2KEEP, %a0",
>>>> -       "prfm\\tPLDL1KEEP, %a0"},
>>>> -      {"prfm\\tPSTL1STRM, %a0",
>>>> -       "prfm\\tPSTL3KEEP, %a0",
>>>> -       "prfm\\tPSTL2KEEP, %a0",
>>>> -       "prfm\\tPSTL1KEEP, %a0"},
>>>> +      {"prfm\\tPLDL1STRM, %0",
>>>> +       "prfm\\tPLDL3KEEP, %0",
>>>> +       "prfm\\tPLDL2KEEP, %0",
>>>> +       "prfm\\tPLDL1KEEP, %0"},
>>>> +      {"prfm\\tPSTL1STRM, %0",
>>>> +       "prfm\\tPSTL3KEEP, %0",
>>>> +       "prfm\\tPSTL2KEEP, %0",
>>>> +       "prfm\\tPSTL1KEEP, %0"},
>>>>        };
>>>>          int locality = INTVAL (operands[2]);
>>>>          gcc_assert (IN_RANGE (locality, 0, 3));
>>>>    +    /* PRFM accepts the same addresses as a 64-bit LDR so wrap
>>>> +       the address into a DImode MEM so that aarch64_print_operand knows
>>>> +       how to print it.  */
>>>> +    operands[0] = gen_rtx_MEM (DImode, operands[0]);
>>>>        return pftype[INTVAL(operands[1])][locality];
>>>>      }
>>>>      [(set_attr "type" "load1")]
>>>> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
>>>> index 5a252c0..b829337 100644
>>>> --- a/gcc/config/aarch64/constraints.md
>>>> +++ b/gcc/config/aarch64/constraints.md
>>>> @@ -214,3 +214,8 @@ (define_constraint "Dd"
>>>>     A constraint that matches an immediate operand valid for AdvSIMD
>>>> scalar."
>>>>     (and (match_code "const_int")
>>>>          (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
>>>> +
>>>> +(define_address_constraint "Dp"
>>>> +  "@internal
>>>> + An address valid for a prefetch instruction."
>>>> + (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
>>>> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
>>>> index e83d45b..8e3ea9b 100644
>>>> --- a/gcc/config/aarch64/predicates.md
>>>> +++ b/gcc/config/aarch64/predicates.md
>>>> @@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand"
>>>>           (match_test "aarch64_legitimate_address_p (mode, XEXP (op,
>>>> 0), PARALLEL,
>>>>                               0)")))
>>>>    +(define_predicate "aarch64_prefetch_operand"
>>>> +  (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
>>>> +
>>>>    (define_predicate "aarch64_valid_symref"
>>>>      (match_code "const, symbol_ref, label_ref")
>>>>    {
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>>>> new file mode 100644
>>>> index 0000000..26ab913
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>>>> @@ -0,0 +1,18 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2" } */
>>>> +
>>>> +/* Check that we can generate the immediate-offset addressing
>>>> +   mode for PRFM.  */
>>>> +
>>>> +#define ARRSIZE 65
>>>> +int *bad_addr[ARRSIZE];
>>>> +
>>>> +void
>>>> +prefetch_for_read (void)
>>>> +{
>>>> +  int i;
>>>> +  for (i = 0; i < ARRSIZE; i++)
>>>> +    __builtin_prefetch (bad_addr[i] + 2, 0, 0);
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 } } */
>>>>


