[PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]
Richard Sandiford
richard.sandiford@arm.com
Thu Sep 10 18:18:01 GMT 2020
Alex Coplan <alex.coplan@arm.com> writes:
> Hello,
>
> Since r11-2903-g6b3034eaba83935d9f6dfb20d2efbdb34b5b00bf introduced a
> canonicalization from mult to shift on address reloads, a missing
> pattern in the AArch64 backend was exposed.
>
> In particular, we were missing the ashift variant of
> *add_<optab><mode>_multp2 (this mult variant is redundant and was
> removed in r11-3033-g2f8ae301f6a125f50b0a758047fcddae7b68daa8).
>
> This patch adds that missing pattern (fixing PR96998), updates
> aarch64_is_extend_from_extract() to work for the shift pattern (instead
> of the redundant mult variant) and updates callers in the cost
> calculations to apply the costing for the shift variant instead.
Hmm. I think we should instead fix this in combine. For the related
testcase:
void g(void) {
  int g;
  for (;; g = h()) {
    asm volatile ("" :: "r"(&d.e[g]));
  }
}
combine tries:
Trying 9 -> 10:
    9: r99:DI=sign_extend(r93:SI)<<0x2
      REG_DEAD r93:SI
   10: r100:DI=r96:DI+r99:DI
      REG_DEAD r99:DI
Successfully matched this instruction:
(set (reg:DI 100)
    (plus:DI (ashift:DI (sign_extend:DI (reg/v:SI 93 [ g ]))
            (const_int 2 [0x2]))
        (reg/f:DI 96)))
allowing combination of insns 9 and 10
original costs 4 + 4 = 8
replacement cost 8
deferring deletion of insn with uid = 9.
modifying insn i3    10: r100:DI=sign_extend(r93:SI)<<0x2+r96:DI
      REG_DEAD r93:SI
deferring rescan insn with uid = 10.
where (shift (extend …) …) is IMO the correct representation of this
operation. The new pattern comes from this code in make_extraction:
  else if (GET_CODE (inner) == ASHIFT
           && CONST_INT_P (XEXP (inner, 1))
           && pos_rtx == 0 && pos == 0
           && len > UINTVAL (XEXP (inner, 1)))
    {
      /* We're extracting the least significant bits of an rtx
         (ashift X (const_int C)), where LEN > C.  Extract the
         least significant (LEN - C) bits of X, giving an rtx
         whose mode is MODE, then shift it left C times.  */
      new_rtx = make_extraction (mode, XEXP (inner, 0),
                                 0, 0, len - INTVAL (XEXP (inner, 1)),
                                 unsignedp, in_dest, in_compare);
      if (new_rtx != 0)
        return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
    }
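
As a sanity check on that transformation, here is a hand-written
illustration (not GCC code; the sign_extract helper simply models
(sign_extract x len 0)): taking the low LEN bits of (ashift X C) and
sign-extending is the same as sign-extending the low (LEN - C) bits
of X and then shifting left by C.

#include <assert.h>
#include <stdint.h>

/* Model (sign_extract x len 0): sign-extend the low LEN bits of X.  */
static int64_t
sign_extract (int64_t x, int len)
{
  return (int64_t) ((uint64_t) x << (64 - len)) >> (64 - len);
}

/* Left shift done in unsigned arithmetic so that negative values
   stay well defined in C.  */
static int64_t
shl (int64_t x, int c)
{
  return (int64_t) ((uint64_t) x << c);
}

int
main (void)
{
  for (int64_t x = -1000; x <= 1000; x++)
    for (int c = 1; c <= 3; c++)
      {
        int len = 34;
        /* Low LEN bits of (ashift x c), sign-extended ...  */
        int64_t extract_of_shift = sign_extract (shl (x, c), len);
        /* ... equal (ashift (sign_extract x (len - c)) c).  */
        int64_t shift_of_extract = shl (sign_extract (x, len - c), c);
        assert (extract_of_shift == shift_of_extract);
      }
  return 0;
}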
But because of the unfortunate special case that shifts need
to be MULTs inside MEMs, make_compound_operation_int has:
    case ASHIFT:
      /* Convert shifts by constants into multiplications if inside
         an address.  */
      if (in_code == MEM && CONST_INT_P (XEXP (x, 1))
          && INTVAL (XEXP (x, 1)) < HOST_BITS_PER_WIDE_INT
          && INTVAL (XEXP (x, 1)) >= 0)
        {
          HOST_WIDE_INT count = INTVAL (XEXP (x, 1));
          HOST_WIDE_INT multval = HOST_WIDE_INT_1 << count;

          new_rtx = make_compound_operation (XEXP (x, 0), next_code);
          if (GET_CODE (new_rtx) == NEG)
            {
              new_rtx = XEXP (new_rtx, 0);
              multval = -multval;
            }
          multval = trunc_int_for_mode (multval, mode);
          new_rtx = gen_rtx_MULT (mode, new_rtx, gen_int_mode (multval, mode));
        }
      break;
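
Concretely (again a hand-written sketch, not GCC code): a shift count
C becomes the multiplier 1 << C, and a NEG operand folds into a
negated multiplier.

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  int64_t x = 123;
  int count = 2;                            /* (const_int 2) */
  int64_t multval = (int64_t) 1 << count;   /* MULT operand: 4 */

  /* Inside a MEM, (ashift x 2) is written as (mult x 4) ...  */
  assert ((x << count) == x * multval);

  /* ... and (ashift (neg x) 2) as (mult x -4).  */
  assert (-x * multval == x * -multval);
  return 0;
}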
Thus for:
    case ASHIFTRT:
      lhs = XEXP (x, 0);
      rhs = XEXP (x, 1);

      /* If we have (ashiftrt (ashift foo C1) C2) with C2 >= C1,
         this is a SIGN_EXTRACT.  */
      if (CONST_INT_P (rhs)
          && GET_CODE (lhs) == ASHIFT
          && CONST_INT_P (XEXP (lhs, 1))
          && INTVAL (rhs) >= INTVAL (XEXP (lhs, 1))
          && INTVAL (XEXP (lhs, 1)) >= 0
          && INTVAL (rhs) < mode_width)
        {
          new_rtx = make_compound_operation (XEXP (lhs, 0), next_code);
          new_rtx = make_extraction (mode, new_rtx,
                                     INTVAL (rhs) - INTVAL (XEXP (lhs, 1)),
                                     NULL_RTX, mode_width - INTVAL (rhs),
                                     code == LSHIFTRT, 0, in_code == COMPARE);
          break;
        }
the first new_rtx is a MULT rather than an ASHIFT when the operation
occurs inside a MEM. make_extraction then doesn't do the same
canonicalisation as it does for ASHIFT above. So for the original
testcase we get:
Trying 8 -> 9:
    8: r98:DI=sign_extend(r92:SI)
      REG_DEAD r92:SI
    9: [r98:DI*0x4+r96:DI]=asm_operands
      REG_DEAD r98:DI
Failed to match this instruction:
(set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
                    (const_int 4 [0x4]))
                (const_int 34 [0x22])
                (const_int 0 [0]))
            (reg/f:DI 96)) [3 *i_5+0 S4 A32])
    (asm_operands:SI ("") ("=Q") 0 []
         []
         [] /tmp/foo.c:13))
allowing combination of insns 8 and 9
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 8.
modifying insn i3     9: [sign_extract(r92:SI#0*0x4,0x22,0)+r96:DI]=asm_operands
      REG_DEAD r92:SI
deferring rescan insn with uid = 9.
starting the processing of deferred insns
rescanning insn with uid = 9.
ending the processing of deferred insns
So IMO make_extraction should recognise and handle the “mem form”
of the shift too.
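
For what it's worth, the two forms do encode the same operation, and
the (const_int 34 [0x22]) above is just the 32 significant bits of
the SImode value plus the shift count of 2, which is exactly the
encoding that aarch64_is_extend_from_extract decodes. A hand-written
sketch of the equivalence (my own helper and values, not GCC code):

#include <assert.h>
#include <stdint.h>

/* Model (sign_extract x len 0): sign-extend the low LEN bits of X.  */
static int64_t
sign_extract (int64_t x, int len)
{
  return (int64_t) ((uint64_t) x << (64 - len)) >> (64 - len);
}

int
main (void)
{
  for (int32_t g = -100; g <= 100; g++)
    {
      /* (ashift (sign_extend:DI g) 2), the form used outside MEMs
         (written as a multiplication to keep the C well defined).  */
      int64_t shift_form = (int64_t) g * 4;

      /* (sign_extract:DI (mult (subreg:DI g) 4) 34 0), the in-MEM form:
         34 = 32 bits of the SImode value + shift count 2.  */
      int64_t extract_form = sign_extract ((int64_t) (uint32_t) g * 4, 34);

      assert (shift_form == extract_form);
    }
  return 0;
}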
It's all a bit unfortunate really. Having different representations
for shifts inside and outside MEMs is the Second Great RTL Mistake.
(The first was modeless const_ints. :-))
If we do that, we should be able to remove the handling of
extract-based addresses in aarch64_classify_index & co.
Thanks,
Richard
>
> Testing:
>  * Bootstrap and regtest on aarch64-none-linux-gnu, no regressions.
>  * Checked new unit test passes on arm-none-linux-gnueabihf.
>  * Checked new unit test isn't run on x86 (inline asm uses
>    arm/aarch64-specific constraint).
>  * Tested linux-next tree no longer encounters this ICE on arm64 with
>    patched GCC (unfortunately still doesn't compile successfully due to
>    a fix for PR96475 which introduces an ICE on arm64).
>
> OK for trunk?
>
> Thanks,
> Alex
>
> ---
>
> gcc/ChangeLog:
>
>         PR target/96998
>         * config/aarch64/aarch64.c (aarch64_is_extend_from_extract): Update to
>         work for shift pattern instead of mult.
>         (aarch64_strip_extend): Cost extract+shift pattern instead of
>         now-removed extract+mult pattern.
>         (aarch64_rtx_arith_op_extract_p): Likewise.
>         * config/aarch64/aarch64.md (*add_<optab><mode>_shftex): New.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/96998
>         * gcc.c-torture/compile/pr96998.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b6d74496cd0..55e0fc4e683 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -2815,27 +2815,24 @@ aarch64_is_noplt_call_p (rtx sym)
>     represent an expression that matches an extend operation.  The
>     operands represent the parameters from
>
> -   (extract:MODE (mult (reg) (MULT_IMM)) (EXTRACT_IMM) (const_int 0)).  */
> +   (extract:MODE (ashift (reg) (SHIFT_IMM)) (EXTRACT_IMM) (const_int 0)).  */
>  bool
> -aarch64_is_extend_from_extract (scalar_int_mode mode, rtx mult_imm,
> +aarch64_is_extend_from_extract (scalar_int_mode mode, rtx shift_imm,
>                                  rtx extract_imm)
>  {
> -  HOST_WIDE_INT mult_val, extract_val;
> +  HOST_WIDE_INT shift_val, extract_val;
>
> -  if (! CONST_INT_P (mult_imm) || ! CONST_INT_P (extract_imm))
> +  if (! CONST_INT_P (shift_imm) || ! CONST_INT_P (extract_imm))
>      return false;
>
> -  mult_val = INTVAL (mult_imm);
> +  shift_val = INTVAL (shift_imm);
>    extract_val = INTVAL (extract_imm);
>
> -  if (extract_val > 8
> -      && extract_val < GET_MODE_BITSIZE (mode)
> -      && exact_log2 (extract_val & ~7) > 0
> -      && (extract_val & 7) <= 4
> -      && mult_val == (1 << (extract_val & 7)))
> -    return true;
> -
> -  return false;
> +  return extract_val > 8
> +         && extract_val < GET_MODE_BITSIZE (mode)
> +         && exact_log2 (extract_val & ~7) > 0
> +         && shift_val <= 4
> +         && shift_val == (extract_val & 7);
>  }
>
>  /* Emit an insn that's a simple single-set.  Both the operands must be
> @@ -11262,7 +11259,7 @@ aarch64_strip_extend (rtx x, bool strip_shift)
>    /* Zero and sign extraction of a widened value.  */
>    if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT)
>        && XEXP (op, 2) == const0_rtx
> -      && GET_CODE (XEXP (op, 0)) == MULT
> +      && GET_CODE (XEXP (op, 0)) == ASHIFT
>        && aarch64_is_extend_from_extract (mode, XEXP (XEXP (op, 0), 1),
>                                           XEXP (op, 1)))
>      return XEXP (XEXP (op, 0), 0);
> @@ -11617,7 +11614,7 @@ aarch64_rtx_arith_op_extract_p (rtx x, scalar_int_mode mode)
>        rtx op1 = XEXP (x, 1);
>        rtx op2 = XEXP (x, 2);
>
> -      if (GET_CODE (op0) == MULT
> +      if (GET_CODE (op0) == ASHIFT
>            && CONST_INT_P (op1)
>            && op2 == const0_rtx
>            && CONST_INT_P (XEXP (op0, 1))
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index dbc6b1db176..4bb7b318b99 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2471,6 +2471,19 @@ (define_insn "*add_<optab><ALLX:mode>_<GPI:mode>"
>    [(set_attr "type" "alu_ext")]
>  )
>
> +(define_insn "*add_<optab><mode>_shftex"
> +  [(set (match_operand:GPI 0 "register_operand" "=rk")
> +        (plus:GPI (ANY_EXTRACT:GPI
> +                    (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
> +                                (match_operand 2 "aarch64_shift_imm_<mode>" "n"))
> +                    (match_operand 3 "const_int_operand" "n")
> +                    (const_int 0))
> +                  (match_operand:GPI 4 "register_operand" "r")))]
> +  "aarch64_is_extend_from_extract (<MODE>mode, operands[2], operands[3])"
> +  "add\\t%<w>0, %<w>4, %<w>1, <su>xt%e3 %2"
> +  [(set_attr "type" "alu_ext")]
> +)
> +
>  ;; zero_extend version of above
>  (define_insn "*add_<optab><SHORT:mode>_si_uxtw"
>    [(set (match_operand:DI 0 "register_operand" "=rk")
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr96998.c b/gcc/testsuite/gcc.c-torture/compile/pr96998.c
> new file mode 100644
> index 00000000000..a75d5dcfe08
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr96998.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target arm*-*-* aarch64*-*-* } } */
> +
> +int h(void);
> +struct c d;
> +struct c {
> +  int e[1];
> +};
> +
> +void f(void) {
> +  int g;
> +  for (;; g = h()) {
> +    int *i = &d.e[g];
> +    asm("" : "=Q"(*i));
> +  }
> +}