[PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]
Richard Sandiford
richard.sandiford@arm.com
Thu Sep 10 18:18:01 GMT 2020
Alex Coplan <alex.coplan@arm.com> writes:
> Hello,
>
> Since r11-2903-g6b3034eaba83935d9f6dfb20d2efbdb34b5b00bf introduced a
> canonicalization from mult to shift on address reloads, a missing
> pattern in the AArch64 backend was exposed.
>
> In particular, we were missing the ashift variant of
> *add_<optab><mode>_multp2 (this mult variant is redundant and was
> removed in r11-3033-g2f8ae301f6a125f50b0a758047fcddae7b68daa8).
>
> This patch adds that missing pattern (fixing PR96998), updates
> aarch64_is_extend_from_extract() to work for the shift pattern (instead
> of the redundant mult variant) and updates callers in the cost
> calculations to apply the costing for the shift variant instead.
Hmm. I think we should instead fix this in combine. For the related
testcase:
void g(void) {
  int g;
  for (;; g = h()) {
    asm volatile ("" :: "r"(&d.e[g]));
  }
}
combine tries:
Trying 9 -> 10:
    9: r99:DI=sign_extend(r93:SI)<<0x2
      REG_DEAD r93:SI
   10: r100:DI=r96:DI+r99:DI
      REG_DEAD r99:DI
Successfully matched this instruction:
(set (reg:DI 100)
    (plus:DI (ashift:DI (sign_extend:DI (reg/v:SI 93 [ g ]))
            (const_int 2 [0x2]))
        (reg/f:DI 96)))
allowing combination of insns 9 and 10
original costs 4 + 4 = 8
replacement cost 8
deferring deletion of insn with uid = 9.
modifying insn i3    10: r100:DI=sign_extend(r93:SI)<<0x2+r96:DI
      REG_DEAD r93:SI
deferring rescan insn with uid = 10.
where (shift (extend …) …) is IMO the correct representation of this
operation. The new pattern comes from this code in make_extraction:
  else if (GET_CODE (inner) == ASHIFT
           && CONST_INT_P (XEXP (inner, 1))
           && pos_rtx == 0 && pos == 0
           && len > UINTVAL (XEXP (inner, 1)))
    {
      /* We're extracting the least significant bits of an rtx
         (ashift X (const_int C)), where LEN > C.  Extract the
         least significant (LEN - C) bits of X, giving an rtx
         whose mode is MODE, then shift it left C times.  */
      new_rtx = make_extraction (mode, XEXP (inner, 0),
                                 0, 0, len - INTVAL (XEXP (inner, 1)),
                                 unsignedp, in_dest, in_compare);
      if (new_rtx != 0)
        return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
    }
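
As a sanity check on that transformation, here is a hand-written
illustration (not GCC code; the sign_extract helper simply models
(sign_extract x len 0)): taking the low LEN bits of (ashift X C) and
sign-extending is the same as sign-extending the low (LEN - C) bits
of X and then shifting left by C.

#include <assert.h>
#include <stdint.h>

/* Model (sign_extract x len 0): sign-extend the low LEN bits of X.  */
static int64_t
sign_extract (int64_t x, int len)
{
  return (int64_t) ((uint64_t) x << (64 - len)) >> (64 - len);
}

/* Left shift done in unsigned arithmetic so that negative values
   stay well defined in C.  */
static int64_t
shl (int64_t x, int c)
{
  return (int64_t) ((uint64_t) x << c);
}

int
main (void)
{
  for (int64_t x = -1000; x <= 1000; x++)
    for (int c = 1; c <= 3; c++)
      {
        int len = 34;
        /* Low LEN bits of (ashift x c), sign-extended ...  */
        int64_t extract_of_shift = sign_extract (shl (x, c), len);
        /* ... equal (ashift (sign_extract x (len - c)) c).  */
        int64_t shift_of_extract = shl (sign_extract (x, len - c), c);
        assert (extract_of_shift == shift_of_extract);
      }
  return 0;
}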
But because of the unfortunate special case that shifts need
to be MULTs inside MEMs, make_compound_operation_int has:
    case ASHIFT:
      /* Convert shifts by constants into multiplications if inside
         an address.  */
      if (in_code == MEM && CONST_INT_P (XEXP (x, 1))
          && INTVAL (XEXP (x, 1)) < HOST_BITS_PER_WIDE_INT
          && INTVAL (XEXP (x, 1)) >= 0)
        {
          HOST_WIDE_INT count = INTVAL (XEXP (x, 1));
          HOST_WIDE_INT multval = HOST_WIDE_INT_1 << count;

          new_rtx = make_compound_operation (XEXP (x, 0), next_code);
          if (GET_CODE (new_rtx) == NEG)
            {
              new_rtx = XEXP (new_rtx, 0);
              multval = -multval;
            }
          multval = trunc_int_for_mode (multval, mode);
          new_rtx = gen_rtx_MULT (mode, new_rtx, gen_int_mode (multval, mode));
        }
      break;
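
Concretely (again a hand-written sketch, not GCC code): a shift count
C becomes the multiplier 1 << C, and a NEG operand folds into a
negated multiplier.

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  int64_t x = 123;
  int count = 2;                            /* (const_int 2) */
  int64_t multval = (int64_t) 1 << count;   /* MULT operand: 4 */

  /* Inside a MEM, (ashift x 2) is written as (mult x 4) ...  */
  assert ((x << count) == x * multval);

  /* ... and (ashift (neg x) 2) as (mult x -4).  */
  assert (-x * multval == x * -multval);
  return 0;
}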
Thus for:
    case ASHIFTRT:
      lhs = XEXP (x, 0);
      rhs = XEXP (x, 1);

      /* If we have (ashiftrt (ashift foo C1) C2) with C2 >= C1,
         this is a SIGN_EXTRACT.  */
      if (CONST_INT_P (rhs)
          && GET_CODE (lhs) == ASHIFT
          && CONST_INT_P (XEXP (lhs, 1))
          && INTVAL (rhs) >= INTVAL (XEXP (lhs, 1))
          && INTVAL (XEXP (lhs, 1)) >= 0
          && INTVAL (rhs) < mode_width)
        {
          new_rtx = make_compound_operation (XEXP (lhs, 0), next_code);
          new_rtx = make_extraction (mode, new_rtx,
                                     INTVAL (rhs) - INTVAL (XEXP (lhs, 1)),
                                     NULL_RTX, mode_width - INTVAL (rhs),
                                     code == LSHIFTRT, 0, in_code == COMPARE);
          break;
        }
the first new_rtx is a MULT rather than an ASHIFT when the operation
occurs inside a MEM. make_extraction then doesn't do the same
canonicalisation as it does for ASHIFT above. So for the original
testcase we get:
Trying 8 -> 9:
    8: r98:DI=sign_extend(r92:SI)
      REG_DEAD r92:SI
    9: [r98:DI*0x4+r96:DI]=asm_operands
      REG_DEAD r98:DI
Failed to match this instruction:
(set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
                    (const_int 4 [0x4]))
                (const_int 34 [0x22])
                (const_int 0 [0]))
            (reg/f:DI 96)) [3 *i_5+0 S4 A32])
    (asm_operands:SI ("") ("=Q") 0 []
         []
         [] /tmp/foo.c:13))
allowing combination of insns 8 and 9
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 8.
modifying insn i3     9: [sign_extract(r92:SI#0*0x4,0x22,0)+r96:DI]=asm_operands
      REG_DEAD r92:SI
deferring rescan insn with uid = 9.
starting the processing of deferred insns
rescanning insn with uid = 9.
ending the processing of deferred insns
So IMO make_extraction should recognise and handle the “mem form”
of the shift too.
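
For what it's worth, the two forms do encode the same operation, and
the (const_int 34 [0x22]) above is just the 32 significant bits of
the SImode value plus the shift count of 2, which is exactly the
encoding that aarch64_is_extend_from_extract decodes. A hand-written
sketch of the equivalence (my own helper and values, not GCC code):

#include <assert.h>
#include <stdint.h>

/* Model (sign_extract x len 0): sign-extend the low LEN bits of X.  */
static int64_t
sign_extract (int64_t x, int len)
{
  return (int64_t) ((uint64_t) x << (64 - len)) >> (64 - len);
}

int
main (void)
{
  for (int32_t g = -100; g <= 100; g++)
    {
      /* (ashift (sign_extend:DI g) 2), the form used outside MEMs
         (written as a multiplication to keep the C well defined).  */
      int64_t shift_form = (int64_t) g * 4;

      /* (sign_extract:DI (mult (subreg:DI g) 4) 34 0), the in-MEM form:
         34 = 32 bits of the SImode value + shift count 2.  */
      int64_t extract_form = sign_extract ((int64_t) (uint32_t) g * 4, 34);

      assert (shift_form == extract_form);
    }
  return 0;
}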
It's all a bit unfortunate really. Having different representations
for shifts inside and outside MEMs is the Second Great RTL Mistake.
(The first was modeless const_ints. :-))
If we do that, we should be able to remove the handling of
extract-based addresses in aarch64_classify_index & co.
Thanks,
Richard
>
> Testing:
>  * Bootstrap and regtest on aarch64-none-linux-gnu, no regressions.
>  * Checked new unit test passes on arm-none-linux-gnueabihf.
>  * Checked new unit test isn't run on x86 (inline asm uses
>    arm/aarch64-specific constraint).
>  * Tested linux-next tree no longer encounters this ICE on arm64 with
>    patched GCC (unfortunately still doesn't compile successfully due to
>    a fix for PR96475 which introduces an ICE on arm64).
>
> OK for trunk?
>
> Thanks,
> Alex
>
> ---
>
> gcc/ChangeLog:
>
>         PR target/96998
>         * config/aarch64/aarch64.c (aarch64_is_extend_from_extract): Update to
>         work for shift pattern instead of mult.
>         (aarch64_strip_extend): Cost extract+shift pattern instead of
>         now-removed extract+mult pattern.
>         (aarch64_rtx_arith_op_extract_p): Likewise.
>         * config/aarch64/aarch64.md (*add_<optab><mode>_shftex): New.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/96998
>         * gcc.c-torture/compile/pr96998.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b6d74496cd0..55e0fc4e683 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -2815,27 +2815,24 @@ aarch64_is_noplt_call_p (rtx sym)
>     represent an expression that matches an extend operation.  The
>     operands represent the parameters from
>
> -   (extract:MODE (mult (reg) (MULT_IMM)) (EXTRACT_IMM) (const_int 0)).  */
> +   (extract:MODE (ashift (reg) (SHIFT_IMM)) (EXTRACT_IMM) (const_int 0)).  */
>  bool
> -aarch64_is_extend_from_extract (scalar_int_mode mode, rtx mult_imm,
> +aarch64_is_extend_from_extract (scalar_int_mode mode, rtx shift_imm,
>                                  rtx extract_imm)
>  {
> -  HOST_WIDE_INT mult_val, extract_val;
> +  HOST_WIDE_INT shift_val, extract_val;
>
> -  if (! CONST_INT_P (mult_imm) || ! CONST_INT_P (extract_imm))
> +  if (! CONST_INT_P (shift_imm) || ! CONST_INT_P (extract_imm))
>      return false;
>
> -  mult_val = INTVAL (mult_imm);
> +  shift_val = INTVAL (shift_imm);
>    extract_val = INTVAL (extract_imm);
>
> -  if (extract_val > 8
> -      && extract_val < GET_MODE_BITSIZE (mode)
> -      && exact_log2 (extract_val & ~7) > 0
> -      && (extract_val & 7) <= 4
> -      && mult_val == (1 << (extract_val & 7)))
> -    return true;
> -
> -  return false;
> +  return extract_val > 8
> +         && extract_val < GET_MODE_BITSIZE (mode)
> +         && exact_log2 (extract_val & ~7) > 0
> +         && shift_val <= 4
> +         && shift_val == (extract_val & 7);
>  }
>
>  /* Emit an insn that's a simple single-set.  Both the operands must be
> @@ -11262,7 +11259,7 @@ aarch64_strip_extend (rtx x, bool strip_shift)
>    /* Zero and sign extraction of a widened value.  */
>    if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT)
>        && XEXP (op, 2) == const0_rtx
> -      && GET_CODE (XEXP (op, 0)) == MULT
> +      && GET_CODE (XEXP (op, 0)) == ASHIFT
>        && aarch64_is_extend_from_extract (mode, XEXP (XEXP (op, 0), 1),
>                                           XEXP (op, 1)))
>      return XEXP (XEXP (op, 0), 0);
> @@ -11617,7 +11614,7 @@ aarch64_rtx_arith_op_extract_p (rtx x, scalar_int_mode mode)
>        rtx op1 = XEXP (x, 1);
>        rtx op2 = XEXP (x, 2);
>
> -      if (GET_CODE (op0) == MULT
> +      if (GET_CODE (op0) == ASHIFT
>            && CONST_INT_P (op1)
>            && op2 == const0_rtx
>            && CONST_INT_P (XEXP (op0, 1))
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index dbc6b1db176..4bb7b318b99 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2471,6 +2471,19 @@ (define_insn "*add_<optab><ALLX:mode>_<GPI:mode>"
>    [(set_attr "type" "alu_ext")]
>  )
>
> +(define_insn "*add_<optab><mode>_shftex"
> +  [(set (match_operand:GPI 0 "register_operand" "=rk")
> +        (plus:GPI (ANY_EXTRACT:GPI
> +                    (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
> +                                (match_operand 2 "aarch64_shift_imm_<mode>" "n"))
> +                    (match_operand 3 "const_int_operand" "n")
> +                    (const_int 0))
> +                  (match_operand:GPI 4 "register_operand" "r")))]
> +  "aarch64_is_extend_from_extract (<MODE>mode, operands[2], operands[3])"
> +  "add\\t%<w>0, %<w>4, %<w>1, <su>xt%e3 %2"
> +  [(set_attr "type" "alu_ext")]
> +)
> +
>  ;; zero_extend version of above
>  (define_insn "*add_<optab><SHORT:mode>_si_uxtw"
>    [(set (match_operand:DI 0 "register_operand" "=rk")
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr96998.c b/gcc/testsuite/gcc.c-torture/compile/pr96998.c
> new file mode 100644
> index 00000000000..a75d5dcfe08
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr96998.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target arm*-*-* aarch64*-*-* } } */
> +
> +int h(void);
> +struct c d;
> +struct c {
> +  int e[1];
> +};
> +
> +void f(void) {
> +  int g;
> +  for (;; g = h()) {
> +    int *i = &d.e[g];
> +    asm("" : "=Q"(*i));
> +  }
> +}