[PATCH] lra: Canonicalize mult to shift in address reloads
Alex Coplan
alex.coplan@arm.com
Tue Aug 25 10:18:55 GMT 2020
Hello,
Inside a (mem) RTX, it is canonical to write multiplications by powers
of two using a (mult) [0]. For example, given the following C function:
long f(long *p, long x)
{
  return p[x];
}
AArch64 GCC generates the following RTL insn (in final):
(set (reg/i:DI 0 x0)
     (mem:DI (plus:DI (mult:DI (reg:DI 1 x1 [101])
                               (const_int 8 [0x8]))
                      (reg:DI 0 x0 [100])) [1 *_3+0 S8 A64]))
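(For reference, RTL dumps like the ones in this mail can be reproduced
with something along these lines; the compiler name and input file are
just placeholders:

  aarch64-none-linux-gnu-gcc -O2 -S -fdump-rtl-final test.c

which writes the "final" dump to test.c.*r.final.)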
However, outside of a (mem), the canonical way to write multiplications
by powers of two is using (ashift). E.g. for the following C function:
long *g(long *p, long x)
{
  return p + x;
}
AArch64 GCC generates:
(set (reg/i:DI 0 x0)
     (plus:DI (ashift:DI (reg:DI 1 x1 [100])
                         (const_int 3 [0x3]))
              (reg:DI 0 x0 [99])))
However, I observed that LRA does not quite respect this
canonicalization rule. When compiling gcc/testsuite/gcc.dg/torture/pr34330.c
with -Os -ftree-vectorize, the "281r.ira" dump contains the insn:
(set (reg:SI 111 [ MEM[base: b_25(D), index: ivtmp.9_35, step: 4, offset: 0B] ])
     (mem:SI (plus:DI (mult:DI (reg:DI 101 [ ivtmp.9 ])
                               (const_int 4 [0x4]))
                      (reg/v/f:DI 105 [ b ])) [2 MEM[base: b_25(D), index: ivtmp.9_35, step: 4, offset: 0B]+0 S4 A32]))
but LRA then proceeds to generate a reload, and we get the following
non-canonical insn in "282r.reload":
(set (reg:DI 7 x7 [121])
     (plus:DI (mult:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
                       (const_int 4 [0x4]))
              (reg/v/f:DI 1 x1 [orig:105 b ] [105])))
This patch fixes LRA to ensure that we generate canonical RTL in this
case. After the patch, we get the following insn in "282r.reload":
(set (reg:DI 7 x7 [121])
     (plus:DI (ashift:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
                         (const_int 2 [0x2]))
              (reg/v/f:DI 1 x1 [orig:105 b ] [105])))
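The underlying equivalence is the usual one: (mult x (const_int 2^n))
denotes the same operation as (ashift x (const_int n)), which is why the
(const_int 4) scale factor above becomes a (const_int 2) shift amount.
As a standalone sketch of the power-of-two check this relies on
(illustrative only: the helper name is mine, and it merely mirrors the
behaviour we use from GCC's exact_log2):

  /* Return n if X == 2^n for some n >= 0, or -1 if X is zero or not a
     power of two.  Mirrors the behaviour relied on from exact_log2.  */
  static int
  sketch_exact_log2 (unsigned long long x)
  {
    if (x == 0 || (x & (x - 1)) != 0)
      return -1;
    int n = 0;
    while ((x >>= 1) != 0)
      n++;
    return n;
  }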
The motivation here is to be able to remove several redundant patterns
in the AArch64 backend. See the previous thread [1] for context.
Testing:
* Bootstrapped and regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu.
* New test which checks that we get the shift pattern (rather
than the problematic mult pattern) on AArch64; see the note on
running it below.
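If it helps review, the new test can be run on its own from the build
tree with something like (path details illustrative):

  make check-gcc RUNTESTFLAGS="aarch64.exp=mem-shift-canonical.c"

Note that the -dp in the test's dg-options annotates the generated
assembly with insn pattern names, which is what allows the
scan-assembler-times directive to count uses of the add_lsl_di pattern.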
OK for master?
Thanks,
Alex
[0] : https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html
[1] : https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552066.html
---
gcc/ChangeLog:

        * lra-constraints.c (canonicalize_reload_addr): New.
        (curr_insn_transform): Use canonicalize_reload_addr to ensure we
        generate canonical RTL for an address reload.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/mem-shift-canonical.c: New test.
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 421c453997b..630a4f167cd 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -570,6 +570,35 @@ init_curr_insn_input_reloads (void)
   curr_insn_input_reloads_num = 0;
 }
 
+/* The canonical form of an rtx inside a MEM is not necessarily the same as the
+   canonical form of the rtx outside the MEM.  Fix this up in the case that
+   we're reloading an address (and therefore pulling it outside a MEM).  */
+static rtx
+canonicalize_reload_addr (rtx addr)
+{
+  const enum rtx_code code = GET_CODE (addr);
+
+  if (code == PLUS)
+    {
+      rtx inner = XEXP (addr, 0);
+      if (GET_CODE (inner) == MULT && CONST_INT_P (XEXP (inner, 1)))
+        {
+          const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
+          const int pwr2 = exact_log2 (ci);
+          if (pwr2 > 0)
+            {
+              /* Rewrite this to use a shift instead, which is canonical
+                 when outside of a MEM.  */
+              XEXP (addr, 0) = gen_rtx_ASHIFT (GET_MODE (inner),
+                                               XEXP (inner, 0),
+                                               GEN_INT (pwr2));
+            }
+        }
+    }
+
+  return addr;
+}
+
 /* Create a new pseudo using MODE, RCLASS, ORIGINAL or reuse already
    created input reload pseudo (only if TYPE is not OP_OUT).  Don't
    reuse pseudo if IN_SUBREG_P is true and the reused pseudo should be
@@ -4362,12 +4391,19 @@ curr_insn_transform (bool check_only_p)
             {
               rtx addr = *loc;
               enum rtx_code code = GET_CODE (addr);
-
+              bool align_p = false;
+
               if (code == AND && CONST_INT_P (XEXP (addr, 1)))
-                /* (and ... (const_int -X)) is used to align to X bytes.  */
-                addr = XEXP (*loc, 0);
+                {
+                  /* (and ... (const_int -X)) is used to align to X bytes.  */
+                  align_p = true;
+                  addr = XEXP (*loc, 0);
+                }
+              else
+                addr = canonicalize_reload_addr (addr);
+
               lra_emit_move (new_reg, addr);
-              if (addr != *loc)
+              if (align_p)
                 emit_move_insn (new_reg, gen_rtx_AND (GET_MODE (new_reg), new_reg, XEXP (*loc, 1)));
             }
           before = get_insns ();
diff --git a/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
new file mode 100644
index 00000000000..36beed497a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
@@ -0,0 +1,27 @@
+/* This test is a copy of gcc.dg/torture/pr34330.c: here we are looking for
+   specific patterns being matched in the AArch64 backend.  */
+
+/* { dg-do compile } */
+/* { dg-options "-Os -ftree-vectorize -dp" } */
+
+
+struct T
+{
+  int t;
+  struct { short s1, s2, s3, s4; } *s;
+};
+
+void
+foo (int *a, int *b, int *c, int *d, struct T *e)
+{
+  int i;
+  for (i = 0; i < e->t; i++)
+    {
+      e->s[i].s1 = a[i];
+      e->s[i].s2 = b[i];
+      e->s[i].s3 = c[i];
+      e->s[i].s4 = d[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times "add_lsl_di" 3 } } */