This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[PATCH/RFC] Simplify wrapped RTL op
- From: Robin Dapp <rdapp at linux dot ibm dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 27 Aug 2019 11:12:32 +0200
- Subject: [PATCH/RFC] Simplify wrapped RTL op
Hi,
as announced in the wrapped-binop gimple patch mail, on s390 we still
emit odd code in front of loops:
void v1 (unsigned long *in, unsigned long *out, unsigned int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      out[i] = in[i];
    }
}

-->

  aghi %r1,-8
  srlg %r1,%r1,3
  aghi %r1,1
This is created by doloop after it gets niter from the loop as n - 1, or
"n * 8 - 8" with a step width of 8. Since s390's doloop pattern
compares against 1, we add 1 to niter, resulting in the code above.
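For reference, a minimal C sketch of what the three instructions above compute, assuming 64-bit registers and that %r1 initially holds n * 8 (with n zero-extended). The helper name `s390_doloop_count` is hypothetical and not part of GCC.

```c
#include <stdint.h>

/* Hypothetical helper (not part of GCC) mirroring the s390 sequence
   emitted in front of the loop:
     aghi %r1,-8     ->  niter in bytes: n * 8 - 8 == (n - 1) * 8
     srlg %r1,%r1,3  ->  niter in iterations: n - 1
     aghi %r1,1      ->  + 1, since the doloop pattern compares against 1  */
static uint64_t
s390_doloop_count (uint64_t n)
{
  uint64_t r1 = n * 8;  /* assumed initial value of %r1 */
  r1 -= 8;              /* aghi %r1,-8; wraps modulo 2^64 when n == 0 */
  r1 >>= 3;             /* srlg %r1,%r1,3; logical shift right */
  r1 += 1;              /* aghi %r1,1 */
  return r1;
}
```

For every n >= 1 the result is just n again, which is why the sequence looks redundant. For n == 0 the subtraction wraps and the result is 2^61 rather than 0; whether that case can reach the doloop code depends on the loop's entry guard.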
Taking a route similar to the gimple patch, something like
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 9359a3cdb4d..9c06c9b6ee9 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2364,6 +2364,24 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
                                                            in1, in2));
         }
 
+      /* Transform (plus (lshiftrt (plus A -C1) C2) C3) to (lshiftrt A C2)
+         if C1 == -C3 * (1 << C2). */
+      if (CONST_SCALAR_INT_P (op1)
+          && GET_CODE (op0) == LSHIFTRT
+          && CONST_SCALAR_INT_P (XEXP (op0, 1))
+          && GET_CODE (XEXP (op0, 0)) == PLUS
+          && CONST_SCALAR_INT_P (XEXP (XEXP (op0, 0), 1)))
+        {
+          rtx c3 = op1;
+          rtx c2 = XEXP (op0, 1);
+          rtx c1 = XEXP (XEXP (op0, 0), 1);
+
+          rtx a = XEXP (XEXP (op0, 0), 0);
+
+          if (-INTVAL (c3) * (1 << INTVAL (c2)) == INTVAL (c1))
+            return simplify_gen_binary (LSHIFTRT, mode, a, c2);
+        }
+
       /* (plus (comparison A B) C) can become (neg (rev-comp A B)) if
          C is 1 and STORE_FLAG_VALUE is -1 or if C is -1 and STORE_FLAG_VALUE
          is 1. */
helps immediately, yet overflow/range information is not considered. Do
we somehow guarantee that the niter-related expressions we create up to
doloop do not overflow? I did not notice anything like that when looking
through the code.
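To make the overflow question concrete, here is a minimal C sketch that models the patch's condition on plain 64-bit values instead of rtx constants (an assumption for illustration; all names are hypothetical and not GCC functions). Unlike the patch, which evaluates `1 << INTVAL (c2)` in `int`, it range-checks the shift count and the product C3 * 2^C2. It then evaluates both sides with 64-bit wrap-around to show that the fold is only valid when A + C1 does not wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check of the constants for
     (plus (lshiftrt (plus A C1) C2) C3) -> (lshiftrt A C2)
   i.e. C1 == -C3 * 2^C2, computed without signed overflow.  */
static bool
wrapped_op_constants_match (int64_t c1, int64_t c2, int64_t c3)
{
  /* Keep the shift count in range so that 1 << C2 fits in int64_t.  */
  if (c2 < 0 || c2 > 62)
    return false;

  /* Reject C3 values for which C3 * 2^C2 would overflow int64_t.  */
  int64_t limit = INT64_MAX >> c2;
  if (c3 > limit || c3 < -limit)
    return false;

  int64_t scaled = c3 * ((int64_t) 1 << c2);  /* exact, no overflow */
  return c1 == -scaled;
}

/* Both sides evaluated with 64-bit wrap-around, as PLUS does in RTL.
   C2 is assumed to be in [0, 63].  */
static uint64_t
eval_original (uint64_t a, int64_t c1, int64_t c2, int64_t c3)
{
  return ((a + (uint64_t) c1) >> c2) + (uint64_t) c3;
}

static uint64_t
eval_folded (uint64_t a, int64_t c2)
{
  return a >> c2;
}
```

With C1 = -8, C2 = 3, C3 = 1 (the doloop case) both sides agree for every A >= 8; A = 80 gives 10 either way. For A = 0, however, A + C1 wraps and the original expression yields 2^61 while the folded one yields 0, so the fold needs exactly the range information (A >= 8) that the patch does not check.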
Granted, the simplification seems oddly specific and is probably not
useful for a wide range of targets and situations.
Another approach would be to store "niter+1" (== n) when niter (== n-1)
is calculated and, when we need to do the increment, to use the niter+1
we already have instead of having to simplify ((n * 8 - 8) >> 3) + 1.
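The alternative can be sketched as follows, assuming niter is derived from a byte distance and a power-of-two step; `struct niter_sketch` and `compute_niter_sketch` are hypothetical names, not GCC's niter data structures. The idea is simply to record n (here `end_bytes >> step_log2`) at the point where n - 1 is computed.

```c
#include <stdint.h>

/* Hypothetical record (not GCC's niter description) that keeps both the
   latch count niter (== n - 1) and the iteration count n.  */
struct niter_sketch
{
  uint64_t niter;           /* n - 1: what doloop currently gets */
  uint64_t niter_plus_one;  /* n: recorded alongside niter */
};

/* END_BYTES is the byte distance covered by the loop (n * 8 in the
   example), STEP_LOG2 the log2 of the step width (3 for 8 bytes);
   STEP_LOG2 is assumed to be below 64.  */
static struct niter_sketch
compute_niter_sketch (uint64_t end_bytes, unsigned int step_log2)
{
  struct niter_sketch d;
  uint64_t step = (uint64_t) 1 << step_log2;

  d.niter = (end_bytes - step) >> step_log2;  /* (n * 8 - 8) >> 3 */
  d.niter_plus_one = end_bytes >> step_log2;  /* (n * 8) >> 3 == n */
  return d;
}
```

For end_bytes = 32 and a step of 8 the record holds niter = 3 and niter_plus_one = 4. The two fields satisfy niter + 1 == niter_plus_one whenever end_bytes >= 2^step_log2; for smaller values (e.g. end_bytes == 0) the subtraction in niter wraps and they diverge, which is the same corner case as the overflow question above.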
Any comments on this?
The patch above bootstraps and the test suite shows no regressions on
s390, fwiw.
Regards
Robin