This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH][RTL-ifcvt] Improve conditional select ops on immediates
- From: Kyrill Tkachov <kyrylo dot tkachov at arm dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 29 Jul 2015 14:49:31 +0100
- Subject: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
- Authentication-results: sourceware.org; auth=none
Hi all,
This patch improves RTL if-conversion on sequences that perform a conditional select on integer constants.
Most of the smart logic to do that already exists in the noce_try_store_flag_constants function.
However, currently that function is tried after noce_try_cmove.
noce_try_cmove is a simple catch-all function that just loads the two immediates and performs a conditional
select between them. It returns true and then the caller noce_process_if_block doesn't try any other transformations,
completely skipping the more aggressive transformations that noce_try_store_flag_constants allows!
Calling noce_try_store_flag_constants before noce_try_cmove allows for the smarter if-conversion transformations
to be used. An example that occurs a lot in the gcc code itself is for the C code:
int
foo (int a, int b)
{
return ((a & (1 << 25)) ? 5 : 4);
}
i.e. test a bit in a and return 5 or 4. Currently on aarch64 this generates the naive:
and w2, w0, 33554432 // mask away all bits except bit 25
mov w1, 4
cmp w2, wzr
mov w0, 5
csel w0, w0, w1, ne
whereas with this patch this can be transformed into the much better:
ubfx x0, x0, 25, 1 // extract bit 25
add w0, w0, 4
Another issue I encountered is that the code that claims to perform the transformation:
/* if (test) x = 3; else x = 4;
=> x = 3 + (test == 0); */
doesn't seem to do exactly that in all cases. In fact for that case it will try something like:
x = 4 - (test == 0)
which is suboptimal for targets like aarch64 which have a conditional increment operation.
This patch tweaks that code to always try to generate an addition of the condition rather than
a subtraction.
Anyway, for code:
int
fooinc (int x)
{
return x ? 1025 : 1026;
}
we currently generate:
mov w2, 1025
mov w1, 1026
cmp w0, wzr
csel w0, w2, w1, ne
whereas with this patch we will generate:
cmp w0, wzr
cset w0, eq
add w0, w0, 1025
Bootstrapped and tested on arm, aarch64, x86_64.
Ok for trunk?
Thanks,
Kyrill
P.S. noce_try_store_flag_constants is currently gated on !targetm.have_conditional_execution () but I don't see
any reason to restrict it on targets with conditional execution. For example, I think the first example above
would show a benefit on arm if it was enabled there. But that can be a separate investigation.
2015-07-29 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* ifcvt.c (noce_try_store_flag_constants): Reverse when diff is
-STORE_FLAG and condition is reversable. Prefer to add to the
flag value.
(noce_process_if_block): Try noce_try_store_flag_constants before
noce_try_cmove.
2015-07-29 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* gcc.target/aarch64/csel_bfx_1.c: New test.
* gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date: Tue Jul 28 14:59:46 2015 +0100
[RTL-ifcvt] Improve conditional increment ops on immediates
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a57d78c..80d0285 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
reversep = 0;
if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
- normalize = 0;
+ normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse;
else if (ifalse == 0 && exact_log2 (itrue) >= 0
&& (STORE_FLAG_VALUE == 1
|| if_info->branch_cost >= 2))
@@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
=> x = 3 + (test == 0); */
if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
{
- target = expand_simple_binop (mode,
- (diff == STORE_FLAG_VALUE
- ? PLUS : MINUS),
- gen_int_mode (ifalse, mode), target,
+ rtx_code code = reversep ? PLUS :
+ (diff == STORE_FLAG_VALUE ? PLUS
+ : MINUS);
+ HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse;
+
+ target = expand_simple_binop (mode, code,
+ gen_int_mode (to_add, mode), target,
if_info->x, 0, OPTAB_WIDEN);
}
@@ -3120,13 +3123,14 @@ noce_process_if_block (struct noce_if_info *if_info)
goto success;
if (noce_try_abs (if_info))
goto success;
+ if (!targetm.have_conditional_execution ()
+ && noce_try_store_flag_constants (if_info))
+ goto success;
if (HAVE_conditional_move
&& noce_try_cmove (if_info))
goto success;
if (! targetm.have_conditional_execution ())
{
- if (noce_try_store_flag_constants (if_info))
- goto success;
if (noce_try_addcc (if_info))
goto success;
if (noce_try_store_flag_mask (if_info))
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
new file mode 100644
index 0000000..c20597f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+int
+foo (int a, int b)
+{
+ return ((a & (1 << 25)) ? 5 : 4);
+}
+
+/* { dg-final { scan-assembler "ubfx\t\[xw\]\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c
new file mode 100644
index 0000000..2ae434d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -fno-inline" } */
+
+extern void abort (void);
+
+int
+fooinc (int x)
+{
+ if (x)
+ return 1025;
+ else
+ return 1026;
+}
+
+int
+fooinc2 (int x)
+{
+ if (x)
+ return 1026;
+ else
+ return 1025;
+}
+
+int
+main (void)
+{
+ if (fooinc (0) != 1026)
+ abort ();
+
+ if (fooinc (1) != 1025)
+ abort ();
+
+ if (fooinc2 (0) != 1025)
+ abort ();
+
+ if (fooinc2 (1) != 1026)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */