This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH, ARM] Fix two IT issues
- From: "Zhenqiang Chen" <zhenqiang dot chen at arm dot com>
- To: <gcc-patches at gcc dot gnu dot org>
- Cc: "Richard Earnshaw" <Richard dot Earnshaw at arm dot com>, "Ramana Radhakrishnan" <Ramana dot Radhakrishnan at arm dot com>
- Date: Tue, 14 Jan 2014 17:06:22 +0800
- Subject: [PATCH, ARM] Fix two IT issues
- Authentication-results: sourceware.org; auth=none
Hi,
The patch fixes two IT issues:
1) IT block is splitted for A15 (and Cortex-M).
For A15 tune, max cond insns (max_insns_skipped) is 2, which is set as the
maximum allowed insns in an IT block (see thumb2_final_prescan_insn). So IT
which has 3 or 4 instructions, will be splitted. Take the first if-then-else
in the test case of the patch as example, it will generate ITs like:
cmp r0, #10
ite gt
subgt r0, r0, r1
suble r0, r1, r0
ite gt
addgt r0, r0, #10
suble r0, r0, #7
It does not make sense to split the IT. For cortex-m, the IT can not be
folded if the previous insn is 4 BYTES.
2) For arm_v7m_tune, max cond insns is not aligned with its branch cost. In
ifcvt.c, the relation between MAX_CONDITIONAL_EXECUTE (max cond insns) and
branch cost is:
#ifndef MAX_CONDITIONAL_EXECUTE
#define MAX_CONDITIONAL_EXECUTE \
(BRANCH_COST (optimize_function_for_speed_p (cfun), false) \
+ 1)
#endif
So the best value of max cond insns for v7m_tune should be 2.
Bootstrap and no make check regression on ARM Chrome book. No make check
regression for Cortex-M3.
Cortex-M4 O2 performance changes on coremark, dhrystone and eembc-v1:
coremark: -0.11%
dhrystone: 1.26%.
a2time01_lite: 2.63%
canrdr01_lite: 4.27%
iirflt01_lite: 6.51%
rspeed01_lite: 6.51%
dither01_lite: 7.36%
The biggest regression in eembc-v1 is pntrch01_lite: -0.51%
All other regressions < 0.1%
Cortex-M4 O3 performance changes are similar with O2, except one regression
due to loop alignment change.
OK for trunk?
Thanks!
-Zhenqiang
2014-01-14 Zhenqiang Chen <zhenqiang.chen@arm.com>
* config/arm/arm.c (arm_v7m_tune): Set max_insns_skipped to 2.
(thumb2_final_prescan_insn): Set max to MAX_INSN_PER_IT_BLOCK.
testsuite/ChangeLog:
2014-01-14 Zhenqiang Chen <zhenqiang.chen@arm.com>
* gcc.target/arm/its.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 39d23cc..8751675 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1696,7 +1696,7 @@ const struct tune_params arm_v7m_tune =
&v7m_extra_costs,
NULL, /* Sched adj cost.
*/
1, /* Constant limit. */
- 5, /* Max cond insns. */
+ 2, /* Max cond insns. */
ARM_PREFETCH_NOT_BENEFICIAL,
true, /* Prefer constant
pool. */
arm_cortex_m_branch_cost,
@@ -22131,11 +22131,11 @@ thumb2_final_prescan_insn (rtx insn)
int mask;
int max;
- /* Maximum number of conditionally executed instructions in a block
- is minimum of the two max values: maximum allowed in an IT block
- and maximum that is beneficial according to the cost model and tune.
*/
- max = (max_insns_skipped < MAX_INSN_PER_IT_BLOCK) ?
- max_insns_skipped : MAX_INSN_PER_IT_BLOCK;
+ /* max_insns_skipped in the tune was already taken into account in the
+ cost model of ifcvt pass when generating COND_EXEC insns. At this
stage
+ just emit the IT blocks as we can. It does not make sense to split
+ the IT blocks. */
+ max = MAX_INSN_PER_IT_BLOCK;
/* Remove the previous insn from the count of insns to be output. */
if (arm_condexec_count)
diff --git a/gcc/testsuite/gcc.target/arm/its.c
b/gcc/testsuite/gcc.target/arm/its.c
new file mode 100644
index 0000000..8eecdf4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/its.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+int test (int a, int b)
+{
+ int r;
+ if (a > 10)
+ {
+ r = a - b;
+ r += 10;
+ }
+ else
+ {
+ r = b - a;
+ r -= 7;
+ }
+ if (r > 0)
+ r -= 3;
+ return r;
+}
+/* { dg-final { scan-assembler-times "\tit" 2 { target arm_thumb2 } } } */