This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH, ARM] Zero latency between compare and branch on Cortex-A8


Hi,

We've found that the ARM Cortex-A8 backend assumes a 2-cycle latency between compare and branch instructions, whereas according to the Technical Reference Manual these instructions can be issued in the same cycle:
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344k/Babcagee.html
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344k/Babefjfb.html


The attached patch adds a bypass and a predicate that give zero latency between flag-setting instructions and branches. This mainly affects the latencies and priorities used by the instruction scheduler. We have tested this fix on the libevas rasterization library. With the Haifa scheduler the impact is minimal (about +0.5%), but with the selective scheduler there is a 3% speedup.

Ok for trunk?


-- Best regards, Dmitry

gcc/

2010-12-27  Dmitry Melnik  <dm@ispras.ru>

	* config/arm/arm-protos.h (arm_producer_sets_flags): Declare.
	* config/arm/arm.c (arm_producer_sets_flags): New function.
	* config/arm/cortex-a8.md: Add compare-branch bypass.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 59e1c50..c659461 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -90,6 +90,7 @@ extern int arm_no_early_alu_shift_dep (rtx, rtx);
 extern int arm_no_early_alu_shift_value_dep (rtx, rtx);
 extern int arm_no_early_mul_dep (rtx, rtx);
 extern int arm_mac_accumulator_is_mul_result (rtx, rtx);
+extern int arm_producer_sets_flags (rtx, rtx);
 
 extern int tls_mentioned_p (rtx);
 extern int symbol_mentioned_p (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f0c9c29..2500cd8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -21600,6 +21600,14 @@ arm_no_early_mul_dep (rtx producer, rtx consumer)
   return 0;
 }
 
+/* Return nonzero if PRODUCER sets the flags register.  */
+
+int
+arm_producer_sets_flags (rtx producer, rtx consumer ATTRIBUTE_UNUSED)
+{
+  return get_attr_conds (producer) == CONDS_SET;
+}
+
 /* We can't rely on the caller doing the proper promotion when
    using APCS or ATPCS.  */
 
diff --git a/gcc/config/arm/cortex-a8.md b/gcc/config/arm/cortex-a8.md
index 8ac754e..d657456 100644
--- a/gcc/config/arm/cortex-a8.md
+++ b/gcc/config/arm/cortex-a8.md
@@ -261,6 +261,13 @@
        (eq_attr "type" "branch"))
   "cortex_a8_branch")
 
+;; ALU instructions set the flags in E2, while branches require them in E3,
+;; so a flag-setting insn and a dependent branch can issue in the same cycle.
+(define_bypass 0
+    "cortex_a8_alu"
+    "cortex_a8_branch"
+    "arm_producer_sets_flags")
+
 ;; Call latencies are not predictable.  A semi-arbitrary very large
 ;; number is used as "positive infinity" so that everything should be
 ;; finished by the time of return.
