This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[Patch/ARM] Cortex-M4 core pipeline patch to tune LDR/STR pairs
- From: "Terry Guo" <terry dot guo at arm dot com>
- To: <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 29 Mar 2013 17:59:52 +0800
- Subject: [Patch/ARM] Cortex-M4 core pipeline patch to tune LDR/STR pairs
Hello,
The attached pipeline patch is intended to change the following generated code
ldr r5, [r4, #12]
adds r2, r2, #16
str r5, [r3, #8]
to
ldr r5, [r4, #12]
str r5, [r3, #8]
adds r2, r2, #16
The reason is that an STR can start in the second cycle of the preceding
LDR, which takes 2 cycles, as long as the result of the LDR isn't used as
the memory address of the STR.
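
The latency rule above can be sketched as a small C model. This is only an
illustration, not GCC code: the struct, its fields and both function names
are hypothetical, and the model assumes the store really does consume the
value produced by the load.

```c
#include <stdbool.h>

/* Toy model of one instruction. All names are hypothetical. */
struct insn {
    int dest_reg;   /* register written by an LDR, or -1 */
    int data_reg;   /* register whose value an STR writes to memory, or -1 */
    int addr_reg;   /* base register of the memory address */
};

/* True when the loaded value is not used as the store's address register. */
static bool load_result_not_in_store_addr(const struct insn *load,
                                          const struct insn *store)
{
    return load->dest_reg != store->addr_reg;
}

/* Cycles to assume between an LDR and an STR that depends on it.
   Precondition: the store reads the register written by the load. */
static int ldr_to_str_latency(const struct insn *load,
                              const struct insn *store)
{
    if (load_result_not_in_store_addr(load, store))
        return 1;   /* STR can start in the second cycle of the LDR */
    return 2;       /* address dependency: wait for the full load latency */
}
```

For the pair in the example, "ldr r5, [r4, #12]" followed by
"str r5, [r3, #8]", r5 is only the stored data, so the model returns 1.
If the store were "str r0, [r5, #8]" instead, r5 would be the address
register and the model would return 2.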
Tested with various benchmarks on a Cortex-M4 MPS. Except for one regression
caused by register allocation, the benchmarks show either a performance
improvement or no change.
Is it OK for trunk?
BR,
Terry
2013-03-29 Terry Guo <terry.guo@arm.com>
* gcc/config/arm/cortex-m4.md: New bypass to tune LDR/STR
pairs.
From 19dd8bdc9a03f78690700ded911e0cee66328c01 Mon Sep 17 00:00:00 2001
From: Terry Guo <terry.guo@arm.com>
Date: Wed, 27 Mar 2013 17:23:09 +0800
Subject: [PATCH] improve m4 pipeline description
---
gcc/config/arm/cortex-m4.md | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/gcc/config/arm/cortex-m4.md b/gcc/config/arm/cortex-m4.md
index 187867b..47b0364 100644
--- a/gcc/config/arm/cortex-m4.md
+++ b/gcc/config/arm/cortex-m4.md
@@ -84,6 +84,10 @@
(eq_attr "type" "store4"))
"cortex_m4_ex*5")
+(define_bypass 1 "cortex_m4_load1"
+ "cortex_m4_store1_1,cortex_m4_store1_2"
+ "arm_no_early_store_addr_dep")
+
;; If the address of load or store depends on the result of the preceding
;; instruction, the latency is increased by one.
--
1.7.9.5