This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[PATCH][AArch64] Improve Cortex-A53 scheduling of int/fp transfers
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: nd <nd at arm dot com>
- Date: Tue, 10 Jan 2017 17:18:15 +0000
- Subject: [PATCH][AArch64] Improve Cortex-A53 scheduling of int/fp transfers
My previous change to the Cortex-A53 scheduler caused a 13% regression on a
proprietary benchmark. This turned out to be due to suboptimal scheduling of
integer-to-float conversions. This patch separates integer-to-FP register
transfers from integer-to-float conversions (and likewise FP-to-integer
transfers from FP-to-integer conversions), giving each its own reservation,
with latencies and bypasses chosen from experiments to find the best schedule.
As a result of these tweaks, the benchmark's performance improves by 20%.
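To make the distinction concrete: a transfer moves a register's bit pattern unchanged, while a conversion changes the numeric representation. The Python sketch below only illustrates the two kinds of operation; the function names are hypothetical, and the AArch64 instructions and GCC `type` values named in the comments (FMOV as f_mcr/f_mrc, SCVTF as f_cvti2f, FCVTZS as f_cvtf2i) are assumed to be the usual lowering of the equivalent C code, not something this patch specifies.

```python
import struct

# Conversions change the numeric representation; transfers copy raw bits.
# Instruction/type names in the comments are the assumed AArch64 lowering
# of the equivalent C code, given for illustration only.


def int_to_float_convert(i: int) -> float:
    """Like C's (float)i: SCVTF Sd, Wn (type f_cvti2f).
    Packing as "<f" rounds the value to single precision."""
    return struct.unpack("<f", struct.pack("<f", float(i)))[0]


def int_to_float_transfer(bits: int) -> float:
    """Like memcpy from a uint32_t into a float: FMOV Sd, Wn (type f_mcr)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float_to_int_convert(f: float) -> int:
    """Like C's (int)f for in-range values, truncating toward zero:
    FCVTZS Wd, Sn (type f_cvtf2i)."""
    return int(f)


def float_to_int_transfer(f: float) -> int:
    """Like memcpy from a float into a uint32_t: FMOV Wd, Sn (type f_mrc).
    f is first rounded to single precision by the "<f" pack."""
    return struct.unpack("<I", struct.pack("<f", f))[0]


if __name__ == "__main__":
    print(int_to_float_convert(1))            # 1.0
    print(int_to_float_transfer(0x3F800000))  # 1.0 (bit pattern of 1.0f)
    print(hex(float_to_int_transfer(1.0)))    # 0x3f800000
    print(float_to_int_convert(-2.75))        # -2
```

The two directions look similar at the source level but are different instruction classes, which is why the patch gives them separate reservations and latencies.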
ChangeLog:
2017-01-10  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/arm/cortex-a53.md: Add bypasses for
	cortex_a53_r2f_cvt.
	(cortex_a53_r2f): Only use for transfers.
	(cortex_a53_f2r): Likewise.
	(cortex_a53_r2f_cvt): Add reservation for conversions.
	(cortex_a53_f2r_cvt): Likewise.
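The ChangeLog entries above amount to redistributing the instruction `type` values among four reservations instead of two. The following is a hypothetical Python model of that mapping, with the type lists and latencies copied from the diff below; the names `BEFORE`, `AFTER`, `classify`, and `covered_types` are illustrative only, and the sketch ignores the `tune` condition and the functional-unit strings (such as the `nothing*2` stage dropped from cortex_a53_r2f), tracking latency alone.

```python
# Hypothetical model of the patch's type -> reservation mapping.
# Latencies and type lists are copied from the diff; unit reservations
# and the (eq_attr "tune" "cortexa53") condition are not modelled.

BEFORE = {
    "cortex_a53_r2f": (6, {"f_mcr", "f_mcrr", "f_cvti2f",
                           "neon_from_gp", "neon_from_gp_q"}),
    "cortex_a53_f2r": (6, {"f_mrc", "f_mrrc", "f_cvtf2i",
                           "neon_to_gp", "neon_to_gp_q"}),
}

AFTER = {
    # Plain transfers keep the existing reservation names.
    "cortex_a53_r2f": (2, {"f_mcr", "f_mcrr"}),
    "cortex_a53_f2r": (4, {"f_mrc", "f_mrrc"}),
    # Conversions and the neon_*_gp types move to the new reservations.
    "cortex_a53_r2f_cvt": (4, {"f_cvti2f", "neon_from_gp", "neon_from_gp_q"}),
    "cortex_a53_f2r_cvt": (5, {"f_cvtf2i", "neon_to_gp", "neon_to_gp_q"}),
}


def classify(insn_type, table):
    """Return (reservation, latency) for an insn type, or None if no
    reservation in the table matches it."""
    for name, (latency, types) in table.items():
        if insn_type in types:
            return name, latency
    return None


def covered_types(table):
    """All insn types matched by any reservation in the table."""
    return set().union(*(types for _, types in table.values()))


if __name__ == "__main__":
    for t in ("f_mcr", "f_cvti2f", "f_mrc", "f_cvtf2i"):
        print(t, classify(t, BEFORE), "->", classify(t, AFTER))
```

In this model the split does not change which types are covered (`covered_types(BEFORE) == covered_types(AFTER)`); it only changes which reservation, and hence which latency, each type receives.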
--
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index 14822ba0ac0532aaf0dd29cff7a87e32e745cbe8..b367ad403a4a641da34521c17669027b87092737 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -252,9 +252,18 @@
"cortex_a53_r2f")
(define_bypass 1 "cortex_a53_mul,
- cortex_a53_load*"
+ cortex_a53_load1,
+ cortex_a53_load2"
"cortex_a53_r2f")
+(define_bypass 2 "cortex_a53_alu*"
+ "cortex_a53_r2f_cvt")
+
+(define_bypass 3 "cortex_a53_mul,
+ cortex_a53_load1,
+ cortex_a53_load2"
+ "cortex_a53_r2f_cvt")
+
;; Model flag forwarding to branches.
(define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"
@@ -514,16 +523,24 @@
;; Floating-point to/from core transfers.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-(define_insn_reservation "cortex_a53_r2f" 6
+(define_insn_reservation "cortex_a53_r2f" 2
(and (eq_attr "tune" "cortexa53")
- (eq_attr "type" "f_mcr,f_mcrr,f_cvti2f,
- neon_from_gp, neon_from_gp_q"))
- "cortex_a53_slot_any,nothing*2,cortex_a53_fp_alu")
+ (eq_attr "type" "f_mcr,f_mcrr"))
+ "cortex_a53_slot_any,cortex_a53_fp_alu")
+
+(define_insn_reservation "cortex_a53_f2r" 4
+ (and (eq_attr "tune" "cortexa53")
+ (eq_attr "type" "f_mrc,f_mrrc"))
+ "cortex_a53_slot_any,cortex_a53_fp_alu")
+
+(define_insn_reservation "cortex_a53_r2f_cvt" 4
+ (and (eq_attr "tune" "cortexa53")
+ (eq_attr "type" "f_cvti2f, neon_from_gp, neon_from_gp_q"))
+ "cortex_a53_slot_any,cortex_a53_fp_alu")
-(define_insn_reservation "cortex_a53_f2r" 6
+(define_insn_reservation "cortex_a53_f2r_cvt" 5
(and (eq_attr "tune" "cortexa53")
- (eq_attr "type" "f_mrc,f_mrrc,f_cvtf2i,
- neon_to_gp, neon_to_gp_q"))
+ (eq_attr "type" "f_cvtf2i, neon_to_gp, neon_to_gp_q"))
"cortex_a53_slot_any,cortex_a53_fp_alu")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
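A `define_bypass` overrides a producer reservation's default latency when its result feeds a particular consumer. The Python sketch below models only the bypasses fully visible in the first hunk; the helper name `effective_latency` is hypothetical, GCC's glob matching of reservation names is approximated with `fnmatch.fnmatchcase`, and the producer's default latency must be supplied by the caller because the ALU, multiply, and load reservations are defined outside this diff.

```python
from fnmatch import fnmatchcase

# Bypasses fully visible in the first hunk of the diff:
# (latency, producer patterns, consumer patterns). The bypass whose
# opening line falls outside the hunk is deliberately omitted.
BYPASSES = [
    (1, ("cortex_a53_mul", "cortex_a53_load1", "cortex_a53_load2"),
        ("cortex_a53_r2f",)),
    (2, ("cortex_a53_alu*",), ("cortex_a53_r2f_cvt",)),
    (3, ("cortex_a53_mul", "cortex_a53_load1", "cortex_a53_load2"),
        ("cortex_a53_r2f_cvt",)),
]


def effective_latency(producer, consumer, default_latency):
    """Latency of a true dependence from `producer` to `consumer`.

    Returns the first matching bypass latency; otherwise the producer's
    own reservation latency, which the caller passes as default_latency.
    """
    for latency, producers, consumers in BYPASSES:
        if (any(fnmatchcase(producer, p) for p in producers)
                and any(fnmatchcase(consumer, c) for c in consumers)):
            return latency
    return default_latency
```

For example, `effective_latency("cortex_a53_load1", "cortex_a53_r2f_cvt", d)` returns 3 for any `d`, whereas a conversion result consumed by a transfer, `effective_latency("cortex_a53_f2r_cvt", "cortex_a53_r2f", 5)`, matches no bypass and falls back to the reservation latency of 5. Note also that the patch narrows the old `cortex_a53_load*` producer pattern to `cortex_a53_load1` and `cortex_a53_load2`, so any other reservation matching that glob would no longer get the 1-cycle bypass to cortex_a53_r2f.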