This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH][ARM] PR target/70473: Reduce size of Cortex-A8 automaton
- From: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: Ramana Radhakrishnan <ramana dot radhakrishnan at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>
- Date: Fri, 26 Aug 2016 11:14:03 +0100
- Subject: [PATCH][ARM] PR target/70473: Reduce size of Cortex-A8 automaton
Hi all,
The scheduling automata sizes are getting a bit out of control (as the PR complains) and the Cortex-A8
one is among the largest offenders. One piece of low-hanging fruit is the set of FP/NEON operations
that have very long reservation durations specified for them. They bloat the state space considerably, and it's
unlikely that a program has enough parallelism to fill, for example, the 64 cycles that are modelled
for the double-precision division. In the past we've dealt with this by decreasing the modelled reservation duration
to keep the state space down.
This patch does that for the cortex_a8_neon automaton: it caps the cortex_a8_vfplite reservation of each affected
pattern at 15 cycles. The latency values in the define_insn_reservation patterns are left unchanged. Fifteen cycles
should be plenty to demonstrate that these are high-latency instructions.
With this patch the number of NDFA states drops by more than 75% (26796 -> 6020).
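For illustration only (this is not part of the patch), here is a small Python sketch of the arithmetic above. The
cortex_a8_vfplite repeat counts are copied from the diff further down. The helper names (cap_reservation,
percent_reduction) and the dictionary are hypothetical and exist nowhere in GCC.

```python
# Illustrative sketch only: these names are hypothetical, not GCC identifiers.
CAP = 15  # largest cortex_a8_vfplite repeat count kept by the patch

# Original cortex_a8_vfplite repeat counts, copied from the diff.
old_vfplite_cycles = {
    "cortex_a8_vfp_muld": 16,
    "cortex_a8_vfp_macs": 20,
    "cortex_a8_vfp_macd": 25,
    "cortex_a8_vfp_divs": 36,
    "cortex_a8_vfp_divd": 64,
}


def cap_reservation(cycles, cap=CAP):
    """Return the reservation length after capping it at `cap` cycles."""
    return min(cycles, cap)


def percent_reduction(before, after):
    """Return the relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Every affected pattern ends up with the same capped reservation.
new_vfplite_cycles = {
    name: cap_reservation(cycles) for name, cycles in old_vfplite_cycles.items()
}

# NDFA state counts quoted above: 26796 before the patch, 6020 after.
ndfa_reduction = percent_reduction(26796, 6020)

if __name__ == "__main__":
    for name, cycles in new_vfplite_cycles.items():
        print(f"{name}: {old_vfplite_cycles[name]} -> {cycles}")
    print(f"NDFA states reduced by {ndfa_reduction:.1f}%")
```

All five patterns collapse to the same 15-cycle vfplite reservation. The quoted state counts work out to a
reduction of roughly 77.5%.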
As I don't have access to reasonable Cortex-A8 hardware, I benchmarked the patch with SPEC2000 on a Cortex-A15.
The idea (from Ramana) is that, since Cortex-A8 tuning is the default tuning for armv7-a, the patch shouldn't hurt
the more widely accessible Cortex-A15 targets. There were no performance regressions there.
Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk?
Thanks,
Kyrill
2016-08-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
PR target/70473
* config/arm/cortex-a8-neon.md (cortex_a8_vfp_muld): Reduce
reservation duration to 15 cycles.
(cortex_a8_vfp_macs): Likewise.
(cortex_a8_vfp_macd): Likewise.
(cortex_a8_vfp_divs): Likewise.
(cortex_a8_vfp_divd): Likewise.
diff --git a/gcc/config/arm/cortex-a8-neon.md b/gcc/config/arm/cortex-a8-neon.md
index 45f861f6c6f840bd113e468eeec5373e06058f6d..b16c29974a7278e70d64dc83b5b388aebb51718b 100644
--- a/gcc/config/arm/cortex-a8-neon.md
+++ b/gcc/config/arm/cortex-a8-neon.md
@@ -357,30 +357,34 @@ (define_insn_reservation "cortex_a8_vfp_muls" 12
(eq_attr "type" "fmuls"))
"cortex_a8_vfp,cortex_a8_vfplite*11")
+;; Don't model a reservation for more than 15 cycles as this explodes the
+;; state space of the automaton for little gain. It is unlikely that the
+;; scheduler will find enough instructions to hide the full latency of the
+;; instructions.
(define_insn_reservation "cortex_a8_vfp_muld" 17
(and (eq_attr "tune" "cortexa8")
(eq_attr "type" "fmuld"))
- "cortex_a8_vfp,cortex_a8_vfplite*16")
+ "cortex_a8_vfp,cortex_a8_vfplite*15")
(define_insn_reservation "cortex_a8_vfp_macs" 21
(and (eq_attr "tune" "cortexa8")
(eq_attr "type" "fmacs,ffmas"))
- "cortex_a8_vfp,cortex_a8_vfplite*20")
+ "cortex_a8_vfp,cortex_a8_vfplite*15")
(define_insn_reservation "cortex_a8_vfp_macd" 26
(and (eq_attr "tune" "cortexa8")
(eq_attr "type" "fmacd,ffmad"))
- "cortex_a8_vfp,cortex_a8_vfplite*25")
+ "cortex_a8_vfp,cortex_a8_vfplite*15")
(define_insn_reservation "cortex_a8_vfp_divs" 37
(and (eq_attr "tune" "cortexa8")
(eq_attr "type" "fdivs, fsqrts"))
- "cortex_a8_vfp,cortex_a8_vfplite*36")
+ "cortex_a8_vfp,cortex_a8_vfplite*15")
(define_insn_reservation "cortex_a8_vfp_divd" 65
(and (eq_attr "tune" "cortexa8")
(eq_attr "type" "fdivd, fsqrtd"))
- "cortex_a8_vfp,cortex_a8_vfplite*64")
+ "cortex_a8_vfp,cortex_a8_vfplite*15")
;; Comparisons can actually take 7 cycles sometimes instead of four,
;; but given all the other instructions lumped into type=ffarith that