This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH][ARM] PR target/70473: Reduce size of Cortex-A8 automaton


Hi all,

The scheduling automata are getting a bit out of control in size (which is what the PR complains about), and the Cortex-A8
one is among the largest offenders. Some easy, low-hanging fruit here are the FP/NEON operations that have very long
reservation durations specified for them. They bloat the state space considerably, and it is unlikely that a program
contains enough parallelism to fill, for example, the 64 cycles that are modelled for double-precision division.
In the past we have dealt with this by reducing the modelled reservation duration to keep the state space down.
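To illustrate why long reservations are expensive, here is a small Python sketch of a toy pipeline-hazard automaton.
It is a simplification under stated assumptions, not how genautomata actually builds the NDFA: a state records which
functional units are busy in each upcoming cycle, an instruction may issue only if its reservation does not clash with
that table, and advancing one cycle shifts the table. The unit names mirror cortex-a8-neon.md; the helpers
`make_reservation`, `try_issue`, `advance` and `count_states`, and the load-like reservation on a hypothetical `ls`
unit, are made up for this example.

```python
from collections import deque


def make_reservation(issue_unit, tail_unit, tail_cycles):
    """Toy version of an .md string like "cortex_a8_vfp,cortex_a8_vfplite*N".

    Entry i is the set of units occupied i cycles after issue.
    """
    return (frozenset({issue_unit}),) + (frozenset({tail_unit}),) * tail_cycles


def _trim(cycles):
    # Drop trailing idle cycles so equivalent states compare equal.
    cycles = list(cycles)
    while cycles and not cycles[-1]:
        cycles.pop()
    return tuple(cycles)


def try_issue(state, res):
    """Return the new state if `res` fits without a unit conflict, else None."""
    width = max(len(state), len(res))
    padded = list(state) + [frozenset()] * (width - len(state))
    for i, units in enumerate(res):
        if padded[i] & units:
            return None  # a unit is already reserved in that cycle
        padded[i] = padded[i] | units
    return _trim(padded)


def advance(state):
    """Move to the next cycle: the current cycle's occupancy expires."""
    return _trim(state[1:])


def count_states(reservations):
    """Breadth-first count of the states reachable from the idle state."""
    start = ()
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        successors = [advance(state)]
        for res in reservations:
            issued = try_issue(state, res)
            if issued is not None:
                successors.append(issued)
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


if __name__ == "__main__":
    # Hypothetical load occupying an independent "ls" unit in cycles 0 and 2.
    load = (frozenset({"ls"}), frozenset(), frozenset({"ls"}))
    for cycles in (64, 15):
        divd = make_reservation("cortex_a8_vfp", "cortex_a8_vfplite", cycles)
        print(cycles, count_states([divd]), count_states([divd, load]))
```

In this toy a single `cortex_a8_vfp,cortex_a8_vfplite*N` reservation on its own yields N + 3 states, so every modelled
cycle adds a state, and every other unit that can be busy at the same time multiplies the count further. The real
Cortex-A8 description has many more units and instruction classes, so the effect there is likely much larger.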

This patch does that for the cortex_a8_neon automaton and caps each of the affected reservations at 15 cycles of
cortex_a8_vfplite occupancy. The instruction latencies (17 to 65 cycles) are unchanged; only the unit reservation is
shortened, and 15 cycles should still be plenty to show that these are high-latency instructions.
With this patch the number of NDFA states drops by more than 70% (26796 -> 6020).
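As a quick sanity check of the quoted figure, the relative reduction can be computed directly from the two state counts
given above (`percent_reduction` is just an illustrative helper):

```python
def percent_reduction(before: int, after: int) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# NDFA state counts quoted in the mail, before and after the patch.
reduction = percent_reduction(26796, 6020)
print(f"{reduction:.1f}%")  # prints "77.5%"
```

That is a drop of about 77.5%, consistent with "more than 70%".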

As I don't have access to suitable Cortex-A8 hardware, I benchmarked SPEC2000 on a Cortex-A15 instead.
The idea (from Ramana) is that, since Cortex-A8 tuning is the default tuning for armv7-a, the patch should not hurt
the more widely available Cortex-A15 targets. There were no performance regressions there.

Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk?

Thanks,
Kyrill

2016-08-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    PR target/70473
    * config/arm/cortex-a8-neon.md (cortex_a8_vfp_muld): Reduce
    reservation duration to 15 cycles.
    (cortex_a8_vfp_macs): Likewise.
    (cortex_a8_vfp_macd): Likewise.
    (cortex_a8_vfp_divs): Likewise.
    (cortex_a8_vfp_divd): Likewise.
diff --git a/gcc/config/arm/cortex-a8-neon.md b/gcc/config/arm/cortex-a8-neon.md
index 45f861f6c6f840bd113e468eeec5373e06058f6d..b16c29974a7278e70d64dc83b5b388aebb51718b 100644
--- a/gcc/config/arm/cortex-a8-neon.md
+++ b/gcc/config/arm/cortex-a8-neon.md
@@ -357,30 +357,34 @@ (define_insn_reservation "cortex_a8_vfp_muls" 12
        (eq_attr "type" "fmuls"))
   "cortex_a8_vfp,cortex_a8_vfplite*11")
 
+;; Don't model a reservation for more than 15 cycles as this explodes the
+;; state space of the automaton for little gain.  It is unlikely that the
+;; scheduler will find enough instructions to hide the full latency of the
+;; instructions.
 (define_insn_reservation "cortex_a8_vfp_muld" 17
   (and (eq_attr "tune" "cortexa8")
        (eq_attr "type" "fmuld"))
-  "cortex_a8_vfp,cortex_a8_vfplite*16")
+  "cortex_a8_vfp,cortex_a8_vfplite*15")
 
 (define_insn_reservation "cortex_a8_vfp_macs" 21
   (and (eq_attr "tune" "cortexa8")
        (eq_attr "type" "fmacs,ffmas"))
-  "cortex_a8_vfp,cortex_a8_vfplite*20")
+  "cortex_a8_vfp,cortex_a8_vfplite*15")
 
 (define_insn_reservation "cortex_a8_vfp_macd" 26
   (and (eq_attr "tune" "cortexa8")
        (eq_attr "type" "fmacd,ffmad"))
-  "cortex_a8_vfp,cortex_a8_vfplite*25")
+  "cortex_a8_vfp,cortex_a8_vfplite*15")
 
 (define_insn_reservation "cortex_a8_vfp_divs" 37
   (and (eq_attr "tune" "cortexa8")
        (eq_attr "type" "fdivs, fsqrts"))
-  "cortex_a8_vfp,cortex_a8_vfplite*36")
+  "cortex_a8_vfp,cortex_a8_vfplite*15")
 
 (define_insn_reservation "cortex_a8_vfp_divd" 65
   (and (eq_attr "tune" "cortexa8")
        (eq_attr "type" "fdivd, fsqrtd"))
-  "cortex_a8_vfp,cortex_a8_vfplite*64")
+  "cortex_a8_vfp,cortex_a8_vfplite*15")
 
 ;; Comparisons can actually take 7 cycles sometimes instead of four,
 ;; but given all the other instructions lumped into type=ffarith that
