This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Tweak ALAP calculation in SCHED_PRESSURE_MODEL

From: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
To: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
Cc: Richard Sandiford <richard dot sandiford at arm dot com>
Date: Thu, 08 Nov 2018 12:10:33 +0000
Subject: Tweak ALAP calculation in SCHED_PRESSURE_MODEL

<Sending this on behalf of Richard Sandiford>

This patch fixes a flaw in the relationship between the way that
SCHED_PRESSURE_MODEL calculates the alap and depth vs how it uses
them in model_order_p.  A comment in model_order_p says:

  /* Combine the length of the longest path of satisfied true dependencies
     that leads to each instruction (depth) with the length of the longest
     path of any dependencies that leads from the instruction (alap).
     Prefer instructions with the greatest combined length.  If the combined
     lengths are equal, prefer instructions with the greatest depth.

     The idea is that, if we have a set S of "equal" instructions that each
     have ALAP value X, and we pick one such instruction I, any true-dependent
     successors of I that have ALAP value X - 1 should be preferred over S.
     This encourages the schedule to be "narrow" rather than "wide".
     However, if I is a low-priority instruction that we decided to
     schedule because of its model_classify_pressure, and if there
     is a set of higher-priority instructions T, the aforementioned
     successors of I should not have the edge over T.  */

The expectation was that scheduling an instruction X would give a
greater priority to the highest-priority successor instructions Y than
X had: Y.depth would be X.depth + 1 and Y.alap would be X.alap - 1,
giving an equal combined height, but with the greater depth winning as
a tie-breaker. But this doesn't work if the alap value was ultimately
determined by an anti-dependence.

This is particularly bad when --param max-pending-list-length kicks in,
since we then start adding fake anti-dependencies in order to keep the
list length down.  These fake dependencies tend to be on the critical
path.

The attached patch avoids that by making the alap calculation only
look at true dependencies.  This shouldn't be too bad, since we use
INSN_PRIORITY as the final tie-breaker than that does take
anti-dependencies into account.

This reduces the number of spills in the hot function from 436.cactusADM
by 14% on aarch64 at -O3 (and the number of instructions in general).
SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall).
Thanks to Wilco for the benchmarking.

Bootstrapped and tested on aarch64-none-linux-gnu.

Is this ok for trunk?

Thanks,
Kyrill

2018-11-08  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
    * haifa-sched.c (model_analyze_insns): Only add 1 to the consumer's
    ALAP if the dependence is a true dependence.

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 1fdc9df9fb26f23758ec8326cec91eecc4c917c1..01825de440c2e818eceab5ab7411b20b05ee54f1 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -3504,8 +3504,10 @@ model_analyze_insns (void)
 	FOR_EACH_DEP (iter, SD_LIST_FORW, sd_it, dep)
 	  {
 	    con = MODEL_INSN_INFO (DEP_CON (dep));
-	    if (con->insn && insn->alap < con->alap + 1)
-	      insn->alap = con->alap + 1;
+	    unsigned int min_alap
+	      = con->alap + (DEP_TYPE (dep) == REG_DEP_TRUE);
+	    if (con->insn && insn->alap < min_alap)
+	      insn->alap = min_alap;
 	  }
 
 	insn->old_queue = QUEUE_INDEX (iter);

Follow-Ups:
- Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
  - From: Jeff Law
- Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
  - From: Pat Haugen

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]