This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: IA-64 speculation patches have bad impact on ARM


Daniel Jacobowitz wrote:
Hi Maxim and Vlad,

I just tracked an ICE while building glibc for ARM to this patch,
which introduced --param max-sched-extend-regions-iters with a default
of two:

http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00998.html

...


The register variables and their initializations get hoisted all the way out
of the first if.  On ia64, with a million execution units to spare and a
fat pipeline, this may make sense.  On targets with a simpler execution
model, though, it's pretty awful.  If the condition (which we have no
information on the likelihood of) is false, we've added lots of cycles for
no gain.  It's not like the scheduler was filling holes; the initializations
were scheduled as early as possible because they had no dependencies.

With the parameter turned back down to one, the testcase compiles, and the
code looks sensible again.  No, I wasn't able to work out why profiling was
necessary to trigger this problem; I suspect it makes some register
unavailable, but I'm not sure which.  I didn't look into that further.

What's your opinion?  We could easily change the default of the parameter
for ARM, but I assume there are other affected targets.  I don't know if we
need the extended region scheduling to be smarter, or if it should simply be
turned off for some targets.


Hi Daniel!


Sorry for the delay; I needed time to investigate the cause of the problem.

The real problem lies in the computation of the instruction priorities. ARM has a fairly simple scheduling model: on each cycle the insn standing first in the ready list gets scheduled. This behavior puts *all* the responsibility for the resulting schedule on how the ready list is arranged, and the main factor in sorting the ready list is INSN_PRIORITY.
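
To make the role of INSN_PRIORITY concrete, here is a much simplified, hypothetical comparator - not the actual rank_for_schedule () from haifa-sched.c, which also weighs register pressure, critical-path contribution and so on - just to show that on an in-order target the priority alone effectively decides what gets issued next:

/* Hypothetical, stripped-down ready-list comparator.  Sorting ascending
   by INSN_PRIORITY leaves the highest-priority insn at
   ready[n_ready - 1], which is the slot the scheduler issues from (and
   the slot arm_reorder () below manipulates).  */
static int
sketch_rank_for_schedule (const void *x, const void *y)
{
  rtx insn_x = *(const rtx *) x;
  rtx insn_y = *(const rtx *) y;

  return INSN_PRIORITY (insn_x) - INSN_PRIORITY (insn_y);
}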

Instructions in the inner 'if' get a somewhat greater priority than the instructions from the dominator block and hence get hoisted out of their original block. A good solution for this case would be a more precise evaluation of the insn priorities, including transforming the insn priority from a region-scope to a block-scope value: e.g. in this case, while scheduling the first block, the priorities of the insns from the 'if' block would be multiplied by the probability of the 'then' branch and, therefore, would be significantly lower than the priorities of the insns from the current block. I started to implement this idea some time ago, but never finished :(
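
Just to put made-up numbers on that idea: if an insn from the 'then' block has region-scope priority 12 and the branch is taken with probability 0.2, its block-scope priority while scheduling the dominator block would be about 2, so it would no longer outrank a priority-5 insn that really belongs to that block. A hypothetical helper (nothing like this exists in the scheduler yet) would look roughly like:

/* Hypothetical scaling of a region-scope priority down to a
   block-scope one; branch_prob is a fixed-point probability measured
   against prob_base (REG_BR_PROB_BASE-style).  */
static int
block_scope_priority (int region_priority, int branch_prob, int prob_base)
{
  /* E.g. region_priority = 12, branch_prob = 2000, prob_base = 10000
     yields 2, below a priority-5 insn from the block being scheduled.  */
  return region_priority * branch_prob / prob_base;
}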

Anyway, that work is for stage 1 or 2, so for now I propose the following fix: implement the targetm.sched.reorder hook so that it ensures that if there is an insn from the current block in the ready list, then an insn from another block won't stand first in line (and, therefore, won't be chosen for scheduling). I believe this gives you what you are calling 'filling holes'. Please find an example patch attached (arm.patch).

While debugging the testcase I found two somewhat unrelated bugs in the handling of INSN_PRIORITY. The first one is in haifa-sched.c: priority (). When an insn has no forward dependencies, its priority is set to its latency. The bug occurs when an insn has some dependencies and all of them get rejected by the current_sched_info->contributes_to_priority () hook - in this case INSN_PRIORITY should also be initialized with the insn latency, but the present code misses that.
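
In a paraphrased form - this is a sketch of the control flow, not the literal haifa-sched.c source - the problem looks like this:

/* Sketch of priority ()'s handling of forward dependencies.  When every
   dependency is filtered out by contributes_to_priority (), the loop
   below never raises this_priority above zero, although such an insn
   should get at least its latency, as in the no-dependency case.
   That is what priority-bugs.patch fixes.  */
static int
sketch_priority (rtx insn)
{
  int this_priority = 0;
  rtx link;

  if (INSN_DEPEND (insn) == 0)
    /* No forward dependencies: priority is the insn latency.  */
    this_priority = insn_cost (insn, 0, 0);
  else
    for (link = INSN_DEPEND (insn); link; link = XEXP (link, 1))
      {
        rtx next = XEXP (link, 0);
        int next_priority;

        /* Inter-block dependencies may be rejected here...  */
        if (! current_sched_info->contributes_to_priority (next, insn))
          continue;

        /* ...so if all of them are rejected, this_priority stays 0.  */
        next_priority = insn_cost (insn, link, next)
                        + sketch_priority (next);
        if (next_priority > this_priority)
          this_priority = next_priority;
      }

  return this_priority;
}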

The second one is not as critical as the first. It is in haifa-sched.c: adjust_priority (). This function unconditionally calls the targetm.sched.adjust_priority () hook whenever an insn is added to the ready list. As I understand it, all targets assume this hook is invoked exactly once: after all priorities have been computed, but before the insn is added to the ready list. But for insns with no dependencies from the source blocks this hook can be called many times - therefore the priorities of those insns can become significantly inaccurate.

The patch for these two small bugs is also attached (priority-bugs.patch). Is it OK for trunk? If so, I will repost it to the gcc-patches list.


Best regards, Maxim
--- config/arm/arm.c	(/gcc-local/trunk/gcc)	(revision 19877)
+++ config/arm/arm.c	(/gcc-local/arm-bug/gcc)	(revision 19877)
@@ -52,6 +52,7 @@
 #include "target-def.h"
 #include "debug.h"
 #include "langhooks.h"
+#include "sched-int.h"
 
 /* Forward definitions of types.  */
 typedef struct minipool_node    Mnode;
@@ -118,6 +119,9 @@ static void thumb_output_function_prolog
 static int arm_comp_type_attributes (tree, tree);
 static void arm_set_default_type_attributes (tree);
 static int arm_adjust_cost (rtx, rtx, rtx, int);
+static void arm_reorder (rtx *, int);
+static int arm_reorder1 (FILE *, int, rtx *, int *, int);
+static int arm_reorder2 (FILE *, int, rtx *, int *, int);
 static int count_insns_for_constant (HOST_WIDE_INT, int);
 static int arm_get_strip_length (int);
 static bool arm_function_ok_for_sibcall (tree, tree);
@@ -245,6 +249,12 @@ static bool arm_tls_symbol_p (rtx x);
 #undef  TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST arm_adjust_cost
 
+#undef TARGET_SCHED_REORDER
+#define TARGET_SCHED_REORDER arm_reorder1
+
+#undef TARGET_SCHED_REORDER2
+#define TARGET_SCHED_REORDER2 arm_reorder2
+
 #undef TARGET_ENCODE_SECTION_INFO
 #ifdef ARM_PE
 #define TARGET_ENCODE_SECTION_INFO  arm_pe_encode_section_info
@@ -5229,6 +5239,50 @@ arm_adjust_cost (rtx insn, rtx link, rtx
   return cost;
 }
 
+static void
+arm_reorder (rtx *ready, int n_ready)
+{
+  if (n_ready > 1)
+    {
+      /* This is correct for sched-rgn.c only.  */
+      basic_block bb = BLOCK_FOR_INSN (current_sched_info->prev_head);
+
+      if (BLOCK_FOR_INSN (ready[n_ready - 1]) != bb)
+        {
+          int i;
+
+          for (i = n_ready - 1; i >= 0; i--)
+            {
+              rtx insn = ready[i];
+
+              if (BLOCK_FOR_INSN (insn) != bb)
+                continue;
+
+              memcpy (ready + i, ready + i + 1,
+                      (n_ready - i - 1) * sizeof (*ready));
+              ready[n_ready - 1] = insn;
+              break;
+            }
+        }
+    }
+}
+
+static int
+arm_reorder1 (FILE *dump ATTRIBUTE_UNUSED, int sched_verbose ATTRIBUTE_UNUSED,
+              rtx *ready, int *pn_ready, int clock_var ATTRIBUTE_UNUSED)
+{
+  arm_reorder (ready, *pn_ready);
+  return 1;
+}
+
+static int
+arm_reorder2 (FILE *dump ATTRIBUTE_UNUSED, int sched_verbose ATTRIBUTE_UNUSED,
+              rtx *ready, int *pn_ready, int clock_var ATTRIBUTE_UNUSED)
+{
+  arm_reorder (ready, *pn_ready);
+  return 0;
+}
+
 static int fp_consts_inited = 0;
 
 /* Only zero is valid for VFP.  Other values are also valid for FPA.  */

2006-05-30  Maxim Kuvyrkov  <mkuvyrkov@ispras.ru>

	* haifa-sched.c (priority): Set INSN_PRIORITY to INSN_COST if all
	dependencies of the insn are being ignored.
	(adjust_priority): Don't adjust priority twice.
	* sched-int.h (haifa_insn_data.priority_adjusted): New bitfield.
	(INSN_PRIORITY_ADJUSTED): New access-macro.
--- haifa-sched.c	(/gcc-local/trunk/gcc)	(revision 19877)
+++ haifa-sched.c	(/gcc-local/arm-bug/gcc)	(revision 19877)
@@ -751,6 +751,8 @@ priority (rtx insn)
 
 	  do
 	    {
+              bool priority_inited_p = false;
+
 	      for (link = INSN_DEPEND (twin); link; link = XEXP (link, 1))
 		{
 		  rtx next;
@@ -785,9 +787,14 @@ priority (rtx insn)
 
 		      if (next_priority > this_priority)
 			this_priority = next_priority;
+
+                      priority_inited_p = true;
 		    }
 		}
 	      
+              if (!priority_inited_p)
+                this_priority = insn_cost (insn, 0, 0);                
+
 	      twin = PREV_INSN (twin);
 	    }
 	  while (twin != prev_first);
@@ -1110,9 +1117,13 @@ adjust_priority (rtx prev)
 
      Revisit when we have a machine model to work with and not before.  */
 
-  if (targetm.sched.adjust_priority)
-    INSN_PRIORITY (prev) =
-      targetm.sched.adjust_priority (prev, INSN_PRIORITY (prev));
+  if (targetm.sched.adjust_priority
+      && !INSN_PRIORITY_ADJUSTED (prev))
+    {
+      INSN_PRIORITY (prev) =
+        targetm.sched.adjust_priority (prev, INSN_PRIORITY (prev));
+      INSN_PRIORITY_ADJUSTED (prev) = 1;
+    }
 }
 
 /* Advance time on one cycle.  */
@@ -4478,6 +4489,7 @@ clear_priorities (rtx insn)
       if (INSN_PRIORITY_KNOWN (pro))
 	{
 	  INSN_PRIORITY_KNOWN (pro) = 0;
+          INSN_PRIORITY_ADJUSTED (pro) = 0;
 	  clear_priorities (pro);
 	}
     }
--- sched-int.h	(/gcc-local/trunk/gcc)	(revision 19877)
+++ sched-int.h	(/gcc-local/arm-bug/gcc)	(revision 19877)
@@ -317,6 +317,9 @@ struct haifa_insn_data
   /* Nonzero if priority has been computed already.  */
   unsigned int priority_known : 1;
 
+  /* Nonzero if priority has been adjusted already.  */
+  unsigned int priority_adjusted : 1;
+
   /* Nonzero if instruction has internal dependence
      (e.g. add_dependence was invoked with (insn == elem)).  */
   unsigned int has_internal_dep : 1;
@@ -350,6 +353,7 @@ extern regset *glat_start, *glat_end;
 #define INSN_DEP_COUNT(INSN)	(h_i_d[INSN_UID (INSN)].dep_count)
 #define INSN_PRIORITY(INSN)	(h_i_d[INSN_UID (INSN)].priority)
 #define INSN_PRIORITY_KNOWN(INSN) (h_i_d[INSN_UID (INSN)].priority_known)
+#define INSN_PRIORITY_ADJUSTED(INSN) (h_i_d[INSN_UID (INSN)].priority_adjusted)
 #define INSN_COST(INSN)		(h_i_d[INSN_UID (INSN)].cost)
 #define INSN_REG_WEIGHT(INSN)	(h_i_d[INSN_UID (INSN)].reg_weight)
 #define HAS_INTERNAL_DEP(INSN)  (h_i_d[INSN_UID (INSN)].has_internal_dep)
