fix scheduling antideps

Mike Stump mikestump@comcast.net
Fri Dec 11 08:13:00 GMT 2015


This patch allows a target to increase the cost of anti-deps to better reflect the actual cost on the machine.

This gets me 5% more performance on an important inner loop by exposing the actual cost of long dep chains that have lots of anti-deps in them.  By scheduling the longer chain first, we have more opportunities to fill in the holes with content from the other, less critical chains.

I’m unsure if all machines should have a cost of 1, or just some machines.  I suspect that out-of-order (OOO) machines can hide the dep chains well enough that the value 0 is more appropriate for them.
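
For illustration, a port that wants the non-zero cost would presumably override the default from defaults.h in its own target header.  A minimal sketch, assuming a hypothetical in-order, multi-issue port (the file name config/myport/myport.h is made up for this example and is not part of the patch):

```c
/* Hypothetical port header, e.g. config/myport/myport.h.
   The port name and file are assumptions for illustration only.  */

/* In-order, multi-issue core that cannot issue two instructions with
   overlapping registers in the same cycle: charge one cycle for each
   anti-dependency instead of the default 0 from defaults.h.  */
#define TARGET_ANTI_DEP_COST 1
```

Because defaults.h only defines the macro under #ifndef, ports that do not define it keep the current behavior (cost 0).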

Ok?


Index: defaults.h
===================================================================
--- defaults.h  (revision 231539)
+++ defaults.h  (working copy)
@@ -1486,6 +1486,10 @@
 #define TARGET_VTABLE_USES_DESCRIPTORS 0
 #endif
 
+#ifndef TARGET_ANTI_DEP_COST
+#define TARGET_ANTI_DEP_COST 0
+#endif
+
 #endif /* GCC_INSN_FLAGS_H  */
 
 #endif  /* ! GCC_DEFAULTS_H */
Index: doc/tm.texi
===================================================================
--- doc/tm.texi (revision 231539)
+++ doc/tm.texi (working copy)
@@ -6970,6 +6970,13 @@
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@defmac TARGET_ANTI_DEP_COST
+The cost in cycles for an anti-dependency.  Defaults to 0.  On non-OOO
+multi-issue machines that can't issue instructions that have
+overlapping registers in the same cycle, a value of 1 will better
+reflect the actual cost of the instruction sequence.
+@end defmac
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in      (revision 231539)
+++ doc/tm.texi.in      (working copy)
@@ -4852,6 +4852,13 @@
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@defmac TARGET_ANTI_DEP_COST
+The cost in cycles for an anti-dependency.  Defaults to 0.  On non-OOO
+multi-issue machines that can't issue instructions that have
+overlapping registers in the same cycle, a value of 1 will better
+reflect the actual cost of the instruction sequence.
+@end defmac
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
Index: haifa-sched.c
===================================================================
--- haifa-sched.c       (revision 231539)
+++ haifa-sched.c       (working copy)
@@ -1470,7 +1470,7 @@
       if (INSN_CODE (insn) >= 0)
        {
          if (dep_type == REG_DEP_ANTI)
-           cost = 0;
+           cost = TARGET_ANTI_DEP_COST;
          else if (dep_type == REG_DEP_OUTPUT)
            {
              cost = (insn_default_latency (insn)


