This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Patch] Undermine the jump threading cost model to fix PR77445.


Hi,

As mentioned in PR77445, the improvements to the jump threading cost model
this year have caused substantial regressions in the amount of jump threading
we do and the performance of workloads which rely on that threading.

This patch represents the low-bar in fixing the performance issues reported
in PR77445 - by weakening the cost model enough that we thread in a way much
closer to GCC 6. I don't think this patch is likely to be acceptable for
trunk, but I'm posting it for consideration regardless. 

Under the new cost model, if the edge doesn't pass optimize_edge_for_speed_p,
then we don't thread. The problem in late threading is bad edge profile
data makes the edge look cold, and thus it fails optimize_edge_for_speed_p
and is no longer considered a candidate for threading. As an aside, I think
this is the wrong cost model for jump threading, where you get the most
impact if you can resolve unpredictable switch statements - which by their
nature may have multiple cold edges in need of threading.

Early threading should avoid these issues, as there is no edge profile
info yet. optimize_edge_for_speed_p is therefore more likely to hold, but
the condition for threading is:

  if (speed_p && optimize_edge_for_speed_p (taken_edge))
    {
      if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
	{
          [...reject threading...]
        }
    }
  else if (n_insns > 1)
    {
      [...reject threading...]
    }

With speed_p is hardwired to false for the early threader
( pass_early_thread_jumps::execute ):

	find_jump_threads_backwards (bb, false);

So we always fall to the n_insns > 1 case and thus only rarely get to
thread.

In this patch I change that call in pass_early_thread_jumps::execute to
instead look at optimize_bb_for_speed_p (bb) . That allows the speed_p
check to pass in the main threading cost model, and then the
optimize_edge_for_speed_p can also pass. That gets the first stage of
jump-threading back working in a proprietary benchmark which is sensitive
to this optimisation.

To get the rest of the required jump threading, I also have to weaken the
cost model - and this is obviously a hack! The easy hack is to special case
when the taken edge has frequency zero, and permit jump threading there.

I know this patch is likely not the preferred way to fix this. For me that
would be a change to the cost model, which as I mentioned above I think
misses the point about which edges we want to thread. By far the best
fix would be to the junk edge profiling data we create during threading.

However, this patch does fix the performance issues identified in PR77445,
and does highlight a fundamental issue with the early threader (which
doesn't seem to me like it will be effective while it sets speed_p to
false), so I'd like it to be considered for trunk if no better fix appears
before stage 4.

Bootstrapped on x86_64 with no issues. The testsuite changes just reshuffle
which passes spot the threading opportunities.

OK?

Thanks,
James

---
gcc/

2016-12-15  James Greenhalgh  <james.greenhalgh@arm.com>

	PR tree-optimization/77445
	* tree-ssa-threadbackward.c (profitable_jump_thread_path) Work
	around sometimes corrupt edge frequency data.
	(pass_early_thread_jumps::execute): Pass
	optimize_bb_for_speed_p as the speed_p parameter to
	find_jump_threads_backwards to enable threading in more cases.

gcc/testsuite/

2016-12-15  James Greenhalgh  <james.greenhalgh@arm.com>

	PR tree-optimization/77445
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust options and dump passes.
	* gcc.dg/tree-ssa/pr66752-3.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index 896c8bf..39ec3d6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
+/* { dg-options "-O2 -fdump-tree-ethread-details -fdump-tree-dce2" } */
 
 extern int status, pt;
 extern int count;
@@ -34,7 +34,7 @@ foo (int N, int c, int b, int *a)
 
 /* There are 4 FSM jump threading opportunities, all of which will be
    realized, which will eliminate testing of FLAG, completely.  */
-/* { dg-final { scan-tree-dump-times "Registering FSM" 4 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Registering FSM" 4 "ethread"} } */
 
 /* There should be no assignments or references to FLAG, verify they're
    eliminated as early as possible.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 9a9d1cb..5b087fb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,8 +1,9 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 16"  "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread2" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 3" "thread3" } } */
+/* { dg-options "-O2 -fdump-tree-ethread-stats -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 16"  "ethread" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 6"  "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 4" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 2" "thread3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom2" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp2" } } */
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 203e20e..0d29ab5 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -311,7 +311,20 @@ profitable_jump_thread_path (vec<basic_block, va_gc> *&path,
       return NULL;
     }
 
-  if (speed_p && optimize_edge_for_speed_p (taken_edge))
+
+  /* FIXME: Edge frequency can get badly out shape as a result of
+     the jump threading passes.  In those cases,
+     EDGE_FREQUENCY (taken_edge) == 0 , and so trivially fails the
+     test for optimize_edge_for_speed_p.  The correct fix would
+     be to ensure that profiling information coming out of jump threading
+     is meaningful, but in lieu of that add a hack check to this cost model
+     which permits jump threading in the case EDGE_FREQUENCY has been
+     corrupted.  Only do this if the profile info is present and corrupt,
+     not if it is absent.  */
+  if (speed_p
+      && (optimize_edge_for_speed_p (taken_edge)
+	  || (profile_status_for_fn (cfun) != PROFILE_ABSENT
+	      && EDGE_FREQUENCY (taken_edge) == 0)))
     {
       if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
 	{
@@ -870,7 +883,7 @@ pass_early_thread_jumps::execute (function *fun)
   FOR_EACH_BB_FN (bb, fun)
     {
       if (EDGE_COUNT (bb->succs) > 1)
-	find_jump_threads_backwards (bb, false);
+	find_jump_threads_backwards (bb, optimize_bb_for_speed_p (bb));
     }
   thread_through_all_blocks (true);
   return 0;

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]