
Re: backward threading heuristics tweak


On 06/06/2016 04:19 AM, Jan Hubicka wrote:
> Hi,
> while looking into profile mismatches introduced by the backward threading
> pass, I noticed that the heuristics seem quite simplistic.  First, they
> should be profile-sensitive and disallow duplication when optimizing cold
> paths.  Second, they should use estimate_num_insns, because the gimple
> statement count is not a very realistic estimate of the final code size
> effect.  Third, there seems to be no reason to disable the pass for
> functions optimized for size.

I've never been happy with the size estimation code -- it's a series of hacks to address a couple of pathological code size regressions that were seen when comparing gcc-6 to prior versions.  I won't lose any sleep if we move to estimate_num_insns and rely more on profile data.
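
As a minimal sketch, weighting each statement with estimate_num_insns
rather than counting raw gimple statements would look roughly like the
following (estimated_block_size is a hypothetical helper, not part of
the patch):

  /* Hypothetical helper: weight each non-debug statement by its
     estimated size instead of counting it as 1.  */
  static int
  estimated_block_size (basic_block bb)
  {
    int size = 0;
    for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
         gsi_next (&gsi))
      {
        gimple *stmt = gsi_stmt (gsi);
        if (is_gimple_debug (stmt))
          continue;
        /* A raw count would do ++size; eni_size_weights makes, e.g.,
           a call cost more than a register copy.  */
        size += estimate_num_insns (stmt, &eni_size_weights);
      }
    return size;
  }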



> If we block duplication of more than 1 insn on size-optimized paths, the
> pass is still able to make the majority of the threading decisions that
> are free and improve codegen.  The code size benefit was between 0.5% and
> 2.7% on the testcases I tried (tramp3d, GCC modules, xalancbmk and some
> other stuff around my hd).
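
A minimal sketch of such a guard, assuming the surrounding locals of
profitable_jump_thread_path (hot_path is a hypothetical stand-in for
whatever profile test is used):

  /* Reject the path when optimizing for size or when the path is
     cold, unless the duplicated code is trivially small.  */
  if (n_insns > 1
      && (optimize_function_for_size_p (cfun) || !hot_path))
    return NULL;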

> Bootstrapped/regtested x86_64-linux, seems sane?

> The pass should also avoid calling cleanup_cfg when no threading was
> done, and I do not see why it is guarded by expensive_optimizations.
> What are the main compile-time complexity limitations?

The pass essentially walks up the use-def links in the CFG.  When it encounters a PHI, we walk up every PHI argument.  That's where it gets expensive.  I think there's a better algorithm to do those walks that doesn't start at the beginning each time, but I haven't had time to experiment with coding it up.
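
Roughly, the walk has this shape (walk_defs is illustrative, not the
pass's actual code):

  /* Follow a use-def chain backwards; a PHI fans the search out over
     every argument, and each query restarts from the beginning, which
     is where the cost comes from.  */
  static void
  walk_defs (tree name)
  {
    gimple *def = SSA_NAME_DEF_STMT (name);
    if (gimple_code (def) == GIMPLE_PHI)
      {
        gphi *phi = as_a <gphi *> (def);
        for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
          {
            tree arg = gimple_phi_arg_def (phi, i);
            if (TREE_CODE (arg) == SSA_NAME)
              walk_defs (arg);  /* Worst case: exponential fan-out.  */
          }
      }
    /* ... otherwise follow the single definition further back.  */
  }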



> Honza

> 	* tree-ssa-threadbackward.c: Include tree-inline.h.
> 	(profitable_jump_thread_path): Use estimate_num_insns to estimate
> 	size of copied block; for cold paths reduce duplication.
> 	(find_jump_threads_backwards): Remove redundant tests.
> 	(pass_thread_jumps::gate): Enable for -Os.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update testcase.
> Index: tree-ssa-threadbackward.c
> ===================================================================
> --- tree-ssa-threadbackward.c	(revision 237101)
> +++ tree-ssa-threadbackward.c	(working copy)
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-pass.h"
>  #include "gimple-ssa.h"
>  #include "tree-phinodes.h"
> +#include "tree-inline.h"
Probably an indication we want estimate_num_insns broken out into a more generic place that could be used by the threader, inlining, etc. Please consider that as a follow-up.




>  static int max_threaded_paths;

> @@ -210,7 +211,7 @@ profitable_jump_thread_path (vec<basic_b
>  		  && !(gimple_code (stmt) == GIMPLE_ASSIGN
>  		       && gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
>  		  && !is_gimple_debug (stmt))
> -		++n_insns;
> +	        n_insns += estimate_num_insns (stmt, &eni_size_weights);
>  	    }
> 
>  	  /* We do not look at the block with the threaded branch
> @@ -238,13 +239,15 @@ profitable_jump_thread_path (vec<basic_b
>  	threaded_through_latch = true;
>      }
> 
> +  gimple *stmt = get_gimple_control_stmt ((*path)[0]);
> +  gcc_assert (stmt);
> +
>    /* We are going to remove the control statement at the end of the
>       last block in the threading path.  So don't count it against our
>       statement count.  */
> -  n_insns--;
> 
> -  gimple *stmt = get_gimple_control_stmt ((*path)[0]);
> -  gcc_assert (stmt);
> +  n_insns-= estimate_num_insns (stmt, &eni_size_weights);
Whitespace nit here before the "-=".

I think this is fine with the whitespace fix. And again, consider moving estimate_num_insns to a new location outside the inliner.
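
For reference, the line with the whitespace nit fixed would read:

  n_insns -= estimate_num_insns (stmt, &eni_size_weights);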

jeff


