
[PATCH] Tune PRE insertion, make -Os -ftree-pre actually do something


This tries to get rid of the sledge-hammer that disables PRE completely
whenever the current function is not optimized for speed (which is always
the case when optimize_size is set).

The idea is to allow regular and phi-translation-triggered full redundancy
elimination to be performed even on paths that are optimized for size.
Thus, we restrict ourselves to performing insertions only when they remove
a full redundancy on a path in the CFG that we want to optimize for speed.

The effect of this patch is that PRE is now enabled at -Os but performs
only full redundancy elimination (that is, as if PRE ran but never
inserted anything).  This should reduce code size and remove the oddity
that at -Os no value numbering runs before the loop optimizations.
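
In contrast to the partially redundant case, a full redundancy is removed
without inserting any code, so this still happens at -Os after the patch.
Another illustrative (made-up) example:

int g (int a, int b)
{
  int x = a + b;
  return x + (a + b);  /* Fully redundant; replaced by a use of x,
			  no insertion needed.  */
}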

I wonder whether we also want to specialize the case where the value is
available in all predecessors but is not the same value in each of them
(in that case we would only need to insert a PHI node; see the
illustration below).  For blocks with a low indegree this might result
in smaller code as well.
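
As an illustration of that case (again a made-up example, not from the
patch):

int h (int c, int a, int b)
{
  int u = 0, v = 0, x;
  if (c)
    {
      u = a + 1;   /* a + 1 available in this predecessor.  */
      x = a;
    }
  else
    {
      v = b + 1;   /* b + 1 available in this predecessor.  */
      x = b;
    }
  /* x + 1 phi-translates to a + 1 resp. b + 1, both already computed,
     so only a PHI node merging u and v would have to be inserted.  */
  return (x + 1) + u + v;
}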

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2009-07-14  Richard Guenther  <rguenther@suse.de>

	* tree-ssa-pre.c (do_regular_insertion): Only insert if a
	redundancy along a path in the CFG we want to optimize for speed
	is going to be removed.
	(execute_pre): Do partial-PRE only if the function is to be
	optimized for speed.
	(gate_pre): Do not turn off all of PRE when not optimizing a
	function for speed.

Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c	(revision 149626)
--- gcc/tree-ssa-pre.c	(working copy)
*************** do_regular_insertion (basic_block block,
*** 3352,3357 ****
--- 3352,3358 ----
  	  pre_expr eprime = NULL;
  	  edge_iterator ei;
  	  pre_expr edoubleprime = NULL;
+ 	  bool do_insertion = false;
  
  	  val = get_expr_value_id (expr);
  	  if (bitmap_set_contains_value (PHI_GEN (block), val))
*************** do_regular_insertion (basic_block block,
*** 3403,3408 ****
--- 3404,3413 ----
  		{
  		  avail[bprime->index] = edoubleprime;
  		  by_some = true;
+ 		  /* We want to perform insertions to remove a redundancy on
+ 		     a path in the CFG we want to optimize for speed.  */
+ 		  if (optimize_edge_for_speed_p (pred))
+ 		    do_insertion = true;
  		  if (first_s == NULL)
  		    first_s = edoubleprime;
  		  else if (!pre_expr_eq (first_s, edoubleprime))
*************** do_regular_insertion (basic_block block,
*** 3413,3419 ****
  	     already existing along every predecessor, and
  	     it's defined by some predecessor, it is
  	     partially redundant.  */
! 	  if (!cant_insert && !all_same && by_some && dbg_cnt (treepre_insert))
  	    {
  	      if (insert_into_preds_of_block (block, get_expression_id (expr),
  					      avail))
--- 3418,3425 ----
  	     already existing along every predecessor, and
  	     it's defined by some predecessor, it is
  	     partially redundant.  */
! 	  if (!cant_insert && !all_same && by_some && do_insertion
! 	      && dbg_cnt (treepre_insert))
  	    {
  	      if (insert_into_preds_of_block (block, get_expression_id (expr),
  					      avail))
*************** fini_pre (bool do_fre)
*** 4475,4485 ****
     only wants to do full redundancy elimination.  */
  
  static unsigned int
! execute_pre (bool do_fre ATTRIBUTE_UNUSED)
  {
    unsigned int todo = 0;
  
!   do_partial_partial = optimize > 2;
  
    /* This has to happen before SCCVN runs because
       loop_optimizer_init may create new phis, etc.  */
--- 4481,4491 ----
     only wants to do full redundancy elimination.  */
  
  static unsigned int
! execute_pre (bool do_fre)
  {
    unsigned int todo = 0;
  
!   do_partial_partial = optimize > 2 && optimize_function_for_speed_p (cfun);
  
    /* This has to happen before SCCVN runs because
       loop_optimizer_init may create new phis, etc.  */
*************** do_pre (void)
*** 4563,4570 ****
  static bool
  gate_pre (void)
  {
!   /* PRE tends to generate bigger code.  */
!   return flag_tree_pre != 0 && optimize_function_for_speed_p (cfun);
  }
  
  struct gimple_opt_pass pass_pre =
--- 4569,4575 ----
  static bool
  gate_pre (void)
  {
!   return flag_tree_pre != 0;
  }
  
  struct gimple_opt_pass pass_pre =

