This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

PR80155: Code hoisting and register pressure


Hi,
I am trying to work on PR80155, which exposes a problem with code
hoisting and register pressure on a leading embedded benchmark for ARM
cortex-m7, where code-hoisting causes an extra register spill.

I have attached two test-cases which (hopefully) are representative of
the original test-case.
The first one (trans_dfa.c) is bigger and somewhat similar to the
original test-case and trans_dfa_2.c is hand-reduced version of
trans_dfa.c. There's 2 spills caused with trans_dfa.c
and one spill with trans_dfa_2.c due to lesser amount of cases.
The test-cases in the PR are probably not relevant.

Initially I thought the spill was happening because of "too many
hoistings" taking place in original test-case thus increasing the
register pressure, but it seems the spill is possibly caused because
expression gets hoisted out of a block that is on loop exit.

For example, the following hoistings take place with trans_dfa_2.c:

(1) Inserting expression in block 4 for code hoisting:
{mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)

(2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006)

(3) Inserting expression in block 4 for code hoisting:
{pointer_plus_expr,s_33,1} (0023)

(4) Inserting expression in block 3 for code hoisting:
{pointer_plus_expr,s_33,1} (0023)

The issue seems to be hoisting of (*tab + 1) which consists of first
two hoistings in block 4
from blocks 5 and 9, which causes the extra spill. I verified that by
disabling hoisting into block 4,
which resulted in no extra spills.

I wonder if that's because the expression (*tab + 1) is getting
hoisted from blocks 5 and 9,
which are on loop exit ? So the expression that was previously
computed in a block on loop exit, gets hoisted outside that block
which possibly makes the allocator more defensive ? Similarly
disabling hoisting of expressions which appeared in blocks on loop
exit in original test-case prevented the extra spill. The other
hoistings didn't seem to matter.

If that's the case, would it make sense to add another constraint to
hoisting to not hoist expression if it appears in a block that is on
loop exit (exception is if block has only single successor), at-least
for targets like cortex-m7 where the effect of spill is more
pronounced ?

I tried to write an untested prototype patch (approach-8.diff) on
these lines, to refuse hoisting if an expression appears in block on
loop exit. The patch adds a new map pre_expr_block_d, that maps
pre_expr to the set of blocks (as a bitmap) it appears in and are on
loop exit, which is computed during compute_avail.
During do_hoist_insertion, it simply checks if the bitmap of blocks is
not empty.
It works for the attached test-cases and passes ssa-pre-*.c and
ssa-hoist-*.c tests.

Alternatively, we could restrict replacing expression by it's leader
in eliminate_dom_walker::before_dom_children if the expression appears
in a block on loop exit.
In principle this is more general than hoisting, but I have restricted
it to only hoisted expressions to avoid being overly conservative.
Similarly, this constrained could be made conditional, only for
targets like cortex-m7. I have attached a prototype patch based on
this approach (approach-9.diff). Although it works for attached
test-cases, unfortunately it ICE's with the original test-case in
tail-merge, I am working on that.

I am not sure though if either of these approaches are in the correct
direction and would
be grateful for suggestions on the issue!

Thanks,
Prathamesh
int foo(char **save_s, int *tab)
{
  char *s;
  int state = 0;
  int c;

  for (s = *save_s; *s; s++)
    {
      c = *s;
      switch (state)
	{
	  case 0:
	   if (c == 'A')
	     {
	       state = 1;
	       tab[0]++;
	     }
	   else
	     {
	       state = -1;
	       tab[0]++;
	     }
	  break;

	  case 1:
	    if (c == 'B')
	      {
		state = 2;
		tab[1]++;
	      }
	    else
	      {
		state = -1;
		tab[1]++;
	      }
	   break;

	  case 2:
	    if (c == 'C')
	      {
		state = 3;
		tab[2]++;
	      }
	    else
	      {
		state = -1;
		tab[2]++;
	      }
	   break;
	  
	  case 3:
	    if (c == 'D')
	      {
		state = 4;
		tab[3]++;
	      }
	    else
	      {
		state = -1;
		tab[3]++;
	      }
	  break;

	 case 4:
	    if (c == 'E')
	      {
		state = 5;
		tab[4]++;
	      }
	    else
	      {
		state = -1;
		tab[4]++;
	      }
	 break;
	 
	 default:
	   break;
	}
    }

  tab[state]++;
  *save_s = s;
  return state;
}
int foo(char **save_s, int *tab) 
{
  char *s; 
  int c;

  int state = 0;
  for (s = *save_s; *s; s++)
    {
      c = *s;
      switch (state)
	{
	  case 0:
	   if (c == 'A')
	     {
	       state = 1;
	       tab[0]++;
	     }
	   else
	     {
	       state = -1;
	       tab[0]++;
	     }
	  break;

	 default:
	   break;
	}
    }

  tab[state]++;
  *save_s = s;
  return state;
}
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 11b1938216e..d7663bee1ee 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dce.h"
 #include "tree-cfgcleanup.h"
 #include "alias.h"
+#include "graph.h"
 
 /* Even though this file is called tree-ssa-pre.c, we actually
    implement a bit more than just PRE here.  All of them piggy-back
@@ -2433,6 +2434,7 @@ compute_antic (void)
    for performing quick dead code elimination of insertions we made
    that didn't turn out to be necessary.   */
 static bitmap inserted_exprs;
+static bitmap hoisted_exprs;
 
 /* The actual worker for create_component_ref_by_pieces.  */
 
@@ -3549,6 +3551,13 @@ do_hoist_insertion (basic_block block)
 	  continue;
 	}
 
+#if 0
+	  fprintf (stderr,
+		   "Inserting expression in block %d for code hoisting: ",
+		   block->index);
+	  print_pre_expr (stderr, expr);
+	  fprintf (stderr, " (%04d)\n", value_id);
+#endif
       /* OK, we should hoist this value.  Perform the transformation.  */
       pre_stats.hoist_insert++;
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3583,6 +3592,8 @@ do_hoist_insertion (basic_block block)
 	continue;
 
       new_stuff = true;
+      bitmap_set_bit (hoisted_exprs, SSA_NAME_VERSION (res));
+//      fprintf (stderr, "\nres = "); debug_generic_expr (res);
     }
 
   exprs.release ();
@@ -4066,6 +4077,7 @@ init_pre (void)
   name_to_id.create (0);
 
   inserted_exprs = BITMAP_ALLOC (NULL);
+  hoisted_exprs = BITMAP_ALLOC (NULL);
 
   connect_infinite_loops_to_exit ();
   memset (&pre_stats, 0, sizeof (pre_stats));
@@ -4183,7 +4195,7 @@ pass_pre::execute (function *fun)
   statistics_counter_event (fun, "New PHIs", pre_stats.phis);
 
   /* Remove all the redundant expressions.  */
-  todo |= vn_eliminate (inserted_exprs);
+  todo |= vn_eliminate (inserted_exprs, hoisted_exprs);
 
   /* Because we don't follow exactly the standard PRE algorithm, and decide not
      to insert PHI nodes sometimes, and because value numbering of casts isn't
@@ -4217,6 +4229,7 @@ pass_pre::execute (function *fun)
      the todo.  */
   update_ssa (TODO_update_ssa_only_virtuals);
 
+  debug_dot_cfg (cfun, 0, "after-pre.dot");
   return todo;
 }
 
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 1463c1d4116..2ae88dc574e 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -5171,7 +5171,7 @@ vn_nary_may_trap (vn_nary_op_t nary)
 class eliminate_dom_walker : public dom_walker
 {
 public:
-  eliminate_dom_walker (cdi_direction, bitmap);
+  eliminate_dom_walker (cdi_direction, bitmap, bitmap hoisted_exprs = NULL);
   ~eliminate_dom_walker ();
 
   virtual edge before_dom_children (basic_block);
@@ -5188,6 +5188,7 @@ public:
 
   /* SSA names that had their defs inserted by PRE if do_pre.  */
   bitmap inserted_exprs;
+  bitmap hoisted_exprs;
 
   /* Blocks with statements that have had their EH properties changed.  */
   bitmap need_eh_cleanup;
@@ -5202,10 +5203,12 @@ public:
 };
 
 eliminate_dom_walker::eliminate_dom_walker (cdi_direction direction,
-					    bitmap inserted_exprs_)
+					    bitmap inserted_exprs_,
+					    bitmap hoisted_exprs_)
   : dom_walker (direction), do_pre (inserted_exprs_ != NULL),
     el_todo (0), eliminations (0), insertions (0),
-    inserted_exprs (inserted_exprs_)
+    inserted_exprs (inserted_exprs_),
+    hoisted_exprs (hoisted_exprs_)
 {
   need_eh_cleanup = BITMAP_ALLOC (NULL);
   need_ab_cleanup = BITMAP_ALLOC (NULL);
@@ -5338,7 +5341,18 @@ eliminate_dom_walker::eliminate_insert (gimple_stmt_iterator *gsi, tree val)
   return res;
 }
 
+static bool
+block_loop_exit_p (basic_block b)
+{
+  edge e;
+  edge_iterator ei;
 
+  FOR_EACH_EDGE (e, ei, b->succs)
+    if (b->loop_father != e->dest->loop_father)
+      return true;
+
+  return false;
+}
 
 /* Perform elimination for the basic-block B during the domwalk.  */
 
@@ -5369,6 +5383,17 @@ eliminate_dom_walker::before_dom_children (basic_block b)
 	}
 
       tree sprime = eliminate_avail (res);
+#if 1
+      if (block_loop_exit_p (b)
+	  && hoisted_exprs
+	  && sprime
+	  && bitmap_bit_p (hoisted_exprs, SSA_NAME_VERSION (sprime))
+	  && !single_succ_p (b))
+	{
+	  gsi_next (&gsi);
+	  continue;
+	}
+#endif
       if (sprime
 	  && sprime != res)
 	{
@@ -5432,6 +5458,19 @@ eliminate_dom_walker::before_dom_children (basic_block b)
 		   && is_global_var (gimple_assign_rhs1 (stmt)))))
 	{
 	  sprime = eliminate_avail (lhs);
+#if 1 
+	  if (block_loop_exit_p (b)
+	      && sprime
+	      && hoisted_exprs
+	      && bitmap_bit_p (hoisted_exprs, SSA_NAME_VERSION (sprime))
+	      && !single_succ_p (b)) 
+	    {
+//	      fprintf (stderr, "\nblock = %d, stmt = ", b->index); debug_gimple_stmt (stmt);
+//	      fprintf (stderr, "b single succ = %d\n", single_succ_p (b));
+	      continue;
+	    }
+#endif
+
 	  if (!sprime)
 	    {
 	      /* If there is no existing usable leader but SCCVN thinks
@@ -5697,6 +5736,13 @@ eliminate_dom_walker::before_dom_children (basic_block b)
 	  if (TREE_CODE (use) != SSA_NAME)
 	    continue;
 	  tree sprime = eliminate_avail (use);
+	  if (block_loop_exit_p (b)
+	      && sprime
+	      && hoisted_exprs
+	      && bitmap_bit_p (hoisted_exprs, SSA_NAME_VERSION (sprime))
+	      && !single_succ_p (b)) 
+	    continue;
+
 	  if (sprime && sprime != use
 	      && may_propagate_copy (use, sprime)
 	      /* We substitute into debug stmts to avoid excessive
@@ -5877,6 +5923,13 @@ eliminate_dom_walker::before_dom_children (basic_block b)
 	      || virtual_operand_p (arg))
 	    continue;
 	  tree sprime = eliminate_avail (arg);
+	  if (block_loop_exit_p (b)
+	      && sprime
+	      && hoisted_exprs
+	      && bitmap_bit_p (hoisted_exprs, SSA_NAME_VERSION (sprime))
+	      && !single_succ_p (b)) 
+	    continue;
+
 	  if (sprime && may_propagate_copy (arg, sprime))
 	    propagate_value (use_p, sprime);
 	}
@@ -5903,9 +5956,9 @@ eliminate_dom_walker::after_dom_children (basic_block)
 /* Eliminate fully redundant computations.  */
 
 unsigned int
-vn_eliminate (bitmap inserted_exprs)
+vn_eliminate (bitmap inserted_exprs, bitmap hoisted_exprs)
 {
-  eliminate_dom_walker el (CDI_DOMINATORS, inserted_exprs);
+  eliminate_dom_walker el (CDI_DOMINATORS, inserted_exprs, hoisted_exprs);
   el.avail.reserve (num_ssa_names);
 
   el.walk (cfun->cfg->x_entry_block_ptr);
diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
index 9356520cbe5..5f9bab7b83f 100644
--- a/gcc/tree-ssa-sccvn.h
+++ b/gcc/tree-ssa-sccvn.h
@@ -214,7 +214,7 @@ extern vn_ssa_aux_t VN_INFO (tree);
 extern vn_ssa_aux_t VN_INFO_GET (tree);
 tree vn_get_expr_for (tree);
 void run_scc_vn (vn_lookup_kind);
-unsigned int vn_eliminate (bitmap);
+unsigned int vn_eliminate (bitmap, bitmap hoisted_exprs = NULL);
 void free_scc_vn (void);
 void scc_vn_restore_ssa_info (void);
 tree vn_nary_op_lookup (tree, vn_nary_op_t *);
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 11b1938216e..d10a276daf2 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dce.h"
 #include "tree-cfgcleanup.h"
 #include "alias.h"
+#include "graph.h"
 
 /* Even though this file is called tree-ssa-pre.c, we actually
    implement a bit more than just PRE here.  All of them piggy-back
@@ -500,7 +501,6 @@ typedef struct bb_bitmap_sets
 #define BB_MAY_NOTRETURN(BB) ((bb_value_sets_t) ((BB)->aux))->contains_may_not_return_call
 #define BB_LIVE_VOP_ON_EXIT(BB) ((bb_value_sets_t) ((BB)->aux))->vop_on_exit
 
-
 /* This structure is used to keep track of statistics on what
    optimization PRE was able to perform.  */
 static struct
@@ -2434,6 +2434,25 @@ compute_antic (void)
    that didn't turn out to be necessary.   */
 static bitmap inserted_exprs;
 
+struct pre_expr_block_d : nofree_ptr_hash <pre_expr_block_d>
+{
+  pre_expr expr;
+  bitmap blocks;
+
+  /* hash_table support.  */
+  static inline hashval_t hash(const pre_expr_block_d *x)
+  {
+    return pre_expr_d::hash (x->expr);
+  }
+
+  static inline int equal(const pre_expr_block_d *x1, const pre_expr_block_d *x2)
+  {
+    return pre_expr_d::equal (x1->expr, x2->expr);
+  }
+};
+
+static hash_table<pre_expr_block_d> * expr_to_blocks;
+
 /* The actual worker for create_component_ref_by_pieces.  */
 
 static tree
@@ -3549,6 +3568,15 @@ do_hoist_insertion (basic_block block)
 	  continue;
 	}
 
+      pre_expr_block_d x;
+      x.expr = expr;
+      pre_expr_block_d **slot = expr_to_blocks->find_slot (&x, NO_INSERT);
+      if (slot)
+	{
+	  gcc_assert (!bitmap_empty_p ((*slot)->blocks));
+	  continue;
+	}
+
       /* OK, we should hoist this value.  Perform the transformation.  */
       pre_stats.hoist_insert++;
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3678,6 +3706,40 @@ insert (void)
   statistics_histogram_event (cfun, "insert iterations", num_iterations);
 }
 
+static void
+add_expr_to_blocks (pre_expr expr, basic_block block)
+{
+  pre_expr_block_d x;
+  x.expr = expr;
+
+  edge e;
+  edge_iterator ei;
+  bool loop_exit = false;
+
+  if (single_succ_p (block))
+    return;
+
+  FOR_EACH_EDGE (e, ei, block->succs)
+    if (block->loop_father != e->dest->loop_father)
+      {
+	loop_exit = true;
+	break;
+      }
+
+  if (!loop_exit)
+    return;
+
+  pre_expr_block_d **slot = expr_to_blocks->find_slot (&x, INSERT);
+  if (!*slot)
+    {
+      *slot = XNEW (struct pre_expr_block_d);
+      (*slot)->expr = expr;
+      (*slot)->blocks = BITMAP_ALLOC (NULL);
+    }
+
+   
+  bitmap_set_bit ((*slot)->blocks, block->index);
+}
 
 /* Compute the AVAIL set for all basic blocks.
 
@@ -3711,6 +3773,7 @@ compute_avail (void)
 
       e = get_or_alloc_expr_for_name (name);
       add_to_value (get_expr_value_id (e), e);
+      add_expr_to_blocks (e, ENTRY_BLOCK_PTR_FOR_FN (cfun));
       bitmap_insert_into_set (TMP_GEN (ENTRY_BLOCK_PTR_FOR_FN (cfun)), e);
       bitmap_value_insert_into_set (AVAIL_OUT (ENTRY_BLOCK_PTR_FOR_FN (cfun)),
 				    e);
@@ -3771,6 +3834,7 @@ compute_avail (void)
 
 	  pre_expr e = get_or_alloc_expr_for_name (result);
 	  add_to_value (get_expr_value_id (e), e);
+	  add_expr_to_blocks (e, block);
 	  bitmap_value_insert_into_set (AVAIL_OUT (block), e);
 	  bitmap_insert_into_set (PHI_GEN (block), e);
 	}
@@ -3808,6 +3872,7 @@ compute_avail (void)
 	      pre_expr e = get_or_alloc_expr_for_name (op);
 
 	      add_to_value (get_expr_value_id (e), e);
+	      add_expr_to_blocks (e, block);
 	      bitmap_insert_into_set (TMP_GEN (block), e);
 	      bitmap_value_insert_into_set (AVAIL_OUT (block), e);
 	    }
@@ -3862,6 +3927,7 @@ compute_avail (void)
 		    PRE_EXPR_REFERENCE (result) = ref;
 
 		    get_or_alloc_expression_id (result);
+		    add_expr_to_blocks (result, block);
 		    add_to_value (get_expr_value_id (result), result);
 		    bitmap_value_insert_into_set (EXP_GEN (block), result);
 		  }
@@ -4019,6 +4085,7 @@ compute_avail (void)
 
 		get_or_alloc_expression_id (result);
 		add_to_value (get_expr_value_id (result), result);
+		add_expr_to_blocks (result, block);
 		bitmap_value_insert_into_set (EXP_GEN (block), result);
 		continue;
 	      }
@@ -4077,6 +4144,7 @@ init_pre (void)
   bitmap_obstack_initialize (&grand_bitmap_obstack);
   phi_translate_table = new hash_table<expr_pred_trans_d> (5110);
   expression_to_id = new hash_table<pre_expr_d> (num_ssa_names * 3);
+  expr_to_blocks = new hash_table<pre_expr_block_d> (num_ssa_names * 3);
   FOR_ALL_BB_FN (bb, cfun)
     {
       EXP_GEN (bb) = bitmap_set_new ();

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]