[PATCH 2/2] loops: Invoke lim after successful loop interchange

Martin Jambor mjambor@suse.cz
Mon Nov 9 19:58:02 GMT 2020


Hi,

this patch modifies the loop invariant pass so that is can operate
only on a single requested loop and its sub-loops and ignore the rest
of the function, much like it currently ignores basic blocks that are
not in any real loop.  It then invokes it from within the loop
interchange pass when it successfully swaps two loops.  This avoids
the non-LTO -Ofast run-time regressions of 410.bwaves and 503.bwaves_r
(which are 19% and 15% faster than current master on an AMD zen2
machine) while not introducing a full LIM pass into the pass pipeline.

I have not modified the LIM data structures, this means that it still
contains vectors indexed by loop->num even though only a single loop
nest is actually processed.  I also did not replace the uses of
pre_and_rev_post_order_compute_fn with a function that would count a
postorder only for a given loop.  I can of course do so if the
approach is otherwise deemed viable.

The patch adds one additional global variable requested_loop to the
pass and then at various places behaves differently when it is set.  I
was considering storing the fake root loop into it for normal
operation, but since this loop often requires special handling anyway,
I came to the conclusion that the code would actually end up less
straightforward.

I have bootstrapped and tested the patch on x86_64-linux and a very
similar one on aarch64-linux.  I have also tested it by modifying the
tree_ssa_lim function to run loop_invariant_motion_from_loop on each
real outermost loop in a function and this variant also passed
bootstrap and all tests, including dump scans, of all languages.

I have built the entire SPEC 2006 FPrate monitoring the activity of
the LIM pass without and with the patch (on top of commit b642fca1c31
with which 526.blender_r and 538.imagick_r seemed to be failing) and
it only examined 0.2% more loops, 0.02% more BBs and even fewer
percent of statements because it is invoked only in a rather special
circumstance.  But the patch allows for more such need-based uses at
hopefully reasonable cost.

Since I do not have much experience with loop optimizers, I expect
that there will be requests to adjust the patch during the review.
Still, it fixes a performance regression against GCC 9 and so I hope
to address the concerns in time to get it into GCC 11.

Thanks,

Martin


gcc/ChangeLog:

2020-11-08  Martin Jambor  <mjambor@suse.cz>

	* gimple-loop-interchange.cc (pass_linterchange::execute): Call
	loop_invariant_motion_from_loop on affected loop nests.
	* tree-ssa-loop-im.c (requested_loop): New variable.
	(get_topmost_lim_loop): New function.
	(outermost_invariant_loop): Use it, cap discovered topmost loop at
	requested_loop.
	(determine_max_movement): Use get_topmost_lim_loop.
	(set_level): Assert that the selected loop is not outside of
	requested_loop.
	(compute_invariantness): Do not process loops outside of
	requested_loop, if non-NULL.
	(move_computations_worker): Likewise.
	(mark_ref_stored): Stop iteration at requested_loop, if non-NULL.
	(mark_ref_loaded): Likewise.
	(analyze_memory_references): If non-NULL, only process basic
	blocks and loops in requested_loop.  Compute contains_call bitmap.
	(do_store_motion): Only process requested_loop if non-NULL.
	(fill_always_executed_in): Likewise.  Also accept contains_call as
	a parameter rather than computing it.
	(tree_ssa_lim_initialize): New parameter which is stored into
	requested_loop.  Additonal dumping. Only initialize
	bb_loop_postorder for loops within requested_loop, if non-NULL.
	(tree_ssa_lim_finalize): Clear requested_loop, additional dumping.
	(loop_invariant_motion_from_loop): New function.
	(tree_ssa_lim): Move all functionality to
	loop_invariant_motion_from_loop, call it.
	* tree-ssa-loop-manip.h (loop_invariant_motion_from_loop): Declare.

---
 gcc/gimple-loop-interchange.cc |  30 +++++-
 gcc/tree-ssa-loop-im.c         | 176 ++++++++++++++++++++++++---------
 gcc/tree-ssa-loop-manip.h      |   2 +
 3 files changed, 156 insertions(+), 52 deletions(-)

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index 1656004ecf0..8c376228779 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -2068,6 +2068,7 @@ pass_linterchange::execute (function *fun)
     return 0;
 
   bool changed_p = false;
+  auto_vec<class loop *, 4> loops_to_lim;
   class loop *loop;
   FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
     {
@@ -2077,7 +2078,11 @@ pass_linterchange::execute (function *fun)
       if (prepare_perfect_loop_nest (loop, &loop_nest, &datarefs, &ddrs))
 	{
 	  tree_loop_interchange loop_interchange (loop_nest);
-	  changed_p |= loop_interchange.interchange (datarefs, ddrs);
+	  if (loop_interchange.interchange (datarefs, ddrs))
+	    {
+	      changed_p = true;
+	      loops_to_lim.safe_push (loop_nest[0]);
+	    }
 	}
       free_dependence_relations (ddrs);
       free_data_refs_with_aux (datarefs);
@@ -2085,8 +2090,27 @@ pass_linterchange::execute (function *fun)
     }
 
   if (changed_p)
-    scev_reset ();
-  return changed_p ? (TODO_update_ssa_only_virtuals) : 0;
+    {
+      unsigned todo = TODO_update_ssa_only_virtuals;
+      unsigned len  = loops_to_lim.length ();
+      for (unsigned i = len - 1; i > 0; i--)
+	for (int j = i - 1; j >= 0; j--)
+	  if (loops_to_lim[j] == loops_to_lim[i]
+	      || flow_loop_nested_p (loops_to_lim[j], loops_to_lim[i]))
+	    {
+	      loops_to_lim.pop ();
+	      break;
+	    }
+
+      len  = loops_to_lim.length ();
+      for (unsigned i = 0; i < len; i++)
+	todo |= loop_invariant_motion_from_loop (cfun, loops_to_lim[i]);
+
+      scev_reset ();
+      return todo;
+    }
+  else
+    return 0;
 }
 
 } // anon namespace
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 6bb07e133cd..24b541dfb17 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -244,6 +244,11 @@ static struct
 static bitmap_obstack lim_bitmap_obstack;
 static obstack mem_ref_obstack;
 
+/* If LIM has been requested only for a particular loop (and all of the
+   enclosed loops this variable points to it, otherwise it points to the dummy
+   loop spanning whole function.  */
+static class loop *requested_loop;
+
 static bool ref_indep_loop_p (class loop *, im_mem_ref *, dep_kind);
 static bool ref_always_accessed_p (class loop *, im_mem_ref *, bool);
 static bool refs_independent_p (im_mem_ref *, im_mem_ref *, bool = true);
@@ -410,6 +415,21 @@ movement_possibility (gimple *stmt)
   return ret;
 }
 
+/* Return either the topmost real loop in which LOOP is nested if processing
+   the whole function, or requested_loop if not.  */
+
+static class loop *
+get_topmost_lim_loop (class loop *loop)
+{
+  if (requested_loop)
+    {
+      gcc_assert (loop == requested_loop
+		  || flow_loop_nested_p (requested_loop, loop));
+      return requested_loop;
+    }
+  return superloop_at_depth (loop, 1);
+}
+
 /* Suppose that operand DEF is used inside the LOOP.  Returns the outermost
    loop to that we could move the expression using DEF if it did not have
    other operands, i.e. the outermost loop enclosing LOOP in that the value
@@ -424,18 +444,18 @@ outermost_invariant_loop (tree def, class loop *loop)
   struct lim_aux_data *lim_data;
 
   if (!def)
-    return superloop_at_depth (loop, 1);
+    return get_topmost_lim_loop (loop);
 
   if (TREE_CODE (def) != SSA_NAME)
     {
       gcc_assert (is_gimple_min_invariant (def));
-      return superloop_at_depth (loop, 1);
+      return get_topmost_lim_loop (loop);
     }
 
   def_stmt = SSA_NAME_DEF_STMT (def);
   def_bb = gimple_bb (def_stmt);
   if (!def_bb)
-    return superloop_at_depth (loop, 1);
+    return get_topmost_lim_loop (loop);
 
   max_loop = find_common_loop (loop, def_bb->loop_father);
 
@@ -443,6 +463,16 @@ outermost_invariant_loop (tree def, class loop *loop)
   if (lim_data != NULL && lim_data->max_loop != NULL)
     max_loop = find_common_loop (max_loop,
 				 loop_outer (lim_data->max_loop));
+  if (requested_loop)
+    {
+      class loop *req_outer = loop_outer (requested_loop);
+      if (flow_loop_nested_p (max_loop, req_outer))
+	max_loop = req_outer;
+      else
+	gcc_assert (max_loop == req_outer
+		    || flow_loop_nested_p (req_outer, max_loop));
+    }
+
   if (max_loop == loop)
     return NULL;
   max_loop = superloop_at_depth (loop, loop_depth (max_loop) + 1);
@@ -677,7 +707,7 @@ determine_max_movement (gimple *stmt, bool must_preserve_exec)
   if (must_preserve_exec)
     level = ALWAYS_EXECUTED_IN (bb);
   else
-    level = superloop_at_depth (loop, 1);
+    level = get_topmost_lim_loop (loop);
   lim_data->max_loop = level;
 
   if (gphi *phi = dyn_cast <gphi *> (stmt))
@@ -813,6 +843,9 @@ set_level (gimple *stmt, class loop *orig_loop, class loop *level)
 
   gcc_assert (level == lim_data->max_loop
 	      || flow_loop_nested_p (lim_data->max_loop, level));
+  gcc_assert (!requested_loop
+	      || requested_loop == lim_data->max_loop
+	      || flow_loop_nested_p (requested_loop, level));
 
   lim_data->tgt_loop = level;
   FOR_EACH_VEC_ELT (lim_data->depends, i, dep_stmt)
@@ -983,7 +1016,12 @@ compute_invariantness (basic_block bb)
   class loop *outermost = ALWAYS_EXECUTED_IN (bb);
   struct lim_aux_data *lim_data;
 
-  if (!loop_outer (bb->loop_father))
+  if (requested_loop)
+    {
+      if (!flow_bb_inside_loop_p (requested_loop, bb))
+	return;
+    }
+  else if (!loop_outer (bb->loop_father))
     return;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -1122,7 +1160,12 @@ move_computations_worker (basic_block bb)
   struct lim_aux_data *lim_data;
   unsigned int todo = 0;
 
-  if (!loop_outer (bb->loop_father))
+  if (requested_loop)
+    {
+      if (!flow_bb_inside_loop_p (requested_loop, bb))
+	return todo;
+    }
+  else if (!loop_outer (bb->loop_father))
     return todo;
 
   for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi); )
@@ -1414,7 +1457,9 @@ set_ref_stored_in_loop (im_mem_ref *ref, class loop *loop)
 static void
 mark_ref_stored (im_mem_ref *ref, class loop *loop)
 {
-  while (loop != current_loops->tree_root
+  class loop *stop_loop = requested_loop
+    ? loop_outer (requested_loop) : current_loops->tree_root;
+  while (loop != stop_loop
 	 && set_ref_stored_in_loop (ref, loop))
     loop = loop_outer (loop);
 }
@@ -1435,7 +1480,9 @@ set_ref_loaded_in_loop (im_mem_ref *ref, class loop *loop)
 static void
 mark_ref_loaded (im_mem_ref *ref, class loop *loop)
 {
-  while (loop != current_loops->tree_root
+  class loop *stop_loop = requested_loop
+    ? loop_outer (requested_loop) : current_loops->tree_root;
+  while (loop != stop_loop
 	 && set_ref_loaded_in_loop (ref, loop))
     loop = loop_outer (loop);
 }
@@ -1619,24 +1666,35 @@ sort_locs_in_loop_postorder_cmp (const void *loc1_, const void *loc2_,
   return bb_loop_postorder[loop1->num] < bb_loop_postorder[loop2->num] ? -1 : 1;
 }
 
-/* Gathers memory references in loops.  */
+/* Gathers memory references in loops.  Set a bit in CONTAINS_CALL
+   corresponding to index of each basic block which contains a non-pure call
+   statement.  */
 
 static void
-analyze_memory_references (void)
+analyze_memory_references (sbitmap contains_call)
 {
   gimple_stmt_iterator bsi;
   basic_block bb, *bbs;
   class loop *loop, *outer;
   unsigned i, n;
 
-  /* Collect all basic-blocks in loops and sort them after their
-     loops postorder.  */
+  /* Collect all basic-blocks in loops (either in the whole function or just in
+     the requested loop) and sort them after their loops postorder.  */
   i = 0;
-  bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
-  FOR_EACH_BB_FN (bb, cfun)
-    if (bb->loop_father != current_loops->tree_root)
-      bbs[i++] = bb;
-  n = i;
+  if (requested_loop)
+    {
+      bbs = get_loop_body (requested_loop);
+      n = requested_loop->num_nodes;
+    }
+  else
+    {
+      bbs = XNEWVEC (basic_block,
+		     n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
+      FOR_EACH_BB_FN (bb, cfun)
+	if (bb->loop_father != current_loops->tree_root)
+	  bbs[i++] = bb;
+      n = i;
+    }
   gcc_sort_r (bbs, n, sizeof (basic_block), sort_bbs_in_loop_postorder_cmp,
 	      bb_loop_postorder);
 
@@ -1648,7 +1706,11 @@ analyze_memory_references (void)
     {
       basic_block bb = bbs[i];
       for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
-        gather_mem_refs_stmt (bb->loop_father, gsi_stmt (bsi));
+	{
+	  gather_mem_refs_stmt (bb->loop_father, gsi_stmt (bsi));
+	  if (nonpure_call_p (gsi_stmt (bsi)))
+	    bitmap_set_bit (contains_call, bb->index);
+	}
     }
 
   /* Verify the list of gathered memory references is sorted after their
@@ -1667,7 +1729,9 @@ analyze_memory_references (void)
 
   /* Propagate the information about accessed memory references up
      the loop hierarchy.  */
-  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+  class loop *stop_loop = requested_loop
+    ? loop_outer (requested_loop) : current_loops->tree_root;
+  FOR_EACH_ENCLOSED_LOOP (requested_loop, loop, LI_FROM_INNERMOST)
     {
       /* Finalize the overall touched references (including subloops).  */
       bitmap_ior_into (&memory_accesses.all_refs_stored_in_loop[loop->num],
@@ -1676,7 +1740,7 @@ analyze_memory_references (void)
       /* Propagate the information about accessed memory references up
 	 the loop hierarchy.  */
       outer = loop_outer (loop);
-      if (outer == current_loops->tree_root)
+      if (outer == stop_loop)
 	continue;
 
       bitmap_ior_into (&memory_accesses.all_refs_stored_in_loop[outer->num],
@@ -2895,8 +2959,11 @@ do_store_motion (void)
   class loop *loop;
   bitmap sm_executed = BITMAP_ALLOC (&lim_bitmap_obstack);
 
-  for (loop = current_loops->tree_root->inner; loop != NULL; loop = loop->next)
-    store_motion_loop (loop, sm_executed);
+  if (requested_loop)
+    store_motion_loop (requested_loop, sm_executed);
+  else
+    for (loop = current_loops->tree_root->inner; loop != NULL; loop = loop->next)
+      store_motion_loop (loop, sm_executed);
 
   BITMAP_FREE (sm_executed);
 }
@@ -2982,27 +3049,13 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
    of its header implies execution of bb.  */
 
 static void
-fill_always_executed_in (void)
+fill_always_executed_in (sbitmap contains_call)
 {
-  basic_block bb;
   class loop *loop;
 
-  auto_sbitmap contains_call (last_basic_block_for_fn (cfun));
-  bitmap_clear (contains_call);
-  FOR_EACH_BB_FN (bb, cfun)
-    {
-      gimple_stmt_iterator gsi;
-      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	{
-	  if (nonpure_call_p (gsi_stmt (gsi)))
-	    break;
-	}
-
-      if (!gsi_end_p (gsi))
-	bitmap_set_bit (contains_call, bb->index);
-    }
-
-  for (loop = current_loops->tree_root->inner; loop; loop = loop->next)
+  for (loop = requested_loop ? requested_loop : current_loops->tree_root->inner;
+       loop;
+       loop = loop->next)
     fill_always_executed_in_1 (loop, contains_call);
 }
 
@@ -3010,11 +3063,17 @@ fill_always_executed_in (void)
 /* Compute the global information needed by the loop invariant motion pass.  */
 
 static void
-tree_ssa_lim_initialize (void)
+tree_ssa_lim_initialize (class loop *req_loop)
 {
   class loop *loop;
   unsigned i;
 
+  gcc_assert (!requested_loop || requested_loop != current_loops->tree_root);
+  requested_loop = req_loop;
+
+  if (dump_file && (dump_flags & TDF_DETAILS) && requested_loop)
+    fprintf (dump_file, "Initializing LIM for loop %d.\n\n", req_loop->num);
+
   bitmap_obstack_initialize (&lim_bitmap_obstack);
   gcc_obstack_init (&mem_ref_obstack);
   lim_aux_data_map = new hash_map<gimple *, lim_aux_data *>;
@@ -3051,7 +3110,7 @@ tree_ssa_lim_initialize (void)
      its postorder index.  */
   i = 0;
   bb_loop_postorder = XNEWVEC (unsigned, number_of_loops (cfun));
-  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+  FOR_EACH_ENCLOSED_LOOP (req_loop, loop, LI_FROM_INNERMOST)
     bb_loop_postorder[loop->num] = i++;
 }
 
@@ -3086,23 +3145,32 @@ tree_ssa_lim_finalize (void)
     free_affine_expand_cache (&memory_accesses.ttae_cache);
 
   free (bb_loop_postorder);
+  requested_loop = NULL;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "LIM finished.\n\n");
 }
 
-/* Moves invariants from loops.  Only "expensive" invariants are moved out --
-   i.e. those that are likely to be win regardless of the register pressure.  */
+/* Moves invariants from LOOP in FUN.  If LOOP is NULL, perform the
+   tranformation on all loops within FUN.  Only "expensive" invariants are
+   moved out -- i.e. those that are likely to be win regardless of the register
+   pressure.  Return the pass TODO flags that need to be carried out after the
+   transformation.  */
 
-static unsigned int
-tree_ssa_lim (function *fun)
+unsigned int
+loop_invariant_motion_from_loop (function *fun, class loop *loop)
 {
   unsigned int todo = 0;
 
-  tree_ssa_lim_initialize ();
+  tree_ssa_lim_initialize (loop);
 
+  auto_sbitmap contains_call (last_basic_block_for_fn (cfun));
+  bitmap_clear (contains_call);
   /* Gathers information about memory accesses in the loops.  */
-  analyze_memory_references ();
+  analyze_memory_references (contains_call);
 
   /* Fills ALWAYS_EXECUTED_IN information for basic blocks.  */
-  fill_always_executed_in ();
+  fill_always_executed_in (contains_call);
 
   int *rpo = XNEWVEC (int, last_basic_block_for_fn (fun));
   int n = pre_and_rev_post_order_compute_fn (fun, NULL, rpo, false);
@@ -3135,6 +3203,16 @@ tree_ssa_lim (function *fun)
   return todo;
 }
 
+/* Moves invariants from all loops in FUN.  Only "expensive" invariants are
+   moved out -- i.e. those that are likely to be win regardless of the register
+   pressure.  */
+
+static unsigned int
+tree_ssa_lim (function *fun)
+{
+  return loop_invariant_motion_from_loop (fun, NULL);
+}
+
 /* Loop invariant motion pass.  */
 
 namespace {
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index e789e4fcb0b..c89a86aef4e 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -56,6 +56,8 @@ extern void tree_unroll_loop (class loop *, unsigned,
 			      edge, class tree_niter_desc *);
 extern tree canonicalize_loop_ivs (class loop *, tree *, bool);
 
+extern unsigned int loop_invariant_motion_from_loop (function *fun,
+						     class loop *loop);
 
 
 #endif /* GCC_TREE_SSA_LOOP_MANIP_H */
-- 
2.29.2



More information about the Gcc-patches mailing list