[patch] PR middle-end/39326 - limit LIM

Richard Biener richard.guenther@gmail.com
Mon Mar 11 09:52:00 GMT 2013

On Sun, Mar 10, 2013 at 2:08 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Hello,
> The attached patch fixes another one of the scalability problems
> reported in PR middle-end/39326.
> This problem is that tree loop-invariant code motion explodes on basic
> blocks with many memory references. Compile time is quadratic in the
> number of memory references in the basic block, and so are the memory
> requirements when the dependences or independences are propagated
> bottom-up through the loop tree.
> The fix is to give up on loops with too many memory references to
> handle. There is already a param for that for loop dependence
> analysis, and this patch makes LIM use the same param.
> Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu.
> OK for trunk?


+   well.  Return true if all is well, false if something happened
+   that is fatal to the rest of the LIM pass.  */

-static void
+static bool
 gather_mem_refs_stmt (struct loop *loop, gimple stmt)


   FOR_EACH_BB (bb)
+      for (bsi = gsi_start_bb (bb);
+          !gsi_end_p (bsi) && all_ok;
+          gsi_next (&bsi))
+       all_ok = gather_mem_refs_stmt (loop, gsi_stmt (bsi));
+      if (! all_ok)
+        bitmap_set_bit (loops_with_too_many_memrefs, loop->num);
+    }
+  /* Propagate the information about loops with too many memory
+     references up the loop hierarchy.  */
+    {
+      struct loop *outer = loop_outer (loop);
+      if (outer == current_loops->tree_root
+         || ! bitmap_bit_p (loops_with_too_many_memrefs, loop->num))
+       continue;
+      bitmap_set_bit (loops_with_too_many_memrefs, outer->num);

I don't see how this propagation works correctly as you start to mark
BBs as not-ok starting from a "random" basic-block in the loop tree.
You of course also end up disqualifying very small loops completely
if they happen to be analyzed after a very big one you disqualify.
Of course that's partly because memory_accesses contains all
memory accesses in the function - so I think rather than limiting
on length of memory_accesses you want to limit on the length of
memory_accesses.refs_in_loop (well, on memory_accesses.all_refs_in_loop).
And you want the initial walk over all BBs to instead walk on BBs
FOR_EACH_LOOP and LI_FROM_INNERMOST (you can then do the
propagation to fill all_refs_in_loop there, too).  I realize there isn't a good
existing BB walker for this (with a suitable place to re-set all-ok) - a simple
walk over get_loop_body via LI_FROM_INNERMOST would get you
visit sub-loop bodies multiple times (easily skipped by a bb->loop_father
test, of course, but still ...)

That is, sth like the following preparatory patch to change the iteration:

Index: gcc/tree-ssa-loop-im.c
--- gcc/tree-ssa-loop-im.c      (revision 196547)
+++ gcc/tree-ssa-loop-im.c      (working copy)
@@ -1629,29 +1629,30 @@ gather_mem_refs_in_loops (void)
   loop_iterator li;
   bitmap lrefs, alrefs, alrefso;

-  FOR_EACH_BB (bb)
-    {
-      loop = bb->loop_father;
-      if (loop == current_loops->tree_root)
-       continue;
-      for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
-       gather_mem_refs_stmt (loop, gsi_stmt (bsi));
-    }
-  /* Propagate the information about accessed memory references up
-     the loop hierarchy.  */
+      basic_block bbs = get_loop_body (loop);
+      for (i = 0; i < loop->num_nodes; ++i)
+       {
+         bb = bbs[i];
+         if (bb->loop_father != loop)
+           continue;
+         for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
+           gather_mem_refs_stmt (loop, gsi_stmt (bsi));
+       }
+      free (bbs);
+      /* Propagate the information about accessed memory references up
+        the loop hierarchy.  */
       lrefs = memory_accesses.refs_in_loop[loop->num];
       alrefs = memory_accesses.all_refs_in_loop[loop->num];
       bitmap_ior_into (alrefs, lrefs);

-      if (loop_outer (loop) == current_loops->tree_root)
-       continue;
-      alrefso = memory_accesses.all_refs_in_loop[loop_outer (loop)->num];
-      bitmap_ior_into (alrefso, alrefs);
+      if (loop_outer (loop) != current_loops->tree_root)
+       {
+         alrefso = memory_accesses.all_refs_in_loop[loop_outer (loop)->num];
+         bitmap_ior_into (alrefso, alrefs);
+       }

At this point this should be stage1 material, eventually backported for 4.8.1.

And yes, aside from the above the rest of the patch looks good to me
(move loops_with_too_many_memrefs into the memory_accesses struct?)


> Ciao!
> Steven
> (The ChangeLog is a bit long but the patch is relatively straight-forward.)
>         * tree-flow.h (enum move_pos): Moved to tree-ssa-loop-im.c.
>         * tree-ssa-loop-im.c: Include diagnostic-core.h for warning_at()
>         (enum move_pos): Moved here.
>         (movement_possibility): Made static.  Reject memory operations in
>         loops with too many memory references to handle.
>         (determine_max_movement): Take loops_with_too_many_memrefs argument.
>         For statements referencing memory, find the outermost superloop
>         that is not in the loops_with_too_many_memrefs set.
>         (determine_invariantness_stmt): Take loops_with_too_many_memrefs
>         via dom_walk_data.global_data, and pass it along where necessary.
>         Hoist "pos == MOVE_POSSIBLE" test.
>         (determine_invariantness): Take loops_with_too_many_memrefs argument.
>         (move_computations): Likewise, but unused for now.
>         (gather_mem_refs_stmt): Fail if there are too many memory references.
>         Use PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS as threshold.  Add disabled
>         optimization warning.
>         (gather_mem_refs_in_loops): Take loops_with_too_many_memrefs argument.
>         Propagate it from inner loops to outer loops.  Do not propagate
>         recorded memory references for loops on which memory optimizations
>         are disabled.
>         (create_vop_ref_mapping): Take loops_with_too_many_memrefs argument.
>         Don't create a mapping on loops that are in this set.
>         (analyze_memory_references): Take loops_with_too_many_memrefs argument
>         and let subroutines fill it.
>         (store_motion): Take loops_with_too_many_memrefs argument.
>         Skip loops that are in this set.
>         (tree_ssa_lim): Allocate, pass, and free loops_with_too_many_memrefs.

More information about the Gcc-patches mailing list