This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Fix PR83323 (crafty miscompile by unroll-and-jam)


On December 8, 2017 5:44:31 PM GMT+01:00, Michael Matz <matz@suse.de> wrote:
>Hi,
>
>the predicate for feasiblity of unroll-and-jam was a bit nonsensical.  
>Properly check for non-head-controlled loops, and also check all BBs of
>
>the outer loop for any instructions that would inhibit fusion (before
>we 
>never checked header or latch).
>
>Richi also asked me to reuse the -floop-unrool-and-jam option
>(currently 
>aliased to graphites loop-nest-optimize, like a couple other old
>options), 
>instead of adding a new one with slightly different spelling.  This
>patch 
>does that in addition.
>
>Fixes the added testcase and crafty from cpu2000.  Regstrapping on 
>x86_64-linux in progress.  Okay if that passes?

OK. 

Don't you need to adjust existing unroll and jam testcases? 

Thanks, 
Richard. 

>
>Ciao,
>Michael.
>---------------------------
>    Fix PR83323
>    
>    	* gimple-loop-jam (unroll_jam_possible_p): Correct test for
>    	head-controlled loops and loop BBs.
>    	* common.opt (funroll-and-jam): Remove, instead ...
>    	(floop-unroll-and-jam): ... reuse this option.
>    	* opts.c (default_options_table): Use OPT_floop_unroll_and_jam.
>    	* doc/invoke.texi (-funroll-and-jam): Move docu to ...
>    	(-floop-unroll-and-jam): ... this option.
>    
>    testsuite/
>    	* gcc.dg/pr83323.c: New test.
>
>diff --git a/gcc/common.opt b/gcc/common.opt
>index 6fab2ab..57b3cd7 100644
>--- a/gcc/common.opt
>+++ b/gcc/common.opt
>@@ -1512,8 +1512,8 @@ Common Alias(floop-nest-optimize)
> Enable loop nest transforms.  Same as -floop-nest-optimize.
> 
> floop-unroll-and-jam
>-Common Alias(floop-nest-optimize)
>-Enable loop nest transforms.  Same as -floop-nest-optimize.
>+Common Report Var(flag_unroll_jam) Optimization
>+Perform unroll-and-jam on loops.
> 
> fgnu-tm
> Common Report Var(flag_tm)
>@@ -2695,10 +2695,6 @@ fsplit-loops
> Common Report Var(flag_split_loops) Optimization
> Perform loop splitting.
> 
>-funroll-and-jam
>-Common Report Var(flag_unroll_jam) Optimization
>-Perform unroll-and-jam on loops.
>-
> funwind-tables
> Common Report Var(flag_unwind_tables) Optimization
> Just generate unwind tables for exception handling.
>diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>index 50740c5..1413095 100644
>--- a/gcc/doc/invoke.texi
>+++ b/gcc/doc/invoke.texi
>@@ -437,7 +437,7 @@ Objective-C and Objective-C++ Dialects}.
> -ftree-reassoc  -ftree-sink  -ftree-slsr  -ftree-sra @gol
> -ftree-switch-conversion  -ftree-tail-merge @gol
> -ftree-ter  -ftree-vectorize  -ftree-vrp  -funconstrained-commons @gol
>--funit-at-a-time  -funroll-all-loops  -funroll-loops -funroll-and-jam
>@gol
>+-funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> -funsafe-math-optimizations  -funswitch-loops @gol
>-fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt
>@gol
> -fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
>@@ -8511,11 +8511,9 @@ at @option{-O} and higher.
> @item -ftree-loop-linear
> @itemx -floop-strip-mine
> @itemx -floop-block
>-@itemx -floop-unroll-and-jam
> @opindex ftree-loop-linear
> @opindex floop-strip-mine
> @opindex floop-block
>-@opindex floop-unroll-and-jam
> Perform loop nest optimizations.  Same as
>@option{-floop-nest-optimize}.  To use this code transformation, GCC
>has
> to be configured with @option{--with-isl} to enable the Graphite loop
>@@ -9789,8 +9787,8 @@ for one side of the iteration space and false for
>the other.
>Move branches with loop invariant conditions out of the loop, with
>duplicates
>of the loop on both branches (modified according to result of the
>condition).
> 
>-@item -funroll-and-jam
>-@opindex funroll-and-jam
>+@item -floop-unroll-and-jam
>+@opindex floop-unroll-and-jam
> Apply unroll and jam transoformations on feasible loops.  In a loop
>nest this unrolls the outer loop by some factor and fuses the resulting
> multiple inner loops.
>diff --git a/gcc/gimple-loop-jam.c b/gcc/gimple-loop-jam.c
>index 32f813b..8ed1bef 100644
>--- a/gcc/gimple-loop-jam.c
>+++ b/gcc/gimple-loop-jam.c
>@@ -152,7 +152,7 @@ merge_loop_tree (struct loop *loop, struct loop
>*old)
>   free (bbs);
> }
> 
>-/* BB exits the outer loop of an unroll-and-jam situation.
>+/* BB is part of the outer loop of an unroll-and-jam situation.
>  Check if any statements therein would prevent the transformation.  */
> 
> static bool
>@@ -160,9 +160,10 @@ bb_prevents_fusion_p (basic_block bb)
> {
>   gimple_stmt_iterator gsi;
>   /* BB is duplicated by outer unrolling and then all N-1 first copies
>-     move into the body of the fused inner loop.  The last copy
>remains
>-     the exit block of the outer loop and is still outside the inner
>loop
>-     also after fusion.  We can't allow this for some effects of BB:
>+     move into the body of the fused inner loop.  If BB exits the
>outer loop
>+     the last copy still doess so, and the first N-1 copies are
>cancelled
>+     by loop unrolling, so also after fusion it's the exit block.
>+     But there might be other reasons that prevent fusion:
>        * stores or unknown side-effects prevent fusion
>        * loads don't
>        * computations into SSA names: these aren't problematic.  Their
>@@ -204,6 +205,19 @@ unroll_jam_possible_p (struct loop *outer, struct
>loop *loop)
>   if (outer->inner != loop || loop->next)
>     return false;
> 
>+  /* Prevent head-controlled inner loops, that we usually have.
>+     The guard block would need to be accepted
>+     (invariant condition either entering or skipping the loop),
>+     without also accepting arbitrary control flow.  When unswitching
>+     ran before us (as with -O3) this won't be a problem because its
>+     outer loop unswitching will have moved out the invariant
>condition.
>+
>+     If we do that we need to extend fuse_loops() to cope with this
>+     by threading through the (still invariant) copied condition
>+     between the two loop copies.  */
>+  if (!dominated_by_p (CDI_DOMINATORS, outer->latch, loop->header))
>+    return false;
>+
>   /* The number of iterations of the inner loop must be loop invariant
>      with respect to the outer loop.  */
>   if (!number_of_iterations_exit (loop, single_exit (loop), &niter,
>@@ -218,23 +232,8 @@ unroll_jam_possible_p (struct loop *outer, struct
>loop *loop)
>n = get_loop_body_with_size (outer, bbs, n_basic_blocks_for_fn (cfun));
> 
>   for (i = 0; i < n; i++)
>-    {
>-      if (bbs[i]->loop_father == outer
>-	  && bbs[i] != outer->latch && bbs[i] != outer->header
>-	  && (!loop_exits_from_bb_p (outer, bbs[i])
>-	      || bb_prevents_fusion_p (bbs[i])))
>-	break;
>-      /* XXX Note that the above disallows head-controlled inner
>loops,
>-         that we usually have.  The guard block would need to be
>accepted
>-	 (invariant condition either entering or skipping the loop),
>-	 without also accepting arbitrary control flow.  When unswitching
>-	 ran before us (as with -O3) this won't be a problem because its
>-	 outer loop unswitching will have moved out the invariant condition.
>-	 
>-	 If we do that we need to extend fuse_loops() to cope with this
>-	 by threading through the (still invariant) copied condition
>-	 between the two loop copies.  */
>-    }
>+    if (bbs[i]->loop_father == outer && bb_prevents_fusion_p (bbs[i]))
>+      break;
>   free (bbs);
>   if (i != n)
>     return false;
>diff --git a/gcc/opts.c b/gcc/opts.c
>index 98fbf53..a157b5f 100644
>--- a/gcc/opts.c
>+++ b/gcc/opts.c
>@@ -536,7 +536,7 @@ static const struct default_options
>default_options_table[] =
>{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL,
>1 },
>     { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
>     { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>-    { OPT_LEVELS_3_PLUS, OPT_funroll_and_jam, NULL, 1 },
>+    { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 },
>     { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
>     { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
>diff --git a/gcc/testsuite/gcc.dg/pr83323.c
>b/gcc/testsuite/gcc.dg/pr83323.c
>new file mode 100644
>index 0000000..6111745
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/pr83323.c
>@@ -0,0 +1,23 @@
>+/* { dg-do run } */
>+/* { dg-options "-O3 -floop-unroll-and-jam --param
>unroll-jam-min-percent=0" } */
>+int x[1024], y[1024];
>+
>+void __attribute__((noipa)) foo ()
>+{
>+  for (int i = 0; i < 1024; ++i)
>+    {
>+      x[i] = 0;
>+      for (int j = 0; j < 1024; ++j)
>+        if (!y[j])
>+          x[i] = 1;
>+    }
>+}
>+
>+int main()
>+{
>+  y[1023] = 1;
>+  foo ();
>+  if (x[1] != 1)
>+    __builtin_abort ();
>+  return 0;
>+}


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]