This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Fix PR83323 (crafty miscompile by unroll-and-jam)
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Michael Matz <matz at suse dot de>,gcc-patches at gcc dot gnu dot org
- Date: Fri, 08 Dec 2017 17:59:08 +0100
- Subject: Re: Fix PR83323 (crafty miscompile by unroll-and-jam)
- Authentication-results: sourceware.org; auth=none
- References: <alpine.LSU.2.21.1712081730040.25295@wotan.suse.de>
On December 8, 2017 5:44:31 PM GMT+01:00, Michael Matz <matz@suse.de> wrote:
>Hi,
>
>the predicate for feasiblity of unroll-and-jam was a bit nonsensical.
>Properly check for non-head-controlled loops, and also check all BBs of
>
>the outer loop for any instructions that would inhibit fusion (before
>we
>never checked header or latch).
>
>Richi also asked me to reuse the -floop-unrool-and-jam option
>(currently
>aliased to graphites loop-nest-optimize, like a couple other old
>options),
>instead of adding a new one with slightly different spelling. This
>patch
>does that in addition.
>
>Fixes the added testcase and crafty from cpu2000. Regstrapping on
>x86_64-linux in progress. Okay if that passes?
OK.
Don't you need to adjust existing unroll and jam testcases?
Thanks,
Richard.
>
>Ciao,
>Michael.
>---------------------------
> Fix PR83323
>
> * gimple-loop-jam (unroll_jam_possible_p): Correct test for
> head-controlled loops and loop BBs.
> * common.opt (funroll-and-jam): Remove, instead ...
> (floop-unroll-and-jam): ... reuse this option.
> * opts.c (default_options_table): Use OPT_floop_unroll_and_jam.
> * doc/invoke.texi (-funroll-and-jam): Move docu to ...
> (-floop-unroll-and-jam): ... this option.
>
> testsuite/
> * gcc.dg/pr83323.c: New test.
>
>diff --git a/gcc/common.opt b/gcc/common.opt
>index 6fab2ab..57b3cd7 100644
>--- a/gcc/common.opt
>+++ b/gcc/common.opt
>@@ -1512,8 +1512,8 @@ Common Alias(floop-nest-optimize)
> Enable loop nest transforms. Same as -floop-nest-optimize.
>
> floop-unroll-and-jam
>-Common Alias(floop-nest-optimize)
>-Enable loop nest transforms. Same as -floop-nest-optimize.
>+Common Report Var(flag_unroll_jam) Optimization
>+Perform unroll-and-jam on loops.
>
> fgnu-tm
> Common Report Var(flag_tm)
>@@ -2695,10 +2695,6 @@ fsplit-loops
> Common Report Var(flag_split_loops) Optimization
> Perform loop splitting.
>
>-funroll-and-jam
>-Common Report Var(flag_unroll_jam) Optimization
>-Perform unroll-and-jam on loops.
>-
> funwind-tables
> Common Report Var(flag_unwind_tables) Optimization
> Just generate unwind tables for exception handling.
>diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>index 50740c5..1413095 100644
>--- a/gcc/doc/invoke.texi
>+++ b/gcc/doc/invoke.texi
>@@ -437,7 +437,7 @@ Objective-C and Objective-C++ Dialects}.
> -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
> -ftree-switch-conversion -ftree-tail-merge @gol
> -ftree-ter -ftree-vectorize -ftree-vrp -funconstrained-commons @gol
>--funit-at-a-time -funroll-all-loops -funroll-loops -funroll-and-jam
>@gol
>+-funit-at-a-time -funroll-all-loops -funroll-loops @gol
> -funsafe-math-optimizations -funswitch-loops @gol
>-fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt
>@gol
> -fweb -fwhole-program -fwpa -fuse-linker-plugin @gol
>@@ -8511,11 +8511,9 @@ at @option{-O} and higher.
> @item -ftree-loop-linear
> @itemx -floop-strip-mine
> @itemx -floop-block
>-@itemx -floop-unroll-and-jam
> @opindex ftree-loop-linear
> @opindex floop-strip-mine
> @opindex floop-block
>-@opindex floop-unroll-and-jam
> Perform loop nest optimizations. Same as
>@option{-floop-nest-optimize}. To use this code transformation, GCC
>has
> to be configured with @option{--with-isl} to enable the Graphite loop
>@@ -9789,8 +9787,8 @@ for one side of the iteration space and false for
>the other.
>Move branches with loop invariant conditions out of the loop, with
>duplicates
>of the loop on both branches (modified according to result of the
>condition).
>
>-@item -funroll-and-jam
>-@opindex funroll-and-jam
>+@item -floop-unroll-and-jam
>+@opindex floop-unroll-and-jam
> Apply unroll and jam transoformations on feasible loops. In a loop
>nest this unrolls the outer loop by some factor and fuses the resulting
> multiple inner loops.
>diff --git a/gcc/gimple-loop-jam.c b/gcc/gimple-loop-jam.c
>index 32f813b..8ed1bef 100644
>--- a/gcc/gimple-loop-jam.c
>+++ b/gcc/gimple-loop-jam.c
>@@ -152,7 +152,7 @@ merge_loop_tree (struct loop *loop, struct loop
>*old)
> free (bbs);
> }
>
>-/* BB exits the outer loop of an unroll-and-jam situation.
>+/* BB is part of the outer loop of an unroll-and-jam situation.
> Check if any statements therein would prevent the transformation. */
>
> static bool
>@@ -160,9 +160,10 @@ bb_prevents_fusion_p (basic_block bb)
> {
> gimple_stmt_iterator gsi;
> /* BB is duplicated by outer unrolling and then all N-1 first copies
>- move into the body of the fused inner loop. The last copy
>remains
>- the exit block of the outer loop and is still outside the inner
>loop
>- also after fusion. We can't allow this for some effects of BB:
>+ move into the body of the fused inner loop. If BB exits the
>outer loop
>+ the last copy still doess so, and the first N-1 copies are
>cancelled
>+ by loop unrolling, so also after fusion it's the exit block.
>+ But there might be other reasons that prevent fusion:
> * stores or unknown side-effects prevent fusion
> * loads don't
> * computations into SSA names: these aren't problematic. Their
>@@ -204,6 +205,19 @@ unroll_jam_possible_p (struct loop *outer, struct
>loop *loop)
> if (outer->inner != loop || loop->next)
> return false;
>
>+ /* Prevent head-controlled inner loops, that we usually have.
>+ The guard block would need to be accepted
>+ (invariant condition either entering or skipping the loop),
>+ without also accepting arbitrary control flow. When unswitching
>+ ran before us (as with -O3) this won't be a problem because its
>+ outer loop unswitching will have moved out the invariant
>condition.
>+
>+ If we do that we need to extend fuse_loops() to cope with this
>+ by threading through the (still invariant) copied condition
>+ between the two loop copies. */
>+ if (!dominated_by_p (CDI_DOMINATORS, outer->latch, loop->header))
>+ return false;
>+
> /* The number of iterations of the inner loop must be loop invariant
> with respect to the outer loop. */
> if (!number_of_iterations_exit (loop, single_exit (loop), &niter,
>@@ -218,23 +232,8 @@ unroll_jam_possible_p (struct loop *outer, struct
>loop *loop)
>n = get_loop_body_with_size (outer, bbs, n_basic_blocks_for_fn (cfun));
>
> for (i = 0; i < n; i++)
>- {
>- if (bbs[i]->loop_father == outer
>- && bbs[i] != outer->latch && bbs[i] != outer->header
>- && (!loop_exits_from_bb_p (outer, bbs[i])
>- || bb_prevents_fusion_p (bbs[i])))
>- break;
>- /* XXX Note that the above disallows head-controlled inner
>loops,
>- that we usually have. The guard block would need to be
>accepted
>- (invariant condition either entering or skipping the loop),
>- without also accepting arbitrary control flow. When unswitching
>- ran before us (as with -O3) this won't be a problem because its
>- outer loop unswitching will have moved out the invariant condition.
>-
>- If we do that we need to extend fuse_loops() to cope with this
>- by threading through the (still invariant) copied condition
>- between the two loop copies. */
>- }
>+ if (bbs[i]->loop_father == outer && bb_prevents_fusion_p (bbs[i]))
>+ break;
> free (bbs);
> if (i != n)
> return false;
>diff --git a/gcc/opts.c b/gcc/opts.c
>index 98fbf53..a157b5f 100644
>--- a/gcc/opts.c
>+++ b/gcc/opts.c
>@@ -536,7 +536,7 @@ static const struct default_options
>default_options_table[] =
>{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL,
>1 },
> { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
> { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>- { OPT_LEVELS_3_PLUS, OPT_funroll_and_jam, NULL, 1 },
>+ { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 },
> { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
> { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
> { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
>diff --git a/gcc/testsuite/gcc.dg/pr83323.c
>b/gcc/testsuite/gcc.dg/pr83323.c
>new file mode 100644
>index 0000000..6111745
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/pr83323.c
>@@ -0,0 +1,23 @@
>+/* { dg-do run } */
>+/* { dg-options "-O3 -floop-unroll-and-jam --param
>unroll-jam-min-percent=0" } */
>+int x[1024], y[1024];
>+
>+void __attribute__((noipa)) foo ()
>+{
>+ for (int i = 0; i < 1024; ++i)
>+ {
>+ x[i] = 0;
>+ for (int j = 0; j < 1024; ++j)
>+ if (!y[j])
>+ x[i] = 1;
>+ }
>+}
>+
>+int main()
>+{
>+ y[1023] = 1;
>+ foo ();
>+ if (x[1] != 1)
>+ __builtin_abort ();
>+ return 0;
>+}