[Bug tree-optimization/79622] [6/7/8 Regression] Wrong code w/ -O2 -floop-nest-optimize

rguenther at suse dot de gcc-bugzilla@gcc.gnu.org
Fri Sep 15 07:51:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79622

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 15 Sep 2017, spop at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79622
> 
> --- Comment #4 from Sebastian Pop <spop at gcc dot gnu.org> ---
> Yes, that phi node looks like a reduction.  We need to handle the phi as a
> write to expose the loop carried reduction variable to the dependence analysis.
> I think your change goes in the right direction.  Thanks!

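For reference, the loop-carried reduction case being discussed
corresponds roughly to source of the following shape (a generic
sketch, not the PR's reduced testcase; the function name is made up).
In SSA form the value of s flowing around the back edge is merged by a
PHI at the loop header, and that PHI result is what the patch now
records as a scalar write:

  /* s is live across iterations; recording its loop-header PHI result
     as a write exposes the loop-carried dependence to the dependence
     analysis.  */
  double
  sum (double *a, int n)
  {
    double s = 0.0;
    for (int i = 0; i < n; i++)
      s += a[i];
    return s;
  }
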
As an extra data point: while the patch fixes the testcase when using
ISL 0.18, it still fails when using ISL 0.16.1.

The patch will apply similar handling to

  if (foo_3)
    ;
  else
    ;
 # _1 = PHI <0, 1>
 a[i] = _1;

that is, for conditional assignment of constants (or invariants).  Are
those handled elsewhere?  More generally, I wonder why
graphite_find_cross_bb_scalar_vars works on GIMPLE BBs rather than
pbbs -- don't we need to handle all cross-STMT (in the polyhedral
representation) scalar defs/uses as writes/reads, and only ignore
defs/uses fully contained in a pbb (rather than a BB)?  Of course,
at the time we do graphite_find_cross_bb_scalar_vars the pbbs are not
yet built.  Hmm, it looks like pbb and BB are the same granularity
(that feels limiting for the transforms we can apply...  I would have
expected at least each memory op to be in a separate "black box").

Anyway, with that, PHIs should be in separate pbbs - if you follow
the original go-out-of-SSA approach, you'd have their effects on
the CFG edges.  So a more complete fix would similarly handle uses.

Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c       (revision 252780)
+++ gcc/graphite-scop-detection.c       (working copy)
@@ -1744,7 +1744,9 @@ build_cross_bb_scalars_def (scop_p scop,
   gimple *use_stmt;
   imm_use_iterator imm_iter;
   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, def)
-    if (def_bb != gimple_bb (use_stmt) && !is_gimple_debug (use_stmt))
+    if ((def_bb != gimple_bb (use_stmt) && !is_gimple_debug (use_stmt))
+       /* PHIs have their effect at "BBs" on the edges.  See PR79622.  */
+       || gimple_code (SSA_NAME_DEF_STMT (def)) == GIMPLE_PHI)
       {
        writes->safe_push (def);
        DEBUG_PRINT (dp << "Adding scalar write: ";
@@ -1758,7 +1760,8 @@ build_cross_bb_scalars_def (scop_p scop,
       }
 }

-/* Record DEF if it is used in other bbs different than DEF_BB in the SCOP.  */
+/* Record USE if it is defined in other bbs different than USE_STMT
+   in the SCOP.  */

 static void
 build_cross_bb_scalars_use (scop_p scop, tree use, gimple *use_stmt,
@@ -1774,7 +1777,9 @@ build_cross_bb_scalars_use (scop_p scop,
     return;

   gimple *def_stmt = SSA_NAME_DEF_STMT (use);
-  if (gimple_bb (def_stmt) != gimple_bb (use_stmt))
+  if (gimple_bb (def_stmt) != gimple_bb (use_stmt)
+      /* PHIs have their effect at "BBs" on the edges.  See PR79622.  */
+      || gimple_code (def_stmt) == GIMPLE_PHI)
     {
       DEBUG_PRINT (dp << "Adding scalar read: ";
                   print_generic_expr (dump_file, use);


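For context, the conditional-assignment case sketched above maps back
to source of roughly this shape (a hypothetical example, not the PR's
testcase; the function name is made up).  The PHI merging the two
constants now also gets recorded as a scalar write, even though no
loop-carried value is involved:

  /* Into SSA, tmp becomes  _1 = PHI <0, 1>  as in the GIMPLE above;
     that PHI is now also treated as a scalar write, although both
     merged values are invariant.  */
  void
  init (int *a, const int *p, int n)
  {
    for (int i = 0; i < n; i++)
      {
        int tmp;
        if (p[i])
          tmp = 0;
        else
          tmp = 1;
        a[i] = tmp;
      }
  }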

That said, I did some experiments on SPEC CPU 2006, where we (before
the two patches) optimized 47 loop nests in total (with
-Ofast -march=haswell -floop-nest-optimize), and the performance
effects are neutral with a bias towards regressing (bwaves definitely
regresses, the rest may be noise).  Not sure how well the old
SCOP detection did, but I suppose lumping all reads/writes inside
a BB into the same 'stmt' (if I read things correctly) will
severely reduce the possibilities?

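To illustrate the granularity concern (assuming the reading above is
right that a pbb covers a whole GIMPLE BB; the function name is made
up): in a loop body like the following, both stores land in one basic
block and hence in one polyhedral "statement", so they cannot be
scheduled or distributed independently:

  /* With pbb == BB, the two stores form a single black box for the
     polyhedral scheduler; e.g. distributing them into separate loops
     is not expressible.  */
  void
  copy2 (int *b, int *c, const int *a, int n)
  {
    for (int i = 0; i < n; i++)
      {
        b[i] = a[i];
        c[i] = a[i + 1];
      }
  }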
