[PATCH] disable aggressive_loop_optimizations until niter ready

Thu Jan 13 13:39:58 GMT 2022

On Thu, 13 Jan 2022, guojiufu wrote:

> On 2022-01-03 22:30, Richard Biener wrote:
> > On Wed, 22 Dec 2021, Jiufu Guo wrote:
> > 
> >> Hi,
> >> 
> >> Normaly, estimate_numbers_of_iterations get/caculate niter first,
> >> and then invokes infer_loop_bounds_from_undefined. While in some case,
> >> after a few call stacks, estimate_numbers_of_iterations is invoked before
> >> niter is ready (e.g. before number_of_latch_executions returns).
> >> 
> >> e.g. number_of_latch_executions->...follow_ssa_edge_expr-->
> >>   --> estimate_numbers_of_iterations --> 
> >> infer_loop_bounds_from_undefined.
> >> 
> >> Since niter is still not computed, call to infer_loop_bounds_from_undefined
> >> may not get final result.
> >> To avoid infer_loop_bounds_from_undefined to be called with interim state
> >> and avoid infer_loop_bounds_from_undefined generates interim data, during
> >> niter's computing, we could disable flag_aggressive_loop_optimizations.
> >> 
> >> Bootstrap and regtest pass on ppc64* and x86_64.  Is this ok for trunk?
> > 
> > So this is a optimality fix, not a correctness one?  I suppose the
> > estimates are computed/used from scev_probably_wraps_p via
> > loop_exits_before_overflow and ultimatively chrec_convert.
> > 
> > We have a call cycle here,
> > 
> > estimate_numbers_of_iterations -> number_of_latch_executions ->
> > ... -> estimate_numbers_of_iterations
> > 
> > where the first estimate_numbers_of_iterations will make sure
> > the later call will immediately return.
> 
> Hi Richard,
> Thanks for your comments! And sorry for the late reply.
> 
> In estimate_numbers_of_iterations, there is a guard to make sure
> the second call to estimate_numbers_of_iterations returns
> immediately.
> 
> Exactly as you said, it relates to scev_probably_wraps_p calls
> loop_exits_before_overflow.
> 
> The issue is: the first calling to estimate_numbers_of_iterations
> maybe inside number_of_latch_executions.
> 
> > 
> > I'm not sure what your patch tries to do - it seems to tackle
> > the case where we enter the cycle via number_of_latch_executions?
> > Why do we get "non-final" values?  idx_infer_loop_bounds resorts
> 
> Right, when the call cycle starts from number_of_latch_execution,
> the issue may occur:
> 
> number_of_latch_executions(*1st call)->..->
> analyze_scalar_evolution(IVs 1st) ->..follow_ssa_edge_expr..->
> loop_exits_before_overflow->
> estimate_numbers_of_iterations (*1st call)->
> number_of_latch_executions(*2nd call)->..->
> analyze_scalar_evolution(IVs 2nd)->..loop_exits_before_overflow->
> estimate_numbers_of_iterations(*2nd call)
> 
> The second calling to estimate_numbers_of_iterations returns quickly.
> And then, in the first calling to estimate_numbers_of_iterations,
> infer_loop_bounds_from_undefined is invoked.
> 
> And, function "infer_loop_bounds_from_undefined" instantiate/analyze
> SCEV for each SSA in the loop.
> *Here the issue occur*, these SCEVs are based on the interim IV's
> SCEV which come from "analyze_scalar_evolution(IVs 2nd)",
> and those IV's SCEV will be overridden by up level
> "analyze_scalar_evolution(IVs 1st)".

OK, so indeed analyze_scalar_evolution is not protected against
recursive invocation on the same SSA name (though it definitely
doesn't expect to do that).  We could fix that by pre-seeding
the cache conservatively in analyze_scalar_evolution or by
not overwriting the cached result of the recursive invocation.

But to re-iterate an unanswered question, is this a correctness issue
or an optimization issue?

> To handle this issue, disabling flag_aggressive_loop_optimizations
> inside number_of_latch_executions is one method.
> To avoid the issue in other cases, e.g. the call cycle starts from
> number_of_iterations_exit or number_of_iterations_exit_assumptions,
> this patch disable flag_aggressive_loop_optimizations inside
> number_of_iterations_exit_assumptions.

But disabling flag_aggressive_loop_optimizations is a very
non-intuitive way of avoiding recursive calls.  I'd rather
avoid those in a similar way estimate_numbers_of_iterations does,
for example with

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 61d72c278a1..cc1e510b6c2 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -2807,7 +2807,7 @@ number_of_latch_executions (class loop *loop)
   if (dump_file && (dump_flags & TDF_SCEV))
     fprintf (dump_file, "(number_of_iterations_in_loop = \n");
 
-  res = chrec_dont_know;
+  loop->nb_iterations = res = chrec_dont_know;
   exit = single_exit (loop);
 
   if (exit && number_of_iterations_exit (loop, exit, &niter_desc, false))

though this doesn't seem to improve the SCEV analysis with your
testcase.  Alternatively one could more conciously compute an
"estimated" estimate like with

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 61d72c278a1..8529c44d574 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -2802,6 +2802,19 @@ number_of_latch_executions (class loop *loop)
   if (res)
     return res;
 
+  /* When estimates for this loop are not computed yet compute them
+     without resorting to niter analysis results which means w/o
+     flag_aggressive_loop_optimizations.  */
+  if (loop->estimate_state == EST_NOT_COMPUTED)
+    {
+      bool saved_flag_aggressive_loop_optimizations
+       = flag_aggressive_loop_optimizations;
+      flag_aggressive_loop_optimizations = false;
+      estimate_numbers_of_iterations (loop);
+      flag_aggressive_loop_optimizations
+       = saved_flag_aggressive_loop_optimizations;
+    }
+
   may_be_zero = NULL_TREE;
 
but that also doesn't change your testcase for me.  Applying your
patch does the trick but then I really wonder why.

(get_scalar_evolution
  (scalar = lv_10)
  (scalar_evolution = {(int) start_7(D), +, 1}_1))

from

  <bb 3> [local count: 955630225]:
  # i_15 = PHI <i_12(6), start_7(D)(5)>
  lv_10 = (int) i_15;
  i.1_3 = (unsigned short) i_15;
  _4 = i.1_3 + 1;
  i_12 = (short int) _4;

where the 'i' IV can wrap when start >= end but that's ruled out
by the entry test.  The scalar evolution for lv_10 would be
wrong if we didn't conclude that so I guess this optimization
is provided by the estimate somehow.

I also tried

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 935d2d4d8f3..d1787ba39b6 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -8093,6 +8093,8 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, 
class loop *loop,
       fprintf (dump_file, "\n");
     }
 
+  estimate_numbers_of_iterations (loop);
+
   body = get_loop_body (loop);
   data->body_includes_call = loop_body_includes_call (body, 
loop->num_nodes);
   renumber_gimple_stmt_uids_in_blocks (body, loop->num_nodes);

to get into the cycles from the "correct" entry but that does
not help either.  Nor does

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index b767056aeb0..f058be04836 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2534,6 +2534,14 @@ number_of_iterations_exit_assumptions (class loop 
*loop, edge exit,
       && !POINTER_TYPE_P (type))
     return false;
 
+  if (loop->estimate_state == EST_NOT_COMPUTED)
+    {
+      bool saved = flag_aggressive_loop_optimizations;
+      flag_aggressive_loop_optimizations = false;
+      estimate_numbers_of_iterations (loop);
+      flag_aggressive_loop_optimizations = saved;
+    }
+
   tree iv0_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
                              op0, &iv0, safe ? &iv0_niters : NULL, 
false))

or

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 61d72c278a1..d1af89eb459 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -518,7 +518,8 @@ set_scalar_evolution (basic_block instantiated_below, 
tree scalar, tree chrec)
        nb_set_scev++;
     }
 
-  *scalar_info = chrec;
+  if (*scalar_info == chrec_not_analyzed_yet)
+    *scalar_info = chrec;
 }
 
 /* Retrieve the chrec associated to SCALAR instantiated below

That said, having the cycles is bad, we should definitively break
them (if only for compile-time reasons).  But I don't really
understand how your patch helps and my attempts do not which
means I am missing a critical piece of understanding ... :/

Thanks,
Richard.

> Thanks again.
> 
> BR,
> Jiufu
> 
> > to SCEV and thus may recurse again - to me it would be more
> > logical to try avoid recursing in number_of_latch_executions by
> > setting ->nb_iterations to something early, maybe chrec_dont_know,
> > to signal we're using something we're just trying to compute.
> > 
> > Richard.
> > 
> >> BR,
> >> Jiufu
> >> 
> >> gcc/ChangeLog:
> >> 
> >>  * tree-ssa-loop-niter.c (number_of_iterations_exit_assumptions):
> >>  Disable/restore flag_aggressive_loop_optimizations.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 
> >>  * gcc.dg/tree-ssa/scev-16.c: New test.
> >> 
> >> ---
> >>  gcc/tree-ssa-loop-niter.c               | 23 +++++++++++++++++++----
> >>  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 20 ++++++++++++++++++++
> >>  2 files changed, 39 insertions(+), 4 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> >> 
> >> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> >> index 06954e437f5..51bb501019e 100644
> >> --- a/gcc/tree-ssa-loop-niter.c
> >> +++ b/gcc/tree-ssa-loop-niter.c
> >> @@ -2534,18 +2534,31 @@ number_of_iterations_exit_assumptions (class loop
> >> *loop, edge exit,
> >>        && !POINTER_TYPE_P (type))
> >>      return false;
> >> 
> >> +  /* Before niter is calculated, avoid to analyze interim state. */
> >> +  int old_aggressive_loop_optimizations =
> >> flag_aggressive_loop_optimizations;
> >> +  flag_aggressive_loop_optimizations = 0;
> >> +
> >>    tree iv0_niters = NULL_TREE;
> >>    if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
> >> 			      op0, &iv0, safe ? &iv0_niters : NULL, false))
> >> -    return number_of_iterations_popcount (loop, exit, code, niter);
> >> +    {
> >> +      bool res = number_of_iterations_popcount (loop, exit, code, niter);
> >> +      flag_aggressive_loop_optimizations =
> >> old_aggressive_loop_optimizations;
> >> +      return res;
> >> +    }
> >>    tree iv1_niters = NULL_TREE;
> >>    if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
> >> 			      op1, &iv1, safe ? &iv1_niters : NULL, false))
> >> -    return false;
> >> +    {
> >> +      flag_aggressive_loop_optimizations =
> >> old_aggressive_loop_optimizations;
> >> +      return false;
> >> +    }
> >>    /* Give up on complicated case.  */
> >>    if (iv0_niters && iv1_niters)
> >> -    return false;
> >> -
> >> +    {
> >> +      flag_aggressive_loop_optimizations =
> >> old_aggressive_loop_optimizations;
> >> +      return false;
> >> +    }
> >>    /* We don't want to see undefined signed overflow warnings while
> >>       computing the number of iterations.  */
> >>    fold_defer_overflow_warnings ();
> >> @@ -2565,6 +2578,7 @@ number_of_iterations_exit_assumptions (class loop
> >> *loop, edge exit,
> >>      				  only_exit_p, safe))
> >>      {
> >>        fold_undefer_and_ignore_overflow_warnings ();
> >> +      flag_aggressive_loop_optimizations =
> >> old_aggressive_loop_optimizations;
> >>        return false;
> >>      }
> >> 
> >> @@ -2608,6 +2622,7 @@ number_of_iterations_exit_assumptions (class loop
> >> *loop, edge exit,
> >>              niter->may_be_zero);
> >> 
> >>    fold_undefer_and_ignore_overflow_warnings ();
> >> +  flag_aggressive_loop_optimizations = old_aggressive_loop_optimizations;
> >> 
> >>    /* If NITER has simplified into a constant, update MAX.  */
> >>    if (TREE_CODE (niter->niter) == INTEGER_CST)
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> >> b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> >> new file mode 100644
> >> index 00000000000..708ffab88ca
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> >> @@ -0,0 +1,20 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fdump-tree-ivopts-scev" } */
> >> +
> >> +/* Expect scalar_evolution = {(int) start_7(D), +, 1}_1), instead
> >> +   (int) (short int) {(unsigned short) start_7(D), +, 1}_1 */
> >> +
> >> +int arr[1000];
> >> +
> >> +void
> >> +s2i (short start, short end)
> >> +{
> >> +  int res = 0;
> >> +  for (short i = start; i < end; i++)
> >> +    {
> >> +      int lv = i;
> >> +      arr[lv] += lv;
> >> +    }
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times "scalar_evolution = \{\\(int\\)
> >> start_\[0-9\]+\\(D\\), \\+, 1\}_1" 1 "ivopts" } } */
> >> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)