Bug 68021 - [6 Regression] ice in rewrite_use_nonlinear_expr with -O3
Summary: [6 Regression] ice in rewrite_use_nonlinear_expr with -O3
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 6.0
: P1 normal
Target Milestone: 6.0
Assignee: bin cheng
URL:
Keywords:
: 69163 (view as bug list)
Depends on:
Blocks:
 
Reported: 2015-10-19 18:07 UTC by David Binderman
Modified: 2017-04-11 08:16 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.9.3, 5.2.0
Known to fail: 6.0
Last reconfirmed: 2016-01-04 00:00:00


Attachments
C++ source code (55.09 KB, text/plain)
2015-10-19 18:07 UTC, David Binderman
Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Binderman 2015-10-19 18:07:38 UTC
Created attachment 36540 [details]
C++ source code

For today's trunk gcc on x64_64

$ ../results/bin/g++ -O3 -c bug237.cc
svga_cirrus.cc: In function ‘void bitblt_rop_bkwd_src_and_dst(Bit8u*, const Bit8u*, int, int, int, int)’:
svga_cirrus.cc:3304:13: internal compiler error: in rewrite_use_nonlinear_expr, at tree-ssa-loop-ivopts.c:6916
0xf044d4 rewrite_use_nonlinear_expr
	../../src/trunk/gcc/tree-ssa-loop-ivopts.c:6916
0xf044d4 rewrite_use
	../../src/trunk/gcc/tree-ssa-loop-ivopts.c:7177
0xf044d4 rewrite_uses
	../../src/trunk/gcc/tree-ssa-loop-ivopts.c:7210
0xf0b190 tree_ssa_iv_optimize_loop
	../../src/trunk/gcc/tree-ssa-loop-ivopts.c:7550
0xf0b190 tree_ssa_iv_optimize()
	../../src/trunk/gcc/tree-ssa-loop-ivopts.c:7582
0xf24fc0 execute
	../../src/trunk/gcc/tree-ssa-loop.c:412
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

tree-ssa-loop-ivopts.c:6916 is

  gcc_assert (comp != NULL_TREE);
Comment 1 Markus Trippelsdorf 2015-10-19 19:10:36 UTC
Needs -O3:

char a;
int b;
void fn1(char *p1, int p2, int p3) {
  int x;
  for (; b;) {
    for (x = 0; x < p3; x++) {
      *p1 = a;
      p1--;
    }
    p1 += p2;
  }
}
Comment 2 Marek Polacek 2015-10-20 09:09:04 UTC
Started with r228599.
Comment 3 Yuri Rumyantsev 2015-10-20 16:05:00 UTC
It looks like unswitching of outer loops pass simply triggers the issue and this tree-ssa-loop-ivopts issue.
Comment 4 Yuri Rumyantsev 2015-10-21 09:48:16 UTC
Indeed, there is an issue with outer-loop unswitching - it should not be performed for infinite loops. But if we slightly modify test if finite outer-loop we will get the same error: 
char a;
void fn1(char *p1, int p2, int p3) {
  int i, x;
  for (i = 0; i < 10; i++) {
    for (x = 0; x < p3; x++) {
      *p1 = a;
      p1--;
    }
    p1 += p2;
  }
}

I assume that  sccp is responsible for it.
Comment 5 Yuri Rumyantsev 2015-10-21 12:46:42 UTC
Hi HJ

I added my evaluation to bug. It looks like my changes are not
responsible for ICE.

2015-10-20 13:20 GMT+03:00 hjl.tools at gmail dot com
<gcc-bugzilla@gcc.gnu.org>:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68021
>
> H.J. Lu <hjl.tools at gmail dot com> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |ysrumyan at gmail dot com
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
Comment 6 Jakub Jelinek 2015-11-30 16:40:33 UTC
Even the #c4 testcase started to ICE with r228599 though.
Comment 7 Markus Trippelsdorf 2016-01-06 15:44:36 UTC
*** Bug 69163 has been marked as a duplicate of this bug. ***
Comment 8 Jakub Jelinek 2016-02-04 17:51:46 UTC
Seems the problem is that we have use->iv->step
(_2 - (sizetype) ((unsigned int) p3_11(D) + 4294967295)) + 18446744073709551615;
and cand->iv->step
(_2 - (sizetype) ((unsigned int) p3_11(D) + 4294967295)) - 1;
Both are really equivalent, but when constant_multiple_of is called on these, it calls operand_equal_p and that will return they are not equal and thus not multiple of each other.
This boils down to the canonicalization we do on MINUS_EXPR:
      /* A - B -> A + (-B) if B is easily negatable.  */
      if (negate_expr_p (op1)
          && ! TYPE_OVERFLOW_SANITIZED (type)
          && ((FLOAT_TYPE_P (type)
               /* Avoid this transformation if B is a positive REAL_CST.  */
               && (TREE_CODE (op1) != REAL_CST
                   || REAL_VALUE_NEGATIVE (TREE_REAL_CST (op1))))
              || INTEGRAL_TYPE_P (type)))
        return fold_build2_loc (loc, PLUS_EXPR, type,
                                fold_convert_loc (loc, type, arg0),
                                negate_expr (op1));
but don't really do it in associate_trees.
The caller of associate_trees has worrying comments on that though:
              /* Preserve the MINUS_EXPR if the negative part of the literal is
                 greater than the positive part.  Otherwise, the multiplicative
                 folding code (i.e extract_muldiv) may be fooled in case
                 unsigned constants are subtracted, like in the following
                 example: ((X*2 + 4) - 8U)/2.  */
which is pretty much the case here, - 1 constant is smaller than + (-1UL).
So, are we ok that this isn't really canonicalized always the same?
Shall operand_equal_p be treating those two cases as equal (perhaps under some flag)?  Or shall constant_multiple_of be using some different comparison?
Or shall something much earlier in the IVOPTS code just give up if the steps aren't really equal/multiples of each other?
Comment 9 bin cheng 2016-02-04 17:57:18 UTC
Sorry for missing this.  I will study the case.
Comment 10 bin cheng 2016-02-04 20:08:01 UTC
This is an ivopt bug all the time.  As designed, ivopt tries to identify and reuse induction variables in the original input.  Apparently we don't need to compute such original biv with new code because the computation is already there.  Every time an iv use is rewritten, ivopt checks if the candidate is an original biv using below code:

  /* An important special case -- if we are asked to express value of
     the original iv by itself, just exit; there is no need to
     introduce a new computation (that might also need casting the
     variable to unsigned and back).  */
  if (cand->pos == IP_ORIGINAL
      && cand->incremented_at == use->stmt)
    {
      enum tree_code stmt_code;

      gcc_assert (is_gimple_assign (use->stmt));
      gcc_assert (gimple_assign_lhs (use->stmt) == cand->var_after);

      /* Check whether we may leave the computation unchanged.
	 This is the case only if it does not rely on other
	 computations in the loop -- otherwise, the computation
	 we rely upon may be removed in remove_unused_ivs,
	 thus leading to ICE.  */
      stmt_code = gimple_assign_rhs_code (use->stmt);
      if (stmt_code == PLUS_EXPR
	  || stmt_code == MINUS_EXPR
	  || stmt_code == POINTER_PLUS_EXPR)
	{
	  if (gimple_assign_rhs1 (use->stmt) == cand->var_before)
	    op = gimple_assign_rhs2 (use->stmt);
	  else if (gimple_assign_rhs2 (use->stmt) == cand->var_before)
	    op = gimple_assign_rhs1 (use->stmt);
	  else
	    op = NULL_TREE;
	}
      else
	op = NULL_TREE;

      if (op && expr_invariant_in_loop_p (data->current_loop, op))
	return;
    }

Note this code can only handle specific form biv, in which there is an explicit increment stmt in the form of "biv_after = biv_before + step".

Unfortunately, in rare case like this, the biv is increased in two stmts, like:
  biv_x = biv_before + step_part_1;
  biv_after = biv_x + step_part_2;

That's why control flow goes to ICE point.  We don't need to fix the ICE point because:
1) We shouldn't rewrite biv candidate.  Even there is no correctness issue, it will introduce redundant code by rewriting it.
2) For non biv candidate, all the computation at ICE point has already been done at iv cost computation part.  In other words, if control flow goes here, gcc_assert (comp != NULL" will be true.

Back to this issue, I have two possible fixes.  First one is to specially rewrite mentioned increment stmts into:
  biv_after = biv_before + step
This fix need more work because after candidate creation point we need to compute step in loop pre-header.

Another fix is just don't add biv for such case.  In this way, we check stricter condition when adding biv candidate, thus control flow doesn't go to ICE point.  It won't cause worse code since we add exact the same candidate anyway (which is non biv candidate).  As a matter of fact, we create/use another candidate which has the same {base, step} as the biv.  The computation of biv now becomes dead code and removed by following passes.

Testing a patch using 2nd method.
Comment 11 Jakub Jelinek 2016-02-05 11:03:55 UTC
void bar (void);

void
foo (int p2, int p3)
{
  unsigned long a = p2;
  unsigned long b = (~(unsigned long) ((unsigned int) p3 + -1U)) + a;
  unsigned long c = (a - (unsigned long) ((unsigned int) p3 + -1U)) + (-1UL);
  if (b != c)
    bar ();
}

shows this folding inconsistency in the *.original dump:
  long unsigned int a = (long unsigned int) p2;
  long unsigned int b = (a - (long unsigned int) ((unsigned int) p3 + 4294967295)) - 1;
  long unsigned int c = (a - (long unsigned int) ((unsigned int) p3 + 4294967295)) + 18446744073709551615;
but *.gimple already canonicalizes it, so that it does + 18446744073709551615 at the end in both cases.
Comment 12 Richard Biener 2016-02-05 11:14:22 UTC
(In reply to Jakub Jelinek from comment #11)
> void bar (void);
> 
> void
> foo (int p2, int p3)
> {
>   unsigned long a = p2;
>   unsigned long b = (~(unsigned long) ((unsigned int) p3 + -1U)) + a;
>   unsigned long c = (a - (unsigned long) ((unsigned int) p3 + -1U)) + (-1UL);
>   if (b != c)
>     bar ();
> }
> 
> shows this folding inconsistency in the *.original dump:
>   long unsigned int a = (long unsigned int) p2;
>   long unsigned int b = (a - (long unsigned int) ((unsigned int) p3 +
> 4294967295)) - 1;
>   long unsigned int c = (a - (long unsigned int) ((unsigned int) p3 +
> 4294967295)) + 18446744073709551615;
> but *.gimple already canonicalizes it, so that it does +
> 18446744073709551615 at the end in both cases.

associate_trees doesn't re-fold the result (due to fear of recursion I guess)
Comment 13 bin cheng 2016-02-08 15:41:09 UTC
Hmm, I posted a patch at https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00447.html
But after digging deeper I think the posted is unnecessary and it should be fixed in fold stuff.

The ivopt logic is
1) create iv use for original biv candidate chosen by the algorithm.
2) rewrite iv use created in 1) using the original biv in rewrite_use_nonlinear_expr.  (Here we use nonlinear use as the example)
3) If the iv use is exact the candidate increment stmt, then skip rewriting since the computation is already there.
4) If that's not the case (in 3), it falls through  and uses get_computation to rewrite the use.  As commented in #10, this may introduce redundant code, we rely on following passes to remove them.

So for this case, the iv use created is direct for an original biv var.  The values of the iv use and the original biv are same to each other.  Just the iv use will be rewrite in a different form of computation to the original biv.

In other words, it is guaranteed that the iv use has same value as original candidate when control flow comes to the ICE point.  The ICE really is caused by inconsistent fold behavior.

So IMHO, we may have to fix the fold part...
Comment 14 Jakub Jelinek 2016-02-08 15:48:06 UTC
(In reply to amker from comment #13)
> Hmm, I posted a patch at
> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00447.html
> But after digging deeper I think the posted is unnecessary and it should be
> fixed in fold stuff.
> 
> The ivopt logic is
> 1) create iv use for original biv candidate chosen by the algorithm.
> 2) rewrite iv use created in 1) using the original biv in
> rewrite_use_nonlinear_expr.  (Here we use nonlinear use as the example)
> 3) If the iv use is exact the candidate increment stmt, then skip rewriting
> since the computation is already there.
> 4) If that's not the case (in 3), it falls through  and uses get_computation
> to rewrite the use.  As commented in #10, this may introduce redundant code,
> we rely on following passes to remove them.
> 
> So for this case, the iv use created is direct for an original biv var.  The
> values of the iv use and the original biv are same to each other.  Just the
> iv use will be rewrite in a different form of computation to the original
> biv.
> 
> In other words, it is guaranteed that the iv use has same value as original
> candidate when control flow comes to the ICE point.  The ICE really is
> caused by inconsistent fold behavior.
> 
> So IMHO, we may have to fix the fold part...

I'm afraid we can't guarantee you'll get always operand_equal_p results from the folding, if you start folding in different orders etc.  The issue mentioned here is just one case we know of.
So, IMHO it must be solved in the IVOPTs side, either by giving up somewhere and not using candidate that would not be recognized during rewriting, or by always folding the same operands in the same order if it is the same thing, or just noting it is the same thing somewhere (and/or remembering the multiply factor) and not trying the constant_multiple_of.
Incremental improvements on the folding side will then incrementally improve stuff on the IVOPTs side, but you won't ICE on arbitrary cases that aren't handled yet.
Comment 15 bin cheng 2016-02-09 11:20:43 UTC
Though the previous patch can work, I am testing another patch less intrusive.
Comment 16 bin cheng 2016-02-10 14:09:37 UTC
Author: amker
Date: Wed Feb 10 14:09:05 2016
New Revision: 233269

URL: https://gcc.gnu.org/viewcvs?rev=233269&root=gcc&view=rev
Log:

	PR tree-optimization/68021
	* tree-ssa-loop-ivopts.c (get_computation_aff): Set ratio to 1 if
	when computing the value of biv cand by itself.

	gcc/testsuite/ChangeLog
	PR tree-optimization/68021
	* gcc.dg/tree-ssa/pr68021.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr68021.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-ivopts.c
Comment 17 Jakub Jelinek 2016-02-10 14:16:12 UTC
Fixed.
Comment 18 bin cheng 2017-04-11 08:16:24 UTC
Author: amker
Date: Tue Apr 11 08:15:51 2017
New Revision: 246833

URL: https://gcc.gnu.org/viewcvs?rev=246833&root=gcc&view=rev
Log:
	Backport from mainline
	2016-02-10  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/68021
	* tree-ssa-loop-ivopts.c (get_computation_aff): Set ratio to 1 if
	when computing the value of biv cand by itself.


	gcc/testsuite
	PR tree-optimization/80345
	* gcc.c-torture/compile/pr80345.c

	Backport from mainline
	2016-02-10  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/68021
	* gcc.dg/tree-ssa/pr68021.c: New test.

Added:
    branches/gcc-5-branch/gcc/testsuite/gcc.c-torture/compile/pr80345.c
    branches/gcc-5-branch/gcc/testsuite/gcc.dg/tree-ssa/pr68021.c
Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/testsuite/ChangeLog
    branches/gcc-5-branch/gcc/tree-ssa-loop-ivopts.c