Bug 55875 - [4.8 Regression] IVopts caused miscompilation
Summary: [4.8 Regression] IVopts caused miscompilation
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.8.0
: P1 normal
Target Milestone: 4.8.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-04 12:08 UTC by Jakub Jelinek
Modified: 2013-01-09 16:35 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-01-04 00:00:00


Attachments
patch in testing. (1.64 KB, application/octet-stream)
2013-01-08 16:12 UTC, Jan Hubicka
Details
updated patch (3.77 KB, patch)
2013-01-08 17:55 UTC, Jan Hubicka
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2013-01-04 12:08:35 UTC
The following testcase is miscompiled at -O3 on x86_64-linux starting with
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192989
(ok, that revision doesn't compile, needs r192900 follow-up).

struct A
{
  short int a1;
  unsigned char a2;
  unsigned int a3;
};

struct B
{
  unsigned short b1;
  const A *b2;
};

B b;

__attribute__((noinline, noclone))
int foo (unsigned x)
{
  __asm volatile ("" : "+r" (x) : : "memory");
  return x;
}

inline void
bar (const int &)
{
}

__attribute__((noinline)) void
baz ()
{
  const A *a = b.b2;
  unsigned int i;
  unsigned short n = b.b1;
  for (i = 0; i < n; ++i)
    if (a[i].a1 == 11)
      {
	if (i > 0 && (a[i - 1].a2 & 1))
	  continue;
	bar (foo (2));
	return;
      }
}

int
main ()
{
  A a[4] = { { 10, 0, 0 }, { 11, 1, 0 }, { 11, 1, 0 }, { 11, 1, 0 } };
  b.b1 = 4;
  b.b2 = a;
  baz ();
  return 0;
}

In *.slp we have:
  <bb 5>:
  if (i_21 != 0)
    goto <bb 6>;
  else
    goto <bb 7>;
  
  <bb 6>:
  _11 = i_21 + 4294967295;
  _12 = (long unsigned int) _11;
  _13 = _12 * 8;
  _14 = a_4 + _13;
  _15 = _14->a2;
but ivopts turns that into:
  <bb 5>:
  i_25 = (unsigned int) ivtmp.9_31;
  if (i_25 != 0)
    goto <bb 6>;
  else
    goto <bb 7>;

  <bb 6>:
  _28 = ivtmp.9_31 * 8;
  _27 = a_4 + _28;
  _26 = _27 + 34359738362;
  _15 = MEM[base: _26, offset: 0B];
which is wrong, i_21 + -1U wrapped around (and wasn't executed for i_21 being 0), while 34359738362 is 0xffffffffULL * 8 + 2, thus it ignores the wrapping and does what the original code would do for
  _12 = (long unsigned int) i_21;
  _77 = _12 + 4294967295;
  _13 = _77 * 8;
i.e. as if the -1U addition was done in the wider precision.
Comment 1 Marek Polacek 2013-01-04 12:20:41 UTC
Confirmed.
Comment 2 Jakub Jelinek 2013-01-04 12:24:12 UTC
ssa name _12
  type long unsigned int
  base 4294967295
  step 1
ssa name _13
  type long unsigned int
  base 34359738360
  step 8
ssa name _14
  type const struct A *
  base a_4 + 34359738360
  step 8
  base object (void *) a_4

all look wrong to me, it ignores the wrapping in the narrower type.

OT, I wonder why number of iterations analysis doesn't figure out smaller number:
Analyzing # of iterations of loop 1
  exit condition [1, + , 1] < (unsigned int) n_5
  bounds on difference of bases: 0 ... 4294967294
  result:
    # of iterations (unsigned int) n_5 + 4294967295, bounded by 4294967294
The loop condition is:
  i_18 = i_21 + 1;
  if (i_18 < _20)
and _20 is:
  _20 = (unsigned int) n_5;
where
  short unsigned int n;
thus, I'd thought we should figure out from that that i_18 will be at most 0xffff (highest possible n_5 is 0xffff).
But that is 4.9 material likely, while the ivopts? bug is a severe 4.8 issue.
Comment 3 Jan Hubicka 2013-01-07 13:36:58 UTC
OK, the problem seems to be already in what simple_iv returns for SSA name 12.  Here we should have -1.  While analyzing the cast
(gdb) p debug_gimple_stmt (at_stmt)
_12 = (long unsigned int) _11;
that effectively changes addition to minus, we get to
    CASE_CONVERT:
      /* In case we have a truncation of a widened operation that in
         the truncated type has undefined overflow behavior analyze
         the operation done in an unsigned type of the same precision
         as the final truncation.  We cannot derive a scalar evolution
         for the widened operation but for the truncated result.  */
      if (TREE_CODE (type) == INTEGER_TYPE
          && TREE_CODE (TREE_TYPE (rhs1)) == INTEGER_TYPE
          && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (rhs1))
          && TYPE_OVERFLOW_UNDEFINED (type)
          && TREE_CODE (rhs1) == SSA_NAME
          && (def = SSA_NAME_DEF_STMT (rhs1))
          && is_gimple_assign (def)
          && TREE_CODE_CLASS (gimple_assign_rhs_code (def)) == tcc_binary
          && TREE_CODE (gimple_assign_rhs2 (def)) == INTEGER_CST)
        { 
          tree utype = unsigned_type_for (type);
          chrec1 = interpret_rhs_expr (loop, at_stmt, utype,
                                       gimple_assign_rhs1 (def),
                                       gimple_assign_rhs_code (def),
                                       gimple_assign_rhs2 (def));
        }
      else
        chrec1 = analyze_scalar_evolution (loop, rhs1);

Here for widening conversions we do nothing. So we get
 <polynomial_chrec 0x7ffff705d7e0
    type <integer_type 0x7ffff6ee8690 unsigned int sizes-gimplified public unsigned SI
        size <integer_cst 0x7ffff6eeb140 constant 32>
        unit size <integer_cst 0x7ffff6eeb160 constant 4>
        align 32 symtab 0 alias set -1 canonical type 0x7ffff6ee8690 precision 32 min <integer_cst 0x7ffff6eeb180 0> max <integer_cst 0x7ffff6eeb120 4294967295>
        pointer_to_this <pointer_type 0x7ffff6fc9738>>
   
    arg 0 <integer_cst 0x7ffff6eeb3c0 type <integer_type 0x7ffff6ee85e8 int> constant 1>
    arg 1 <integer_cst 0x7ffff6eeb120 type <integer_type 0x7ffff6ee8690 unsigned int> constant 4294967295>
    arg 2 <integer_cst 0x7ffff7042ae0 type <integer_type 0x7ffff6ee8690 unsigned int> constant 1>>

this is correct, since it is done in unsigned int.
Next we do:
      res = chrec_convert (type, chrec1, at_stmt);
Eventually we go to convert_affine_scev and we set enforce_overflow_semantics
  enforce_overflow_semantics = (use_overflow_semantics
                                && nowrap_type_p (type));

This test looks backwards to me.  If the type is nowrap, we do not need to enforce anything about overflows, when it is wrap, then we have to.

Flipping it however do not change fix the testcase.
Anyway, the result of convert_affince_scev makes us to produce
 <polynomial_chrec 0x7ffff705d810
    type <integer_type 0x7ffff6ee87e0 long unsigned int public unsigned DI
        size <integer_cst 0x7ffff6ecddc0 constant 64>
        unit size <integer_cst 0x7ffff6ecdde0 constant 8>
        align 64 symtab 0 alias set -1 canonical type 0x7ffff6ee87e0 precision 64 min <integer_cst 0x7ffff6eeb200 0> max <integer_cst 0x7ffff6eeb1e0 18446744073709551615>>
   
    arg 0 <integer_cst 0x7ffff6eeb3c0 type <integer_type 0x7ffff6ee85e8 int> constant 1>
    arg 1 <integer_cst 0x7ffff7042de0 type <integer_type 0x7ffff6ee87e0 long unsigned int> constant 4294967295>
    arg 2 <integer_cst 0x7ffff704cba0 type <integer_type 0x7ffff6ee87e0 long unsigned int> constant 1>>

that seems already wrong.
So the bug seems to be that convert_affince_scev is not doing the right thing here? But what the right thing is? Conversion to -1 or giving up?
Comment 4 rakdver 2013-01-07 14:09:18 UTC
> this is correct, since it is done in unsigned int.
> Next we do:
>       res = chrec_convert (type, chrec1, at_stmt);
> Eventually we go to convert_affine_scev and we set enforce_overflow_semantics
>   enforce_overflow_semantics = (use_overflow_semantics
>                                 && nowrap_type_p (type));
> 
> This test looks backwards to me.  If the type is nowrap, we do not need to
> enforce anything about overflows, when it is wrap, then we have to.

The test is correct -- here, type is the type we are converting to; so, if
the type is nowrap, we have to make sure that the conversion is not creating
an overflowing iv.

Anyway, that is irrelevant to the problem: the check of scev_probably_wraps_p
below should return false -- which is probably what got broken by the #of
iterations estimation change.

Zdenek
Comment 5 Zdenek Dvorak 2013-01-07 14:11:29 UTC
(In reply to comment #4)
> the check of scev_probably_wraps_p below should return false

this should be "should return true"
Comment 6 Jan Hubicka 2013-01-07 17:07:57 UTC
OK, I understnad the issue now.  It is bug caused by my patch indeed.
The problem is logic in scev_probably_wraps_p that is trying to prove that given IV at given STMT is not wrapping based on loop bounds connected.
When I was extending loop bounds to contain not only statements that dominate the exit BB, I verified the walkers that they are valid after the change.
In this case it is however not true. What I missed is that it does two things

1) it tries to prove that STMT is bounded by given bound based on fact that bound's STMT dominate the statement
2) it tries to prove the bound based on number of iterations of loop that it derrives from the bounds

2) needs to be updated.  I am testing patch.
Comment 7 Jan Hubicka 2013-01-08 16:12:25 UTC
Created attachment 29106 [details]
patch in testing.
Comment 8 Jan Hubicka 2013-01-08 17:55:42 UTC
Created attachment 29109 [details]
updated patch

There is another bug triggered by this testcase. Some of the bounds, like those derrived by if (iv_var == constant) can not be used when they are not executed every iteration.

This patch prevents them from being recorded as bounds.
Comment 9 Jakub Jelinek 2013-01-08 18:27:44 UTC
FYI, the execute/pr55875.c with -O1 started failing with
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=138207
aka tuples merge.
Comment 10 Jan Hubicka 2013-01-09 15:10:55 UTC
Author: hubicka
Date: Wed Jan  9 15:10:43 2013
New Revision: 195054

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195054
Log:
	PR tree-optimiation/55875
	* gcc.c-torture/execute/pr55875.c: New testcase.
	* g++.dg/torture/pr55875.C: New testcase.

	* tree-ssa-loop-niter.c (number_of_iterations_cond): Add
	EVERY_ITERATION parameter.
	(number_of_iterations_exit): Check if exit is executed every
	iteration.
	(idx_infer_loop_bounds): Similarly here.
	(n_of_executions_at_most): Simplify
	to only test for cases where statement is dominated by the
	particular bound; handle correctly the "postdominance"
	test.
	(scev_probably_wraps_p): Use max loop iterations info
	as a global bound first.


Added:
    trunk/gcc/testsuite/g++.dg/torture/pr55875.C
    trunk/gcc/testsuite/gcc.c-torture/execute/pr55875.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-niter.c
Comment 11 Jakub Jelinek 2013-01-09 15:13:20 UTC
Fixed, thanks.
Comment 12 Jan Hubicka 2013-01-09 16:29:21 UTC
Shall we track the C testcase regression in 4.7 and earlier?

Honza
Comment 13 Jakub Jelinek 2013-01-09 16:35:13 UTC
Yes, but I'd say under a different PR.