Bug 57904 - [4.9 Regression] Bogus(?) "invokes undefined behavior" warning with Fortran's finalization wrapper (gfortran.dg/class_48.f90)
Summary: [4.9 Regression] Bogus(?) "invokes undefined behavior" warning with Fortran's...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.9.0
: P1 normal
Target Milestone: 4.9.0
Assignee: Jeffrey A. Law
URL:
Keywords:
: 58746 (view as bug list)
Depends on:
Blocks: 58746
  Show dependency treegraph
 
Reported: 2013-07-16 09:03 UTC by Tobias Burnus
Modified: 2014-01-17 17:51 UTC (History)
10 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-07-23 00:00:00


Attachments
change code generation for simple DO-loops (416 bytes, patch)
2013-12-20 05:51 UTC, Bernd Edlinger
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tobias Burnus 2013-07-16 09:03:46 UTC
The test case is based on gfortran.dg/class_48.f90 - and the issue is exposed by r200954.

Compiling the following test case with "gfortran -m32 -O2"  (or "-m32 -Os") gives the warning:

test.f90: In function '__final_test2_T.1838.constprop.0':
test.f90:21:0: warning: iteration 2147483648 invokes undefined behavior [-Waggressive-loop-optimizations]
       class(t), allocatable :: a
 ^


Test case:

program test
  call test2 ()
contains
  subroutine test2 ()
    type t
      integer, allocatable :: x
    end type t

    type t2
      class(t), allocatable :: a
    end type t2

    type(t2) :: one, two

    allocate (two%a)
    one = two
  end subroutine test2
end program test
Comment 1 Martin Jambor 2013-07-23 19:53:53 UTC
I'm not really sure what the warning is about.  The warning is emitted
in the cunrolli (note the i at the end) pass when it also dumps the
following to the dump:

Statement _16 = idx_15 + -1;
 is executed at most 2147483647 (bounded by 2147483647) + 1 times in loop 3.

which looks suspicious.  However, the whole loop is guarded by
cndition (ubound.0_3 > 0) which IPA-CP tells us is false, so the code
is never executed.  Indeed the whole loop disappears in the very next
pass dump ccp2 and scheduling an extra ccp before cunrolli makes the
warning go away.

Having said that, I'm not sure how to proceed at the moment and have
to leave my office pretty much immediately :-)  Parhaps Jakub, who
introduced the warning, might have an idea?
Comment 2 Hans-Peter Nilsson 2013-08-19 23:03:30 UTC
I'm guessing this fails for all ILP32 targets as described; at least ARM, CRIS, PowerPC and m68k as well.  Adjusting title to properly mark as regression.
Comment 3 Bernd Edlinger 2013-08-20 17:40:11 UTC
this is somehow very similar to PR58143.
also here, the warning goes away at -O2 and -Os
if I add -fno-strict-overflow or -ftrapv or -fwrapv...
Comment 4 Richard Biener 2013-10-30 13:13:41 UTC
But not fixed by the fix for PR58143 and not present on the branch.

Re-confirmed.

Loop 1 iterates at most 2147483646 times.
Loop 3 iterates 4294967295 times.
Loop 3 iterates at most -1 times.

still needs investigation.
Comment 5 Jakub Jelinek 2013-11-27 15:54:54 UTC
We have:
<bb 2>:
ubound.0_3 = 0;
size.1_4 = ubound.0_3 + 1;
size.1_5 = MAX_EXPR <size.1_4, 0>;
_6 = size.1_5 * 4;
_7 = (character(kind=4)) _6;
_8 = MAX_EXPR <_7, 1>;
sizes_9 = __builtin_malloc (_8);
size.3_10 = MAX_EXPR <ubound.0_3, 0>;
_11 = size.3_10 * 4;
_12 = (character(kind=4)) _11;
_13 = MAX_EXPR <_12, 1>;
strides_14 = __builtin_malloc (_13);
MEM[(integer(kind=4)[0:D.1906] *)sizes_9][0] = 1;
if (ubound.0_3 > 0)
  goto <bb 3>;
else
  goto <bb 6>;
<bb 3>:
idx_50 = 1;
<bb 4>:
# idx_15 = PHI <1(3), idx_25(5)>
_16 = idx_15 + -1;
_17 = array_1(D)->dim[_16].stride;
MEM[(integer(kind=4)[0:D.1900] *)strides_14][_16] = _17;
_18 = MEM[(integer(kind=4)[0:D.1906] *)sizes_9][_16];
_19 = array_1(D)->dim[_16].ubound;
_20 = array_1(D)->dim[_16].lbound;
_21 = _19 - _20;
_22 = _21 + 1;
_23 = MAX_EXPR <_22, 0>;
_24 = _18 * _23;
MEM[(integer(kind=4)[0:D.1906] *)sizes_9][idx_15] = _24;
idx_25 = idx_15 + 1;
if (ubound.0_3 == idx_15)
  goto <bb 6>;
else
  goto <bb 5>;
<bb 5>:
goto <bb 4>;

and the warning is on the header(4) latch(5) loop.  The warning is correct,
the loop would execute 0xffffffff times and _16 = idx_15 + -1 would invoke undefined behavior on the last iteration, but still the warning is undesirable because it is on a dead loop.  Only during IPA optimizations the original
  _15 = array_14(D)->dtype;
  ubound.0_16 = _15 & 7;
was optimized into:
  _2 = 344;
  ubound.0_3 = _2 & 7;
and only copyprop2 pass optimizes that into:
  ubound.0_3 = 0;
but nothing yet propagated it into the condition and folded the condition, cunrolli runs simply too early.  I guess scheduling there a forwprop pass in between copyprop2 and cunrolli would fix this, but don't know how expensive it is.  Or we could avoid emitting the warning right away, but add there __builtin_warning or similar into the loop, and only warn later on after some cleanup passes that can actually figure out what code is dead.
Comment 6 Joost VandeVondele 2013-11-27 15:59:43 UTC
this is the same/similar issue as PR58746, which also triggers this warning but even without -m32 on x86_64.
Comment 7 Jeffrey A. Law 2013-12-19 20:55:00 UTC
So I see a few issues here.

First the copyprop pass does not discover new const/copies.  So if we have

  _2 = 344;
  ubound.0_3 = _2 & 7;
  size.1_4 = ubound.0_3 + 1;
  size.1_5 = MAX_EXPR <size.1_4, 0>;
  _6 = size.1_5 * 4;
  _7 = (character(kind=4)) _6;
  _8 = MAX_EXPR <_7, 1>;
  sizes_9 = __builtin_malloc (_8);
  size.3_10 = MAX_EXPR <ubound.0_3, 0>;
  _11 = size.3_10 * 4;
  _12 = (character(kind=4)) _11;
  _13 = MAX_EXPR <_12, 1>;
  strides_14 = __builtin_malloc (_13);
  MEM[(integer(kind=4)[0:D.1917] *)sizes_9][0] = 1;
  if (ubound.0_3 > 0)
    goto <bb 3>;
  else
    goto <bb 6>;

We discover _2 = 344 and copyprop that to its uses.  However, copyprop does not discover that the RHS of ubound.0_3's assignment will simplify into a copy until *after* the main copy propagation algorithm is complete.  ie, that isn't discovered until subsitute_and_fold.   Basically anything that doesn't look like a const/copy initialization when the pass starts is effectively ignored.

Durding substitute_and_fold, we substitute the value 344 for uses of _2.  But we don't pick up that ubound.0_3 now has a propagatable value.  Worse yet, because we walk backwards through the statements, even if we recorded that ubound.0_3 has a propagatable value we wouldn't be able to utilize that information.

What a mess.


My preference would be to enhance the copy propagation pass to discover these secondary copies.  That would obviously slow down the pass as it's going to have to look at a lot more statements and there'll be more statements to reevaluate as values move through the lattice.

Another possibility would be to do some kind of post-processing after substitute_and_fold to recognize that new const/copies were exposed and do something sensible.
Comment 8 Jeffrey A. Law 2013-12-19 21:32:24 UTC
I wonder if we could refactor propgate_rhs_into_lhs from tree-ssa-dom.c to help here.  It was designed to handle precisely this kind of problem.
Comment 9 Bernd Edlinger 2013-12-20 05:51:37 UTC
Created attachment 31485 [details]
change code generation for simple DO-loops

This not yet fully tested patch changes the DO-loop code generation
to a more loop-niter frendly way.

before:
   if ((step > 0) ? (dovar <= to) : (dovar => to))
    {
      for (;;)
        {
          body;
   cycle_label:
          cond = (dovar == to);
          dovar += step;
          if (cond) goto end_label;
        }
      }
   end_label:


after:
   if ((step > 0) ? (dovar <= to) : (dovar => to))
    {
      for (;;)
        {
          body;
   cycle_label:
          cond = (step > 0) ? (dovar >= to) : (dovar <= to);
          dovar += step;
          if (cond) goto end_label;
        }
      }
   end_label:

this is more easy to see, that dovar can not overflow:
what do you think of it?
Comment 10 Jeffrey A. Law 2013-12-20 06:26:44 UTC
Bernd,

It's certainly good if the test outside the loop and inside the loop is the same.  It's a lot more likely to be discovered to be redundant earlier.  I have no idea how it would affect ivopts.

Regardless of what the front-end presents us with, the early optimizers are just doing a crappy job here.  If you look at the IL dump and propagate/simplify based on ubound_3 = 0 you'll find that a ton of code just goes away.

The question in my mind is can we get the propagation/simplifications we want with a reasonable cost.  We certainly have passes that will clean this up that we could schedule to run after copyprop, but an integrated solution may be significantly better from a compile-time standpoint.

It also helps that we already have code which does, probably 95% of what we need in phi-only cprop.  I suspect if we just marked things that obviously simplified down to copies/constants and re-used the phi-only-cprop code that everything would "just work" with a minimal compile-time hit.  I'm hoping to prototype that tomorrow.
Comment 11 Dominique d'Humieres 2013-12-20 07:21:13 UTC
With the patch in comment 9, gfortran.dg/class_48.f90 no longer fails and I don't see any regression. The warning for the test in pr58746 comment 2 is also fixed.
Comment 12 Jakub Jelinek 2013-12-20 08:36:47 UTC
(In reply to Dominique d'Humieres from comment #11)
> With the patch in comment 9, gfortran.dg/class_48.f90 no longer fails and I
> don't see any regression. The warning for the test in pr58746 comment 2 is
> also fixed.

But you can always create testcases (in C/C++ etc.) that will hit this warning, so while the FE change is possible, we need to do something either about the optimization passes in between IPA and cunrolli (copyprop change Jeff talks about, perhaps only done for that single pass instance and not others, or all?, guess depending on how expensive it is) or scheduling there another instance of some other cleanup pass, or deferring the warning reporting until some cleanup.

For the FE change, I guess most important are benchmark results, doesn't it slow down important benchmarks?
Comment 13 Bernd Edlinger 2013-12-20 09:53:55 UTC
(In reply to Jakub Jelinek from comment #12)
> But you can always create testcases (in C/C++ etc.) that will hit this
> warning, so while the FE change is possible, we need to do something either
> about the optimization passes in between IPA and cunrolli (copyprop change
> Jeff talks about, perhaps only done for that single pass instance and not
> others, or all?, guess depending on how expensive it is) or scheduling there
> another instance of some other cleanup pass, or deferring the warning
> reporting until some cleanup.
> 

Could someone please provide a C test case that generates this warning?

maybe something like:

 int dovar = from;
 if (dovar <= to)
   for (;;)
     {
      ...
      if (dovar == to)
        break;
      dovar += step;
    }

> For the FE change, I guess most important are benchmark results, doesn't it
> slow down important benchmarks?


I think the change in the simple DO-loops really makes sense,
independent of this warning.

It generates the IL code just a bit more like it is done in C:

for (dovar=from; dovar <= to; dovar+=step) { ... }

=>

 dovar = from;
 if (dovar <= to)
 {
   loop:
    ...
    dovar+=step;
    if (dovar <= to)
       goto loop;
 }

Therefore the IL is easier to analyze and 
chances are good that the optimizer performs better.

I can't do these benchmarks however. Who could do that for us?
Comment 14 Jeffrey A. Law 2013-12-20 19:22:38 UTC
So a quick prototype which reuses the infrastructure from the phi-only-propagator cleans things up quite nicely.

Given this block after substitute_and_fold does its thing:

<bb 2>:
ubound.0_3 = 0;
size.1_4 = ubound.0_3 + 1;
size.1_5 = MAX_EXPR <size.1_4, 0>;
_6 = size.1_5 * 4;
_7 = (character(kind=4)) _6;
_8 = MAX_EXPR <_7, 1>;
sizes_9 = __builtin_malloc (_8);
size.3_10 = MAX_EXPR <ubound.0_3, 0>;
_11 = size.3_10 * 4;
_12 = (character(kind=4)) _11;
_13 = MAX_EXPR <_12, 1>;
strides_14 = __builtin_malloc (_13);
MEM[(integer(kind=4)[0:D.1917] *)sizes_9][0] = 1;
if (ubound.0_3 > 0)
  goto <bb 3>;
else
  goto <bb 6>;


I (manually) seed the phi-only propagator's 2nd step with _3 as being a newly exposed destination of a copy/constant initialization and let that trivial propagator do its thing...  Resulting in:

<bb 2>:
sizes_9 = __builtin_malloc (4);
strides_14 = __builtin_malloc (1);
MEM[(integer(kind=4)[0:D.1917] *)sizes_9][0] = 1;
goto <bb 6>;

Which is exactly what we want.  Also note that we collapse the conditional at the end of the block.  That in turn makes the problematic loop unreachable and it goes away as one would expect.

The problem I see is I'd prefer not to expose this in substitute_and_fold directly.  That routine is used by multiple propagators.  I'm thinking that instead we can have a callback to the pass utilizing substitute_and_fold that gets called when something has folded.
Comment 15 Dominique d'Humieres 2013-12-20 21:56:30 UTC
*** Bug 58746 has been marked as a duplicate of this bug. ***
Comment 16 Dominique d'Humieres 2013-12-20 21:58:44 UTC
> For the FE change, I guess most important are benchmark results, 
> doesn't it slow down important benchmarks?

AFAICT the answer is not at least for the gfortran test suite and the polyhedron ones
(pb05, pb11, and my own variants).

May be Joost can do more testing for CP2K.
Comment 17 Jeffrey A. Law 2013-12-20 22:27:27 UTC
Dominique, thanks for verifying that 58746 is a duplicate.  I was wondering about that.

Richi, we've known for a long time (since the early 90s) that running CSE soon after loop unrolling is profitable.  Soooo, why not run conditionally run DOM after unrolling?   That wouldn't create a compile-time hit for a typical -O2 compilation.
Comment 18 Jeffrey A. Law 2013-12-20 22:28:09 UTC
Whoops, message for Richi was meant for a different BZ.
Comment 19 Jeffrey A. Law 2014-01-17 17:50:42 UTC
Author: law
Date: Fri Jan 17 17:50:10 2014
New Revision: 206723

URL: http://gcc.gnu.org/viewcvs?rev=206723&root=gcc&view=rev
Log:
	PR middle-end/57904
	* passes.def: Reorder pass_copy_prop, pass_unrolli, pass_ccp sequence
	so that pass_ccp runs first.

        PR middle-end/57904
	* gfortran.dg/pr57904.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/pr57904.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/passes.def
    trunk/gcc/testsuite/ChangeLog
Comment 20 Jeffrey A. Law 2014-01-17 17:51:22 UTC
Fix by recent commit.