Bug 37102

Summary:	[4.3 Regression] out-of-SSA is broken
Product:	gcc	Reporter:	John Regehr <regehr>
Component:	tree-optimization	Assignee:	Andrew Macleod <amacleod>
Status:	RESOLVED FIXED
Severity:	blocker	CC:	amacleod, cnstar9988, dje, fang, gcc-bugs, pawel_sikora, rguenth
Priority:	P1	Keywords:	wrong-code
Version:	4.4.0
Target Milestone:	4.3.3
Host:		Target:
Build:		Known to work:	4.2.4 4.3.3 4.4.0
Known to fail:	4.3.2	Last reconfirmed:	2008-08-13 09:53:31
Attachments:	potential patch 1 Potential patch #2 Patch submitted

Description John Regehr 2008-08-13 03:45:21 UTC

This is for svn 139046 on Ubuntu Hardy.

regehr@john-home:~/volatile/tmp0$ current-gcc -O -Wall -fwrapv small.c -o small
regehr@john-home:~/volatile/tmp0$ ./small
5
regehr@john-home:~/volatile/tmp0$ current-gcc -O3 -Wall -fwrapv small.c -o small
regehr@john-home:~/volatile/tmp0$ ./small
0
regehr@john-home:~/volatile/tmp0$ current-gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../configure --program-prefix=current- --enable-languages=c,c++ --prefix=/home/regehr
Thread model: posix
gcc version 4.4.0 20080813 (experimental) (GCC) 
regehr@john-home:~/volatile/tmp0$ cat small.c

#include <stdio.h>

unsigned int g_24;
unsigned int g_37 = 1;
unsigned char g_225;

int main (void)
{
  int l_289;
  for (l_289 = 1; l_289 < 5; l_289 += 1) {
    if (g_225) {
      g_24 = g_37;
    }
  }
  g_24 = g_37;
  unsigned short context = g_24 << 1;
  do {
    if (context)
      context = (context << 1) ^ 1;
  } while (0);
  printf ("%d\n", (int)context);
  return 0;
}

Comment 1 Richard Biener 2008-08-13 09:53:30 UTC

-fno-unswitch-loops fixes the original testcase but not the following which
only fails _with_ -O3 -fno-unswitch-loops.

extern void abort (void);

unsigned int g_24;
unsigned int g_37 = 1;
unsigned int g_225 = 0;

int main ()
{
  unsigned int l_289;
  for (l_289 = 1; l_289 < 5; l_289 += 1) {
      if (g_225) {
          g_24 = g_37;
      }
  }
  g_24 = g_37;
  unsigned int context = g_24 << 1;
  do {
      if (context)
        context = (context << 1) ^ 1;
  } while (0);
  if (context != 5)
    abort ();
  return 0;
}

Comment 2 Richard Biener 2008-08-13 09:54:07 UTC

Testcase failing with -O3 but not with -O3 -fno-unswitch-loops:

extern void abort (void);

unsigned int g_24;
unsigned int g_37 = 1;
unsigned int g_225 = 0;

void __attribute__((noinline)) foo (int x)
{
  if (x != 5)
    abort ();
}

int main ()
{
  unsigned int l_289;
  for (l_289 = 1; l_289 < 5; l_289 += 1) {
      if (g_225) {
          g_24 = g_37;
      }
  }
  g_24 = g_37;
  unsigned int context = g_24 << 1;
  do {
      if (context)
        context = (context << 1) ^ 1;
  } while (0);
  foo (context);
  return 0;
}

Comment 3 Jakub Jelinek 2008-08-14 10:02:42 UTC

Somewhat shorter:
extern void abort (void);

unsigned int a, b = 1, c;

void __attribute__ ((noinline))
foo (int x)
{
  if (x != 5)
    abort ();
}

int
main ()
{
  unsigned int d, e;
  for (d = 1; d < 5; d++)
    if (c)
      a = b;
  a = b;
  e = a << 1;
  if (e)
    e = (e << 1) ^ 1;
  foo (e);
  return 0;
}

At *.tailc looks sane to me.
<bb 2>:
  pretmp.35_15 = c;
  b.1_5 = b;
  a_lsm.48_4 = a;
  if (pretmp.35_15 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:

<bb 4>:
  # a_lsm.48_28 = PHI <b.1_5(3), a_lsm.48_4(2)>
  a = b.1_5;
  e_9 = b.1_5 << 1;
  if (e_9 != 0)
    goto <bb 5>;
  else
    goto <bb 6>;
so a_lsm.48_28 is either b or a, depending on whether c != 0 or not, and a = b.
Then copyrename4 decides to replace a = b.1_5; with a = a_lsm_48_5; and at *.uncprop we have:
<bb 2>:
  pretmp.35_15 = c;
  a_lsm.48_5 = b;
  a_lsm.48_4 = a;
  if (pretmp.35_15 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:

<bb 4>:
  # a_lsm.48_28 = PHI <a_lsm.48_5(3), a_lsm.48_4(2)>
  a = a_lsm.48_5;
  e_9 = a_lsm.48_5 << 1;
  if (e_9 != 0)
    goto <bb 5>;
  else
    goto <bb 6>;
which looks correct too.  But *.optimized is already wrong:
<bb 2>:
  a_lsm.49 = b;
  a_lsm.48 = a;
  if (c != 0)
    goto <bb 3>;
  else
    goto <bb 7>;

<bb 7>:
  a_lsm.49 = a_lsm.48;
  goto <bb 4>;

<bb 3>:
    
<bb 4>:
  a = a_lsm.49;
  e = a_lsm.49 << 1;
  if (e != 0)
    goto <bb 5>;
  else
    goto <bb 8>;
While in *.uncprop a was set unconditionally to b, now it is sometimes b and sometimes a.

Comment 4 littlestar 2008-08-26 13:38:21 UTC

gcc 4.3.2 20080826 failed.

----------------------------
#include <stdio.h>

unsigned int g_24;
unsigned int g_37 = 1;
unsigned char g_225;

int main (void)
{
  int l_289;
  for (l_289 = 1; l_289 < 5; l_289 += 1) {
    if (g_225) {
      g_24 = g_37;
    }
  }
  g_24 = g_37;
  unsigned short context = g_24 << 1;
  do {
    if (context)
      context = (context << 1) ^ 1;
  } while (0);
  printf ("%d\n", (int)context);
  return 0;
}

Comment 5 Joseph S. Myers 2008-08-27 22:05:34 UTC

4.3.2 is released, changing milestones to 4.3.3.

Comment 6 Richard Biener 2008-08-28 14:48:11 UTC

It looks like that we correctly track liveliness by

Live on entry to BB2 :

Live on entry to BB3 : a_lsm.27_5

Live on entry to BB4 : a_lsm.27_5

Live on entry to BB5 :

Live on entry to BB6 :

for

  # BLOCK 2 freq:2000
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  c.0_4 = c;
  a_lsm.27_5 = b;
  a_lsm.27_1 = a;
  if (c.0_4 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;
  # SUCC: 4 [50.0%]  (false,exec) 3 [50.0%]  (true,exec)

  # BLOCK 3 freq:1000
  # PRED: 2 [50.0%]  (true,exec)
  # SUCC: 4 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:2000
  # PRED: 3 [100.0%]  (fallthru,exec) 2 [50.0%]  (false,exec)
  # a_lsm.27_29 = PHI <a_lsm.27_5(3), a_lsm.27_1(2)>
  a = a_lsm.27_5;
  e_9 = a_lsm.27_5 << 1;
  if (e_9 != 0)
    goto <bb 5>;
  else
    goto <bb 6>;
  # SUCC: 5 [54.0%]  (true,exec) 6 [46.0%]  (false,exec)


and based on this create the conflict graph.  But we still (blindly)
generate copies for the (dead) PHI node in BB4 which causes conflicts
that we didn't see.

So, why exactly do we generate copies for a PHI node which result is
not live-in in its BB?  Andrew - you are probably most familiar with
the out-of-SSA code, can you have a look here?  (the dead PHI node is
caused by the last DSE pass which removes the last use of it and there
is no other DCE pass to clean that up).

Jakubs testcase from comment #3 at -O3.

My humble "fix" for this would be to exchange 118.cddce with 120.dse which
should hide this problem.

Comment 7 Richard Biener 2008-09-04 14:53:19 UTC

As on the trunk we are now feeding out-of-SSA with random dead statements
at -O0 we should look into this.  Or schedule a DCE pass before it if it
cannot cope with dead statements.

Comment 8 Andrew Macleod 2008-09-04 16:09:35 UTC

out of ssa generally expects that there is no dead code. 

I think the original logic was that you never want to generate code for dead statements, so DCE would be run just before out of ssa.

It assumes any conflicts caused by any copies which will be inserted are already covered in the conflict graph.  In this particular case, if the PHI results was live for even a statement or two, it would conflict with the use of a_lsm.27_5 in the next statement, and then the coalesce would happen with a_lsm.27_1 instead.  We'd still get the dead copy, but the code would work fine.

That's a reasonable example of why we don't want to feed dead code into outofssa. Figuring out which PHI parameter is TRULY available would be a tricky proposition to catch this case. And its a waste.

So we either run dead code just before out of ssa, or take PHI's with no uses and discard them at the same time we discard virtual PHIs (this would be trivial). I wonder if we might eventually find a similar circumstance with a dead PHI cycle that would not be easy to detect without the same logic as a DCE tho.

I'd say a DCE pass is generally a better approach, and should be documented as a requirement, or even wired in.

Comment 9 Richard Biener 2008-09-04 16:17:44 UTC

Thanks for the explanation, for the branch I would recommend an extra DCE
pass right before pass_del_ssa.  On the trunk we need to make sure to run
this at -O0 as well.  Note that the simple DCE can leave dead statements
around, only control-dependent DCE will make sure to not retain any
DCE opportunities.

Comment 10 Andrew Macleod 2008-09-04 16:51:55 UTC

As long as it removes any dead PHIs, it would be sufficient. Other types of dead statements don't have 'unexpected' side effects across basic block boundaries, and should be handled fine. Other than being a waste of effort of course :-)

A little further thinking leads me to think dead PHI cycles could not produce this problem. The PHI results would get conflicts since they are live out their basic blocks. It the lack of any use in this case that is the problem.
 
But removing a PHI with no uses can easily have ripple effects in that it may have been the only use of some other PHI, etc, etc.  So we're back to DCE type work anyway.

Comment 11 Lucas 2008-09-15 01:03:25 UTC

(In reply to comment #9)
> Thanks for the explanation, for the branch I would recommend an extra DCE
> pass right before pass_del_ssa.  On the trunk we need to make sure to run
> this at -O0 as well.  Note that the simple DCE can leave dead statements
> around, only control-dependent DCE will make sure to not retain any
> DCE opportunities.
> 

Well, I tried just that (running CD-DCE right before Out-of-SSA independent of optimization level) on the trunk, and 16 Fortran testcases blow up (with an ICE in Out-of-SSA) at -O0 due to it.  However, turning optimization on makes them compile again, so running DCE at -O0 isn't such a great idea.  On the other hand, this should be easy enough to fix on the branch, as we can just run the pass if optimization is enabled.

Comment 12 Richard Biener 2008-09-17 14:25:04 UTC

Ok, so I think we should be fine if we ignore PHI nodes with zero-use results
during building the elimination graph - chained unused PHIs will have lifeness
computed for all but the PHI node with the zero-use result.

So, on the 4.3 branch the following fixes the failure for me:

Index: tree-outof-ssa.c
===================================================================
*** tree-outof-ssa.c    (revision 140417)
--- tree-outof-ssa.c    (working copy)
*************** eliminate_build (elim_graph g, basic_blo
*** 321,326 ****
--- 321,329 ----
    
    for (phi = phi_nodes (B); phi; phi = PHI_CHAIN (phi))
      {
+       if (has_zero_uses (PHI_RESULT (phi)))
+       continue;
+ 
        T0 = var_to_partition_to_var (g->map, PHI_RESULT (phi));
        
        /* Ignore results which are not in partitions.  */


I am now giving this testing.

Comment 13 Andrew Macleod 2008-09-17 14:34:04 UTC

I was in the middle of updating this PR and taking possesion :-)

Upon further reflection, I don't think it is acceptable for out-of-ssa to generate incorrect code simply because an optimization wasn't run before it.

So, there are 2 basic choices. ONnis as you state, only there is an additional bit required during the rewrite verification.  

The other option is a very mini dead phi removal done at the same time as virtual phis are eliminated. its only a few lines of code and simple removes PHIs with no uses, and takes care of the ripple effect as well. I have a patch for it as well, its just not completely tested yet.

I'll attach both in a minute, I just wanted you to know I was looking at it :-)

Comment 14 Richard Biener 2008-09-17 14:38:28 UTC

Hm, it doesn't work - bootstrap fails as gengtype segfaults.

Comment 15 Andrew Macleod 2008-09-17 14:39:19 UTC

Created attachment 16343 [details]
potential patch 1

This is the first option which simply doesn't add the result of a PHI with no uses to the partition map.  It passes bootstrap and all tests on x86_64-unknown-linux-gnu with no regressions.

Comment 16 Andrew Macleod 2008-09-17 14:48:04 UTC

Created attachment 16345 [details]
Potential patch #2

This is the other option, eliminate dead PHIs.  compiling all of GCC .c files at -O3, it finds a total of 12 instances of dead phis, and 3 of those were "extra" ones made dead by elimination of the original dead PHI.
 

This patch has not yet been bootstrapped and fully tested, its ongoing now.

Comment 17 Andrew Macleod 2008-09-17 21:55:39 UTC

Created attachment 16348 [details]
Patch submitted

Doh. The last patch for the dead code removal had a disabling check in it that I was using for some testing, so it didn't actually do anything :-P

This is the correct patch which I will be checking in. It has been bootstrapped and passed all tests with no regressions on x86_64-unknown-linux-gnu.

Comment 18 Andrew Macleod 2008-09-18 14:00:18 UTC

Subject: Bug 37102

Author: amacleod
Date: Thu Sep 18 13:58:55 2008
New Revision: 140455

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140455
Log:
fix PR 37102 by having out of ssa remove dead PHI nodes.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr37102.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-outof-ssa.c

Comment 19 Andrew Macleod 2008-09-18 20:04:56 UTC

The patch should be fine for 4.3, but I'll give it a few days in mainline and see if anything shows up on any overnight or weekly runs

Comment 20 littlestar 2008-09-27 11:32:01 UTC

ping...
The bug is not fixed in 4.3-branch.

Comment 21 littlestar 2008-10-10 00:56:10 UTC

Does this patch works well on 4.3?
Thanks!

Comment 22 Lucas 2008-10-12 19:57:48 UTC

amacleod: ping?

Comment 23 Andrew Macleod 2008-10-17 17:37:20 UTC

Subject: Bug 37102

Author: amacleod
Date: Fri Oct 17 17:35:58 2008
New Revision: 141195

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141195
Log:
fix PR tree-optimization/37102

2008-10-17  Andrew MacLeod  <amacleod@redhat.com>

	PR tree-optimization/37102
	* tree-outof-ssa.c (remove_gimple_phi_args): Remove all the PHI args  
	from a node. Check to see if another PHI is dead.
	(eliminate_useless_phis): Rename from eliminate_virtual_phis and remove
	real PHIs which have no uses.
	(rewrite_out_of_ssa): Call eliminate_useless_phis.

2008-10-17  Andrew MacLeod  <amacleod@redhat.com>

	PR tree-optimization/37102
	* gcc.c-torture/execute/pr37102.c: New Test.


Added:
    branches/gcc-4_3-branch/gcc/testsuite/gcc.c-torture/execute/pr37102.c
Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_3-branch/gcc/tree-outof-ssa.c

Comment 24 Richard Biener 2008-10-18 11:17:42 UTC

Fixed then.  Thanks.