Summary: | [4.3 Regression] out-of-SSA is broken | ||
---|---|---|---|
Product: | gcc | Reporter: | John Regehr <regehr> |
Component: | tree-optimization | Assignee: | Andrew Macleod <amacleod> |
Status: | RESOLVED FIXED | ||
Severity: | blocker | CC: | amacleod, cnstar9988, dje, fang, gcc-bugs, pawel_sikora, rguenth |
Priority: | P1 | Keywords: | wrong-code |
Version: | 4.4.0 | ||
Target Milestone: | 4.3.3 | ||
Host: | Target: | ||
Build: | Known to work: | 4.2.4 4.3.3 4.4.0 | |
Known to fail: | 4.3.2 | Last reconfirmed: | 2008-08-13 09:53:31 |
Attachments: |
potential patch 1
Potential patch #2 Patch submitted |
Description
John Regehr
2008-08-13 03:45:21 UTC
-fno-unswitch-loops fixes the original testcase but not the following which only fails _with_ -O3 -fno-unswitch-loops. extern void abort (void); unsigned int g_24; unsigned int g_37 = 1; unsigned int g_225 = 0; int main () { unsigned int l_289; for (l_289 = 1; l_289 < 5; l_289 += 1) { if (g_225) { g_24 = g_37; } } g_24 = g_37; unsigned int context = g_24 << 1; do { if (context) context = (context << 1) ^ 1; } while (0); if (context != 5) abort (); return 0; } Testcase failing with -O3 but not with -O3 -fno-unswitch-loops: extern void abort (void); unsigned int g_24; unsigned int g_37 = 1; unsigned int g_225 = 0; void __attribute__((noinline)) foo (int x) { if (x != 5) abort (); } int main () { unsigned int l_289; for (l_289 = 1; l_289 < 5; l_289 += 1) { if (g_225) { g_24 = g_37; } } g_24 = g_37; unsigned int context = g_24 << 1; do { if (context) context = (context << 1) ^ 1; } while (0); foo (context); return 0; } Somewhat shorter: extern void abort (void); unsigned int a, b = 1, c; void __attribute__ ((noinline)) foo (int x) { if (x != 5) abort (); } int main () { unsigned int d, e; for (d = 1; d < 5; d++) if (c) a = b; a = b; e = a << 1; if (e) e = (e << 1) ^ 1; foo (e); return 0; } At *.tailc looks sane to me. <bb 2>: pretmp.35_15 = c; b.1_5 = b; a_lsm.48_4 = a; if (pretmp.35_15 != 0) goto <bb 3>; else goto <bb 4>; <bb 3>: <bb 4>: # a_lsm.48_28 = PHI <b.1_5(3), a_lsm.48_4(2)> a = b.1_5; e_9 = b.1_5 << 1; if (e_9 != 0) goto <bb 5>; else goto <bb 6>; so a_lsm.48_28 is either b or a, depending on whether c != 0 or not, and a = b. Then copyrename4 decides to replace a = b.1_5; with a = a_lsm_48_5; and at *.uncprop we have: <bb 2>: pretmp.35_15 = c; a_lsm.48_5 = b; a_lsm.48_4 = a; if (pretmp.35_15 != 0) goto <bb 3>; else goto <bb 4>; <bb 3>: <bb 4>: # a_lsm.48_28 = PHI <a_lsm.48_5(3), a_lsm.48_4(2)> a = a_lsm.48_5; e_9 = a_lsm.48_5 << 1; if (e_9 != 0) goto <bb 5>; else goto <bb 6>; which looks correct too. But *.optimized is already wrong: <bb 2>: a_lsm.49 = b; a_lsm.48 = a; if (c != 0) goto <bb 3>; else goto <bb 7>; <bb 7>: a_lsm.49 = a_lsm.48; goto <bb 4>; <bb 3>: <bb 4>: a = a_lsm.49; e = a_lsm.49 << 1; if (e != 0) goto <bb 5>; else goto <bb 8>; While in *.uncprop a was set unconditionally to b, now it is sometimes b and sometimes a. gcc 4.3.2 20080826 failed. ---------------------------- #include <stdio.h> unsigned int g_24; unsigned int g_37 = 1; unsigned char g_225; int main (void) { int l_289; for (l_289 = 1; l_289 < 5; l_289 += 1) { if (g_225) { g_24 = g_37; } } g_24 = g_37; unsigned short context = g_24 << 1; do { if (context) context = (context << 1) ^ 1; } while (0); printf ("%d\n", (int)context); return 0; } 4.3.2 is released, changing milestones to 4.3.3. It looks like that we correctly track liveliness by Live on entry to BB2 : Live on entry to BB3 : a_lsm.27_5 Live on entry to BB4 : a_lsm.27_5 Live on entry to BB5 : Live on entry to BB6 : for # BLOCK 2 freq:2000 # PRED: ENTRY [100.0%] (fallthru,exec) c.0_4 = c; a_lsm.27_5 = b; a_lsm.27_1 = a; if (c.0_4 != 0) goto <bb 3>; else goto <bb 4>; # SUCC: 4 [50.0%] (false,exec) 3 [50.0%] (true,exec) # BLOCK 3 freq:1000 # PRED: 2 [50.0%] (true,exec) # SUCC: 4 [100.0%] (fallthru,exec) # BLOCK 4 freq:2000 # PRED: 3 [100.0%] (fallthru,exec) 2 [50.0%] (false,exec) # a_lsm.27_29 = PHI <a_lsm.27_5(3), a_lsm.27_1(2)> a = a_lsm.27_5; e_9 = a_lsm.27_5 << 1; if (e_9 != 0) goto <bb 5>; else goto <bb 6>; # SUCC: 5 [54.0%] (true,exec) 6 [46.0%] (false,exec) and based on this create the conflict graph. But we still (blindly) generate copies for the (dead) PHI node in BB4 which causes conflicts that we didn't see. So, why exactly do we generate copies for a PHI node which result is not live-in in its BB? Andrew - you are probably most familiar with the out-of-SSA code, can you have a look here? (the dead PHI node is caused by the last DSE pass which removes the last use of it and there is no other DCE pass to clean that up). Jakubs testcase from comment #3 at -O3. My humble "fix" for this would be to exchange 118.cddce with 120.dse which should hide this problem. As on the trunk we are now feeding out-of-SSA with random dead statements at -O0 we should look into this. Or schedule a DCE pass before it if it cannot cope with dead statements. out of ssa generally expects that there is no dead code. I think the original logic was that you never want to generate code for dead statements, so DCE would be run just before out of ssa. It assumes any conflicts caused by any copies which will be inserted are already covered in the conflict graph. In this particular case, if the PHI results was live for even a statement or two, it would conflict with the use of a_lsm.27_5 in the next statement, and then the coalesce would happen with a_lsm.27_1 instead. We'd still get the dead copy, but the code would work fine. That's a reasonable example of why we don't want to feed dead code into outofssa. Figuring out which PHI parameter is TRULY available would be a tricky proposition to catch this case. And its a waste. So we either run dead code just before out of ssa, or take PHI's with no uses and discard them at the same time we discard virtual PHIs (this would be trivial). I wonder if we might eventually find a similar circumstance with a dead PHI cycle that would not be easy to detect without the same logic as a DCE tho. I'd say a DCE pass is generally a better approach, and should be documented as a requirement, or even wired in. Thanks for the explanation, for the branch I would recommend an extra DCE pass right before pass_del_ssa. On the trunk we need to make sure to run this at -O0 as well. Note that the simple DCE can leave dead statements around, only control-dependent DCE will make sure to not retain any DCE opportunities. As long as it removes any dead PHIs, it would be sufficient. Other types of dead statements don't have 'unexpected' side effects across basic block boundaries, and should be handled fine. Other than being a waste of effort of course :-) A little further thinking leads me to think dead PHI cycles could not produce this problem. The PHI results would get conflicts since they are live out their basic blocks. It the lack of any use in this case that is the problem. But removing a PHI with no uses can easily have ripple effects in that it may have been the only use of some other PHI, etc, etc. So we're back to DCE type work anyway. (In reply to comment #9) > Thanks for the explanation, for the branch I would recommend an extra DCE > pass right before pass_del_ssa. On the trunk we need to make sure to run > this at -O0 as well. Note that the simple DCE can leave dead statements > around, only control-dependent DCE will make sure to not retain any > DCE opportunities. > Well, I tried just that (running CD-DCE right before Out-of-SSA independent of optimization level) on the trunk, and 16 Fortran testcases blow up (with an ICE in Out-of-SSA) at -O0 due to it. However, turning optimization on makes them compile again, so running DCE at -O0 isn't such a great idea. On the other hand, this should be easy enough to fix on the branch, as we can just run the pass if optimization is enabled. Ok, so I think we should be fine if we ignore PHI nodes with zero-use results during building the elimination graph - chained unused PHIs will have lifeness computed for all but the PHI node with the zero-use result. So, on the 4.3 branch the following fixes the failure for me: Index: tree-outof-ssa.c =================================================================== *** tree-outof-ssa.c (revision 140417) --- tree-outof-ssa.c (working copy) *************** eliminate_build (elim_graph g, basic_blo *** 321,326 **** --- 321,329 ---- for (phi = phi_nodes (B); phi; phi = PHI_CHAIN (phi)) { + if (has_zero_uses (PHI_RESULT (phi))) + continue; + T0 = var_to_partition_to_var (g->map, PHI_RESULT (phi)); /* Ignore results which are not in partitions. */ I am now giving this testing. I was in the middle of updating this PR and taking possesion :-) Upon further reflection, I don't think it is acceptable for out-of-ssa to generate incorrect code simply because an optimization wasn't run before it. So, there are 2 basic choices. ONnis as you state, only there is an additional bit required during the rewrite verification. The other option is a very mini dead phi removal done at the same time as virtual phis are eliminated. its only a few lines of code and simple removes PHIs with no uses, and takes care of the ripple effect as well. I have a patch for it as well, its just not completely tested yet. I'll attach both in a minute, I just wanted you to know I was looking at it :-) Hm, it doesn't work - bootstrap fails as gengtype segfaults. Created attachment 16343 [details]
potential patch 1
This is the first option which simply doesn't add the result of a PHI with no uses to the partition map. It passes bootstrap and all tests on x86_64-unknown-linux-gnu with no regressions.
Created attachment 16345 [details]
Potential patch #2
This is the other option, eliminate dead PHIs. compiling all of GCC .c files at -O3, it finds a total of 12 instances of dead phis, and 3 of those were "extra" ones made dead by elimination of the original dead PHI.
This patch has not yet been bootstrapped and fully tested, its ongoing now.
Created attachment 16348 [details]
Patch submitted
Doh. The last patch for the dead code removal had a disabling check in it that I was using for some testing, so it didn't actually do anything :-P
This is the correct patch which I will be checking in. It has been bootstrapped and passed all tests with no regressions on x86_64-unknown-linux-gnu.
Subject: Bug 37102 Author: amacleod Date: Thu Sep 18 13:58:55 2008 New Revision: 140455 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140455 Log: fix PR 37102 by having out of ssa remove dead PHI nodes. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr37102.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-outof-ssa.c The patch should be fine for 4.3, but I'll give it a few days in mainline and see if anything shows up on any overnight or weekly runs ping... The bug is not fixed in 4.3-branch. Does this patch works well on 4.3? Thanks! amacleod: ping? Subject: Bug 37102 Author: amacleod Date: Fri Oct 17 17:35:58 2008 New Revision: 141195 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141195 Log: fix PR tree-optimization/37102 2008-10-17 Andrew MacLeod <amacleod@redhat.com> PR tree-optimization/37102 * tree-outof-ssa.c (remove_gimple_phi_args): Remove all the PHI args from a node. Check to see if another PHI is dead. (eliminate_useless_phis): Rename from eliminate_virtual_phis and remove real PHIs which have no uses. (rewrite_out_of_ssa): Call eliminate_useless_phis. 2008-10-17 Andrew MacLeod <amacleod@redhat.com> PR tree-optimization/37102 * gcc.c-torture/execute/pr37102.c: New Test. Added: branches/gcc-4_3-branch/gcc/testsuite/gcc.c-torture/execute/pr37102.c Modified: branches/gcc-4_3-branch/gcc/ChangeLog branches/gcc-4_3-branch/gcc/testsuite/ChangeLog branches/gcc-4_3-branch/gcc/tree-outof-ssa.c Fixed then. Thanks. |