----------------------------------- void bar (unsigned int); unsigned int foo (void) { unsigned int i, j; for (i = 1; i < 30; i++) { j = 2 + 3*i; bar (j); } return j; } ----------------------------------- --> .optimized: foo () { unsigned intD.3 ivtmp.10D.1589; unsigned intD.3 ivtmp.2D.1581; unsigned intD.3 pretmp.1D.1580; unsigned intD.3 pretmp.0D.1579; unsigned intD.3 jD.1567; unsigned intD.3 iD.1566; unsigned intD.3 D.1572; unsigned intD.3 D.1571; # BLOCK 0 # PRED: ENTRY [100.0%] (fallthru,exec) # SUCC: 1 [100.0%] (fallthru,exec) # BLOCK 1 # PRED: 1 [96.5%] (dfs_back,true,exec) 0 [100.0%] (fallthru,exec) # jD.1567_2 = PHI <jD.1567_13(1), 5(0)>; <L0>:; bar (jD.1567_2); jD.1567_13 = jD.1567_2 + 3; if (jD.1567_13 != 92) goto <L0>; else goto <L2>; # SUCC: 1 [96.5%] (dfs_back,true,exec) 2 [3.5%] (loop_exit,false,exec) # BLOCK 2 # PRED: 1 [3.5%] (loop_exit,false,exec) # jD.1567_3 = PHI <jD.1567_2(1)>; <L2>:; return jD.1567_3; # SUCC: EXIT [100.0%] } --> .vars foo () { unsigned intD.3 j.13D.1592; unsigned intD.3 jD.1567; # BLOCK 0 # PRED: ENTRY [100.0%] (fallthru,exec) jD.1567 = 5; # SUCC: 1 [100.0%] (fallthru,exec) # BLOCK 1 # PRED: 3 [100.0%] (fallthru) 0 [100.0%] (fallthru,exec) <L0>:; bar (jD.1567); j.13D.1592 = jD.1567 + 3; if (j.13D.1592 != 92) goto <L8>; else goto <L2>; # SUCC: 3 [96.5%] (dfs_back,true,exec) 2 [3.5%] (loop_exit,false,exec) # BLOCK 3 # PRED: 1 [96.5%] (dfs_back,true,exec) <L8>:; jD.1567 = j.13D.1592; goto <bb 1> (<L0>); # SUCC: 1 [100.0%] (fallthru) # BLOCK 2 # PRED: 1 [3.5%] (loop_exit,false,exec) <L2>:; return jD.1567; # SUCC: EXIT [100.0%] } Note the copy inside the loop, and the extra basic block that is needed for it. This happens because j_2 and j_13 are not coalesced when going out of SSA form. We never again get rid of it later on: foo: pushq %rbx movl $5, %ebx jmp .L2 .p2align 4,,7 .L7: movl %eax, %ebx .L2: movl %ebx, %edi call bar leal 3(%rbx), %eax cmpl $92, %eax jne .L7 movl %ebx, %eax popq %rbx ret This is a regression from GCC 4.0 that is most likely to blame on the copy propagation pass that Diego merged in from the tree-cleanup-branch. It causes a number of performance regressions on various targets. The bug looks somewhat similar to the issue previously discussed in PR18048, but I haven't checked if that bug is also affected by this problem.
Actually, GCC 4.0 has this problem also. GCC 3.3: foo: pushq %rbp movl $1, %ebp pushq %rbx movl $5, %ebx subq $8, %rsp .p2align 4,,7 .L6: movl %ebx, %edi incl %ebp addl $3, %ebx call bar cmpl $29, %ebp jbe .L6 addq $8, %rsp movl $89, %eax popq %rbx popq %rbp ret GCC 4.0: foo: pushq %rbx movl $5, %ebx jmp .L2 .p2align 4,,7 .L7: movl %eax, %ebx .L2: movl %ebx, %edi call bar leal 3(%rbx), %eax cmpl $92, %eax jne .L7 movl %ebx, %eax popq %rbx ret GCC CVS head as of today: foo: pushq %rbx movl $5, %ebx jmp .L2 .p2align 4,,7 .L7: movl %eax, %ebx .L2: movl %ebx, %edi call bar leal 3(%rbx), %eax cmpl $92, %eax jne .L7 movl %ebx, %eax popq %rbx ret
This is IVopts producing unresolvable overlapping live ranges again: t.c.t56.cunroll: foo () { unsigned int pretmp.1; unsigned int pretmp.0; unsigned int j; unsigned int i; unsigned int D.1572; unsigned int D.1571; <bb 0>: # i_12 = PHI <i_9(3), 1(0)>; <L0>:; D.1571_7 = i_12 * 3; j_8 = D.1571_7 + 2; bar (j_8); i_9 = i_12 + 1; if (i_9 <= 29) goto <L7>; else goto <L2>; <L7>:; goto <bb 1> (<L0>); # j_3 = PHI <j_8(1)>; <L2>:; return j_3; } t.c.t57.ivopts: foo () { unsigned int ivtmp.6; unsigned int pretmp.1; unsigned int pretmp.0; unsigned int j; unsigned int i; unsigned int D.1572; unsigned int D.1571; <bb 0>: # ivtmp.6_1 = PHI <ivtmp.6_2(3), 5(0)>; <L0>:; j_8 = ivtmp.6_1; bar (j_8); ivtmp.6_2 = ivtmp.6_1 + 3; if (ivtmp.6_2 != 92) goto <L7>; else goto <L2>; <L7>:; goto <bb 1> (<L0>); # j_3 = PHI <j_8(1)>; <L2>:; return j_3; } Because j_8 escapes from the loop and ivtmp is modified after j_8, we have this problem.
Confirmed and has been a bug since "3.5.0 20040909".
Shouldn't we know what value j has at this point? # j_3 = PHI <j_8(1)>; <L2>:; return j_3; 92 - 3?
Anyways here is a testcase which should not produce a constant after the loop: int bar (unsigned int); unsigned int foo (void) { unsigned int i, j; for (i = 1; i < 30; i++) { j = 2 + 3*i; if (bar (j)) break; } return j; }
Actually, ivopts do not produce any "unresolvable overlapping live ranges". It does not change life range of j_8 at all, and only replaces the variable i by more suitable strength reduced version ivtmp.6. Note that immediately after ivopts, ivtmp.6_2 and ivtmp.6_1 can be coalesced. Some later pass renames all the references to ivtmp.6 to j, thus creating the overlapping life ranges. The loop may become one of the following three possibilities: a) x = 5; while (1) { j = x; bar (j); x = x + 3; if (...) break; } return j; or b) x = 5; while (1) { bar (x); x1 = x + 3; if (...) break; x = x1; } return x; or c) x = 5; while (1) { bar (x); if (...) break; x = x + 3; } return x; a) is the one chosen by ivopts. b) is the one some copy propagation pass converts it to later. c) is not used by ivopts in order to prevent creation of loops with multiple basic blocks (see PR 19038). It would be possible to teach ivopts to handle this special case, but I don't think c) is that much better than a) (one more assignment inside the loop body, but a) is easier to schedule).
Both DOM and copyprop are messing this up.
Subject: Re: [4.0/4.1 regression] Not copy propagating single-argument PHIs causes out-of-ssa coalescing failure On Tue, May 10, 2005 at 11:07:24PM -0000, pinskia at gcc dot gnu dot org wrote: > > ------- Additional Comments From pinskia at gcc dot gnu dot org 2005-05-10 23:07 ------- > Both DOM and copyprop are messing this up. > Be more specific: 1- What exactly is being "messed up". 2- Why is it wrong? 3- How would "fixing" this affect PR 18048? 4- What do you think should be done differently? You know, it is extremely irritating to read these one liners that offer no valuable information at all. It's not the first time you do this, please be more considerate in the future. If you don't have anything useful to contribute, silence is a perfectly valid option. Diego.
(In reply to comment #8) > Be more specific: > > 1- What exactly is being "messed up". Read Zdenek's comment about renaming them. > 2- Why is it wrong? Because now we have "unresolvable overlapping live ranges" > 3- How would "fixing" this affect PR 18048? I don't know if it does. The bug which Steven was referencing was really PR 19038, well there was some discussion in PR 18048 about this problem also. > 4- What do you think should be done differently? Not creating "unresolvable overlapping live ranges", how I don't know.
Subject: Re: [4.0/4.1 regression] Not copy propagating single-argument PHIs causes out-of-ssa coalescing failure On Tue, May 10, 2005 at 11:27:17PM -0000, pinskia at gcc dot gnu dot org wrote: > > ------- Additional Comments From pinskia at gcc dot gnu dot org 2005-05-10 23:27 ------- > (In reply to comment #8) > > > Be more specific: > > > > 1- What exactly is being "messed up". > Read Zdenek's comment about renaming them. > > > 2- Why is it wrong? > > Because now we have "unresolvable overlapping live ranges" > > > 3- How would "fixing" this affect PR 18048? > I don't know if it does. The bug which Steven was referencing was really PR 19038, well there was some > discussion in PR 18048 about this problem also. > > > 4- What do you think should be done differently? > Not creating "unresolvable overlapping live ranges", how I don't know. > You really didn't understand my point, did you? You needlessly added noise to the PR. Nothing you said was new information nor was it useful to the person fixing this problem. I appreciate the triaging work that you do, but I don't appreciate random noise in PRs, particularly when that implies more mail in my inbox. Diego.
Here is an example which shows that IV-OPTs is not the cause and that we have the same issue and shows what is going wrong clearer: int k(int); int h(void); long h1(void); int f(int i) { int oldii; int ii; ii = h1(); do { oldii = ii; k(ii); ii = oldii + h(); } while (i<ii); return oldii; } We remove the use of oldii and use ii_n instead, I think we are coalescing the wrong two ii_n and not the two inside the loop but the one that was called oldii with the one before the loop.
Like Diego, I'd like to understand this PR better. Not with a guess, but with a concrete explanation. If there's an overlapping live range, what range is it? In order to understand these kinds of optimization PRs, we really need a better level of detail.
Doesn't happen anymore on current mainline.