21488 – [4.0/4.1 regression] Not copy propagating single-argument PHIs causes out-of-ssa coalescing failure

Bug 21488 - [4.0/4.1 regression] Not copy propagating single-argument PHIs causes out-of-ssa coalescing failure

Summary: [4.0/4.1 regression] Not copy propagating single-argument PHIs causes out-of-...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	4.1.0

Importance:	P2 minor
Target Milestone:	4.0.3
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization, TREE

Depends on:	19038
Blocks:
	Show dependency tree / graph

Reported:	2005-05-10 12:48 UTC by Steven Bosscher
Modified:	2005-11-13 01:38 UTC (History)
CC List:	4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2005-05-10 20:32:24

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Steven Bosscher 2005-05-10 12:48:25 UTC

----------------------------------- 
void bar (unsigned int); 
 
unsigned int 
foo (void) 
{ 
  unsigned int i, j; 
 
  for (i = 1; i < 30; i++) 
    { 
      j = 2 + 3*i; 
      bar (j); 
    } 
  return j; 
} 
----------------------------------- 
 
--> .optimized: 
 
foo () 
{ 
  unsigned intD.3 ivtmp.10D.1589; 
  unsigned intD.3 ivtmp.2D.1581; 
  unsigned intD.3 pretmp.1D.1580; 
  unsigned intD.3 pretmp.0D.1579; 
  unsigned intD.3 jD.1567; 
  unsigned intD.3 iD.1566; 
  unsigned intD.3 D.1572; 
  unsigned intD.3 D.1571; 
 
  # BLOCK 0 
  # PRED: ENTRY [100.0%]  (fallthru,exec) 
  # SUCC: 1 [100.0%]  (fallthru,exec) 
 
  # BLOCK 1 
  # PRED: 1 [96.5%]  (dfs_back,true,exec) 0 [100.0%]  (fallthru,exec) 
  # jD.1567_2 = PHI <jD.1567_13(1), 5(0)>; 
<L0>:; 
  bar (jD.1567_2); 
  jD.1567_13 = jD.1567_2 + 3; 
  if (jD.1567_13 != 92) goto <L0>; else goto <L2>; 
  # SUCC: 1 [96.5%]  (dfs_back,true,exec) 2 [3.5%]  (loop_exit,false,exec) 
 
  # BLOCK 2 
  # PRED: 1 [3.5%]  (loop_exit,false,exec) 
  # jD.1567_3 = PHI <jD.1567_2(1)>; 
<L2>:; 
  return jD.1567_3; 
  # SUCC: EXIT [100.0%] 
 
} 
 
 
--> .vars 
 
foo () 
{ 
  unsigned intD.3 j.13D.1592; 
  unsigned intD.3 jD.1567; 
 
  # BLOCK 0 
  # PRED: ENTRY [100.0%]  (fallthru,exec) 
  jD.1567 = 5; 
  # SUCC: 1 [100.0%]  (fallthru,exec) 
 
  # BLOCK 1 
  # PRED: 3 [100.0%]  (fallthru) 0 [100.0%]  (fallthru,exec) 
<L0>:; 
  bar (jD.1567); 
  j.13D.1592 = jD.1567 + 3; 
  if (j.13D.1592 != 92) goto <L8>; else goto <L2>; 
  # SUCC: 3 [96.5%]  (dfs_back,true,exec) 2 [3.5%]  (loop_exit,false,exec) 
 
  # BLOCK 3 
  # PRED: 1 [96.5%]  (dfs_back,true,exec) 
<L8>:; 
  jD.1567 = j.13D.1592; 
  goto <bb 1> (<L0>); 
  # SUCC: 1 [100.0%]  (fallthru) 
 
  # BLOCK 2 
  # PRED: 1 [3.5%]  (loop_exit,false,exec) 
<L2>:; 
  return jD.1567; 
  # SUCC: EXIT [100.0%] 
 
} 
 
Note the copy inside the loop, and the extra basic block that is needed 
for it.  This happens because j_2 and j_13 are not coalesced when going 
out of SSA form.  We never again get rid of it later on: 
 
foo: 
        pushq   %rbx 
        movl    $5, %ebx 
        jmp     .L2 
        .p2align 4,,7 
.L7: 
        movl    %eax, %ebx 
.L2: 
        movl    %ebx, %edi 
        call    bar 
        leal    3(%rbx), %eax 
        cmpl    $92, %eax 
        jne     .L7 
        movl    %ebx, %eax 
        popq    %rbx 
        ret 
 
This is a regression from GCC 4.0 that is most likely to blame on the  
copy propagation pass that Diego merged in from the tree-cleanup-branch. 
It causes a number of performance regressions on various targets. 
 
The bug looks somewhat similar to the issue previously discussed in 
PR18048, but I haven't checked if that bug is also affected by this 
problem.

Comment 1 Steven Bosscher 2005-05-10 13:29:10 UTC

Actually, GCC 4.0 has this problem also. 
 
GCC 3.3: 
foo: 
        pushq   %rbp 
        movl    $1, %ebp 
        pushq   %rbx 
        movl    $5, %ebx 
        subq    $8, %rsp 
        .p2align 4,,7 
.L6: 
        movl    %ebx, %edi 
        incl    %ebp 
        addl    $3, %ebx 
        call    bar 
        cmpl    $29, %ebp 
        jbe     .L6 
        addq    $8, %rsp 
        movl    $89, %eax 
        popq    %rbx 
        popq    %rbp 
        ret 
 
GCC 4.0: 
foo: 
        pushq   %rbx 
        movl    $5, %ebx 
        jmp     .L2 
        .p2align 4,,7 
.L7: 
        movl    %eax, %ebx 
.L2: 
        movl    %ebx, %edi 
        call    bar 
        leal    3(%rbx), %eax 
        cmpl    $92, %eax 
        jne     .L7 
        movl    %ebx, %eax 
        popq    %rbx 
        ret 
 
GCC CVS head as of today: 
foo: 
        pushq   %rbx 
        movl    $5, %ebx 
        jmp     .L2 
        .p2align 4,,7 
.L7: 
        movl    %eax, %ebx 
.L2: 
        movl    %ebx, %edi 
        call    bar 
        leal    3(%rbx), %eax 
        cmpl    $92, %eax 
        jne     .L7 
        movl    %ebx, %eax 
        popq    %rbx 
        ret

Comment 2 Steven Bosscher 2005-05-10 14:14:53 UTC

This is IVopts producing unresolvable overlapping live ranges again: 
 
t.c.t56.cunroll: 
foo () 
{ 
  unsigned int pretmp.1; 
  unsigned int pretmp.0; 
  unsigned int j; 
  unsigned int i; 
  unsigned int D.1572; 
  unsigned int D.1571; 
 
<bb 0>: 
 
  # i_12 = PHI <i_9(3), 1(0)>; 
<L0>:; 
  D.1571_7 = i_12 * 3; 
  j_8 = D.1571_7 + 2; 
  bar (j_8); 
  i_9 = i_12 + 1; 
  if (i_9 <= 29) goto <L7>; else goto <L2>; 
 
<L7>:; 
  goto <bb 1> (<L0>); 
 
  # j_3 = PHI <j_8(1)>; 
<L2>:; 
  return j_3; 
 
} 
 
 
t.c.t57.ivopts: 
foo () 
{ 
  unsigned int ivtmp.6; 
  unsigned int pretmp.1; 
  unsigned int pretmp.0; 
  unsigned int j; 
  unsigned int i; 
  unsigned int D.1572; 
  unsigned int D.1571; 
 
<bb 0>: 
 
  # ivtmp.6_1 = PHI <ivtmp.6_2(3), 5(0)>; 
<L0>:; 
  j_8 = ivtmp.6_1; 
  bar (j_8); 
  ivtmp.6_2 = ivtmp.6_1 + 3; 
  if (ivtmp.6_2 != 92) goto <L7>; else goto <L2>; 
 
<L7>:; 
  goto <bb 1> (<L0>); 
 
  # j_3 = PHI <j_8(1)>; 
<L2>:; 
  return j_3; 
 
} 
 
Because j_8 escapes from the loop and ivtmp is modified after j_8, we 
have this problem.

Comment 3 Andrew Pinski 2005-05-10 17:48:10 UTC

Confirmed and has been a bug since "3.5.0 20040909".

Comment 4 Andrew Pinski 2005-05-10 20:32:24 UTC

Shouldn't we know what value j has at this point?
  # j_3 = PHI <j_8(1)>;
<L2>:;
  return j_3;

92 - 3?

Comment 5 Andrew Pinski 2005-05-10 20:33:25 UTC

Anyways here is a testcase which should not produce a constant after the loop:
int bar (unsigned int);

unsigned int
foo (void)
{
  unsigned int i, j;

  for (i = 1; i < 30; i++)
    {
      j = 2 + 3*i;
      if (bar (j))
        break;
    }
  return j;
}

Comment 6 Zdenek Dvorak 2005-05-10 22:28:48 UTC

Actually, ivopts do not produce any "unresolvable overlapping live ranges".
It does not change life range of j_8 at all, and only replaces the variable
i by more suitable strength reduced version ivtmp.6.  Note that immediately
after ivopts, ivtmp.6_2 and ivtmp.6_1 can be coalesced.

Some later pass renames all the references to ivtmp.6 to j, thus creating the
overlapping life ranges.

The loop may become one of the following three possibilities:

a)

x = 5;
while (1)
  {
    j = x;
    bar (j);

    x = x + 3;
    if (...)
      break;
  }

return j;

or

b)

x = 5;
while (1)
  {
    bar (x);

    x1 = x + 3;
    if (...)
      break;
    x = x1;
  }
return x;

or

c)

x = 5;
while (1)
  {
    bar (x);

    if (...)
      break;
    x = x + 3;
  }
return x;

a) is the one chosen by ivopts.
b) is the one some copy propagation pass converts it to later.
c) is not used by ivopts in order to prevent creation of loops with multiple
   basic blocks (see PR 19038).  It would be possible to teach ivopts to handle
   this special case, but I don't think c) is that much better than a) (one more
   assignment inside the loop body, but a) is easier to schedule).

Comment 7 Andrew Pinski 2005-05-10 23:07:23 UTC

Both DOM and copyprop are messing this up.

Comment 8 Diego Novillo 2005-05-10 23:21:18 UTC

Subject: Re:  [4.0/4.1 regression] Not copy propagating single-argument PHIs causes out-of-ssa coalescing failure

On Tue, May 10, 2005 at 11:07:24PM -0000, pinskia at gcc dot gnu dot org wrote:
> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2005-05-10 23:07 -------
> Both DOM and copyprop are messing this up.
> 
Be more specific:

1- What exactly is being "messed up".
2- Why is it wrong?
3- How would "fixing" this affect PR 18048?
4- What do you think should be done differently?

You know, it is extremely irritating to read these one liners
that offer no valuable information at all.

It's not the first time you do this, please be more considerate
in the future.  If you don't have anything useful to contribute,
silence is a perfectly valid option.

Diego.

Comment 9 Andrew Pinski 2005-05-10 23:27:16 UTC

(In reply to comment #8)

> Be more specific:
> 
> 1- What exactly is being "messed up".
Read Zdenek's comment about renaming them.

> 2- Why is it wrong?

Because now we have "unresolvable overlapping live ranges"

> 3- How would "fixing" this affect PR 18048?
I don't know if it does.  The bug which Steven was referencing was really PR 19038, well there was some 
discussion in PR 18048 about this problem also.

> 4- What do you think should be done differently?
Not creating "unresolvable overlapping live ranges", how I don't know.

Comment 10 Diego Novillo 2005-05-10 23:33:58 UTC

Subject: Re:  [4.0/4.1 regression] Not copy propagating single-argument PHIs causes out-of-ssa coalescing failure

On Tue, May 10, 2005 at 11:27:17PM -0000, pinskia at gcc dot gnu dot org wrote:
> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2005-05-10 23:27 -------
> (In reply to comment #8)
> 
> > Be more specific:
> > 
> > 1- What exactly is being "messed up".
> Read Zdenek's comment about renaming them.
> 
> > 2- Why is it wrong?
> 
> Because now we have "unresolvable overlapping live ranges"
> 
> > 3- How would "fixing" this affect PR 18048?
> I don't know if it does.  The bug which Steven was referencing was really PR 19038, well there was some 
> discussion in PR 18048 about this problem also.
> 
> > 4- What do you think should be done differently?
> Not creating "unresolvable overlapping live ranges", how I don't know.
> 
You really didn't understand my point, did you?  You needlessly
added noise to the PR.  Nothing you said was new information nor
was it useful to the person fixing this problem.

I appreciate the triaging work that you do, but I don't
appreciate random noise in PRs, particularly when that implies
more mail in my inbox.


Diego.

Comment 11 Andrew Pinski 2005-08-11 19:43:06 UTC

Here is an example which shows that IV-OPTs is not the cause and that we have the same issue and 
shows what is going wrong clearer:
int k(int);
int h(void);
long h1(void);

int f(int i)
{
  int oldii;
  int ii;
  ii = h1();
  do
  {
    oldii = ii;
    k(ii);
    ii = oldii + h();
  } while (i<ii);
  return oldii;
}

We remove the use of oldii and use ii_n instead,  I think we are coalescing the wrong two ii_n and not 
the two inside the loop but the one that was called oldii with the one before the loop.

Comment 12 Mark Mitchell 2005-10-31 03:27:02 UTC

Like Diego, I'd like to understand this PR better.  Not with a guess, but with a concrete explanation.  If there's an overlapping live range, what range is it?  In order to understand these kinds of optimization PRs, we really need a better level of detail.

Comment 13 Steven Bosscher 2005-11-13 01:38:51 UTC

Doesn't happen anymore on current mainline.