Bug 28868 - [4.3/4.4 Regression] Not eliminating the PHIs which have the same arguments
Summary: [4.3/4.4 Regression] Not eliminating the PHIs which have the same arguments
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.2.0
: P2 minor
Target Milestone: 4.5.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 31914
Blocks: 23305
  Show dependency treegraph
 
Reported: 2006-08-28 05:52 UTC by Andrew Pinski
Modified: 2010-02-26 11:05 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 3.3.3 3.2.3 2.95.3 4.5.0
Known to fail: 3.4.0 4.0.0 4.0.4 4.1.0 4.2.0 3.0.4
Last reconfirmed: 2009-03-25 13:18:37


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2006-08-28 05:52:52 UTC
These two functions should produce the same asm but they don't:
int f(int t, int a, int b)
{
  int c, d;
  if (t)
  {
    c = a;
    d = a;
  }
  else
  {
    c = b;
    d = b;
  }
  return c+d;
}

int f1(int t, int a, int b)
{
  int c;
  c = t?a:b;
  return c+c;
}

----------------
f1 is better optimized because it has less register pressure than f.
Note this was reduced from a problem in PR23305.
Comment 1 Andrew Pinski 2006-08-28 06:01:26 UTC
This is weird, this code is a regression also.
Comment 2 Andrew Pinski 2006-08-28 06:06:50 UTC
Evening adding one more variable causes 3.2.3 to optimize this.  I don't know what does the optimization on the rtl level.
Comment 3 Andrew Pinski 2006-08-28 06:43:25 UTC
If I do this:
int f(int t, int a, int b)
{
  int c, d, e;
  if (t)
  {
    c = a+1;
    d = a+1;
    e = a+1;
  }
  else
  {
    c = b+1;
    d = b+1;
    e = b+1;
  }
  return c+d+e;
}

We get the extra moves elimintated (though not at the tree level).
Comment 4 Steven Bosscher 2006-08-28 13:59:32 UTC
From the hammer branch for AMD64:

.globl f
        .type   f, @function
f:
.LFB4:
        testl   %edi, %edi
        movl    %esi, %eax
        jne     .L3
        movl    %edx, %esi
        movl    %edx, %eax
.L3:
        leal    (%rax,%rsi), %eax
        ret
.LFE4:
        .size   f, .-f
        .p2align 4,,15
.globl f1
        .type   f1, @function
f1:
.LFB5:
        testl   %edi, %edi
        cmove   %edx, %esi
        leal    (%rsi,%rsi), %eax
        ret
.LFE5:
        .size   f1, .-f1


And from gcc 4.2 20060818:

.globl f
        .type   f, @function
f:
.LFB2:
        testl   %edi, %edi
        movl    %esi, %eax
        cmove   %edx, %esi
        cmove   %esi, %eax
        addl    %esi, %eax
        ret
.LFE2:
        .size   f, .-f
        .p2align 4,,15
.globl f1
        .type   f1, @function
f1:
.LFB3:
        testl   %edi, %edi
        cmove   %edx, %esi
        leal    (%rsi,%rsi), %eax
        ret
.LFE3:
        .size   f1, .-f1


So not all gcc3 releases do so well.  Are there GCC releases that optimize the two functions to identical code?

In any case, this is a missed optimization.  I suppose the trick in this case is to recognise that "c + d" == "c + c" (perhaps during value numbering?), but the first step to analyze this bug would be to figure out where gcc3 (supposedly) performs this optimization.

Comment 5 Andrew Pinski 2006-08-28 14:37:29 UTC
(In reply to comment #4)
> So not all gcc3 releases do so well.  Are there GCC releases that optimize the
> two functions to identical code?

Yes (FSF) 3.2.3.
.globl f
        .type   f,@function
f:
        movl    4(%esp), %eax
        testl   %eax, %eax
        movl    12(%esp), %eax
        cmovne  8(%esp), %eax
        addl    %eax, %eax
        ret
.Lfe1:
        .size   f,.Lfe1-f
        .p2align 4,,15
.globl f1
        .type   f1,@function
f1:
        movl    4(%esp), %edx
        movl    12(%esp), %eax
        testl   %edx, %edx
        cmovne  8(%esp), %eax
        addl    %eax, %eax
        ret
.Lfe2:
        .size   f1,.Lfe2-f1
        .ident  "GCC: (GNU) 3.2.3"
Comment 6 Daniel Berlin 2006-08-29 14:59:33 UTC
SCCVN comes up with

Value numbers:
d_2 = c_1
c_7 = b_6
d_8 = b_6
c_10 = a_9
d_11 = a_9


where
  # PRED: 3 (fallthru,exec) 4 (fallthru,exec)
  # dD.1526_2 = PHI <dD.1526_11(3), dD.1526_8(4)>;
  # cD.1525_1 = PHI <cD.1525_10(3), cD.1525_7(4)>;
...


As a result, once integrated into PRE/FRE, it will eliminate uses of the d_2 phi with the c_1 phi, which is what you want.

Comment 7 Andrew Pinski 2006-08-30 04:44:44 UTC
(In reply to comment #6)
> As a result, once integrated into PRE/FRE, it will eliminate uses of the d_2
> phi with the c_1 phi, which is what you want.
Thanks, I was expecting that the new VN would fix this.
Comment 8 Gabriel Dos Reis 2007-02-03 19:38:09 UTC
won't fix in GCC-4.0.x.  Adjusting milestone.
Comment 9 Andrew Pinski 2007-07-01 01:09:42 UTC
We get in .fre now:
Value numbers:
d_2 = c_1 

But we still get the PHIs:
  # d_2 = PHI <a_4(D)(2), b_7(D)(3)>
  # c_1 = PHI <a_4(D)(2), b_7(D)(3)>
Comment 10 Richard Biener 2007-11-04 15:45:58 UTC
FRE doesn't replace a SSA_NAME with another SSA_NAME.  So even though VN
figured that e_3 == d_2 == c_1 it doesn't replace them here:

SCCVN says e_3 value numbers to c_1 (VH.5)
SCCVN says d_2 value numbers to c_1 (VH.5)

<bb 5>:
  # e_3 = PHI <e_8(3), e_12(4)>
  # d_2 = PHI <d_7(3), d_11(4)>
  # c_1 = PHI <c_6(3), c_10(4)>
  D.1182_13 = c_1 + d_2;
  D.1181_14 = D.1182_13 + e_3;

the next copyprop pass makes them look the same:

<bb 5>:
  # e_3 = PHI <c_6(3), c_10(4)>
  # d_2 = PHI <c_6(3), c_10(4)>
  # c_1 = PHI <c_6(3), c_10(4)>
  D.1182_13 = c_1 + d_2;
  D.1181_14 = D.1182_13 + e_3;
  return D.1181_14;

but that's it.  See also PR31914.
Comment 11 Daniel Berlin 2007-11-04 19:24:38 UTC
Subject: Re:  [4.0/4.1/4.2/4.3 Regression] Not elimintating the PHIs which have the same arguments

On 4 Nov 2007 15:45:59 -0000, rguenth at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #10 from rguenth at gcc dot gnu dot org  2007-11-04 15:45 -------
> FRE doesn't replace a SSA_NAME with another SSA_NAME


Also true. It will only replace non-ssa names with ssa-names.

This is to avoid lengthening the range of variables.

Replacing ssa names with other ssa names willy-nilly is not always a
win.  We eventually ended up with heuristics to not change loop depths
of ssa names, etc.
Comment 12 sebpop@gmail.com 2007-11-05 06:13:29 UTC
Subject: Re:  [4.0/4.1/4.2/4.3 Regression] Not elimintating the PHIs which have the same arguments

> Replacing ssa names with other ssa names willy-nilly is not always a
> win.  We eventually ended up with heuristics to not change loop depths
> of ssa names, etc.
>

See also PR23821, where we reach the exact same conclusion:
DOM and VRP are playing the replace SSA_NAMEs game, and
we're losing to this game as the substitution is done randomly...
Comment 13 Joseph S. Myers 2008-07-04 21:28:42 UTC
Closing 4.1 branch.
Comment 14 Paolo Bonzini 2009-02-04 07:40:33 UTC
We should make up our mind and close either this one or 23821... they are requesting exactly opposite things.
Comment 15 Richard Biener 2009-02-04 10:14:52 UTC
I think this is bug is a valid request for FRE to eliminate redundant PHI
nodes:

<bb 4>:
  # c_1 = PHI <a_4(D)(2), b_7(D)(3)>
  # d_2 = PHI <a_4(D)(2), b_7(D)(3)>
  D.1599_10 = c_1 + d_2;

There are no two same PHI nodes in PR23821.

I am going to fix this one.
Comment 16 Richard Biener 2009-03-25 13:18:36 UTC
Re-confirmed.  The fix is to eliminate duplicate PHI nodes in eliminate() by
copy-propagating PHI value-numbers.
Comment 17 Joseph S. Myers 2009-03-31 19:41:06 UTC
Closing 4.2 branch.
Comment 18 Richard Biener 2009-04-04 17:57:40 UTC
Patch posted:
http://gcc.gnu.org/ml/gcc-patches/2009-04/msg00311.html

it will replace redundant PHI nodes with a copy, so

<bb 4>:
  # c_1 = PHI <a_4(D)(2), b_7(D)(3)>
  # d_2 = PHI <a_4(D)(2), b_7(D)(3)>
  D.1599_10 = c_1 + d_2;

will become

<bb 4>:
  # c_1 = PHI <a_4(D)(2), b_7(D)(3)>
  d_2 = c_1
  D.1599_10 = c_1 + d_2;

to not immediately trigger PR23821 (of course the next copyprop will
happily propagate c_1 into all uses of d_2).  It does replace single-uses
though, which does not increase register pressure.
Comment 19 Richard Biener 2009-04-06 14:55:47 UTC
Subject: Bug 28868

Author: rguenth
Date: Mon Apr  6 14:55:31 2009
New Revision: 145607

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=145607
Log:
2009-04-06  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/28868
	* tree-ssa-pre.c (inserted_phi_names): New bitmap to keep track
	of which PHI results we inserted.
	(insert_into_preds_of_block): Record inserted PHIs.
	(eliminate): Eliminate redundant PHI nodes.
	(init_pre): Init inserted_phi_names.

	* gcc.dg/tree-ssa/ssa-fre-21.c: New testcase.
	* gcc.dg/tree-ssa/ssa-sccvn-1.c: Adjust.
	* gcc.dg/tree-ssa/ssa-sccvn-2.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sccvn-4.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-23.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-2.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-4.c
    trunk/gcc/tree-ssa-pre.c

Comment 20 Richard Biener 2009-04-06 14:56:14 UTC
Fixed for GCC 4.5.0.
Comment 21 Richard Biener 2009-08-04 12:27:55 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 22 Jeffrey A. Law 2010-02-26 01:34:33 UTC
Fixed on the trunk eons ago.  Fix not really suitable for backporting to release branches.