Bug 19431 - missed optimization with ifs and deferencing
Summary: missed optimization with ifs and deferencing
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.0.0
: P2 enhancement
Target Milestone: 4.3.0
Assignee: Richard Biener
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: missed-optimization
: 30261 30738 (view as bug list)
Depends on:
Blocks: 21463 22501
  Show dependency treegraph
 
Reported: 2005-01-13 21:38 UTC by Andrew Pinski
Modified: 2007-04-18 19:21 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2006-12-20 09:58:09


Attachments
patch to fix testcase in comment 10 (513 bytes, patch)
2007-02-25 22:51 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2005-01-13 21:38:34 UTC
I found this while looking into PR 8361 for missed optimization.
The following two programs should produce the same asm:
int f(int k, int i1, int j1)
{
  int *f1;
  if(k)
   f1 = &i1;
  else
   f1 = &j1;
  return *f1;
}
int g(int k, int i1, int j1)
{
  int *f1;
  if(k)
   return i1;
  else
   return j1;
}
Comment 1 Steven Bosscher 2005-01-13 21:41:59 UTC
...and of course you are going to show what you *actually* get now? 
 
Comment 2 Andrew Pinski 2005-01-13 21:43:44 UTC
This is what I get with the mainline on ppc-darwin:
_f:
        cmpwi cr7,r3,0
        stw r4,28(r1)
        stw r5,32(r1)
        addi r3,r1,28
        beq- cr7,L7
        lwz r3,0(r3)
        blr
L7:
        addi r3,r1,32
        lwz r3,0(r3)
        blr
        .align 2
        .globl _g
_g:
        cmpwi cr7,r3,0
        mr r3,r4
        bnelr+ cr7
        mr r3,r5
        blr

Note the stores in the f case but non otherwise.
Comment 3 Andrew Pinski 2005-01-13 21:54:24 UTC
This comes from "std::_Deque_base<_Tp, _Alloc>::_M_initialize_map" in libstdc++.

Here is the last SSA tree dump:
f (k, i1, j1)
{
  int * f1;
  int D.1116;

<bb 0>:
  if (k_2 != 0) goto <L2>; else goto <L4>;

<L4>:;

  # f1_1 = PHI <&i1(0), &j1(1)>;
<L2>:;
  #   VUSE <i1_7>;
  #   VUSE <j1_8>;
  D.1116_3 = *f1_1;
  return D.1116_3;

}

And the .final_cleanup one:
f (k, i1, j1)
{
  int * f1;

<bb 0>:
  if (k != 0) goto <L5>; else goto <L4>;

<L5>:;
  f1 = &i1;
  goto <bb 3> (<L2>);

<L4>:;
  f1 = &j1;

<L2>:;
  return *f1;

}
Comment 4 Daniel Berlin 2005-01-15 00:22:02 UTC
Subject: Re:  New: missed optimization with
	ifs and deferencing

On Thu, 2005-01-13 at 21:38 +0000, pinskia at gcc dot gnu dot org wrote:
> I found this while looking into PR 8361 for missed optimization.
> The following two programs should produce the same asm:
> int f(int k, int i1, int j1)
> {
>   int *f1;
>   if(k)
>    f1 = &i1;
>   else
>    f1 = &j1;
>   return *f1;
> }


The only way you are going to get this to optimize is to make it see we
are derefencing a phi whose arguments are all is_gimple_min_invariant,
substitute the value in, merge it, and then use the that at the
derefence point.

This is actually just a special case of noticing that we are really
doing *&<something> in both cases, and removing the *& part, while
leaving the rest.


IE transform it like so

Step 0:
f (k, i1, j1)
{
  int * f1;
  int D.1116;

<bb 0>:
  if (k_2 != 0) goto <L2>; else goto <L4>;

<L4>:;

  # f1_1 = PHI <&i1(0), &j1(1)>;
<L2>:;
  #   VUSE <i1_7>;
  #   VUSE <j1_8>;
  D.1116_3 = *f1_1;
  return D.1116_3;

}
Step 1 (new blocks added for clarity)

f (k, i1, j1)
{
  int * f1;
  int D.1116;

<bb 0>:
  if (k_2 != 0) goto <L3>; else goto <L4>;

<L4>:;
newtemp_2 = i1;
goto L2

<L3>
newtemp_1 = j1;

f1_1 = PHI<&i1(0), &j1(1)>
<L2>:;
  #   VUSE <i1_7>;
  #   VUSE <j1_8>;
  D.1116_3 = *f1_1;
  return D.1116_3;

}

Step 2:
f (k, i1, j1)
{
  int * f1;
  int D.1116;

<bb 0>:
  if (k_2 != 0) goto <L3>; else goto <L4>;

<L4>:;
newtemp_2 = i1;
goto L2

<L3>
newtemp_1 = j1;

newphi = PHI<newtemp_2(0), newtemp_1(1)>
<L2>:;
  #   VUSE <i1_7>;
  #   VUSE <j1_8>;
  D.1116_3 = *f1_1;
  return D.1116_3;

}

Step 3:
f (k, i1, j1)
{
  int * f1;
  int D.1116;

<bb 0>:
  if (k_2 != 0) goto <L3>; else goto <L4>;

<L4>:;
newtemp_2 = i1;
goto L2

<L3>
newtemp_1 = j1;

newphi = PHI<newtemp_2(0), newtemp_1(1)>
<L2>:;
  D.1116_3 = newphi;
  return D.1116_3;

}

Notice that all i've done is simply match up that the dereference was of
a phi node whoes arguments are all &something.
In that specific case, we can remove all the addressing operations and
the dereference use, and just replace it with the values.



Comment 5 Andrew Pinski 2006-02-05 15:37:46 UTC
I should mention this shows up with std::min/std::max like:

int main(int argc)
{
  int a = std::min(8, argc*2);
  return a;
}
Comment 6 Andrew Pinski 2006-12-19 21:16:11 UTC
*** Bug 30261 has been marked as a duplicate of this bug. ***
Comment 7 Steven Bosscher 2006-12-20 09:58:09 UTC
Maybe we can do something with this in PHIopt...
Comment 8 Andrew Pinski 2007-02-09 14:46:01 UTC
*** Bug 30738 has been marked as a duplicate of this bug. ***
Comment 9 Richard Biener 2007-02-09 14:48:38 UTC
I have a patch^Whack.
Comment 10 Pawel Sikora 2007-02-25 19:00:07 UTC
one more testcase:

#include <tr1/functional>
#include <algorithm>
extern void assign( long* variable, long v )
{
        std::transform( variable, variable + 1, variable,
                std::tr1::bind( std::plus< long >(), 0L, v ) );
}
extern void assign( long& variable, long v )
{
        std::transform( &variable, &variable + 1, &variable,
                std::tr1::bind( std::plus< long >(), 0L, v ) );
}

at -O3 gcc 4.2 produces:

assign(long&, long):
        movq    -40(%rsp), %rax
        movq    %rsi, -24(%rsp)
        movq    $0, -32(%rsp)
        movq    $0, -64(%rsp)
        movq    %rsi, -56(%rsp)
        movq    %rsi, (%rdi)
        movq    %rax, -72(%rsp)
        ret

assign(long*, long):
        movq    %rsi, (%rdi)
        ret
Comment 11 Richard Biener 2007-02-25 19:52:36 UTC
Bonus points if you can make that self-contained ;)
Comment 12 Pawel Sikora 2007-02-25 20:09:22 UTC
(In reply to comment #11)
> Bonus points if you can make that self-contained ;)

-fdump-tree-optimized shows the same code for both variants
and there is the if-with-dereferencing :)

(...)
  if (variable != variable + 8B) goto <L6>; else goto <L2>;
(...)
Comment 13 Andrew Pinski 2007-02-25 22:01:45 UTC
>   if (variable != variable + 8B) goto <L6>; else goto <L2>;

That issue was fixed just yesterday by PR 30951.
Comment 14 Richard Biener 2007-02-25 22:23:39 UTC
Not really as this is only exposed by TER.  Though I remember forwprop catching similiar stuff before.  I'll have a look.
Comment 15 Richard Biener 2007-02-25 22:50:09 UTC
We start with

  D.59982_3 = variable_2(D) + 4B;
  __unary_op = __unary_op.65;
  goto <bb 4> (<L1>);

<L0>:;
  D.61146_10 = __unary_op._M_arg1;
  D.61147_11 = __unary_op._M_arg2;
  D.61093_12 = D.61146_10 + D.61147_11;
  *variable_13 = D.61093_12;
  variable_14 = variable_7 + 4B;
  variable_15 = variable_13 + 4B;

  # variable_7 = PHI <variable_2(D)(2), variable_14(3)>
  # variable_13 = PHI <variable_2(D)(2), variable_15(3)>
<L1>:;
  if (D.59982_3 != variable_7) goto <L0>; else goto <L2>;

and loop header copying and DOM make it optimizable:

  D.59982_3 = variable_2(D) + 4B;
  __unary_op.65._M_arg2 = v_1(D);
  __unary_op.65._M_arg1 = 0;
  __unary_op = __unary_op.65;
  __unary_op$_M_arg2_54 = __unary_op._M_arg2;
  __unary_op$_M_arg1_53 = __unary_op._M_arg1;
  if (variable_2(D) != D.59982_3) goto <L7>; else goto <L2>;

<L2>:;
  return;

  # variable_49 = PHI <variable_2(D)(2)>
  # variable_27 = PHI <variable_2(D)(2)>
<L7>:;
  __unary_op$_M_arg1_25 = __unary_op$_M_arg1_53;
  __unary_op$_M_arg2_17 = __unary_op$_M_arg2_54;
  D.61093_18 = __unary_op$_M_arg1_53 + __unary_op$_M_arg2_54;
  *variable_49 = D.61093_18;
  variable_20 = variable_27 + 4B;
  variable_21 = variable_49 + 4B;
  goto <bb 3> (<L2>);

so a tweaked forwprop fixes it at least before PRE.  But dunno if we want
to do tree combining on conditions here.
Comment 16 Richard Biener 2007-02-25 22:51:19 UTC
Created attachment 13109 [details]
patch to fix testcase in comment 10
Comment 17 Richard Biener 2007-02-26 10:33:14 UTC
I split the issue in comment #10 to PR30965 as it is a slightly different issue.
Comment 18 Richard Biener 2007-04-18 12:45:29 UTC
Subject: Bug 19431

Author: rguenth
Date: Wed Apr 18 12:45:09 2007
New Revision: 123946

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123946
Log:
2007-04-18  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/19431
	PR tree-optimization/21463
	* tree-pass.h (pass_phiprop): Declare.
	* passes.c (init_optimization_passes): New phiprop pass.
	* tree-ssa-forwprop.c (struct phiprop_d): New structure.
	(phivn_valid_p): New helper function.
	(phiprop_insert_phi): Likewise.
	(propagate_with_phi): Likewise.
	(tree_ssa_phiprop): New propagator propagating loads
	through phi nodes if profitable.

	* gcc.c-torture/execute/20070212-1.c: New testcase.
	* gcc.c-torture/execute/20070212-2.c: Likewise.
	* gcc.c-torture/execute/20070212-3.c: Likewise.
	* gcc.dg/tree-ssa/pr19431.c: Likewise.
	* gcc.dg/tree-ssa/pr21463.c: Likewise.
	* g++.dg/tree-ssa/pr21463.C: Likewise.
	* g++.dg/tree-ssa/pr30738.C: Likewise.

Added:
    trunk/gcc/testsuite/g++.dg/tree-ssa/pr21463.C
    trunk/gcc/testsuite/g++.dg/tree-ssa/pr30738.C
    trunk/gcc/testsuite/gcc.c-torture/execute/20070212-1.c
    trunk/gcc/testsuite/gcc.c-torture/execute/20070212-2.c
    trunk/gcc/testsuite/gcc.c-torture/execute/20070212-3.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr19431.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/passes.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-pass.h
    trunk/gcc/tree-ssa-forwprop.c

Comment 19 Richard Biener 2007-04-18 12:48:13 UTC
Fixed.
Comment 20 Pawel Sikora 2007-04-18 13:02:29 UTC
(In reply to comment #19)
> Fixed.
> 

will it be backported to 4.2?
it improves perf. of big stl-based apps.
Comment 21 Richard Biener 2007-04-18 14:45:22 UTC
Ha, sure not ;)
Comment 22 Pawel Sikora 2007-04-18 19:18:17 UTC
(In reply to comment #21)
> Ha, sure not ;)
> 

and wait another years for 4.3 release. it sux.
Comment 23 Andrew Pinski 2007-04-18 19:21:40 UTC
(In reply to comment #22)
> and wait another years for 4.3 release. it sux.

GCC 4.3 is the one of the few compilers to do this optimization.