Bug 8781 - Pessimization of C++ (functional) code
Summary: Pessimization of C++ (functional) code
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 3.3
: P3 enhancement
Target Milestone: 4.5.0
Assignee: Richard Biener
URL:
Keywords: alias, missed-optimization
Depends on: 37892
Blocks:
  Show dependency treegraph
 
Reported: 2002-12-02 02:26 UTC by wesslen
Modified: 2017-09-20 21:16 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 3.4.6
Last reconfirmed: 2009-04-03 13:44:16


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description wesslen 2002-12-02 02:26:01 UTC
Functional programming style constructs makes g++ generate unneccesary stores.

Release:
gcc version 3.3 20021125 (experimental)

Environment:
CYGWIN_NT-5.1 DANIEL-P4 1.3.17(0.67/3/2) 2002-11-27 18:54 i686 unknown

How-To-Repeat:
A somewhat modified version of the program submitted by AWLaFramboise in http://gcc.gnu.org/ml/gcc/2002-11/msg01012.html follows. Compile this with -O3.

/// begin
int f();

template<typename predicate>
class noop_t {
	const predicate &pred;
public:
	explicit noop_t(const predicate &p) : pred(p) {}

	int operator()() const { return pred(); }
};

template<typename predicate>
inline noop_t<predicate> noop(const predicate pred) {
	return noop_t<predicate>(pred);
}

int x()
{
	return (noop(noop(noop(noop(noop(noop(noop(noop(noop(f)))))))))());
}
/// end

Compiled with -O3, the following code is generated.

.globl __Z1xv
        .def    __Z1xv; .scl    2;      .type   32;     .endef
__Z1xv:
        pushl   %ebp
        movl    %esp, %ebp
        leal    -4(%ebp), %ecx
        leal    -12(%ebp), %eax
        subl    $40, %esp
        movl    %ecx, -12(%ebp)
        leal    -8(%ebp), %edx
        movl    %eax, -16(%ebp)
        leal    -16(%ebp), %ecx
        movl    %edx, -4(%ebp)
        leal    -20(%ebp), %eax
        movl    %ecx, -20(%ebp)
        leal    -32(%ebp), %edx
        movl    %eax, -24(%ebp)
        leal    -24(%ebp), %ecx
        movl    $__Z1fv, -8(%ebp)
        leal    -28(%ebp), %eax
        movl    %ecx, -28(%ebp)
        movl    %eax, -32(%ebp)
        movl    %edx, -36(%ebp)
        call    __Z1fv
        movl    %ebp, %esp
        popl    %ebp
        ret
        .def    __Z1fv; .scl    3;      .type   32;     .endef
Comment 1 Wolfgang Bangerth 2002-12-09 17:48:39 UTC
State-Changed-From-To: open->analyzed
State-Changed-Why: Confirmed. This happens with all versions of gcc since at least 2.95.
    (And I must say I am impressed by gcc's register cycling 
    scheme :-)
Comment 2 Andrew Pinski 2003-11-05 07:25:03 UTC
The problem is that GCC does puts objects on the stack too soon, maybe be able to fix 
this on the tree-ssa branch.
Comment 3 Andrew Pinski 2003-12-01 04:01:24 UTC
SRA did not help this either.
Comment 4 Daniel Berlin 2003-12-01 04:20:36 UTC
This seems more related to references and aliasing than anything else.

Try changing const predicate &pred to const predicate pred, and you'll get:

.globl _Z1xv
        .type   _Z1xv, @function
_Z1xv:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        call    _Z1fv
        leave
        ret
        .size   _Z1xv, .-_Z1xv
Comment 5 Andrew Pinski 2004-01-20 16:54:16 UTC
The problem for the tree-ssa is being discused here: <http://gcc.gnu.org/ml/gcc/2004-01/
msg01419.html>.
Comment 6 Andrew Pinski 2004-03-03 05:14:37 UTC
I think this is caused by the C++ Front-end, I will look to see my patch to stop lowering 
fixes this bug (I think it does).
Comment 7 Andrew Pinski 2004-03-03 05:46:25 UTC
Actually it is not caused by the front-end but it is almost fixed on the tree-ssa now:
  pred<D1940> = f;
  pred<D1947>.pred = &pred<D1940>;
  pred<D1954>.pred = &pred<D1947>;
  pred<D1961>.pred = &pred<D1954>;
  pred<D1968>.pred = &pred<D1961>;
  pred<D1975>.pred = &pred<D1968>;
  pred<D1982>.pred = &pred<D1975>;
  pred<D1989>.pred = &pred<D1982>;
  pred<D1996>.pred = &pred<D1989>;
  <D2053> = f ();

Now all we have to do is remove the stores and this would be fixed.
Comment 8 Andrew Pinski 2004-03-08 07:37:48 UTC
Looks like I was using a different sources for this because it is not fixed at all.
Comment 9 Andrew Pinski 2004-05-23 23:30:56 UTC
Mine, the problem now is there is an extra cast which is wrong, I have a patch to fix this but I need to 
convence everyone is correct as the types are the same really.
Comment 10 Andrew Pinski 2004-06-02 04:45:45 UTC
Much newer and simpler patch: <http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00087.html> for 
at least the cast problem.  But that just gets back to the problem in comment #7.  Which I refer to the 
lowering problem.
Comment 11 Andrew Pinski 2004-08-23 05:24:56 UTC
Not mine anymore, it is an aliasing issue:
  #   V_MUST_DEF <pred<D2020>_54>;
  pred<D2020> = f<D1558>;

....
  #   pred<D2024>_97 = V_MAY_DEF <pred<D2024>_72>;
  #   pred<D2028>_98 = V_MAY_DEF <pred<D2028>_69>;
  #   pred<D2032>_99 = V_MAY_DEF <pred<D2032>_66>;
  #   pred<D2036>_100 = V_MAY_DEF <pred<D2036>_63>;
  #   pred<D2040>_101 = V_MAY_DEF <pred<D2040>_60>;
  #   pred<D2044>_102 = V_MAY_DEF <pred<D2044>_57>;
  #   pred<D2048>_103 = V_MAY_DEF <pred<D2048>_49>;
  #   TMT.31<D2104>_50 = V_MAY_DEF <TMT.31<D2104>_52>;
  #   TMT.32<D2105>_46 = V_MAY_DEF <TMT.32<D2105>_48>;
  #   TMT.33<D2106>_42 = V_MAY_DEF <TMT.33<D2106>_44>;
  #   TMT.34<D2107>_38 = V_MAY_DEF <TMT.34<D2107>_40>;
  #   TMT.35<D2108>_41 = V_MAY_DEF <TMT.35<D2108>_39>;
  #   TMT.36<D2109>_47 = V_MAY_DEF <TMT.36<D2109>_45>;
  #   TMT.37<D2110>_53 = V_MAY_DEF <TMT.37<D2110>_51>;
  #   TMT.38<D2111>_59 = V_MAY_DEF <TMT.38<D2111>_58>;
  #   TMT.39<D2112>_62 = V_MAY_DEF <TMT.39<D2112>_61>;
  #   VUSE <pred<D2020>_54>;
  T.26<D2098> = f ();

I wonder if we should not figure out something about function calls again or something else is going 
wrong.
Comment 12 Giovanni Bajo 2004-08-25 01:57:17 UTC
Diego, this is not a regression, but it would be kind to get a quick analysys 
of what is going wrong and/or spot enhancement opportunities in alias analysys. 
Can you allocate a little time to investigate?
Comment 13 Diego Novillo 2004-08-25 13:45:30 UTC
(In reply to comment #12)
> Diego, this is not a regression, but it would be kind to get a quick analysys 
> of what is going wrong and/or spot enhancement opportunities in alias analysys. 
> Can you allocate a little time to investigate?
>
The most likely problem is that alias analysis is being tricked into thinking
that &pred escapes because it is being squirreled away into 'pred.pred = &pred'.

Since we don't keep structures in SSA form, we immediately give up on addresses
that are stored inside a structure.  Alias analysis sees the store and says 'oh,
I don't know where else that may go so I will just assume that it escapes'. 
There's a FIXME in tree-ssa-alias.c precisely because of this.

Implementing a full-fledge escape analysis framework would probably cure this. 
There may also be in-between solutions to achieve some of the lower hanging fruit.


Diego.
Comment 14 Diego Novillo 2004-08-25 13:58:14 UTC
Subject: Re:  Pessimization of C++ (functional)
	code

On Wed, 2004-08-25 at 09:45, dnovillo at gcc dot gnu dot org wrote:

> Implementing a full-fledge escape analysis framework would probably cure this. 
> There may also be in-between solutions to achieve some of the lower hanging fruit.
> 
I forgot to add.  Daniel's field-sensitive aliasing and/or the
interprocedural PTA may get this correctly.


Diego.

Comment 15 Andrew Pinski 2004-11-24 06:28:25 UTC
The problem here is:

  #   predD.1714_2 = V_MUST_DEF <predD.1714_1>;
  predD.1714 = fD.1555;
  #   predD.1718_36 = V_MAY_DEF <predD.1718_9>;
  predD.1718.predD.1583 = &predD.1714;
  #   predD.1714_41 = V_MAY_DEF <predD.1714_2>;
  #   predD.1718_42 = V_MAY_DEF <predD.1718_36>;  <-- Notice this, this is because we had took its
       address at one point
  #   TMT.1D.1744_45 = V_MAY_DEF <TMT.1D.1744_31>;
  #   TMT.2D.1745_25 = V_MAY_DEF <TMT.2D.1745_30>;
  #   TMT.3D.1746_27 = V_MAY_DEF <TMT.3D.1746_28>;
  #   TMT.4D.1747_33 = V_MAY_DEF <TMT.4D.1747_26>;
  D.1738_32 = f ();
Comment 16 Andrew Pinski 2005-08-03 02:50:11 UTC
hmm, why are things being marked as call clobered when their addresses don't escape at all.

Mostly:
D.1928_31, is dereferenced, its value escapes, points-to anything

which is the variable for the function call, that does not make sense at all.
Comment 17 Andrew Pinski 2005-08-03 03:03:12 UTC
The real problem can be seen with:
int f(void);
int h(void);

int g(int a)
{
  int (*ff)(void) = f;
  int (**g)(void);
  g = &ff;

  (*g)();
  h();
  (*g)();
}

we mark the value which we calling as escaping:
D.1289_5, is dereferenced, its value escapes, points-to anything
D.1289_6, is dereferenced, its value escapes, points-to anything

that is wrong.
Comment 18 Andrew Pinski 2005-08-03 03:35:04 UTC
Even though comment #17 is correct, it is not the full story (I have a fix for the issue in comment #17).  
The real problem is that we don't consider function decls as constants so consider stuff we should not 
as call clobbered as there no way the address of the function can change.

This seems like a big defect in the current tree-ssa aliasing.
Comment 19 Andrew Pinski 2006-01-08 05:32:09 UTC
Hmm, maybe there is really something else going on here:
  struct noop_t<int (*)()>D.1999 D.2016;
  struct noop_t<int (*)()>D.1999 predD.2575;


  predD.2575 = (struct noop_t<int (*)()>D.1999) D.2016;


Hmm, why is there a cast there?  They are the same types.
Comment 20 Richard Biener 2006-04-05 16:05:01 UTC
Because they are not the same:

(gdb) call debug_generic_expr(0xb7e31c94)
struct noop_t<int (*)()>D.2008
(gdb) call debug_generic_expr(0xb7e3505c)
struct noop_t<int (*)()>D.2008

generated by

#0  build1_stat (code=NOP_EXPR, type=0xb7e3505c, node=0xb7d843c8)
    at /space/rguenther/src/svn/gcc/gcc/tree.c:2779
#1  0x085440ad in fold_build1_stat (code=NOP_EXPR, type=0xb7e3505c, 
    op0=0xb7d843c8) at /space/rguenther/src/svn/gcc/gcc/fold-const.c:11099
#2  0x084eaf25 in fold_convert (type=0xb7e3505c, arg=0xb7d843c8)
    at /space/rguenther/src/svn/gcc/gcc/fold-const.c:1996
#3  0x0828659b in setup_one_parameter (id=0xbfffeb54, p=0xb7e38000, 
    value=0xb7d843c8, fn=0xb7e37100, bb=0xb7e400a0, vars=0xbfffe924)
    at /space/rguenther/src/svn/gcc/gcc/tree-inline.c:1069
#4  0x0828692e in initialize_inlined_parameters (id=0xbfffeb54, 
    args=0xb7e36468, static_chain=0x0, fn=0xb7e37100, bb=0xb7e400a0)
    at /space/rguenther/src/svn/gcc/gcc/tree-inline.c:1125
#5  0x08289168 in expand_call_inline (bb=0xb7e400a0, stmt=0xb7d7f1b0, 
    tp=0xb7d7f1d0, data=0xbfffeb54)
(gdb) call cxx_types_compatible_p (0xb7e31c94, 0xb7e3505c)
$1 = 1

(gdb) call debug_tree(0xb7e31c94)
 <record_type 0xb7e31c94 noop_t<int (*)()> needs-constructing type_1 type_5 type_6 SI
    size <integer_cst 0xb7d773d8 type <integer_type 0xb7d8905c bit_size_type> constant invariant 32>
    unit size <integer_cst 0xb7d77168 type <integer_type 0xb7d89000 unsigned int> constant invariant 4>
    align 32 symtab 0 alias set -1
    fields <field_decl 0xb7e31e60 pred
        type <reference_type 0xb7e31ebc type <pointer_type 0xb7e31da8>
            unsigned SI size <integer_cst 0xb7d773d8 32> unit size <integer_cst 0xb7d77168 4>
            align 32 symtab 0 alias set -1>
        readonly used private unsigned nonlocal decl_3 SI file t.C line 5 size <integer_cst 0xb7d773d8 32> unit size <integer_cst 0xb7d77168 4>
        align 32 offset_align 128
        offset <integer_cst 0xb7d77180 constant invariant 0>
        bit offset <integer_cst 0xb7d77948 constant invariant 0> context <record_type 0xb7e31c94 noop_t<int (*)()>>
        chain <type_decl 0xb7dfff08 noop_t type <record_type 0xb7e31c94 noop_t<int (*)()>>
            external nonlocal suppress-debug decl_4 VOID file t.C line 4
            align 8 context <record_type 0xb7e31c94 noop_t<int (*)()>>
           >>
   needs-constructor X(constX&) this=(X&) n_parents=0 use_template=1 interface-unknown
    pointer_to_this <pointer_type 0xb7e31f18> chain <type_decl 0xb7dffea0 noop_t<int (*)()>>>

(gdb) call debug_tree(0xb7e3505c)
 <record_type 0xb7e3505c noop_t<int (*)()> readonly needs-constructing type_1 type_5 SI
    size <integer_cst 0xb7d773d8 type <integer_type 0xb7d8905c bit_size_type> constant invariant 32>
    unit size <integer_cst 0xb7d77168 type <integer_type 0xb7d89000 unsigned int> constant invariant 4>
    align 32 symtab 0 alias set -1
    fields <field_decl 0xb7e31e60 pred
        type <reference_type 0xb7e31ebc type <pointer_type 0xb7e31da8>
            unsigned SI size <integer_cst 0xb7d773d8 32> unit size <integer_cst 0xb7d77168 4>
            align 32 symtab 0 alias set -1>
        readonly used private unsigned nonlocal decl_3 SI file t.C line 5 size <integer_cst 0xb7d773d8 32> unit size <integer_cst 0xb7d77168 4>
        align 32 offset_align 128
        offset <integer_cst 0xb7d77180 constant invariant 0>
        bit offset <integer_cst 0xb7d77948 constant invariant 0> context <record_type 0xb7e31c94 noop_t<int (*)()>>
        chain <type_decl 0xb7dfff08 noop_t type <record_type 0xb7e31c94 noop_t<int (*)()>>
            external nonlocal suppress-debug decl_4 VOID file t.C line 4
            align 8 context <record_type 0xb7e31c94 noop_t<int (*)()>>
           >>
   needs-constructor X(constX&) this=(X&) n_parents=0 use_template=1 interface-unknown
    pointer_to_this <pointer_type 0xb7e350b8> reference_to_this <reference_type 0xb7e353f4>>
Comment 21 Richard Biener 2006-04-05 16:11:16 UTC
The main difference is the TYPE_DEPENDENT_P_VALID valid lang-type flag.  So this looks like a frontend problem.
Comment 22 Richard Biener 2006-10-24 12:16:04 UTC
Still bogus at the tree level as in comment #11, but fixed by RTL optimizers:

_Z1xv:
.LFB5:
        pushl   %ebp
.LCFI0:
        movl    %esp, %ebp
.LCFI1:
        subl    $24, %esp
.LCFI2:
        movl    $_Z1fv, -4(%ebp)
        leal    -8(%ebp), %eax
        movl    %eax, -8(%ebp)
        call    _Z1fv
        leave
        ret

the above is from mainline, with 4.1.1 I get

.LCFI2:
        movl    $_Z1fv, -4(%ebp)
        leal    -8(%ebp), %eax
        movl    %eax, -8(%ebp)
        call    *%eax

so there's an indirect call left.

3.4.6 is the last I can reproduce the completely bogus asm.  So, fixed in 4.2?
Comment 23 Richard Biener 2009-04-03 11:37:53 UTC
On trunk we get the following after early optimizations:

<bb 2>:
  predD.2646 = fD.2070;
  predD.2653.predD.2098 = &predD.2646;
  predD.2660.predD.2116 = &predD.2653;
  predD.2667.predD.2134 = &predD.2660;
  predD.2674.predD.2152 = &predD.2667;
  predD.2681.predD.2170 = &predD.2674;
  predD.2688.predD.2188 = &predD.2681;
  predD.2695.predD.2206 = &predD.2688;
  predD.2702.predD.2224 = &predD.2695;
  D.2720_24 = predD.2702.predD.2224;
  D.2728_25 = D.2720_24->predD.2206;
  D.2721_26 = D.2728_25->predD.2188;
  D.2727_27 = D.2721_26->predD.2170;
  D.2722_28 = D.2727_27->predD.2152;
  D.2726_29 = D.2722_28->predD.2134;
  D.2723_30 = D.2726_29->predD.2116;
  D.2725_31 = D.2723_30->predD.2098;
  D.2724_32 = *D.2725_31;
  D.2719_33 = D.2724_32 ();
  return D.2719_33;

now, while SCCVN figures out that predD.2702.predD.2224 loads &predD.2695 it
is not able to cascade here, which is because we do not "simplify" the
vn references after substituting the value-numbers (PR37892).
Comment 24 Richard Biener 2009-04-03 13:44:16 UTC
I have a patch.
Comment 25 Richard Biener 2009-04-04 09:34:44 UTC
Subject: Bug 8781

Author: rguenth
Date: Sat Apr  4 09:34:32 2009
New Revision: 145533

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=145533
Log:
2009-04-04  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/8781
	PR tree-optimization/37892
	* tree-ssa-sccvn.h (vn_reference_fold_indirect): Declare.
	* tree-ssa-sccvn.c (vn_reference_fold_indirect): New function.
	(valueize_refs): Call it for *& valueizations.
	(shared_reference_ops_from_ref): Rename to ...
	(valueize_shared_reference_ops_from_ref): ... this and valueize.
	(shared_reference_ops_from_call): Rename to ...
	(valueize_shared_reference_ops_from_call): ... this and valueize.
	(vn_reference_lookup): Update.
	(visit_reference_op_call): Likewise.
	* tree-ssa-pre.c (phi_translate_1): Fold *&.
	(eliminate): Value-replace the call address in call statements.

	* g++.dg/tree-ssa/pr8781.C: New testcase.
	* gcc.dg/tree-ssa/ssa-pre-25.c: Likewise.

Added:
    trunk/gcc/testsuite/g++.dg/tree-ssa/pr8781.C
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-25.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-pre.c
    trunk/gcc/tree-ssa-sccvn.c
    trunk/gcc/tree-ssa-sccvn.h

Comment 26 Richard Biener 2009-04-04 09:35:12 UTC
Fixed.
Comment 27 Jakub Jelinek 2017-09-20 21:16:11 UTC
This testcase looks invalid to me, trying to bind a reference to a function parameter and returning that reference to the caller is a use after scope.
When