Functional programming style constructs makes g++ generate unneccesary stores. Release: gcc version 3.3 20021125 (experimental) Environment: CYGWIN_NT-5.1 DANIEL-P4 1.3.17(0.67/3/2) 2002-11-27 18:54 i686 unknown How-To-Repeat: A somewhat modified version of the program submitted by AWLaFramboise in http://gcc.gnu.org/ml/gcc/2002-11/msg01012.html follows. Compile this with -O3. /// begin int f(); template<typename predicate> class noop_t { const predicate &pred; public: explicit noop_t(const predicate &p) : pred(p) {} int operator()() const { return pred(); } }; template<typename predicate> inline noop_t<predicate> noop(const predicate pred) { return noop_t<predicate>(pred); } int x() { return (noop(noop(noop(noop(noop(noop(noop(noop(noop(f)))))))))()); } /// end Compiled with -O3, the following code is generated. .globl __Z1xv .def __Z1xv; .scl 2; .type 32; .endef __Z1xv: pushl %ebp movl %esp, %ebp leal -4(%ebp), %ecx leal -12(%ebp), %eax subl $40, %esp movl %ecx, -12(%ebp) leal -8(%ebp), %edx movl %eax, -16(%ebp) leal -16(%ebp), %ecx movl %edx, -4(%ebp) leal -20(%ebp), %eax movl %ecx, -20(%ebp) leal -32(%ebp), %edx movl %eax, -24(%ebp) leal -24(%ebp), %ecx movl $__Z1fv, -8(%ebp) leal -28(%ebp), %eax movl %ecx, -28(%ebp) movl %eax, -32(%ebp) movl %edx, -36(%ebp) call __Z1fv movl %ebp, %esp popl %ebp ret .def __Z1fv; .scl 3; .type 32; .endef
State-Changed-From-To: open->analyzed State-Changed-Why: Confirmed. This happens with all versions of gcc since at least 2.95. (And I must say I am impressed by gcc's register cycling scheme :-)
The problem is that GCC does puts objects on the stack too soon, maybe be able to fix this on the tree-ssa branch.
SRA did not help this either.
This seems more related to references and aliasing than anything else. Try changing const predicate &pred to const predicate pred, and you'll get: .globl _Z1xv .type _Z1xv, @function _Z1xv: pushl %ebp movl %esp, %ebp subl $8, %esp call _Z1fv leave ret .size _Z1xv, .-_Z1xv
The problem for the tree-ssa is being discused here: <http://gcc.gnu.org/ml/gcc/2004-01/ msg01419.html>.
I think this is caused by the C++ Front-end, I will look to see my patch to stop lowering fixes this bug (I think it does).
Actually it is not caused by the front-end but it is almost fixed on the tree-ssa now: pred<D1940> = f; pred<D1947>.pred = &pred<D1940>; pred<D1954>.pred = &pred<D1947>; pred<D1961>.pred = &pred<D1954>; pred<D1968>.pred = &pred<D1961>; pred<D1975>.pred = &pred<D1968>; pred<D1982>.pred = &pred<D1975>; pred<D1989>.pred = &pred<D1982>; pred<D1996>.pred = &pred<D1989>; <D2053> = f (); Now all we have to do is remove the stores and this would be fixed.
Looks like I was using a different sources for this because it is not fixed at all.
Mine, the problem now is there is an extra cast which is wrong, I have a patch to fix this but I need to convence everyone is correct as the types are the same really.
Much newer and simpler patch: <http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00087.html> for at least the cast problem. But that just gets back to the problem in comment #7. Which I refer to the lowering problem.
Not mine anymore, it is an aliasing issue: # V_MUST_DEF <pred<D2020>_54>; pred<D2020> = f<D1558>; .... # pred<D2024>_97 = V_MAY_DEF <pred<D2024>_72>; # pred<D2028>_98 = V_MAY_DEF <pred<D2028>_69>; # pred<D2032>_99 = V_MAY_DEF <pred<D2032>_66>; # pred<D2036>_100 = V_MAY_DEF <pred<D2036>_63>; # pred<D2040>_101 = V_MAY_DEF <pred<D2040>_60>; # pred<D2044>_102 = V_MAY_DEF <pred<D2044>_57>; # pred<D2048>_103 = V_MAY_DEF <pred<D2048>_49>; # TMT.31<D2104>_50 = V_MAY_DEF <TMT.31<D2104>_52>; # TMT.32<D2105>_46 = V_MAY_DEF <TMT.32<D2105>_48>; # TMT.33<D2106>_42 = V_MAY_DEF <TMT.33<D2106>_44>; # TMT.34<D2107>_38 = V_MAY_DEF <TMT.34<D2107>_40>; # TMT.35<D2108>_41 = V_MAY_DEF <TMT.35<D2108>_39>; # TMT.36<D2109>_47 = V_MAY_DEF <TMT.36<D2109>_45>; # TMT.37<D2110>_53 = V_MAY_DEF <TMT.37<D2110>_51>; # TMT.38<D2111>_59 = V_MAY_DEF <TMT.38<D2111>_58>; # TMT.39<D2112>_62 = V_MAY_DEF <TMT.39<D2112>_61>; # VUSE <pred<D2020>_54>; T.26<D2098> = f (); I wonder if we should not figure out something about function calls again or something else is going wrong.
Diego, this is not a regression, but it would be kind to get a quick analysys of what is going wrong and/or spot enhancement opportunities in alias analysys. Can you allocate a little time to investigate?
(In reply to comment #12) > Diego, this is not a regression, but it would be kind to get a quick analysys > of what is going wrong and/or spot enhancement opportunities in alias analysys. > Can you allocate a little time to investigate? > The most likely problem is that alias analysis is being tricked into thinking that &pred escapes because it is being squirreled away into 'pred.pred = &pred'. Since we don't keep structures in SSA form, we immediately give up on addresses that are stored inside a structure. Alias analysis sees the store and says 'oh, I don't know where else that may go so I will just assume that it escapes'. There's a FIXME in tree-ssa-alias.c precisely because of this. Implementing a full-fledge escape analysis framework would probably cure this. There may also be in-between solutions to achieve some of the lower hanging fruit. Diego.
Subject: Re: Pessimization of C++ (functional) code On Wed, 2004-08-25 at 09:45, dnovillo at gcc dot gnu dot org wrote: > Implementing a full-fledge escape analysis framework would probably cure this. > There may also be in-between solutions to achieve some of the lower hanging fruit. > I forgot to add. Daniel's field-sensitive aliasing and/or the interprocedural PTA may get this correctly. Diego.
The problem here is: # predD.1714_2 = V_MUST_DEF <predD.1714_1>; predD.1714 = fD.1555; # predD.1718_36 = V_MAY_DEF <predD.1718_9>; predD.1718.predD.1583 = &predD.1714; # predD.1714_41 = V_MAY_DEF <predD.1714_2>; # predD.1718_42 = V_MAY_DEF <predD.1718_36>; <-- Notice this, this is because we had took its address at one point # TMT.1D.1744_45 = V_MAY_DEF <TMT.1D.1744_31>; # TMT.2D.1745_25 = V_MAY_DEF <TMT.2D.1745_30>; # TMT.3D.1746_27 = V_MAY_DEF <TMT.3D.1746_28>; # TMT.4D.1747_33 = V_MAY_DEF <TMT.4D.1747_26>; D.1738_32 = f ();
hmm, why are things being marked as call clobered when their addresses don't escape at all. Mostly: D.1928_31, is dereferenced, its value escapes, points-to anything which is the variable for the function call, that does not make sense at all.
The real problem can be seen with: int f(void); int h(void); int g(int a) { int (*ff)(void) = f; int (**g)(void); g = &ff; (*g)(); h(); (*g)(); } we mark the value which we calling as escaping: D.1289_5, is dereferenced, its value escapes, points-to anything D.1289_6, is dereferenced, its value escapes, points-to anything that is wrong.
Even though comment #17 is correct, it is not the full story (I have a fix for the issue in comment #17). The real problem is that we don't consider function decls as constants so consider stuff we should not as call clobbered as there no way the address of the function can change. This seems like a big defect in the current tree-ssa aliasing.
Hmm, maybe there is really something else going on here: struct noop_t<int (*)()>D.1999 D.2016; struct noop_t<int (*)()>D.1999 predD.2575; predD.2575 = (struct noop_t<int (*)()>D.1999) D.2016; Hmm, why is there a cast there? They are the same types.
Because they are not the same: (gdb) call debug_generic_expr(0xb7e31c94) struct noop_t<int (*)()>D.2008 (gdb) call debug_generic_expr(0xb7e3505c) struct noop_t<int (*)()>D.2008 generated by #0 build1_stat (code=NOP_EXPR, type=0xb7e3505c, node=0xb7d843c8) at /space/rguenther/src/svn/gcc/gcc/tree.c:2779 #1 0x085440ad in fold_build1_stat (code=NOP_EXPR, type=0xb7e3505c, op0=0xb7d843c8) at /space/rguenther/src/svn/gcc/gcc/fold-const.c:11099 #2 0x084eaf25 in fold_convert (type=0xb7e3505c, arg=0xb7d843c8) at /space/rguenther/src/svn/gcc/gcc/fold-const.c:1996 #3 0x0828659b in setup_one_parameter (id=0xbfffeb54, p=0xb7e38000, value=0xb7d843c8, fn=0xb7e37100, bb=0xb7e400a0, vars=0xbfffe924) at /space/rguenther/src/svn/gcc/gcc/tree-inline.c:1069 #4 0x0828692e in initialize_inlined_parameters (id=0xbfffeb54, args=0xb7e36468, static_chain=0x0, fn=0xb7e37100, bb=0xb7e400a0) at /space/rguenther/src/svn/gcc/gcc/tree-inline.c:1125 #5 0x08289168 in expand_call_inline (bb=0xb7e400a0, stmt=0xb7d7f1b0, tp=0xb7d7f1d0, data=0xbfffeb54) (gdb) call cxx_types_compatible_p (0xb7e31c94, 0xb7e3505c) $1 = 1 (gdb) call debug_tree(0xb7e31c94) <record_type 0xb7e31c94 noop_t<int (*)()> needs-constructing type_1 type_5 type_6 SI size <integer_cst 0xb7d773d8 type <integer_type 0xb7d8905c bit_size_type> constant invariant 32> unit size <integer_cst 0xb7d77168 type <integer_type 0xb7d89000 unsigned int> constant invariant 4> align 32 symtab 0 alias set -1 fields <field_decl 0xb7e31e60 pred type <reference_type 0xb7e31ebc type <pointer_type 0xb7e31da8> unsigned SI size <integer_cst 0xb7d773d8 32> unit size <integer_cst 0xb7d77168 4> align 32 symtab 0 alias set -1> readonly used private unsigned nonlocal decl_3 SI file t.C line 5 size <integer_cst 0xb7d773d8 32> unit size <integer_cst 0xb7d77168 4> align 32 offset_align 128 offset <integer_cst 0xb7d77180 constant invariant 0> bit offset <integer_cst 0xb7d77948 constant invariant 0> context <record_type 0xb7e31c94 noop_t<int (*)()>> chain <type_decl 0xb7dfff08 noop_t type <record_type 0xb7e31c94 noop_t<int (*)()>> external nonlocal suppress-debug decl_4 VOID file t.C line 4 align 8 context <record_type 0xb7e31c94 noop_t<int (*)()>> >> needs-constructor X(constX&) this=(X&) n_parents=0 use_template=1 interface-unknown pointer_to_this <pointer_type 0xb7e31f18> chain <type_decl 0xb7dffea0 noop_t<int (*)()>>> (gdb) call debug_tree(0xb7e3505c) <record_type 0xb7e3505c noop_t<int (*)()> readonly needs-constructing type_1 type_5 SI size <integer_cst 0xb7d773d8 type <integer_type 0xb7d8905c bit_size_type> constant invariant 32> unit size <integer_cst 0xb7d77168 type <integer_type 0xb7d89000 unsigned int> constant invariant 4> align 32 symtab 0 alias set -1 fields <field_decl 0xb7e31e60 pred type <reference_type 0xb7e31ebc type <pointer_type 0xb7e31da8> unsigned SI size <integer_cst 0xb7d773d8 32> unit size <integer_cst 0xb7d77168 4> align 32 symtab 0 alias set -1> readonly used private unsigned nonlocal decl_3 SI file t.C line 5 size <integer_cst 0xb7d773d8 32> unit size <integer_cst 0xb7d77168 4> align 32 offset_align 128 offset <integer_cst 0xb7d77180 constant invariant 0> bit offset <integer_cst 0xb7d77948 constant invariant 0> context <record_type 0xb7e31c94 noop_t<int (*)()>> chain <type_decl 0xb7dfff08 noop_t type <record_type 0xb7e31c94 noop_t<int (*)()>> external nonlocal suppress-debug decl_4 VOID file t.C line 4 align 8 context <record_type 0xb7e31c94 noop_t<int (*)()>> >> needs-constructor X(constX&) this=(X&) n_parents=0 use_template=1 interface-unknown pointer_to_this <pointer_type 0xb7e350b8> reference_to_this <reference_type 0xb7e353f4>>
The main difference is the TYPE_DEPENDENT_P_VALID valid lang-type flag. So this looks like a frontend problem.
Still bogus at the tree level as in comment #11, but fixed by RTL optimizers: _Z1xv: .LFB5: pushl %ebp .LCFI0: movl %esp, %ebp .LCFI1: subl $24, %esp .LCFI2: movl $_Z1fv, -4(%ebp) leal -8(%ebp), %eax movl %eax, -8(%ebp) call _Z1fv leave ret the above is from mainline, with 4.1.1 I get .LCFI2: movl $_Z1fv, -4(%ebp) leal -8(%ebp), %eax movl %eax, -8(%ebp) call *%eax so there's an indirect call left. 3.4.6 is the last I can reproduce the completely bogus asm. So, fixed in 4.2?
On trunk we get the following after early optimizations: <bb 2>: predD.2646 = fD.2070; predD.2653.predD.2098 = &predD.2646; predD.2660.predD.2116 = &predD.2653; predD.2667.predD.2134 = &predD.2660; predD.2674.predD.2152 = &predD.2667; predD.2681.predD.2170 = &predD.2674; predD.2688.predD.2188 = &predD.2681; predD.2695.predD.2206 = &predD.2688; predD.2702.predD.2224 = &predD.2695; D.2720_24 = predD.2702.predD.2224; D.2728_25 = D.2720_24->predD.2206; D.2721_26 = D.2728_25->predD.2188; D.2727_27 = D.2721_26->predD.2170; D.2722_28 = D.2727_27->predD.2152; D.2726_29 = D.2722_28->predD.2134; D.2723_30 = D.2726_29->predD.2116; D.2725_31 = D.2723_30->predD.2098; D.2724_32 = *D.2725_31; D.2719_33 = D.2724_32 (); return D.2719_33; now, while SCCVN figures out that predD.2702.predD.2224 loads &predD.2695 it is not able to cascade here, which is because we do not "simplify" the vn references after substituting the value-numbers (PR37892).
I have a patch.
Subject: Bug 8781 Author: rguenth Date: Sat Apr 4 09:34:32 2009 New Revision: 145533 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=145533 Log: 2009-04-04 Richard Guenther <rguenther@suse.de> PR tree-optimization/8781 PR tree-optimization/37892 * tree-ssa-sccvn.h (vn_reference_fold_indirect): Declare. * tree-ssa-sccvn.c (vn_reference_fold_indirect): New function. (valueize_refs): Call it for *& valueizations. (shared_reference_ops_from_ref): Rename to ... (valueize_shared_reference_ops_from_ref): ... this and valueize. (shared_reference_ops_from_call): Rename to ... (valueize_shared_reference_ops_from_call): ... this and valueize. (vn_reference_lookup): Update. (visit_reference_op_call): Likewise. * tree-ssa-pre.c (phi_translate_1): Fold *&. (eliminate): Value-replace the call address in call statements. * g++.dg/tree-ssa/pr8781.C: New testcase. * gcc.dg/tree-ssa/ssa-pre-25.c: Likewise. Added: trunk/gcc/testsuite/g++.dg/tree-ssa/pr8781.C trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-25.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-pre.c trunk/gcc/tree-ssa-sccvn.c trunk/gcc/tree-ssa-sccvn.h
Fixed.
This testcase looks invalid to me, trying to bind a reference to a function parameter and returning that reference to the caller is a use after scope. When