This bug is similar to bug 16405, but although 16405 was fixed in 4.0, this one is still present and is a regression from GCC 3.4 (not from 3.3 as was the previous one). So I prefer opening a new bug-report. The testcase simply calls a function f by passing the parameter by value: struct A { int a[1000]; //A(A const &); }; void f(A); void g(A *a) { f(*a); } When compiled with gcc 3.3 and 3.4, the generated code for g is optimal: the value *a is directly copied in the stack frame that will be used by f. With gcc 4.0, there is first a temporary copy in the stack frame of g, before copying the value in the stack frame of f (two memcpys instead of one). When putting a dummy copy constructor, both memcpys disappear: the code is optimal. So the problem seems to be with the default trivial copy constructor. The testcase is compiled with "g++-4.0 -O3", Debian package: Using built-in specs. Target: i486-linux-gnu Configured with: ../src/configure -v --enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --enable-nls --without-included-gettext --enable-threads=posix --program-suffix=-4.0 --enable-__cxa_atexit --enable-libstdcxx-allocator=mt --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-gc=boehm --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre --enable-mpfr --disable-werror --enable-checking=release i486-linux-gnu Thread model: posix gcc version 4.0.2 20050806 (prerelease) (Debian 4.0.1-4)
The problem is, we end up with void g(A*) (a) { struct A D.1608; <bb 0>: D.1608 = *a; f (D.1608) [tail call]; return; } after the tree optimizers. f (*a) would not be gimple, so we create the temporary in the first place. TER does not remove this wart, neither does expand - so we start with two memcpys after RTL expansion. This is definitively different from PR16405.
Confirmed. (In reply to comment #1) > after the tree optimizers. f (*a) would not be gimple, so we create > the temporary in the first place. TER does not remove this wart, > neither does expand - so we start with two memcpys after RTL expansion. TER only works on scalars so it cannot work.
Why doesn't this happen with the copy constructor, then? there we should be calling the copyctor with *a, which would have the same problem.
With the copy ctor we end up with void g(A*) (a) { struct A D.1603; <bb 0>: __comp_ctor (&D.1603, a); f (&D.1603); return; } which confuses me a bit, because here the prototype of f looks like effectively void f(A*); do we use ABI information here, but not in the other case? The C++ frontend in this case presents us with { <<cleanup_point <<< Unknown tree: expr_stmt f (&TARGET_EXPR <D.1603, <<< Unknown tree: aggr_init_expr __comp_ctor 0B, (struct A &) (struct A *) NON_LVALUE_EXPR <a> D.1603 >>> >) >>> >>; } where in the case w/o the copy ctor we have <<cleanup_point <<< Unknown tree: expr_stmt f (TARGET_EXPR <D.1608, *(struct A &) (struct A *) NON_LVALUE_EXPR <a>>) >>> >>; is there some different wording about by-value parameter passing with or without explicit copy ctor in the C++ standard?! I.e., why isn't the above <<cleanup_point <<< Unknown tree: expr_stmt f (&TARGET_EXPR <D.1608, *(struct A &) (struct A *) NON_LVALUE_EXPR <a>>) >>> >>; ?
(In reply to comment #4) > which confuses me a bit, because here the prototype of f looks like > effectively > > void f(A*); No that is correct as it turns the class into a non pod and non pods are always passed via reference and not via value.
Indeed - adding a destructor (or anything else that makes it a non-POD) "fixes" the problem, too.
The best place to fix this is probably still the expander or TER. Or out-of-ssa, where the necessary information is best present. Or fix gimple and gimplification.
Looking at it again, I found an even worse regression with respect to g++ 3.4. Consider this testcase: struct A { int a[1000]; } A f(); void g(A); void h() { g(f()); } Ideally, h will allocate a stack frame for g and ask f to directly dump its result in it. No temporary nor memcpy will be used at all. g++ 3.4 behaves this way. g++ 4.0 however will first allocate some space for the result of f, then call f and copy its result in another temporary, and finally it will allocate the stack frame for g and copy the temporary in it. Two temporaries and two memcpys are needed for g++ 4.0. So the same issue arises when returning a result by value.
For your last testcase, struct A { int a[1000]; } A f(); void g(A); void h() { g(f()); } all of 3.4 and 4.1 produce exactly two temporaries. One to dump the result of f(), which get's copied to a new temp passed to g(). 4.0 though produces one extra unnecessary copy. The same holds true for C testcases.
> all of 3.4 and 4.1 produce exactly two temporaries. Yet I said that g++ 3.4 did not produce any temporary, and I still think so. No temporaries, only g's stack frame. See the following assembly code for the C testcase (the generated assembly is the same as for C++, but easier to read since there is no name mangling nor local labels). h: pushl %ebp movl %esp, %ebp subl $4008, %esp movl %esp, %eax subl $12, %esp pushl %eax call f addl $12, %esp call g leave ret For the sake of completeness, I'm also writing the assembly output for GCC 4.0, so that the regression with respect to GCC 3.4 is clearly visible. Two temporaries and two memory copies: h: pushl %ebp movl %esp, %ebp pushl %esi pushl %ebx subl $8012, %esp leal -8008(%ebp), %ebx pushl %ebx call f leal -4008(%ebp), %esi subl $8, %esp pushl $4000 pushl %ebx pushl %esi call memcpy subl $3968, %esp movl %esp, %eax pushl %edx pushl $4000 pushl %esi pushl %eax call memcpy addl $16, %esp call g addl $4000, %esp leal -8(%ebp), %esp popl %ebx popl %esi popl %ebp ret The C testcase is almost identical to the C++ testcase: typedef struct A { int a[1000]; } A; A f(); void g(A); void h() { g(f()); } And this is my version of GCC 3.4: $ LANG=C gcc-3.4 -v Reading specs from /usr/lib/gcc/i486-linux-gnu/3.4.5/specs Configured with: ../src/configure -v --enable-languages=c,c++,f77,pascal,objc,ada --prefix=/usr --libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --program-suffix=-3.4 --enable-__cxa_atexit --enable-libstdcxx-allocator=mt --enable-clocale=gnu --enable-libstdcxx-debug i486-linux-gnu Thread model: posix gcc version 3.4.5 20050821 (prerelease) (Debian 3.4.4-8) Hope it helps.
I may have a patch^Whack to fix the first testcase. Let's see if it passes testing...
One possibility would be to hack out-of-ssa to coalesce single use variables with their defs in the case of aggregates. The real fix would involve expanding to rtl from ssa, so we have this information ready and need not create these useless memcpy's. Or whatever solution is more "correct" here ("fixing" the frontends will not work for the second testcase until we allow function calls as arguments in gimple). Anyway, here's the hack that passed bootstrapping and regtesting for C and C++ with only some tr1 tests failing: Index: gimplify.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/gimplify.c,v retrieving revision 2.113.2.11 diff -c -3 -p -r2.113.2.11 gimplify.c *** gimplify.c 16 Aug 2005 22:16:52 -0000 2.113.2.11 --- gimplify.c 29 Aug 2005 12:04:33 -0000 *************** gimplify_target_expr (tree *expr_p, tree *** 3628,3633 **** --- 3628,3641 ---- if (init) { + /* Try to avoid the temporary if possible. */ + if (TREE_CODE (init) == INDIRECT_REF + && !TARGET_EXPR_CLEANUP (targ)) + { + *expr_p = init; + return GS_OK; + } + /* TARGET_EXPR temps aren't part of the enclosing block, so add it to the temps list. */ gimple_add_tmp_var (temp);
Another way to fix this would have copy-propagation for aggregates, see PR 14295.
Leaving as P2.
I decided to give this another look. My hack is surely a progression on this issue and maybe even appropriate for the branches. Now trying to figure out what goes wrong with it.
I have a fix which improves the situation by modifying the gimplifier.
Subject: Bug 23372 Author: rguenth Date: Mon Jan 30 13:46:30 2006 New Revision: 110396 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=110396 Log: 2006-01-30 Richard Guenther <rguenther@suse.de> PR c++/23372 * gimplify.c (gimplify_target_expr): Handle easy cases without creating a temporary. * gcc.dg/pr23372-1.C: New testcase. Added: trunk/gcc/testsuite/gcc.dg/pr23372-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/gimplify.c trunk/gcc/testsuite/ChangeLog
The original testcase is now fixed on the mainline.
The test for this PR (gcc.dg/pr23372-1.c) fails on powerpc-darwin because there is no memcpy outputted in the asm. There is a loop: L2: lbzx r0,r9,r2 stbx r0,r11,r2 addi r2,r2,1 L3: bdnz L2 But no memcpy.
So, { xfail powerpc*-darwin* } the test?
(In reply to comment #20) > So, { xfail powerpc*-darwin* } the test? More like { xfail powerpc*-*-darwin* powerpc*-*-aix* rs6000-*-* powerpc64-*-linux* && lp64 }
(In reply to comment #21) > (In reply to comment #20) > > So, { xfail powerpc*-darwin* } the test? > > More like > { xfail powerpc*-*-darwin* powerpc*-*-aix* rs6000-*-* powerpc64-*-linux* && > lp64 } Or even better just don't run the test there as it is passing just not the way you are expecting.
I cannot get a target selector work that would exclude the patterns you mention. This seems to work though: /* { dg-do compile { xfail { powerpc*-*-darwin* powerpc*-*-aix* rs6000-*-* } || { powerpc64-*-linux* && lp64 } } } */
This test is also failing on hppa*-*-hpux* and ia64-*-hpux*.
Also fails for mmix-knuth-mmixware. This is an ABI thing; callee copies if it needs to modify (for MMIX, it's f() that does the memcpy). Add testsuite framework or run only on specific targets, please.
Ok, I'll skim through the posted testresults and will restrict the test to working targets.
Subject: Bug 23372 Author: rguenth Date: Tue Feb 7 15:36:44 2006 New Revision: 110699 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=110699 Log: 2006-02-07 Richard Guenther <rguenther@suse.de> PR c++/26140 Revert 2006-01-30 Richard Guenther <rguenther@suse.de> PR c++/23372 * gimplify.c (gimplify_target_expr): Handle easy cases without creating a temporary. Revert 2006-01-30 Richard Guenther <rguenther@suse.de> PR c++/23372 * gcc.dg/pr23372-1.C: New testcase. * g++.dg/tree-ssa/pr26140.C: New testcase. Added: trunk/gcc/testsuite/g++.dg/tree-ssa/pr26140.C Modified: trunk/gcc/ChangeLog trunk/gcc/gimplify.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/pr23372-1.c
So we regress for 4.2 again as the patch caused problems.
Created attachment 10802 [details] patch for aggregate copyprop This patch (on top of infrastructure provided by the general copyprop improvements) modifies forwprop to do copy propagation of aggregates. Untested apart from the fact it fixes all the testcases here.
(In reply to comment #29) > This patch (on top of infrastructure provided by the general copyprop > improvements) modifies forwprop to do copy propagation of aggregates. Untested > apart from the fact it fixes all the testcases here. Forward prop is really a semi hack waiting for a true combiner. And this seems like the wrong spot anyways as forward prop is only really for scalars really.
For reference, I talk about http://gcc.gnu.org/ml/gcc-patches/2006-02/msg00669.html excluding the tree-ssa-copy.c parts.
Of course you are right. But backporting this to 4.1 may be the only chance to get the stackspace / extra temporaries regressions solved there, as using forwprop for this hack is the most easiest (and frankly forwprop has become a place exactly for such hacks already ;)).
The attached patch is bogus, and a correct one doesn't fix the first testcase (as the attached one didn't, too). As analyzed previously, expand does not deal with void g(A*) (a) { struct A D.2007; <bb 2>: D.2007 = *a_1; f (D.2007) [tail call]; return; } and TER doesn't produce (non-gimple) f (*a). Still TER looks like the only place where we could get this fixed, because we still have dataflow information left. Also a real struct copyprop pass will not help here. As TER / outof-ssa is not something I want to look into, unassigning this.
We're not depending on struct copyprop here.
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
The generated code is getting both better and worse. I just tested with GCC 4.1, and there is now a byte-by-byte (!) copy instead of memcpy. So not only does GCC use superfluous copies, but it generates code such that these copies are the slowest possible. On the other hand, there is only one copy left. So this is better than GCC 4.0, but still worse than GCC 3.4. pushl %ebp movl %esp, %ebp pushl %ebx subl $8004, %esp leal -4004(%ebp), %ebx movl %ebx, (%esp) call f xorl %edx, %edx subl $4, %esp .L3: cmpl $4000, %edx jb .L2 call g movl -4(%ebp), %ebx leave ret .p2align 4,,7 .L2: movzbl (%ebx,%edx), %eax movb %al, (%esp,%edx) incl %edx jmp .L3
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Subject: Bug 23372 Author: jason Date: Wed Aug 23 04:27:43 2006 New Revision: 116342 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116342 Log: PR c++/23372 * call.c (build_over_call): Don't make a copy here if build_call will make one too. Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/call.c
Subject: Bug 23372 Author: jason Date: Wed Aug 23 14:22:41 2006 New Revision: 116351 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116351 Log: PR c++/23372 * call.c (build_over_call): Don't make a copy here if build_call will make one too. Modified: branches/gcc-4_0-branch/gcc/cp/ChangeLog branches/gcc-4_0-branch/gcc/cp/call.c
Subject: Bug 23372 Author: jason Date: Wed Aug 23 14:22:49 2006 New Revision: 116352 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116352 Log: PR c++/23372 * call.c (build_over_call): Don't make a copy here if build_call will make one too. Modified: branches/gcc-4_1-branch/gcc/cp/ChangeLog branches/gcc-4_1-branch/gcc/cp/call.c
fixed in 4.0 and 4.1 as well.
*** Bug 29375 has been marked as a duplicate of this bug. ***
Author: jason Date: Fri Mar 18 15:06:51 2011 New Revision: 171146 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=171146 Log: PR c++/23372 * gimplify.c (gimplify_arg): Strip redundant TARGET_EXPR. Added: trunk/gcc/testsuite/g++.dg/opt/pr23372.C Modified: trunk/gcc/ChangeLog trunk/gcc/gimplify.c trunk/gcc/testsuite/ChangeLog