Bug 23372 - [4.0/4.1 Regression] Temporary aggregate copy not elided when passing parameters by value
Summary: [4.0/4.1 Regression] Temporary aggregate copy not elided when passing paramet...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 4.0.2
: P2 minor
Target Milestone: 4.0.4
Assignee: Jason Merrill
URL:
Keywords: missed-optimization
: 29375 (view as bug list)
Depends on:
Blocks: 25505
  Show dependency treegraph
 
Reported: 2005-08-13 08:02 UTC by Guillaume Melquiond
Modified: 2011-03-18 15:06 UTC (History)
9 users (show)

See Also:
Host:
Target:
Build:
Known to work: 3.4.0 3.3.3 3.2.3 3.0.4 2.95.3 4.2.0 4.0.4 4.1.2
Known to fail: 4.0.0 4.1.0
Last reconfirmed: 2006-08-22 21:08:50


Attachments
patch for aggregate copyprop (423 bytes, patch)
2006-02-08 15:11 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Guillaume Melquiond 2005-08-13 08:02:57 UTC
This bug is similar to bug 16405, but although 16405 was fixed in 4.0, this one
is still present and is a regression from GCC 3.4 (not from 3.3 as was the
previous one). So I prefer opening a new bug-report.

The testcase simply calls a function f by passing the parameter by value:

struct A {
  int a[1000];
  //A(A const &);
};
void f(A);
void g(A *a) { f(*a); }

When compiled with gcc 3.3 and 3.4, the generated code for g is optimal: the
value *a is directly copied in the stack frame that will be used by f. With gcc
4.0, there is first a temporary copy in the stack frame of g, before copying the
value in the stack frame of f (two memcpys instead of one).

When putting a dummy copy constructor, both memcpys disappear: the code is
optimal. So the problem seems to be with the default trivial copy constructor.

The testcase is compiled with "g++-4.0 -O3", Debian package:

Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib --enable-nls
--without-included-gettext --enable-threads=posix --program-suffix=-4.0
--enable-__cxa_atexit --enable-libstdcxx-allocator=mt --enable-clocale=gnu
--enable-libstdcxx-debug --enable-java-gc=boehm --enable-java-awt=gtk
--enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre
--enable-mpfr --disable-werror --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.0.2 20050806 (prerelease) (Debian 4.0.1-4)
Comment 1 Richard Biener 2005-08-13 14:17:46 UTC
The problem is, we end up with

void g(A*) (a)
{
  struct A D.1608;

<bb 0>:
  D.1608 = *a;
  f (D.1608) [tail call];
  return;

}

after the tree optimizers.  f (*a) would not be gimple, so we create
the temporary in the first place.  TER does not remove this wart,
neither does expand - so we start with two memcpys after RTL expansion.

This is definitively different from PR16405.
Comment 2 Andrew Pinski 2005-08-13 17:56:39 UTC
Confirmed.

(In reply to comment #1)
> after the tree optimizers.  f (*a) would not be gimple, so we create
> the temporary in the first place.  TER does not remove this wart,
> neither does expand - so we start with two memcpys after RTL expansion.
TER only works on scalars so it cannot work.
Comment 3 Giovanni Bajo 2005-08-13 18:00:12 UTC
Why doesn't this happen with the copy constructor, then? there we should be 
calling the copyctor with *a, which would have the same problem.
Comment 4 Richard Biener 2005-08-13 18:11:24 UTC
With the copy ctor we end up with

void g(A*) (a)
{
  struct A D.1603;

<bb 0>:
  __comp_ctor  (&D.1603, a);
  f (&D.1603);
  return;

}

which confuses me a bit, because here the prototype of f looks like
effectively

void f(A*);

do we use ABI information here, but not in the other case?  The C++
frontend in this case presents us with

{
  <<cleanup_point <<< Unknown tree: expr_stmt
  f (&TARGET_EXPR <D.1603, <<< Unknown tree: aggr_init_expr
  __comp_ctor
  0B, (struct A &) (struct A *) NON_LVALUE_EXPR <a>
  D.1603 >>>
>) >>>
>>;
}

where in the case w/o the copy ctor we have

  <<cleanup_point <<< Unknown tree: expr_stmt
  f (TARGET_EXPR <D.1608, *(struct A &) (struct A *) NON_LVALUE_EXPR <a>>) >>>
>>;

is there some different wording about by-value parameter passing
with or without explicit copy ctor in the C++ standard?!  I.e., why
isn't the above

  <<cleanup_point <<< Unknown tree: expr_stmt
  f (&TARGET_EXPR <D.1608, *(struct A &) (struct A *) NON_LVALUE_EXPR <a>>) >>>
>>;

?
Comment 5 Andrew Pinski 2005-08-13 18:12:58 UTC
(In reply to comment #4)
> which confuses me a bit, because here the prototype of f looks like
> effectively
> 
> void f(A*);

No that is correct as it turns the class into a non pod and non pods are always passed via reference and 
not via value.
Comment 6 Richard Biener 2005-08-13 18:16:55 UTC
Indeed - adding a destructor (or anything else that makes it a non-POD) "fixes"
the problem, too.
Comment 7 Richard Biener 2005-08-13 21:21:41 UTC
The best place to fix this is probably still the expander or TER.  Or
out-of-ssa, where the necessary information is best present.  Or fix gimple and
gimplification.
Comment 8 Guillaume Melquiond 2005-08-14 06:45:50 UTC
Looking at it again, I found an even worse regression with respect to g++ 3.4.
Consider this testcase:

struct A { int a[1000]; }
A f();
void g(A);
void h() { g(f()); }

Ideally, h will allocate a stack frame for g and ask f to directly dump its
result in it. No temporary nor memcpy will be used at all. g++ 3.4 behaves this way.

g++ 4.0 however will first allocate some space for the result of f, then call f
and copy its result in another temporary, and finally it will allocate the stack
frame for g and copy the temporary in it. Two temporaries and two memcpys are
needed for g++ 4.0.

So the same issue arises when returning a result by value.
Comment 9 Richard Biener 2005-08-26 14:30:37 UTC
For your last testcase,

struct A { int a[1000]; }
A f();
void g(A);
void h() { g(f()); }

all of 3.4 and 4.1 produce exactly two temporaries.
One to dump the result of f(), which get's copied to a new temp passed to g().

4.0 though produces one extra unnecessary copy.

The same holds true for C testcases.
Comment 10 Guillaume Melquiond 2005-08-26 17:38:22 UTC
> all of 3.4 and 4.1 produce exactly two temporaries.

Yet I said that g++ 3.4 did not produce any temporary, and I still think so. No
temporaries, only g's stack frame. See the following assembly code for the C
testcase (the generated assembly is the same as for C++, but easier to read
since there is no name mangling nor local labels).

h:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4008, %esp
        movl    %esp, %eax
        subl    $12, %esp
        pushl   %eax
        call    f
        addl    $12, %esp
        call    g
        leave
        ret

For the sake of completeness, I'm also writing the assembly output for GCC 4.0,
so that the regression with respect to GCC 3.4 is clearly visible. Two
temporaries and two memory copies:

h:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %esi
        pushl   %ebx
        subl    $8012, %esp
        leal    -8008(%ebp), %ebx
        pushl   %ebx
        call    f
        leal    -4008(%ebp), %esi
        subl    $8, %esp
        pushl   $4000
        pushl   %ebx
        pushl   %esi
        call    memcpy
        subl    $3968, %esp
        movl    %esp, %eax
        pushl   %edx
        pushl   $4000
        pushl   %esi
        pushl   %eax
        call    memcpy
        addl    $16, %esp
        call    g
        addl    $4000, %esp
        leal    -8(%ebp), %esp
        popl    %ebx
        popl    %esi
        popl    %ebp
        ret

The C testcase is almost identical to the C++ testcase:

typedef struct A { int a[1000]; } A;
A f();
void g(A);
void h() { g(f()); }

And this is my version of GCC 3.4:

$ LANG=C gcc-3.4 -v
Reading specs from /usr/lib/gcc/i486-linux-gnu/3.4.5/specs
Configured with: ../src/configure -v
--enable-languages=c,c++,f77,pascal,objc,ada --prefix=/usr --libexecdir=/usr/lib
--with-gxx-include-dir=/usr/include/c++/3.4 --enable-shared --with-system-zlib
--enable-nls --without-included-gettext --program-suffix=-3.4
--enable-__cxa_atexit --enable-libstdcxx-allocator=mt --enable-clocale=gnu
--enable-libstdcxx-debug i486-linux-gnu
Thread model: posix
gcc version 3.4.5 20050821 (prerelease) (Debian 3.4.4-8)

Hope it helps.
Comment 11 Richard Biener 2005-08-29 12:15:36 UTC
I may have a patch^Whack to fix the first testcase.  Let's see if it passes
testing...
Comment 12 Richard Biener 2005-08-29 13:13:09 UTC
One possibility would be to hack out-of-ssa to coalesce single use variables
with their defs in the case of aggregates.  The real fix would involve
expanding to rtl from ssa, so we have this information ready and need not
create these useless memcpy's.  Or whatever solution is more "correct" here
("fixing" the frontends will not work for the second testcase until we allow
function calls as arguments in gimple).

Anyway, here's the hack that passed bootstrapping and regtesting for C and C++
with only some tr1 tests failing:

Index: gimplify.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/gimplify.c,v
retrieving revision 2.113.2.11
diff -c -3 -p -r2.113.2.11 gimplify.c
*** gimplify.c  16 Aug 2005 22:16:52 -0000      2.113.2.11
--- gimplify.c  29 Aug 2005 12:04:33 -0000
*************** gimplify_target_expr (tree *expr_p, tree
*** 3628,3633 ****
--- 3628,3641 ----

    if (init)
      {
+       /* Try to avoid the temporary if possible.  */
+       if (TREE_CODE (init) == INDIRECT_REF
+         && !TARGET_EXPR_CLEANUP (targ))
+       {
+         *expr_p = init;
+         return GS_OK;
+       }
+
        /* TARGET_EXPR temps aren't part of the enclosing block, so add it
         to the temps list.  */
       gimple_add_tmp_var (temp);

Comment 13 Andrew Pinski 2005-09-14 06:36:16 UTC
Another way to fix this would have copy-propagation for aggregates, see PR 14295.
Comment 14 Mark Mitchell 2005-10-31 05:00:07 UTC
Leaving as P2.
Comment 15 Richard Biener 2006-01-14 12:40:36 UTC
I decided to give this another look.  My hack is surely a progression on this issue and maybe even appropriate for the branches.  Now trying to figure out what goes wrong with it.
Comment 16 Richard Biener 2006-01-14 17:12:25 UTC
I have a fix which improves the situation by modifying the gimplifier.
Comment 17 Richard Biener 2006-01-30 13:46:34 UTC
Subject: Bug 23372

Author: rguenth
Date: Mon Jan 30 13:46:30 2006
New Revision: 110396

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=110396
Log:
2006-01-30  Richard Guenther  <rguenther@suse.de>

	PR c++/23372
	* gimplify.c (gimplify_target_expr): Handle easy cases
	without creating a temporary.

	* gcc.dg/pr23372-1.C: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/pr23372-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimplify.c
    trunk/gcc/testsuite/ChangeLog

Comment 18 Richard Biener 2006-01-30 13:48:23 UTC
The original testcase is now fixed on the mainline.
Comment 19 Andrew Pinski 2006-01-31 15:24:38 UTC
The test for this PR (gcc.dg/pr23372-1.c) fails on powerpc-darwin because there is no memcpy outputted in the asm.
There is a loop:
L2:
        lbzx r0,r9,r2
        stbx r0,r11,r2
        addi r2,r2,1
L3:
        bdnz L2

But no memcpy.
Comment 20 Richard Biener 2006-01-31 16:03:33 UTC
So, { xfail powerpc*-darwin* } the test?
Comment 21 Andrew Pinski 2006-01-31 16:05:05 UTC
(In reply to comment #20)
> So, { xfail powerpc*-darwin* } the test?

More like
{ xfail powerpc*-*-darwin* powerpc*-*-aix* rs6000-*-* powerpc64-*-linux* && lp64 }
Comment 22 Andrew Pinski 2006-01-31 16:05:41 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > So, { xfail powerpc*-darwin* } the test?
> 
> More like
> { xfail powerpc*-*-darwin* powerpc*-*-aix* rs6000-*-* powerpc64-*-linux* &&
> lp64 }

Or even better just don't run the test there as it is passing just not the way you are expecting.
Comment 23 Richard Biener 2006-02-02 09:16:06 UTC
I cannot get a target selector work that would exclude the patterns you mention.  This seems to work though:

/* { dg-do compile { xfail { powerpc*-*-darwin* powerpc*-*-aix* rs6000-*-* } || { powerpc64-*-linux* && lp64 } } } */
Comment 24 Steve Ellcey 2006-02-02 17:16:31 UTC
This test is also failing on hppa*-*-hpux* and ia64-*-hpux*.
Comment 25 Hans-Peter Nilsson 2006-02-03 01:49:20 UTC
Also fails for mmix-knuth-mmixware.  This is an ABI thing; callee copies if
it needs to modify (for MMIX, it's f() that does the memcpy).
Add testsuite framework or run only on specific targets, please.
Comment 26 Richard Biener 2006-02-03 09:16:22 UTC
Ok, I'll skim through the posted testresults and will restrict the test to working targets.
Comment 27 Richard Biener 2006-02-07 15:36:50 UTC
Subject: Bug 23372

Author: rguenth
Date: Tue Feb  7 15:36:44 2006
New Revision: 110699

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=110699
Log:
2006-02-07  Richard Guenther  <rguenther@suse.de>

	PR c++/26140
	Revert
	2006-01-30  Richard Guenther  <rguenther@suse.de>
 	PR c++/23372
 	* gimplify.c (gimplify_target_expr): Handle easy cases
 	without creating a temporary.

	Revert
	2006-01-30  Richard Guenther  <rguenther@suse.de>
 	PR c++/23372
 	* gcc.dg/pr23372-1.C: New testcase.

	* g++.dg/tree-ssa/pr26140.C: New testcase.

Added:
    trunk/gcc/testsuite/g++.dg/tree-ssa/pr26140.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimplify.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/pr23372-1.c

Comment 28 Richard Biener 2006-02-07 15:39:53 UTC
So we regress for 4.2 again as the patch caused problems.
Comment 29 Richard Biener 2006-02-08 15:11:34 UTC
Created attachment 10802 [details]
patch for aggregate copyprop

This patch (on top of infrastructure provided by the general copyprop improvements) modifies forwprop to do copy propagation of aggregates.  Untested apart from the fact it fixes all the testcases here.
Comment 30 Andrew Pinski 2006-02-08 15:16:19 UTC
(In reply to comment #29)
> This patch (on top of infrastructure provided by the general copyprop
> improvements) modifies forwprop to do copy propagation of aggregates.  Untested
> apart from the fact it fixes all the testcases here.

Forward prop is really a semi hack waiting for a true combiner.  And this seems like the wrong spot anyways as forward prop is only really for scalars really.
Comment 31 Richard Biener 2006-02-08 15:16:52 UTC
For reference, I talk about
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg00669.html
excluding the tree-ssa-copy.c parts.
Comment 32 Richard Biener 2006-02-08 15:18:42 UTC
Of course you are right.  But backporting this to 4.1 may be the only chance to get the stackspace / extra temporaries regressions solved there, as using forwprop for this hack is the most easiest (and frankly forwprop has become a place exactly for such hacks already ;)).
Comment 33 Richard Biener 2006-02-14 15:40:58 UTC
The attached patch is bogus, and a correct one doesn't fix the first testcase (as the attached one didn't, too).  As analyzed previously, expand does not deal with

void g(A*) (a)
{
  struct A D.2007;

<bb 2>:
  D.2007 = *a_1;
  f (D.2007) [tail call];
  return;

}

and TER doesn't produce (non-gimple) f (*a).  Still TER looks like the only place where we could get this fixed, because we still have dataflow
information left.  Also a real struct copyprop pass will not help here.
As TER / outof-ssa is not something I want to look into, unassigning this.
Comment 34 Richard Biener 2006-02-14 15:41:49 UTC
We're not depending on struct copyprop here.
Comment 35 Mark Mitchell 2006-02-24 00:26:12 UTC
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
Comment 36 Guillaume Melquiond 2006-04-06 10:59:10 UTC
The generated code is getting both better and worse. I just tested with GCC 4.1, and there is now a byte-by-byte (!) copy instead of memcpy. So not only does GCC use superfluous copies, but it generates code such that these copies are the slowest possible. On the other hand, there is only one copy left. So this is better than GCC 4.0, but still worse than GCC 3.4.

        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        subl    $8004, %esp
        leal    -4004(%ebp), %ebx
        movl    %ebx, (%esp)
        call    f
        xorl    %edx, %edx
        subl    $4, %esp
.L3:
        cmpl    $4000, %edx
        jb      .L2
        call    g
        movl    -4(%ebp), %ebx
        leave
        ret
        .p2align 4,,7
.L2:
        movzbl  (%ebx,%edx), %eax
        movb    %al, (%esp,%edx)
        incl    %edx
        jmp     .L3
Comment 37 Mark Mitchell 2006-05-25 02:33:16 UTC
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Comment 38 Jason Merrill 2006-08-23 04:27:59 UTC
Subject: Bug 23372

Author: jason
Date: Wed Aug 23 04:27:43 2006
New Revision: 116342

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116342
Log:
        PR c++/23372
        * call.c (build_over_call): Don't make a copy here if build_call
        will make one too.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/call.c

Comment 39 Jason Merrill 2006-08-23 14:22:53 UTC
Subject: Bug 23372

Author: jason
Date: Wed Aug 23 14:22:41 2006
New Revision: 116351

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116351
Log:
        PR c++/23372
        * call.c (build_over_call): Don't make a copy here if build_call
        will make one too.

Modified:
    branches/gcc-4_0-branch/gcc/cp/ChangeLog
    branches/gcc-4_0-branch/gcc/cp/call.c

Comment 40 Jason Merrill 2006-08-23 14:22:59 UTC
Subject: Bug 23372

Author: jason
Date: Wed Aug 23 14:22:49 2006
New Revision: 116352

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116352
Log:
        PR c++/23372
        * call.c (build_over_call): Don't make a copy here if build_call
        will make one too.

Modified:
    branches/gcc-4_1-branch/gcc/cp/ChangeLog
    branches/gcc-4_1-branch/gcc/cp/call.c

Comment 41 Jason Merrill 2006-08-23 14:33:32 UTC
fixed in 4.0 and 4.1 as well.
Comment 42 Andrew Pinski 2006-10-07 06:16:24 UTC
*** Bug 29375 has been marked as a duplicate of this bug. ***
Comment 43 Jason Merrill 2011-03-18 15:06:54 UTC
Author: jason
Date: Fri Mar 18 15:06:51 2011
New Revision: 171146

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=171146
Log:
	PR c++/23372
	* gimplify.c (gimplify_arg): Strip redundant TARGET_EXPR.

Added:
    trunk/gcc/testsuite/g++.dg/opt/pr23372.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimplify.c
    trunk/gcc/testsuite/ChangeLog