Bug 16405 - [3.4 Regression] Temporary aggregate copy not elided
Summary: [3.4 Regression] Temporary aggregate copy not elided
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 3.4.0
: P3 normal
Target Milestone: 4.1.0
Assignee: Jason Merrill
URL:
Keywords: missed-optimization
Depends on: 25979
Blocks:
  Show dependency treegraph
 
Reported: 2004-07-07 11:56 UTC by Guillaume Melquiond
Modified: 2006-02-28 09:35 UTC (History)
2 users (show)

See Also:
Host: i486-linux-gnu
Target:
Build:
Known to work: 3.3.4 3.3.2 4.0.0 4.2.0
Known to fail: 3.4.3 4.1.0
Last reconfirmed: 2006-01-06 15:26:25


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Guillaume Melquiond 2004-07-07 11:56:16 UTC
I have been compiling the following snippet with g++ version 3.3.4, 3.4.0,
3.5.0. They are Debian packages, so the 3.4.0 version is a bit old (and a
prerelease), sorry. So maybe the bug has already been fixed in g++ 3.4. However
it is still present in a recent snapshot of the trunk. In the following, the
function goes from using only one temporary in g++ 3.3 to using four temporaries
in g++ 3.5.

--------------------------------------------------------------------
struct T {
  int a[128];
  // T(T const &v);
  T &operator+=(T const &v);
  T operator+(T const &v) const { T t = *this; t += v; return t; }
};

extern T a, b, c;
void f() { a = b + c; }
--------------------------------------------------------------------

So the code just describes a big plain structure and its operator+ is defined
with its operator+=. The function f computes an addition and operator+ is
inlined into it.

When compiling with g++ 3.3, the generated assembly is optimal: there is only
one temporary, b is copied into this temporary, c is added to this temporary,
the result is then assigned to a.

When compiling with g++ 3.4, it is no more optimal: there are two temporaries, b
is copied into one, c is added to this temporary holding b, the result is copied
into the other temporary, and the temporary is then assigned to a. Maybe the
NRVO did not kick in?

When compiling with g++ 3.5, it is even worse: there are four temporaries, b is
copied into one, c is added to this temporary holding b, the result then goes
into two successive temporaries before being assigned to a. So there is one copy
that does not mean anything with respect to the code? And please note there is
one temporary that just lies on the stack and is never used (since only three
temporaries are used, not four).

If a copy constructor is added (just comment out the line), the three compilers
generate a similar code: only one temporary is ever used. It is the expected and
optimal code generation. So it seems g++ has some kind of problem with the
default copy constructor starting with 3.4.

As a side note, even with a non-default copy constructor, g++ 3.5 still
allocates a temporary that is never used. Maybe it is an unrelated bug?

The snippet is each time compiled with
$ g++-... -Wall -O3 -S test.cpp

Here are the exact versions used.

$ LANG=C g++-3.3 -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.4/specs
Configured with: ../src/configure -v
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr
--mandir=/usr/share/man --infodir=/usr/share/info
--with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --with-system-zlib
--enable-nls --without-included-gettext --enable-__cxa_atexit
--enable-clocale=gnu --enable-debug --enable-java-gc=boehm
--enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.4 (Debian 1:3.3.4-2)

$ LANG=C g++-3.4 -v
Reading specs from /usr/lib/gcc/i486-linux/3.4.0/specs
Configured with: ../src/configure -v
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr
--libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4
--enable-shared --with-system-zlib --enable-nls --enable-threads=posix
--without-included-gettext --program-suffix=-3.4 --enable-__cxa_atexit
--enable-clocale=gnu --enable-libstdcxx-debug --enable-java-gc=boehm
--enable-java-awt=gtk --disable-werror i486-linux
Thread model: posix
gcc version 3.4.0 20040403 (prerelease) (Debian)

$ LANG=C g++-snapshot -v
Reading specs from /usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/3.5.0/specs
Configured with: ../src/configure -v --enable-languages=c,c++,java,f95,objc
--prefix=/usr/lib/gcc-snapshot --enable-shared --with-system-zlib --enable-nls
--enable-threads=posix --without-included-gettext --disable-werror
--enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug
--enable-java-gc=boehm --enable-java-awt=gtk i486-linux-gnu
Thread model: posix
gcc version 3.5.0 20040704 (experimental)
Comment 1 Guillaume Melquiond 2004-07-19 09:19:41 UTC
As a follow-up, I also tested with g++ 3.4.1 and 3.5-20040717.

The problem still exists with 3.4.1. There are still one more temporary and one
more copy than with 3.3.4.

The situation is a bit better with 3.5 than it was with the previous tested
snapshot. It is almost as good as with 3.4.1. There are only two more
temporaries and one more copy than with 3.3.4.

So both gcc 3.4 and 3.5 produce one more copy than gcc 3.3. I timed the cost of
this extra copy on the code below (-O3 compilation as before). It incurs a ~25%
slowdown of the whole program on an Intel P4 processor.

-------------------------------------------------------
struct T {
  int a[500];
  T &operator+=(T const &v) { for(int i = 0; i < 500; ++i) a[i] += v.a[i];
return *this; }
  T operator+(T const &v) const { T t = *this; t += v; return t; }
};

T a, b, c;

int main() {
  for(int i = 0; i < 10000000; ++i) a = b + c;
  return 0;
}
Comment 2 Giovanni Bajo 2004-10-07 11:51:17 UTC
Confirmed. This won't probably be fixed in the 3.4 branch, but it should in 4.0 
at least. 

Testcase (same of original description):
------------------------------------
struct T {
  int a[128];
  T &operator+=(T const &v);
  T operator+(T const &v) const { T t = *this; t += v; return t; }
};

extern T a, b, c;
void f() { a = b + c; }
------------------------------------


The optimized dump (without the copy constructor) from mainline is:

------------------------------------
void f() ()
{
  struct T t;
  struct T * const this;
  struct T & v;
  struct T * D.1598;
  struct T D.1594;
  struct T t;

<bb 0>:
  t = b;
  operator+= (&t, &c);
  D.1594 = t;
  a = D.1594;
  return;

}
------------------------------------

So there is one additional temporary which is not removed.

I cannot see a regression anymore with the copy constructor, so this bug only 
tracks this now. Guillaume, if you still see a regression with the copy 
constructor please open a new bug report. It is wrong to track two different 
testcases (although similar) in the same report.
Comment 3 Mark Mitchell 2004-11-01 00:45:33 UTC
Postponed until GCC 3.4.4.
Comment 4 Mark Mitchell 2004-12-23 04:45:42 UTC
Working on a fix.
Comment 5 GCC Commits 2004-12-23 08:16:10 UTC
Subject: Bug 16405

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	mmitchel@gcc.gnu.org	2004-12-23 08:14:34

Modified files:
	gcc            : ChangeLog gimplify.c 
	gcc/testsuite  : ChangeLog 
Added files:
	gcc/testsuite/g++.dg/opt: temp1.C 

Log message:
	PR c++/16405
	* gimplify.c (gimplify_modify_expr_rhs): Handle
	INDIRECT_REF/ADDR_EXPR combinations.
	
	PR c++/16405
	* g++.dg/opt/temp1.C: New test.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6937&r2=2.6938
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/gimplify.c.diff?cvsroot=gcc&r1=2.96&r2=2.97
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.4801&r2=1.4802
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/g++.dg/opt/temp1.C.diff?cvsroot=gcc&r1=NONE&r2=1.1

Comment 6 Mark Mitchell 2004-12-23 08:17:01 UTC
Fixed in GCC 4.0.
Comment 7 GCC Commits 2004-12-23 16:27:31 UTC
Subject: Bug 16405

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	mmitchel@gcc.gnu.org	2004-12-23 16:27:12

Modified files:
	gcc/testsuite  : ChangeLog 
	gcc/testsuite/g++.dg/opt: temp1.C 

Log message:
	PR c++/16405
	* gimplify.c (gimplify_modify_expr_rhs): Handle
	INDIRECT_REF/ADDR_EXPR combinations.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.4805&r2=1.4806
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/g++.dg/opt/temp1.C.diff?cvsroot=gcc&r1=1.1&r2=1.2

Comment 8 GCC Commits 2005-02-13 06:44:05 UTC
Subject: Bug 16405

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	jason@gcc.gnu.org	2005-02-13 06:43:58

Modified files:
	gcc            : ChangeLog fold-const.c gimplify.c tree.h 

Log message:
	PR mudflap/19319
	* gimplify.c (gimplify_modify_expr_rhs) [CALL_EXPR]: Make return
	slot explicit.
	
	PR c++/16405
	* fold-const.c (fold_indirect_ref_1): Split out from...
	(build_fold_indirect_ref): Here.
	(fold_indirect_ref): New fn.
	* tree.h: Declare it.
	* gimplify.c (gimplify_compound_lval): Call fold_indirect_ref.
	(gimplify_modify_expr_rhs): Likewise.
	(gimplify_expr): Likewise.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7461&r2=2.7462
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/fold-const.c.diff?cvsroot=gcc&r1=1.507&r2=1.508
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/gimplify.c.diff?cvsroot=gcc&r1=2.108&r2=2.109
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree.h.diff?cvsroot=gcc&r1=1.689&r2=1.690

Comment 9 Jason Merrill 2006-02-03 21:56:18 UTC
Disabling this optimization for now, so this is again a regression in 4.1 and 4.2.
Comment 10 Jason Merrill 2006-02-09 09:54:42 UTC
Subject: Bug 16405

Author: jason
Date: Thu Feb  9 09:54:36 2006
New Revision: 110789

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=110789
Log:
        PR c++/25979
        * tree.def: Elaborate on difference from MODIFY_EXPR.
        * doc/c-tree.texi (INIT_EXPR): Likewise.
        * gimplify.c (internal_get_tmp_var): Use INIT_EXPR.
        (gimplify_decl_expr, gimplify_init_ctor_eval): Likewise.
        (gimplify_target_expr): Likewise.
        (gimplify_cond_expr): Remove target handling.
        (gimplify_modify_expr): Don't clobber INIT_EXPR code here.
        (gimplify_expr): Clobber it here.
        (gimplify_modify_expr_rhs): Push assignment into COND_EXPR here.
        Do return slot optimization if we have an INIT_EXPR.

        PR tree-opt/24365
        * tree-inline.c (declare_return_variable): Also clear
        DECL_COMPLEX_GIMPLE_REG_P as needed in the modify_dest case.

        PR c++/16405
        * gimplify.c (gimplify_modify_expr_rhs): Re-enable *& handling.

        PR middle-end/22439
        * gimplify.c (gimplify_one_sizepos): Fix typo.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/cp-gimplify.c
    trunk/gcc/cp/typeck2.c
    trunk/gcc/doc/c-tree.texi
    trunk/gcc/gimplify.c
    trunk/gcc/testsuite/g++.dg/opt/temp1.C
    trunk/gcc/tree-inline.c
    trunk/gcc/tree.def

Comment 11 Andrew Pinski 2006-02-09 13:35:46 UTC
Fixed on the mainline at least for now.
Comment 12 Jason Merrill 2006-02-10 17:32:15 UTC
Subject: Bug 16405

Author: jason
Date: Fri Feb 10 17:32:10 2006
New Revision: 110838

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=110838
Log:
        PR c++/25979
        * tree.def: Elaborate on difference from MODIFY_EXPR.
        * doc/c-tree.texi (INIT_EXPR): Likewise.
        * gimplify.c (internal_get_tmp_var): Use INIT_EXPR.
        (gimplify_decl_expr, gimplify_init_ctor_eval): Likewise.
        (gimplify_target_expr): Likewise.
        (gimplify_cond_expr): Remove target handling.
        (gimplify_modify_expr): Don't clobber INIT_EXPR code here.
        (gimplify_expr): Clobber it here.
        (gimplify_modify_expr_rhs): Push assignment into COND_EXPR here.
        Do return slot optimization if we have an INIT_EXPR.

        PR tree-opt/24365
        * tree-inline.c (declare_return_variable): Also clear
        DECL_COMPLEX_GIMPLE_REG_P as needed in the modify_dest case.

        PR c++/16405
        * gimplify.c (gimplify_modify_expr_rhs): Re-enable *& handling.

        PR middle-end/22439
        * gimplify.c (gimplify_one_sizepos): Fix typo.

Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/cp/ChangeLog
    branches/gcc-4_1-branch/gcc/cp/cp-gimplify.c
    branches/gcc-4_1-branch/gcc/cp/typeck2.c
    branches/gcc-4_1-branch/gcc/doc/c-tree.texi
    branches/gcc-4_1-branch/gcc/gimplify.c
    branches/gcc-4_1-branch/gcc/testsuite/g++.dg/opt/temp1.C
    branches/gcc-4_1-branch/gcc/tree-inline.c
    branches/gcc-4_1-branch/gcc/tree.def

Comment 13 Andrew Pinski 2006-02-10 23:08:51 UTC
Fixed now in 4.1.0 also.
Comment 14 Gabriel Dos Reis 2006-02-28 09:35:52 UTC
Fixed in 4.1.0.  won't fix for 3.4.6.