-------------------------------------------------------------------
typedef int Type;

void check(int);

template <typename To, typename From>
inline int assign(To& a, From b) {
  a = b;
  return 0;
}

template <typename To, typename From>
inline int add(To& a, From b, From c) {
  a = b + c;
  return 0;
}

template <typename T>
class Template;

template <typename T>
class Template {
private:
  T v;
public:
  Template() : v(0) { }
#if 0
  Template(const Template& y) : v(y.v) { }
#endif
  Template(const T y) : v(y) { }
  template <typename T1>
  Template(const T1 y) {
    check(assign(v, y));
  }
  const T& value() const { return v; }
};

template <typename T>
inline Template<T> operator+(const Template<T> x, const Template<T> y) {
  T r;
  check(add(r, x.value(), y.value()));
  return r;
}

template <typename T, typename T1>
inline Template<T> operator+(const T1 x, const Template<T> y) {
  return Template<T>(x) + y;
}

Type s(int v) {
  Template<Type> a = v;
  Template<Type> c = 3 + a;
  return c.value();
}
-------------------------------------------------------------------

Try compiling this with -O3, comparing the outputs after changing #if 0 to
#if 1. The .s diff is the following (excluding changes related to a label
which changes name):

--- bind.s      2004-08-11 06:56:43.000000000 +0200
+++ bind_ctor.s 2004-08-11 06:56:23.000000000 +0200
@@ -6,25 +6,27 @@
 .LCFI2:
-       subl    $16, %esp
+       subl    $64, %esp
 .LCFI3:
-       movl    8(%ebp), %ebx
-       addl    $3, %ebx
+       movl    8(%ebp), %eax
+       leal    3(%eax), %ebx
        pushl   $0
+       movl    %eax, -40(%ebp)
+       movl    $3, -56(%ebp)
 .LCFI4:
        call    _Z5checki
        movl    %ebx, %eax
        movl    -4(%ebp), %ebx
        leave
        ret

(- is #if 0, + is #if 1).

There are two problems here: first, there is a slight code pessimization
which should not happen because the code is semantically identical between
the two cases. Second, there is a *big* increase in stack allocation, for no
good reason (16 -> 64). Seems like tree-ssa is not able to clean up
something.
Confirmed. I could not figure out which variables would overlap to save the
stack space in 3.4.0, but there is a missed optimization here for 3.5.0:

  x.v = 3;
  y.v = v;
  T.3 = (const Type *)&y;
  T.5 = (const Type *)&x;
  y = *T.3 + *T.5;
  check (0);
  c.v = y;
  T.1 = (const Type *)&c;
  return *T.1;

This is another place where the C++ (and C) front-ends are lowering &a->b
(to (const Type *)&y) too soon.
I should note that once this is optimized, SRA should kick in and the
structs should no longer be there at all.
The stack allocation problem has been present in GCC since at least GCC
3.2.2, and the tree optimizers didn't fix it. 2.95.3 allocated more stack
space without the constructor but less stack space with the constructor:

-       subl $20,%esp
+       subl $36,%esp

So we now allocate 64 bytes instead of 36. I flag this as a regression,
which is surely important for any kind of embedded target.

The missed optimization which Andrew speaks about in comment #1 is a known
problem already tracked elsewhere (as he said).
Will not be fixed in GCC 3.4.x.
Is this related to Bug 9997?
Who can tell? The optimization problem has been addressed, which means that
we're left with

  Type s(int) (v)
  {
  <bb 0>:
    check (0);
    return v + 3;
  }

at the end of tree optimization. Which I guess is "fixed", by one
definition.

File a new bug report with a less reduced test case if you see this again.
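For anyone who wants to confirm that the source really does reduce to "check (0); return v + 3;", a harness along the following lines could be used. This is only a sketch: the original test case merely declares check(), so the recording definition of check(), the check_args vector and the self_test() helper are all assumptions added for illustration. The copy constructor is enabled here (the "#if 1" variant); removing it should not change the observed behaviour.

```cpp
#include <cassert>
#include <vector>

typedef int Type;

// Hypothetical stand-in for the external check(): records every argument.
static std::vector<int> check_args;
void check(int x) { check_args.push_back(x); }

template <typename To, typename From>
inline int assign(To& a, From b) { a = b; return 0; }

template <typename To, typename From>
inline int add(To& a, From b, From c) { a = b + c; return 0; }

template <typename T>
class Template {
  T v;
public:
  Template() : v(0) { }
  Template(const Template& y) : v(y.v) { }  // the "#if 1" variant
  Template(const T y) : v(y) { }
  template <typename T1>
  Template(const T1 y) { check(assign(v, y)); }
  const T& value() const { return v; }
};

template <typename T>
inline Template<T> operator+(const Template<T> x, const Template<T> y) {
  T r;
  check(add(r, x.value(), y.value()));
  return r;
}

template <typename T, typename T1>
inline Template<T> operator+(const T1 x, const Template<T> y) {
  return Template<T>(x) + y;
}

Type s(int v) {
  Template<Type> a = v;
  Template<Type> c = 3 + a;
  return c.value();
}

// Hypothetical helper: the observable behaviour should match the final
// tree dump, i.e. one call check(0) and a result of v + 3.
void self_test() {
  check_args.clear();
  assert(s(4) == 7);
  assert(check_args.size() == 1 && check_args[0] == 0);
}
```

Calling self_test() from a main() of your own exercises both properties; it says nothing about stack usage, which still has to be read off the generated assembly.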
Yes, this was fixed by no longer lowering &a.b into &a + offsetof(b) in the front-ends.
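To illustrate what that lowering means at the source level, here is a rough sketch of the two equivalent ways of forming a member address. The Pair struct and the function names are made up for this example and do not come from GCC or from the test case. Both functions yield the same pointer; the difference is only that the first form still reads as a member access, while the second is plain pointer arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standard-layout struct, used only for illustration.
struct Pair {
  int a;
  int b;
};

// High-level form: the address of a member, written as &p.b.
inline const int* member_addr(const Pair& p) { return &p.b; }

// Lowered form: base address of the object plus the member's byte offset,
// roughly what "&a + offsetof(b)" stands for in the comment above.
inline const int* lowered_addr(const Pair& p) {
  const char* base = reinterpret_cast<const char*>(&p);
  return reinterpret_cast<const int*>(base + offsetof(Pair, b));
}

// Hypothetical helper: both forms must agree on the address.
inline void verify(const Pair& p) {
  assert(member_addr(p) == lowered_addr(p));
}
```

As far as the comments in this report go, the issue was that producing the second form too early hid the member access from the later optimization passes.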