Given an empty struct (ie struct X { };), even at high optimisation levels g++ will insist on always allocating and zeroing some memory for empty structs. This actually effects C++ code, including libstdc++-v3, as empty structs are often used as a means of passing around functions. EXAMPLE ------------------ struct X {}; void foo(X); void call_foo() { foo(X()); } ------------------ generates (from -O3, t70.final_cleanup is:) ------------------------------------------------- ;; Function void call_foo() (_Z8call_foov) void call_foo() () { struct X D.1597; <bb 0>: D.1597 = 0; foo (D.1597) [tail call]; return; } ---------------------------------------------
Note this has no effect on the generated code. But anyways: struct X D.1574; struct X D.1590; D.1574 = {}; D.1590 = 0; That is not right 0 is an INTEREGER_CST which cannot be an aggregate, that is just wrong. If we had more type checking we would be an ICE.
ignore my random changing of severity..
When you say "has no effect in final code", are you talking about the problem you noticed, or the problem as a whole? I find for each extra X I add to the type of foo I get a line much like: movb %al, 28(%esp) appearing in the assembler output, which I assume isn't necessary..
Werid on PPC, we get the most optimal code of __Z8call_foov: LFB2: b __Z3foo1X
Andrew, is Daniel Berlin struct aliasing work likely to help, here?
(In reply to comment #5) > Andrew, is Daniel Berlin struct aliasing work likely to help, here? No, fixing the front-end so it no longer produces a = 0 will most likely fix this as that is just wrong.
From call.c: /* Don't pass empty class objects by value. This is useful for tags in STL, which are used to control overload resolution. We don't need to handle other cases of copying empty classes. */ if (! decl || ! DECL_BUILT_IN (decl)) for (tmp = parms; tmp; tmp = TREE_CHAIN (tmp)) if (is_empty_class (TREE_TYPE (TREE_VALUE (tmp))) && ! TREE_ADDRESSABLE (TREE_TYPE (TREE_VALUE (tmp)))) { tree t = build0 (EMPTY_CLASS_EXPR, TREE_TYPE (TREE_VALUE (tmp))); TREE_VALUE (tmp) = build2 (COMPOUND_EXPR, TREE_TYPE (t), TREE_VALUE (tmp), t); }
From cp-gimplifier.c: case EMPTY_CLASS_EXPR: /* We create an INTEGER_CST with RECORD_TYPE and value zero. */ *expr_p = build_int_cst (TREE_TYPE (*expr_p), 0); This is wrong.
Eh, I was looking at the very same code... Can't we deal with EMPTY_CLASS_EXPR similarly to USING_STMT? ;) ;)
(In reply to comment #9) > Eh, I was looking at the very same code... Can't we deal with EMPTY_CLASS_EXPR > similarly to USING_STMT? ;) ;) I think we should produce an empty CONSTRUCTOR (which I am testing right now).
(In reply to comment #10) > I think we should produce an empty CONSTRUCTOR (which I am testing right now). That did not fix it, we still get code generated for the "empty" struct: ;; D.1594 = {} (nil) ;; foo (D.1594) [tail call] (insn 10 8 11 (set (mem:QI (reg/f:SI 56 virtual-outgoing-args) [0 S1 A32]) (reg:QI 58 [ D.1594 ])) -1 (nil) (nil)) (call_insn 11 10 0 (call (mem:QI (symbol_ref:SI ("_Z3foo1X") [flags 0x41] <function_decl 0x41e9ea6c foo>) [0 S1 A8]) (const_int 4 [0x4])) -1 (nil) (nil) (nil)) But on PPC we get: ;; D.1588 = {} (nil) ;; foo (D.1588) [tail call] (insn 10 8 11 0 (set (reg:QI 3 r3) (reg:QI 118 [ D.1588 ])) -1 (nil) (nil)) (call_insn/j 11 10 12 0 (parallel [ (call (mem:SI (symbol_ref:SI ("_Z3foo1X") [flags 0x41] <function_decl 0x41d9f984 foo>) [0 S4 A8]) (const_int 32 [0x20])) (use (const_int 0 [0x0])) (use (reg:SI 119)) (return) ]) -1 (nil) (nil) (expr_list:REG_DEP_TRUE (use (reg:QI 3 r3)) (nil))) See the difference is that we pass on ppc via a register but on x86 we pass via the stack. I don't know a way to fix this with a front-end change.
The change to cp-gimplifier.c should still happen as it makes not create a INTEGER_CST for an aggregate.
For struct Foo {}; void foo(const Foo&); void bar(Foo); void fooc(void) { foo(Foo()); } void barc(void) { bar(Foo()); } we get different initializers for the Foo& and the Foo case: void fooc() () { struct Foo D.1594; <bb 0>: D.1594 = {}; foo (&D.1594); return; } void barc() () { struct Foo D.1613; <bb 0>: D.1613 = 0; bar (D.1613) [tail call]; return; } The former looks correct and does not produce initialization code for the temporary. The latter produces an unneccessary (uninitialized) initialization of the pass-by-value stack slot on x86.
pinskia posted a patch for the =0 "bug" http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01054.html
I will look at this. Is this by any chance a regression?
Pinskias patch fixes one part of the problem. For x86 there remains the issue that we are passing X on the stack and generate _Z8call_foov: .LFB2: subl $26, %esp .LCFI4: pushw %ax .LCFI5: .LCFI6: call _Z3foo1X addl $28, %esp ret which is correct, but I wonder where we figure out to use %ax as source for the tmp X. Also no RTL optimizer sees that the pushw %ax can be safely combined with the stack adjust before because the contents of %ax are unknown and we don't care about what value we pass on the stack. The call expander produces (insn 9 8 10 1 (parallel [ (set (reg/f:SI 7 sp) (plus:SI (reg/f:SI 7 sp) (const_int -2 [0xfffffffe]))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) (insn 10 9 11 1 (set (mem/s:QI (pre_modify:SI (reg/f:SI 7 sp) (plus:SI (reg/f:SI 7 sp) (const_int -2 [0xfffffffe]))) [0 S1 A8]) (reg:QI 58 [ D.1755 ])) -1 (nil) (nil)) (call_insn 11 10 12 1 (call (mem:QI (symbol_ref:SI ("_Z3foo1X") [flags 0x41] <function_decl 0x40230e00 foo>) [0 S1 A8]) (const_int 16 [0x10])) -1 (nil) (nil) (nil)) where it should not use reg 58 as the source to push, but rather the stack slot we assigned to the D.1755 tmp.
Unfortunately we start with D.1755 allocated to a register. This may be solved at the tree level if we fix PR23372. Or we need to be smarter at allocating space for D.1755.
Life analysis should figure out, that for (insn 10 9 11 1 (set (mem/s:QI (pre_modify:SI (reg/f:SI 7 sp) (plus:SI (reg/f:SI 7 sp) (const_int -2 [0xfffffffe]))) [0 S1 A8]) (reg:QI 58 [ D.1755 ])) -1 (nil) (nil)) where it notes that reg:QI 58 is dead after the instruction, never became life before and so remove the set completely, only preserving the side-effects (pre_modify:SI (reg/f:SI 7 sp) (plus:SI (reg/f:SI 7 sp) (const_int -2 [0xfffffffe]))) someone familiar with flow.c should be able to hack this into mark_used_reg() in a few minutes :)
The recent improvements to the dataflow module and ra-conflicts may have fixed this. Richi, you were the last to look at this bug report. Can you check if there still is an issue to fix here?
For the testcase in comment #13 we now generate two(!) temporaries: void barc() () { struct Foo D.2027; struct Foo D.2028; D.2027 = {}; D.2028 = {}; bar (D.2028); } via gimplification of ;; Function void barc() (_Z4barcv) ;; enabled by -tree-original <<cleanup_point <<< Unknown tree: expr_stmt bar (TARGET_EXPR <D.2027, {}>;, <<< Unknown tree: empty_class_expr >>> ;) >>> >>; though the generated assembler looks like we cannot do better: _Z4barcv: .LFB3: subq $8, %rsp .LCFI0: movb $0, (%rsp) call _Z3bar3Foo addq $8, %rsp ret _Z4foocv: .LFB2: subq $24, %rsp .LCFI1: leaq 23(%rsp), %rdi call _Z3fooRK3Foo addq $24, %rsp ret On the tree level the first simple DSE pass gets rid of the extra temporary and its initialization.
Created attachment 45900 [details] fix Patch waiting for stage 1.
Author: jason Date: Wed May 22 21:39:08 2019 New Revision: 271523 URL: https://gcc.gnu.org/viewcvs?rev=271523&root=gcc&view=rev Log: PR c++/20408 - unnecessary code for empty struct. Here initializing the argument from a TARGET_EXPR isn't an empty class copy even though the type is !TREE_ADDRESSABLE, so we should check simple_empty_class_p. * call.c (build_call_a): Use simple_empty_class_p. Added: trunk/gcc/testsuite/g++.dg/tree-ssa/empty-3.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/call.c trunk/gcc/cp/cp-gimplify.c trunk/gcc/cp/cp-tree.h
Fixed in GCC 10.
The master branch has been updated by Jason Merrill <jason@gcc.gnu.org>: https://gcc.gnu.org/g:bdbca69e0720fa9062fe71782235141f629ae006 commit r11-7142-gbdbca69e0720fa9062fe71782235141f629ae006 Author: Jason Merrill <jason@redhat.com> Date: Mon Feb 8 17:04:03 2021 -0500 c++: generic lambda, fn* conv, empty class [PR98326] Here, in the thunk returned from the captureless lambda conversion to pointer-to-function, we try to pass through invisible reference parameters by reference, without doing a copy. The empty class copy optimization was messing that up. gcc/cp/ChangeLog: PR c++/98326 PR c++/20408 * cp-gimplify.c (simple_empty_class_p): Don't touch an invisiref parm. gcc/testsuite/ChangeLog: PR c++/98326 * g++.dg/cpp1y/lambda-generic-empty1.C: New test.
The releases/gcc-10 branch has been updated by Jason Merrill <jason@gcc.gnu.org>: https://gcc.gnu.org/g:031e97207463710797625382baff112b6c3ade51 commit r10-9362-g031e97207463710797625382baff112b6c3ade51 Author: Jason Merrill <jason@redhat.com> Date: Mon Feb 8 17:04:03 2021 -0500 c++: generic lambda, fn* conv, empty class [PR98326] Here, in the thunk returned from the captureless lambda conversion to pointer-to-function, we try to pass through invisible reference parameters by reference, without doing a copy. The empty class copy optimization was messing that up. gcc/cp/ChangeLog: PR c++/98326 PR c++/20408 * cp-gimplify.c (simple_empty_class_p): Don't touch an invisiref parm. gcc/testsuite/ChangeLog: PR c++/98326 * g++.dg/cpp1y/lambda-generic-empty1.C: New test.