Bug 20408 - Unnecessary code generated for empty structs
Summary: Unnecessary code generated for empty structs
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.1.0
: P2 normal
Target Milestone: 10.0
Assignee: Jason Merrill
URL:
Keywords: missed-optimization
Depends on:
Blocks: 21796
  Show dependency treegraph
 
Reported: 2005-03-10 15:48 UTC by Chris Jefferson
Modified: 2021-02-12 02:30 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-02-03 00:00:00


Attachments
fix (1.54 KB, patch)
2019-03-05 22:47 UTC, Jason Merrill
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Jefferson 2005-03-10 15:48:15 UTC
Given an empty struct (ie struct X { };), even at high optimisation levels g++
will insist on always allocating and zeroing some memory for empty structs.

This actually effects C++ code, including libstdc++-v3, as empty structs are
often used as a means of passing around functions.

EXAMPLE
------------------
struct X {};

void foo(X);

void call_foo()
{ foo(X()); }
------------------

generates (from -O3, t70.final_cleanup is:)
-------------------------------------------------
;; Function void call_foo() (_Z8call_foov)

void call_foo() ()
{
  struct X D.1597;

<bb 0>:
  D.1597 = 0;
  foo (D.1597) [tail call];
  return;

}
---------------------------------------------
Comment 1 Andrew Pinski 2005-03-10 15:56:21 UTC
Note this has no effect on the generated code.

But anyways:
  struct X D.1574;
  struct X D.1590;

  D.1574 = {};
  D.1590 = 0;

That is not right 0 is an INTEREGER_CST which cannot be an aggregate, that is just wrong.

If we had more type checking we would be an ICE.
Comment 2 Chris Jefferson 2005-03-10 16:00:18 UTC
ignore my random changing of severity..
Comment 3 Chris Jefferson 2005-03-10 16:05:39 UTC
When you say "has no effect in final code", are you talking about the problem
you noticed, or the problem as a whole?

I find for each extra X I add to the type of foo I get a line much like:

movb %al, 28(%esp)

appearing in the assembler output, which I assume isn't necessary..
Comment 4 Andrew Pinski 2005-03-10 16:10:10 UTC
Werid on PPC, we get the most optimal code of
__Z8call_foov:
LFB2:
        b __Z3foo1X
Comment 5 Paolo Carlini 2005-03-10 16:19:10 UTC
Andrew, is Daniel Berlin struct aliasing work likely to help, here?
Comment 6 Andrew Pinski 2005-03-10 16:21:49 UTC
(In reply to comment #5)
> Andrew, is Daniel Berlin struct aliasing work likely to help, here?

No, fixing the front-end so it no longer produces a = 0 will most likely fix this as that is just wrong.
Comment 7 Andrew Pinski 2005-03-10 17:06:02 UTC
From call.c:
  /* Don't pass empty class objects by value.  This is useful
     for tags in STL, which are used to control overload resolution.
     We don't need to handle other cases of copying empty classes.  */
  if (! decl || ! DECL_BUILT_IN (decl))
    for (tmp = parms; tmp; tmp = TREE_CHAIN (tmp))
      if (is_empty_class (TREE_TYPE (TREE_VALUE (tmp)))
	  && ! TREE_ADDRESSABLE (TREE_TYPE (TREE_VALUE (tmp))))
	{
	  tree t = build0 (EMPTY_CLASS_EXPR, TREE_TYPE (TREE_VALUE (tmp)));
	  TREE_VALUE (tmp) = build2 (COMPOUND_EXPR, TREE_TYPE (t),
				     TREE_VALUE (tmp), t);
	}
Comment 8 Andrew Pinski 2005-03-10 17:07:18 UTC
From cp-gimplifier.c:
    case EMPTY_CLASS_EXPR:
      /* We create an INTEGER_CST with RECORD_TYPE and value zero.  */
      *expr_p = build_int_cst (TREE_TYPE (*expr_p), 0);

This is wrong.
Comment 9 Paolo Carlini 2005-03-10 17:13:48 UTC
Eh, I was looking at the very same code... Can't we deal with EMPTY_CLASS_EXPR
similarly to USING_STMT? ;) ;)
Comment 10 Andrew Pinski 2005-03-10 17:15:43 UTC
(In reply to comment #9)
> Eh, I was looking at the very same code... Can't we deal with EMPTY_CLASS_EXPR
> similarly to USING_STMT? ;) ;)

I think we should produce an empty CONSTRUCTOR (which I am testing right now).
Comment 11 Andrew Pinski 2005-03-10 17:30:52 UTC
(In reply to comment #10)
> I think we should produce an empty CONSTRUCTOR (which I am testing right now).
That did not fix it, we still get code generated for the "empty" struct:
;; D.1594 = {}
(nil)

;; foo (D.1594) [tail call]
(insn 10 8 11 (set (mem:QI (reg/f:SI 56 virtual-outgoing-args) [0 S1 A32])
        (reg:QI 58 [ D.1594 ])) -1 (nil)
    (nil))

(call_insn 11 10 0 (call (mem:QI (symbol_ref:SI ("_Z3foo1X") [flags 0x41] <function_decl 0x41e9ea6c 
foo>) [0 S1 A8])
        (const_int 4 [0x4])) -1 (nil)
    (nil)
    (nil))


But on PPC we get:
;; D.1588 = {}
(nil)

;; foo (D.1588) [tail call]
(insn 10 8 11 0 (set (reg:QI 3 r3)
        (reg:QI 118 [ D.1588 ])) -1 (nil)
    (nil))

(call_insn/j 11 10 12 0 (parallel [
            (call (mem:SI (symbol_ref:SI ("_Z3foo1X") [flags 0x41] <function_decl 0x41d9f984 foo>) [0 S4 
A8])
                (const_int 32 [0x20]))
            (use (const_int 0 [0x0]))
            (use (reg:SI 119))
            (return)
        ]) -1 (nil)
    (nil)
    (expr_list:REG_DEP_TRUE (use (reg:QI 3 r3))
        (nil)))


See the difference is that we pass on ppc via a register but on x86 we pass via the stack.  I don't know a 
way to fix this with a front-end change.
Comment 12 Andrew Pinski 2005-03-10 17:35:16 UTC
The change to cp-gimplifier.c should still happen as it makes not create a INTEGER_CST for an 
aggregate.
Comment 13 Richard Biener 2005-09-04 16:25:55 UTC
For

struct Foo {};
void foo(const Foo&);
void bar(Foo);

void fooc(void)
{
        foo(Foo());
}
void barc(void)
{
        bar(Foo());
}

we get different initializers for the Foo& and the Foo case:

void fooc() ()
{
  struct Foo D.1594;

<bb 0>:
  D.1594 = {};
  foo (&D.1594);
  return;

}

void barc() ()
{
  struct Foo D.1613;

<bb 0>:
  D.1613 = 0;
  bar (D.1613) [tail call];
  return;

}

The former looks correct and does not produce initialization code
for the temporary.  The latter produces an unneccessary (uninitialized)
initialization of the pass-by-value stack slot on x86.
Comment 14 Richard Biener 2005-09-04 16:37:28 UTC
pinskia posted a patch for the =0 "bug"

http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01054.html
Comment 15 Richard Biener 2005-09-04 17:12:03 UTC
I will look at this.  Is this by any chance a regression?
Comment 16 Richard Biener 2005-09-05 10:36:31 UTC
Pinskias patch fixes one part of the problem.  For x86 there remains the
issue that we are passing X on the stack and generate

_Z8call_foov:
.LFB2:
        subl    $26, %esp
.LCFI4:
        pushw   %ax
.LCFI5:
.LCFI6:
        call    _Z3foo1X
        addl    $28, %esp
        ret

which is correct, but I wonder where we figure out to use %ax as source
for the tmp X.  Also no RTL optimizer sees that the pushw %ax can be
safely combined with the stack adjust before because the contents of %ax
are unknown and we don't care about what value we pass on the stack.

The call expander produces

(insn 9 8 10 1 (parallel [
            (set (reg/f:SI 7 sp)
                (plus:SI (reg/f:SI 7 sp)
                    (const_int -2 [0xfffffffe])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 10 9 11 1 (set (mem/s:QI (pre_modify:SI (reg/f:SI 7 sp)
                (plus:SI (reg/f:SI 7 sp)
                    (const_int -2 [0xfffffffe]))) [0 S1 A8])
        (reg:QI 58 [ D.1755 ])) -1 (nil)
    (nil))

(call_insn 11 10 12 1 (call (mem:QI (symbol_ref:SI ("_Z3foo1X") [flags 0x41]
<function_decl 0x40230e00 foo>) [0 S1 A8])
        (const_int 16 [0x10])) -1 (nil)
    (nil)
    (nil))

where it should not use reg 58 as the source to push, but rather
the stack slot we assigned to the D.1755 tmp.
Comment 17 Richard Biener 2005-09-05 10:48:43 UTC
Unfortunately we start with D.1755 allocated to a register.  This may be solved
at the tree level if we fix PR23372.  Or we need to be smarter at allocating
space for D.1755.
Comment 18 Richard Biener 2005-09-05 12:38:06 UTC
Life analysis should figure out, that for

(insn 10 9 11 1 (set (mem/s:QI (pre_modify:SI (reg/f:SI 7 sp)
                (plus:SI (reg/f:SI 7 sp)
                    (const_int -2 [0xfffffffe]))) [0 S1 A8])
        (reg:QI 58 [ D.1755 ])) -1 (nil)
    (nil))

where it notes that reg:QI 58 is dead after the instruction, never
became life before and so remove the set completely, only preserving
the side-effects 

                       (pre_modify:SI (reg/f:SI 7 sp)
                (plus:SI (reg/f:SI 7 sp)
                    (const_int -2 [0xfffffffe])))

someone familiar with flow.c should be able to hack this into
mark_used_reg() in a few minutes :)
Comment 19 Steven Bosscher 2007-11-26 14:17:01 UTC
The recent improvements to the dataflow module and ra-conflicts may have fixed this.  Richi, you were the last to look at this bug report.  Can you check if there still is an issue to fix here?
Comment 20 Richard Biener 2007-11-27 09:40:15 UTC
For the testcase in comment #13 we now generate two(!) temporaries:

void barc() ()
{
  struct Foo D.2027;
  struct Foo D.2028;

  D.2027 = {};
  D.2028 = {};
  bar (D.2028);
}

via gimplification of

;; Function void barc() (_Z4barcv)
;; enabled by -tree-original

<<cleanup_point <<< Unknown tree: expr_stmt
  bar (TARGET_EXPR <D.2027, {}>;, <<< Unknown tree: empty_class_expr >>>
;) >>>
>>;

though the generated assembler looks like we cannot do better:

_Z4barcv:
.LFB3:
        subq    $8, %rsp
.LCFI0:
        movb    $0, (%rsp)
        call    _Z3bar3Foo
        addq    $8, %rsp
        ret

_Z4foocv:
.LFB2:
        subq    $24, %rsp
.LCFI1:
        leaq    23(%rsp), %rdi
        call    _Z3fooRK3Foo
        addq    $24, %rsp
        ret

On the tree level the first simple DSE pass gets rid of the extra temporary
and its initialization.
Comment 21 Jason Merrill 2019-03-05 22:47:33 UTC
Created attachment 45900 [details]
fix

Patch waiting for stage 1.
Comment 22 Jason Merrill 2019-05-22 21:39:43 UTC
Author: jason
Date: Wed May 22 21:39:08 2019
New Revision: 271523

URL: https://gcc.gnu.org/viewcvs?rev=271523&root=gcc&view=rev
Log:
	PR c++/20408 - unnecessary code for empty struct.

Here initializing the argument from a TARGET_EXPR isn't an empty class
copy even though the type is !TREE_ADDRESSABLE, so we should check
simple_empty_class_p.

	* call.c (build_call_a): Use simple_empty_class_p.

Added:
    trunk/gcc/testsuite/g++.dg/tree-ssa/empty-3.C
Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/call.c
    trunk/gcc/cp/cp-gimplify.c
    trunk/gcc/cp/cp-tree.h
Comment 23 Jason Merrill 2020-12-09 13:53:09 UTC
Fixed in GCC 10.
Comment 24 GCC Commits 2021-02-09 01:52:39 UTC
The master branch has been updated by Jason Merrill <jason@gcc.gnu.org>:

https://gcc.gnu.org/g:bdbca69e0720fa9062fe71782235141f629ae006

commit r11-7142-gbdbca69e0720fa9062fe71782235141f629ae006
Author: Jason Merrill <jason@redhat.com>
Date:   Mon Feb 8 17:04:03 2021 -0500

    c++: generic lambda, fn* conv, empty class [PR98326]
    
    Here, in the thunk returned from the captureless lambda conversion to
    pointer-to-function, we try to pass through invisible reference parameters
    by reference, without doing a copy.  The empty class copy optimization was
    messing that up.
    
    gcc/cp/ChangeLog:
    
            PR c++/98326
            PR c++/20408
            * cp-gimplify.c (simple_empty_class_p): Don't touch an invisiref
            parm.
    
    gcc/testsuite/ChangeLog:
    
            PR c++/98326
            * g++.dg/cpp1y/lambda-generic-empty1.C: New test.
Comment 25 GCC Commits 2021-02-12 02:30:16 UTC
The releases/gcc-10 branch has been updated by Jason Merrill <jason@gcc.gnu.org>:

https://gcc.gnu.org/g:031e97207463710797625382baff112b6c3ade51

commit r10-9362-g031e97207463710797625382baff112b6c3ade51
Author: Jason Merrill <jason@redhat.com>
Date:   Mon Feb 8 17:04:03 2021 -0500

    c++: generic lambda, fn* conv, empty class [PR98326]
    
    Here, in the thunk returned from the captureless lambda conversion to
    pointer-to-function, we try to pass through invisible reference parameters
    by reference, without doing a copy.  The empty class copy optimization was
    messing that up.
    
    gcc/cp/ChangeLog:
    
            PR c++/98326
            PR c++/20408
            * cp-gimplify.c (simple_empty_class_p): Don't touch an invisiref
            parm.
    
    gcc/testsuite/ChangeLog:
    
            PR c++/98326
            * g++.dg/cpp1y/lambda-generic-empty1.C: New test.